\section{\module{locale} ---
         Internationalization services}

\declaremodule{standard}{locale}
\modulesynopsis{Internationalization services.}
\moduleauthor{Martin von L\"owis}{martin@v.loewis.de}
\sectionauthor{Martin von L\"owis}{martin@v.loewis.de}


The \module{locale} module opens access to the \POSIX{} locale
database and functionality.  The \POSIX{} locale mechanism allows
programmers to deal with certain cultural issues in an application,
without requiring the programmer to know all the specifics of each
country where the software is executed.

The \module{locale} module is implemented on top of the
\module{_locale}\refbimodindex{_locale} module, which in turn uses an
ANSI C locale implementation if available.

The \module{locale} module defines the following exception and
functions:


\begin{excdesc}{Error}
  Exception raised when \function{setlocale()} fails.
\end{excdesc}
\begin{funcdesc}{setlocale}{category\optional{, locale}}
  If \var{locale} is specified, it may be a string, a tuple of the
  form \code{(\var{language code}, \var{encoding})}, or \code{None}.
  If it is a tuple, it is converted to a string using the locale
  aliasing engine.  If \var{locale} is given and not \code{None},
  \function{setlocale()} modifies the locale setting for the
  \var{category}.  The available categories are listed in the data
  description below.  The value is the name of a locale.  An empty
  string specifies the user's default settings.  If the modification of
  the locale fails, the exception \exception{Error} is raised.  If
  successful, the new locale setting is returned.

  If \var{locale} is omitted or \code{None}, the current setting for
  \var{category} is returned.

  \function{setlocale()} is not thread safe on most systems.
  Applications typically start with a call of

\begin{verbatim}
import locale
locale.setlocale(locale.LC_ALL, '')
\end{verbatim}

  This sets the locale for all categories to the user's default
  setting (typically specified in the \envvar{LANG} environment
  variable).  If the locale is not changed thereafter, using
  multithreading should not cause problems.

  \versionchanged[Added support for tuple values of the \var{locale}
                  parameter]{2.0}
\end{funcdesc}
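As a hedged illustration of the error handling described for
\function{setlocale()}, the following sketch tries several locale names
in turn and moves on whenever \exception{Error} is raised.  The helper
\code{first_supported()} and the name \code{'xx_NOPE.nonexistent'} are
invented for this example; which names are accepted depends on the
platform.

\begin{verbatim}
import locale

def first_supported(category, names):
    """Return the result of the first setlocale() call that succeeds, else None."""
    for name in names:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            pass        # this name is not supported here; try the next one
    return None

saved = locale.setlocale(locale.LC_ALL)     # no locale argument: query only
chosen = first_supported(locale.LC_ALL, ['xx_NOPE.nonexistent', 'C'])
locale.setlocale(locale.LC_ALL, saved)      # restore the previous settings
\end{verbatim}

Because the call changes a program-wide setting, the sketch saves the
current value first and restores it afterwards.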
\begin{funcdesc}{localeconv}{}
  Returns the database of the local conventions as a dictionary.
  This dictionary has the following strings as keys:

  \begin{tableiii}{l|l|p{3in}}{constant}{Category}{Key}{Meaning}
    \lineiii{LC_NUMERIC}{\code{'decimal_point'}}
            {Decimal point character.}
    \lineiii{}{\code{'grouping'}}
            {Sequence of numbers specifying the relative positions at
             which the \code{'thousands_sep'} is expected.  If the
             sequence is terminated with \constant{CHAR_MAX}, no further
             grouping is performed.  If the sequence terminates with a
             \code{0}, the last group size is repeatedly used.}
    \lineiii{}{\code{'thousands_sep'}}
            {Character used between groups.}\hline
    \lineiii{LC_MONETARY}{\code{'int_curr_symbol'}}
            {International currency symbol.}
    \lineiii{}{\code{'currency_symbol'}}
            {Local currency symbol.}
    \lineiii{}{\code{'mon_decimal_point'}}
            {Decimal point used for monetary values.}
    \lineiii{}{\code{'mon_thousands_sep'}}
            {Group separator used for monetary values.}
    \lineiii{}{\code{'mon_grouping'}}
            {Equivalent to \code{'grouping'}, used for monetary
             values.}
    \lineiii{}{\code{'positive_sign'}}
            {Symbol used to annotate a positive monetary value.}
    \lineiii{}{\code{'negative_sign'}}
            {Symbol used to annotate a negative monetary value.}
    \lineiii{}{\code{'frac_digits'}}
            {Number of fractional digits used in local formatting
             of monetary values.}
    \lineiii{}{\code{'int_frac_digits'}}
            {Number of fractional digits used in international
             formatting of monetary values.}
  \end{tableiii}

  The possible values for \code{'p_sign_posn'} and
  \code{'n_sign_posn'} are given below.

  \begin{tableii}{c|l}{code}{Value}{Explanation}
    \lineii{0}{Currency and value are surrounded by parentheses.}
    \lineii{1}{The sign should precede the value and currency symbol.}
    \lineii{2}{The sign should follow the value and currency symbol.}
    \lineii{3}{The sign should immediately precede the value.}
    \lineii{4}{The sign should immediately follow the value.}
    \lineii{\constant{CHAR_MAX}}{Nothing is specified in this locale.}
  \end{tableii}
\end{funcdesc}
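The rules for the \code{'grouping'} sequence can be sketched in a few
lines of Python.  The helper \code{group_digits()} below is hypothetical
and not part of the module, and the grouping lists passed to it are
made-up inputs rather than values read from a real locale.  It assumes
that grouping simply stops when the sequence runs out without a
terminator.

\begin{verbatim}
import locale

def group_digits(digits, grouping, sep):
    """Insert sep into a string of digits, following a 'grouping' sequence."""
    groups = []         # groups are collected from right to left
    last = None         # size of the most recent group
    pos = 0
    while digits:
        if pos < len(grouping):
            size = grouping[pos]
        else:
            size = locale.CHAR_MAX      # sequence exhausted: stop grouping
        if size == 0:
            size = last                 # 0: reuse the last group size
        else:
            pos += 1
        if size is None or size == locale.CHAR_MAX or size >= len(digits):
            break
        groups.append(digits[-size:])
        digits = digits[:-size]
        last = size
    groups.append(digits)               # whatever is left forms the first group
    groups.reverse()
    return sep.join(groups)

western = group_digits('1234567', [3, 3, 0], ',')               # '1,234,567'
indian = group_digits('1234567', [3, 2, 0], ',')                # '12,34,567'
once = group_digits('1234567', [3, locale.CHAR_MAX], ',')       # '1234,567'
\end{verbatim}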
\begin{funcdesc}{nl_langinfo}{option}

Return some locale-specific information as a string.  This function is
not available on all systems, and the set of possible options might
also vary across platforms.  The possible argument values are numbers,
for which symbolic constants are available in the locale module.

\end{funcdesc}
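Since neither \function{nl_langinfo()} nor every option constant is
guaranteed to exist, portable code may want to look both up defensively.
The following is a minimal sketch of that idea; the helper
\code{langinfo()} is an invented name, and the values it returns depend
on the platform and the current locale.

\begin{verbatim}
import locale
import time

def langinfo(name):
    """Return nl_langinfo() for the constant called name, or None if unavailable."""
    func = getattr(locale, 'nl_langinfo', None)
    option = getattr(locale, name, None)
    if func is None or option is None:
        return None
    return func(option)

codeset = langinfo('CODESET')
date_format = langinfo('D_FMT')
if date_format:
    today = time.strftime(date_format)  # format today's date for the locale
else:
    today = None
\end{verbatim}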
\begin{funcdesc}{getdefaultlocale}{\optional{envvars}}
  Tries to determine the default locale settings and returns
  them as a tuple of the form \code{(\var{language code},
  \var{encoding})}.

  According to \POSIX, a program which has not called
  \code{setlocale(LC_ALL, '')} runs using the portable \code{'C'}
  locale.  Calling \code{setlocale(LC_ALL, '')} lets it use the
  default locale as defined by the \envvar{LANG} variable.  Since we
  do not want to interfere with the current locale setting, this
  function emulates that behavior instead of changing the locale.

  To maintain compatibility with other platforms, not only the
  \envvar{LANG} variable is tested, but a list of variables given as
  the \var{envvars} parameter.  The first one found to be defined will
  be used.  \var{envvars} defaults to the search path used in GNU
  gettext; it must always contain the variable name \samp{LANG}.  The
  GNU gettext search path contains \code{'LANGUAGE'}, \code{'LC_ALL'},
  \code{'LC_CTYPE'}, and \code{'LANG'}, in that order.

  Except for the code \code{'C'}, the language code corresponds to
  \rfc{1766}.  \var{language code} and \var{encoding} may be
  \code{None} if their values cannot be determined.
  \versionadded{2.0}
\end{funcdesc}
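The search over environment variables can be pictured with a small
stand-alone sketch.  It is not the module's implementation: the helper
\code{pick_locale_name()} is hypothetical, it works on a plain
dictionary instead of the real environment, and the fallback to
\code{'C'} when nothing is set is an assumption based on the \POSIX{}
rule quoted above.

\begin{verbatim}
def pick_locale_name(environ,
                     envvars=('LANGUAGE', 'LC_ALL', 'LC_CTYPE', 'LANG')):
    """Return (variable, value) for the first variable in envvars that is set."""
    for name in envvars:
        value = environ.get(name)
        if value:
            return name, value
    return None, 'C'    # nothing set: assume the portable 'C' locale

# LC_CTYPE comes before LANG in the search path, so it wins here.
found = pick_locale_name({'LANG': 'de_DE.UTF-8', 'LC_CTYPE': 'fr_FR.UTF-8'})
nothing = pick_locale_name({})
\end{verbatim}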
\begin{funcdesc}{getlocale}{\optional{category}}
  Returns the current setting for the given locale category as a
  sequence containing \var{language code}, \var{encoding}.
  \var{category} may be one of the \constant{LC_*} values except
  \constant{LC_ALL}.  It defaults to \constant{LC_CTYPE}.

  Except for the code \code{'C'}, the language code corresponds to
  \rfc{1766}.  \var{language code} and \var{encoding} may be
  \code{None} if their values cannot be determined.
  \versionadded{2.0}
\end{funcdesc}
\begin{funcdesc}{getpreferredencoding}{\optional{do_setlocale}}
  Return the encoding used for text data, according to user
  preferences.  User preferences are expressed differently on
  different systems, and might not be available programmatically on
  some systems, so this function only returns a guess.

  On some systems, it is necessary to invoke \function{setlocale()}
  to obtain the user preferences, so this function is not thread-safe.
  If invoking \function{setlocale()} is not necessary or desired,
  \var{do_setlocale} should be set to \code{False}.

  \versionadded{2.3}
\end{funcdesc}
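Because the result is only a guess, a caller may want to check it before
relying on it.  The sketch below, built around a hypothetical helper
\code{usable_encoding()}, accepts the guess only if Python has a codec
with that name; the \code{'ascii'} fallback is an arbitrary choice made
for this example.

\begin{verbatim}
import codecs
import locale

def usable_encoding(fallback='ascii'):
    """Return the preferred encoding if a codec exists for it, else fallback."""
    guess = locale.getpreferredencoding(False)  # False: do not call setlocale()
    try:
        codecs.lookup(guess)
    except LookupError:
        return fallback
    return guess

encoding = usable_encoding()
\end{verbatim}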
\begin{funcdesc}{normalize}{localename}
  Returns a normalized locale code for the given locale name.  The
  returned locale code is formatted for use with
  \function{setlocale()}.  If normalization fails, the original name
  is returned unchanged.

  If the given encoding is not known, the function defaults to
  the default encoding for the locale code just like
  \function{setlocale()}.
  \versionadded{2.0}
\end{funcdesc}
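A short sketch can make the ``returned unchanged'' rule visible.  The
helper \code{normalized_or_same()} is an invented name, and
\code{'zz_ZZ'} is a made-up locale name that is assumed not to appear in
the aliasing table; the exact normalized form of a known name may differ
between versions.

\begin{verbatim}
import locale

def normalized_or_same(name):
    """Return (normalized name, whether normalize() changed it)."""
    result = locale.normalize(name)
    return result, result != name

known = normalized_or_same('de_DE')     # usually gains a default encoding
unknown = normalized_or_same('zz_ZZ')   # made-up name, expected to come back as is
\end{verbatim}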
\begin{funcdesc}{resetlocale}{\optional{category}}
  Sets the locale for \var{category} to the default setting.

  The default setting is determined by calling
  \function{getdefaultlocale()}.  \var{category} defaults to
  \constant{LC_ALL}.
  \versionadded{2.0}
\end{funcdesc}
\begin{funcdesc}{strcoll}{string1, string2}
  Compares two strings according to the current
  \constant{LC_COLLATE} setting.  Like any other comparison function,
  returns a negative value, a positive value, or \code{0}, depending on
  whether \var{string1} collates before or after \var{string2} or is
  equal to it.
\end{funcdesc}

\begin{funcdesc}{strxfrm}{string}
  Transforms a string to one that can be compared with the built-in
  function \function{cmp()}\bifuncindex{cmp} while still giving
  locale-aware results.  This function can be used when the same
  string is compared repeatedly, e.g.\ when collating a sequence of
  strings.
\end{funcdesc}
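One way to apply \function{strxfrm()} when collating a sequence is to
use it as a sort key, so that each string is transformed once.  This is
a sketch rather than a recommendation from the module's authors; the
helper name \code{locale_sorted()} is invented, and the \samp{C} locale
is chosen only because it is available everywhere.

\begin{verbatim}
import locale

def locale_sorted(strings):
    """Sort strings by the current LC_COLLATE rules, transforming each once."""
    return sorted(strings, key=locale.strxfrm)

saved = locale.setlocale(locale.LC_COLLATE)     # query the current setting
locale.setlocale(locale.LC_COLLATE, 'C')
ordered = locale_sorted(['pear', 'Apple', 'fig'])
apple_first = locale.strcoll('Apple', 'fig') < 0
locale.setlocale(locale.LC_COLLATE, saved)      # restore the previous setting
\end{verbatim}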
\begin{funcdesc}{format}{format, val\optional{, grouping}}
  Formats a number \var{val} according to the current
  \constant{LC_NUMERIC} setting.  The format follows the conventions
  of the \code{\%} operator.  For floating point values, the decimal
  point is modified if appropriate.  If \var{grouping} is true, also
  takes the grouping into account.
\end{funcdesc}

\begin{funcdesc}{str}{float}
  Formats a floating point number using the same format as the
  built-in function \code{str(\var{float})}, but takes the decimal
  point into account.
\end{funcdesc}

\begin{funcdesc}{atof}{string}
  Converts a string to a floating point number, following the
  \constant{LC_NUMERIC} settings.
\end{funcdesc}

\begin{funcdesc}{atoi}{string}
  Converts a string to an integer, following the
  \constant{LC_NUMERIC} conventions.
\end{funcdesc}
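The conversion functions can be tried without depending on an installed
national locale by selecting the \samp{C} locale for
\constant{LC_NUMERIC}.  The following sketch does that; under other
locales the accepted and produced strings will generally differ.

\begin{verbatim}
import locale

saved = locale.setlocale(locale.LC_NUMERIC)     # query the current setting
locale.setlocale(locale.LC_NUMERIC, 'C')
value = locale.atof('1234.5')   # string -> float
count = locale.atoi('42')       # string -> integer
text = locale.str(value)        # float -> string with the locale's decimal point
locale.setlocale(locale.LC_NUMERIC, saved)      # restore the previous setting
\end{verbatim}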
\begin{datadesc}{LC_CTYPE}
\refstmodindex{string}
  Locale category for the character type functions.  Depending on the
  settings of this category, the functions of module
  \refmodule{string} dealing with case change their behaviour.
\end{datadesc}

\begin{datadesc}{LC_COLLATE}
  Locale category for sorting strings.  The functions
  \function{strcoll()} and \function{strxfrm()} of the
  \module{locale} module are affected.
\end{datadesc}

\begin{datadesc}{LC_TIME}
  Locale category for the formatting of time.  The function
  \function{time.strftime()} follows these conventions.
\end{datadesc}

\begin{datadesc}{LC_MONETARY}
  Locale category for formatting of monetary values.  The available
  options are returned by the \function{localeconv()} function.
\end{datadesc}

\begin{datadesc}{LC_MESSAGES}
  Locale category for message display.  Python currently does not
  support application-specific locale-aware messages.  Messages
  displayed by the operating system, like those returned by
  \function{os.strerror()}, might be affected by this category.
\end{datadesc}

\begin{datadesc}{LC_NUMERIC}
  Locale category for formatting numbers.  The functions
  \function{format()}, \function{atoi()}, \function{atof()} and
  \function{str()} of the \module{locale} module are affected by that
  category.  All other numeric formatting operations are not
  affected.
\end{datadesc}

\begin{datadesc}{LC_ALL}
  Combination of all locale settings.  If this flag is used when the
  locale is changed, setting the locale for all categories is
  attempted.  If that fails for any category, no category is changed at
  all.  When the locale is retrieved using this flag, a string
  indicating the setting for all categories is returned.  This string
  can be later used to restore the settings.
\end{datadesc}

\begin{datadesc}{CHAR_MAX}
  This is a symbolic constant used for different values returned by
  \function{localeconv()}.
\end{datadesc}
The \function{nl_langinfo()} function accepts one of the following keys.
Most descriptions are taken from the corresponding description in the
GNU C library.

\begin{datadesc}{CODESET}
Return a string with the name of the character encoding used in the
selected locale.
\end{datadesc}

\begin{datadesc}{D_T_FMT}
Return a string that can be used as a format string for strftime(3) to
represent time and date in a locale-specific way.
\end{datadesc}

\begin{datadesc}{D_FMT}
Return a string that can be used as a format string for strftime(3) to
represent a date in a locale-specific way.
\end{datadesc}

\begin{datadesc}{T_FMT}
Return a string that can be used as a format string for strftime(3) to
represent a time in a locale-specific way.
\end{datadesc}

\begin{datadesc}{T_FMT_AMPM}
Return a string that can be used as a format string for strftime(3) to
represent time in the am/pm format.
\end{datadesc}

\begin{datadesc}{DAY_1 ... DAY_7}
Return name of the n-th day of the week.  \warning{This
follows the US convention of \constant{DAY_1} being Sunday, not the
international convention (ISO 8601) that Monday is the first day of
the week.}
\end{datadesc}

\begin{datadesc}{ABDAY_1 ... ABDAY_7}
Return abbreviated name of the n-th day of the week.
\end{datadesc}

\begin{datadesc}{MON_1 ... MON_12}
Return name of the n-th month.
\end{datadesc}

\begin{datadesc}{ABMON_1 ... ABMON_12}
Return abbreviated name of the n-th month.
\end{datadesc}

\begin{datadesc}{RADIXCHAR}
Return radix character (decimal dot, decimal comma, etc.).
\end{datadesc}

\begin{datadesc}{THOUSEP}
Return separator character for thousands (groups of three digits).
\end{datadesc}

\begin{datadesc}{YESEXPR}
Return a regular expression that can be used with the regex(3)
function to recognize a positive response to a yes/no question.
\warning{The expression is in the syntax suitable for the
\cfunction{regex()} function from the C library, which might differ
from the syntax used in \refmodule{re}.}
\end{datadesc}

\begin{datadesc}{NOEXPR}
Return a regular expression that can be used with the regex(3)
function to recognize a negative response to a yes/no question.
\end{datadesc}

\begin{datadesc}{CRNCYSTR}
Return the currency symbol, preceded by "-" if the symbol should
appear before the value, "+" if the symbol should appear after the
value, or "." if the symbol should replace the radix character.
\end{datadesc}
\begin{datadesc}{ERA}
The return value represents the era used in the current locale.

Most locales do not define this value.  An example of a locale which
does define this value is the Japanese one.  In Japan, the traditional
representation of dates includes the name of the era corresponding to
the then-emperor's reign.

Normally it should not be necessary to use this value directly.
Specifying the \code{E} modifier in a format string causes the
\function{strftime()} function to use this information.  The format of
the returned string is not specified, and therefore you should not
assume knowledge of it on different systems.
\end{datadesc}

\begin{datadesc}{ERA_YEAR}
The return value gives the year in the relevant era of the locale.
\end{datadesc}

\begin{datadesc}{ERA_D_T_FMT}
This return value can be used as a format string for
\function{strftime()} to represent dates and times in a locale-specific
era-based way.
\end{datadesc}

\begin{datadesc}{ERA_D_FMT}
This return value can be used as a format string for
\function{strftime()} to represent a date in a locale-specific era-based
way.
\end{datadesc}

\begin{datadesc}{ALT_DIGITS}
The return value is a representation of up to 100 values used to
represent the values 0 to 99.
\end{datadesc}
Example:

\begin{verbatim}
>>> import locale
>>> loc = locale.setlocale(locale.LC_ALL) # save current settings as a string
>>> locale.setlocale(locale.LC_ALL, 'de_DE') # use German locale; name might vary with platform
>>> locale.strcoll('f\xe4n', 'foo') # compare a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL, '') # use user's preferred locale
>>> locale.setlocale(locale.LC_ALL, 'C') # use default (C) locale
>>> locale.setlocale(locale.LC_ALL, loc) # restore saved locale
\end{verbatim}
\subsection{Background, details, hints, tips and caveats}

The C standard defines the locale as a program-wide property that may
be relatively expensive to change.  On top of that, some
implementations are broken in such a way that frequent locale changes
may cause core dumps.  This makes the locale somewhat painful to use
correctly.

Initially, when a program is started, the locale is the \samp{C} locale, no
matter what the user's preferred locale is.  The program must
explicitly say that it wants the user's preferred locale settings by
calling \code{setlocale(LC_ALL, '')}.

It is generally a bad idea to call \function{setlocale()} in some library
routine, since as a side effect it affects the entire program.  Saving
and restoring it is almost as bad: it is expensive and affects other
threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a locale
independent version of an operation that is affected by the locale
(such as \function{string.lower()}, or certain formats used with
\function{time.strftime()}), you will have to find a way to do it
without using the standard library routine.  Even better is convincing
yourself that using locale settings is okay.  Only as a last resort
should you document that your module is not compatible with
non-\samp{C} locale settings.

The case conversion functions in the
\refmodule{string}\refstmodindex{string} module are affected by the
locale settings.  When a call to the \function{setlocale()} function
changes the \constant{LC_CTYPE} settings, the variables
\code{string.lowercase}, \code{string.uppercase} and
\code{string.letters} are recalculated.  Note that code that uses
these variables through `\keyword{from} ... \keyword{import} ...',
e.g.\ \code{from string import letters}, is not affected by subsequent
\function{setlocale()} calls.

The only way to perform numeric operations according to the locale
is to use the special functions defined by this module:
\function{atof()}, \function{atoi()}, \function{format()},
\function{str()}.
\subsection{For extension writers and programs that embed Python
            \label{embedding-locale}}

Extension modules should never call \function{setlocale()}, except to
find out what the current locale is.  But since the return value can
only be used portably to restore it, that is not very useful (except
perhaps to find out whether or not the locale is \samp{C}).

When Python code uses the \module{locale} module to change the locale,
this also affects the embedding application.  If the embedding
application doesn't want this to happen, it should remove the
\module{_locale} extension module (which does all the work) from the
table of built-in modules in the \file{config.c} file, and make sure
that the \module{_locale} module is not accessible as a shared library.
\subsection{Access to message catalogs \label{locale-gettext}}

The locale module exposes the C library's gettext interface on systems
that provide this interface.  It consists of the functions
\function{gettext()}, \function{dgettext()}, \function{dcgettext()},
\function{textdomain()}, \function{bindtextdomain()}, and
\function{bind_textdomain_codeset()}.  These are similar to the same
functions in the \refmodule{gettext} module, but use the C library's
binary format for message catalogs, and the C library's search
algorithms for locating message catalogs.

Python applications should normally find no need to invoke these
functions, and should use \refmodule{gettext} instead.  A known
exception to this rule is applications that link with additional C
libraries which internally invoke \cfunction{gettext()} or
\cfunction{dcgettext()}.  For these applications, it may be necessary to
bind the text domain, so that the libraries can properly locate their
message catalogs.
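Binding a text domain for such a library might look like the following
sketch.  The helper \code{bind_c_catalog()}, the domain name
\code{'exampledomain'} and the directory are all placeholders invented
for this example; a real application would use the domain and catalog
directory documented by the C library in question.  Because the
interface is only present on some systems, the sketch checks for it
first.

\begin{verbatim}
import locale

def bind_c_catalog(domain, directory):
    """Bind domain to directory for the C library's gettext, if available."""
    bind = getattr(locale, 'bindtextdomain', None)
    if bind is None:
        return None     # this platform does not expose the interface
    return bind(domain, directory)

bound = bind_c_catalog('exampledomain', '/usr/share/locale')
\end{verbatim}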