\section{\module{locale} ---
         Internationalization services}

\declaremodule{standard}{locale}
\modulesynopsis{Internationalization services.}
\moduleauthor{Martin von L\"owis}{loewis@informatik.hu-berlin.de}
\sectionauthor{Martin von L\"owis}{loewis@informatik.hu-berlin.de}

The \module{locale} module opens access to the \POSIX{} locale
database and functionality.  The \POSIX{} locale mechanism allows
programmers to deal with certain cultural issues in an application,
without requiring the programmer to know all the specifics of each
country where the software is executed.

The \module{locale} module is implemented on top of the
\module{_locale}\refbimodindex{_locale} module, which in turn uses an
ANSI C locale implementation if available.

The \module{locale} module defines the following exception and
functions:

\begin{excdesc}{Error}
  Exception raised when \function{setlocale()} fails.
\end{excdesc}

\begin{funcdesc}{setlocale}{category\optional{, locale}}
  If \var{locale} is specified, it may be a string, a tuple of the
  form \code{(\var{language code}, \var{encoding})}, or \code{None}.
  If it is a tuple, it is converted to a string using the locale
  aliasing engine.  If \var{locale} is given and not \code{None},
  \function{setlocale()} modifies the locale setting for the
  \var{category}.  The available categories are listed in the data
  description below.  The value is the name of a locale.  An empty
  string specifies the user's default settings.  If the modification of
  the locale fails, the exception \exception{Error} is raised.  If
  successful, the new locale setting is returned.

  If \var{locale} is omitted or \code{None}, the current setting for
  \var{category} is returned.

  \function{setlocale()} is not thread safe on most systems.
  Applications typically start with a call of

\begin{verbatim}
import locale
locale.setlocale(locale.LC_ALL, '')
\end{verbatim}

  This sets the locale for all categories to the user's default
  setting (typically specified in the \envvar{LANG} environment
  variable).  If the locale is not changed thereafter, using
  multithreading should not cause problems.
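
  The query, modify and restore behaviour described above can be sketched
  as follows.  This is only an illustration: \code{try_setlocale} is a
  hypothetical helper introduced here, not part of the module, and the
  sketch assumes a current Python interpreter.

```python
import locale


def try_setlocale(category, name):
    """Hypothetical helper: return the new setting, or None on failure."""
    try:
        return locale.setlocale(category, name)
    except locale.Error:
        # Raised when the platform cannot honour the requested locale.
        return None


# Query without modifying: omit the locale argument (or pass None).
saved = locale.setlocale(locale.LC_COLLATE)

# 'C' is the portable locale, so this request is expected to succeed.
result = try_setlocale(locale.LC_COLLATE, 'C')

# Restore the saved setting; the returned string is meant for exactly this.
locale.setlocale(locale.LC_COLLATE, saved)
```

  Catching \exception{Error} instead of letting it propagate is a design
  choice for the sketch; an application may prefer to report the failure.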

  \versionchanged[Added support for tuple values of the \var{locale}

\begin{funcdesc}{localeconv}{}
  Returns the database of the local conventions as a dictionary.
  This dictionary has the following strings as keys:

  \begin{tableiii}{l|l|p{3in}}{constant}{Category}{Key}{Meaning}
    \lineiii{LC_NUMERIC}{\code{'decimal_point'}}
            {Decimal point character.}
    \lineiii{}{\code{'grouping'}}
            {Sequence of numbers specifying the relative positions at
             which the \code{'thousands_sep'} is expected.  If the
             sequence is terminated with \constant{CHAR_MAX}, no further
             grouping is performed.  If the sequence terminates with a
             \code{0}, the last group size is used repeatedly.}
    \lineiii{}{\code{'thousands_sep'}}
            {Character used between groups.}\hline
    \lineiii{LC_MONETARY}{\code{'int_curr_symbol'}}
            {International currency symbol.}
    \lineiii{}{\code{'currency_symbol'}}
            {Local currency symbol.}
    \lineiii{}{\code{'mon_decimal_point'}}
            {Decimal point used for monetary values.}
    \lineiii{}{\code{'mon_thousands_sep'}}
            {Group separator used for monetary values.}
    \lineiii{}{\code{'mon_grouping'}}
            {Equivalent to \code{'grouping'}, used for monetary
             values.}
    \lineiii{}{\code{'positive_sign'}}
            {Symbol used to annotate a positive monetary value.}
    \lineiii{}{\code{'negative_sign'}}
            {Symbol used to annotate a negative monetary value.}
    \lineiii{}{\code{'frac_digits'}}
            {Number of fractional digits used in local formatting
             of monetary values.}
    \lineiii{}{\code{'int_frac_digits'}}
            {Number of fractional digits used in international
             formatting of monetary values.}
  \end{tableiii}

  The possible values for \code{'p_sign_posn'} and
  \code{'n_sign_posn'} are given below.

  \begin{tableii}{c|l}{code}{Value}{Explanation}
    \lineii{0}{Currency and value are surrounded by parentheses.}
    \lineii{1}{The sign should precede the value and currency symbol.}
    \lineii{2}{The sign should follow the value and currency symbol.}
    \lineii{3}{The sign should immediately precede the value.}
    \lineii{4}{The sign should immediately follow the value.}
    \lineii{\constant{CHAR_MAX}}{Nothing is specified in this locale.}
  \end{tableii}
\end{funcdesc}
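
The meaning of the \code{'grouping'} sequence is easier to see in code.
The following is a minimal sketch, not the module's own implementation:
\code{group_digits} is a hypothetical helper, and it assumes that a
sequence which simply ends (with neither \code{0} nor
\constant{CHAR_MAX}) requests no further grouping.

```python
import locale


def group_digits(digits, grouping, sep):
    """Insert sep into a string of digits, following a 'grouping' sequence.

    Group sizes are applied from the right-hand end of the string.
    """
    groups = []
    size = None
    for entry in grouping:
        if entry == locale.CHAR_MAX:
            break                      # no further grouping is performed
        repeat = (entry == 0)          # 0: reuse the last size repeatedly
        if not repeat:
            size = entry
        if size is None:
            break                      # sequence began with 0: nothing to reuse
        while len(digits) > size:
            groups.append(digits[-size:])
            digits = digits[:-size]
            if not repeat:
                break                  # a plain size applies to one group only
        if repeat:
            break
    groups.append(digits)              # whatever is left forms the leading group
    return sep.join(reversed(groups))


western = group_digits('1234567', [3, 0], ',')
capped = group_digits('1234567', [3, locale.CHAR_MAX], ',')
mixed = group_digits('1234567', [3, 2, 0], ',')
```

In an application, the \var{grouping} and \var{sep} arguments would
typically come from the \code{'grouping'} and \code{'thousands_sep'}
entries of the dictionary returned by \function{localeconv()}.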

\begin{funcdesc}{nl_langinfo}{option}
  Return some locale-specific information as a string.  This function is
  not available on all systems, and the set of possible options might
  also vary across platforms.  The possible argument values are numbers,
  for which symbolic constants are available in the locale module.
\end{funcdesc}

\begin{funcdesc}{getdefaultlocale}{\optional{envvars}}
  Tries to determine the default locale settings and returns
  them as a tuple of the form \code{(\var{language code},
  \var{encoding})}.

  According to \POSIX, a program which has not called
  \code{setlocale(LC_ALL, '')} runs using the portable \code{'C'}
  locale.  Calling \code{setlocale(LC_ALL, '')} lets it use the
  default locale as defined by the \envvar{LANG} variable.  Since we
  do not want to interfere with the current locale setting we thus
  emulate the behavior in the way described above.

  To maintain compatibility with other platforms, not only the
  \envvar{LANG} variable is tested, but also a list of variables given
  as the \var{envvars} parameter.  The first one found to be defined
  will be used.  \var{envvars} defaults to the search path used in GNU
  gettext; it must always contain the variable name \samp{LANG}.  The
  GNU gettext search path contains \code{'LANGUAGE'}, \code{'LC_ALL'},
  \code{'LC_CTYPE'}, and \code{'LANG'}, in that order.

  Except for the code \code{'C'}, the language code corresponds to
  \rfc{1766}.  \var{language code} and \var{encoding} may be
  \code{None} if their values cannot be determined.
\end{funcdesc}
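
The search over environment variables described for
\function{getdefaultlocale()} can be pictured with the sketch below.  It
only illustrates the ``first variable found to be defined wins'' rule;
\code{first_locale_variable} is a hypothetical helper, it takes the
environment as a plain dictionary so that it is easy to try out, and it
does not attempt to parse the value into a language code and encoding.

```python
# Search order taken from the text above: the GNU gettext search path.
GETTEXT_SEARCH_PATH = ('LANGUAGE', 'LC_ALL', 'LC_CTYPE', 'LANG')


def first_locale_variable(environ, envvars=GETTEXT_SEARCH_PATH):
    """Return (name, value) of the first defined, non-empty variable.

    Returns (None, None) when none of the variables is set.
    """
    for name in envvars:
        value = environ.get(name)
        if value:
            return name, value
    return None, None


# LC_ALL comes before LANG in the search path, so it takes precedence.
chosen = first_locale_variable({'LANG': 'de_DE', 'LC_ALL': 'fr_FR'})
nothing = first_locale_variable({})
```

Passing \code{os.environ} as the first argument would apply the same
lookup to the real process environment.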

\begin{funcdesc}{getlocale}{\optional{category}}
  Returns the current setting for the given locale category as a
  sequence containing \var{language code}, \var{encoding}.
  \var{category} may be one of the \constant{LC_*} values except
  \constant{LC_ALL}.  It defaults to \constant{LC_CTYPE}.

  Except for the code \code{'C'}, the language code corresponds to
  \rfc{1766}.  \var{language code} and \var{encoding} may be
  \code{None} if their values cannot be determined.
\end{funcdesc}

\begin{funcdesc}{normalize}{localename}
  Returns a normalized locale code for the given locale name.  The
  returned locale code is formatted for use with
  \function{setlocale()}.  If normalization fails, the original name
  is returned unchanged.

  If the given encoding is not known, the function defaults to
  the default encoding for the locale code just like
  \function{setlocale()}.
\end{funcdesc}

\begin{funcdesc}{resetlocale}{\optional{category}}
  Sets the locale for \var{category} to the default setting.

  The default setting is determined by calling
  \function{getdefaultlocale()}.  \var{category} defaults to

\begin{funcdesc}{strcoll}{string1, string2}
  Compares two strings according to the current
  \constant{LC_COLLATE} setting.  Like any other comparison function,
  it returns a negative value, a positive value, or \code{0}, depending
  on whether \var{string1} collates before or after \var{string2} or is
  equal to it.
\end{funcdesc}

\begin{funcdesc}{strxfrm}{string}
  Transforms a string to one that can be used for the built-in
  function \function{cmp()}\bifuncindex{cmp}, and still returns
  locale-aware results.  This function can be used when the same
  string is compared repeatedly, e.g. when collating a sequence of
  strings.
\end{funcdesc}
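
The two collation functions can be used for sorting roughly as sketched
below.  The sketch assumes a current Python interpreter, where sorting
takes a key function rather than relying on \function{cmp()}, so
\function{strcoll()} is adapted with \code{functools.cmp_to_key}.  It
switches to the portable \samp{C} locale only to keep the run
reproducible; an application would normally keep the user's locale.

```python
import functools
import locale

saved = locale.setlocale(locale.LC_COLLATE)
locale.setlocale(locale.LC_COLLATE, 'C')   # portable locale, for a repeatable run
try:
    words = ['pear', 'Apple', 'fig', 'apple']

    # strcoll() is a comparison function: negative, zero or positive.
    by_cmp = sorted(words, key=functools.cmp_to_key(locale.strcoll))

    # strxfrm() transforms each string once; the transformed strings then
    # compare with ordinary comparison but give the same collation order.
    by_key = sorted(words, key=locale.strxfrm)
finally:
    locale.setlocale(locale.LC_COLLATE, saved)
```

Both lists should come out in the same order.  The \function{strxfrm()}
variant transforms each string only once, which is the reason the text
recommends it when the same strings are compared repeatedly.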

\begin{funcdesc}{format}{format, val\optional{, grouping}}
  Formats a number \var{val} according to the current
  \constant{LC_NUMERIC} setting.  The format follows the conventions
  of the \code{\%} operator.  For floating point values, the decimal
  point is modified if appropriate.  If \var{grouping} is true, also
  takes the grouping into account.
\end{funcdesc}

\begin{funcdesc}{str}{float}
  Formats a floating point number using the same format as the
  built-in function \code{str(\var{float})}, but takes the decimal
  point into account.
\end{funcdesc}

\begin{funcdesc}{atof}{string}
  Converts a string to a floating point number, following the
  \constant{LC_NUMERIC} settings.
\end{funcdesc}

\begin{funcdesc}{atoi}{string}
  Converts a string to an integer, following the
  \constant{LC_NUMERIC} conventions.
\end{funcdesc}
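
A short round trip through the numeric helpers might look like the
sketch below.  It pins \constant{LC_NUMERIC} to the portable \samp{C}
locale so that the decimal point is a period on every platform; with a
different locale in effect the accepted and produced strings would
change accordingly.

```python
import locale

saved = locale.setlocale(locale.LC_NUMERIC)
locale.setlocale(locale.LC_NUMERIC, 'C')   # period as decimal point
try:
    as_float = locale.atof('3.25')   # string -> float
    as_int = locale.atoi('42')       # string -> integer
    as_text = locale.str(3.25)       # float -> string
finally:
    # Put the previous numeric setting back, whatever happened above.
    locale.setlocale(locale.LC_NUMERIC, saved)
```

Saving and restoring the setting is done here only to keep the sketch
self-contained; the caveats about changing the locale given later in
this section still apply.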

\begin{datadesc}{LC_CTYPE}
\refstmodindex{string}
Locale category for the character type functions.  Depending on the
settings of this category, the functions of module
\refmodule{string} dealing with case change their behaviour.
\end{datadesc}

\begin{datadesc}{LC_COLLATE}
Locale category for sorting strings.  The functions
\function{strcoll()} and \function{strxfrm()} of the
\module{locale} module are affected.
\end{datadesc}

\begin{datadesc}{LC_TIME}
Locale category for the formatting of time.  The function
\function{time.strftime()} follows these conventions.
\end{datadesc}

\begin{datadesc}{LC_MONETARY}
Locale category for formatting of monetary values.  The available
options can be obtained from the \function{localeconv()} function.
\end{datadesc}

\begin{datadesc}{LC_MESSAGES}
Locale category for message display.  Python currently does not
support application specific locale-aware messages.  Messages
displayed by the operating system, like those returned by
\function{os.strerror()}, might be affected by this category.
\end{datadesc}

\begin{datadesc}{LC_NUMERIC}
Locale category for formatting numbers.  The functions
\function{format()}, \function{atoi()}, \function{atof()} and
\function{str()} of the \module{locale} module are affected by that
category.  All other numeric formatting operations are not
affected.
\end{datadesc}

\begin{datadesc}{LC_ALL}
Combination of all locale settings.  If this flag is used when the
locale is changed, setting the locale for all categories is
attempted.  If that fails for any category, no category is changed at
all.  When the locale is retrieved using this flag, a string
indicating the setting for all categories is returned.  This string
can be later used to restore the settings.
\end{datadesc}

\begin{datadesc}{CHAR_MAX}
This is a symbolic constant used for different values returned by
\function{localeconv()}.
\end{datadesc}

The \function{nl_langinfo()} function accepts one of the following keys.
Most descriptions are taken from the corresponding description in the

\begin{datadesc}{CODESET}
Return a string with the name of the character encoding used in the

\begin{datadesc}{D_T_FMT}
Return a string that can be used as a format string for strftime(3) to
represent time and date in a locale-specific way.
\end{datadesc}

\begin{datadesc}{D_FMT}
Return a string that can be used as a format string for strftime(3) to
represent a date in a locale-specific way.
\end{datadesc}

\begin{datadesc}{T_FMT}
Return a string that can be used as a format string for strftime(3) to
represent a time in a locale-specific way.
\end{datadesc}

\begin{datadesc}{T_FMT_AMPM}
The return value can be used as a format string for `strftime' to
represent time in the am/pm format.
\end{datadesc}

\begin{datadesc}{DAY_1 ... DAY_7}
Return name of the n-th day of the week.  \warning{This
follows the US convention of \constant{DAY_1} being Sunday, not the
international convention (ISO 8601) that Monday is the first day of
the week.}
\end{datadesc}

\begin{datadesc}{ABDAY_1 ... ABDAY_7}
Return abbreviated name of the n-th day of the week.
\end{datadesc}

\begin{datadesc}{MON_1 ... MON_12}
Return name of the n-th month.
\end{datadesc}

\begin{datadesc}{ABMON_1 ... ABMON_12}
Return abbreviated name of the n-th month.
\end{datadesc}

\begin{datadesc}{RADIXCHAR}
Return radix character (decimal dot, decimal comma, etc.)
\end{datadesc}

\begin{datadesc}{THOUSEP}
Return separator character for thousands (groups of three digits).
\end{datadesc}

\begin{datadesc}{YESEXPR}
Return a regular expression that can be used with the regex(3)
function to recognize a positive response to a yes/no question.
\warning{The expression is in the syntax suitable for the
\cfunction{regex()} function from the C library, which might differ
from the syntax used in \refmodule{re}.}
\end{datadesc}

\begin{datadesc}{NOEXPR}
Return a regular expression that can be used with the regex(3)
function to recognize a negative response to a yes/no question.
\end{datadesc}

\begin{datadesc}{CRNCYSTR}
Return the currency symbol, preceded by "-" if the symbol should
appear before the value, "+" if the symbol should appear after the
value, or "." if the symbol should replace the radix character.
\end{datadesc}
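
The one-character prefix of the \constant{CRNCYSTR} value can be
interpreted as in the sketch below.  \code{place_currency} is a
hypothetical helper written for this illustration; it assumes the amount
has already been formatted as text with a period as the radix
character, and it works on literal sample strings rather than on values
obtained from \function{nl_langinfo()}.

```python
def place_currency(crncystr, amount_text):
    """Attach a currency symbol to amount_text as the prefix dictates."""
    position, symbol = crncystr[:1], crncystr[1:]
    if position == '-':
        return symbol + amount_text                 # symbol before the value
    if position == '+':
        return amount_text + symbol                 # symbol after the value
    if position == '.':
        return amount_text.replace('.', symbol, 1)  # symbol replaces the radix
    return amount_text                              # unknown prefix: leave as is


before = place_currency('-$', '12.50')
after = place_currency('+kr', '12.50')
in_place = place_currency('.$', '12.50')
```

Returning the amount unchanged for an unrecognized prefix is an
arbitrary choice made for the sketch.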

\begin{datadesc}{ERA}
The return value represents the era used in the current locale.

Most locales do not define this value.  An example of a locale which
does define this value is the Japanese one.  In Japan, the traditional
representation of dates includes the name of the era corresponding to
the then-emperor's reign.

Normally it should not be necessary to use this value directly.
Specifying the \code{E} modifier in their format strings causes the
\function{strftime} function to use this information.  The format of the
returned string is not specified, and therefore you should not assume
knowledge of it on different systems.
\end{datadesc}

\begin{datadesc}{ERA_YEAR}
The return value gives the year in the relevant era of the locale.
\end{datadesc}

\begin{datadesc}{ERA_D_T_FMT}
This return value can be used as a format string for
\function{strftime} to represent dates and times in a locale-specific
era-based way.
\end{datadesc}

\begin{datadesc}{ERA_D_FMT}
This return value can be used as a format string for
\function{strftime} to represent time in a locale-specific era-based
way.
\end{datadesc}

\begin{datadesc}{ALT_DIGITS}
The return value is a representation of up to 100 values used to
represent the values 0 to 99.
\end{datadesc}
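
Because \function{nl_langinfo()} and its keys are not available on all
systems, portable code may want to guard the lookup.  The following is
one possible sketch; \code{langinfo_or_none} is a hypothetical helper,
and returning \code{None} for unsupported items is an assumption of the
sketch rather than behaviour of the module.

```python
import locale


def langinfo_or_none(option_name):
    """Look up an nl_langinfo() item by the name of its constant.

    Returns None when the function or the constant is missing on this
    platform, instead of raising AttributeError.
    """
    query = getattr(locale, 'nl_langinfo', None)
    option = getattr(locale, option_name, None)
    if query is None or option is None:
        return None
    return query(option)


first_day = langinfo_or_none('DAY_1')        # Sunday, by the US convention
radix = langinfo_or_none('RADIXCHAR')
missing = langinfo_or_none('NO_SUCH_ITEM')   # not a real key
```

Passing the constant by name keeps the helper importable on platforms
where the constants themselves are not defined.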

Example:

\begin{verbatim}
>>> import locale
>>> loc = locale.setlocale(locale.LC_ALL) # get current locale
>>> locale.setlocale(locale.LC_ALL, 'de') # use German locale
>>> locale.strcoll('f\xe4n', 'foo') # compare a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL, '') # use user's preferred locale
>>> locale.setlocale(locale.LC_ALL, 'C') # use default (C) locale
>>> locale.setlocale(locale.LC_ALL, loc) # restore saved locale
\end{verbatim}

\subsection{Background, details, hints, tips and caveats}

The C standard defines the locale as a program-wide property that may
be relatively expensive to change.  On top of that, some
implementations are broken in such a way that frequent locale changes
may cause core dumps.  This makes the locale somewhat painful to use

Initially, when a program is started, the locale is the \samp{C} locale, no
matter what the user's preferred locale is.  The program must
explicitly say that it wants the user's preferred locale settings by
calling \code{setlocale(LC_ALL, '')}.

It is generally a bad idea to call \function{setlocale()} in some library
routine, since as a side effect it affects the entire program.  Saving
and restoring it is almost as bad: it is expensive and affects other
threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a locale
independent version of an operation that is affected by the locale
(e.g. \function{string.lower()}, or certain formats used with
\function{time.strftime()}), you will have to find a way to do it
without using the standard library routine.  Even better is convincing
yourself that using locale settings is okay.  Only as a last resort
should you document that your module is not compatible with
non-\samp{C} locale settings.

The case conversion functions in the
\refmodule{string}\refstmodindex{string} module are affected by the
locale settings.  When a call to the \function{setlocale()} function
changes the \constant{LC_CTYPE} settings, the variables
\code{string.lowercase}, \code{string.uppercase} and
\code{string.letters} are recalculated.  Note that code that uses
these variables through `\keyword{from} ... \keyword{import} ...',
e.g.\ \code{from string import letters}, is not affected by subsequent
\function{setlocale()} calls.

The only way to perform numeric operations according to the locale
is to use the special functions defined by this module:
\function{atof()}, \function{atoi()}, \function{format()},
\function{str()}.

\subsection{For extension writers and programs that embed Python
            \label{embedding-locale}}

Extension modules should never call \function{setlocale()}, except to
find out what the current locale is.  But since the return value can
only be used portably to restore it, that is not very useful (except
perhaps to find out whether or not the locale is \samp{C}).

When Python is embedded in an application, if the application sets the
locale to something specific before initializing Python, that is
generally okay, and Python will use whatever locale is set,
\emph{except} that the \constant{LC_NUMERIC} locale should always be
\samp{C}.

The \function{setlocale()} function in the \module{locale} module
gives the Python programmer the impression that you can manipulate the
\constant{LC_NUMERIC} locale setting, but this is not the case at the C
level: C code will always find that the \constant{LC_NUMERIC} locale
setting is \samp{C}.  This is because too much would break when the
decimal point character is set to something other than a period
(e.g. the Python parser would break).  Caveat: threads that run
without holding Python's global interpreter lock may occasionally find
that the numeric locale setting differs; this is because the only
portable way to implement this feature is to set the numeric locale
settings to what the user requests, extract the relevant
characteristics, and then restore the \samp{C} numeric locale.

When Python code uses the \module{locale} module to change the locale,
this also affects the embedding application.  If the embedding
application doesn't want this to happen, it should remove the
\module{_locale} extension module (which does all the work) from the
table of built-in modules in the \file{config.c} file, and make sure
that the \module{_locale} module is not accessible as a shared library.

\subsection{Access to message catalogs \label{locale-gettext}}

The locale module exposes the C library's gettext interface on systems
that provide this interface.  It consists of the functions
\function{gettext()}, \function{dgettext()}, \function{dcgettext()},
\function{textdomain()}, and \function{bindtextdomain()}.  These are
similar to the same functions in the \refmodule{gettext} module, but use
the C library's binary format for message catalogs, and the C
library's search algorithms for locating message catalogs.

Python applications should normally find no need to invoke these
functions, and should use \refmodule{gettext} instead.  A known
exception to this rule are applications that link with additional C
libraries which internally invoke \cfunction{gettext()} or
\cfunction{dcgettext()}.  For these applications, it may be necessary to
bind the text domain, so that the libraries can properly locate their
message catalogs.