\section{\module{locale} ---
         Internationalization services}

\declaremodule{standard}{locale}
\modulesynopsis{Internationalization services.}
\moduleauthor{Martin von L\"owis}{loewis@informatik.hu-berlin.de}
\sectionauthor{Martin von L\"owis}{loewis@informatik.hu-berlin.de}

The \module{locale} module opens access to the \POSIX{} locale
database and functionality. The \POSIX{} locale mechanism allows
programmers to deal with certain cultural issues in an application,
without requiring the programmer to know all the specifics of each
country where the software is executed.

The \module{locale} module is implemented on top of the
\module{_locale}\refbimodindex{_locale} module, which in turn uses an
ANSI C locale implementation if available.

The \module{locale} module defines the following exception and
functions:

\begin{excdesc}{Error}
Exception raised when \function{setlocale()} fails.
\end{excdesc}

\begin{funcdesc}{setlocale}{category\optional{, locale}}
If \var{locale} is specified, it may be a string, a tuple of the
form \code{(\var{language code}, \var{encoding})}, or \code{None}.
If it is a tuple, it is converted to a string using the locale
aliasing engine. If \var{locale} is given and not \code{None},
\function{setlocale()} modifies the locale setting for
\var{category}; the available categories are listed in the data
descriptions below. A string value is the name of a locale, and an
empty string specifies the user's default settings. If the
modification of the locale fails, the exception \exception{Error} is
raised. If it succeeds, the new locale setting is returned.

If \var{locale} is omitted or \code{None}, the current setting for
\var{category} is returned.

\function{setlocale()} is not thread safe on most systems.
Applications typically start with a call of

\begin{verbatim}
import locale
locale.setlocale(locale.LC_ALL, '')
\end{verbatim}

This sets the locale for all categories to the user's default
setting (typically specified in the \envvar{LANG} environment
variable). If the locale is not changed thereafter, using
multithreading should not cause problems.

\versionchanged[Added support for tuple values of the \var{locale}
parameter]{2.0}
\end{funcdesc}

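The query, set, and failure behavior described above can be sketched
as follows. The sketch only relies on the portable \code{'C'} locale,
which every \POSIX{} system provides; \code{'xx_NOWHERE'} is a
deliberately invalid, hypothetical locale name chosen so that
\function{setlocale()} fails.

\begin{verbatim}
import locale

# Passing no locale argument only queries the current setting.
saved = locale.setlocale(locale.LC_NUMERIC)

# Setting a locale returns the new setting as a string.
current = locale.setlocale(locale.LC_NUMERIC, 'C')

# 'xx_NOWHERE' is a hypothetical name no system should know,
# so setlocale() raises locale.Error.
try:
    locale.setlocale(locale.LC_NUMERIC, 'xx_NOWHERE')
    failed = False
except locale.Error:
    failed = True

# Put back whatever was active before.
locale.setlocale(locale.LC_NUMERIC, saved)
\end{verbatim}
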
\begin{funcdesc}{localeconv}{}
Returns the database of the local conventions as a dictionary.
This dictionary has the following strings as keys:

\begin{tableiii}{l|l|p{3in}}{constant}{Category}{Key}{Meaning}
  \lineiii{LC_NUMERIC}{\code{'decimal_point'}}
          {Decimal point character.}
  \lineiii{}{\code{'grouping'}}
          {Sequence of numbers specifying at which relative positions
           the \code{'thousands_sep'} is expected. If the sequence is
           terminated with \constant{CHAR_MAX}, no further grouping
           is performed. If the sequence terminates with a \code{0},
           the last group size is repeatedly used.}
  \lineiii{}{\code{'thousands_sep'}}
          {Character used between groups.}\hline
  \lineiii{LC_MONETARY}{\code{'int_curr_symbol'}}
          {International currency symbol.}
  \lineiii{}{\code{'currency_symbol'}}
          {Local currency symbol.}
  \lineiii{}{\code{'mon_decimal_point'}}
          {Decimal point used for monetary values.}
  \lineiii{}{\code{'mon_thousands_sep'}}
          {Group separator used for monetary values.}
  \lineiii{}{\code{'mon_grouping'}}
          {Equivalent to \code{'grouping'}, used for monetary
           values.}
  \lineiii{}{\code{'positive_sign'}}
          {Symbol used to annotate a positive monetary value.}
  \lineiii{}{\code{'negative_sign'}}
          {Symbol used to annotate a negative monetary value.}
  \lineiii{}{\code{'frac_digits'}}
          {Number of fractional digits used in local formatting
           of monetary values.}
  \lineiii{}{\code{'int_frac_digits'}}
          {Number of fractional digits used in international
           formatting of monetary values.}
  \lineiii{}{\code{'p_sign_posn'}}
          {Position of the sign for positive monetary values.}
  \lineiii{}{\code{'n_sign_posn'}}
          {Position of the sign for negative monetary values.}
\end{tableiii}

The possible values for \code{'p_sign_posn'} and
\code{'n_sign_posn'} are given below.

\begin{tableii}{c|l}{code}{Value}{Explanation}
  \lineii{0}{Currency and value are surrounded by parentheses.}
  \lineii{1}{The sign should precede the value and currency symbol.}
  \lineii{2}{The sign should follow the value and currency symbol.}
  \lineii{3}{The sign should immediately precede the value.}
  \lineii{4}{The sign should immediately follow the value.}
  \lineii{\constant{CHAR_MAX}}{Nothing is specified in this locale.}
\end{tableii}
\end{funcdesc}

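As an illustration, the sketch below reads a few entries of the
dictionary in the \code{'C'} locale, where no grouping, no thousands
separator, and no sign position are defined. The table
\code{SIGN_POSITIONS} is a hypothetical helper that merely restates
the sign-position table above; it is not part of the module.

\begin{verbatim}
import locale

# Hypothetical lookup table restating the documented values of
# 'p_sign_posn' and 'n_sign_posn'.
SIGN_POSITIONS = {
    0: 'parentheses around value and currency symbol',
    1: 'sign precedes value and currency symbol',
    2: 'sign follows value and currency symbol',
    3: 'sign immediately precedes the value',
    4: 'sign immediately follows the value',
    locale.CHAR_MAX: 'not specified in this locale',
}

locale.setlocale(locale.LC_ALL, 'C')
conv = locale.localeconv()

point = conv['decimal_point']       # '.' in the C locale
separator = conv['thousands_sep']   # '' in the C locale
grouping = conv['grouping']         # [] in the C locale
positive = SIGN_POSITIONS[conv['p_sign_posn']]
\end{verbatim}
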
\begin{funcdesc}{getdefaultlocale}{\optional{envvars}}
Tries to determine the default locale settings and returns
them as a tuple of the form \code{(\var{language code},
\var{encoding})}.

According to \POSIX, a program which has not called
\code{setlocale(LC_ALL, '')} runs using the portable \code{'C'}
locale. Calling \code{setlocale(LC_ALL, '')} lets it use the
default locale as defined by the \envvar{LANG} variable. Since
\function{getdefaultlocale()} must not interfere with the current
locale setting, it emulates this behavior by examining environment
variables rather than changing the locale.

To maintain compatibility with other platforms, not only the
\envvar{LANG} variable is tested, but a list of variables given as
the \var{envvars} parameter. The first variable found to be defined
is used. \var{envvars} defaults to the search path used in GNU
gettext; it must always contain the variable name \samp{LANG}. The
GNU gettext search path contains \code{'LANGUAGE'}, \code{'LC_ALL'},
\code{'LC_CTYPE'}, and \code{'LANG'}, in that order.

Except for the code \code{'C'}, the language code corresponds to
\rfc{1766}. \var{language code} and \var{encoding} may be
\code{None} if their values cannot be determined.
\versionadded{2.0}
\end{funcdesc}

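The search over \var{envvars} amounts to taking the value of the
first variable that is set. The function \function{find_locale_name()}
below is a hypothetical sketch of only that step: it is not the
module's implementation, it does none of the normalization or the
splitting into \code{(\var{language code}, \var{encoding})}, and it
assumes that a variable set to the empty string counts as undefined
and that \code{'C'} is the fallback when no variable is set.

\begin{verbatim}
import os

# Default search path, in the GNU gettext order given above.
GETTEXT_SEARCH_PATH = ('LANGUAGE', 'LC_ALL', 'LC_CTYPE', 'LANG')

def find_locale_name(envvars=GETTEXT_SEARCH_PATH, environ=None):
    """Hypothetical helper: return the first non-empty variable."""
    if environ is None:
        environ = os.environ
    for name in envvars:
        value = environ.get(name)
        if value:               # set and non-empty
            return value
    return 'C'                  # assumed fallback: the portable locale
\end{verbatim}

If both \envvar{LC_ALL} and \envvar{LANG} are set, the sketch returns
the value of \envvar{LC_ALL}, because that variable comes first in the
search path.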
\begin{funcdesc}{getlocale}{\optional{category}}
Returns the current setting for the given locale category as a
tuple \code{(\var{language code}, \var{encoding})}. \var{category}
may be one of the \constant{LC_*} values except \constant{LC_ALL}.
It defaults to \constant{LC_CTYPE}.

Except for the code \code{'C'}, the language code corresponds to
\rfc{1766}. \var{language code} and \var{encoding} may be
\code{None} if their values cannot be determined.
\versionadded{2.0}
\end{funcdesc}

\begin{funcdesc}{normalize}{localename}
Returns a normalized locale code for the given locale name. The
returned locale code is formatted for use with
\function{setlocale()}. If normalization fails, the original name
is returned unchanged.

If the given encoding is not known, the function defaults to
the default encoding for the locale code just like
\function{setlocale()}.
\versionadded{2.0}
\end{funcdesc}

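Because \function{normalize()} works on the module's alias table, it
can be tried without the named locales being installed. The sketch
below assumes that the alias table knows \code{'de_DE'} (the exact
encoding it appends depends on that table) and uses
\code{'xx_NOWHERE'} as a hypothetical name that normalization cannot
resolve.

\begin{verbatim}
import locale

# A bare language/territory code gains the default encoding the
# alias table lists for it, in the form setlocale() expects.
german = locale.normalize('de_DE')

# Names the alias table cannot resolve come back unchanged.
unknown = locale.normalize('xx_NOWHERE')
\end{verbatim}
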
\begin{funcdesc}{resetlocale}{\optional{category}}
Sets the locale for \var{category} to the default setting.

The default setting is determined by calling
\function{getdefaultlocale()}. \var{category} defaults to
\constant{LC_ALL}.
\versionadded{2.0}
\end{funcdesc}

\begin{funcdesc}{strcoll}{string1, string2}
Compares two strings according to the current
\constant{LC_COLLATE} setting. Like any other comparison function,
it returns a negative value, a positive value, or \code{0},
depending on whether \var{string1} collates before or after
\var{string2} or is equal to it.
\end{funcdesc}

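Only the sign of the result is meaningful. A minimal sketch, using the
\code{'C'} locale so that the outcome does not depend on which locales
are installed (in that locale, collation follows the character codes):

\begin{verbatim}
import locale

locale.setlocale(locale.LC_COLLATE, 'C')

# Test only the sign: the magnitude of the result is unspecified.
before = locale.strcoll('apple', 'banana') < 0
equal = locale.strcoll('apple', 'apple') == 0
after = locale.strcoll('banana', 'apple') > 0
\end{verbatim}
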
\begin{funcdesc}{strxfrm}{string}
Transforms a string into one that can be compared with the built-in
function \function{cmp()}\bifuncindex{cmp} and still yields
locale-aware results. This function can be used when the same
string is compared repeatedly, e.g.\ when collating a sequence of
strings.
\end{funcdesc}

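For sorting, each string can be transformed once and the results
compared directly, instead of calling \function{strcoll()} for every
comparison. The sketch assumes a Python whose built-in
\function{sorted()} accepts a \var{key} argument (2.4 and later) and
whose \module{functools} module provides \function{cmp_to_key()}
(2.7 and later); it uses the \code{'C'} locale so that the result is
predictable.

\begin{verbatim}
import functools
import locale

locale.setlocale(locale.LC_COLLATE, 'C')

words = ['pear', 'apple', 'Banana', 'apple pie']

# Transform every string once, then sort on the transformed keys.
by_key = sorted(words, key=locale.strxfrm)

# The same ordering, obtained by calling strcoll() per comparison.
by_cmp = sorted(words, key=functools.cmp_to_key(locale.strcoll))
\end{verbatim}
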
\begin{funcdesc}{format}{format, val\optional{, grouping}}
Formats a number \var{val} according to the current
\constant{LC_NUMERIC} setting. The format follows the conventions
of the \code{\%} operator. For floating point values, the decimal
point is modified if appropriate. If \var{grouping} is true, the
grouping is also taken into account.
\end{funcdesc}

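Grouping separates digits with \code{'thousands_sep'} according to the
\code{'grouping'} entry returned by \function{localeconv()}. The rules
given for that entry can be sketched in pure Python;
\function{group_sizes()} and \function{group_digits()} are
hypothetical helpers written for this illustration, not functions of
the \module{locale} module, and the sketch assumes that a sequence
ending without \code{0} or \constant{CHAR_MAX} simply stops grouping.

\begin{verbatim}
import locale

def group_sizes(grouping):
    """Yield group sizes from the right, following the 'grouping' rules."""
    last = None
    for size in grouping:
        if size == locale.CHAR_MAX:
            return                  # CHAR_MAX: no further grouping
        if size == 0:
            while last is not None:
                yield last          # 0: repeat the last size indefinitely
            return
        yield size
        last = size
    # Sequence ended without 0 or CHAR_MAX: assumed to stop grouping.

def group_digits(digits, grouping, sep):
    """Hypothetical helper: insert sep into a string of digits."""
    groups = []
    for size in group_sizes(grouping):
        if len(digits) <= size:
            break                   # remaining digits form the last group
        groups.append(digits[-size:])
        digits = digits[:-size]
    groups.append(digits)
    return sep.join(reversed(groups))
\end{verbatim}

With a grouping of \code{[3, 0]} the digits \code{'1234567'} become
\code{'1,234,567'}; with \code{[3, CHAR_MAX]} only the rightmost group
is split off, giving \code{'1234,567'}.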
\begin{funcdesc}{str}{float}
Formats a floating point number using the same format as the
built-in function \code{str(\var{float})}, but takes the decimal
point into account.
\end{funcdesc}

\begin{funcdesc}{atof}{string}
Converts a string to a floating point number, following the
\constant{LC_NUMERIC} settings.
\end{funcdesc}

\begin{funcdesc}{atoi}{string}
Converts a string to an integer, following the
\constant{LC_NUMERIC} conventions.
\end{funcdesc}

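A short round trip through \function{str()}, \function{atof()}, and
\function{atoi()}, pinned to the \code{'C'} locale so that the output
is predictable. In a locale whose decimal point is a comma, the same
calls would produce and accept a comma instead of a period; that case
is not shown because it depends on the locales installed on the
system.

\begin{verbatim}
import locale

locale.setlocale(locale.LC_NUMERIC, 'C')

text = locale.str(2.5)       # like str(), using the locale's decimal point
value = locale.atof(text)    # parsed back into a float
count = locale.atoi('42')    # parsed into an integer

# The C locale defines no thousands separator, so grouped input fails.
try:
    locale.atof('1,234.5')
    rejected = False
except ValueError:
    rejected = True
\end{verbatim}
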
\begin{datadesc}{LC_CTYPE}
\refstmodindex{string}
Locale category for the character type functions. Depending on the
settings of this category, the functions of module
\refmodule{string} dealing with case change their behaviour.
\end{datadesc}

\begin{datadesc}{LC_COLLATE}
Locale category for sorting strings. The functions
\function{strcoll()} and \function{strxfrm()} of the
\module{locale} module are affected.
\end{datadesc}

\begin{datadesc}{LC_TIME}
Locale category for the formatting of time. The function
\function{time.strftime()} follows these conventions.
\end{datadesc}

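In the \code{'C'} locale, \function{time.strftime()} produces the
untranslated English names; after switching \constant{LC_TIME} to
another locale, the same format directives would typically yield
translated names. The date used below, Monday, 1 January 2001, is
arbitrary.

\begin{verbatim}
import locale
import time

locale.setlocale(locale.LC_TIME, 'C')

# A fixed time tuple: 2001-01-01 00:00:00, a Monday (weekday 0),
# day 1 of the year, no daylight saving time.
when = (2001, 1, 1, 0, 0, 0, 0, 1, 0)

month = time.strftime('%B', when)     # full month name
weekday = time.strftime('%A', when)   # full weekday name
\end{verbatim}
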
\begin{datadesc}{LC_MONETARY}
Locale category for formatting of monetary values. The available
options can be obtained from the \function{localeconv()} function.
\end{datadesc}

\begin{datadesc}{LC_MESSAGES}
Locale category for message display. Python currently does not
support application-specific locale-aware messages. Messages
displayed by the operating system, like those returned by
\function{os.strerror()}, might be affected by this category.
\end{datadesc}

\begin{datadesc}{LC_NUMERIC}
Locale category for formatting numbers. The functions
\function{format()}, \function{atoi()}, \function{atof()} and
\function{str()} of the \module{locale} module are affected by that
category. All other numeric formatting operations are not
affected.
\end{datadesc}

\begin{datadesc}{LC_ALL}
Combination of all locale settings. If this flag is used when the
locale is changed, setting the locale for all categories is
attempted. If that fails for any category, no category is changed at
all. When the locale is retrieved using this flag, a string
indicating the setting for all categories is returned. This string
can be later used to restore the settings.
\end{datadesc}

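The string returned when querying \constant{LC_ALL} can be handed back
to \function{setlocale()} to restore every category at once. The
context manager below is a hypothetical helper that assumes a Python
with \module{contextlib} (2.5 and later); the caveats about changing
the locale from library code and from threads, discussed in the
following section, still apply to it.

\begin{verbatim}
import locale
from contextlib import contextmanager

@contextmanager
def temporary_locale(category, name):
    """Hypothetical helper: switch category to name, then restore it.

    Not thread safe: other threads see the temporary setting too.
    """
    saved = locale.setlocale(category)      # query-only call
    try:
        yield locale.setlocale(category, name)
    finally:
        # For LC_ALL, the saved string restores every category.
        locale.setlocale(category, saved)
\end{verbatim}
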
\begin{datadesc}{CHAR_MAX}
This is a symbolic constant used for different values returned by
\function{localeconv()}.
\end{datadesc}

Example:

\begin{verbatim}
>>> import locale
>>> loc = locale.setlocale(locale.LC_ALL) # get current locale
>>> locale.setlocale(locale.LC_ALL, 'de') # use German locale
>>> locale.strcoll('f\xe4n', 'foo') # compare a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL, '') # use user's preferred locale
>>> locale.setlocale(locale.LC_ALL, 'C') # use default (C) locale
>>> locale.setlocale(locale.LC_ALL, loc) # restore saved locale
\end{verbatim}


\subsection{Background, details, hints, tips and caveats}

The C standard defines the locale as a program-wide property that may
be relatively expensive to change. On top of that, some
implementations are broken in such a way that frequent locale changes
may cause core dumps. This makes the locale somewhat painful to use
correctly.

Initially, when a program is started, the locale is the \samp{C} locale, no
matter what the user's preferred locale is. The program must
explicitly say that it wants the user's preferred locale settings by
calling \code{setlocale(LC_ALL, '')}.

It is generally a bad idea to call \function{setlocale()} in some library
routine, since as a side effect it affects the entire program. Saving
and restoring it is almost as bad: it is expensive and affects other
threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a
locale-independent version of an operation that is affected by the
locale (e.g.\ \function{string.lower()}, or certain formats used with
\function{time.strftime()}), you will have to find a way to do it
without using the standard library routine. Even better is convincing
yourself that using locale settings is okay. Only as a last resort
should you document that your module is not compatible with
non-\samp{C} locale settings.

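One way to obtain a locale-independent replacement for
\function{string.lower()} is to spell out the mapping explicitly. The
sketch below assumes that only the 26 unaccented ASCII letters need to
be folded; \function{ascii_lower()} is a hypothetical helper, not a
standard library function.

\begin{verbatim}
# Explicit tables, so the result cannot depend on LC_CTYPE.
ASCII_UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
ASCII_LOWER = 'abcdefghijklmnopqrstuvwxyz'
TO_LOWER = dict(zip(ASCII_UPPER, ASCII_LOWER))

def ascii_lower(text):
    """Hypothetical helper: lower-case A-Z only, leave all else alone."""
    return ''.join([TO_LOWER.get(ch, ch) for ch in text])
\end{verbatim}
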
The case conversion functions in the
\refmodule{string}\refstmodindex{string} module are affected by the
locale settings. When a call to the \function{setlocale()} function
changes the \constant{LC_CTYPE} settings, the variables
\code{string.lowercase}, \code{string.uppercase} and
\code{string.letters} are recalculated. Note that code that uses
these variables through `\keyword{from} ... \keyword{import} ...',
e.g.\ \code{from string import letters}, keeps the values bound at
import time and is not affected by subsequent
\function{setlocale()} calls.

The only way to perform numeric operations according to the locale
is to use the special functions defined by this module:
\function{atof()}, \function{atoi()}, \function{format()},
\function{str()}.

\subsection{For extension writers and programs that embed Python
            \label{embedding-locale}}

Extension modules should never call \function{setlocale()}, except to
find out what the current locale is. But since the return value can
only be used portably to restore it, that is not very useful (except
perhaps to find out whether or not the locale is \samp{C}).

When Python is embedded in an application, if the application sets the
locale to something specific before initializing Python, that is
generally okay, and Python will use whatever locale is set,
\emph{except} that the \constant{LC_NUMERIC} locale should always be
\samp{C}.

The \function{setlocale()} function in the \module{locale} module
gives the Python programmer the impression that you can manipulate the
\constant{LC_NUMERIC} locale setting, but this is not the case at the C
level: C code will always find that the \constant{LC_NUMERIC} locale
setting is \samp{C}. This is because too much would break when the
decimal point character is set to something other than a period
(e.g.\ the Python parser would break). Caveat: threads that run
without holding Python's global interpreter lock may occasionally find
that the numeric locale setting differs; this is because the only
portable way to implement this feature is to set the numeric locale
settings to what the user requests, extract the relevant
characteristics, and then restore the \samp{C} numeric locale.

When Python code uses the \module{locale} module to change the locale,
this also affects the embedding application. If the embedding
application doesn't want this to happen, it should remove the
\module{_locale} extension module (which does all the work) from the
table of built-in modules in the \file{config.c} file, and make sure
that the \module{_locale} module is not accessible as a shared library.