\section{\module{locale} ---
         Internationalization services}

\declaremodule{standard}{locale}
\modulesynopsis{Internationalization services.}
\moduleauthor{Martin von Loewis}{loewis@informatik.hu-berlin.de}
\sectionauthor{Martin von Loewis}{loewis@informatik.hu-berlin.de}


The \module{locale} module opens access to the \POSIX{} locale database
and functionality. The \POSIX{} locale mechanism allows programmers
to deal with certain cultural issues in an application, without
requiring the programmer to know all the specifics of each country
where the software is executed.

The \module{locale} module is implemented on top of the
\module{_locale}\refbimodindex{_locale} module, which in turn uses an
ANSI C locale implementation if available.

The \module{locale} module defines the following exception and
functions:

\begin{funcdesc}{setlocale}{category\optional{, value}}
If \var{value} is specified, modifies the locale setting for the
\var{category}. The available categories are listed in the data
description below. The value is the name of a locale. An empty string
specifies the user's default settings. If the modification of the
locale fails, the exception \exception{Error} is
raised. If successful, the new locale setting is returned.

If no \var{value} is specified, the current setting for the
\var{category} is returned.

\function{setlocale()} is not thread safe on most systems. Applications
typically start with a call of
\begin{verbatim}
import locale
locale.setlocale(locale.LC_ALL, "")
\end{verbatim}
This sets the locale for all categories to the user's default setting
(typically specified in the \envvar{LANG} environment variable). If
the locale is not changed thereafter, using multithreading should not
cause problems.
\end{funcdesc}
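
The start-up call described above can fail when the environment names a
locale that the system does not provide. The following is a minimal
sketch of one way to handle that case; the helper name
\code{use_default_locale} is hypothetical and not part of the module.

```python
import locale


def use_default_locale():
    """Try to adopt the user's default locale; keep the current one on failure.

    Returns the name of the LC_ALL setting in effect afterwards.
    (Hypothetical helper, not part of the locale module.)
    """
    try:
        # An empty string selects the user's default settings.
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The requested locale is not available on this system.
        # Calling setlocale() without a value only queries the setting.
        return locale.setlocale(locale.LC_ALL)


current = use_default_locale()
print("locale in effect:", current)
```

Calling the helper once, early in the main program and before any
threads are started, is in line with the thread-safety remark above.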

\begin{excdesc}{Error}
Exception raised when \function{setlocale()} fails.
\end{excdesc}

\begin{funcdesc}{localeconv}{}
Returns the database of the local conventions as a dictionary. This
dictionary has the following strings as keys:
\begin{itemize}
\item \code{decimal_point} specifies the decimal point used in
floating point number representations for the \constant{LC_NUMERIC}
category.
\item \code{grouping} is a sequence of numbers specifying at which
relative positions the \code{thousands_sep} is expected. If the
sequence is terminated with \constant{CHAR_MAX}, no further
grouping is performed. If the sequence terminates with a \code{0}, the last
group size is repeatedly used.
\item \code{thousands_sep} is the character used between groups.
\item \code{int_curr_symbol} specifies the international currency
symbol from the \constant{LC_MONETARY} category.
\item \code{currency_symbol} is the local currency symbol.
\item \code{mon_decimal_point} is the decimal point used in monetary
values.
\item \code{mon_thousands_sep} is the separator for grouping of
monetary values.
\item \code{mon_grouping} has the same format as the \code{grouping}
key; it is used for monetary values.
\item \code{positive_sign} and \code{negative_sign} give the signs
used for positive and negative monetary quantities.
\item \code{int_frac_digits} and \code{frac_digits} specify the number
of fractional digits used in the international and local formatting
of monetary values.
\item \code{p_cs_precedes} and \code{n_cs_precedes} specify whether
the currency symbol precedes the value for positive or negative
values.
\item \code{p_sep_by_space} and \code{n_sep_by_space} specify
whether there is a space between the positive or negative value and
the currency symbol.
\item \code{p_sign_posn} and \code{n_sign_posn} indicate how the
sign should be placed for positive and negative monetary values.
\end{itemize}

The possible values for \code{p_sign_posn} and
\code{n_sign_posn} are given below.

\begin{tableii}{c|l}{code}{Value}{Explanation}
\lineii{0}{Currency and value are surrounded by parentheses.}
\lineii{1}{The sign should precede the value and currency symbol.}
\lineii{2}{The sign should follow the value and currency symbol.}
\lineii{3}{The sign should immediately precede the value.}
\lineii{4}{The sign should immediately follow the value.}
\lineii{CHAR_MAX}{Nothing is specified in this locale.}
\end{tableii}
\end{funcdesc}
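
The rule for the \code{grouping} sequence is easiest to follow on a
concrete digit string. The sketch below is an illustration only, not the
module's own implementation; the function name \code{group_digits} and
its arguments are hypothetical. It works from the right-hand end of the
digits, stops at \constant{CHAR_MAX}, and repeats the previous group
size when it meets a \code{0}.

```python
import locale


def group_digits(digits, grouping, separator):
    """Insert separator into digits following a localeconv()-style grouping list.

    Hypothetical helper for illustration; digits is a string of digits only.
    """
    groups = []        # groups are collected from right to left
    rest = digits
    last_size = None
    for size in grouping:
        if size == locale.CHAR_MAX:
            break      # no further grouping is performed
        if size == 0:
            if last_size is None:
                raise ValueError("grouping must not start with 0")
            # Repeat the last group size for all remaining digits.
            while len(rest) > last_size:
                groups.append(rest[-last_size:])
                rest = rest[:-last_size]
            break
        if len(rest) <= size:
            break      # not enough digits left to split off a group
        groups.append(rest[-size:])
        rest = rest[:-size]
        last_size = size
    groups.append(rest)
    return separator.join(reversed(groups))


print(group_digits("1234567", [3, 0], ","))                # 1,234,567
print(group_digits("1234567", [3, 2, 0], ","))             # 12,34,567
print(group_digits("1234567", [3, locale.CHAR_MAX], "."))  # 1234.567
```

In a real program the \var{grouping} and \var{separator} arguments would
come from the \code{grouping} and \code{thousands_sep} entries of the
dictionary returned by \function{localeconv()}.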

\begin{funcdesc}{strcoll}{string1, string2}
Compares two strings according to the current \constant{LC_COLLATE}
setting. Like any other comparison function, returns a negative value,
a positive value, or \code{0}, depending on whether \var{string1}
collates before \var{string2}, after it, or is equal to it.
\end{funcdesc}

\begin{funcdesc}{strxfrm}{string}
Transforms a string to one that can be used for the built-in function
\function{cmp()}\bifuncindex{cmp}, and still return locale-aware
results. This function can be used when the same string is compared
repeatedly, e.g. when collating a sequence of strings.
\end{funcdesc}
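
The two collation functions can be compared side by side when sorting a
list. This sketch assumes a current Python, where sorting takes a
\code{key} argument and \function{functools.cmp_to_key()} adapts a
comparison function; neither is mentioned in the text above. It uses the
\samp{C} locale only because that locale is always available.

```python
import functools
import locale

# The "C" locale always exists; substitute the locale you actually need.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["pear", "Zebra", "apple", "mango"]

# Compare the strings pairwise with strcoll() ...
by_strcoll = sorted(words, key=functools.cmp_to_key(locale.strcoll))

# ... or transform each string once with strxfrm() and compare the results.
by_strxfrm = sorted(words, key=locale.strxfrm)

print(by_strcoll)
print(by_strxfrm)
```

Both calls should produce the same order. The \function{strxfrm()}
variant transforms each string once rather than on every comparison,
which is the repeated-comparison case described above.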

\begin{funcdesc}{format}{format, val\optional{, grouping\code{ = 0}}}
Formats a number \var{val} according to the current
\constant{LC_NUMERIC} setting. The format follows the conventions of
the \code{\%} operator. For floating point values, the decimal point
is modified if appropriate. If \var{grouping} is true, also takes the
grouping into account.
\end{funcdesc}

\begin{funcdesc}{str}{float}
Formats a floating point number using the same format as the built-in
function \code{str(\var{float})}, but takes the decimal point into
account.
\end{funcdesc}

\begin{funcdesc}{atof}{string}
Converts a string to a floating point number, following the
\constant{LC_NUMERIC} settings.
\end{funcdesc}

\begin{funcdesc}{atoi}{string}
Converts a string to an integer, following the \constant{LC_NUMERIC}
conventions.
\end{funcdesc}
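
A short round trip through the numeric functions might look as follows.
This is a sketch under two assumptions. First, it uses
\function{locale.format_string()}, the name under which current Python
releases provide the formatting described for \function{format()}.
Second, it selects the \samp{C} locale so that the results do not depend
on the system.

```python
import locale

# Use the always-available "C" locale so the results are predictable.
locale.setlocale(locale.LC_NUMERIC, "C")

text = locale.str(1234.5)        # float -> string, locale decimal point
value = locale.atof(text)        # string -> float, locale decimal point
count = locale.atoi("42")        # string -> int
# format_string() is assumed here as the current spelling of format().
formatted = locale.format_string("%.2f", 1234.5, grouping=True)

print(text, value, count, formatted)
```

In the \samp{C} locale no thousands separator is defined, so
\var{grouping} has no visible effect. Under a locale that defines one,
the same call would insert it.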

\begin{datadesc}{LC_CTYPE}
\refstmodindex{string}
Locale category for the character type functions. Depending on the
settings of this category, the functions of module \refmodule{string}
dealing with case change their behaviour.
\end{datadesc}

\begin{datadesc}{LC_COLLATE}
Locale category for sorting strings. The functions
\function{strcoll()} and \function{strxfrm()} of the \module{locale}
module are affected.
\end{datadesc}

\begin{datadesc}{LC_TIME}
Locale category for the formatting of time. The function
\function{time.strftime()} follows these conventions.
\end{datadesc}

\begin{datadesc}{LC_MONETARY}
Locale category for formatting of monetary values. The available
options can be obtained from the \function{localeconv()} function.
\end{datadesc}

\begin{datadesc}{LC_MESSAGES}
Locale category for message display. Python currently does not support
application-specific locale-aware messages. Messages displayed by the
operating system, like those returned by \function{os.strerror()},
might be affected by this category.
\end{datadesc}

\begin{datadesc}{LC_NUMERIC}
Locale category for formatting numbers. The functions
\function{format()}, \function{atoi()}, \function{atof()} and
\function{str()} of the \module{locale} module are affected by that
category. All other numeric formatting operations are not affected.
\end{datadesc}
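
The split between the \module{locale} functions and the other numeric
operations can be checked with a sketch like the one below. The locale
names in the candidate list are assumptions, since which of them exist
depends on the system; the helper \code{try_locales} is hypothetical. If
none of the names is accepted, the numeric locale is left unchanged and
the checks still hold.

```python
import locale


def try_locales(category, candidates):
    """Set the first accepted locale from candidates, or return None.

    Hypothetical helper, not part of the locale module.
    """
    for name in candidates:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            continue
    return None


saved = locale.setlocale(locale.LC_NUMERIC)
# Assumed names for a German locale; availability varies by system.
chosen = try_locales(locale.LC_NUMERIC, ["de_DE.UTF-8", "de_DE", "German"])
try:
    builtin_text = str(1.5)          # built-in formatting, not affected
    builtin_value = float("1.5")     # built-in parsing, not affected
    locale_text = locale.str(1.5)    # follows LC_NUMERIC
    roundtrip = locale.atof(locale_text)
finally:
    # Restore the numeric locale that was in effect before.
    locale.setlocale(locale.LC_NUMERIC, saved)

print(chosen, builtin_text, locale_text)
```

Where a German locale is installed, \code{locale_text} is expected to
use a comma as the decimal point, while \code{builtin_text} keeps the
period.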

\begin{datadesc}{LC_ALL}
Combination of all locale settings. If this flag is used when the
locale is changed, setting the locale for all categories is
attempted. If that fails for any category, no category is changed at
all. When the locale is retrieved using this flag, a string indicating
the setting for all categories is returned. This string can be later
used to restore the settings.
\end{datadesc}

\begin{datadesc}{CHAR_MAX}
This is a symbolic constant used for different values returned by
\function{localeconv()}.
\end{datadesc}

Example:

\begin{verbatim}
>>> import locale
>>> loc = locale.setlocale(locale.LC_ALL)  # get current locale
>>> locale.setlocale(locale.LC_ALL, "de")  # use German locale
>>> locale.strcoll("f\344n", "foo")        # compare a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL, "")    # use user's preferred locale
>>> locale.setlocale(locale.LC_ALL, "C")   # use default (C) locale
>>> locale.setlocale(locale.LC_ALL, loc)   # restore saved locale
\end{verbatim}


\subsection{Background, details, hints, tips and caveats}

The C standard defines the locale as a program-wide property that may
be relatively expensive to change. On top of that, some
implementations are broken in such a way that frequent locale changes
may cause core dumps. This makes the locale somewhat painful to use
correctly.

Initially, when a program is started, the locale is the \samp{C} locale, no
matter what the user's preferred locale is. The program must
explicitly say that it wants the user's preferred locale settings by
calling \code{setlocale(LC_ALL, "")}.

It is generally a bad idea to call \function{setlocale()} in some library
routine, since as a side effect it affects the entire program. Saving
and restoring it is almost as bad: it is expensive and affects other
threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a locale
independent version of an operation that is affected by the locale
(e.g. \function{string.lower()}, or certain formats used with
\function{time.strftime()}), you will have to find a way to do it
without using the standard library routine. Even better is convincing
yourself that using locale settings is okay. Only as a last resort
should you document that your module is not compatible with
non-\samp{C} locale settings.
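
As one illustration of doing an operation without the locale-dependent
routine, the sketch below lowercases only the 26 ASCII capital letters
through an explicit translation table. The helper name
\code{ascii_lower} is hypothetical. The code is written for a current
Python, where \code{string.ascii_uppercase} and
\code{string.ascii_lowercase} are fixed constants; that is an assumption
that goes beyond the text above.

```python
import string

# Map only the 26 ASCII capitals; every other character is left alone.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(text):
    """Lowercase ASCII letters without consulting the locale.

    Hypothetical helper for illustration.
    """
    return text.translate(_ASCII_LOWER)


print(ascii_lower("Content-Type"))  # content-type
```

A helper of this kind suits identifiers that are defined in terms of
ASCII, such as protocol keywords, where the user's locale should not
influence the result.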

The case conversion functions in the
\refmodule{string}\refstmodindex{string} and
\module{strop}\refbimodindex{strop} modules are affected by the locale
settings. When a call to the \function{setlocale()} function changes
the \constant{LC_CTYPE} settings, the variables
\code{string.lowercase}, \code{string.uppercase} and
\code{string.letters} (and their counterparts in \module{strop}) are
recalculated. Note that code that uses these variables through
`\keyword{from} ... \keyword{import} ...', e.g. \code{from string
import letters}, is not affected by subsequent \function{setlocale()}
calls.
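
The reason is that `\keyword{from} ... \keyword{import} ...' copies the
binding that exists at import time. The sketch below shows the effect
with a stand-in module instead of \module{string}; the module name
\code{fake_string} is hypothetical, and the reassignment merely
simulates the recalculation that \function{setlocale()} would trigger.

```python
import sys
import types

# Stand-in for the string module; "fake_string" is a hypothetical name.
fake_string = types.ModuleType("fake_string")
fake_string.letters = "abc"
sys.modules["fake_string"] = fake_string

from fake_string import letters  # copies the binding as it is right now

# Simulate the recalculation that a setlocale() call would trigger.
fake_string.letters = "abc\xe4\xf6\xfc"

print(letters == fake_string.letters)  # False: the copy was not updated
```

Code that looks the variable up through the module each time, as in
\code{fake_string.letters}, sees the recalculated value.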

The only way to perform numeric operations according to the locale
is to use the special functions defined by this module:
\function{atof()}, \function{atoi()}, \function{format()} and
\function{str()}.

\subsection{For extension writers and programs that embed Python}
\label{embedding-locale}

Extension modules should never call \function{setlocale()}, except to
find out what the current locale is. But since the return value can
only be used portably to restore it, that is not very useful (except
perhaps to find out whether or not the locale is \samp{C}).

When Python is embedded in an application, if the application sets the
locale to something specific before initializing Python, that is
generally okay, and Python will use whatever locale is set,
\emph{except} that the \constant{LC_NUMERIC} locale should always be
\samp{C}.

The \function{setlocale()} function in the \module{locale} module
gives the Python programmer the impression that you can manipulate the
\constant{LC_NUMERIC} locale setting, but this is not the case at the C
level: C code will always find that the \constant{LC_NUMERIC} locale
setting is \samp{C}. This is because too much would break when the
decimal point character is set to something other than a period
(e.g. the Python parser would break). Caveat: threads that run
without holding Python's global interpreter lock may occasionally find
that the numeric locale setting differs; this is because the only
portable way to implement this feature is to set the numeric locale
settings to what the user requests, extract the relevant
characteristics, and then restore the \samp{C} numeric locale.

When Python code uses the \module{locale} module to change the locale,
this also affects the embedding application. If the embedding
application doesn't want this to happen, it should remove the
\module{_locale} extension module (which does all the work) from the
table of built-in modules in the \file{config.c} file, and make sure
that the \module{_locale} module is not accessible as a shared library.