.\" $NetBSD: nls.7,v 1.14 2008/04/30 13:10:57 martin Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.Sh NAME
.Nm nls
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides the facilities needed for a single
operating system base that can be used worldwide.
An internationalized system has no built-in assumptions or dependencies
on language-specific or culture-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
refers to the process by which system software is developed to support
multiple culture-specific and language-specific conventions.
This is a generalization process by which the system is freed from
relying only on English strings or other English-specific conventions.
.Pp
.Dq Localization
refers to the process by which the user environment is customized to
handle input and output appropriately for a specific language and set of
cultural conventions.
This is a specialization process by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of the cultural conventions of a country,
together with all associated translations into its native language,
is called a
.Em locale .
.Pp
.Nx
provides extensive support to help programmers and system developers
write internationalized software.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
of the kind outlined in the list above.
ISO C specifies five standard categories, and POSIX adds a sixth,
.Dv LC_MESSAGES .
All six are supported by
.Nx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It Dv LC_COLLATE
string-collation order information
.It Dv LC_CTYPE
character classification, case conversion, and other character attributes
.It Dv LC_MESSAGES
the format for affirmative and negative responses, and the language of
program messages
.It Dv LC_MONETARY
rules and symbols for formatting monetary numeric information
.It Dv LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It Dv LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
The
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are also used.
.Pp
The
.Ev NLSPATH
environment variable specifies a colon-separated list of pathname
templates used to locate the message catalog files of the NLS database.
In each template,
.Ql %N
stands for the catalog name and
.Ql %L
for the value of the
.Dv LC_MESSAGES
category.
.Pp
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
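.Pp
A program picks up these settings only after it calls
.Xr setlocale 3 .
Until then it runs in the C locale.
The following C sketch shows the usual call.
The helper
.Fn locale_for
is hypothetical and exists only for this example.
Passing an empty string asks the library to apply the environment
variables described here.
Passing NULL as the locale argument only queries the current setting.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: switch every category to the locale `name` and
 * return the locale name now in effect for `category`, or NULL if `name`
 * is not available (the previous locale then stays in effect).
 * Pass "" to take the settings from LC_ALL, the LC_* variables and LANG;
 * pass "C" for the portable default locale.
 */
const char *
locale_for(const char *name, int category)
{
	assert(name != NULL);
	if (setlocale(LC_ALL, name) == NULL)
		return NULL;
	/* A NULL locale argument queries the setting without changing it. */
	return setlocale(category, NULL);
}
```

The usual pattern is to call
.Fn locale_for
with an empty name once at start-up, before any locale-dependent
processing.
The returned name then shows which locale the environment actually
selected.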
.Pp
The value of each of these environment variables is a string of the form:
.Bd -literal -offset indent
language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-letter codes for many languages.
In locale names these codes are written in lowercase.
Some common language codes are:
.Bl -column "PERSIAN (farsi)" "Sy Code" "OCEANIC/INDONESIAN"
.It Sy Language Name Ta Sy Code Ta Sy Language Family
.It ABKHAZIAN Ta AB Ta IBERO-CAUCASIAN
.It AFAN (OROMO) Ta OM Ta HAMITIC
.It AFRIKAANS Ta AF Ta GERMANIC
.It ALBANIAN Ta SQ Ta INDO-EUROPEAN (OTHER)
.It AMHARIC Ta AM Ta SEMITIC
.It ARABIC Ta AR Ta SEMITIC
.It ARMENIAN Ta HY Ta INDO-EUROPEAN (OTHER)
.It ASSAMESE Ta AS Ta INDIAN
.It AYMARA Ta AY Ta AMERINDIAN
.It AZERBAIJANI Ta AZ Ta TURKIC/ALTAIC
.It BASHKIR Ta BA Ta TURKIC/ALTAIC
.It BENGALI Ta BN Ta INDIAN
.It BISLAMA Ta BI Ta ""
.It BULGARIAN Ta BG Ta SLAVIC
.It BYELORUSSIAN Ta BE Ta SLAVIC
.It CAMBODIAN Ta KM Ta ASIAN
.It CATALAN Ta CA Ta ROMANCE
.It CORSICAN Ta CO Ta ROMANCE
.It CROATIAN Ta HR Ta SLAVIC
.It DANISH Ta DA Ta GERMANIC
.It DUTCH Ta NL Ta GERMANIC
.It ENGLISH Ta EN Ta GERMANIC
.It ESPERANTO Ta EO Ta INTERNATIONAL AUX.
.It ESTONIAN Ta ET Ta FINNO-UGRIC
.It FAROESE Ta FO Ta GERMANIC
.It FIJI Ta FJ Ta OCEANIC/INDONESIAN
.It FINNISH Ta FI Ta FINNO-UGRIC
.It FRENCH Ta FR Ta ROMANCE
.It FRISIAN Ta FY Ta GERMANIC
.It GALICIAN Ta GL Ta ROMANCE
.It GEORGIAN Ta KA Ta IBERO-CAUCASIAN
.It GERMAN Ta DE Ta GERMANIC
.It GREEK Ta EL Ta LATIN/GREEK
.It GREENLANDIC Ta KL Ta ESKIMO
.It GUARANI Ta GN Ta AMERINDIAN
.It GUJARATI Ta GU Ta INDIAN
.It HAUSA Ta HA Ta NEGRO-AFRICAN
.It HEBREW Ta HE Ta SEMITIC
.It HUNGARIAN Ta HU Ta FINNO-UGRIC
.It ICELANDIC Ta IS Ta GERMANIC
.It INDONESIAN Ta ID Ta OCEANIC/INDONESIAN
.It INTERLINGUA Ta IA Ta INTERNATIONAL AUX.
.It INTERLINGUE Ta IE Ta INTERNATIONAL AUX.
.It INUKTITUT Ta IU Ta ""
.It INUPIAK Ta IK Ta ESKIMO
.It ITALIAN Ta IT Ta ROMANCE
.It JAPANESE Ta JA Ta ASIAN
.It JAVANESE Ta JV Ta OCEANIC/INDONESIAN
.It KANNADA Ta KN Ta DRAVIDIAN
.It KASHMIRI Ta KS Ta INDIAN
.It KAZAKH Ta KK Ta TURKIC/ALTAIC
.It KINYARWANDA Ta RW Ta NEGRO-AFRICAN
.It KIRGHIZ Ta KY Ta TURKIC/ALTAIC
.It KURUNDI Ta RN Ta NEGRO-AFRICAN
.It KURDISH Ta KU Ta IRANIAN
.It LAOTHIAN Ta LO Ta ASIAN
.It LATIN Ta LA Ta LATIN/GREEK
.It LATVIAN Ta LV Ta BALTIC
.It LINGALA Ta LN Ta NEGRO-AFRICAN
.It LITHUANIAN Ta LT Ta BALTIC
.It MACEDONIAN Ta MK Ta SLAVIC
.It MALAGASY Ta MG Ta OCEANIC/INDONESIAN
.It MALAY Ta MS Ta OCEANIC/INDONESIAN
.It MALAYALAM Ta ML Ta DRAVIDIAN
.It MALTESE Ta MT Ta SEMITIC
.It MAORI Ta MI Ta OCEANIC/INDONESIAN
.It MARATHI Ta MR Ta INDIAN
.It MOLDAVIAN Ta MO Ta ROMANCE
.It MONGOLIAN Ta MN Ta ""
.It NAURU Ta NA Ta ""
.It NORWEGIAN Ta NO Ta GERMANIC
.It OCCITAN Ta OC Ta ROMANCE
.It PASHTO Ta PS Ta IRANIAN
.It PERSIAN (farsi) Ta FA Ta IRANIAN
.It PORTUGUESE Ta PT Ta ROMANCE
.It PUNJABI Ta PA Ta INDIAN
.It QUECHUA Ta QU Ta AMERINDIAN
.It RHAETO-ROMANCE Ta RM Ta ROMANCE
.It ROMANIAN Ta RO Ta ROMANCE
.It RUSSIAN Ta RU Ta SLAVIC
.It SAMOAN Ta SM Ta OCEANIC/INDONESIAN
.It SANGHO Ta SG Ta NEGRO-AFRICAN
.It SANSKRIT Ta SA Ta INDIAN
.It SCOTS GAELIC Ta GD Ta CELTIC
.It SERBIAN Ta SR Ta SLAVIC
.It SERBO-CROATIAN Ta SH Ta SLAVIC
.It SESOTHO Ta ST Ta NEGRO-AFRICAN
.It SETSWANA Ta TN Ta NEGRO-AFRICAN
.It SHONA Ta SN Ta NEGRO-AFRICAN
.It SINGHALESE Ta SI Ta INDIAN
.It SISWATI Ta SS Ta NEGRO-AFRICAN
.It SLOVENIAN Ta SL Ta SLAVIC
.It SOMALI Ta SO Ta HAMITIC
.It SPANISH Ta ES Ta ROMANCE
.It SUNDANESE Ta SU Ta OCEANIC/INDONESIAN
.It SWAHILI Ta SW Ta NEGRO-AFRICAN
.It SWEDISH Ta SV Ta GERMANIC
.It TAGALOG Ta TL Ta OCEANIC/INDONESIAN
.It TAMIL Ta TA Ta DRAVIDIAN
.It TATAR Ta TT Ta TURKIC/ALTAIC
.It TELUGU Ta TE Ta DRAVIDIAN
.It TIGRINYA Ta TI Ta SEMITIC
.It TONGA Ta TO Ta OCEANIC/INDONESIAN
.It TSONGA Ta TS Ta NEGRO-AFRICAN
.It TURKISH Ta TR Ta TURKIC/ALTAIC
.It TURKMEN Ta TK Ta TURKIC/ALTAIC
.It TWI Ta TW Ta NEGRO-AFRICAN
.It UIGUR Ta UG Ta ""
.It UKRAINIAN Ta UK Ta SLAVIC
.It UZBEK Ta UZ Ta TURKIC/ALTAIC
.It VIETNAMESE Ta VI Ta ASIAN
.It VOLAPUK Ta VO Ta INTERNATIONAL AUX.
.It WOLOF Ta WO Ta NEGRO-AFRICAN
.It XHOSA Ta XH Ta NEGRO-AFRICAN
.It YIDDISH Ta YI Ta GERMANIC
.It YORUBA Ta YO Ta NEGRO-AFRICAN
.It ZHUANG Ta ZA Ta ""
.It ZULU Ta ZU Ta NEGRO-AFRICAN
.El
.Pp
For example, the locale for the Danish language as spoken in Denmark,
using the ISO 8859-1 character set, is da_DK.ISO8859-1.
Here, da stands for the Danish language and DK stands for Denmark.
The short form da_DK is sufficient to indicate this locale.
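.Pp
The following small C sketch splits a locale name into its four fields,
which makes the structure of the name concrete.
The type
.Vt struct locale_name ,
its field sizes, and the function
.Fn parse_locale_name
are hypothetical.
They are not part of any library interface, and applications normally
pass locale names to
.Xr setlocale 3
unchanged.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the fields of a locale name; the field
   sizes are arbitrary limits chosen for this sketch. */
struct locale_name {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[16];
};

/* Copy a field of len bytes into dst, NUL-terminated; -1 if too long. */
static int
copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
	if (len >= dstsize)
		return -1;
	memcpy(dst, src, len);
	dst[len] = 0;
	return 0;
}

/*
 * Hypothetical parser for "language[_territory][.codeset][@modifier]".
 * Absent optional fields are left as empty strings.  Returns 0 on
 * success, or -1 if the language field is empty or a field is too long.
 */
int
parse_locale_name(const char *name, struct locale_name *out)
{
	const char *end, *at, *dot, *us;

	assert(name != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	end = name + strlen(name);

	/* Find the modifier first, then the codeset and territory before it. */
	at = strchr(name, '@');
	if (at == NULL)
		at = end;
	dot = memchr(name, '.', (size_t)(at - name));
	if (dot == NULL)
		dot = at;
	us = memchr(name, '_', (size_t)(dot - name));
	if (us == NULL)
		us = dot;

	if (us == name)		/* the language field is mandatory */
		return -1;
	if (copy_field(out->language, sizeof(out->language),
	    name, (size_t)(us - name)) == -1)
		return -1;
	if (us < dot && copy_field(out->territory, sizeof(out->territory),
	    us + 1, (size_t)(dot - us - 1)) == -1)
		return -1;
	if (dot < at && copy_field(out->codeset, sizeof(out->codeset),
	    dot + 1, (size_t)(at - dot - 1)) == -1)
		return -1;
	if (at < end && copy_field(out->modifier, sizeof(out->modifier),
	    at + 1, (size_t)(end - at - 1)) == -1)
		return -1;
	return 0;
}
```

For the example name above, the fields come out as da, DK, and ISO8859-1, with an empty modifier.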
.Pp
The environment variable settings are queried in order of priority,
as follows:
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
The
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for the user to override the system default
using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the ASCII character set and defines
information for all six categories.
.El
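.Pp
The following C sketch models the lookup order above for a single
category.
The function names are hypothetical, and the sketch only mirrors the
documented rules.
Real programs should let
.Xr setlocale 3
apply these rules by passing an empty locale string.
As in POSIX, a variable set to the empty string is treated here as if it
were unset.

```c
#include <assert.h>
#include <stdlib.h>

/* Treat an unset or empty variable as absent, as POSIX does. */
static const char *
nonempty(const char *s)
{
	return (s != NULL && *s != 0) ? s : NULL;
}

/*
 * Hypothetical illustration of the lookup order for one category:
 * LC_ALL first, then the category's own LC_* variable, then LANG,
 * and finally the C locale.
 */
const char *
resolve_locale(const char *lc_all, const char *lc_category, const char *lang)
{
	const char *v;

	if ((v = nonempty(lc_all)) != NULL)
		return v;
	if ((v = nonempty(lc_category)) != NULL)
		return v;
	if ((v = nonempty(lang)) != NULL)
		return v;
	return "C";
}

/* Hypothetical wrapper reading the real environment, e.g. "LC_TIME". */
const char *
resolve_locale_env(const char *category_var)
{
	assert(category_var != NULL);
	return resolve_locale(getenv("LC_ALL"), getenv(category_var),
	    getenv("LANG"));
}
```

Consider
.Ev LC_ALL
and
.Ev LC_NUMERIC
unset,
.Ev LC_TIME
set to de_DE.ISO8859-1, and
.Ev LANG
set to da_DK.ISO8859-1.
In that case,
.Fn resolve_locale_env
returns the
.Ev LC_TIME
value for
.Dq LC_TIME
and the
.Ev LANG
value for
.Dq LC_NUMERIC .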
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a particular language makes up
a character set.
It is the encoding values in a character set that provide the interface
between the system and its input and output devices.
.Pp
The following character sets are supported in
.Nx :
.Bl -tag -width ISO_8859_family
.It ASCII
The American Standard Code for Information Interchange (ASCII) standard
specifies 128 characters, comprising the unaccented Latin letters,
digits, punctuation, and control codes, encoded in a 7-bit character
encoding scheme.
.It ISO 8859 family
Industry-standard character sets specified by the ISO/IEC 8859
standard.
The standard is divided into 15 numbered parts, each covering a group of
related scripts or languages.
Examples include Western European, Central European, Arabic, Cyrillic,
Hebrew, Greek, and Turkish.
The character sets use an 8-bit character encoding scheme whose first
128 code points are identical to ASCII.
.It Unicode
The Unicode character set aims to include every character of all the
world's writing systems.
It can be used in environments where multiple scripts must be processed
simultaneously.
The first 256 Unicode code points coincide with ISO 8859-1
(Western European), and the first 128 with ASCII.
Several character encoding schemes are defined for Unicode, including
UTF-8, UTF-16, and UTF-32.
All of these are multi-byte encodings.
UTF-8 is a variable-width encoding that uses one to four 8-bit bytes per
character and is compatible with ASCII.
UTF-16 is a variable-width encoding that uses one or two 16-bit code
units per character.
UTF-32 is a fixed-width encoding that uses a single 32-bit code unit per
character.
.El
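.Pp
The following C sketch encodes one code point and shows the
variable-width layout of UTF-8 concretely.
The function
.Fn utf8_encode
is hypothetical and is meant only to show the bit patterns.
Portable programs should instead convert through locale-aware functions
such as
.Xr wcrtomb 3
in a UTF-8 locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: encode the Unicode code
 * point cp as UTF-8 into buf, which must have room for 4 bytes.
 * Returns the number of bytes written (1 to 4), or 0 if cp is a UTF-16
 * surrogate or lies beyond U+10FFFF and so cannot be encoded.
 */
size_t
utf8_encode(uint32_t cp, unsigned char *buf)
{
	assert(buf != NULL);
	if (cp < 0x80) {			/* 0xxxxxxx: ASCII unchanged */
		buf[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {			/* 110xxxxx 10xxxxxx */
		buf[0] = (unsigned char)(0xC0 | (cp >> 6));
		buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)	/* surrogates are not valid scalar values */
		return 0;
	if (cp < 0x10000) {			/* 1110xxxx 10xxxxxx 10xxxxxx */
		buf[0] = (unsigned char)(0xE0 | (cp >> 12));
		buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {			/* 11110xxx and three 10xxxxxx */
		buf[0] = (unsigned char)(0xF0 | (cp >> 18));
		buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```

For instance, U+00E9 (LATIN SMALL LETTER E WITH ACUTE) encodes as the two bytes 0xC3 0xA9.
U+20AC (EURO SIGN) encodes as the three bytes 0xE2 0x82 0xAC.
ASCII characters remain single bytes.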
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for the
corresponding characters in a character set.
A display must have a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for Unicode with UTF-8 encoding.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Nx
.Xr wscons 4
console provides support for loading fonts using the
.Xr wsfontload 8
utility.
Currently, only fonts for the ISO8859-1 family of character sets are
supported.
.Ss Internationalization for Programmers
To facilitate translation of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3 ,
.Xr localeconv 3 ,
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
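.Pp
As a sketch of how a program queries such information, the following C
helpers return two values.
The first is the decimal-point string of the current
.Dv LC_NUMERIC
category.
The second is the codeset name of the current
.Dv LC_CTYPE
category.
The helper names are hypothetical.
The strings returned by
.Xr localeconv 3
and
.Xr nl_langinfo 3
belong to the C library and may be overwritten by later calls.
A program should copy them if it needs to keep them.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: decimal-point string of the LC_NUMERIC category. */
const char *
decimal_point(void)
{
	const struct lconv *lc = localeconv();

	assert(lc != NULL);
	return lc->decimal_point;
}

/* Hypothetical helper: codeset name of the LC_CTYPE category. */
const char *
current_codeset(void)
{
	return nl_langinfo(CODESET);
}
```

In the C locale the decimal point is a period.
In many European locales it is a comma instead.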
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
The application uses these catalogs to retrieve and display messages.
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
.Pp
The
.Xr catgets 3
interface has the advantage that it belongs to the X/Open standard.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
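.Pp
The following C sketch shows the typical
.Xr catopen 3 ,
.Xr catgets 3 ,
.Xr catclose 3
sequence.
The helper
.Fn lookup_message ,
the catalog name, and the set and message numbers passed to it are
hypothetical.
A real catalog would be compiled from a message source file with
.Xr gencat 1 .
Each lookup supplies a default string, which is returned when the
catalog or the message cannot be found.

```c
#include <assert.h>
#include <nl_types.h>
#include <stdio.h>

/*
 * Hypothetical helper: look up message `num` in set `set` of the catalog
 * `name` and copy it into buf, falling back to `fallback` when the
 * catalog or the message is missing.  The catalog is opened and closed
 * on every call only to keep the sketch short.
 */
const char *
lookup_message(const char *name, int set, int num, const char *fallback,
    char *buf, size_t buflen)
{
	nl_catd catd;
	const char *msg = fallback;

	assert(buf != NULL && buflen > 0);
	/* NL_CAT_LOCALE: choose the catalog using the LC_MESSAGES category. */
	catd = catopen(name, NL_CAT_LOCALE);
	if (catd != (nl_catd)-1)
		msg = catgets(catd, set, num, fallback);
	/* Copy before catclose(), which may release the message storage. */
	snprintf(buf, buflen, "%s", msg);
	if (catd != (nl_catd)-1)
		catclose(catd);
	return buf;
}
```

A call such as lookup_message("myprog", 1, 1, "File not found", buf, sizeof(buf)) behaves as follows.
If a matching catalog for the current locale is found through
.Ev NLSPATH ,
it yields the translated text.
Otherwise it yields the default string.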
.Pp
The Uniforum
.Xr gettext 3
interface was only recently standardized (in POSIX.1-2024), but it has
long been supported by an increasing number of systems.
It is also accompanied by many additional tools that make programming and
catalog maintenance much easier.
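.Pp
A minimal C sketch of the
.Xr gettext 3
workflow follows.
The text domain name
.Dq myprog ,
the catalog directory, and the helper names are hypothetical.
When no translation is found, for example in the C locale,
.Fn gettext
returns its argument unchanged, so untranslated programs keep working.
Depending on the system, the program may need to be linked with
.Fl lintl .

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>

/* Hypothetical text domain and catalog directory for this sketch. */
#define PACKAGE		"myprog"
#define LOCALEDIR	"/usr/share/locale"

/* Widespread shorthand for gettext(); not part of the interface itself. */
#define _(msgid)	gettext(msgid)

/* Select the locale from the environment and register the text domain. */
void
init_i18n(void)
{
	/* An unavailable locale is not fatal: messages stay untranslated. */
	(void)setlocale(LC_ALL, "");
	(void)bindtextdomain(PACKAGE, LOCALEDIR);
	(void)textdomain(PACKAGE);
}

/* Translation for the current LC_MESSAGES locale, or the msgid itself. */
const char *
greeting(void)
{
	return _("Hello, world!");
}
```

The _() shorthand is a convention, not part of the interface.
Message extraction tools such as GNU xgettext are usually told about it
with the --keyword=_ option.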
.Ss Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode,
or may contain state (that is, the meaning of a byte may depend on the
bytes that precede it).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multi-byte encodings properly.
The behaviour of these functions is affected by the
.Dv LC_CTYPE
category of the current locale.
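.Pp
In general, the number of characters in a multi-byte string is not its
length in bytes.
The following C sketch counts characters with
.Xr mbrtowc 3 ,
which carries any shift state between calls in an
.Vt mbstate_t
object.
The function name is hypothetical.
The program is assumed to have selected its locale with
.Xr setlocale 3
beforehand; otherwise the C locale applies.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the characters (not bytes) in the
 * multi-byte string s, decoded according to the LC_CTYPE category of
 * the current locale.  Returns (size_t)-1 on an invalid or incomplete
 * sequence.
 */
size_t
mb_charcount(const char *s)
{
	mbstate_t st;
	size_t left, n, count = 0;

	assert(s != NULL);
	left = strlen(s);
	memset(&st, 0, sizeof(st));	/* initial conversion state */
	while (left > 0) {
		n = mbrtowc(NULL, s, left, &st);
		if (n == (size_t)-1 || n == (size_t)-2)
			return (size_t)-1;	/* invalid or truncated sequence */
		if (n == 0)
			break;			/* guard: a null wide character ends the string */
		s += n;
		left -= n;
		count++;
	}
	return count;
}
```

In a UTF-8 locale, the string
.Dq caf\('e
is five bytes long but counts as four characters.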
.Pp
ISO C specifies that a wide character has a fixed number of bits and
is stateless.
There are two types for wide characters:
.Bl -tag -width wchar_t
.It Vt wchar_t
is a type which can hold one wide character.
It plays the role that
.Vt char
plays for single-byte characters.
.It Vt wint_t
can hold one wide character or
.Dv WEOF
(wide EOF).
.El
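.Pp
The separate
.Vt wint_t
type exists for the same reason that
.Xr getc 3
returns an
.Vt int :
it must be able to represent
.Dv WEOF
in addition to any member of the extended character set.
The following C sketch, with a hypothetical function name, reads a stream
with
.Xr fgetwc 3
until
.Dv WEOF
is returned.
It then uses
.Xr ferror 3
to distinguish end-of-file from an error.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the wide characters readable from fp.
 * fgetwc() returns a wint_t so that WEOF can be told apart from any
 * valid wide character.  Returns the count, or -1 if a read or
 * decoding error occurred.
 */
long
count_wide_chars(FILE *fp)
{
	wint_t wc;
	long n = 0;

	assert(fp != NULL);
	while ((wc = fgetwc(fp)) != WEOF)
		n++;
	/* WEOF means either end of file or an error; tell them apart. */
	return ferror(fp) ? -1 : n;
}
```

A stream read this way is wide-oriented.
Byte-oriented functions such as
.Xr getc 3
must not be mixed with it.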
.Pp
There are functions that operate on
.Vt wchar_t
and substitute for functions operating on
.Vt char .
There are also some additional functions that operate on
.Vt wchar_t .
.Pp
Wide characters should be used for all I/O processing that may rely
on locale-specific strings.
There are two primary ways of using wide characters for I/O:
.Bl -bullet -offset indent
.It
All I/O is performed using multi-byte characters.
Input data is converted into wide characters immediately after
reading.
Output data is converted from wide characters to the multi-byte
encoding immediately before writing.
Functions such as
.Xr mbstowcs 3
and
.Xr wcstombs 3
perform the conversion, which is controlled by the
.Dv LC_CTYPE
category of the current locale.
.It
Wide characters are used directly for I/O, through wide-character
stream functions such as
.Xr fgetwc 3
and
.Xr fputwc 3 .
They are also used with the formatted I/O functions for wide characters,
such as
.Xr fwprintf 3 .
In addition, the conventional formatted I/O functions accept them
through the %lc, %C, %ls, and %S conversion specifiers.
.El
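.Pp
The first approach can be sketched in C in three steps.
A multi-byte string is converted to wide characters with
.Xr mbstowcs 3 .
It is then processed with a wide-character function such as
.Xr towupper 3 .
Finally it is converted back with
.Xr wcstombs 3 .
The function name is hypothetical.
The sketch relies on the POSIX behaviour of both conversion functions
when given a null destination pointer: they return the length a full
conversion would need.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: upper-case the multi-byte string `in` into `out`
 * (outsize bytes).  Returns 0 on success, or -1 on a conversion error,
 * allocation failure, or if the result does not fit.  Both conversions
 * follow the LC_CTYPE category of the current locale.
 */
int
mb_toupper(const char *in, char *out, size_t outsize)
{
	wchar_t *ws;
	size_t len, need, i;
	int rv = -1;

	assert(in != NULL && out != NULL);
	/* Multi-byte to wide: a null destination returns the length (POSIX). */
	len = mbstowcs(NULL, in, 0);
	if (len == (size_t)-1)
		return -1;
	ws = malloc((len + 1) * sizeof(*ws));
	if (ws == NULL)
		return -1;
	(void)mbstowcs(ws, in, len + 1);

	/* Process the text as wide characters. */
	for (i = 0; i < len; i++)
		ws[i] = (wchar_t)towupper((wint_t)ws[i]);

	/* Wide to multi-byte: check the size first, then convert. */
	need = wcstombs(NULL, ws, 0);
	if (need != (size_t)-1 && need < outsize) {
		(void)wcstombs(out, ws, outsize);
		rv = 0;
	}
	free(ws);
	return rv;
}
```

In a suitable locale this also upper-cases non-ASCII letters.
The byte-oriented
.Xr toupper 3
cannot do this for multi-byte encodings such as UTF-8.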
.Sh BUGS
This man page is incomplete.