   1 .\"     $NetBSD: nls.7,v 1.15 2009/04/09 02:51:54 joerg Exp $
   2 .\"
   3 .\" Copyright (c) 2003 The NetBSD Foundation, Inc.
   4 .\" All rights reserved.
   5 .\"
   6 .\" This code is derived from software contributed to The NetBSD Foundation
   7 .\" by Gregory McGarry.
   8 .\"
   9 .\" Redistribution and use in source and binary forms, with or without
  10 .\" modification, are permitted provided that the following conditions
  11 .\" are met:
  12 .\" 1. Redistributions of source code must retain the above copyright
  13 .\"    notice, this list of conditions and the following disclaimer.
  14 .\" 2. Redistributions in binary form must reproduce the above copyright
  15 .\"    notice, this list of conditions and the following disclaimer in the
  16 .\"    documentation and/or other materials provided with the distribution.
  17 .\"
  18 .\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
  19 .\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
  20 .\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
  21 .\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
  22 .\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  23 .\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  24 .\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  25 .\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  26 .\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  27 .\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  28 .\" POSSIBILITY OF SUCH DAMAGE.
  29 .\"
  30 .Dd February 21, 2007
  31 .Dt NLS 7
  32 .Os
  33 .Sh NAME
  34 .Nm NLS
  35 .Nd Native Language Support Overview
  36 .Sh DESCRIPTION
  37 Native Language Support (NLS) provides commands for a single
  38 worldwide operating system base.
  39 An internationalized system has no built-in assumptions or dependencies
  40 on language-specific or cultural-specific conventions such as:
  41 .Pp
  42 .Bl -bullet -offset indent -compact
  43 .It
  44 Character classifications
  45 .It
  46 Character comparison rules
  47 .It
  48 Character collation order
  49 .It
  50 Numeric and monetary formatting
  51 .It
  52 Date and time formatting
  53 .It
  54 Message-text language
  55 .It
  56 Character sets
  57 .El
  58 .Pp
  59 All information pertaining to cultural conventions and language is
  60 obtained at program run time.
  61 .Pp
  62 .Dq Internationalization
  63 (often abbreviated
  64 .Dq i18n )
  65 refers to the operation by which system software is developed to support
  66 multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is freed from
relying solely on English strings or other English-specific conventions.
  69 .Dq Localization
  70 (often abbreviated
  71 .Dq l10n )
  72 refers to the operations by which the user environment is customized to
  73 handle its input and output appropriate for specific language and cultural
  74 conventions.
  75 This is a specialization process, by which generic methods already
  76 implemented in an internationalized system are used in specific ways.
  77 The formal description of cultural conventions for some country, together
  78 with all associated translations targeted to the native language, is
  79 called the
  80 .Dq locale .
  81 .Pp
  82 .Nx
  83 provides extensive support to programmers and system developers to
  84 enable internationalized software to be developed.
  85 .Nx
  86 also supplies a large variety of locales for system localization.
  87 .Ss Localization of Information
  88 All locale information is accessible to programs at run time so that
  89 data is processed and displayed correctly for specific cultural
  90 conventions and language.
  91 .Pp
  92 A locale is divided into categories.
  93 A category is a group of language-specific and culture-specific conventions
  94 as outlined in the list above.
ISO C and POSIX together specify the following six standard categories
supported by
.Nx :
  97 .Pp
  98 .Bl -tag -compact -width LC_MONETARYXX
  99 .It Ev LC_COLLATE
 100 string-collation order information
 101 .It Ev LC_CTYPE
 102 character classification, case conversion, and other character attributes
 103 .It Ev LC_MESSAGES
 104 the format for affirmative and negative responses
 105 .It Ev LC_MONETARY
 106 rules and symbols for formatting monetary numeric information
 107 .It Ev LC_NUMERIC
 108 rules and symbols for formatting nonmonetary numeric information
 109 .It Ev LC_TIME
 110 rules and symbols for formatting time and date information
 111 .El
 112 .Pp
 113 Localization of the system is achieved by setting appropriate values
 114 in environment variables to identify which locale should be used.
 115 The environment variables have the same names as their respective
 116 locale categories.
 117 Additionally, the
 118 .Ev LANG ,
 119 .Ev LC_ALL ,
 120 and
 121 .Ev NLSPATH
 122 environment variables are used.
 123 The
 124 .Ev NLSPATH
 125 environment variable specifies a colon-separated list of directory names
 126 where the message catalog files of the NLS database are located.
 127 The
 128 .Ev LC_ALL
 129 and
 130 .Ev LANG
 131 environment variables also determine the current locale.
 132 .Pp
The values of these environment variables are strings of the form:
 134 .Pp
 135 .Bd -literal
 136         language[_territory][.codeset][@modifier]
 137 .Ed
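.Pp
As a rough illustration of this format, the following C sketch splits a
locale name into its fields.
The
.Fn parse_locale_name
function and
.Vt struct locale_parts
are hypothetical names introduced for this example and are not part of
any
.Nx
interface; the fixed buffer sizes are likewise an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the four fields of a locale name. */
struct locale_parts {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[32];
};

/* Copy at most dstsize-1 bytes of [src, src+len) into dst and terminate. */
static void
copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
	if (len >= dstsize)
		len = dstsize - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/*
 * Split "language[_territory][.codeset][@modifier]" into its fields.
 * Optional fields that are absent are returned as empty strings.
 */
void
parse_locale_name(const char *name, struct locale_parts *out)
{
	const char *end = name + strlen(name);
	const char *at = strchr(name, '@');
	const char *body_end = at ? at : end;
	const char *dot = memchr(name, '.', (size_t)(body_end - name));
	const char *lt_end = dot ? dot : body_end;
	const char *us = memchr(name, '_', (size_t)(lt_end - name));

	memset(out, 0, sizeof(*out));
	copy_field(out->language, sizeof(out->language), name,
	    (size_t)((us ? us : lt_end) - name));
	if (us)
		copy_field(out->territory, sizeof(out->territory), us + 1,
		    (size_t)(lt_end - (us + 1)));
	if (dot)
		copy_field(out->codeset, sizeof(out->codeset), dot + 1,
		    (size_t)(body_end - (dot + 1)));
	if (at)
		copy_field(out->modifier, sizeof(out->modifier), at + 1,
		    (size_t)(end - (at + 1)));
}
```
.Pp
For the name da_DK.ISO8859-1 the sketch yields the language da, the
territory DK, the codeset ISO8859-1, and an empty modifier.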
 138 .Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
Some common language codes are listed below; locale names use them in
lower case.
 142 .Pp
.Bl -column "PERSIAN (farsi)" "Sy Code" "OCEANIC/INDONESIAN"
.It Sy Language Name Ta Sy Code Ta Sy Language Family
.It ABKHAZIAN Ta AB Ta IBERO-CAUCASIAN
.It AFAN (OROMO) Ta OM Ta HAMITIC
.It AFAR Ta AA Ta HAMITIC
.It AFRIKAANS Ta AF Ta GERMANIC
.It ALBANIAN Ta SQ Ta INDO-EUROPEAN (OTHER)
.It AMHARIC Ta AM Ta SEMITIC
.It ARABIC Ta AR Ta SEMITIC
.It ARMENIAN Ta HY Ta INDO-EUROPEAN (OTHER)
.It ASSAMESE Ta AS Ta INDIAN
.It AYMARA Ta AY Ta AMERINDIAN
.It AZERBAIJANI Ta AZ Ta TURKIC/ALTAIC
.It BASHKIR Ta BA Ta TURKIC/ALTAIC
.It BASQUE Ta EU Ta BASQUE
.It BENGALI Ta BN Ta INDIAN
.It BHUTANI Ta DZ Ta ASIAN
.It BIHARI Ta BH Ta INDIAN
.It BISLAMA Ta BI Ta ""
.It BRETON Ta BR Ta CELTIC
.It BULGARIAN Ta BG Ta SLAVIC
.It BURMESE Ta MY Ta ASIAN
.It BYELORUSSIAN Ta BE Ta SLAVIC
.It CAMBODIAN Ta KM Ta ASIAN
.It CATALAN Ta CA Ta ROMANCE
.It CHINESE Ta ZH Ta ASIAN
.It CORSICAN Ta CO Ta ROMANCE
.It CROATIAN Ta HR Ta SLAVIC
.It CZECH Ta CS Ta SLAVIC
.It DANISH Ta DA Ta GERMANIC
.It DUTCH Ta NL Ta GERMANIC
.It ENGLISH Ta EN Ta GERMANIC
.It ESPERANTO Ta EO Ta INTERNATIONAL AUX.
.It ESTONIAN Ta ET Ta FINNO-UGRIC
.It FAROESE Ta FO Ta GERMANIC
.It FIJI Ta FJ Ta OCEANIC/INDONESIAN
.It FINNISH Ta FI Ta FINNO-UGRIC
.It FRENCH Ta FR Ta ROMANCE
.It FRISIAN Ta FY Ta GERMANIC
.It GALICIAN Ta GL Ta ROMANCE
.It GEORGIAN Ta KA Ta IBERO-CAUCASIAN
.It GERMAN Ta DE Ta GERMANIC
.It GREEK Ta EL Ta LATIN/GREEK
.It GREENLANDIC Ta KL Ta ESKIMO
.It GUARANI Ta GN Ta AMERINDIAN
.It GUJARATI Ta GU Ta INDIAN
.It HAUSA Ta HA Ta NEGRO-AFRICAN
.It HEBREW Ta HE Ta SEMITIC
.It HINDI Ta HI Ta INDIAN
.It HUNGARIAN Ta HU Ta FINNO-UGRIC
.It ICELANDIC Ta IS Ta GERMANIC
.It INDONESIAN Ta ID Ta OCEANIC/INDONESIAN
.It INTERLINGUA Ta IA Ta INTERNATIONAL AUX.
.It INTERLINGUE Ta IE Ta INTERNATIONAL AUX.
.It INUKTITUT Ta IU Ta ""
.It INUPIAK Ta IK Ta ESKIMO
.It IRISH Ta GA Ta CELTIC
.It ITALIAN Ta IT Ta ROMANCE
.It JAPANESE Ta JA Ta ASIAN
.It JAVANESE Ta JV Ta OCEANIC/INDONESIAN
.It KANNADA Ta KN Ta DRAVIDIAN
.It KASHMIRI Ta KS Ta INDIAN
.It KAZAKH Ta KK Ta TURKIC/ALTAIC
.It KINYARWANDA Ta RW Ta NEGRO-AFRICAN
.It KIRGHIZ Ta KY Ta TURKIC/ALTAIC
.It KIRUNDI Ta RN Ta NEGRO-AFRICAN
.It KOREAN Ta KO Ta ASIAN
.It KURDISH Ta KU Ta IRANIAN
.It LAOTHIAN Ta LO Ta ASIAN
.It LATIN Ta LA Ta LATIN/GREEK
.It LATVIAN Ta LV Ta BALTIC
.It LINGALA Ta LN Ta NEGRO-AFRICAN
.It LITHUANIAN Ta LT Ta BALTIC
.It MACEDONIAN Ta MK Ta SLAVIC
.It MALAGASY Ta MG Ta OCEANIC/INDONESIAN
.It MALAY Ta MS Ta OCEANIC/INDONESIAN
.It MALAYALAM Ta ML Ta DRAVIDIAN
.It MALTESE Ta MT Ta SEMITIC
.It MAORI Ta MI Ta OCEANIC/INDONESIAN
.It MARATHI Ta MR Ta INDIAN
.It MOLDAVIAN Ta MO Ta ROMANCE
.It MONGOLIAN Ta MN Ta ""
.It NAURU Ta NA Ta ""
.It NEPALI Ta NE Ta INDIAN
.It NORWEGIAN Ta NO Ta GERMANIC
.It OCCITAN Ta OC Ta ROMANCE
.It ORIYA Ta OR Ta INDIAN
.It PASHTO Ta PS Ta IRANIAN
.It PERSIAN (farsi) Ta FA Ta IRANIAN
.It POLISH Ta PL Ta SLAVIC
.It PORTUGUESE Ta PT Ta ROMANCE
.It PUNJABI Ta PA Ta INDIAN
.It QUECHUA Ta QU Ta AMERINDIAN
.It RHAETO-ROMANCE Ta RM Ta ROMANCE
.It ROMANIAN Ta RO Ta ROMANCE
.It RUSSIAN Ta RU Ta SLAVIC
.It SAMOAN Ta SM Ta OCEANIC/INDONESIAN
.It SANGHO Ta SG Ta NEGRO-AFRICAN
.It SANSKRIT Ta SA Ta INDIAN
.It SCOTS GAELIC Ta GD Ta CELTIC
.It SERBIAN Ta SR Ta SLAVIC
.It SERBO-CROATIAN Ta SH Ta SLAVIC
.It SESOTHO Ta ST Ta NEGRO-AFRICAN
.It SETSWANA Ta TN Ta NEGRO-AFRICAN
.It SHONA Ta SN Ta NEGRO-AFRICAN
.It SINDHI Ta SD Ta INDIAN
.It SINGHALESE Ta SI Ta INDIAN
.It SISWATI Ta SS Ta NEGRO-AFRICAN
.It SLOVAK Ta SK Ta SLAVIC
.It SLOVENIAN Ta SL Ta SLAVIC
.It SOMALI Ta SO Ta HAMITIC
.It SPANISH Ta ES Ta ROMANCE
.It SUNDANESE Ta SU Ta OCEANIC/INDONESIAN
.It SWAHILI Ta SW Ta NEGRO-AFRICAN
.It SWEDISH Ta SV Ta GERMANIC
.It TAGALOG Ta TL Ta OCEANIC/INDONESIAN
.It TAJIK Ta TG Ta IRANIAN
.It TAMIL Ta TA Ta DRAVIDIAN
.It TATAR Ta TT Ta TURKIC/ALTAIC
.It TELUGU Ta TE Ta DRAVIDIAN
.It THAI Ta TH Ta ASIAN
.It TIBETAN Ta BO Ta ASIAN
.It TIGRINYA Ta TI Ta SEMITIC
.It TONGA Ta TO Ta OCEANIC/INDONESIAN
.It TSONGA Ta TS Ta NEGRO-AFRICAN
.It TURKISH Ta TR Ta TURKIC/ALTAIC
.It TURKMEN Ta TK Ta TURKIC/ALTAIC
.It TWI Ta TW Ta NEGRO-AFRICAN
.It UIGUR Ta UG Ta ""
.It UKRAINIAN Ta UK Ta SLAVIC
.It URDU Ta UR Ta INDIAN
.It UZBEK Ta UZ Ta TURKIC/ALTAIC
.It VIETNAMESE Ta VI Ta ASIAN
.It VOLAPUK Ta VO Ta INTERNATIONAL AUX.
.It WELSH Ta CY Ta CELTIC
.It WOLOF Ta WO Ta NEGRO-AFRICAN
.It XHOSA Ta XH Ta NEGRO-AFRICAN
.It YIDDISH Ta YI Ta GERMANIC
.It YORUBA Ta YO Ta NEGRO-AFRICAN
.It ZHUANG Ta ZA Ta ""
.It ZULU Ta ZU Ta NEGRO-AFRICAN
.El
 285 .Pp
 286 For example, the locale for the Danish language spoken in Denmark
 287 using the ISO 8859-1 character set is da_DK.ISO8859-1.
 288 The da stands for the Danish language and the DK stands for Denmark.
 289 The short form of da_DK is sufficient to indicate this locale.
 290 .Pp
 291 The environment variable settings are queried by their priority level
 292 in the following manner:
 293 .Pp
 294 .Bl -bullet
 295 .It
 296 If the
 297 .Ev LC_ALL
 298 environment variable is set, all six categories use the locale it
 299 specifies.
 300 .It
 301 If the
 302 .Ev LC_ALL
 303 environment variable is not set, each individual category uses the
 304 locale specified by its corresponding environment variable.
 305 .It
 306 If the
 307 .Ev LC_ALL
 308 environment variable is not set, and a value for a particular
 309 .Ev LC_*
 310 environment variable is not set, the value of the
 311 .Ev LANG
 312 environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for users to override the system default
using the individual
.Ev LC_*
variables.
 319 .It
 320 If the
 321 .Ev LC_ALL
 322 environment variable is not set, a value for a particular
 323 .Ev LC_*
 324 environment variable is not set, and the value of the
 325 .Ev LANG
 326 environment variable is not set, the locale for that specific
 327 category defaults to the C locale.
 328 The C or POSIX locale assumes the ASCII character set and defines
 329 information for the six categories.
 330 .El
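.Pp
The lookup order above can be sketched in C as follows.
This is only an illustration of the precedence;
.Xr setlocale 3
performs the real resolution when it is passed an empty locale string.
The
.Fn effective_locale
function is a hypothetical helper, and treating an empty variable as
unset is an assumption based on POSIX behaviour.

```c
#include <stdlib.h>

/* Return the value of an environment variable, or NULL if unset or empty. */
static const char *
env_nonempty(const char *name)
{
	const char *v = getenv(name);

	return (v != NULL && v[0] != '\0') ? v : NULL;
}

/*
 * Resolve the locale name for one category variable (for example
 * "LC_TIME") using the precedence LC_ALL, then the category variable,
 * then LANG, and finally the "C" locale.
 * Assumption: an empty value counts as unset.
 */
const char *
effective_locale(const char *category)
{
	const char *v;

	if ((v = env_nonempty("LC_ALL")) != NULL)
		return v;
	if ((v = env_nonempty(category)) != NULL)
		return v;
	if ((v = env_nonempty("LANG")) != NULL)
		return v;
	return "C";
}
```
.Pp
With only
.Ev LANG
set, every category resolves to its value; setting
.Ev LC_TIME
as well changes the result for that category alone.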
 331 .Ss Character Sets
 332 A character is any symbol used for the organization, control, or
 333 representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
 336 It is the encoding values in a character set that provide
 337 the interface between the system and its input and output devices.
 338 .Pp
 339 The following character sets are supported in
 340 .Nx :
 341 .Bl -tag -width ISO_8859_family
 342 .It ASCII
The American Standard Code for Information Interchange (ASCII) standard
 344 specifies 128 Roman characters and control codes, encoded in a 7-bit
 345 character encoding scheme.
 346 .It ISO 8859 family
 347 Industry-standard character sets specified by the ISO/IEC 8859
 348 standard.
 349 The standard is divided into 15 numbered parts, with each
 350 part specifying broad script similarities.
 351 Examples include Western European, Central European, Arabic, Cyrillic,
 352 Hebrew, Greek, and Turkish.
 353 The character sets use an 8-bit character encoding scheme which is
 354 compatible with the ASCII character set.
 355 .It Unicode
The Unicode character set is the full set of known abstract characters of
all real-world scripts.
It can be used in environments where multiple
scripts must be processed simultaneously.
Unicode is compatible with ISO 8859-1 (Western European) and ASCII.
Many character encoding schemes are available for Unicode, including UTF-8,
UTF-16 and UTF-32.
These encoding schemes are multi-byte encodings.
The UTF-8 encoding scheme uses an 8-bit, variable-width encoding which is
compatible with ASCII.
The UTF-16 encoding scheme uses a 16-bit, variable-width encoding.
The UTF-32 encoding scheme uses a 32-bit, fixed-width encoding.
 367 .El
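.Pp
To make the variable-width property of UTF-8 concrete, the following C
sketch encodes a single Unicode code point.
The
.Fn utf8_encode
function is a hypothetical helper written for this page, not a
.Nx
library interface.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8 into buf, which must hold at
 * least 4 bytes.
 * Returns the number of bytes written, or 0 for values that cannot be
 * encoded (surrogates and anything above U+10FFFF).
 */
size_t
utf8_encode(uint32_t cp, unsigned char *buf)
{
	if (cp < 0x80) {			/* ASCII: one identical byte */
		buf[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {			/* two bytes */
		buf[0] = (unsigned char)(0xC0 | (cp >> 6));
		buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)	/* UTF-16 surrogates */
		return 0;
	if (cp < 0x10000) {			/* three bytes */
		buf[0] = (unsigned char)(0xE0 | (cp >> 12));
		buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {			/* four bytes */
		buf[0] = (unsigned char)(0xF0 | (cp >> 18));
		buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```
.Pp
Code points below 0x80 are emitted as a single byte equal to their ASCII
value, which is what makes UTF-8 compatible with ASCII.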
 368 .Ss Font Sets
 369 A font set contains the glyphs to be displayed on the screen for a
 370 corresponding character in a character set.
 371 A display must support a suitable font to display a character set.
 372 If suitable fonts are available to the X server, then X clients can
 373 include support for different character sets.
 374 .Xr xterm 1
 375 includes support for Unicode with UTF-8 encoding.
 376 .Xr xfd 1
 377 is useful for displaying all the characters in an X font.
 378 .Pp
 379 The
 380 .Nx
 381 .Xr wscons 4
 382 console provides support for loading fonts using the
 383 .Xr wsfontload 8
 384 utility.
 385 Currently, only fonts for the ISO8859-1 family of character sets are
 386 supported.
 387 .Ss Internationalization for Programmers
 388 To facilitate translations of messages into various languages and to
 389 make the translated messages available to the program based on a
 390 user's locale, it is necessary to keep messages separate from the
 391 programs and provide them in the form of message catalogs that a
 392 program can access at run time.
 393 .Pp
 394 Access to locale information is provided through the
 395 .Xr setlocale 3
 396 and
 397 .Xr nl_langinfo 3
 398 interfaces.
 399 See their respective man pages for further information.
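.Pp
A minimal sketch of these interfaces is shown below.
It adopts the locale selected by the environment and asks for the name
of its character encoding.
The
.Fn init_locale_codeset
function is a hypothetical wrapper; falling back to the C locale when
the requested locale is unavailable is one possible policy, not a
requirement.

```c
#include <langinfo.h>
#include <locale.h>

/*
 * Adopt the locale named by the environment variables and return the
 * name of its character encoding.
 * Assumption: if the requested locale is not installed, the "C" locale
 * is used instead.
 */
const char *
init_locale_codeset(void)
{
	if (setlocale(LC_ALL, "") == NULL)
		(void)setlocale(LC_ALL, "C");
	return nl_langinfo(CODESET);
}
```
.Pp
The returned string is owned by the library and may be overwritten by
later calls to
.Xr nl_langinfo 3
or
.Xr setlocale 3 ,
so a caller that needs to keep it should copy it.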
 400 .Pp
 401 Message source files containing application messages are created by
 402 the programmer and converted to message catalogs.
 403 These catalogs are used by the application to retrieve and display
 404 messages, as needed.
 405 .Pp
 406 .Nx
 407 supports two message catalog interfaces: the X/Open
 408 .Xr catgets 3
 409 interface and the Uniforum
 410 .Xr gettext 3
 411 interface.
 412 The
 413 .Xr catgets 3
 414 interface has the advantage that it belongs to a standard which is
 415 well supported.
 416 Unfortunately the interface is complicated to use and
 417 maintenance of the catalogs is difficult.
 418 The implementation also doesn't support different character sets.
 419 The
 420 .Xr gettext 3
 421 interface has not been standardized yet, however it is being supported
 422 by an increasing number of systems.
 423 It also provides many additional tools which make programming and
 424 catalog maintenance much easier.
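.Pp
As a hedged sketch of the
.Xr catgets 3
interface, the following C function retrieves one message and falls back
to a built-in string when the catalog or the message is missing.
The
.Fn fetch_message
function and any catalog name passed to it are hypothetical.

```c
#include <nl_types.h>
#include <stdio.h>

/*
 * Look up message msg_id in set set_id of the named catalog and copy it
 * into buf.
 * If the catalog cannot be opened or the message is absent, the
 * fallback text is copied instead.
 */
void
fetch_message(const char *catalog, int set_id, int msg_id,
    const char *fallback, char *buf, size_t bufsize)
{
	nl_catd catd = catopen(catalog, NL_CAT_LOCALE);

	if (catd == (nl_catd)-1) {
		/* No catalog for this locale: use the built-in text. */
		(void)snprintf(buf, bufsize, "%s", fallback);
		return;
	}
	/* Copy before closing; the result may point into the catalog. */
	(void)snprintf(buf, bufsize, "%s",
	    catgets(catd, set_id, msg_id, fallback));
	(void)catclose(catd);
}
```
.Pp
Keeping the default text in the program means it still produces
readable output when no catalog has been installed for the user's
locale.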
 425 .Ss Support for Multi-byte Encodings
 426 Some character sets with multi-byte encodings may be difficult to decode,
 427 or may contain state (i.e., adjacent characters are dependent).
 428 ISO C specifies a set of functions using 'wide characters' which can handle
 429 multi-byte encodings properly.
 430 The behaviour of these functions is affected
 431 by the
 432 .Ev LC_CTYPE
 433 category of the current locale.
 434 .Pp
 435 A wide character is specified in ISO C
 436 as being a fixed number of bits wide and is stateless.
 437 There are two types for wide characters:
 438 .Em wchar_t
 439 and
 440 .Em wint_t .
 441 .Em wchar_t
is a type which can contain one wide character and behaves as the
.Em char
type does for one character.
 444 .Em wint_t
 445 can contain one wide character or WEOF (wide EOF).
 446 .Pp
 447 There are functions that operate on
 448 .Em wchar_t ,
 449 and substitute for functions operating on 'char'.
 450 See
 451 .Xr wmemchr 3
 452 and
 453 .Xr towlower 3
 454 for details.
 455 There are some additional functions that operate on
 456 .Em wchar_t .
 457 See
 458 .Xr wctype 3
 459 and
 460 .Xr wctrans 3
 461 for details.
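.Pp
The following C sketch shows one of these substitutes in use.
It converts a wide string to upper case with
.Xr towupper 3 ,
the wide-character counterpart of
.Xr toupper 3 .
The
.Fn wcs_upcase
function is a hypothetical helper introduced for illustration.

```c
#include <wchar.h>
#include <wctype.h>

/*
 * Convert a wide string to upper case in place.
 * The mapping follows the LC_CTYPE category of the current locale.
 */
void
wcs_upcase(wchar_t *s)
{
	for (; *s != L'\0'; s++)
		*s = (wchar_t)towupper((wint_t)*s);
}
```
.Pp
Because each element is a complete character, the loop needs no
knowledge of the multi-byte encoding in use.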
 462 .Pp
 463 Wide characters should be used for all I/O processing which may rely
 464 on locale-specific strings.
 465 The two primary issues requiring special use of wide characters are:
 466 .Bl -bullet -offset indent
 467 .It
 468 All I/O is performed using multibyte characters.
 469 Input data is converted into wide characters immediately after
 470 reading and data for output is converted from wide characters to
 471 multi-byte encoding immediately before writing.
 472 Conversion is controlled by the
 473 .Xr mbstowcs 3 ,
 474 .Xr mbsrtowcs 3 ,
 475 .Xr wcstombs 3 ,
 476 .Xr wcsrtombs 3 ,
 477 .Xr mblen 3 ,
 478 .Xr mbrlen 3 ,
 479 and
.Xr mbsinit 3 .
 481 .It
 482 Wide characters are used directly for I/O, using
 483 .Xr getwchar 3 ,
 484 .Xr fgetwc 3 ,
 485 .Xr getwc 3 ,
 486 .Xr ungetwc 3 ,
 487 .Xr fgetws 3 ,
 488 .Xr putwchar 3 ,
 489 .Xr fputwc 3 ,
 490 .Xr putwc 3 ,
 491 and
 492 .Xr fputws 3 .
 493 They are also used for formatted I/O functions for wide characters
 494 such as
 495 .Xr fwscanf 3 ,
 496 .Xr wscanf 3 ,
 497 .Xr swscanf 3 ,
 498 .Xr fwprintf 3 ,
 499 .Xr wprintf 3 ,
 500 .Xr swprintf 3 ,
 501 .Xr vfwprintf 3 ,
 502 .Xr vwprintf 3 ,
 503 and
 504 .Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S
of the conventional formatted I/O functions.
 507 .El
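.Pp
The first approach, converting at the I/O boundary, can be sketched in C
as follows.
The
.Fn mb_to_wide
function is a hypothetical helper; it uses
.Xr mbstowcs 3
once to measure the string and once to convert it.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Convert a multi-byte string in the encoding of the current LC_CTYPE
 * locale into a newly allocated wide string.
 * Returns NULL on an invalid multi-byte sequence or if memory is
 * exhausted; the caller frees the result.
 */
wchar_t *
mb_to_wide(const char *mbs)
{
	size_t n = mbstowcs(NULL, mbs, 0);	/* length in wide characters */
	wchar_t *ws;

	if (n == (size_t)-1)
		return NULL;
	ws = malloc((n + 1) * sizeof(*ws));
	if (ws == NULL)
		return NULL;
	(void)mbstowcs(ws, mbs, n + 1);		/* includes the terminator */
	return ws;
}
```
.Pp
A program would normally call
.Xr setlocale 3
first so that the conversion uses the user's encoding rather than that
of the C locale.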
 508 .Sh SEE ALSO
 509 .Xr gencat 1 ,
 510 .Xr xfd 1 ,
 511 .Xr xterm 1 ,
 512 .Xr catgets 3 ,
 513 .Xr gettext 3 ,
 514 .Xr nl_langinfo 3 ,
 515 .Xr setlocale 3 ,
 516 .Xr wsfontload 8
 517 .Sh BUGS
 518 This man page is incomplete.