man-i18n.txt

AUTHOR: Alexander E. Patrakov <patrakov@ums.usu.ru>
DATE: 2007-09-30
LICENSE: Public domain
SYNOPSIS: Localized Manual Pages
DESCRIPTION:
This hint describes the steps needed to view manual pages in languages other
than English, in more detail than the LFS book does.

This hint describes outdated programs and no longer applies to LFS. LFS-6.2
uses Man-DB, precisely because (as this hint shows) Man is too difficult to
configure correctly. However, it may still be useful for people who want to
understand why LFS switched to Man-DB, or for CLFS users.

PREREQUISITES: LFS 6.0 or later, or
               any LFS with locale and terminal properly configured

TODO:
Japanese setup for Ghostscript (help needed)

HINT:

1. INTRODUCTION

Giving users the ability to read manuals in their native language is, without
doubt, an important step in making the computer more user-friendly. However,
manual pages are not always readable out of the box. Possible causes of their
unreadability and ways to eliminate them are discussed below.

The author intends this hint to be written so that native English speakers
can understand it and use this information e.g. when building a live CD. It
is, however, probable that the author (a citizen of Russia) made some
assumptions about the reader's knowledge that are true only for people whose
native language is not English. Such assumptions are bugs; please report them
and send other suggestions related to this hint to patrakov@ums.usu.ru

Disclaimer: this hint describes the current situation as it is,
not as it should be.

2. ENCODING MISMATCH: DESCRIPTION OF THE PROBLEM

The most frequently occurring internationalization-related problem is that
some character data are not encoded in the way the recipient expects. E.g.,
this happens when a string in the CP1251 encoding is sent to a terminal that
expects KOI8-R. Let's investigate that process in more detail. A misbehaving
application wants to send "Cyrillic Small Letter Em". Since this letter is
encoded with the 0xec byte in CP1251, the application sends that byte to the
terminal. However, the terminal assumes KOI8-R, and interprets the incoming
0xec byte according to that encoding, in which it means "Cyrillic Capital
Letter El", and happily prints that letter. All subsequent Cyrillic letters
are mangled in a similar way. The result is not readable by any language
specialist -- it's whfg pbeehcgrq (go figure out what this means -- a user
will instead just call the devil, a deity, the local admin or whoever is
responsible for this mess). It's much worse than seeing untranslated English
messages.
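The byte-level mismatch described above can be reproduced with iconv. The
following is a minimal sketch, assuming an iconv (such as the one from glibc)
that knows the CP1251 and KOI8-R encodings; the variable names are made up for
this illustration. The byte 0xec is written in octal (\354) because POSIX
printf has no hexadecimal escapes, and the result is converted to UTF-8 only
so that it can be displayed.

```shell
# Decode the same byte, 0xec, under two different assumptions.
as_cp1251=$(printf '\354' | iconv -f CP1251 -t UTF-8)   # what the application meant
as_koi8r=$(printf '\354' | iconv -f KOI8-R -t UTF-8)    # what the terminal shows
printf 'intended: %s, displayed: %s\n' "$as_cp1251" "$as_koi8r"
```

On a UTF-8 terminal the first value should be the small Em and the second the
capital El, although the input byte is identical in both cases.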

The rest of this hint discusses how to avoid encoding mismatches when viewing
manual pages. It is assumed that the locale and the terminal are already set up
properly, i.e. that the "ls --help" command produces readable output.

3. MESSAGES FROM MAN ITSELF

Sometimes man prints error messages, like this:

        No manual entry for this_program

If "+lang language_list" has been passed to the configure script during
the compilation of man and the language indicated by the LANG variable is in
the list, man will use a translation of the message from its message catalog.
However, man uses the old "catgets" translation mechanism instead of "gettext".
This "catgets" mechanism does not do any recoding of translated messages and
therefore works only if the translator's locale is the same as the user's
locale. This assumption breaks e.g. if a user works in a UTF-8 based locale
while the translator uses a traditional 8-bit locale. The consequence of such
an encoding mismatch is that the user cannot read the translated error
message.

Thus, if UTF-8 based locales are allowed to be used, man has to be compiled
with the "+lang none" switch passed to its configure script, which disables
translated error messages altogether. Fedora Core re-encoded the translations
in UTF-8 instead, but that means that users who revert to 8-bit locales will
not be able to read error messages from man on those systems.

The "+lang none" switch doesn't prevent the user from viewing localized manual
pages, but translated manual pages that come with man itself have to be
installed by hand in this case (see sections 6 and 8 below).

4. HOW MAN FINDS MANUAL PAGES

The process is simple.

First, man tries to figure out the wanted languages
based on the values of the LC_ALL, LC_MESSAGES, LANG and LANGUAGE variables.
For each locale found in those variables, it constructs its abbreviated forms.
E.g. if LANG=ru_RU.KOI8-R, man constructs the following strings:
"ru_RU.KOI8-R", "ru_RU" and "ru".

To set up an ordered list of language preferences, use the LANGUAGE variable,
like this: LANGUAGE="es:it" (it says that you prefer Spanish manual pages to
Italian ones).

Each of the constructed strings is appended to the value of each MANPATH
statement in /etc/man.conf and to each directory found in the MANPATH
environment variable. The result is a list like the following one:

/usr/share/man/ru_RU.KOI8-R
/usr/share/man/ru_RU
/usr/share/man/ru
/usr/local/man/ru_RU.KOI8-R
/usr/local/man/ru_RU
/usr/local/man/ru
/usr/X11R6/man/ru_RU.KOI8-R
/usr/X11R6/man/ru_RU
/usr/X11R6/man/ru

Finally, the directories listed in the MANPATH statements and in the MANPATH
environment variable are themselves appended to the list, e.g.:

/usr/share/man
/usr/local/man
/usr/X11R6/man

Man looks for the manual page in the man1--man9 and mann subdirectories of the
directories in the above list. The first directory where it is found wins.
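The construction of this search list can be sketched in shell. This is only an
illustration of the rules described above, not code taken from man: the
function names locale_variants and search_dirs are hypothetical, only a single
locale is handled, and the LANGUAGE variable and "@modifier" suffixes are
ignored.

```shell
# Print the abbreviated forms of one locale name, most specific first.
locale_variants() {
    full=$1
    no_codeset=${full%%.*}          # ru_RU.KOI8-R -> ru_RU
    lang_only=${no_codeset%%_*}     # ru_RU -> ru
    printf '%s\n' "$full"
    if [ "$no_codeset" != "$full" ]; then printf '%s\n' "$no_codeset"; fi
    if [ "$lang_only" != "$no_codeset" ]; then printf '%s\n' "$lang_only"; fi
}

# Print the directories to search: localized ones first, plain ones last.
# Usage: search_dirs LOCALE MANDIR...
search_dirs() {
    loc=$1
    shift
    for dir in "$@"; do
        for v in $(locale_variants "$loc"); do
            printf '%s/%s\n' "$dir" "$v"
        done
    done
    for dir in "$@"; do
        printf '%s\n' "$dir"
    done
}
```

For example, "search_dirs ru_RU.KOI8-R /usr/share/man /usr/local/man" prints
the localized directories of both trees, followed by the two trees themselves.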

E.g., the Italian manual page of "cp" lives in /usr/share/man/it/man1/cp.1

Thus,
1) localized manual pages have priority over English ones;
2) the language of the wanted manual pages is determined from locale variables.

5. WHEN EVERYTHING WORKS BY DEFAULT

Man does not convert character data itself; it just constructs a pipeline of
commands.

Let's start from the end of this pipeline. At the end, there is a pager,
specified by the PAGER environment variable or the PAGER statement in
/etc/man.conf, typically /bin/less -isR. This pager doesn't convert
characters before sending them to a terminal. Therefore, character data sent
to the pager must be in the encoding that the terminal expects, i.e. the
encoding specified by the current locale.

One step up the pipeline is the formatter. It is determined from the NROFF
statement in /etc/man.conf in all locales except Japanese. In Japanese locales,
JNROFF is used. By default, NROFF is /usr/bin/nroff -Tlatin1 -mandoc. The
default JNROFF line passes the -Tnippon switch to nroff, but that switch is
supported only by patched versions of groff. Therefore, Japanese manual pages
cannot be viewed by default in LFS or BLFS.

As mentioned above, man calls /usr/bin/nroff -Tlatin1 -mandoc for formatting
non-Japanese manual pages. Unmodified groff by default expects its input to be
in the Latin-1 (aka ISO-8859-1) encoding. The "latin1" groff device produces
output in the Latin-1 encoding if the input is valid groff input. So the
pipeline in the default setup works without encoding mismatch if the manual
page is in ISO-8859-1 and the terminal accepts ISO-8859-1 (therefore, the
locale must be ISO-8859-1 based).

Starting with groff-1.19, it is possible to specify a different input
encoding (ISO-8859-2 or ISO-8859-15). Details are available in the groff info
page:

info groff concept "input encoding"

As explained there, it works well only with the "utf8" output device (described
below). There is no clean way to make groff expect its input in ISO-8859-2
and produce ISO-8859-2 output (in other words, there is no "latin2" device).
That's why this facility is not used by Linux distros for Central European
countries, where ISO-8859-2 is the preferred encoding. So, from now on, we
will ignore this feature and assume that groff input must be in ISO-8859-1.

Besides "latin1", groff knows the terminal devices "ascii" and "utf8". The
"ascii" device accepts input in ISO-8859-1 and uses ASCII approximations for
output. Since ASCII text can be displayed on any terminal, the only requirement
for this device to work properly is that the manual page is in ISO-8859-1.
Accented characters lose their accents with this device.

The "utf8" device in the unmodified versions of groff accepts input in
ISO-8859-1 and produces valid UTF-8 output. Therefore it works properly in
UTF-8 locales if the manual page is ISO-8859-1 encoded. There is nothing wrong
with having text files (manual pages) on disk in an encoding different from
that of the current locale as long as there is a program (in this case, man)
that can properly display them.

Thus, if one prepares an ISO-8859-1 encoded manual page, it can be viewed
properly (modulo possibly missing accents) in any locale if the correct device
parameter is passed to groff. There is a way to automatically pass the correct
device parameter to groff based on the current locale: just don't pass the
-T... argument to /usr/bin/nroff in /etc/man.conf. Look at the /usr/bin/nroff
shell script to find out why and how this works.
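The idea behind that script can be sketched as follows. This is a simplified
approximation and not the actual contents of /usr/bin/nroff: the function name
pick_device is hypothetical, and the real script handles more cases than the
three shown here.

```shell
# Map a locale character map name to a groff terminal device.
pick_device() {
    case $1 in
        UTF-8)      echo utf8 ;;
        ISO-8859-1) echo latin1 ;;
        *)          echo ascii ;;   # safe fallback for every other encoding
    esac
}

# "locale charmap" prints the name of the encoding of the current locale.
device=$(pick_device "$(locale charmap 2>/dev/null)")
echo "nroff would be run with -T$device"
```

In a KOI8-R locale, for instance, this sketch falls back to the "ascii"
device, which matches the behaviour described in the setup below.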

So here is a working method of making sure that all installed localized
manual pages can be viewed (and printed via the "man -t the_page | lpr"
command) without any configuration beyond setting up the locale.

1) In /etc/man.conf, edit the NROFF and JNROFF lines to become:

NROFF           /usr/bin/nroff -mandoc
JNROFF          /usr/bin/nroff -mandoc

2) Make sure that all installed manual pages are in the ISO-8859-1 encoding.
This means that one has to remove all manual pages for languages that cannot
be represented using the ISO-8859-1 encoding, and make sure that they don't
reappear. In locales corresponding to such languages, English manual pages
and the "ascii" device will be used automatically in this setup. Thus, this
setup is unfriendly to users of such locales.

3) Disable creation of cat pages by removing /usr/*/man/*/cat* directories if
they exist. This is necessary so that a cat page created in an ISO-8859-1
based locale is not reused later in UTF-8 based locales, which would create
encoding mismatch problems.

The list of codes for languages that use the ISO-8859-1 encoding:

"da", Danish
"de", German
"en", English
"es", Spanish
"fi", Finnish
"fr", French
"ga", Irish
"gl", Galician
"id", Indonesian
"is", Icelandic
"it", Italian
"nl", Dutch
"no", Norwegian
"pt", Portuguese
"sv", Swedish
(the list is possibly incomplete)

As explained in (2), manual pages for other languages have to be removed in
order for this simple setup to work.
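To find the directories that step (2) is about, something like the following
sketch may help. It only lists candidates and removes nothing. The function
names are hypothetical, the language list is copied from the (possibly
incomplete) list above, and it is assumed that every subdirectory of a manual
tree other than manN and catN is named after a locale.

```shell
# Language codes taken from the list of ISO-8859-1 languages in this hint.
LATIN1_LANGS="da de en es fi fr ga gl id is it nl no pt sv"

# Succeed if the language part of a locale name is in LATIN1_LANGS.
is_latin1_lang() {
    code=${1%%[_.@]*}               # ru_RU.KOI8-R -> ru
    case " $LATIN1_LANGS " in
        *" $code "*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Print the locale subdirectories of one manual tree that would have to go.
list_non_latin1_dirs() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue
        name=$(basename "$d")
        case $name in
            man?|cat?) continue ;;  # section directories, not locales
        esac
        is_latin1_lang "$name" || printf '%s\n' "${d%/}"
    done
}
```

A call such as "list_non_latin1_dirs /usr/share/man" prints one directory per
line; review the output by hand before deleting anything.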

6. INSTALLATION OF LOCALIZED ISO-8859-1 ENCODED MANUAL PAGES

Packages with localized manual pages are usually called manpages-ll or
man-pages-ll where "ll" is a two-letter language code, and can be found with
Google. Some of them come with an English or translated README or INSTALL
file. If it doesn't exist, or if you can't read it, see the instructions below.

Also some programs (e.g. Midnight Commander) come with translated manual pages.

As explained above, care should be taken to ensure that installed manual
pages are in the ISO-8859-1 encoding. This may not be the case in the original
tarballs with the manual pages because some such tarballs are for Fedora
Core systems only (Fedora Core uses a patched version of groff that accepts
UTF-8 as the input encoding). People who know the language the manual page is
in can say whether it is in ISO-8859-1 by opening it in a text editor and
checking whether the text is readable. Those who don't know the language but
still want to install that manual page (e.g. distro-builders or live CD makers)
can look at the changelog or README file or use the following method.

First, check if the page uses only ASCII characters:

iconv -f us-ascii -t us-ascii /path/to/manual/page >/dev/null

If it doesn't, an error is printed:

        iconv: illegal input sequence at position XXX

ASCII-only manual pages are also valid ISO-8859-1 pages, and thus good.

Then, if the page is not pure ASCII, check if it is in UTF-8:

iconv -f UTF-8 -t UTF-8 /path/to/manual/page >/dev/null

If it isn't (i.e. iconv prints the same error), this means that the page is in
some 8-bit encoding. For the languages in the list at the end of the previous
section, this 8-bit encoding is ISO-8859-1 (i.e. the page is good). There may
be a few pages that are incorrectly identified as UTF-8 by this method, so look
at the majority of the pages in the package.

If a manual page is in UTF-8, it has to be converted to ISO-8859-1 before
installation:

iconv -f UTF-8 -t ISO-8859-1 /path/to/manual/page >file.tmp
mv file.tmp /path/to/manual/page

If the first command fails, try adding the "-c" switch that drops characters
that can't be converted.

After converting all manual pages in the package to ISO-8859-1, it is safe to
copy them to their final destination.
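The checks above can be wrapped into two small helper functions. This is a
sketch under the same assumptions as the commands above; the names
classify_page and page_to_latin1 are made up for this example, and "8bit" only
means "neither ASCII nor valid UTF-8", not that the page is known to be in
ISO-8859-1.

```shell
# Print "ascii", "utf8" or "8bit" for one manual page.
classify_page() {
    if iconv -f US-ASCII -t US-ASCII "$1" >/dev/null 2>&1; then
        echo ascii
    elif iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        echo utf8
    else
        echo 8bit
    fi
}

# Convert one UTF-8 page to ISO-8859-1 in place.
# If the conversion fails, the original file is left untouched.
page_to_latin1() {
    if iconv -f UTF-8 -t ISO-8859-1 "$1" >"$1.tmp"; then
        mv "$1.tmp" "$1"
    else
        rm -f "$1.tmp"
        return 1
    fi
}
```

A loop over the unpacked tarball that calls classify_page on every file gives
a quick overview of which encoding the majority of the pages use.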

7. HACKS

The simple setup explained above works, but is unfriendly to people who can't
use ISO-8859-1. Its use is recommended only when configuration steps other than
setting the locale are to be avoided at all costs (e.g. on a live CD).

The official position of the author of groff is as follows: source encodings
other than ISO-8859-1 will not be supported well by the official groff 1.x
package because groff is a text typesetting system, not just a manual page
formatter. Adding such support would mean that a new formatting algorithm is
needed, since in some languages (e.g. Japanese) spaces are not used for word
separation, and lines can be broken almost anywhere. There are also different
problems with Indic scripts. So groff 2.0 with promised Unicode support is
probably in some rather distant future.

So people who can't write manual pages for their language in the ISO-8859-1
encoding, or are worried about the loss of accents with the "ascii" device,
have to use various hacks for now.

The first hack described here is for people who use ISO-8859-15 based locales
and are unhappy with groff losing accents with the "ascii" device.

Using the "latin1" device results in encoding mismatch: a few bytes mean
different characters in ISO-8859-1 and ISO-8859-15. Fortunately, they are not
letters in ISO-8859-1, so the difference can be either ignored or taken into
account. To ignore the difference (and get e.g. Latin Capital Letter Z With
Caron instead of the Acute Accent, or the Latin Small Ligature OE instead of
Vulgar Fraction One Half), pass the "-Tlatin1" switch to nroff in
/etc/man.conf. To account for this difference and replace the affected
characters with their approximate ASCII equivalents, save the following sed
scriptlet as /etc/groff/lat1-to-lat9.sed:

s@\xa4@x@g
s@\xa6@|@g
s@\xb4@'@g
s@\xb8@,@g
s@\xbd@1/2@g
s@\xbc@1/4@g
s@\xbe@3/4@g

and use the following NROFF line:

NROFF    /usr/bin/nroff -Tlatin1 -mandoc | sed -f /etc/groff/lat1-to-lat9.sed

Of course, with this hack, manual pages will display correctly only in locales
based on the ISO-8859-1 and ISO-8859-15 character sets, as opposed to all
locales in the method without hacks.

FIXME: it seems that it is possible to do the same by modifying files in
/usr/share/groff/1.19.1/font/devlatin1

The second hack is for people who speak languages for which ISO-8859-1 does
not contain all needed characters (e.g. Russian).

Manual pages written in such languages are in language-specific 8-bit
encodings. There are also Fedora-specific packages which use UTF-8 for the
encoding of manual pages. Both cases clearly don't constitute valid groff
input. Around the year 2000, such manual pages in 8-bit encodings were
processed by groff using the "latin1" device. This worked (and still works)
because the two instances of encoding mismatch happening in this case almost
cancel each other. Details:

Assume that the manual page is encoded in the Russian KOI8-R encoding, and that
the locale is ru_RU.KOI8-R. The manual page author writes the Cyrillic
Capital Letter A using a text editor. Since the editor saves the file in the
KOI8-R encoding, the 0xe1 byte gets written to the file for that letter.
When this file is passed to groff, it reads that byte and (wrongly) interprets
it as Latin Small Letter A With Acute (this letter is represented with the 0xe1
byte in ISO-8859-1). Then it prints this letter to standard output, assuming
(wrongly) that its output is ISO-8859-1. That results in the 0xe1 byte. The
pager copies this byte to the terminal and (since the terminal accepts KOI8-R)
the Cyrillic Capital Letter A appears, as the author of the manual page
intended.
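The two readings of that byte can be made visible with iconv. This is only an
illustration of the reasoning above, assuming an iconv that knows KOI8-R; the
variable names are hypothetical, and the conversion to UTF-8 is done only so
that both readings can be shown on one terminal.

```shell
byte='\341'   # 0xe1 in octal: the byte stored in the KOI8-R manual page

# How the page author and a KOI8-R terminal read the byte:
meant=$(printf "$byte" | iconv -f KOI8-R -t UTF-8)
# How groff, which assumes ISO-8859-1 input, reads the same byte:
groff_view=$(printf "$byte" | iconv -f ISO-8859-1 -t UTF-8)
printf 'author and terminal: %s, groff: %s\n' "$meant" "$groff_view"
```

As long as groff writes the byte back unchanged, its wrong reading does no
harm. If groff is asked for UTF-8 output instead, it emits the second form,
and the Cyrillic letter is lost.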

As you can see, this hack depends upon the following facts:

1) The source encoding of the manual page and the locale encoding are the same.
2) The formatting rules for this language and encoding are the same as for
ISO-8859-1 based languages, i.e.: one byte represents one character and it
occupies one cell; words are separated by spaces.

It works (possibly with ignorable problems as in the ISO-8859-15 case above)
in all 8-bit locales.

This setup also has the following drawbacks:

1) One cannot print manual pages with the "man -t manual_page | lpr" command --
the page will be full of ISO-8859-1 characters and thus unreadable.
2) Bullets are wrong since the 0xb7 byte that denotes a bullet in ISO-8859-1
means some other character in other encodings. Other mismatches are usually OK
because authors know about this effect and try to avoid ISO-8859-1 specific
characters in groff output.
3) This method does not work for Japanese because the formatting rules are
different.
4) This method breaks when one tries to switch from an 8-bit locale to a
UTF-8 based one, because the source encoding of the manual page and the locale
encoding are no longer the same.

Problem 1 is unsolvable until groff-2.0 comes out. Problems 2 and 3 don't
exist in patched versions of groff-1.18.1.1 (see section 9). Problems 2 and 4
can also be solved by modifying the NROFF line in /etc/man.conf.

Here is a sed script that replaces ISO-8859-1 specific non-letter characters
with their approximate ASCII equivalents, thus partially solving Problem 2.

s@\xa0@ @g
s@\xa1@i@g
s@\xa2@c@g
s@\xa3@L@g
s@\xa4@x@g
s@\xa5@Y@g
s@\xa6@|@g
s@\xa7@S@g
s@\xa8@"@g
s@\xa9@(C)@g
s@\xaa@a@g
s@\xab@<<@g
s@\xac@~@g
s@\xad@-@g
s@\xae@(R)@g
s@\xaf@-@g
s@\xb0@o@g
s@\xb1@+-@g
s@\xb2@2@g
s@\xb3@3@g
s@\xb4@'@g
s@\xb5@mu@g
s@\xb6@9|@g
s@\xb7@o@g
s@\xb8@,@g
s@\xb9@1@g
s@\xba@o@g
s@\xbb@>>@g
s@\xbc@1/4@g
s@\xbd@1/2@g
s@\xbe@3/4@g
s@\xbf@?@g

Save it as /etc/groff/remove-iso-chars.sed and edit your /etc/man.conf so that
it contains the line:

NROFF  /usr/bin/nroff -Tlatin1 -mandoc | sed -f /etc/groff/remove-iso-chars.sed

Warning: this script assumes that no letters and other useful characters are in
the 0xa0--0xbf range in the 8-bit encoding. It replaces not only characters
(e.g. bullets) "generated" by groff (that should be replaced), but also
characters passed through by groff from the original manual page (that should
not be replaced). Thus, it is potentially harmful, i.e. just ignoring Problem 2
may be better in some cases.
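A concrete case of this harm can be shown with KOI8-R, where the 0xa3 byte is
the Cyrillic Small Letter Io. The following sketch applies one rule of the
script to that byte. It assumes GNU sed (for the \xHH escape); LC_ALL=C is set
here so that sed treats its input as raw bytes regardless of the current
locale.

```shell
# 0xa3 (octal 243) is a letter in KOI8-R but the pound sign in ISO-8859-1.
filtered=$(printf 'a\243b' | LC_ALL=C sed 's@\xa3@L@g')
printf '%s\n' "$filtered"
```

The letter between "a" and "b" comes out as "L", so a Russian manual page that
uses this letter would be damaged by the filter.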

A solution based on Debian-patched groff-1.18.1.1 is preferred, because that
version of groff does not "generate" ISO-8859-1 specific characters when the
"ascii8" device is used. The sed filter is not needed then.

To solve Problem 4, one has to convert groff output to the locale encoding.
This is achieved by use of this one long NROFF line (wrapped here for
readability):

NROFF  /usr/bin/nroff -Tlatin1 -mandoc | sed -f
    /etc/groff/remove-iso-chars.sed | iconv -c -f 8_BIT_ENCODING

(use of the Debian-patched groff-1.18.1.1 and the "ascii8" device is preferred
over "latin1" + optional sed).

Replace 8_BIT_ENCODING above with the name of the encoding in which manual
pages in your language are stored on disk. This line works for one (your)
language only, but for both 8-bit and UTF-8 locales (in your 8-bit locale the
iconv conversion is a no-op). Alternatively, you can put the iconv invocation
in the PAGER statement, as described in the UTF-8 hint as of 2004-02-25.

Although the /usr/bin/nroff script doesn't support this hack, there is a
program that does: man-db, to be used instead of man. This is the default
manual page viewer on Debian systems. It can be downloaded from:

http://ftp.debian.org/debian/pool/main/m/man-db/

For the hack to work, you need the man-db_2.4.2.orig.tar.gz tarball, the
latest man-db_2.4.2-XX.diff.gz patch and Debian-patched groff-1.18.1.1.

8. INSTALLATION OF NON-ISO-8859-1 MANUAL PAGES

The instructions are mostly the same as in Section 6. One has to ensure that
manual pages are in the 8-bit encoding proper for their language (i.e. not
in UTF-8) and copy them to their destination directory. The method for
identifying UTF-8 encoded pages described there still works. The difference
is the action to be taken when a tarball of UTF-8 manual pages is found.
Instead of conversion to ISO-8859-1, one should convert to the encoding
specified in the table below.

"cs" (Czech): ISO-8859-2
"hr" (Croatian): ISO-8859-2
"hu" (Hungarian): ISO-8859-2
"ja" (Japanese): EUC-JP
"ko" (Korean): EUC-KR
"pl" (Polish): ISO-8859-2
"ru" (Russian): KOI8-R
"sk" (Slovak): ISO-8859-2
"tr" (Turkish): ISO-8859-9
(table taken from the source of the man-db program)

For Japanese and Korean, Debian-patched groff-1.18.1.1 is needed because of the
"nippon" output device.

9. AVAILABLE GROFF PATCHES

As explained above, unmodified groff supports only the ISO-8859-1 source
encoding well. Several patches that work around this limitation exist in Linux
distributions.

The first one is the Debian patch, available (for groff-1.18.1.1) from:

http://ftp.debian.org/debian/pool/main/g/groff/

The exact URL is not given because the patch version (groff_1.18.1.1-7.diff.gz
at the time of this writing) changes frequently and old versions are deleted
(but can still be downloaded from http://snapshot.debian.net/ ).

To compile Debian-patched groff-1.18.1.1:

zcat ../groff_1.18.1.1-7.diff.gz | patch -Np1
./configure --prefix=/usr --enable-multibyte
make -k
make -k install

The "-k" switch is needed because, if you have enough programs installed in
order to build the PostScript documentation, the build will fail on some
versions of glibc because of double-free detection. This switch allows the
compilation to proceed even though the documentation failed to build.

This patch adds the "ascii8" and (if the "--enable-multibyte" switch has been
passed to the "configure" script) "nippon" devices.

The "ascii8" device passes 8-bit characters through without modifications
(as "latin1" does) and never produces non-ASCII characters (e.g. for bullets)
by itself. Thus, it is usable by itself if and only if the source encoding of
the manual page and the encoding expected by the terminal are the same 8-bit
encoding. Essentially, it is a better version of the second hack described
in section 7.

The "nippon" device is Japanese-specific. When the current LC_CTYPE locale is
a Japanese one (i.e. begins with "ja"), this device accepts input in the
locale encoding, and produces output in the same encoding as the input.
Japanese formatting rules are respected.

The "utf8" device is also modified: in Japanese locales, it accepts input
in the locale encoding and produces UTF-8 output.

For groff-1.19.1, a similar patch is available from:

http://developer.momonga-linux.org/viewcvs/*checkout*/trunk/pkgs/groff/groff-1.19.1-japanese.patch

To compile groff with this patch:

patch -Np1 -i ../groff-1.19.1-japanese.patch
autoheader
autoconf
./configure --prefix=/usr --enable-japanese
make
make install
rm -rf /usr/share/groff/1.19.1/font/devascii8

The /usr/share/groff/1.19.1/font/devascii8 directory is removed after
installation in order to disable the "ascii8" device, which doesn't work
correctly in this version of groff.

Note that for Japanese to show up properly, the PAGER line in /etc/man.conf
should be changed to "less -isr", and for printing Japanese manuals,
standard Japanese fonts must be available. With ghostscript, it seems that
WadaLab fonts may be used. They are available from:

ftp://mirror.cs.wisc.edu/pub/mirrors/ghost/3rdparty/fonts/kanji/Font

Unfortunately, I don't know the exact installation instructions. If you know
them, please mail them to patrakov@ums.usu.ru

RedHat also offers a patched groff-1.18.1.1, available in source form as
an SRPM only. Get it from:

http://download.fedora.redhat.com/pub/fedora/linux/core/development/SRPMS/

The exact URL with a version number is not given for the same reasons as above.

Their patch is based on the Debian one. The difference is that it always
(as opposed to in Japanese locales only) accepts input in the locale
encoding (i.e. UTF-8, the only supported encoding in Fedora Core) and that
the only device that doesn't spit out tons of "can't find character" errors is
"utf8", which produces UTF-8 encoded output. Since this device accepts UTF-8
input in this version of groff, all manual pages in Fedora Core are in UTF-8.

I recommend avoiding this version of groff because of its bugs. It randomly
inserts spaces in the middle of Russian words or glues them together, and
breaks lines in unpredictable (and usually wrong) places. Those who want to
build it anyway should read the instructions in the groff.spec file inside the
SRPM. In order to get a successful build, one should either downgrade texinfo
from version 4.7 to 4.6 or pass the "-k" flag to the "make" and "make install"
commands to continue the compilation even though the texinfo documentation
fails to build.

10. CAVEAT

Translated manual pages usually lag behind their English equivalents, so be
careful while reading them.

CHANGELOG:

[2004-05-09]
    * Initial hint.
[2007-09-30]
    * Updated the description.