% \iffalse meta-comment
%
% Copyright 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
% The LaTeX3 Project and any individual authors listed elsewhere
%
% This file is part of the LaTeX base system.
% -------------------------------------------
%
% It may be distributed and/or modified under the
% conditions of the LaTeX Project Public License, either version 1.3c
% of this license or (at your option) any later version.
% The latest version of this license is in
%    http://www.latex-project.org/lppl.txt
% and version 1.3c or later is part of all distributions of LaTeX
% version 2005/12/01 or later.
%
% This file has the LPPL maintenance status "maintained".
%
% The list of all files belonging to the LaTeX base distribution is
% given in the file `manifest.txt'. See also `legal.txt' for additional
% information.
%
% The list of derived (unpacked) files belonging to the distribution
% and covered by LPPL is defined by the unpacking scripts (with
% extension .ins) which are part of the distribution.
%
\NeedsTeXFormat{LaTeX2e}[1995/12/01]

\documentclass{ltxguide}[1999/02/28]

\title{Cyrillic languages support in \LaTeX}

\author{\copyright~Copyright 1998--1999,\\ Vladimir Volovich,
  Werner Lemberg and \LaTeX3 Project Team.\\ All rights reserved.}

This document contains basic information on the Cyrillic setup for
\LaTeX{}: how to get the fonts, how to set them up, how to use
the interface, its interaction with \babel{}, etc. This is only a first
draft of the document and it will probably be modified in future; so
please send in comments on it via the \texttt{latexbug} system.

\section{Introduction}

Most Latin-based European languages were supported in \LaTeX{} by
introducing the~|T1|~font encoding and by using the \textsf{fontenc}
and \textsf{inputenc} packages; these use only standard \TeX{} means
to support any \mbox{8-bit} input encoding and this one standard font
encoding. The restriction to a single font encoding guarantees that
multiple languages can happily coexist in one document (\eg
hyphenation will be correct for all languages).

Starting with the December~1998 Release, \LaTeX{} finally supports
Cyrillic languages. This support is based on the new standard
Cyrillic \TeX{} font encodings---|T2A|, |T2B|, |T2C|, and~|X2|. The
first three of these satisfy some basic requirements for
\LaTeX{}~|T*|~encodings, and thus can be used in multi-lingual documents
with other languages based on standard font encodings.

The reason why we need four different Cyrillic font encodings is that
these font encodings support \emph{all} the Cyrillic languages that
have been used during the twentieth century (see
Section~\ref{fontencs})! The number of Cyrillic glyphs is large, so
they cannot all be represented within the 128~character slots
available in a single encoding; the other (lower)
128~slots are reserved for Latin letters and other invariant symbols
that are needed for the encoding to be a conformant
\LaTeX{}~\texttt{T}~encoding.

There are some glyphs in the |T2*|~encodings which do not yet have
associated characters in \emph{Unicode}, the world-wide character
standard. Also, one more font encoding, |T2D|,~is planned for a
forthcoming release of \LaTeX{}. A lot of Cyrillic input encodings
are already supported (see Section~\ref{inputencs}), and additional
encodings could be added easily.

\subsection{Acknowledgments}

The work on |T2*|~encodings was carried out by the T2~Team, led by
Alexander Berdnikov (other members are Mikhail Kolodin and Andrew
Janishewskii). The LH~fonts were produced by Olga Lapko (with
A.~Khodulev). The \textsf{T2} bundle and \textsf{ruhyphen} package
were written by Werner Lemberg and Vladimir Volovich (except that the
concrete hyphenation patterns which are part of \textsf{ruhyphen} came
from individual authors). The support for the Ukrainian language was
prepared by Andrij Shvaika.

\section{Installation}

The \textsf{fontenc} and \textsf{inputenc} packages are installed
automatically in every base \LaTeX{} distribution.

All the necessary extra files to use with these packages for Cyrillic
are in the \textsf{cyrillic} bundle, which at present contains the
following: four font encoding definition files (|t2aenc.def|,
|t2benc.def|, |t2cenc.def|, |x2enc.def|); several input encoding
definition files (all the other |*.def| files); and font definition
files.
The installation of these is described here.

The default font families in \LaTeX{} are the Computer Modern
families, namely the CM~fonts (|OT1|~encoded) and the EC~fonts
(|T1|~encoded). The LH~fonts, which are now available, provide
Computer Modern fonts for all Cyrillic font encodings. They are
designed to be compatible with the EC~fonts, and they provide the same
font shapes and sizes; they are available at |CTAN:fonts/cyrillic/lh|
(the latest version is 3.20). The installation instructions for the
fonts are in the file |INSTALL| in the font distribution.

Other fonts, including Type~1 fonts, can also be used, provided that
their encoding (for \TeX{}) is \mbox{|T2|-compatible}. Some
ready-to-use packages supporting such fonts are also available, \eg at
\URL{ftp://ftp.vsu.ru/pub/tex} (they should soon be on \ctan). Currently,
you will find two packages there: \textsf{PsCyr}, which contains some
freely distributable Cyrillic Type~1 fonts with support for \LaTeX{};
and \textsf{c1fonts}, which contains virtual fonts similar to the
\textsf{AE}~fonts package using the BlueSky and BaKoMa fonts
available from \ctan{} (see the |README| file in that package for
detailed information). Further font packages are expected soon.

\subsection{Hyphenation patterns}

You can find a collection of hyphenation patterns for the Russian
language in the \textsf{ruhyphen} package at
|CTAN:language/hyphenation/ruhyphen|. These patterns support the
|T2*|~encodings, as well as other popular font encodings used for
Russian typesetting (including the Omega internal encoding).
Patterns for other Cyrillic languages should be adapted to work with
the |T2*|~encodings.

\subsection{\babel{} support for Russian and Ukrainian}

Version~3.6k of \babel{} includes support for the |T2*|~encodings and
for typesetting both Russian and Ukrainian texts using the Cyrillic
letters. The temporary font encoding |LWN|, which was used in earlier
releases of \babel{}, will be withdrawn in the near future and replaced
by the |OT2| encoding.

\subsection{Getting pre-built packages}

Many of the major \TeX{} distributions, such as te\TeX{}, fp\TeX{} and
\TeX{}live, contain (or soon will) everything that is needed,
including the LH~fonts, \textsf{ruhyphen} and the latest version of
\babel{}. We hope that all \TeX{} distributions will soon include all
of these, so the chances are that you will not need to install
them yourself (but it is not difficult).

If you are using em\TeX, Mik\TeX, or fp\TeX, you
can download the \textsf{ruemtex} package from
\URL{ftp://ftp.vsu.ru/pub/tex}.

Support for Cyrillic is based on these standard \LaTeX{} mechanisms:
the \textsf{fontenc} and \textsf{inputenc} packages (and on \babel{}).
Thus the basic principles for its use are similar to those for other
European languages: you simply add, to your document preamble, lines
like the following:
\begin{verbatim}
  \usepackage[T2A]{fontenc}
  \usepackage[koi8-r]{inputenc}
\end{verbatim}
Here you can put any desired input encoding instead of
\mbox{\texttt{koi8-r}}: for example, it would be \texttt{cp866} if you are
using an MS-DOS text editor with this Cyrillic code page to prepare your
documents, or \texttt{cp1251} if you are an MS~Windows user with Cyrillic
support. A full list of the available Cyrillic encodings can be found in
Section~\ref{inputencs} and in the file |cyinpenc.dtx|.
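
As an illustration of how these lines fit together, here is a minimal
sketch of a complete document. It assumes that the source file is
saved in the |cp1251| code page and that the LH~fonts are installed.
The |\CYR...| and |\cyr...| commands are the encoding-independent names
of Cyrillic letters; they are used here only so that the example does
not depend on how this guide itself is stored.
\begin{verbatim}
  \documentclass{article}
  \usepackage[T2A]{fontenc}      % Cyrillic font encoding
  \usepackage[cp1251]{inputenc}  % must match the encoding of this file
  \begin{document}
  % Cyrillic text typed directly in cp1251 can be used here;
  % the same letters are also available by name:
  \CYRP\cyrr\cyri\cyrv\cyre\cyrt!
  \end{document}
\end{verbatim}
In practice you would type the Cyrillic text directly and let
\textsf{inputenc} do the translation.
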
Documents are, naturally, not restricted to a single font encoding;
this is essential for multi-lingual journals or documents. Such
changes can be made by using the |\fontencoding| command as part of a
font-change. However, it is best to access these font encodings via a
higher-level interface.
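
A low-level change of this kind might be sketched as follows,
assuming that both |T1| and |T2A| are loaded through \textsf{fontenc}
(the last option listed becomes the document default) and that the
LH~fonts are installed.
\begin{verbatim}
  \documentclass{article}
  \usepackage[T2A,T1]{fontenc}  % T1, listed last, is the default
  \begin{document}
  Latin text, then
  {\fontencoding{T2A}\selectfont
   \CYRR\cyru\cyrs\cyrs\cyrk\cyri\cyrishrt}
  and Latin text again.
  \end{document}
\end{verbatim}
The braces keep the encoding change local, and |\selectfont| is needed
for |\fontencoding| to take effect.
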
Since such changes are often closely related to other
language-dependent settings, it is often sensible to use the \babel{}
system, which provides further useful `localisation' and standardised
multi-lingual interfaces (for further details, see
Section~\ref{bblrus}). Then you can use lines like the following in
your document preamble:
\begin{verbatim}
  \usepackage[koi8-r]{inputenc}
  \usepackage[russian]{babel}
\end{verbatim}
This will automatically choose the default font encoding for Russian,
which is |T2A|, if available. Documentation of the complete set of
font-encoding selection rules can be found in |cyrillic.dtx|.

These \LaTeX{} interfaces are very convenient because they make your
documents completely portable, being based solely on standard \TeX{}
features. This means that your documents can be processed on any
\TeX{} system without any need for re-encoding to the `native'
encoding used on each platform; this is because the encoding of the
document is specified in the document itself.

Moreover, if necessary, more than one input encoding can be used
within a document; this could be useful if, for example, you need to
combine articles prepared by authors on different machines. Each part
of the document is then identified by an |\inputencoding| command,
which can therefore only be used between paragraphs.
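
A combined document of this kind could be sketched as follows. The
file names |article1| and |article2| are hypothetical; the sketch
assumes that the first file is stored in \mbox{\texttt{koi8-r}} and
the second in \texttt{cp1251}.
\begin{verbatim}
  \documentclass{article}
  \usepackage[T2A]{fontenc}
  \usepackage[koi8-r]{inputenc}
  \begin{document}
  \input{article1}        % hypothetical file saved in koi8-r

  \inputencoding{cp1251}  % issued between paragraphs
  \input{article2}        % hypothetical file saved in cp1251
  \end{document}
\end{verbatim}
Each |\inputencoding| command stays in force until the next one, so
every included part should be preceded by the command matching its
own encoding.
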
Please note that you must always use the two standard \LaTeX{}
commands, |\MakeUppercase| and |\MakeLowercase|, to produce uppercase
or lowercase text in your documents. This is because |\uppercase| and
|\lowercase| will not work at all for Cyrillic (note that these latter
two commands are not, and never have been, available for use directly
in \LaTeX{} documents).
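
For instance, the following sketch changes the case of a short
Cyrillic word. It assumes the |T2A| encoding with the LH~fonts
installed, and writes the letters with their encoding-independent
names so that no input encoding is involved.
\begin{verbatim}
  \documentclass{article}
  \usepackage[T2A]{fontenc}
  \begin{document}
  \MakeUppercase{\cyrm\cyri\cyrr}  % small letters set as capitals
  \MakeLowercase{\CYRM\CYRI\CYRR}  % capitals set as small letters
  \end{document}
\end{verbatim}
The same commands apply to Cyrillic text typed directly through
\textsf{inputenc}.
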
\section{Font encodings for Cyrillic languages}

The Cyrillic font encodings support the following languages. Note
that some languages can be properly typeset with more than one
encoding.

\begin{description}
\item[T2A:]
  Abaza, Avar, Agul, Adyghei, Azerbaijani, Altai, Balkar, Bashkir,
  Bulgarian, Buryat, Byelorussian, Gagauz, Dargin, Dungan, Ingush,
  Kabardino-Cherkess, Kazakh, Kalmyk, Karakalpak, Karachaevskii,
  Karelian, Kirghiz, Komi-Zyrian, Komi-Permyak, Kumyk, Lak, Lezghin,
  Macedonian, Mari-Mountain, Mari-Valley, Moldavian, Mongolian,
  Mordvin-Moksha, Mordvin-Erzya, Nogai, Oroch, Osetin, Russian, Rutul,
  Serbian, Tabasaran, Tadzhik, Tatar, Tati, Teleut, Tofalar, Tuva,
  Turkmen, Udmurt, Uzbek, Ukrainian, Hanty-Obskii, Hanty-Surgut,
  Gipsi, Chechen, Chuvash, Crimean Tatar.

\item[T2B:]
  Abaza, Avar, Agul, Adyghei, Aleut, Altai, Balkar, Byelorussian,
  Bulgarian, Buryat, Gagauz, Dargin, Dolgan, Dungan, Ingush, Itelmen,
  Kabardino-Cherkess, Kalmyk, Karakalpak, Karachaevskii, Karelian,
  Ketskii, Kirghiz, Komi-Zyrian, Komi-Permyak, Koryak, Kumyk, Kurdian,
  Lak, Lezghin, Mansi, Mari-Valley, Moldavian, Mongolian,
  Mordvin-Moksha, Mordvin-Erzya, Nanai, Nganasan, Negidal, Nenets,
  Nivh, Nogai, Oroch, Russian, Rutul, Selkup, Tabasaran, Tadzhik,
  Tatar, Tati, Teleut, Tofalar, Tuva, Turkmen, Udyghei, Uigur, Ulch,
  Khakass, Hanty-Vahovskii, Hanty-Kazymskii, Hanty-Obskii,
  Hanty-Surgut, Hanty-Shurysharskii, Gipsi, Chechen, Chukcha, Shor,
  Evenk, Even, Enets, Eskimo, Yukagir, Crimean Tatar, Yakut.

\item[T2C:]
  Abkhazian, Bulgarian, Gagauz, Karelian, Komi-Zyrian, Komi-Permyak,
  Kumyk, Mansi, Moldavian, Mordvin-Moksha, Mordvin-Erzya, Nanai,
  Orok (Uilta), Negidal, Nogai, Oroch, Russian, Saam, Old-Bulgarian,
  Old-Russian, Tati, Teleut, Hanty-Obskii, Hanty-Surgut, Evenk,
\end{description}

The |X2|~encoding was designed to support all the above languages.
Its name does not start with |T| because, for example, it contains no
Latin letters (it is purely a Cyrillic glyph container); it therefore
cannot be used in mixed-script documents along with the other |T*|
encodings. Please consult Section~6.4 \textit{Naming conventions} of
the file |fntguide.tex| in the base \LaTeX{} distribution for details
of the differences between \LaTeX{} font encodings.

There are two other \LaTeX{} Cyrillic font encodings, |OT2| and |LCY|,
that are not included in the base \LaTeX{} distribution. The first is
a \mbox{7-bit} encoding (hence the |O|) developed by the AMS; it is
useful for typesetting relatively small fragments of text in Cyrillic,
using a Latin transliteration scheme. The other, |LCY|, is an
\mbox{8-bit} Cyrillic encoding which is not compatible with the
requirements for \LaTeX{} |T*|~encodings (hence the |L|); thus it is not
suitable for typesetting multi-lingual documents, but it can be used in
Plain \TeX{}-based macro packages because it is an extension of |OT1|.
These two encodings are supported by \babel{} and by \textsf{ot2cyr}.

\section{Input encodings}

Several Cyrillic code-pages are widely used. Currently, \LaTeX{}
contains support for 20~Cyrillic input encodings (some of which are
variants of each other).

\begin{itemize}
\item |cp855| --- the standard \mbox{MS-DOS} Cyrillic code-page.

\item |cp866| --- the standard \mbox{MS-DOS} Russian code-page.
  Several code-pages very similar to this are also supported
  (the differences are all in the range 242--254).
  \begin{itemize}
  \item |cp866av| -- the `Cyrillic Alternative' code-page (an
    alternative variant of cp866);
  \item |cp866mav| -- the `Modified Alternative Variant';
  \item |cp866nav| -- the `New Alternative Variant';
  \item |cp866tat| -- an experimental Tatar code-page.
  \end{itemize}

\item |cp1251| --- the standard MS Windows Cyrillic code-page.

\item \mbox{\texttt{koi8-r}} --- a standard Cyrillic code-page, widely
  used in UNIX-like systems for Russian language support, that is
  specified in RFC~1489. The situation with \mbox{\texttt{koi8-r}} is
  somewhat similar to that for |cp866|: there are several similar
  code-pages which coincide for all Russian letters but add some other
  Cyrillic letters. The following are supported:
  \begin{itemize}
  \item \mbox{\texttt{koi8-u}} -- for Ukrainian;
  \item \mbox{\texttt{koi8-ru}} -- this is described in a draft RFC
    document specifying a widely used character set for mail and news
    exchange in the Ukrainian internet community, as well as for
    presenting WWW information resources in the Ukrainian language;
  \item |isoir111| -- the \mbox{ISO-IR-111 ECMA} Cyrillic Code Page.
  \end{itemize}

\item |iso88595| --- the \mbox{ISO 8859-5} Cyrillic code-page.

\item |maccyr| --- the Apple Macintosh Cyrillic code-page (also known
  as Microsoft cp10007) and |macukr|, the Apple Macintosh Ukrainian
  code-page, very similar to the Cyrillic code-page.

\item The Mongolian code-pages: |ctt|, |dbk|, |mnk|, |mos|, |ncc|, |mls|.
  These code-pages were taken from Oliver Corff's `Mon\TeX' package
  (available at |CTAN:language/mongolian/montex|). Since the |T2*|
  encodings support the Mongolian Cyrillic script, it is convenient to
  have support for Mongolian input encodings as well. Pointers to
  documentation for these code-pages will be much appreciated.
\end{itemize}

\section{Reporting bugs}

If you find a bug and want to report it, please follow the
guidelines given in the file |bugs.txt| in the base \LaTeX{}
distribution. Note that there is a category specifically for
reporting any bugs that occur only when using Cyrillic fonts.

\section{Miscellanea in the \textsf{T2} bundle}

The \textsf{T2}~bundle at |CTAN:macros/latex/contrib/supported/t2|
contains some other useful files, including support for Plain
\TeX{}-based macro packages, support for Bib\TeX{} and MakeIndex (see
also the \textsf{xindy} program and package---highly recommended for
making indices with Cyrillic), support for the \textsf{fontinst}
package, mapping tables relating these Cyrillic font encodings (and
input encodings) to the Unicode character names and slots (these are
in the subdirectory |enc-maps|), and more!

To produce documented source listings of the \textsf{T2}~package, run
\LaTeX{} on the |*.dtx| and |*.fdd| files therein.

When typesetting Cyrillic texts, there is a tradition of using
Cyrillic letters (in some situations) within math formul\ae, in
exactly the same way as most of the world uses Latin letters.
By default this does not work, because symbols declared with
|\DeclareTextSymbol| may not be used in math.

If you need to typeset `transparently', within math, glyphs declared in
font encoding definition files, then you could try using the
experimental \textsf{mathtext} package, which is also in the
\textsf{T2}~bundle. Note that this package uses up at least one
additional math alphabet per font encoding. For this and other
reasons, the \LaTeX3 Project Team considers that this experimental
extension to \LaTeX{}'s glyph-handling mechanisms should be used with
caution; but please try it out and send us your opinions and ideas.
Note that it is not included in the core of \LaTeX{} because both the
coding and the interfaces are likely to change at some point in the
future.
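
The following is only a sketch of how \textsf{mathtext} might be
tried out. It assumes that the package is installed from the
\textsf{T2}~bundle and that it has to be loaded before
\textsf{fontenc}, so that it can see the font encoding definitions as
they are read; please check the package's own documentation, since the
interface is experimental.
\begin{verbatim}
  \documentclass{article}
  \usepackage{mathtext}      % assumption: loaded before fontenc
  \usepackage[T2A]{fontenc}
  \begin{document}
  % Cyrillic letters used as math identifiers:
  $ \cyrv = \cyrs / \cyrt $
  \end{document}
\end{verbatim}
The letters are written here with their encoding-independent names;
with \textsf{inputenc} loaded they could be typed directly instead.
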
Finally, here are some pointers to further information:

\URL{http://www.cemi.rssi.ru/cyrtug}\\
\URL{http://xtalk.price.ru/tex}