1 % \iffalse meta-comment
4 % The LaTeX3 Project and any individual authors listed elsewhere
7 % This file is part of the LaTeX base system.
8 % -------------------------------------------
10 % It may be distributed and/or modified under the
11 % conditions of the LaTeX Project Public License, either version 1.3c
12 % of this license or (at your option) any later version.
13 % The latest version of this license is in
14 % http://www.latex-project.org/lppl.txt
15 % and version 1.3c or later is part of all distributions of LaTeX
16 % version 2005/12/01 or later.
18 % This file has the LPPL maintenance status "maintained".
20 % The list of all files belonging to the LaTeX base distribution is
21 % given in the file `manifest.txt'. See also `legal.txt' for additional
24 % The list of derived (unpacked) files belonging to the distribution
25 % and covered by LPPL is defined by the unpacking scripts (with
26 % extension .ins) which are part of the distribution.
32 \documentclass{ltxdoc}
33 \GetFileInfo{utf8.def}
34 \title{Providing some UTF-8 support via \texttt{inputenc}}
35 \date{\fileversion\space\filedate{} printed \today}
37 Frank Mittelbach \and Chris Rowley\thanks{Borrowing heavily from
38 code by David Carlisle and tables by Sebastian Rahtz; some table
39 and code cleanup by Javier Bezos}}
40 \usepackage[utf8]{inputenc}
42 \MaintainedByLaTeXTeam{latex}
45 \DocInput{utf8ienc.dtx}
53 % \section{Introduction}
55 % [The whole section is rather unfinished \ldots\ just like the code, sorry!]
57 % \subsection{Background and general stuff}
59 % For many reasons what this package provides is a long way from any
60 % type of `Unicode compliance'.
62 % In stark contrast to 8-bit character sets, with 16 or more bits it can
63 % easily be very inefficient to support the full range.\footnote{In
64 % fact, \LaTeX's current 8-bit support does not go so far as to make
65 % all 8-bit characters into valid input.} Moreover, useful support of
66 % character input by a typesetting system overwhelmingly means finding
67 % an acceptable visual representation of a sequence of characters and
% this, for \LaTeX{}, means having available a suitably encoded 8-bit font.
71 % Unfortunately it is not possible to predict exactly what valid UTF-8
72 % octet sequences will appear in a particular file so it is best to
73 % make all the unsupported but valid sequences produce a reasonably
74 % clear and noticeable error message.
76 % There are two directions from which to approach the question of what
77 % to load. One is to specify the ranges of Unicode characters that will
78 % result in some sensible typesetting; this requires the provider to
79 % ensure that suitable fonts are loaded and that these input characters
80 % generate the correct typesetting via the encodings of those fonts. The
81 % other is to inspect the font encodings to be used and use these to
82 % define which input Unicode characters should be supported.
84 % For Western European languages, at least, going in either direction
85 % leads to many straightforward decisions and a few that are more
86 % subjective. In both cases some of the specifications are \TeX{}
% specific whilst most are independent of the particular typesetting software.
90 % As we have argued elsewhere, \LaTeX{} needs to refer to characters via
91 % `seven-bit-text' names and, so far, these have been chosen by
92 % reference to historical sources such as Plain \TeX{} or Adobe encoding
93 % descriptions. It is unclear whether this ad hoc naming structure should
94 % simply be extended or whether it would be useful to
95 % supplement it with standardised internal Unicode character names such as
% one or more of the following:\footnote{Burkhard and Holger Mittelbach
% are playing with me! They have written something here.}
100 % \ltxutwochar <4 hex digits>
102 % \ltxuchar {<hex digits>}
105 % \ltxueightchartwo <2 utf8 octets as 8-bit char tokens>
106 % \ltxueightcharthree <3 utf8 octets ...>
107 % \ltxueightcharfour <4 utf8 octets ...>
111 % \subsection{More specific stuff}
113 % In addition to setting up the mechanism for reading UTF-8 characters
114 % and specifying the \LaTeX-level support available, this package
115 % contains support for some default historically expected \TeX-related
% characters and some example `Unicode definition files' for standard font encodings.
122 % This package does not support Unicode combining characters as \TeX{}
123 % is not really equipped to make this possible.
125 % No attempt is made to be useful beyond Latin, and maybe Cyrillic,
126 % for European languages (as of now).
129 % \subsection{Basic operation of the code}
131 % The \texttt{inputenc} package makes the upper 8-bit characters active and
132 % assigns to all of them an error message. It then waits for the
133 % input encoding files to change this set-up. Similarly, whenever
134 % |\inputencoding| is encountered in a document, first the upper
135 % 8-bit characters are set back to produce an error and then the
136 % definitions for the new input encoding are loaded, changing some of the
139 % The 8-bit input encodings currently supported by \texttt{inputenc}
140 % all use declarations such as |\DeclareInputText| and the like to map an
141 % 8-bit number to some \LaTeX{} internal form, e.g.~to |\"a|.
143 % The situation when supporting UTF-8 as the input encoding is
144 % different, however. Here we only have to set up the actions of
145 % those 8-bit numbers that can be the first octet in a UTF-8
146 % representation of a Unicode character. But we cannot simply set
147 % this to some internal \LaTeX{} form since the Unicode character
148 % consists of more than one octet; instead we have to define this
149 % starting octet to parse the right number of further octets that
150 % together form the UTF-8 representation of some Unicode character.
152 % Therefore when switching to \texttt{utf8} within the
153 % \texttt{inputenc} framework the characters with numbers (hex)
154 % from \texttt{"C2} to \texttt{"DF} are defined to parse for a
155 % second octet following, the characters from \texttt{"E0} to
156 % \texttt{"EF} are defined to parse for two more octets and finally
157 % the characters from \texttt{"F0} to \texttt{"F3} are defined to
158 % parse for three additional octets. These additional octets are
% always in the range \texttt{"80} to \texttt{"BF}.
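%
% As a worked illustration (this is the standard UTF-8 bit layout, not
% a transcript of the code in this file), take U+00E4. Its eleven
% low-order bits are split five and six over the two-octet pattern
% \texttt{110xxxxx 10xxxxxx}:
% \[
%   \begin{array}{rcl}
%     \mbox{U+00E4}       & = & 00011\,100100_2 \\
%     \mbox{first octet}  & = & 110\,00011_2 = \mbox{\texttt{"C3}} \\
%     \mbox{second octet} & = & 10\,100100_2 = \mbox{\texttt{"A4}}
%   \end{array}
% \]
% The first octet lies in the two-octet starting range and the second
% one in the continuation range.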
161 % Thus, when such a character is encountered in the document (so
162 % long as expansion is not prohibited) a defined number of
163 % additional octets (8-bit characters) are read and from them a
164 % unique control sequence name is immediately constructed.
166 % This control sequence is either defined (good) or undefined
167 % (likely); in the latter case the user gets an error message
% saying that this UTF-8 sequence (or, better, Unicode character) is not set up for use with \LaTeX.
171 % If the control sequence is set up to do something useful then it will
172 % expand to a \LaTeX{} internal form: e.g.~for the utf8 sequence of
173 % two octets \texttt{"C3 "A4} we get |\"a| as the
174 % internal form which then, depending on the font encoding,
175 % eventually resolves to the single glyph `latin-a-umlaut' or to
176 % the composite glyph `latin-a with an umlaut accent'.
178 % These mappings from (UTF-8 encoded) Unicode characters to \LaTeX{}
179 % internal forms are made indirectly. The code below provides a
180 % declaration |\DeclareUnicodeCharacter| which maps Unicode numbers
181 % (as hexadecimal) to \LaTeX{} internal forms.
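%
% A minimal usage sketch follows. The choice of U+2010 (HYPHEN) and of
% an ordinary ASCII hyphen as its replacement is an assumption made
% only for illustration; it is not a mapping taken from this file.
% \begin{verbatim}
%   \documentclass{article}
%   \usepackage[T1]{fontenc}
%   \usepackage[utf8]{inputenc}
%   % Preamble only: map U+2010 (HYPHEN) to an ordinary hyphen.
%   \DeclareUnicodeCharacter{2010}{-}
%   \begin{document}
%   Text containing U+2010 can now be typeset.
%   \end{document}
% \end{verbatim}
% The first argument is given as hexadecimal digits without any prefix;
% the second argument is whatever \LaTeX{} should execute when the
% character is met in the input.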
183 % This mapping needs to be set up only once so it is done at
184 % |\begin{document}| by looking at the list of font encodings that
185 % are loaded by the document and providing mappings related to
% those font encodings whenever these are available. Thus only those
% Unicode characters that can be represented by the glyphs
% available in these encodings will be defined.
190 % Technically this is done by loading one file per encoding,
191 % if available, that is supposed to provide the necessary mapping
202 % \subsection{Housekeeping}
204 % The usual introductory bits and pieces:
207 %<utf8>\ProvidesFile{utf8.def}
208 %<test>\ProvidesFile{utf8-test.tex}
209 %<+lcy> \ProvidesFile{lcyenc.dfu}
210 %<+ly1> \ProvidesFile{ly1enc.dfu}
211 %<+oms> \ProvidesFile{omsenc.dfu}
212 %<+ot1> \ProvidesFile{ot1enc.dfu}
213 %<+ot2> \ProvidesFile{ot2enc.dfu}
214 %<+t1> \ProvidesFile{t1enc.dfu}
215 %<+t2a> \ProvidesFile{t2aenc.dfu}
216 %<+t2b> \ProvidesFile{t2benc.dfu}
217 %<+t2c> \ProvidesFile{t2cenc.dfu}
218 %<+ts1> \ProvidesFile{ts1enc.dfu}
219 %<+x2> \ProvidesFile{x2enc.dfu}
220 %<+all> \ProvidesFile{utf8enc.dfu}
221 [2017/01/28 v1.1t UTF-8 support for inputenc]
228 % We restore the |\catcode| of space (which is set to ignore in
229 % \texttt{inputenc}) while reading \texttt{.def} files. Otherwise
230 % we would need to explicitly use |\space| all over the place in
231 % error and log messages.
232 % \changes{v1.1d}{2004/05/08}{Explicitly set catcode of space}
234 \catcode`\ \saved@space@catcode
239 % \subsection{Parsing UTF-8 input}
241 % \begin{macro}{\UTFviii@two@octets}
242 % \begin{macro}{\UTFviii@three@octets}
243 % \begin{macro}{\UTFviii@four@octets}
244 % A UTF-8 char (that is not actually a 7-bit char, i.e.~a single
245 % octet) is parsed as follows: each starting octet is an active
246 % \TeX{} character token; each of these is defined below to be a
247 % macro with one to three arguments nominally (depending on the
248 % starting octet). It calls one of |\UTFviii@two@octets|,
249 % |\UTFviii@three@octets|, or |\UTFviii@four@octets| which then
250 % actually picks up the argument(s).
252 % From the arguments a control sequence with a name of the form
253 % \verb=u8:#1#2...= is constructed where the |#i| ($i>1$) are the
254 % arguments and |#1| is the starting octet (as a \TeX{} character
255 % token). Since some or even all of these characters are active
% (when \texttt{inputenc} is loaded) we need to use |\string| when building the csname.
259 % The csname thus constructed can of course be undefined but to
260 % avoid producing an unhelpful low-level undefined command error we
261 % pass it to |\UTFviii@defined| which is responsible for producing
262 % a more sensible error message (not yet done!!). If, however, it is
263 % defined we simply execute the thing (which should then expand to
264 % an encoding specific internal \LaTeX{} form).
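%
% To inspect what a given sequence currently maps to, the same csname
% can be built by hand. This is only a diagnostic sketch: it assumes
% that \texttt{inputenc} has been loaded with the \texttt{utf8} option,
% and it writes the two octets \texttt{"C3 "A4} in \TeX's double-caret
% notation so that the example itself stays 7-bit.
% \begin{verbatim}
%   \expandafter\show\csname u8:\string^^c3\string^^a4\endcsname
% \end{verbatim}
% If no mapping has been declared, the csname is reported as |\relax|,
% which is the case that |\UTFviii@defined| has to catch.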
266 \def\UTFviii@two@octets#1#2{\expandafter
267 \UTFviii@defined\csname u8:#1\string#2\endcsname}
272 \def\UTFviii@three@octets#1#2#3{\expandafter
273 \UTFviii@defined\csname u8:#1\string#2\string#3\endcsname}
278 \def\UTFviii@four@octets#1#2#3#4{\expandafter
279 \UTFviii@defined\csname u8:#1\string#2\string#3\string#4\endcsname}
283 % \begin{macro}{\UTFviii@defined}
284 % This tests whether its argument is different from |\relax|: it
285 % either calls for a sensible error message (not done), or it gets
286 % the |\fi| out of the way (in case the command has arguments) and
289 \def\UTFviii@defined#1{%
292 % The endline character has a special definition within the
293 % inputenc package (it is gobbling spaces). For this reason we
294 % can't produce multiline strings without some precaution.
295 % \changes{v1.1b}{2004/02/09}{No newlines allowed in error messages}
296 % \changes{v1.1g}{2005/09/27}{Real spaces do not show up so use \cs{space}}
297 % \changes{v1.1o}{2015/08/28}{Show Unicode number of character in hex}
299 \PackageError{inputenc}{Unicode\space char\space\expandafter
300 \UTFviii@splitcsname\string#1\relax
302 not\space set\space up\space
303 for\space use\space with\space LaTeX}\@eha
311 % \begin{macro}{\UTFviii@loop}
312 % This wonderful bit of code from Dr Carlisle defines the starting
313 % octets to call |\UTFviii@two@octets| etc as appropriate. The starting
314 % octet itself is passed directly as the first argument, the others
315 % are picked up later en route.
317 % The |\UTFviii@loop| loops through the numbers starting at
318 % |\count@| and ending at |\@tempcnta|${} - 1$, each time executing
319 % the code in |\UTFviii@tmp|.
321 % All this is done in a group so that temporary catcode changes
322 % etc.~vanish after everything is set up.
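%
% The idiom behind this kind of loop can be sketched in isolation. The
% following is an illustration under stated assumptions, not the code
% of this file: it assumes that \texttt{inputenc} has already made
% character \texttt{"C3} active, and the replacement text |[C3]| is a
% made-up placeholder.
% \begin{verbatim}
%   \makeatletter
%   \begingroup
%     \catcode`\~=13           % make sure ~ is an active character token
%     \count@="C3              % number of the character to be defined
%     \uccode`\~=\count@       % call that character the uppercase of ~
%     \uppercase{\gdef~{[C3]}}% ~ is replaced by active char "C3 here
%   \endgroup
%   \makeatother
% \end{verbatim}
% Since |\uppercase| also acts on ordinary letters, the placeholder
% text deliberately contains only characters that it leaves unchanged.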
324 % It may be a good idea to add code to deal with `illegal utf8 octets':
325 % at present these will be handled by whatever code was in use for 8-bit
326 % input before this code is executed.
337 \uppercase\expandafter{\UTFviii@tmp}%
339 \ifnum\count@<\@tempcnta
340 \expandafter\UTFviii@loop
344 % Setting up 2-byte UTF-8:
348 \def\UTFviii@tmp{\xdef~{\noexpand\UTFviii@two@octets\string~}}
351 % Setting up 3-byte UTF-8:
355 \def\UTFviii@tmp{\xdef~{\noexpand\UTFviii@three@octets\string~}}
359 % Setting up 4-byte UTF-8:
363 \def\UTFviii@tmp{\xdef~{\noexpand\UTFviii@four@octets\string~}}
369 % For this case we must disable the warning generated by
370 % \texttt{inputenc} if it doesn't see any new |\DeclareInputText|
377 % If this file (\texttt{utf8.def}) is not being read while setting
378 % up \texttt{inputenc}, i.e.~in the preamble, but when
379 % |\inputencoding| is called somewhere within the document, we do not
380 % need to input the specific Unicode mappings again. We therefore
381 % stop reading the file at this point.
383 \ifx\@begindocumenthook\@undefined
386 % The |\fi| must be on the same line as |\endinput| or else it will
393 % \subsection{Mapping Unicode codes to \LaTeX{} internal forms}
396 % \begin{macro}{\DeclareUnicodeCharacter}
397 % The |\DeclareUnicodeCharacter| declaration defines a mapping from
398 % a Unicode character code point to a \LaTeX{} internal form. The first
399 % argument is the Unicode number as hexadecimal digits and the second is
400 % the actual \LaTeX{} internal form.
402 % We start by making sure that some characters have the right
403 % |\catcode| when they are used in the definitions below.
416 \gdef\DeclareUnicodeCharacter#1#2{%
418 \wlog{ \space\space defining Unicode char U+#1 (decimal \the\count@)}%
421 % Next we do the parsing of the number stored in |\count@| and assign the
422 % result to |\UTFviii@tmp|. Actually all this could be done in-line,
423 % the macro |\parse@XML@charref| is only there to extend this code
424 % to parsing Unicode numbers in other contexts one day (perhaps).
429 % Here is an example of what is happening, for the pair \texttt{"C2 "A3}
% (which is the utf8 representation for the character \textsterling{}).
431 % After |\parse@XML@charref| we have, stored in |\UTFviii@tmp|, a
432 % single command with two character tokens as arguments:
% [$t_{\rm C2}$ and $t_{\rm A3}$ are the characters corresponding to these two octets]
436 % |\UTFviii@two@octets| $t_{\rm C2}t_{\rm A3}$
438 % what we actually need to produce is a definition of the form
440 % |\def\u8:|$t_{\rm C2}$$t_{\rm A3}$ |{|\textit{\LaTeX{} internal form}|}|\,.
442 % So here we temporarily redefine the prefix commands
443 % |\UTFviii@two@octets|, etc.~to
% generate the csname that we wish to define; the |\string|s are
445 % added in case these tokens are still active.
447 \def\UTFviii@two@octets##1##2{\csname u8:##1\string##2\endcsname}%
448 \def\UTFviii@three@octets##1##2##3{\csname u8:##1%
449 \string##2\string##3\endcsname}%
450 \def\UTFviii@four@octets##1##2##3##4{\csname u8:##1%
451 \string##2\string##3\string##4\endcsname}%
453 % Now we simply:-) need to use the right number of |\expandafter|s to
454 % finally construct the definition: expanding |\UTFviii@tmp| once to get
455 % its contents, a second time to replace the prefix command by its
456 % |\csname| expansion, and a third time to turn the expansion into
457 % a csname after which the |\gdef| finally gets applied.
458 % We add an irrelevant |\IeC| and braces around the definition, in
459 % order to avoid any space after the command being gobbled up
460 % when the text is written out to an auxiliary file (see
% \texttt{inputenc} for further details).
463 \expandafter\expandafter\expandafter
464 \expandafter\expandafter\expandafter
466 \gdef\UTFviii@tmp{\IeC{#2}}%
473 % \begin{macro}{\parse@XML@charref}
474 % This macro parses a Unicode number (decimal) and returns its
% UTF-8 representation as a sequence of non-active \TeX{} character
% tokens. In the original code it had two arguments delimited by
% \texttt{;}; here, however, we supply the Unicode number implicitly.
480 \gdef\parse@XML@charref{%
482 % We need to keep a few things local, mainly the |\uccode|'s that
483 % are set up below. However, the group originally used here is
484 % actually unnecessary since we call this macro only within another
485 % group; but it will be important to restore the group if this
486 % macro gets used for other purposes.
490 % The original code from David supported the convention that a
491 % Unicode slot number could be given either as a decimal or as a
492 % hexadecimal (by starting with \texttt{x}). We do not do this so
493 % this code is also removed. This could be reactivated if one
494 % wants to support document commands that accept Unicode numbers
495 % (but then the first case needs to be changed from an error
496 % message back to something more useful again).
498 % \uppercase{\count@\if x\noexpand#1"\else#1\fi#2}\relax
500 % As |\count@| already contains the right value we make
501 % |\parse@XML@charref| work without arguments.
502 % \changes{v1.1g}{2005/09/27}{Real spaces do not show up so use \cs{space}}
504 \ifnum\count@<"A0\relax
505 \PackageError{inputenc}{Cannot\space define\space Unicode\space
506 char\space value\space <\space 00A0}\@eha
508 % Do not ask us to provide an explanation for the code below, it is
509 % borrowed straight from \texttt{xmltex} by David and we trust him
510 % totally (and we are too lazy to reread the Unicode book to see if
511 % this is the correct algorithm).\footnote{We were hoping to also
512 % find in his work the \TeX{} code for going the other way: from
513 % UTF-8 octets to Unicode slot number, but no luck!
514 % This has now been added as \cs{decode@UTFviii}}
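%
% For readers who want the arithmetic spelled out, here is the standard
% UTF-8 computation for a three-octet character, U+20AC (decimal 8364).
% It is offered as a sketch of what the conversion has to achieve, not
% as a line-by-line reading of the macros below:
% \begin{eqnarray*}
%   8364 & = & 2 \times 4096 + 2 \times 64 + 44 \\
%   \mbox{first octet}  & = & \mbox{\texttt{"E0}} + 2  = \mbox{\texttt{"E2}} \\
%   \mbox{second octet} & = & \mbox{\texttt{"80}} + 2  = \mbox{\texttt{"82}} \\
%   \mbox{third octet}  & = & \mbox{\texttt{"80}} + 44 = \mbox{\texttt{"AC}}
% \end{eqnarray*}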
516 \else\ifnum\count@<"800\relax
518 \parse@UTFviii@b C\UTFviii@two@octets.,%
519 \else\ifnum\count@<"10000\relax
522 \parse@UTFviii@b E\UTFviii@three@octets.{,;}%
527 \parse@UTFviii@b F\UTFviii@four@octets.{!,;}%
536 % \begin{macro}{\parse@UTFviii@a}
537 % \ldots so somebody else can document this part :-) \ldots~David?:-))))!
538 % \changes{v1.1b}{2004/02/09}{Space in the wrong place \cs{count @64}}
540 \gdef\parse@UTFviii@a#1{%
545 \advance\@tempcnta-\count@
546 \advance\@tempcnta 128
552 % \begin{macro}{\parse@UTFviii@b}
555 \gdef\parse@UTFviii@b#1#2#3#4{%
556 \advance\count@ "#10\relax
558 \uppercase{\gdef\UTFviii@tmp{#2#3#4}}}
562 % \begin{macro}{\decode@UTFviii}
563 % \changes{v1.1o}{2015/08/28}{Macro added}
% In the reverse direction, take a sequence of octets (bytes)
565 % representing a character in UTF-8 and construct the Unicode number.
566 % The sequence is terminated by |\relax|.
568 % In this version, if the sequence is not valid UTF-8 you probably
569 % get a low level arithmetic error from |\numexpr| or stray characters
570 % at the end. Getting a better error message would be somewhat expensive.
571 % As the main use is for reporting characters in messages, this is done
% just using expansion, so |\numexpr| is used. A stub returning 0 is defined
573 % if |\numexpr| is not available.
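%
% As a worked example of the reverse direction (again the standard
% UTF-8 arithmetic rather than a trace of the macro), the pair
% \texttt{"C2 "A3} decodes as
% \[
%   (\mbox{\texttt{"C2}} - \mbox{\texttt{"C0}}) \times 64
%     + (\mbox{\texttt{"A3}} - \mbox{\texttt{"80}})
%   = 2 \times 64 + 35 = 163 = \mbox{\texttt{"A3}},
% \]
% that is, U+00A3.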
575 \ifx\numexpr\@undefined
579 \gdef\decode@UTFviii#1{0}
% If the input is malformed UTF-8 there may not be enough closing
% parentheses, so five extra ones are appended to guarantee that some
% always remain; |\UTFviii@cleanup| then removes whatever is left over.
% This avoids |\numexpr| parse errors while outputting a package error.
591 \gdef\decode@UTFviii#1\relax{%
592 \expandafter\UTFviii@cleanup
593 \the\numexpr\dec@de@UTFviii#1\relax)))))\@empty}
597 \gdef\UTFviii@cleanup#1)#2\@empty{#1}
601 \gdef\dec@de@UTFviii#1{%
621 \expandafter\dec@de@UTFviii
630 % \begin{macro}{\UTFviii@hexnumber}
631 % \changes{v1.1o}{2015/08/28}{Macro added}
632 % Convert a number to a sequence of uppercase hex digits.
633 % If |\numexpr| is not available, it returns its argument unchanged.
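%
% The conversion depends on the fact that e-\TeX's |\numexpr| rounds
% the result of a division instead of truncating it, so that
% $(n-8)/16$ yields the integer part of $n/16$ for positive $n$. A
% worked example for $n=228$ (the values are chosen for illustration):
% \begin{eqnarray*}
%   (228-8)/16 & = & 13.75, \mbox{ rounded to } 14
%                    \quad (\mbox{hex digit \texttt{E}}) \\
%   228 - 14 \times 16 & = & 4 \quad (\mbox{hex digit \texttt{4}})
% \end{eqnarray*}
% giving \texttt{E4}. For $n=0$ the same expression would round to
% $-1$, which is why zero has to be treated separately.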
635 \ifx\numexpr\@undefined
638 \global\let\UTFviii@hexnumber\@firstofone
\global\let\UTFviii@hexdigit\hexnumber@
645 \gdef\UTFviii@hexnumber#1{%
647 \expandafter\UTFviii@hexnumber\expandafter{\the\numexpr(#1-8)/16\relax}%
649 \UTFviii@hexdigit{\numexpr#1\ifnum#1>0-((#1-8)/16)*16\fi\relax}%
653 % Almost but not quite |\hexnumber@|.
655 \gdef\UTFviii@hexdigit#1{\ifcase\numexpr#1\relax
656 0\or1\or2\or3\or4\or5\or6\or7\or8\or9\or
657 A\or B\or C\or D\or E\or F\fi}
665 % \begin{macro}{\UTFviii@splitcsname}
666 % \changes{v1.1o}{2015/08/28}{Macro added}
% Split a csname representing a Unicode character and return
% the character and (if |\numexpr| is defined) the Unicode number in hex.
670 \ifx\numexpr\@undefined
671 \gdef\UTFviii@splitcsname#1:#2\relax{#2}}
673 \gdef\UTFviii@splitcsname#1:#2\relax{%
675 % Need to pre-expand the argument to ensure cleanup in case of mal-formed UTF-8.
677 #2 (U+\expandafter\UTFviii@hexnumber\expandafter{%
678 \the\numexpr\decode@UTFviii#2\relax})
689 \@onlypreamble\DeclareUnicodeCharacter
691 % These are preamble only as long as we don't support Unicode
692 % charrefs in documents.
694 \@onlypreamble\parse@XML@charref
695 \@onlypreamble\parse@UTFviii@a
696 \@onlypreamble\parse@UTFviii@b
700 % \subsection{Loading Unicode mappings at begin document}
702 % The original plan was to set up the UTF-8 support at
703 % |\begin{document}|; but then any text characters used in the preamble
% (as people do even though advised against it) would fail in one way or another.
706 % So the implementation was changed and the Unicode definition files
707 % for already defined encodings are loaded here.
709 % We loop through all defined font encodings
710 % (stored in |\cdp@list|) and for each load a file
% \textit{name}\texttt{enc.dfu} if it exists. That file is then
712 % supposed to contain |\DeclareUnicodeCharacter| declarations.
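%
% A mapping file for a hypothetical font encoding \texttt{XYZ} would
% thus be looked for under the name \texttt{xyzenc.dfu}. The encoding
% name, the date and version string, and the selection of entries below
% are all invented for illustration:
% \begin{verbatim}
%   \ProvidesFile{xyzenc.dfu}[2017/01/28 v0.0 hypothetical UTF-8 mapping]
%   \DeclareUnicodeCharacter{00E4}{\"a}
%   \DeclareUnicodeCharacter{00DF}{\ss}
% \end{verbatim}
% Only characters for which the encoding really provides a glyph should
% be listed in such a file.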
715 \def\cdp@elt#1#2#3#4{%
716 \wlog{Now handling font encoding #1 ...}%
718 \InputIfFileExists{#1enc.dfu}}%
719 {\wlog{... processing UTF-8 mapping file for font %
722 % \changes{v1.1m}{2008/04/05}{Ensure we don't lose spaces in the log}
723 % The previous line is written to the log with the newline char being
724 % ignored (thus not producing a space). Therefore either everything has to
725 % be on a single input line or some special care must be taken. From this
726 % point on we ignore spaces again, i.e., while we are reading the
727 % \texttt{.dfu} file. The |\endgroup| below will restore it again.
728 % \changes{v1.1d}{2004/05/08}{Explicitly set catcode of space}
729 % \changes{v1.1g}{2005/09/27}{We lost the ``false'' case}
732 {\wlog{... no UTF-8 mapping file for font encoding #1}}%
737 % However, we don't know if there are font encodings still to be
738 % loaded (either with \texttt{fontenc} or directly with |\input| by
% some package). Font encoding files are loaded only if the
740 % corresponding encoding has not been loaded yet, and they always
741 % begin with |\DeclareFontEncoding|. We now redefine the internal
742 % kernel version of the latter to load the Unicode file if available.
745 \def\DeclareFontEncoding@#1#2#3{%
747 \ifx\csname T@#1\endcsname\relax
748 \def\cdp@elt{\noexpand\cdp@elt}%
749 \xdef\cdp@list{\cdp@list\cdp@elt{#1}%
750 {\default@family}{\default@series}%
752 \expandafter\let\csname#1-cmd\endcsname\@changed@cmd
754 \wlog{Now handling font encoding #1 ...}%
756 \InputIfFileExists{#1enc.dfu}}%
757 {\wlog{... processing UTF-8 mapping file for font %
759 {\wlog{... no UTF-8 mapping file for font encoding #1}}%
762 \@font@info{Redeclaring font encoding #1}%
764 \global\@namedef{T@#1}{#2}%
765 \global\@namedef{M@#1}{\default@M#3}%
766 \xdef\LastDeclaredEncoding{#1}%
773 % \section{Mapping characters ---\newline based on font (glyph) encodings}
775 % This section is a first attempt to provide Unicode definitions for
776 % characters whose standard glyphs are currently provided by the
777 % standard \LaTeX{} font-encodings |T1|, |OT1|, etc. They are by
% no means complete and need checking.
780 % For example, one should check the already existing input encodings
781 % for glyphs that may in fact be available and required,
782 % e.g.~\texttt{latin4} has a number of glyphs with the |\=|
783 % accent. Since the |T1| encoding does not provide such glyphs,
784 % these characters are not listed below (yet).
786 % The list below was generated by looking at the current \LaTeX{} font
787 % encoding files, e.g., \texttt{t1enc.def} and using the work by
788 % Sebastian Rahtz (in \texttt{ucharacters.sty}) with a few
% modifications. In combinations such as |\^\i| this form, using the
% dotless |\i|, is preferred over |\^i|.
792 % This list has been built from several sources, obviously including
793 % the Unicode Standard itself. These sources include Passive \TeX{} by
794 % Sebastian Rahtz, the \texttt{unicode}
795 % package by Dominique P. G. Unruh (mainly for Latin encodings) and
% \texttt{tex4ht} by Eitan Gurari (for Cyrillic ones).
798 % Note that it strictly follows the Mittelbach principles for
799 % input character encodings: thus it offers no support for using utf8
800 % representations of math symbols such as $\times$ or $\div$ (in math mode).
803 % \subsection{About the table itself}
805 % In addition to generating individual files, the table below is, at present,
806 % a one-one (we think) partial relationship between the (ill-defined) set
807 % of LICRs and the Unicode slots "0080 to "FFFF. At present these entries
808 % are used only to define a collection of partial mappings from Unicode
809 % slots to LICRs; each of these mappings becomes full if we add an exception
810 % value (`not defined') to the set of LICRs.
812 % It is probably not essential for the relationship in the full table to be
813 % one-one; this raises questions such as: the exact role of LICRs;
814 % the formal relationships on the set of LICRs; the (non-mathematical)
815 % relationship between
816 % LICRs and Unicode (which has its own somewhat fuzzy equivalences);
817 % and ultimately what a character is and what a character representation
% It is unclear to what extent entries in this table should
821 % resemble the closely related ones in the 8-bit \texttt{inputenc} files.
% The Unicode standard claims that the first 256 slots `are' ASCII and Latin-1.
825 % Of course, \TeX{} itself typically does not treat even many perfectly
826 % `normal text' 7-bit slots as text characters, so it is unclear
827 % whether \LaTeX{} should even attempt to deal in any consistent way with
828 % those Unicode slots that are not definitive text characters.
831 % \subsection{The mapping table}
% Note that the first argument must be a hex-digit number of at least
% \texttt{00A0} and at most \texttt{10FFFF}.
% There are a few notes about inconsistencies etc.\ at the end of the table.
838 % \changes{v1.1o}{2015/08/28}{Add U+00A0 and U+00AD}
839 % \changes{v1.1q}{2015/12/02}{Add remaining latin uses of accents in T1}
% \changes{v1.1r}{2015/12/03}{Add some more ogonek cases}
% \changes{v1.1s}{2016/01/11}{Add some more caron and acute}
% \changes{v1.1t}{2017/01/28}{Add caron combinations for GgYy}
844 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{00A0}{\nobreakspace}
845 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{00A1}{\textexclamdown}
846 %<all,ts1,ly1>\DeclareUnicodeCharacter{00A2}{\textcent}
847 %<all,ts1,t1,ot1,ly1>\DeclareUnicodeCharacter{00A3}{\textsterling}
848 %<all,x2,ts1,t2c,t2b,t2a,ly1,lcy>\DeclareUnicodeCharacter{00A4}{\textcurrency}
849 %<all,ts1,ly1>\DeclareUnicodeCharacter{00A5}{\textyen}
850 %<all,ts1,ly1>\DeclareUnicodeCharacter{00A6}{\textbrokenbar}
851 %<all,x2,ts1,t2c,t2b,t2a,oms,ly1>\DeclareUnicodeCharacter{00A7}{\textsection}
852 %<all,ts1>\DeclareUnicodeCharacter{00A8}{\textasciidieresis}
853 %<all,ts1,utf8>\DeclareUnicodeCharacter{00A9}{\textcopyright}
854 %<all,ts1,ly1,utf8>\DeclareUnicodeCharacter{00AA}{\textordfeminine}
855 %<*all,x2,t2c,t2b,t2a,t1,ot2,ly1,lcy>
856 \DeclareUnicodeCharacter{00AB}{\guillemotleft}
857 %</all,x2,t2c,t2b,t2a,t1,ot2,ly1,lcy>
858 %<all,ts1>\DeclareUnicodeCharacter{00AC}{\textlnot}
859 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{00AD}{\-}
860 %<all,ts1,ly1,utf8>\DeclareUnicodeCharacter{00AE}{\textregistered}
861 %<all,ts1>\DeclareUnicodeCharacter{00AF}{\textasciimacron}
862 %<all,ts1,ly1>\DeclareUnicodeCharacter{00B0}{\textdegree}
863 %<all,ts1>\DeclareUnicodeCharacter{00B1}{\textpm}
864 %<all,ts1>\DeclareUnicodeCharacter{00B2}{\texttwosuperior}
865 %<all,ts1>\DeclareUnicodeCharacter{00B3}{\textthreesuperior}
866 %<all,ts1>\DeclareUnicodeCharacter{00B4}{\textasciiacute}
867 %<all,ts1,ly1>\DeclareUnicodeCharacter{00B5}{\textmu} % micro sign
868 %<all,ts1,oms,ly1>\DeclareUnicodeCharacter{00B6}{\textparagraph}
869 %<all,oms,ts1,ly1>\DeclareUnicodeCharacter{00B7}{\textperiodcentered}
870 %<all,ot1>\DeclareUnicodeCharacter{00B8}{\c\ }
871 %<all,ts1>\DeclareUnicodeCharacter{00B9}{\textonesuperior}
872 %<all,ts1,ly1,utf8>\DeclareUnicodeCharacter{00BA}{\textordmasculine}
873 %<*all,x2,t2c,t2b,t2a,t1,ot2,ly1,lcy>
874 \DeclareUnicodeCharacter{00BB}{\guillemotright}
875 %</all,x2,t2c,t2b,t2a,t1,ot2,ly1,lcy>
876 %<all,ts1,ly1>\DeclareUnicodeCharacter{00BC}{\textonequarter}
877 %<all,ts1,ly1>\DeclareUnicodeCharacter{00BD}{\textonehalf}
878 %<all,ts1,ly1>\DeclareUnicodeCharacter{00BE}{\textthreequarters}
879 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{00BF}{\textquestiondown}
880 %<all,t1,ly1>\DeclareUnicodeCharacter{00C0}{\@tabacckludge`A}
881 %<all,t1,ly1>\DeclareUnicodeCharacter{00C1}{\@tabacckludge'A}
882 %<all,t1,ly1>\DeclareUnicodeCharacter{00C2}{\^A}
883 %<all,t1,ly1>\DeclareUnicodeCharacter{00C3}{\~A}
884 %<all,t1,ly1>\DeclareUnicodeCharacter{00C4}{\"A}
885 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{00C5}{\r A}
886 %<all,t1,ot1,ly1,lcy>\DeclareUnicodeCharacter{00C6}{\AE}
887 %<all,t1,ly1>\DeclareUnicodeCharacter{00C7}{\c C}
888 %<all,t1,ly1>\DeclareUnicodeCharacter{00C8}{\@tabacckludge`E}
889 %<all,t1,ly1>\DeclareUnicodeCharacter{00C9}{\@tabacckludge'E}
890 %<all,t1,ly1>\DeclareUnicodeCharacter{00CA}{\^E}
891 %<all,t1,ly1>\DeclareUnicodeCharacter{00CB}{\"E}
892 %<all,t1,ly1>\DeclareUnicodeCharacter{00CC}{\@tabacckludge`I}
893 %<all,t1,ly1>\DeclareUnicodeCharacter{00CD}{\@tabacckludge'I}
894 %<all,t1,ly1>\DeclareUnicodeCharacter{00CE}{\^I}
895 %<all,t1,ly1>\DeclareUnicodeCharacter{00CF}{\"I}
896 %<all,t1,ly1>\DeclareUnicodeCharacter{00D0}{\DH}
897 %<all,t1,ly1>\DeclareUnicodeCharacter{00D1}{\~N}
898 %<all,t1,ly1>\DeclareUnicodeCharacter{00D2}{\@tabacckludge`O}
899 %<all,t1,ly1>\DeclareUnicodeCharacter{00D3}{\@tabacckludge'O}
900 %<all,t1,ly1>\DeclareUnicodeCharacter{00D4}{\^O}
901 %<all,t1,ly1>\DeclareUnicodeCharacter{00D5}{\~O}
902 %<all,t1,ly1>\DeclareUnicodeCharacter{00D6}{\"O}
903 %<all,ts1>\DeclareUnicodeCharacter{00D7}{\texttimes}
904 %<all,t1,ot1,ly1,lcy>\DeclareUnicodeCharacter{00D8}{\O}
905 %<all,t1,ly1>\DeclareUnicodeCharacter{00D9}{\@tabacckludge`U}
906 %<all,t1,ly1>\DeclareUnicodeCharacter{00DA}{\@tabacckludge'U}
907 %<all,t1,ly1>\DeclareUnicodeCharacter{00DB}{\^U}
908 %<all,t1,ly1>\DeclareUnicodeCharacter{00DC}{\"U}
909 %<all,t1,ly1>\DeclareUnicodeCharacter{00DD}{\@tabacckludge'Y}
910 %<all,t1,ly1>\DeclareUnicodeCharacter{00DE}{\TH}
911 %<all,t1,ot1,ly1,lcy>\DeclareUnicodeCharacter{00DF}{\ss}
912 %<all,t1,ly1>\DeclareUnicodeCharacter{00E0}{\@tabacckludge`a}
913 %<all,t1,ly1>\DeclareUnicodeCharacter{00E1}{\@tabacckludge'a}
914 %<all,t1,ly1>\DeclareUnicodeCharacter{00E2}{\^a}
915 %<all,t1,ly1>\DeclareUnicodeCharacter{00E3}{\~a}
916 %<all,t1,ly1>\DeclareUnicodeCharacter{00E4}{\"a}
917 %<all,t1,ly1>\DeclareUnicodeCharacter{00E5}{\r a}
918 %<all,t1,ot1,ly1,lcy>\DeclareUnicodeCharacter{00E6}{\ae}
919 %<all,t1,ly1>\DeclareUnicodeCharacter{00E7}{\c c}
920 %<all,t1,ly1>\DeclareUnicodeCharacter{00E8}{\@tabacckludge`e}
921 %<all,t1,ly1>\DeclareUnicodeCharacter{00E9}{\@tabacckludge'e}
922 %<all,t1,ly1>\DeclareUnicodeCharacter{00EA}{\^e}
923 %<all,t1,ly1>\DeclareUnicodeCharacter{00EB}{\"e}
924 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{00EC}{\@tabacckludge`\i}
925 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{00ED}{\@tabacckludge'\i}
926 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{00EE}{\^\i}
927 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{00EF}{\"\i}
928 %<all,t1,ly1>\DeclareUnicodeCharacter{00F0}{\dh}
929 %<all,t1,ly1>\DeclareUnicodeCharacter{00F1}{\~n}
930 %<all,t1,ly1>\DeclareUnicodeCharacter{00F2}{\@tabacckludge`o}
931 %<all,t1,ly1>\DeclareUnicodeCharacter{00F3}{\@tabacckludge'o}
932 %<all,t1,ly1>\DeclareUnicodeCharacter{00F4}{\^o}
933 %<all,t1,ly1>\DeclareUnicodeCharacter{00F5}{\~o}
934 %<all,t1,ly1>\DeclareUnicodeCharacter{00F6}{\"o}
935 %<all,ts1>\DeclareUnicodeCharacter{00F7}{\textdiv}
936 %<all,t1,ot1,ly1,lcy>\DeclareUnicodeCharacter{00F8}{\o}
937 %<all,t1,ly1>\DeclareUnicodeCharacter{00F9}{\@tabacckludge`u}
938 %<all,t1,ly1>\DeclareUnicodeCharacter{00FA}{\@tabacckludge'u}
939 %<all,t1,ly1>\DeclareUnicodeCharacter{00FB}{\^u}
940 %<all,t1,ly1>\DeclareUnicodeCharacter{00FC}{\"u}
941 %<all,t1,ly1>\DeclareUnicodeCharacter{00FD}{\@tabacckludge'y}
942 %<all,t1,ly1>\DeclareUnicodeCharacter{00FE}{\th}
943 %<all,t1,ly1>\DeclareUnicodeCharacter{00FF}{\"y}
944 %<all,t1>\DeclareUnicodeCharacter{0100}{\@tabacckludge=A}
945 %<all,t1>\DeclareUnicodeCharacter{0101}{\@tabacckludge=a}
946 %<all,t1>\DeclareUnicodeCharacter{0102}{\u A}
947 %<all,t1>\DeclareUnicodeCharacter{0103}{\u a}
948 %<all,t1>\DeclareUnicodeCharacter{0104}{\k A}
949 %<all,t1>\DeclareUnicodeCharacter{0105}{\k a}
950 %<all,t1>\DeclareUnicodeCharacter{0106}{\@tabacckludge'C}
951 %<all,t1>\DeclareUnicodeCharacter{0107}{\@tabacckludge'c}
952 %<all,t1>\DeclareUnicodeCharacter{0108}{\^C}
953 %<all,t1>\DeclareUnicodeCharacter{0109}{\^c}
954 %<all,t1>\DeclareUnicodeCharacter{010A}{\.C}
955 %<all,t1>\DeclareUnicodeCharacter{010B}{\.c}
956 %<all,t1>\DeclareUnicodeCharacter{010C}{\v C}
957 %<all,t1>\DeclareUnicodeCharacter{010D}{\v c}
958 %<all,t1>\DeclareUnicodeCharacter{010E}{\v D}
959 %<all,t1>\DeclareUnicodeCharacter{010F}{\v d}
960 %<all,t1>\DeclareUnicodeCharacter{0110}{\DJ}
961 %<all,t1>\DeclareUnicodeCharacter{0111}{\dj}
962 %<all,t1>\DeclareUnicodeCharacter{0112}{\@tabacckludge=E}
963 %<all,t1>\DeclareUnicodeCharacter{0113}{\@tabacckludge=e}
964 %<all,t1>\DeclareUnicodeCharacter{0114}{\u E}
965 %<all,t1>\DeclareUnicodeCharacter{0115}{\u e}
966 %<all,t1>\DeclareUnicodeCharacter{0116}{\.E}
967 %<all,t1>\DeclareUnicodeCharacter{0117}{\.e}
968 %<all,t1>\DeclareUnicodeCharacter{0118}{\k E}
969 %<all,t1>\DeclareUnicodeCharacter{0119}{\k e}
970 %<all,t1>\DeclareUnicodeCharacter{011A}{\v E}
971 %<all,t1>\DeclareUnicodeCharacter{011B}{\v e}
972 %<all,t1>\DeclareUnicodeCharacter{011C}{\^G}
973 %<all,t1>\DeclareUnicodeCharacter{011D}{\^g}
974 %<all,t1>\DeclareUnicodeCharacter{011E}{\u G}
975 %<all,t1>\DeclareUnicodeCharacter{011F}{\u g}
976 %<all,t1>\DeclareUnicodeCharacter{0120}{\.G}
977 %<all,t1>\DeclareUnicodeCharacter{0121}{\.g}
978 %<all,t1>\DeclareUnicodeCharacter{0122}{\c G}
979 %<all,t1>\DeclareUnicodeCharacter{0123}{\c g}
980 %<all,t1>\DeclareUnicodeCharacter{0124}{\^H}
981 %<all,t1>\DeclareUnicodeCharacter{0125}{\^h}
982 %<all,t1>\DeclareUnicodeCharacter{0128}{\~I}
983 %<all,t1>\DeclareUnicodeCharacter{0129}{\~\i}
984 %<all,t1>\DeclareUnicodeCharacter{012A}{\@tabacckludge=I}
985 %<all,t1>\DeclareUnicodeCharacter{012B}{\@tabacckludge=\i}
986 %<all,t1>\DeclareUnicodeCharacter{012C}{\u I}
987 %<all,t1>\DeclareUnicodeCharacter{012D}{\u\i}
988 %<all,t1>\DeclareUnicodeCharacter{012E}{\k I}
989 %<all,t1>\DeclareUnicodeCharacter{012F}{\k\i}
990 %<all,t1>\DeclareUnicodeCharacter{0130}{\.I}
991 %<all,t2c,t2b,t2a,t1,ot2,ot1,ly1,lcy>\DeclareUnicodeCharacter{0131}{\i}
992 %<all,t1>\DeclareUnicodeCharacter{0132}{\IJ}
993 %<all,t1>\DeclareUnicodeCharacter{0133}{\ij}
994 %<all,t1>\DeclareUnicodeCharacter{0134}{\^J}
995 %<all,t1>\DeclareUnicodeCharacter{0135}{\^\j}
996 %<all,t1>\DeclareUnicodeCharacter{0136}{\c K}
997 %<all,t1>\DeclareUnicodeCharacter{0137}{\c k}
998 %<all,t1>\DeclareUnicodeCharacter{0139}{\@tabacckludge'L}
999 %<all,t1>\DeclareUnicodeCharacter{013A}{\@tabacckludge'l}
1000 %<all,t1>\DeclareUnicodeCharacter{013B}{\c L}
1001 %<all,t1>\DeclareUnicodeCharacter{013C}{\c l}
1002 %<all,t1>\DeclareUnicodeCharacter{013D}{\v L}
1003 %<all,t1>\DeclareUnicodeCharacter{013E}{\v l}
1004 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{0141}{\L}
1005 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{0142}{\l}
1006 %<all,t1>\DeclareUnicodeCharacter{0143}{\@tabacckludge'N}
1007 %<all,t1>\DeclareUnicodeCharacter{0144}{\@tabacckludge'n}
1008 %<all,t1>\DeclareUnicodeCharacter{0145}{\c N}
1009 %<all,t1>\DeclareUnicodeCharacter{0146}{\c n}
1010 %<all,t1>\DeclareUnicodeCharacter{0147}{\v N}
1011 %<all,t1>\DeclareUnicodeCharacter{0148}{\v n}
1012 %<all,t1>\DeclareUnicodeCharacter{014A}{\NG}
1013 %<all,t1>\DeclareUnicodeCharacter{014B}{\ng}
1014 %<all,t1>\DeclareUnicodeCharacter{014C}{\@tabacckludge=O}
1015 %<all,t1>\DeclareUnicodeCharacter{014D}{\@tabacckludge=o}
1016 %<all,t1>\DeclareUnicodeCharacter{014E}{\u O}
1017 %<all,t1>\DeclareUnicodeCharacter{014F}{\u o}
1018 %<all,t1>\DeclareUnicodeCharacter{0150}{\H O}
1019 %<all,t1>\DeclareUnicodeCharacter{0151}{\H o}
1020 %<all,t1,ot1,ly1,lcy>\DeclareUnicodeCharacter{0152}{\OE}
1021 %<all,t1,ot1,ly1,lcy>\DeclareUnicodeCharacter{0153}{\oe}
1022 %<all,t1>\DeclareUnicodeCharacter{0154}{\@tabacckludge'R}
1023 %<all,t1>\DeclareUnicodeCharacter{0155}{\@tabacckludge'r}
1024 %<all,t1>\DeclareUnicodeCharacter{0156}{\c R}
1025 %<all,t1>\DeclareUnicodeCharacter{0157}{\c r}
1026 %<all,t1>\DeclareUnicodeCharacter{0158}{\v R}
1027 %<all,t1>\DeclareUnicodeCharacter{0159}{\v r}
1028 %<all,t1>\DeclareUnicodeCharacter{015A}{\@tabacckludge'S}
1029 %<all,t1>\DeclareUnicodeCharacter{015B}{\@tabacckludge's}
1030 %<all,t1>\DeclareUnicodeCharacter{015C}{\^S}
1031 %<all,t1>\DeclareUnicodeCharacter{015D}{\^s}
1032 %<all,t1>\DeclareUnicodeCharacter{015E}{\c S}
1033 %<all,t1>\DeclareUnicodeCharacter{015F}{\c s}
1034 %<all,t1,ly1>\DeclareUnicodeCharacter{0160}{\v S}
1035 %<all,t1,ly1>\DeclareUnicodeCharacter{0161}{\v s}
1036 %<all,t1>\DeclareUnicodeCharacter{0162}{\c T}
1037 %<all,t1>\DeclareUnicodeCharacter{0163}{\c t}
1038 %<all,t1>\DeclareUnicodeCharacter{0164}{\v T}
1039 %<all,t1>\DeclareUnicodeCharacter{0165}{\v t}
1040 %<all,t1>\DeclareUnicodeCharacter{0168}{\~U}
1041 %<all,t1>\DeclareUnicodeCharacter{0169}{\~u}
1042 %<all,t1>\DeclareUnicodeCharacter{016A}{\@tabacckludge=U}
1043 %<all,t1>\DeclareUnicodeCharacter{016B}{\@tabacckludge=u}
1044 %<all,t1>\DeclareUnicodeCharacter{016C}{\u U}
1045 %<all,t1>\DeclareUnicodeCharacter{016D}{\u u}
1046 %<all,t1>\DeclareUnicodeCharacter{016E}{\r U}
1047 %<all,t1>\DeclareUnicodeCharacter{016F}{\r u}
1048 %<all,t1>\DeclareUnicodeCharacter{0170}{\H U}
1049 %<all,t1>\DeclareUnicodeCharacter{0171}{\H u}
1050 %<all,t1>\DeclareUnicodeCharacter{0172}{\k U}
1051 %<all,t1>\DeclareUnicodeCharacter{0173}{\k u}
1054 % \changes{v1.1p}{2015/09/07}{Welsh circumflex combinations}
1056 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{0174}{\^W}
1057 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{0175}{\^w}
1058 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{0176}{\^Y}
1059 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{0177}{\^y}
1060 %<all,t1,ly1>\DeclareUnicodeCharacter{0178}{\"Y}
1061 %<all,t1>\DeclareUnicodeCharacter{0179}{\@tabacckludge'Z}
1062 %<all,t1>\DeclareUnicodeCharacter{017A}{\@tabacckludge'z}
1063 %<all,t1>\DeclareUnicodeCharacter{017B}{\.Z}
1064 %<all,t1>\DeclareUnicodeCharacter{017C}{\.z}
1065 %<all,t1,ly1>\DeclareUnicodeCharacter{017D}{\v Z}
1066 %<all,t1,ly1>\DeclareUnicodeCharacter{017E}{\v z}
1067 %<all,ts1,ly1>\DeclareUnicodeCharacter{0192}{\textflorin}
1069 % \changes{v1.1s}{2016/01/11}{add 01CD-01F4}
1071 %<all,t1>\DeclareUnicodeCharacter{01CD}{\v A}
1072 %<all,t1>\DeclareUnicodeCharacter{01CE}{\v a}
1073 %<all,t1>\DeclareUnicodeCharacter{01CF}{\v I}
1074 %<all,t1>\DeclareUnicodeCharacter{01D0}{\v \i}
1075 %<all,t1>\DeclareUnicodeCharacter{01D1}{\v O}
1076 %<all,t1>\DeclareUnicodeCharacter{01D2}{\v o}
1077 %<all,t1>\DeclareUnicodeCharacter{01D3}{\v U}
1078 %<all,t1>\DeclareUnicodeCharacter{01D4}{\v u}
1079 %<all,t1>\DeclareUnicodeCharacter{01E2}{\@tabacckludge=\AE}
1080 %<all,t1>\DeclareUnicodeCharacter{01E3}{\@tabacckludge=\ae}
1081 %<all,t1>\DeclareUnicodeCharacter{01E6}{\v G}
1082 %<all,t1>\DeclareUnicodeCharacter{01E7}{\v g}
1083 %<all,t1>\DeclareUnicodeCharacter{01E8}{\v K}
1084 %<all,t1>\DeclareUnicodeCharacter{01E9}{\v k}
1085 %<all,t1>\DeclareUnicodeCharacter{01EA}{\k O}
1086 %<all,t1>\DeclareUnicodeCharacter{01EB}{\k o}
1087 %<all,t1>\DeclareUnicodeCharacter{01F0}{\v\j}
1088 %<all,t1>\DeclareUnicodeCharacter{01F4}{\@tabacckludge'G}
1089 %<all,t1>\DeclareUnicodeCharacter{01F5}{\@tabacckludge'g}
1091 % \changes{v1.1o}{2015/08/28}{comma accent latex/4414}
1093 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{0218}{\textcommabelow S}
1094 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{0219}{\textcommabelow s}
1095 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{021A}{\textcommabelow T}
1096 %<all,t1,ot1,ly1>\DeclareUnicodeCharacter{021B}{\textcommabelow t}
1100 %<all,t1>\DeclareUnicodeCharacter{0232}{\@tabacckludge=Y}
1101 %<all,t1>\DeclareUnicodeCharacter{0233}{\@tabacckludge=y}
1102 %<all,ly1,utf8>\DeclareUnicodeCharacter{02C6}{\textasciicircum}
1103 %<all,ts1>\DeclareUnicodeCharacter{02C7}{\textasciicaron}
1104 %<all,ly1,utf8>\DeclareUnicodeCharacter{02DC}{\textasciitilde}
1105 %<all,ts1>\DeclareUnicodeCharacter{02D8}{\textasciibreve}
1106 %<all,ts1>\DeclareUnicodeCharacter{02DD}{\textacutedbl}
% The Cyrillic code points were checked, extended and corrected in 2007
% by Matthias Noe (\verb=a9931078@unet.univie.ac.at=); thanks.
1110 % \changes{v1.1j}{2007/11/09}{Added a few new unicode decls in cyrillic (pr/3988)}
% \changes{v1.1k}{2007/11/11}{Added further unicode decls in cyrillic}
1112 % \changes{v1.1n}{2015/06/27}{correct accent http://tex.stackexchange.com/q/252521}
1114 %<*all,x2,t2c,t2b,t2a,ot2,lcy>
1115 \DeclareUnicodeCharacter{0400}{\@tabacckludge`\CYRE}
1116 %</all,x2,t2c,t2b,t2a,ot2,lcy>
1117 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{0401}{\CYRYO}
1118 %<all,x2,t2a,ot2>\DeclareUnicodeCharacter{0402}{\CYRDJE}
1119 %<*all,x2,t2c,t2b,t2a,ot2,lcy>
1120 \DeclareUnicodeCharacter{0403}{\@tabacckludge'\CYRG}
1121 %</all,x2,t2c,t2b,t2a,ot2,lcy>
1122 %<all,x2,t2a,ot2,lcy>\DeclareUnicodeCharacter{0404}{\CYRIE}
1123 %<all,x2,t2c,t2b,t2a,ot2>\DeclareUnicodeCharacter{0405}{\CYRDZE}
1124 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{0406}{\CYRII}
1125 %<all,x2,t2a,lcy>\DeclareUnicodeCharacter{0407}{\CYRYI}
1126 %<all,x2,t2c,t2b,t2a,ot2>\DeclareUnicodeCharacter{0408}{\CYRJE}
1127 %<all,x2,t2b,t2a,ot2>\DeclareUnicodeCharacter{0409}{\CYRLJE}
1128 %<all,x2,t2b,t2a,ot2>\DeclareUnicodeCharacter{040A}{\CYRNJE}
1129 %<all,x2,t2a,ot2>\DeclareUnicodeCharacter{040B}{\CYRTSHE}
1130 %<*all,x2,t2c,t2b,t2a,ot2,lcy>
1131 \DeclareUnicodeCharacter{040C}{\@tabacckludge'\CYRK}
1132 \DeclareUnicodeCharacter{040D}{\@tabacckludge`\CYRI}
1133 %</all,x2,t2c,t2b,t2a,ot2,lcy>
1134 %<all,x2,t2b,t2a,lcy>\DeclareUnicodeCharacter{040E}{\CYRUSHRT}
1135 %<all,x2,t2c,t2a,ot2>\DeclareUnicodeCharacter{040F}{\CYRDZHE}
1136 %<*all,x2,t2c,t2b,t2a,ot2,lcy>
1137 \DeclareUnicodeCharacter{0410}{\CYRA}
1138 \DeclareUnicodeCharacter{0411}{\CYRB}
1139 \DeclareUnicodeCharacter{0412}{\CYRV}
1140 \DeclareUnicodeCharacter{0413}{\CYRG}
1141 \DeclareUnicodeCharacter{0414}{\CYRD}
1142 \DeclareUnicodeCharacter{0415}{\CYRE}
1143 \DeclareUnicodeCharacter{0416}{\CYRZH}
1144 \DeclareUnicodeCharacter{0417}{\CYRZ}
1145 \DeclareUnicodeCharacter{0418}{\CYRI}
1146 \DeclareUnicodeCharacter{0419}{\CYRISHRT}
1147 \DeclareUnicodeCharacter{041A}{\CYRK}
1148 \DeclareUnicodeCharacter{041B}{\CYRL}
1149 \DeclareUnicodeCharacter{041C}{\CYRM}
1150 \DeclareUnicodeCharacter{041D}{\CYRN}
1151 \DeclareUnicodeCharacter{041E}{\CYRO}
1152 \DeclareUnicodeCharacter{041F}{\CYRP}
1153 \DeclareUnicodeCharacter{0420}{\CYRR}
1154 \DeclareUnicodeCharacter{0421}{\CYRS}
1155 \DeclareUnicodeCharacter{0422}{\CYRT}
1156 \DeclareUnicodeCharacter{0423}{\CYRU}
1157 \DeclareUnicodeCharacter{0424}{\CYRF}
1158 \DeclareUnicodeCharacter{0425}{\CYRH}
1159 \DeclareUnicodeCharacter{0426}{\CYRC}
1160 \DeclareUnicodeCharacter{0427}{\CYRCH}
1161 \DeclareUnicodeCharacter{0428}{\CYRSH}
1162 \DeclareUnicodeCharacter{0429}{\CYRSHCH}
1163 \DeclareUnicodeCharacter{042A}{\CYRHRDSN}
1164 \DeclareUnicodeCharacter{042B}{\CYRERY}
1165 \DeclareUnicodeCharacter{042C}{\CYRSFTSN}
1166 \DeclareUnicodeCharacter{042D}{\CYREREV}
1167 \DeclareUnicodeCharacter{042E}{\CYRYU}
1168 \DeclareUnicodeCharacter{042F}{\CYRYA}
1169 \DeclareUnicodeCharacter{0430}{\cyra}
1170 \DeclareUnicodeCharacter{0431}{\cyrb}
1171 \DeclareUnicodeCharacter{0432}{\cyrv}
1172 \DeclareUnicodeCharacter{0433}{\cyrg}
1173 \DeclareUnicodeCharacter{0434}{\cyrd}
1174 \DeclareUnicodeCharacter{0435}{\cyre}
1175 \DeclareUnicodeCharacter{0436}{\cyrzh}
1176 \DeclareUnicodeCharacter{0437}{\cyrz}
1177 \DeclareUnicodeCharacter{0438}{\cyri}
1178 \DeclareUnicodeCharacter{0439}{\cyrishrt}
1179 \DeclareUnicodeCharacter{043A}{\cyrk}
1180 \DeclareUnicodeCharacter{043B}{\cyrl}
1181 \DeclareUnicodeCharacter{043C}{\cyrm}
1182 \DeclareUnicodeCharacter{043D}{\cyrn}
1183 \DeclareUnicodeCharacter{043E}{\cyro}
1184 \DeclareUnicodeCharacter{043F}{\cyrp}
1185 \DeclareUnicodeCharacter{0440}{\cyrr}
1186 \DeclareUnicodeCharacter{0441}{\cyrs}
1187 \DeclareUnicodeCharacter{0442}{\cyrt}
1188 \DeclareUnicodeCharacter{0443}{\cyru}
1189 \DeclareUnicodeCharacter{0444}{\cyrf}
1190 \DeclareUnicodeCharacter{0445}{\cyrh}
1191 \DeclareUnicodeCharacter{0446}{\cyrc}
1192 \DeclareUnicodeCharacter{0447}{\cyrch}
1193 \DeclareUnicodeCharacter{0448}{\cyrsh}
1194 \DeclareUnicodeCharacter{0449}{\cyrshch}
1195 \DeclareUnicodeCharacter{044A}{\cyrhrdsn}
1196 \DeclareUnicodeCharacter{044B}{\cyrery}
1197 \DeclareUnicodeCharacter{044C}{\cyrsftsn}
1198 \DeclareUnicodeCharacter{044D}{\cyrerev}
1199 \DeclareUnicodeCharacter{044E}{\cyryu}
1200 \DeclareUnicodeCharacter{044F}{\cyrya}
1201 \DeclareUnicodeCharacter{0450}{\@tabacckludge`\cyre}
1202 \DeclareUnicodeCharacter{0451}{\cyryo}
1203 %</all,x2,t2c,t2b,t2a,ot2,lcy>
1204 %<all,x2,t2a,ot2>\DeclareUnicodeCharacter{0452}{\cyrdje}
1205 %<*all,x2,t2c,t2b,t2a,ot2,lcy>
1206 \DeclareUnicodeCharacter{0453}{\@tabacckludge'\cyrg}
1207 %</all,x2,t2c,t2b,t2a,ot2,lcy>
1208 %<all,x2,t2a,ot2,lcy>\DeclareUnicodeCharacter{0454}{\cyrie}
1209 %<all,x2,t2c,t2b,t2a,ot2>\DeclareUnicodeCharacter{0455}{\cyrdze}
1210 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{0456}{\cyrii}
1211 %<all,x2,t2a,lcy>\DeclareUnicodeCharacter{0457}{\cyryi}
1212 %<all,x2,t2c,t2b,t2a,ot2>\DeclareUnicodeCharacter{0458}{\cyrje}
1213 %<all,x2,t2b,t2a,ot2>\DeclareUnicodeCharacter{0459}{\cyrlje}
1214 %<all,x2,t2b,t2a,ot2>\DeclareUnicodeCharacter{045A}{\cyrnje}
1215 %<all,x2,t2a,ot2>\DeclareUnicodeCharacter{045B}{\cyrtshe}
1216 %<*all,x2,t2c,t2b,t2a,ot2,lcy>
1217 \DeclareUnicodeCharacter{045C}{\@tabacckludge'\cyrk}
1218 \DeclareUnicodeCharacter{045D}{\@tabacckludge`\cyri}
1219 %</all,x2,t2c,t2b,t2a,ot2,lcy>
1220 %<all,x2,t2b,t2a,lcy>\DeclareUnicodeCharacter{045E}{\cyrushrt}
1221 %<all,x2,t2c,t2a,ot2>\DeclareUnicodeCharacter{045F}{\cyrdzhe}
1222 %<all,x2,ot2>\DeclareUnicodeCharacter{0462}{\CYRYAT}
1223 %<all,x2,ot2>\DeclareUnicodeCharacter{0463}{\cyryat}
1224 %<all,x2>\DeclareUnicodeCharacter{046A}{\CYRBYUS}
1225 %<all,x2>\DeclareUnicodeCharacter{046B}{\cyrbyus}
% The next two declarations are questionable: the encoding definition
% should probably contain |\CYROTLD| and |\cyrotld|. Alternatively, if
% the characters in the X2 encoding are really meant to represent the
% historical characters at Ux0472 and Ux0473 (they look like them), then
% they would need to change instead.
% However, their looks are probably a font designer's decision, and the next
% two mappings are wrong, or rather the names in OT2 should change for
% On the other hand, the name |\CYROTLD| is somewhat questionable, as the
% Unicode standard only describes a ``Cyrillic barred O'' while |TLD| refers
% to a tilde (which is more or less what the ``Cyrillic FITA'' looks like
% according to the Unicode book).
1242 %<all,ot2>\DeclareUnicodeCharacter{0472}{\CYRFITA}
1243 %<all,ot2>\DeclareUnicodeCharacter{0473}{\cyrfita}
1247 %<all,x2,ot2>\DeclareUnicodeCharacter{0474}{\CYRIZH}
1248 %<all,x2,ot2>\DeclareUnicodeCharacter{0475}{\cyrizh}
% While the double grave accent seems to exist in the X2, T2A, T2B and T2C
% encodings, the letter izhitsa exists only in X2 and OT2. Therefore,
% izhitsa with double grave seems to be possible only with X2.
1254 %<all,x2>\DeclareUnicodeCharacter{0476}{\C\CYRIZH}
1255 %<all,x2>\DeclareUnicodeCharacter{0477}{\C\cyrizh}
1259 %<all,t2c>\DeclareUnicodeCharacter{048C}{\CYRSEMISFTSN}
1260 %<all,t2c>\DeclareUnicodeCharacter{048D}{\cyrsemisftsn}
1261 %<all,t2c>\DeclareUnicodeCharacter{048E}{\CYRRTICK}
1262 %<all,t2c>\DeclareUnicodeCharacter{048F}{\cyrrtick}
1263 %<all,x2,t2a,lcy>\DeclareUnicodeCharacter{0490}{\CYRGUP}
1264 %<all,x2,t2a,lcy>\DeclareUnicodeCharacter{0491}{\cyrgup}
1265 %<all,x2,t2b,t2a>\DeclareUnicodeCharacter{0492}{\CYRGHCRS}
1266 %<all,x2,t2b,t2a>\DeclareUnicodeCharacter{0493}{\cyrghcrs}
1267 %<all,x2,t2c,t2b>\DeclareUnicodeCharacter{0494}{\CYRGHK}
1268 %<all,x2,t2c,t2b>\DeclareUnicodeCharacter{0495}{\cyrghk}
1269 %<all,x2,t2b,t2a>\DeclareUnicodeCharacter{0496}{\CYRZHDSC}
1270 %<all,x2,t2b,t2a>\DeclareUnicodeCharacter{0497}{\cyrzhdsc}
1271 %<all,x2,t2a>\DeclareUnicodeCharacter{0498}{\CYRZDSC}
1272 %<all,x2,t2a>\DeclareUnicodeCharacter{0499}{\cyrzdsc}
1273 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{049A}{\CYRKDSC}
1274 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{049B}{\cyrkdsc}
1275 %<all,x2,t2a>\DeclareUnicodeCharacter{049C}{\CYRKVCRS}
1276 %<all,x2,t2a>\DeclareUnicodeCharacter{049D}{\cyrkvcrs}
1277 %<all,x2,t2c>\DeclareUnicodeCharacter{049E}{\CYRKHCRS}
1278 %<all,x2,t2c>\DeclareUnicodeCharacter{049F}{\cyrkhcrs}
1279 %<all,x2,t2a>\DeclareUnicodeCharacter{04A0}{\CYRKBEAK}
1280 %<all,x2,t2a>\DeclareUnicodeCharacter{04A1}{\cyrkbeak}
1281 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04A2}{\CYRNDSC}
1282 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04A3}{\cyrndsc}
1283 %<all,x2,t2b,t2a>\DeclareUnicodeCharacter{04A4}{\CYRNG}
1284 %<all,x2,t2b,t2a>\DeclareUnicodeCharacter{04A5}{\cyrng}
1285 %<all,x2,t2c>\DeclareUnicodeCharacter{04A6}{\CYRPHK}
1286 %<all,x2,t2c>\DeclareUnicodeCharacter{04A7}{\cyrphk}
1287 %<all,x2,t2c>\DeclareUnicodeCharacter{04A8}{\CYRABHHA}
1288 %<all,x2,t2c>\DeclareUnicodeCharacter{04A9}{\cyrabhha}
1289 %<all,x2,t2a>\DeclareUnicodeCharacter{04AA}{\CYRSDSC}
1290 %<all,x2,t2a>\DeclareUnicodeCharacter{04AB}{\cyrsdsc}
1291 %<all,x2,t2c>\DeclareUnicodeCharacter{04AC}{\CYRTDSC}
1292 %<all,x2,t2c>\DeclareUnicodeCharacter{04AD}{\cyrtdsc}
1293 %<all,x2,t2b,t2a>\DeclareUnicodeCharacter{04AE}{\CYRY}
1294 %<all,x2,t2b,t2a>\DeclareUnicodeCharacter{04AF}{\cyry}
1295 %<all,x2,t2a>\DeclareUnicodeCharacter{04B0}{\CYRYHCRS}
1296 %<all,x2,t2a>\DeclareUnicodeCharacter{04B1}{\cyryhcrs}
1297 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04B2}{\CYRHDSC}
1298 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04B3}{\cyrhdsc}
1299 %<all,x2,t2c>\DeclareUnicodeCharacter{04B4}{\CYRTETSE}
1300 %<all,x2,t2c>\DeclareUnicodeCharacter{04B5}{\cyrtetse}
1301 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04B6}{\CYRCHRDSC}
1302 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04B7}{\cyrchrdsc}
1303 %<all,x2,t2a>\DeclareUnicodeCharacter{04B8}{\CYRCHVCRS}
1304 %<all,x2,t2a>\DeclareUnicodeCharacter{04B9}{\cyrchvcrs}
1305 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04BA}{\CYRSHHA}
1306 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04BB}{\cyrshha}
1307 %<all,x2,t2c>\DeclareUnicodeCharacter{04BC}{\CYRABHCH}
1308 %<all,x2,t2c>\DeclareUnicodeCharacter{04BD}{\cyrabhch}
1309 %<all,x2,t2c>\DeclareUnicodeCharacter{04BE}{\CYRABHCHDSC}
1310 %<all,x2,t2c>\DeclareUnicodeCharacter{04BF}{\cyrabhchdsc}
% The character |\CYRpalochka| is not defined by OT2 and LCY. However, it
% looks identical to |\CYRII|, and the Unicode standard explicitly refers
% to that (and to Latin I). So perhaps those encodings could get an alias?
% On the other hand, why are there two distinct slots in the T2 encodings
% even though they are so pressed for space? Perhaps they don't always look
1319 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04C0}{\CYRpalochka}
1323 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04C1}{\U\CYRZH}
1324 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04C2}{\U\cyrzh}
1325 %<all,x2,t2b>\DeclareUnicodeCharacter{04C3}{\CYRKHK}
1326 %<all,x2,t2b>\DeclareUnicodeCharacter{04C4}{\cyrkhk}
% According to the Unicode standard, Ux04C5 should be an L with ``tail'', not
% with descender (which also exists as Ux04A2), but it looks as if the char
% names do not make this distinction. Should they?
1332 %<all,x2,t2c,t2b>\DeclareUnicodeCharacter{04C5}{\CYRLDSC}
1333 %<all,x2,t2c,t2b>\DeclareUnicodeCharacter{04C6}{\cyrldsc}
1337 %<all,x2,t2c,t2b>\DeclareUnicodeCharacter{04C7}{\CYRNHK}
1338 %<all,x2,t2c,t2b>\DeclareUnicodeCharacter{04C8}{\cyrnhk}
1339 %<all,x2,t2b>\DeclareUnicodeCharacter{04CB}{\CYRCHLDSC}
1340 %<all,x2,t2b>\DeclareUnicodeCharacter{04CC}{\cyrchldsc}
% According to the Unicode standard, Ux04CD should be an M with ``tail'', not
% with descender. However, this time there is no M with descender in the
1346 %<all,x2,t2c>\DeclareUnicodeCharacter{04CD}{\CYRMDSC}
1347 %<all,x2,t2c>\DeclareUnicodeCharacter{04CE}{\cyrmdsc}
1351 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04D0}{\U\CYRA}
1352 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04D1}{\U\cyra}
1353 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04D2}{\"\CYRA}
1354 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04D3}{\"\cyra}
1355 %<all,x2,t2a>\DeclareUnicodeCharacter{04D4}{\CYRAE}
1356 %<all,x2,t2a>\DeclareUnicodeCharacter{04D5}{\cyrae}
1357 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04D6}{\U\CYRE}
1358 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04D7}{\U\cyre}
1359 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04D8}{\CYRSCHWA}
1360 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04D9}{\cyrschwa}
1361 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04DA}{\"\CYRSCHWA}
1362 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04DB}{\"\cyrschwa}
1363 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04DC}{\"\CYRZH}
1364 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04DD}{\"\cyrzh}
1365 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04DE}{\"\CYRZ}
1366 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04DF}{\"\cyrz}
1367 %<all,x2,t2c,t2b>\DeclareUnicodeCharacter{04E0}{\CYRABHDZE}
1368 %<all,x2,t2c,t2b>\DeclareUnicodeCharacter{04E1}{\cyrabhdze}
1369 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04E2}{\@tabacckludge=\CYRI}
1370 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04E3}{\@tabacckludge=\cyri}
1371 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04E4}{\"\CYRI}
1372 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04E5}{\"\cyri}
1373 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04E6}{\"\CYRO}
1374 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04E7}{\"\cyro}
1375 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04E8}{\CYROTLD}
1376 %<all,x2,t2c,t2b,t2a>\DeclareUnicodeCharacter{04E9}{\cyrotld}
1377 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04EC}{\"\CYREREV}
1378 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04ED}{\"\cyrerev}
1379 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04EE}{\@tabacckludge=\CYRU}
1380 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04EF}{\@tabacckludge=\cyru}
1381 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04F0}{\"\CYRU}
1382 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04F1}{\"\cyru}
1383 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04F2}{\H\CYRU}
1384 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04F3}{\H\cyru}
1385 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04F4}{\"\CYRCH}
1386 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04F5}{\"\cyrch}
1387 %<all,x2,t2b>\DeclareUnicodeCharacter{04F6}{\CYRGDSC}
1388 %<all,x2,t2b>\DeclareUnicodeCharacter{04F7}{\cyrgdsc}
1389 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04F8}{\"\CYRERY}
1390 %<all,x2,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{04F9}{\"\cyrery}
1391 %<all,t2b>\DeclareUnicodeCharacter{04FA}{\CYRGDSCHCRS}
1392 %<all,t2b>\DeclareUnicodeCharacter{04FB}{\cyrgdschcrs}
1393 %<all,x2,t2b>\DeclareUnicodeCharacter{04FC}{\CYRHHK}
1394 %<all,x2,t2b>\DeclareUnicodeCharacter{04FD}{\cyrhhk}
1395 %<all,t2b>\DeclareUnicodeCharacter{04FE}{\CYRHHCRS}
1396 %<all,t2b>\DeclareUnicodeCharacter{04FF}{\cyrhhcrs}
1397 %<all,ts1>\DeclareUnicodeCharacter{0E3F}{\textbaht}
1398 %<all,t1>\DeclareUnicodeCharacter{1E02}{\.B}
1399 %<all,t1>\DeclareUnicodeCharacter{1E03}{\.b}
1400 %<all,x2,t2c,t2b,t2a,t1,utf8>\DeclareUnicodeCharacter{200C}{\textcompwordmark}
1402 % \changes{v1.1s}{2016/02/28}{Add more hyphens and dashes}
1404 %<all,t1>\DeclareUnicodeCharacter{2010}{-}
1405 %<all,t1>\DeclareUnicodeCharacter{2011}{\mbox{-}}
% U+2012 should be the width of a digit; an en dash is acceptable in many
% fonts, including Computer Modern.
1409 %<all,t1>\DeclareUnicodeCharacter{2012}{\textendash}
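% A document that needs a true digit-width dash for U+2012 could override
% the mapping in its preamble. The following is only a sketch and not part
% of this file. It assumes an e-\TeX{} engine (for |\fontcharwd|), a font
% encoding with the digit zero in slot 48 (as in |OT1| and |T1|), and that
% the declaration comes after the font and input encodings have been
% loaded, so that it is not overwritten by the mapping above.
% \begin{verbatim}
% \documentclass{article}
% \usepackage[T1]{fontenc}
% \usepackage[utf8]{inputenc}
% % Centre an en dash in a box as wide as the digit 0 of the current font.
% \DeclareUnicodeCharacter{2012}{%
%   \leavevmode\hbox to \fontcharwd\font`0{\hss\textendash\hss}}
% \begin{document}
% 555^^e2^^80^^92{}1234 % U+2012 written as its three UTF-8 octets
% \end{document}
% \end{verbatim}
% The |\hss| glue on both sides keeps the box at digit width even in fonts
% whose en dash is slightly wider or narrower than a digit.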
1410 %<*all,x2,t2c,t2b,t2a,t1,ot2,ot1,ly1,lcy>
1411 \DeclareUnicodeCharacter{2013}{\textendash}
1412 \DeclareUnicodeCharacter{2014}{\textemdash}
1414 % U+2015 is Horizontal bar
1416 %<all,t1>\DeclareUnicodeCharacter{2015}{\textemdash}
1417 %</all,x2,t2c,t2b,t2a,t1,ot2,ot1,ly1,lcy>
1418 %<all,ts1>\DeclareUnicodeCharacter{2016}{\textbardbl}
1419 %<*all,x2,t2c,t2b,t2a,t1,ot2,ot1,lcy>
1420 \DeclareUnicodeCharacter{2018}{\textquoteleft}
1421 \DeclareUnicodeCharacter{2019}{\textquoteright}
1422 %</all,x2,t2c,t2b,t2a,t1,ot2,ot1,lcy>
1423 %<all,t1>\DeclareUnicodeCharacter{201A}{\quotesinglbase}
1424 %<*all,x2,t2c,t2b,t2a,t1,ot2,ot1,ly1,lcy>
1425 \DeclareUnicodeCharacter{201C}{\textquotedblleft}
1426 \DeclareUnicodeCharacter{201D}{\textquotedblright}
1427 %</all,x2,t2c,t2b,t2a,t1,ot2,ot1,ly1,lcy>
1428 %<all,x2,t2c,t2b,t2a,t1,lcy>\DeclareUnicodeCharacter{201E}{\quotedblbase}
1429 %<all,ts1,oms,ly1>\DeclareUnicodeCharacter{2020}{\textdagger}
1430 %<all,ts1,oms,ly1>\DeclareUnicodeCharacter{2021}{\textdaggerdbl}
1431 %<all,ts1,oms,ly1>\DeclareUnicodeCharacter{2022}{\textbullet}
1432 %<all,ly1,utf8>\DeclareUnicodeCharacter{2026}{\textellipsis}
1433 %<*all,x2,ts1,t2c,t2b,t2a,t1,ly1>
1434 \DeclareUnicodeCharacter{2030}{\textperthousand}
1435 %</all,x2,ts1,t2c,t2b,t2a,t1,ly1>
1436 %<*all,x2,ts1,t2c,t2b,t2a,t1>
1437 \DeclareUnicodeCharacter{2031}{\textpertenthousand}
1438 %</all,x2,ts1,t2c,t2b,t2a,t1>
1439 %<all,t1,ly1>\DeclareUnicodeCharacter{2039}{\guilsinglleft}
1440 %<all,t1,ly1>\DeclareUnicodeCharacter{203A}{\guilsinglright}
1441 %<all,ts1>\DeclareUnicodeCharacter{203B}{\textreferencemark}
1442 %<all,ts1>\DeclareUnicodeCharacter{203D}{\textinterrobang}
1443 %<all,ts1>\DeclareUnicodeCharacter{2044}{\textfractionsolidus}
1444 %<all,ts1>\DeclareUnicodeCharacter{204E}{\textasteriskcentered}
1445 %<all,ts1>\DeclareUnicodeCharacter{2052}{\textdiscount}
1446 %<all,ts1>\DeclareUnicodeCharacter{20A1}{\textcolonmonetary}
1447 %<all,ts1>\DeclareUnicodeCharacter{20A4}{\textlira}
1448 %<all,ts1>\DeclareUnicodeCharacter{20A6}{\textnaira}
1449 %<all,ts1>\DeclareUnicodeCharacter{20A9}{\textwon}
1450 %<all,ts1>\DeclareUnicodeCharacter{20AB}{\textdong}
1451 %<all,ts1>\DeclareUnicodeCharacter{20AC}{\texteuro}
1452 %<all,ts1>\DeclareUnicodeCharacter{20B1}{\textpeso}
1453 %<all,ts1>\DeclareUnicodeCharacter{2103}{\textcelsius}
1454 %<all,x2,ts1,t2c,t2b,t2a,ot2,lcy>\DeclareUnicodeCharacter{2116}{\textnumero}
1455 %<all,ts1>\DeclareUnicodeCharacter{2117}{\textcircledP}
1456 %<all,ts1>\DeclareUnicodeCharacter{211E}{\textrecipe}
1457 %<all,ts1>\DeclareUnicodeCharacter{2120}{\textservicemark}
1458 %<all,ts1,ly1,utf8>\DeclareUnicodeCharacter{2122}{\texttrademark}
1459 %<all,ts1>\DeclareUnicodeCharacter{2126}{\textohm}
1460 %<all,ts1>\DeclareUnicodeCharacter{2127}{\textmho}
1461 %<all,ts1>\DeclareUnicodeCharacter{212E}{\textestimated}
1462 %<all,ts1>\DeclareUnicodeCharacter{2190}{\textleftarrow}
1463 %<all,ts1>\DeclareUnicodeCharacter{2191}{\textuparrow}
1464 %<all,ts1>\DeclareUnicodeCharacter{2192}{\textrightarrow}
1465 %<all,ts1>\DeclareUnicodeCharacter{2193}{\textdownarrow}
1466 %<all,x2,ts1,t2c,t2b,t2a>\DeclareUnicodeCharacter{2329}{\textlangle}
1467 %<all,x2,ts1,t2c,t2b,t2a>\DeclareUnicodeCharacter{232A}{\textrangle}
1468 %<all,ts1>\DeclareUnicodeCharacter{2422}{\textblank}
1469 %<all,x2,t2c,t2b,t2a,t1,utf8>\DeclareUnicodeCharacter{2423}{\textvisiblespace}
1470 %<all,ts1>\DeclareUnicodeCharacter{25E6}{\textopenbullet}
1471 %<all,ts1>\DeclareUnicodeCharacter{25EF}{\textbigcircle}
1472 %<all,ts1>\DeclareUnicodeCharacter{266A}{\textmusicalnote}
1473 %<all,t1>\DeclareUnicodeCharacter{1E20}{\@tabacckludge=G}
1474 %<all,t1>\DeclareUnicodeCharacter{1E21}{\@tabacckludge=g}
1477 % \subsection{Notes}
1479 % \changes{v1.1e}{2004/05/22}{Added notes on inconsistency with `8-bit files'.}
% The following inputs are inconsistent with the 8-bit inputenc files
% since they always produce only the `text character'. This is an
% area where inputenc is notoriously confused.
1484 % %<all,ts1,t1,ot1,ly1>\DeclareUnicodeCharacter{00A3}{\textsterling}
1485 % %<*all,x2,ts1,t2c,t2b,t2a,oms,ly1>
1486 % \DeclareUnicodeCharacter{00A7}{\textsection}
1487 % %</all,x2,ts1,t2c,t2b,t2a,oms,ly1>
1488 % %<all,ts1,utf8>\DeclareUnicodeCharacter{00A9}{\textcopyright}
1489 % %<all,ts1>\DeclareUnicodeCharacter{00B1}{\textpm}
1490 % %<all,ts1,oms,ly1>\DeclareUnicodeCharacter{00B6}{\textparagraph}
1491 % %<all,ts1,oms,ly1>\DeclareUnicodeCharacter{2020}{\textdagger}
1492 % %<all,ts1,oms,ly1>\DeclareUnicodeCharacter{2021}{\textdaggerdbl}
1493 % %<all,ly1,utf8>\DeclareUnicodeCharacter{2026}{\textellipsis}
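% A document that needs one of these characters in both text and math can
% redeclare it in the preamble. The following is a sketch of one possible
% approach, not something this file provides. It assumes that |\textpm| is
% available (through \texttt{textcomp} on older kernels) and that the
% redeclaration comes after all encodings have been loaded.
% \begin{verbatim}
% \documentclass{article}
% \usepackage[utf8]{inputenc}
% \usepackage{textcomp}  % supplies \textpm (TS1) on older kernels
% % Choose the math or the text symbol depending on the current mode.
% \DeclareUnicodeCharacter{00B1}{\ifmmode\pm\else\textpm\fi}
% \begin{document}
% Text: 3 ^^c2^^b1 1; math: $3 ^^c2^^b1 1$. % U+00B1 as UTF-8 octets
% \end{document}
% \end{verbatim}
% The same pattern could be applied to the other characters in the list
% wherever a math-mode counterpart exists.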
1496 % The following definitions are in an encoding file but have no
1497 % direct equivalent in Unicode, or they simply do not make sense in that
1498 % context (or we have not yet found anything or \ldots :-). For
1499 % example, the non-combining accent characters are certainly
1500 % available somewhere but these are not equivalent to a \TeX{}
1503 %\DeclareTextSymbol{\j}{OT1}{17}
1504 %\DeclareTextSymbol{\SS}{T1}{223}
1505 %\DeclareTextSymbol{\textcompwordmark}{T1}{23}
1507 %\DeclareTextAccent{\"}{OT1}{127}
1508 %\DeclareTextAccent{\'}{OT1}{19}
1509 %\DeclareTextAccent{\.}{OT1}{95}
1510 %\DeclareTextAccent{\=}{OT1}{22}
1511 %\DeclareTextAccent{\H}{OT1}{125}
1512 %\DeclareTextAccent{\^}{OT1}{94}
1513 %\DeclareTextAccent{\`}{OT1}{18}
1514 %\DeclareTextAccent{\r}{OT1}{23}
1515 %\DeclareTextAccent{\u}{OT1}{21}
1516 %\DeclareTextAccent{\v}{OT1}{20}
1517 %\DeclareTextAccent{\~}{OT1}{126}
1518 %\DeclareTextCommand{\b}{OT1}[1]
1519 %\DeclareTextCommand{\c}{OT1}[1]
1520 %\DeclareTextCommand{\d}{OT1}[1]
1521 %\DeclareTextCommand{\k}{T1}[1]
1526 % \subsection{Mappings for OT1 glyphs}
% This is even more incomplete, as again it covers only the single
% glyphs from |OT1| plus some that have been explicitly defined for
% this encoding. Everything that is provided in |T1|, and that
% could be provided as composite glyphs via |OT1|, could and
% probably should be set up as well. That leaves the many things
% that are not provided in |T1| but can be provided in |OT1| (and
% in |T1|) by composite glyphs.
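% As an illustration of such a set-up at the document level, a character
% that the table above maps only for |T1| can be declared by hand when
% only |OT1| is loaded. It is then built as a composite from the |OT1|
% accent. This is a sketch under the assumption that the kernel in use
% does not already provide the mapping for |OT1|. The choice of U+00E9 is
% arbitrary.
% \begin{verbatim}
% \documentclass{article}          % OT1 is the default encoding
% \usepackage[utf8]{inputenc}
% \makeatletter
% % Same definition as used for T1 above: e with acute accent.
% \DeclareUnicodeCharacter{00E9}{\@tabacckludge'e}
% \makeatother
% \begin{document}
% caf^^c3^^a9 % U+00E9 written as its two UTF-8 octets
% \end{document}
% \end{verbatim}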
1536 % Stuff not mapped (note that |\j| ($\jmath$) is not equivalent to any
1537 % Unicode character):
1539 %\DeclareTextSymbol{\j}{OT1}{17}
1540 %\DeclareTextAccent{\"}{OT1}{127}
1541 %\DeclareTextAccent{\'}{OT1}{19}
1542 %\DeclareTextAccent{\.}{OT1}{95}
1543 %\DeclareTextAccent{\=}{OT1}{22}
1544 %\DeclareTextAccent{\^}{OT1}{94}
1545 %\DeclareTextAccent{\`}{OT1}{18}
1546 %\DeclareTextAccent{\~}{OT1}{126}
1547 %\DeclareTextAccent{\H}{OT1}{125}
1548 %\DeclareTextAccent{\u}{OT1}{21}
1549 %\DeclareTextAccent{\v}{OT1}{20}
1550 %\DeclareTextAccent{\r}{OT1}{23}
1551 %\DeclareTextCommand{\b}{OT1}[1]
1552 %\DeclareTextCommand{\c}{OT1}[1]
1553 %\DeclareTextCommand{\d}{OT1}[1]
1558 % \subsection{Mappings for OMS glyphs}
1560 % Characters like |\textbackslash| are not mapped, as they are
1561 % (primarily) only in the 7-bit range (positions 0 to 127), and the code
1562 % here sets up mappings only for UTF-8 characters that are at least 2 octets long.
1564 %\DeclareTextSymbol{\textbackslash}{OMS}{110} % "6E
1565 %\DeclareTextSymbol{\textbar}{OMS}{106} % "6A
1566 %\DeclareTextSymbol{\textbraceleft}{OMS}{102} % "66
1567 %\DeclareTextSymbol{\textbraceright}{OMS}{103} % "67
1570 % But the following (and some others) might actually lurk in Unicode somewhere:
1573 %\DeclareTextSymbol{\textasteriskcentered}{OMS}{3} % "03
1574 %\DeclareTextCommand{\textcircled}{OMS}
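%
% To illustrate the distinction drawn in this subsection, here is a
% hedged sketch of what a mapping for one of these glyphs could look
% like. It assumes that U+2217 (ASTERISK OPERATOR) is an acceptable
% match for |\textasteriskcentered|; that pairing is chosen here for
% illustration and is not established by the text above.
%\begin{verbatim}
%% U+005C (backslash) is below "80: one octet (5C), so no mapping is set up.
%% U+2217 takes three octets in UTF-8 (E2 88 97), so it can be mapped:
%\DeclareUnicodeCharacter{2217}{\textasteriskcentered} % assumed pairing
%\end{verbatim}
%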
1580 % \subsection{Mappings for TS1 glyphs}
1582 % Exercise for somebody else.
1585 % \subsection{Mappings for \texttt{latex.ltx} glyphs}
1587 % There is also a collection of characters already set up in the kernel,
1588 % one way or the other. Since these do not clearly relate to any
1589 % particular font encoding they are mapped when the
1590 % \texttt{utf8} support is first set up.
1592 % Also there are a number of |\providecommand|s in the various input
1593 % encoding files which may or may not go into this part.
1594 % \changes{v1.1b}{2004/02/09}{Added commands already defined in the kernel}
1597 % This space is intentionally empty ...
1602 % \section{A test document}
1604 % Here is a very small test document which may or may not survive
1605 % if the current document is transferred from one place to the other.
1609 \documentclass{article}
1611 \usepackage[latin1,utf8]{inputenc}
1612 \usepackage[T1]{fontenc}
1615 \scrollmode % to run past the error below
1619 German umlauts in UTF-8: ^^c3^^a4^^c3^^b6^^c3^^bc %%% äöü
1621 \inputencoding{latin1} % switch to latin1
1623 German umlauts in UTF-8 but read as latin1 (this will produce one
1624 error, since \verb=\textcurrency= is not provided):
1625 ^^c3^^a4^^c3^^b6^^c3^^bc
1627 \inputencoding{utf8} % switch back to utf8
1629 German umlauts in UTF-8: ^^c3^^a4^^c3^^b6^^c3^^bc
1632 Some codes that should produce errors as nothing is set up
1633 for them: ^^c3F ^^e1^^a4^^b6
1635 And some that are not legal UTF-8 sequences: ^^c3X ^^e1XY