components/x11/xorg-docs/src/ctext/ctext.html

   1 <?xml version="1.0" encoding="UTF-8"?>
   2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   3 <html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Compound Text Encoding</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot_9276" /><style xmlns="" type="text/css">/*
   4  * Copyright (c) 2011 Gaetan Nadon
   5  * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
   6  *
   7  * Permission is hereby granted, free of charge, to any person obtaining a
   8  * copy of this software and associated documentation files (the "Software"),
   9  * to deal in the Software without restriction, including without limitation
  10  * the rights to use, copy, modify, merge, publish, distribute, sublicense,
  11  * and/or sell copies of the Software, and to permit persons to whom the
  12  * Software is furnished to do so, subject to the following conditions:
  13  *
  14  * The above copyright notice and this permission notice (including the next
  15  * paragraph) shall be included in all copies or substantial portions of the
  16  * Software.
  17  *
  18  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  19  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  20  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
  21  * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  22  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
  23  * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  24  * DEALINGS IN THE SOFTWARE.
  25  */
  26
  27 /*
  28  * Shared stylesheet for X.Org documentation translated to HTML format
  29  * http://www.sagehill.net/docbookxsl/UsingCSS.html
  30  * http://www.w3schools.com/css/default.asp
  31  * https://addons.mozilla.org/en-US/firefox/addon/web-developer/developers
  32  * https://addons.mozilla.org/en-US/firefox/addon/font-finder/
  33  */
  34
  35 /*
  36  * The sans-serif fonts are considered more legible on a computer screen
  37  * http://dry.sailingissues.com/linux-equivalents-verdana-arial.html
  38  *
  39  */
  40 body {
  41   font-family: "Bitstream Vera Sans", "DejaVu Sans", Tahoma, Geneva, Arial, Sans-serif;
  42   /* In support of using "em" font size unit, the w3c recommended method */
  43   font-size: 100%;
  44 }
  45
  46 /*
  47  * Selection: all elements requiring mono spaced fonts.
  48  *
  49  * The family names attempt to match the proportionally spaced font
  50  * family names such that the same font name is used for both.
  51  * We'd like to use Bitstream, for example, in both proportionally and
  52  * mono spaced font text.
  53  */
  54 .command,
  55 .errorcode,
  56 .errorname,
  57 .errortype,
  58 .filename,
  59 .funcsynopsis,
  60 .function,
  61 .parameter,
  62 .programlisting,
  63 .property,
  64 .screen,
  65 .structname,
  66 .symbol,
  67 .synopsis,
  68 .type
  69 {
  70   font-family:  "Bitstream Vera Sans Mono", "DejaVu Sans Mono", Courier, "Liberation Mono", Monospace;
  71 }
  72
  73 /*
  74  * Books have a title page, a preface, some chapters and appendices,
  75  * a glossary, an index and a bibliography, in that order.
  76  *
  77  * An Article has no preface and no chapters. It has sections, appendices,
  78  * a glossary, an index and a bibliography.
  79  */
  80
  81 /*
  82  * Selection: book main title and subtitle
  83  */
  84 div.book>div.titlepage h1.title,
  85 div.book>div.titlepage h2.subtitle {
  86   text-align: center;
  87 }
  88
  89 /*
  90  * Selection: article main title and subtitle
  91  */
  92 div.article>div.titlepage h2.title,
  93 div.article>div.titlepage h3.subtitle,
  94 div.article>div.sect1>div.titlepage h2.title,
  95 div.article>div.section>div.titlepage h2.title {
  96   text-align: center;
  97 }
  98
  99 /*
 100  * Selection: various types of authors and collaborators, individuals or corporate
 101  *
 102  * These authors are not always contained inside an authorgroup.
 103  * They can be contained inside a lot of different parent types where they might
 104  * not be centered.
 105  * Reducing the margin at the bottom makes a visual separation between authors
 106  * We specify here the ones on the title page, others may be added based on merit.
 107  */
 108 div.titlepage .authorgroup,
 109 div.titlepage .author,
 110 div.titlepage .collab,
 111 div.titlepage .corpauthor,
 112 div.titlepage .corpcredit,
 113 div.titlepage .editor,
 114 div.titlepage .othercredit {
 115   text-align: center;
 116   margin-bottom: 0.25em;
 117 }
 118
 119 /*
 120  * Selection: the affiliation of various types of authors and collaborators,
 121  * individuals or corporate.
 122  */
 123 div.titlepage .affiliation {
 124   text-align: center;
 125 }
 126
 127 /*
 128  * Selection: product release information (X Version 11, Release 7)
 129  *
 130  * The releaseinfo element can be contained inside a lot of different parent
 131  * types where it might not be centered.
 132  * We specify here the one on the title page, others may be added based on merit.
 133  */
 134 div.titlepage p.releaseinfo {
 135   font-weight: bold;
 136   text-align: center;
 137 }
 138
 139 /*
 140  * Selection: publishing date
 141  */
 142 div.titlepage .pubdate {
 143   text-align: center;
 144 }
 145
 146 /*
 147  * The legal notices are displayed in smaller sized fonts
 148  * Justification is only supported in IE and therefore not requested.
 149  *
 150  */
 151 .legalnotice {
 152   font-size: small;
 153   font-style: italic;
 154 }
 155
 156 /*
 157  * For documentation having multiple licenses, the copyright and legalnotice
  158  * elements sequence cannot be instantiated multiple times.
 159  * The copyright notice and license text are therefore coded inside a legalnotice
 160  * element. The role attribute on the paragraph is used to allow styling of the
 161  * copyright notice text which should not be italicized.
 162  */
 163 p.multiLicensing {
 164   font-style: normal;
 165   font-size: medium;
 166 }
 167
 168 /*
 169  * Selection: book or article main ToC title
 170  * A paragraph is generated for the title rather than a level 2 heading.
 171  * We do not want to select chapters sub table of contents, only the main one
 172  */
 173 div.book>div.toc>p,
 174 div.article>div.toc>p {
 175   font-size: 1.5em;
 176   text-align: center;
 177 }
 178
 179 /*
 180  * Selection: major sections of a book or an article
 181  *
 182  * Unlike books, articles do not have a titlepage element for appendix.
 183  * Using the selector "div.titlepage h2.title" would be too general.
 184  */
 185 div.book>div.preface>div.titlepage h2.title,
 186 div.book>div.chapter>div.titlepage h2.title,
 187 div.article>div.sect1>div.titlepage h2.title,
 188 div.article>div.section>div.titlepage h2.title,
 189 div.book>div.appendix>div.titlepage h2.title,
 190 div.article>div.appendix h2.title,
 191 div.glossary>div.titlepage h2.title,
 192 div.index>div.titlepage h2.title,
 193 div.bibliography>div.titlepage h2.title {
 194    /* Add a border top over the major parts, just like printed books */
 195    /* The Gray color is already used for the ruler over the main ToC. */
 196   border-top-style: solid;
 197   border-top-width: 2px;
 198   border-top-color: Gray;
 199   /* Put some space between the border and the title */
 200   padding-top: 0.2em;
 201   text-align: center;
 202 }
 203
 204 /*
 205  * A Screen is a verbatim environment for displaying text that the user might
 206  * see on a computer terminal. It is often used to display the results of a command.
 207  *
 208  * http://www.css3.info/preview/rounded-border/
 209  */
 210 .screen {
 211   background: #e0ffff;
 212   border-width: 1px;
 213   border-style: solid;
 214   border-color: #B0C4DE;
 215   border-radius: 1.0em;
 216   /* Browser's vendor properties prior to CSS 3 */
 217   -moz-border-radius: 1.0em;
 218   -webkit-border-radius: 1.0em;
 219   -khtml-border-radius: 1.0em;
 220   margin-left: 1.0em;
 221   margin-right: 1.0em;
 222   padding: 0.5em;
 223 }
 224
 225 /*
  226  * Emphasize program listings with a light shade of gray similar to what
 227  * DocBook XSL guide does: http://www.sagehill.net/docbookxsl/ProgramListings.html
 228  * Found many C API docs on the web using like shades of gray.
 229  */
 230 .programlisting {
 231   background: #F4F4F4;
 232   border-width: 1px;
 233   border-style: solid;
 234   border-color: Gray;
 235   padding: 0.5em;
 236 }
 237
 238 /*
  239  * Emphasize function synopses using a darker shade of gray.
 240  * Add a border such that it stands out more.
 241  * Set the padding so the text does not touch the border.
 242  */
 243 .funcsynopsis, .synopsis {
 244   background: #e6e6fa;
 245   border-width: 1px;
 246   border-style: solid;
 247   border-color: Gray;
 248   clear: both;
 249   margin: 0.5em;
 250   padding: 0.25em;
 251 }
 252
 253 /*
 254  * Selection: paragraphs inside synopsis
 255  *
 256  * Removes the default browser margin, let the container set the padding.
 257  * Paragraphs are not always used in synopsis
 258  */
 259 .funcsynopsis p,
 260 .synopsis p {
 261   margin: 0;
 262   padding: 0;
 263 }
 264
 265 /*
 266  * Selection: variable lists, informal tables and tables
 267  *
 268  * Note the parameter name "variablelist.as.table" in xorg-xhtml.xsl
 269  * A table with rows and columns is constructed inside div.variablelist
 270  *
 271  * Set the left margin so it is indented to the right
 272  * Display informal tables with single line borders
 273  */
 274 table {
 275   margin-left: 0.5em;
 276   border-collapse: collapse;
 277 }
 278
 279 /*
 280  * Selection: paragraphs inside tables
 281  *
 282  * Removes the default browser margin, let the container set the padding.
 283  * Paragraphs are not always used in tables
 284  */
 285 td p {
 286   margin: 0;
 287   padding: 0;
 288 }
 289
 290 /*
 291  * Add some space between the left and right column.
 292  * The vertical alignment helps the reader associate a term
 293  * with a multi-line definition.
 294  */
 295 td, th {
 296   padding-left: 1.0em;
 297   padding-right: 1.0em;
 298   vertical-align: top;
 299 }
 300
 301 .warning {
 302   border: 1px solid red;
 303   background: #FFFF66;
 304   padding-left: 0.5em;
 305 }
 306 </style></head><body><div class="article"><div class="titlepage"><div><div><h2 class="title"><a id="ctext"></a>Compound Text Encoding</h2></div><div><h3 class="subtitle"><em>X Consortium Standard</em></h3></div><div><div class="authorgroup"><div class="author"><h3 class="author"><span class="firstname">Robert</span> <span class="othername">W.</span> <span class="surname">Scheifler</span></h3><div class="affiliation"><span class="orgname">X Consortium<br /></span></div></div></div></div><div><p class="releaseinfo">X Version 11, Release 7.7</p></div><div><p class="releaseinfo">Version 1.1</p></div><div><p class="copyright">Copyright © 1989 X Consortium</p></div><div><div class="legalnotice"><a id="id2525274"></a><p>
 307 Permission is hereby granted, free of charge, to any person obtaining a copy
 308 of this software and associated documentation files (the "Software"), to deal
 309 in the Software without restriction, including without limitation the rights
 310 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 311 copies of the Software, and to permit persons to whom the Software is
 312 furnished to do so, subject to the following conditions:
 313 </p><p>
 314 The above copyright notice and this permission notice shall be included in
 315 all copies or substantial portions of the Software.
 316 </p><p>
 317 THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 318 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 319 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
 320 X CONSORTIUM BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
 321 AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 322 CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 323 </p><p>
 324 Except as contained in this notice, the name of the X Consortium shall not be
 325 used in advertising or otherwise to promote the sale, use or other dealings
 326 in this Software without prior written authorization from the X Consortium.
 327 </p><p>X Window System is a trademark of The Open Group.</p></div></div></div><hr /></div><div class="toc"><p><strong>Table of Contents</strong></p><dl><dt><span class="sect1"><a href="#Overview">Overview</a></span></dt><dt><span class="sect1"><a href="#Values">Values</a></span></dt><dt><span class="sect1"><a href="#Control_Characters">Control Characters</a></span></dt><dt><span class="sect1"><a href="#Standard_Character_Set_Encodings">Standard Character Set Encodings</a></span></dt><dt><span class="sect1"><a href="#Approved_Standard_Encodings">Approved Standard Encodings</a></span></dt><dt><span class="sect1"><a href="#Non_Standard_Character_Set_Encodings">Non-Standard Character Set Encodings</a></span></dt><dt><span class="sect1"><a href="#Directionality">Directionality</a></span></dt><dt><span class="sect1"><a href="#Resources">Resources</a></span></dt><dt><span class="sect1"><a href="#Font_Names">Font Names</a></span></dt><dt><span class="sect1"><a href="#Extensions">Extensions</a></span></dt><dt><span class="sect1"><a href="#Errors">Errors</a></span></dt></dl></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="Overview"></a>Overview</h2></div></div></div><p>
 328 Compound Text is a format for multiple character set data, such as
 329 multi-lingual text.  The format is based on ISO
 330 standards for encoding and combining character sets.  Compound Text is intended
 331 to be used in three main contexts: inter-client communication using selections,
 332 as defined in the
 333 <span class="emphasis"><em>Inter-Client Communication Conventions Manual</em></span>
 334 (ICCCM);
 335 window properties (e.g., window manager hints as defined in the ICCCM);
 336 and resources (e.g., as defined in Xlib and the Xt Intrinsics).
 337 </p><p>
 338 Compound Text is intended as an external representation, or interchange format,
 339 not as an internal representation.  It is expected (but not required) that
 340 clients will convert Compound Text to some internal representation for
 341 processing and rendering, and convert from that internal representation to
 342 Compound Text when providing textual data to another client.
 343 </p></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="Values"></a>Values</h2></div></div></div><p>
 344
 345 The name of this encoding is "COMPOUND_TEXT".  When text values are used in
 346 the ICCCM-compliant selection mechanism or are stored as window properties in
 347 the server, the type used should be the atom for "COMPOUND_TEXT".
 348 </p><p>
 349
 350 Octet values are represented in this document as two decimal numbers in the
 351 form col/row.  This means the value (col * 16) + row.  For example, 02/01 means
 352 the value 33.
 353 </p><p>
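The col/row notation can be checked mechanically.  The following is a minimal
sketch, not part of the standard; the helper names octet_value and
octet_notation are hypothetical and chosen only for illustration.
</p><pre class="programlisting">
```python
def octet_value(notation):
    """Convert col/row notation such as "02/01" to an integer octet value."""
    col_text, row_text = notation.split("/")
    col, row = int(col_text), int(row_text)
    if col not in range(16) or row not in range(16):
        raise ValueError("column and row must each be in 0..15")
    return col * 16 + row


def octet_notation(value):
    """Convert an integer octet value back to col/row notation."""
    if value not in range(256):
        raise ValueError("octet value must be in 0..255")
    col, row = divmod(value, 16)
    return "%02d/%02d" % (col, row)


print(octet_value("02/01"))   # 33
print(octet_notation(27))     # 01/11
```
</pre><p>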
 354 For our purposes, the octet encoding space is divided into four ranges:
 355 </p><div class="informaltable"><table border="0"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /></colgroup><tbody><tr><td align="left">C0</td><td align="left">octets from 00/00 to 01/15</td></tr><tr><td align="left">GL</td><td align="left">octets from 02/00 to 07/15</td></tr><tr><td align="left">C1</td><td align="left">octets from 08/00 to 09/15</td></tr><tr><td align="left">GR</td><td align="left">octets from 10/00 to 15/15</td></tr></tbody></table></div><p>
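Because each range covers whole columns, the range of an octet follows from
its column number alone.  A minimal sketch of that classification, using a
hypothetical helper name octet_range:
</p><pre class="programlisting">
```python
def octet_range(value):
    """Return "C0", "GL", "C1", or "GR" for an octet value in 0..255."""
    if value not in range(256):
        raise ValueError("octet value must be in 0..255")
    column = value // 16
    if column in (0, 1):
        return "C0"          # 00/00 to 01/15
    if column in range(2, 8):
        return "GL"          # 02/00 to 07/15
    if column in (8, 9):
        return "C1"          # 08/00 to 09/15
    return "GR"              # 10/00 to 15/15
```
</pre><p>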
 356
 357 C0 and C1 are "control character" sets, while GL and GR are "graphic
 358 character" sets.  Only a subset of C0 and C1 octets are used in the encoding,
 359 and depending on the character set encoding defined as GL or GR, a subset of
 360 GL and GR octets may be used; see below for details.  All octets (00/00 to
 361 15/15) may appear inside the text of extended segments (defined below).
 362 </p><p>
 363
 364 [For those familiar with ISO 2022, we will use only an 8-bit environment, and
 365 we will always use G0 for GL and G1 for GR.]
 366 </p></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="Control_Characters"></a>Control Characters</h2></div></div></div><p>
 367 In C0, only the following values will be used:
 368 </p><div class="informaltable"><table border="0"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /><col align="left" class="c3" /></colgroup><tbody><tr><td align="left">00/09</td><td align="left">HT</td><td align="left">HORIZONTAL TABULATION</td></tr><tr><td align="left">00/10</td><td align="left">NL</td><td align="left">NEW LINE</td></tr><tr><td align="left">01/11</td><td align="left">ESC</td><td align="left">(ESCAPE)</td></tr></tbody></table></div><p>
 369 In C1, only the following value will be used:
 370 </p><div class="informaltable"><table border="0"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /><col align="left" class="c3" /></colgroup><tbody><tr><td align="left">09/11</td><td align="left">CSI</td><td align="left">CONTROL SEQUENCE INTRODUCER</td></tr></tbody></table></div><p>
 371
 372 [The alternate 7-bit CSI encoding 01/11 05/11 is not used in Compound Text.]
 373 </p><p>
 374
 375 No control sequences are defined in Compound Text for changing the C0 and C1
 376 sets.
 377 </p><p>
 378
 379 A horizontal tab can be represented with the octet 00/09.  Specification of
 380 tabulation width settings is not part of Compound Text and must be obtained
 381 from context (in an unspecified manner).
 382 </p><p>
 383
 384 [Inclusion of horizontal tab is for consistency with the STRING type currently
 385 defined in the ICCCM.]
 386 </p><p>
 387
 388 A newline (line separator/terminator) can be represented with the octet 00/10.
 389 </p><p>
 390
 391 [Note that 00/10 is normally LINEFEED, but is being interpreted as NEWLINE.
 392 This can be thought of as using the (deprecated) NEW LINE mode, E.1.3, in ISO
 393 6429.  Use of this value instead of 08/05 (NEL, NEXT LINE) is for consistency
 394 with the STRING type currently defined in the ICCCM.]
 395 </p><p>
 396
 397 The remaining C0 and C1 values (01/11 and 09/11) are only used in the control
 398 sequences defined below.
 399 </p></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="Standard_Character_Set_Encodings"></a>Standard Character Set Encodings</h2></div></div></div><p>
 400
 401 The default GL and GR sets in Compound Text correspond to the left and right
 402 halves of ISO 8859-1 (Latin 1).  As such, any legal instance of a STRING type
 403 (as defined in the ICCCM) is also a legal instance of type COMPOUND_TEXT.
 404 </p><p>
 405 [The implied initial state in ISO 2022 is defined with the sequence:
  406  01/11 02/00 04/03  G0 and G1 in an 8-bit environment only.  Designation also invokes.
 407  01/11 02/00 04/07  In an 8-bit environment, C1 represented as 8-bits.
 408  01/11 02/00 04/09  Graphic character sets can be 94 or 96.
 409  01/11 02/00 04/11  8-bit code is used.
 410  01/11 02/08 04/02  Designate ASCII into G0.
 411  01/11 02/13 04/01  Designate right-hand part of ISO Latin-1 into G1.
 412 ]
 413 </p><p>
 414 To define one of the approved standard character set encodings to be
 415 the GL set, one of the following control sequences is used:
  416 </p><div class="informaltable"><table border="0"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /><col align="left" class="c3" /><col align="left" class="c4" /></colgroup><tbody><tr><td align="left">01/11</td><td align="left">02/08</td><td align="left">{I} F</td><td align="left">94 character set</td></tr><tr><td align="left">01/11</td><td align="left">02/04</td><td align="left">02/08 {I} F</td><td align="left">94<sup>N</sup> character set</td></tr></tbody></table></div><p>
 417
 418 To define one of the approved standard character set encodings to be
 419 the GR set, one of the following control sequences is used:
 420 </p><div class="informaltable"><table border="0"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /><col align="left" class="c3" /><col align="left" class="c4" /></colgroup><tbody><tr><td align="left">01/11</td><td align="left">02/09</td><td align="left">{I} F</td><td align="left">94 character set</td></tr><tr><td align="left">01/11</td><td align="left">02/13</td><td align="left">{I} F</td><td align="left">96 character set</td></tr><tr><td align="left">01/11</td><td align="left">02/04</td><td align="left">02/09 {I} F</td><td align="left">94<sup>N</sup> character set</td></tr></tbody></table></div><p>
 421
  422 The "F" in the control sequences above stands for "Final character", which
 423 is always in the range 04/00 to 07/14.  The "{I}" stands for zero or more
 424 "intermediate characters", which are always in the range 02/00 to 02/15, with
 425 the first intermediate character always in the range 02/01 to 02/03.  The
 426 registration authority has defined an "{I} F" sequence for each registered
 427 character set encoding.
 428 </p><p>
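The sequence layouts above can be recognized with a small parser.  The
following is a sketch under stated assumptions, not a reference
implementation: the function name parse_designation and the shape of its
return value are hypothetical, it only checks the octet ranges given in this
section, and it does not check that the resulting "{I} F" is one of the
approved encodings.  Truncated input raises IndexError.
</p><pre class="programlisting">
```python
ESC = 0x1B  # 01/11

# Octet following ESC (or following the optional 02/04) -> (set, size).
DESIGNATORS = {
    0x28: ("GL", 94),   # 02/08
    0x29: ("GR", 94),   # 02/09
    0x2D: ("GR", 96),   # 02/13
}


def parse_designation(data, pos=0):
    """Parse one designation sequence that starts at data[pos].

    Returns (target, size, multibyte, intermediates, final, next_pos).
    Raises ValueError if the octets do not match the layouts above.
    """
    if data[pos] != ESC:
        raise ValueError("designation must start with ESC (01/11)")
    pos += 1
    multibyte = data[pos] == 0x24            # 02/04 marks a 94^N set
    if multibyte:
        pos += 1
    if data[pos] not in DESIGNATORS:
        raise ValueError("not a GL or GR designation")
    target, size = DESIGNATORS[data[pos]]
    if multibyte and size == 96:
        raise ValueError("no multi-octet 96 form is listed")
    pos += 1
    start = pos
    while data[pos] in range(0x20, 0x30):    # {I}: 02/00 to 02/15
        pos += 1
    intermediates = bytes(data[start:pos])
    if intermediates and intermediates[0] not in range(0x21, 0x24):
        raise ValueError("first intermediate must be 02/01 to 02/03")
    final = data[pos]
    if final not in range(0x40, 0x7F):       # F: 04/00 to 07/14
        raise ValueError("Final character must be 04/00 to 07/14")
    return target, size, multibyte, intermediates, final, pos + 1
```
</pre><p>
For example, the octets 01/11 02/13 04/01 parse as a 96-character set
designated to GR with Final character 04/01 and no intermediate characters.
</p><p>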
 429
 430 [Final characters for private encodings (in the range 03/00 to 03/15) are not
 431 permitted here in Compound Text.]
 432 </p><p>
 433
 434 For GL, octet 02/00 is always defined as SPACE, and octet 07/15 (normally
 435 DELETE) is never used.  For a 94-character set defined as GR, octets 10/00 and
 436 15/15 are never used.
 437 </p><p>
 438
 439 [This is consistent with ISO 2022.]
 440 </p><p>
 441
 442 A 94<sup>N</sup> character set uses N octets (N &gt; 1) for each character.
 443 The value of N is derived from the column value for F:
 444 </p><div class="informaltable"><table border="0"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /></colgroup><tbody><tr><td align="left">column 04 or 05</td><td align="left">2 octets</td></tr><tr><td align="left">column 06</td><td align="left">3 octets</td></tr><tr><td align="left">column 07</td><td align="left">4 or more octets</td></tr></tbody></table></div><p>
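The table above amounts to a lookup on the column of F.  A minimal sketch,
with a hypothetical helper name; None is used here to stand for the
"4 or more octets" case, which this table alone does not resolve:
</p><pre class="programlisting">
```python
def octets_per_character(final):
    """Return N for a 94^N set from its Final octet; None means 4 or more."""
    if final not in range(0x40, 0x7F):
        raise ValueError("Final character must be in 04/00 to 07/14")
    column = final // 16
    return {4: 2, 5: 2, 6: 3, 7: None}[column]
```
</pre><p>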
 445
 446 In a 94<sup>N</sup> encoding, the octet values 02/00 and 07/15 (in GL) and
 447 10/00 and 15/15 (in GR) are never used.
 448 </p><p>
 449
 450 [The column definitions come from ISO 2022.]
 451 </p><p>
 452
 453 Once a GL or GR set has been defined, all further octets in that range (except
 454 within control sequences and extended segments) are interpreted with respect to
 455 that character set encoding, until the GL or GR set is redefined.  GL and GR
  456 sets can be defined independently; they do not have to be defined in pairs.
 457 </p><p>
 458
 459 Note that when actually using a character set encoding as the GR set, you must
 460 force the most significant bit (08/00) of each octet to be a one, so that it
 461 falls in the range 10/00 to 15/15.
 462 </p><p>
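As an illustration of forcing the most significant bit, the following sketch
maps code positions given in their 7-bit form (02/00 to 07/15) into the GR
range.  The helper name to_gr is hypothetical.
</p><pre class="programlisting">
```python
def to_gr(octets):
    """Set the most significant bit of each octet so it falls in GR."""
    for value in octets:
        if value not in range(0x20, 0x80):
            raise ValueError("expected octets in 02/00 to 07/15")
    return bytes(value | 0x80 for value in octets)
```
</pre><p>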
 463
 464 [Control sequences to specify character set encoding revisions (as in section
 465 6.3.13 of ISO 2022) are not used in Compound Text.  Revision indicators do not
 466 appear to provide useful information in the context of Compound Text.  The most
 467 recent revision can always be assumed, since revisions are upward compatible.]
 468 </p></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="Approved_Standard_Encodings"></a>Approved Standard Encodings</h2></div></div></div><p>
 469 The following are the approved standard encodings to be used with Compound
 470 Text.  Note that none have Intermediate characters; however, a good parser will
 471 still deal with Intermediate characters in the event that additional encodings
 472 are later added to this list.
  473 </p><div class="informaltable"><table border="1"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /><col align="left" class="c3" /></colgroup><thead><tr><th align="left">{I} F</th><th align="left">94/96</th><th align="left">Description</th></tr></thead><tbody><tr><td align="left">04/02</td><td align="left">94</td><td align="left">
 474 7-bit ASCII graphics (ANSI X3.4-1968), Left half of ISO 8859 sets
 475       </td></tr><tr><td align="left">04/09</td><td align="left">94</td><td align="left">
 476 Right half of JIS X0201-1976 (reaffirmed 1984),
 477 8-Bit Alphanumeric-Katakana Code
 478       </td></tr><tr><td align="left">04/10</td><td align="left">94</td><td align="left">
 479 Left half of JIS X0201-1976 (reaffirmed 1984),
 480 8-Bit Alphanumeric-Katakana Code
  481       </td></tr><tr><td align="left">04/01</td><td align="left">96</td><td align="left">Right half of ISO 8859-1, Latin alphabet No. 1</td></tr><tr><td align="left">04/02</td><td align="left">96</td><td align="left">Right half of ISO 8859-2, Latin alphabet No. 2</td></tr><tr><td align="left">04/03</td><td align="left">96</td><td align="left">Right half of ISO 8859-3, Latin alphabet No. 3</td></tr><tr><td align="left">04/04</td><td align="left">96</td><td align="left">Right half of ISO 8859-4, Latin alphabet No. 4</td></tr><tr><td align="left">04/06</td><td align="left">96</td><td align="left">Right half of ISO 8859-7, Latin/Greek alphabet</td></tr><tr><td align="left">04/07</td><td align="left">96</td><td align="left">Right half of ISO 8859-6, Latin/Arabic alphabet</td></tr><tr><td align="left">04/08</td><td align="left">96</td><td align="left">Right half of ISO 8859-8, Latin/Hebrew alphabet</td></tr><tr><td align="left">04/12</td><td align="left">96</td><td align="left">Right half of ISO 8859-5, Latin/Cyrillic alphabet</td></tr><tr><td align="left">04/13</td><td align="left">96</td><td align="left">Right half of ISO 8859-9, Latin alphabet No. 5</td></tr><tr><td align="left">04/01</td><td align="left">94<sup>2</sup></td><td align="left">GB2312-1980, China (PRC) Hanzi</td></tr><tr><td align="left">04/02</td><td align="left">94<sup>2</sup></td><td align="left">JIS X0208-1983, Japanese Graphic Character Set</td></tr><tr><td align="left">04/03</td><td align="left">94<sup>2</sup></td><td align="left">KS C5601-1987, Korean Graphic Character Set</td></tr></tbody></table></div><p>
 482
 483 The sets listed as "Left half of ..." should always be defined as GL.  The
 484 sets listed as "Right half of ..." should always be defined as GR.  Other
 485 sets can be defined either as GL or GR.
 486 </p></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="Non_Standard_Character_Set_Encodings"></a>Non-Standard Character Set Encodings</h2></div></div></div><p>
 487 Character set encodings that are not in the list of approved standard
 488 encodings can be included
 489 using "extended segments".  An extended segment begins with one of the
 490 following sequences:
  491 </p><div class="informaltable"><table border="0"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /></colgroup><tbody><tr><td align="left">01/11 02/05 02/15 03/00 M L</td><td align="left">variable number of octets per character</td></tr><tr><td align="left">01/11 02/05 02/15 03/01 M L</td><td align="left">1 octet per character</td></tr><tr><td align="left">01/11 02/05 02/15 03/02 M L</td><td align="left">2 octets per character</td></tr><tr><td align="left">01/11 02/05 02/15 03/03 M L</td><td align="left">3 octets per character</td></tr><tr><td align="left">01/11 02/05 02/15 03/04 M L</td><td align="left">4 octets per character</td></tr></tbody></table></div><p>
 492 [This uses the "other coding system" of ISO 2022, using private Final
 493 characters.]
 494 </p><p>
 495
 496 The "M" and "L" octets represent a 14-bit unsigned value giving the number
 497 of octets that appear in the remainder of the segment.  The number is computed
  498 as ((M - 128) * 128) + (L - 128).  The most significant bits of M and L are always
 499 set to one.  The remainder of the segment consists of two parts, the name of
 500 the character set encoding and the actual text.  The name of the encoding comes
 501 first and is separated from the text by the octet 00/02 (STX, START OF TEXT).
 502 Note that the length defined by M and L includes the encoding name and
 503 separator.
 504 </p><p>
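The framing described above can be sketched as a pair of functions.  This is
an illustration under stated assumptions rather than a definitive
implementation: the function names are hypothetical, and any encoding name
passed in (such as "example-0" in the note below) is a placeholder, not a
registered name.
</p><pre class="programlisting">
```python
ESC = 0x1B                       # 01/11
STX = 0x02                       # 00/02, separates name from text
PREFIX = bytes([ESC, 0x25, 0x2F])  # 01/11 02/05 02/15


def encode_extended_segment(name, text, octets_per_char=0):
    """Build an extended segment; octets_per_char 0 means variable length."""
    if octets_per_char not in range(5):
        raise ValueError("octets_per_char must be 0 (variable) or 1..4")
    payload = name.encode("latin-1") + bytes([STX]) + bytes(text)
    if len(payload) >= 128 * 128:
        raise ValueError("segment too long for a 14-bit length")
    m, l = divmod(len(payload), 128)
    # M and L each carry 7 bits of the length, with the top bit set to one.
    return PREFIX + bytes([0x30 + octets_per_char, m + 128, l + 128]) + payload


def decode_extended_segment(data):
    """Return (octets_per_char, name, text, total_length) for a segment."""
    if 6 > len(data) or data[:3] != PREFIX:
        raise ValueError("not an extended segment")
    if data[3] not in range(0x30, 0x35):
        raise ValueError("unknown extended segment type")
    if 128 > data[4] or 128 > data[5]:
        raise ValueError("M and L must have their top bit set")
    length = (data[4] - 128) * 128 + (data[5] - 128)
    payload = data[6:6 + length]
    if len(payload) != length:
        raise ValueError("segment is truncated")
    name, separator, text = payload.partition(bytes([STX]))
    if not separator:
        raise ValueError("missing STX between name and text")
    return data[3] - 0x30, name.decode("latin-1"), text, 6 + length
```
</pre><p>
For instance, encoding the placeholder name "example-0" with three octets of
text gives a length of 13 (nine name octets, one separator, three text
octets), so M is 08/00 and L is 08/13.
</p><p>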
 505
 506 [The encoding of the length is chosen to avoid having zero octets in Compound
 507 Text when possible, because embedded NUL values are problematic in many C
 508 language routines.  The use of zero octets cannot be ruled out entirely
 509 however, since some octets in the actual text of the extended segment may have
 510 to be zero.]
 511 </p><p>
 512
 513 The name of the encoding should be registered with the X Consortium to avoid
 514 conflicts and should when appropriate match the CharSet Registry and Encoding
 515 registration used in the X Logical Font Description.  The name itself should be
 516 encoded using ISO 8859-1 (Latin 1), should not use question mark (03/15) or
 517 asterisk (02/10), and should use hyphen (02/13) only in accordance with the X
 518 Logical Font Description.
 519 </p><p>
 520
 521 Extended segments are not to be used for any character set encoding that can
 522 be constructed from a GL/GR pair of approved standard encodings. For
 523 example, it is incorrect to use an extended segment for any of the ISO 8859
 524 family of encodings.
 525 </p><p>
 526
 527 It should be noted that the contents of an extended segment are arbitrary;
 528 for example,
 529 they may contain octets in the C0 and C1 ranges, including 00/00, and
 530 octets comprising a given character may differ in their most significant bit.
 531 </p><p>
 532
 533 [ISO-registered "other coding systems" are not used in Compound Text;
 534 extended segments are the only mechanism for non-2022 encodings.]
 535 </p></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="Directionality"></a>Directionality</h2></div></div></div><p>
 536
 537 If desired, horizontal text direction can be indicated using the following
 538 control sequences:
 539 </p><div class="informaltable"><table border="0"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /></colgroup><tbody><tr><td align="left">09/11 03/01 05/13</td><td align="left">begin left-to-right text</td></tr><tr><td align="left">09/11 03/02 05/13</td><td align="left">begin right-to-left text</td></tr><tr><td align="left">09/11 05/13</td><td align="left">end of string</td></tr></tbody></table></div><p>
 540
 541 [This is a subset of the SDS (START DIRECTED STRING) control in the Draft
 542 Bidirectional Addendum to ISO 6429.]
 543 </p><p>
 544
 545 Directionality can be nested.  Logically, a stack of directions is maintained.
 546 Each of the first two control sequences pushes a new direction on the stack,
 547 and the third sequence (revert) pops a direction from the stack.  The stack
 548 starts out empty at the beginning of a Compound Text string.  When the stack is
 549 empty, the directionality of the text is unspecified.
 550 </p><p>
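The stack behavior described above can be sketched as follows.  This is a
simplified illustration with hypothetical names: it scans only for the three
directionality sequences and does not skip designation sequences or extended
segments, so an octet 09/11 inside an extended segment would be misread.
Treating "end of string" on an empty stack as an error is an assumption of
this sketch.
</p><pre class="programlisting">
```python
CSI = 0x9B                        # 09/11
LTR = bytes([CSI, 0x31, 0x5D])    # 09/11 03/01 05/13
RTL = bytes([CSI, 0x32, 0x5D])    # 09/11 03/02 05/13
END = bytes([CSI, 0x5D])          # 09/11 05/13


def direction_stack(data):
    """Return the direction stack left after scanning all of data."""
    stack = []
    pos = 0
    while len(data) > pos:
        if data.startswith(LTR, pos):
            stack.append("left-to-right")
            pos += len(LTR)
        elif data.startswith(RTL, pos):
            stack.append("right-to-left")
            pos += len(RTL)
        elif data.startswith(END, pos):
            if not stack:
                raise ValueError("end of string with an empty stack")
            stack.pop()
            pos += len(END)
        else:
            pos += 1              # any other octet leaves the stack alone
    return stack
```
</pre><p>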
 551
 552 Directionality applies to all subsequent text, whether in GL, GR, or an
 553 extended segment.  If the desired directionality of GL, GR, or extended
 554 segments differs, then directionality control sequences must be inserted when
 555 switching between them.
 556 </p><p>
 557
 558 Note that definition of GL and GR sets is independent of directionality;
 559 defining a new GL or GR set does not change the current directionality, and
 560 pushing or popping a directionality does not change the current GL and GR
 561 definitions.
 562 </p><p>
 563
 564 Specification of directionality is entirely optional; text direction should be
 565 clear from context in most cases.  However, either all characters in a
 566 Compound Text string must have an explicitly specified direction, or all
 567 characters must have unspecified direction.  That is, if directionality
 568 control sequences are used, the first such control sequence must precede the
 569 first graphic character in a Compound Text string, and graphic characters are
 570 not permitted whenever the directionality stack is empty.
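</p><p>
The stack model and the all-or-nothing rule above can be sketched as follows.
This is a minimal illustration, not part of the specification: the function
name check_directionality is hypothetical, every octet outside the three
directionality sequences is treated as a graphic character, and popping an
empty stack is assumed to be an error, which the text above does not state.
</p><pre class="programlisting">
```python
CSI = 0x9B  # 09/11

# The three directionality control sequences from the table above.
BEGIN_LTR = bytes([CSI, 0x31, 0x5D])  # 09/11 03/01 05/13
BEGIN_RTL = bytes([CSI, 0x32, 0x5D])  # 09/11 03/02 05/13
REVERT = bytes([CSI, 0x5D])           # 09/11 05/13


def check_directionality(data):
    """Return True if data obeys the all-or-nothing directionality rule.

    Simplifying assumption: any octet that is not part of one of the three
    sequences above counts as a graphic character. A real parser would first
    recognise designations, extended segments, and control characters.
    """
    stack = []             # logical stack of "ltr" / "rtl"
    used_controls = False  # any directionality sequence seen
    bare_graphic = False   # a graphic character seen while the stack was empty
    i = 0
    while i != len(data):
        if data.startswith(BEGIN_LTR, i):
            stack.append("ltr")
            used_controls = True
            i += len(BEGIN_LTR)
        elif data.startswith(BEGIN_RTL, i):
            stack.append("rtl")
            used_controls = True
            i += len(BEGIN_RTL)
        elif data.startswith(REVERT, i):
            if not stack:
                return False  # assumption: popping an empty stack is invalid
            stack.pop()
            used_controls = True
            i += len(REVERT)
        else:
            if not stack:
                bare_graphic = True
            i += 1
    # Either no directionality sequences at all, or no unspecified characters.
    return not (used_controls and bare_graphic)
```
</pre><p>
Under these assumptions, a string with no directionality sequences passes, as
does one whose graphic characters all fall between a begin sequence and its
matching revert; a string that mixes the two is rejected.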
 571 </p></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="Resources"></a>Resources</h2></div></div></div><p>
 572
 573 To use Compound Text in a resource, you can simply treat all octets as if they
 574 were ASCII/Latin-1 and just replace all "\" octets (05/12) with the two
 575 octets "\\", all newline octets (00/10) with the two octets "\n", and
 576 all zero octets with the four octets "\000".
 577 It is up to the client making use of the resource to interpret the data as
 578 Compound Text; the policy by which this is ascertained is not constrained by
 579 the Compound Text specification.
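</p><p>
The three substitutions can be sketched in a few lines of Python.  The
function name escape_for_resource is hypothetical; all other octets,
including 01/11 and octets with the high bit set, are passed through
unchanged, as the paragraph above suggests.
</p><pre class="programlisting">
```python
def escape_for_resource(ctext):
    """Escape Compound Text octets for use in a resource value."""
    out = bytearray()
    for octet in ctext:
        if octet == 0x5C:      # 05/12, the "\" octet
            out += b"\\\\"     # two octets: "\\"
        elif octet == 0x0A:    # 00/10, newline
            out += b"\\n"      # two octets: "\n"
        elif octet == 0x00:    # zero octet
            out += b"\\000"    # four octets: "\000"
        else:
            out.append(octet)  # everything else is copied as-is
    return bytes(out)
```
</pre><p>
Backslashes are handled octet by octet in a single pass, so the backslashes
introduced for newline and zero octets are never escaped a second time.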
 580 </p></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="Font_Names"></a>Font Names</h2></div></div></div><p>
 581 The following CharSet names for the standard character set encodings are
 582 registered for use in font names under the X Logical Font Description:
 583 </p><div class="informaltable"><table border="1"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /><col align="left" class="c3" /></colgroup><thead><tr><th align="left">Name</th><th align="left">Encoding Standard</th><th align="left">Description</th></tr></thead><tbody><tr><td align="left">ISO8859-1</td><td align="left">ISO 8859-1</td><td align="left">Latin alphabet No. 1</td></tr><tr><td align="left">ISO8859-2</td><td align="left">ISO 8859-2</td><td align="left">Latin alphabet No. 2</td></tr><tr><td align="left">ISO8859-3</td><td align="left">ISO 8859-3</td><td align="left">Latin alphabet No. 3</td></tr><tr><td align="left">ISO8859-4</td><td align="left">ISO 8859-4</td><td align="left">Latin alphabet No. 4</td></tr><tr><td align="left">ISO8859-5</td><td align="left">ISO 8859-5</td><td align="left">Latin/Cyrillic alphabet</td></tr><tr><td align="left">ISO8859-6</td><td align="left">ISO 8859-6</td><td align="left">Latin/Arabic alphabet</td></tr><tr><td align="left">ISO8859-7</td><td align="left">ISO 8859-7</td><td align="left">Latin/Greek alphabet</td></tr><tr><td align="left">ISO8859-8</td><td align="left">ISO 8859-8</td><td align="left">Latin/Hebrew alphabet</td></tr><tr><td align="left">ISO8859-9</td><td align="left">ISO 8859-9</td><td align="left">Latin alphabet No. 5</td></tr><tr><td align="left">JISX0201.1976-0</td><td align="left">JIS X0201-1976 (reaffirmed 1984)</td><td align="left">8-bit Alphanumeric-Katakana Code</td></tr><tr><td align="left">GB2312.1980-0</td><td align="left">GB2312-1980, GL encoding</td><td align="left">China (PRC) Hanzi</td></tr><tr><td align="left">JISX0208.1983-0</td><td align="left">JIS X0208-1983, GL encoding</td><td align="left">Japanese Graphic Character Set</td></tr><tr><td align="left">KSC5601.1987-0</td><td align="left">KS C5601-1987, GL encoding</td><td align="left">Korean Graphic Character Set</td></tr></tbody></table></div></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="Extensions"></a>Extensions</h2></div></div></div><p>
 584
 585 There is no absolute requirement for a parser to deal with anything but the
 586 particular encoding syntax defined in this specification.  However, it is
 587 possible that Compound Text may be extended in the future, and as such it may
 588 be desirable to construct the parser to handle 2022/6429 syntax more generally.
 589 </p><p>
 590
 591 There are two general formats covering all control sequences that are expected
 592 to appear in extensions:
 593 </p><p>
 594 01/11 {I} F
 595 </p><p>
 596 For this format, I is always in the range 02/00 to 02/15, and F is always
 597 in the range 03/00 to 07/14.
 598 </p><p>
 599 09/11 {P} {I} F
 600 </p><p>
 601 For this format, P is always in the range 03/00 to 03/15, I is always in
 602 the range 02/00 to 02/15, and F is always in the range 04/00 to 07/14.
 603 </p><p>
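A general parser could recognise both formats purely from the octet ranges
given above, without knowing what a particular sequence means.  The following
Python sketch shows one way to measure such a sequence; the function name
control_sequence_length and its return convention are hypothetical, and the
repetition of P and I is read as "zero or more".
</p><pre class="programlisting">
```python
ESC = 0x1B  # 01/11
CSI = 0x9B  # 09/11

INTERMEDIATE = range(0x20, 0x30)  # 02/00 to 02/15
PARAMETER = range(0x30, 0x40)     # 03/00 to 03/15
ESC_FINAL = range(0x30, 0x7F)     # 03/00 to 07/14
CSI_FINAL = range(0x40, 0x7F)     # 04/00 to 07/14


def control_sequence_length(data, start=0):
    """Return the length of the control sequence at data[start], or None.

    None means the octets at start do not form a complete sequence in
    either of the two general formats.
    """
    if start not in range(len(data)):
        return None
    i = start + 1
    if data[start] == ESC:
        final = ESC_FINAL
    elif data[start] == CSI:
        final = CSI_FINAL
        # Zero or more parameter octets come first in the 09/11 format.
        while i != len(data) and data[i] in PARAMETER:
            i += 1
    else:
        return None
    # Zero or more intermediate octets, common to both formats.
    while i != len(data) and data[i] in INTERMEDIATE:
        i += 1
    # Exactly one final octet terminates the sequence.
    if i != len(data) and data[i] in final:
        return i + 1 - start
    return None
```
</pre><p>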
 604
 605 In addition, new (singleton) control characters (in the C0 and C1 ranges) might
 606 be defined in the future.
 607 </p><p>
 608
 609 Finally, new kinds of "segments" might be defined in the future using syntax
 610 similar to extended segments:
 611 </p><p>
 612 01/11 02/05 02/15 F M L
 613 </p><p>
 614 For this format, F is in the range 03/05 to 03/15.  M and L are as defined
 615 in extended segments.  Such a segment will always be followed by the number
 616 of octets defined by M and L.  These octets can have arbitrary values and
 617 need not follow the internal structure defined for current extended
 618 segments.
 619 </p><p>
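Because the octet count is carried by M and L, a parser can step over a
segment of an unknown kind without interpreting its contents.  The Python
sketch below assumes the length formula from the extended segment definition
earlier in this specification, ((M - 128) * 128) + (L - 128), with the most
significant bit of M and L always set.  The function name segment_length is
hypothetical, and the sketch accepts any F from 03/00 to 03/15, covering the
currently defined extended segments as well.
</p><pre class="programlisting">
```python
SEGMENT_INTRO = b"\x1b%/"  # 01/11 02/05 02/15


def segment_length(data, start=0):
    """Return the total octet count of the segment at data[start], or None.

    The count covers the six-octet header (01/11 02/05 02/15 F M L) plus
    the payload whose size M and L announce.
    """
    header = data[start:start + 6]
    if len(header) != 6 or header[:3] != SEGMENT_INTRO:
        return None
    kind, m_octet, l_octet = header[3], header[4], header[5]
    if kind not in range(0x30, 0x40):       # F: 03/00 to 03/15
        return None
    high_bit_set = range(0x80, 0x100)
    if m_octet not in high_bit_set or l_octet not in high_bit_set:
        return None                         # M and L must have bit 8 set
    payload = (m_octet - 128) * 128 + (l_octet - 128)
    total = 6 + payload
    if len(data[start:start + total]) != total:
        return None                         # segment is truncated
    return total
```
</pre><p>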
 620
 621 If extensions to this specification are defined in the future, then any string
 622 incorporating instances of such extensions must start with one of the following
 623 control sequences:
 624 </p><div class="informaltable"><table border="0"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /></colgroup><tbody><tr><td align="left">01/11 02/03 V 03/00</td><td align="left">ignoring extensions is OK</td></tr><tr><td align="left">01/11 02/03 V 03/01</td><td align="left">ignoring extensions is not OK</td></tr></tbody></table></div><p>
 625
 626 In either case, V is in the range 02/00 to 02/15 and indicates the major
 627 version of the specification being used,
 628 minus one.  These version control sequences are
 629 for use by clients that implement earlier versions, but have implemented a
 630 general parser.  The first control sequence indicates that it is acceptable to
 631 ignore all extension control sequences; no mandatory information will be lost
 632 in the process.  The second control sequence indicates that it is unacceptable
 633 to ignore any extension control sequences; mandatory information would be lost
 634 in the process.  In general, it will be up to the client generating the
 635 Compound Text to decide which control sequence to use.
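</p><p>
A client with a general parser might inspect the start of a string as in the
Python sketch below.  The function name parse_version_prefix and the returned
tuple are hypothetical, and the sketch assumes that V encodes "major version
minus one" as its offset from 02/00, so that 02/00 denotes major version 1.
</p><pre class="programlisting">
```python
def parse_version_prefix(data):
    """Return (major_version, may_ignore_extensions), or None if absent."""
    # 01/11 02/03 V F, where F is 03/00 or 03/01.
    if len(data[:4]) != 4 or data[:2] != b"\x1b#":
        return None
    v_octet, flag = data[2], data[3]
    if v_octet not in range(0x20, 0x30):  # V: 02/00 to 02/15
        return None
    if flag not in (0x30, 0x31):          # 03/00 or 03/01
        return None
    major_version = (v_octet - 0x20) + 1  # assumption: V is an offset from 02/00
    return (major_version, flag == 0x30)  # 03/00: ignoring extensions is OK
```
</pre><p>
A result of None means that no version control sequence is present, in which
case the string is expected to use no extensions.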
 636 </p></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="Errors"></a>Errors</h2></div></div></div><p>
 637
 638 If a Compound Text string does not match the specification here (e.g., uses
 639 undefined control characters, or undefined control sequences, or incorrectly
 640 formatted extended segments), it is best to treat the entire string as invalid,
 641 except as indicated by a version control sequence.
 642 </p></div></div></body></html>