<!-- doc/src/sgml/charset.sgml -->

<chapter id="charset">
 <title>Localization</title>

 <para>
  This chapter describes the available localization features from the
  point of view of the administrator.
  <productname>PostgreSQL</productname> supports two localization
  facilities:

   <itemizedlist>
    <listitem>
     <para>
      Using the locale features of the operating system to provide
      locale-specific collation order, number formatting, translated
      messages, and other aspects.
      This is covered in <xref linkend="locale"/> and
      <xref linkend="collation"/>.
     </para>
    </listitem>

    <listitem>
     <para>
      Providing a number of different character sets to support storing text
      in all kinds of languages, and providing character set translation
      between client and server.
      This is covered in <xref linkend="multibyte"/>.
     </para>
    </listitem>
   </itemizedlist>
 </para>


 <sect1 id="locale">
  <title>Locale Support</title>

  <indexterm zone="locale"><primary>locale</primary></indexterm>

  <para>
   <firstterm>Locale</firstterm> support refers to an application respecting
   cultural preferences regarding alphabets, sorting, number
   formatting, etc.  <productname>PostgreSQL</productname> uses the standard ISO
   C and <acronym>POSIX</acronym> locale facilities provided by the server operating
   system.  For additional information refer to the documentation of your
   system.
  </para>

  <sect2 id="locale-overview">
   <title>Overview</title>

   <para>
    Locale support is automatically initialized when a database
    cluster is created using <command>initdb</command>.
    <command>initdb</command> will initialize the database cluster
    with the locale setting of its execution environment by default,
    so if your system is already set to use the locale that you want
    in your database cluster then there is nothing else you need to
    do.  If you want to use a different locale (or you are not sure
    which locale your system is set to), you can instruct
    <command>initdb</command> exactly which locale to use by
    specifying the <option>--locale</option> option. For example:
<screen>
initdb --locale=sv_SE
</screen>
   </para>

   <para>
    This example for Unix systems sets the locale to Swedish
    (<literal>sv</literal>) as spoken
    in Sweden (<literal>SE</literal>).  Other possibilities might include
    <literal>en_US</literal> (U.S. English) and <literal>fr_CA</literal> (French
    Canadian).  If more than one character set can be used for a
    locale then the specifications can take the form
    <replaceable>language_territory.codeset</replaceable>.  For example,
    <literal>fr_BE.UTF-8</literal> represents the French language (fr) as
    spoken in Belgium (BE), with a <acronym>UTF-8</acronym> character set
    encoding.
   </para>

   <para>
    What locales are available on your
    system under what names depends on what was provided by the operating
    system vendor and what was installed.  On most Unix systems, the command
    <literal>locale -a</literal> will provide a list of available locales.
    Windows uses more verbose locale names, such as <literal>German_Germany</literal>
    or <literal>Swedish_Sweden.1252</literal>, but the principles are the same.
   </para>
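
   <para>
    As a cross-check from inside the database, the libc locales that
    <command>initdb</command> found on the system are recorded as collation
    objects.  The following query is a sketch of how to list them; it
    reflects the locales that were present when the cluster was initialized,
    which is not necessarily the current output of
    <literal>locale -a</literal>.
<programlisting>
-- List the libc-provided collations (collprovider 'c' = libc);
-- each name corresponds to an operating system locale.
SELECT collname, collcollate, collctype
FROM pg_collation
WHERE collprovider = 'c'
ORDER BY collname;
</programlisting>
   </para>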

   <para>
    Occasionally it is useful to mix rules from several locales, e.g.,
    use English collation rules but Spanish messages.  To support that, a
    set of locale subcategories exists, each of which controls only certain
    aspects of the localization rules:

    <informaltable>
     <tgroup cols="2">
      <colspec colname="col1" colwidth="1*"/>
      <colspec colname="col2" colwidth="3*"/>
      <tbody>
       <row>
        <entry><envar>LC_COLLATE</envar></entry>
        <entry>String sort order</entry>
       </row>
       <row>
        <entry><envar>LC_CTYPE</envar></entry>
        <entry>Character classification (What is a letter? Its upper-case equivalent?)</entry>
       </row>
       <row>
        <entry><envar>LC_MESSAGES</envar></entry>
        <entry>Language of messages</entry>
       </row>
       <row>
        <entry><envar>LC_MONETARY</envar></entry>
        <entry>Formatting of currency amounts</entry>
       </row>
       <row>
        <entry><envar>LC_NUMERIC</envar></entry>
        <entry>Formatting of numbers</entry>
       </row>
       <row>
        <entry><envar>LC_TIME</envar></entry>
        <entry>Formatting of dates and times</entry>
       </row>
      </tbody>
     </tgroup>
    </informaltable>

    The category names translate into names of
    <command>initdb</command> options to override the locale choice
    for a specific category.  For instance, to set the locale to
    French Canadian, but use U.S. rules for formatting currency, use
    <literal>initdb --locale=fr_CA --lc-monetary=en_US</literal>.
   </para>

   <para>
    If you want the system to behave as if it had no locale support,
    use the special locale name <literal>C</literal>, or equivalently
    <literal>POSIX</literal>.
   </para>

   <para>
    Some locale categories must have their values
    fixed when the database is created.  You can use different settings
    for different databases, but once a database is created, you cannot
    change them for that database anymore. <literal>LC_COLLATE</literal>
    and <literal>LC_CTYPE</literal> are these categories.  They affect
    the sort order of indexes, so they must be kept fixed, or indexes on
    text columns would become corrupt.
    (But you can alleviate this restriction using collations, as discussed
    in <xref linkend="collation"/>.)
    The default values for these
    categories are determined when <command>initdb</command> is run, and
    those values are used when new databases are created, unless
    specified otherwise in the <command>CREATE DATABASE</command> command.
   </para>
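
   <para>
    For illustration, the following statement creates a database whose
    <literal>LC_COLLATE</literal> and <literal>LC_CTYPE</literal> differ from
    the cluster defaults.  It is only a sketch: the database name
    <literal>sales_sv</literal> is hypothetical, and it assumes that the
    locale <literal>sv_SE.UTF-8</literal> is installed on the server's
    operating system.
<programlisting>
-- template0 is used because template1 might contain data that depends on
-- the default collation; UTF8 matches the codeset of the assumed locale.
CREATE DATABASE sales_sv
    TEMPLATE template0
    ENCODING 'UTF8'
    LOCALE_PROVIDER libc
    LC_COLLATE 'sv_SE.UTF-8'
    LC_CTYPE 'sv_SE.UTF-8';
</programlisting>
   </para>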

   <para>
    The other locale categories can be changed whenever desired
    by setting the server configuration parameters
    that have the same name as the locale categories (see <xref
    linkend="runtime-config-client-format"/> for details).  The values
    that are chosen by <command>initdb</command> are actually only written
    into the configuration file <filename>postgresql.conf</filename> to
    serve as defaults when the server is started.  If you remove these
    assignments from <filename>postgresql.conf</filename> then the
    server will inherit the settings from its execution environment.
   </para>
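
   <para>
    A minimal sketch of changing two of these categories for the current
    session, assuming the locales <literal>de_DE.UTF-8</literal> and
    <literal>en_US.UTF-8</literal> exist on the server (they are examples,
    not requirements):
<programlisting>
-- lc_numeric drives the G (group separator) and D (decimal point)
-- patterns of to_char; lc_monetary drives the output of the money type.
SET lc_numeric = 'de_DE.UTF-8';
SELECT to_char(1234.5, '9G999D99');

SET lc_monetary = 'en_US.UTF-8';
SELECT 1234.5::money;

-- Return both parameters to the session's default values.
RESET lc_numeric;
RESET lc_monetary;
</programlisting>
   </para>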

   <para>
    Note that the locale behavior of the server is determined by the
    environment variables seen by the server, not by the environment
    of any client.  Therefore, be careful to configure the correct locale settings
    before starting the server.  A consequence of this is that if
    client and server are set up in different locales, messages might
    appear in different languages depending on where they originated.
   </para>

   <note>
    <para>
     When we speak of inheriting the locale from the execution
     environment, this means the following on most operating systems:
     For a given locale category, say the collation, the following
     environment variables are consulted in this order until one is
     found to be set: <envar>LC_ALL</envar>, <envar>LC_COLLATE</envar>
     (or the variable corresponding to the respective category),
     <envar>LANG</envar>.  If none of these environment variables are
     set then the locale defaults to <literal>C</literal>.
    </para>

    <para>
     Some message localization libraries also look at the environment
     variable <envar>LANGUAGE</envar> which overrides all other locale
     settings for the purpose of setting the language of messages.  If
     in doubt, please refer to the documentation of your operating
     system, in particular the documentation about
     <application>gettext</application>.
    </para>
   </note>

   <para>
    To enable messages to be translated to the user's preferred language,
    <acronym>NLS</acronym> must have been selected at build time
    (<literal>configure --enable-nls</literal>).  All other locale support is
    built in automatically.
   </para>
  </sect2>

  <sect2 id="locale-behavior">
   <title>Behavior</title>

   <para>
    The locale settings influence the following SQL features:

    <itemizedlist>
     <listitem>
      <para>
       Sort order in queries using <literal>ORDER BY</literal> or the standard
       comparison operators on textual data
       <indexterm><primary>ORDER BY</primary><secondary>and locales</secondary></indexterm>
      </para>
     </listitem>

     <listitem>
      <para>
       The <function>upper</function>, <function>lower</function>, and <function>initcap</function>
       functions
       <indexterm><primary>upper</primary><secondary>and locales</secondary></indexterm>
       <indexterm><primary>lower</primary><secondary>and locales</secondary></indexterm>
      </para>
     </listitem>

     <listitem>
      <para>
       Pattern matching operators (<literal>LIKE</literal>, <literal>SIMILAR TO</literal>,
       and POSIX-style regular expressions); locales affect both case
       insensitive matching and the classification of characters by
       character-class regular expressions
       <indexterm><primary>LIKE</primary><secondary>and locales</secondary></indexterm>
       <indexterm><primary>regular expressions</primary><secondary>and locales</secondary></indexterm>
      </para>
     </listitem>

     <listitem>
      <para>
       The <function>to_char</function> family of functions
       <indexterm><primary>to_char</primary><secondary>and locales</secondary></indexterm>
      </para>
     </listitem>

     <listitem>
      <para>
       The ability to use indexes with <literal>LIKE</literal> clauses
      </para>
     </listitem>
    </itemizedlist>
   </para>
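
   <para>
    To observe the first of these effects directly, the same values can be
    sorted once with the database's default collation and once with the
    <literal>C</literal> collation; the two results may differ depending on
    the database's locale.
<programlisting>
-- Uses the default collation of the database.
SELECT x FROM (VALUES ('a'), ('B'), ('c')) AS t(x) ORDER BY x;

-- "C" compares byte values, so here 'B' sorts before 'a'.
SELECT x FROM (VALUES ('a'), ('B'), ('c')) AS t(x) ORDER BY x COLLATE "C";
</programlisting>
   </para>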

   <para>
    The drawback of using locales other than <literal>C</literal> or
    <literal>POSIX</literal> in <productname>PostgreSQL</productname> is its performance
    impact. It slows character handling and prevents ordinary indexes
    from being used by <literal>LIKE</literal>. For this reason use locales
    only if you actually need them.
   </para>

   <para>
    As a workaround to allow <productname>PostgreSQL</productname> to use indexes
    with <literal>LIKE</literal> clauses under a non-C locale, several custom
    operator classes exist. These allow the creation of an index that
    performs a strict character-by-character comparison, ignoring
    locale comparison rules. Refer to <xref linkend="indexes-opclass"/>
    for more information.  Another approach is to create indexes using
    the <literal>C</literal> collation, as discussed in
    <xref linkend="collation"/>.
   </para>
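
   <para>
    As a sketch of the operator-class approach, assuming a hypothetical table
    <structname>customers</structname> with a <type>text</type> column
    <structfield>name</structfield>:
<programlisting>
CREATE TABLE customers (name text);

-- text_pattern_ops compares strictly character by character, so this
-- B-tree index can support left-anchored LIKE patterns even when the
-- column's collation is not "C".
CREATE INDEX customers_name_pattern_idx ON customers (name text_pattern_ops);

-- Check whether the planner chooses the index (on a tiny table it may
-- still prefer a sequential scan).
EXPLAIN SELECT * FROM customers WHERE name LIKE 'abc%';
</programlisting>
    Such an index cannot serve ordinary <literal>&lt;</literal>,
    <literal>&lt;=</literal>, <literal>&gt;</literal>, or
    <literal>&gt;=</literal> comparisons, so a second index with the default
    operator class may still be needed for those queries.
   </para>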
  </sect2>

  <sect2 id="locale-selecting-locales">
   <title>Selecting Locales</title>

   <para>
    Locales can be selected in different scopes depending on requirements.
    The above overview showed how locales are specified using
    <command>initdb</command> to set the defaults for the entire cluster.  The
    following list shows where locales can be selected.  Each item provides
    the defaults for the subsequent items, and each lower item allows
    overriding the defaults on a finer granularity.
   </para>

   <orderedlist>
    <listitem>
     <para>
      As explained above, the environment of the operating system provides the
      defaults for the locales of a newly initialized database cluster.  In
      many cases, this is enough: if the operating system is configured for
      the desired language/territory, by default
      <productname>PostgreSQL</productname> will also behave according
      to that locale.
     </para>
    </listitem>

    <listitem>
     <para>
      As shown above, command-line options for <command>initdb</command>
      specify the locale settings for a newly initialized database cluster.
      Use this if the operating system does not have the locale configuration
      you want for your database system.
     </para>
    </listitem>

    <listitem>
     <para>
      A locale can be selected separately for each database.  The SQL command
      <command>CREATE DATABASE</command> and its command-line equivalent
      <command>createdb</command> have options for that.  Use this for example
      if a database cluster houses databases for multiple tenants with
      different requirements.
     </para>
    </listitem>

    <listitem>
     <para>
      Locale settings can be made for individual table columns.  This uses an
      SQL object called <firstterm>collation</firstterm> and is explained in
      <xref linkend="collation"/>.  Use this for example to sort data in
      different languages or customize the sort order of a particular table.
     </para>
    </listitem>

    <listitem>
     <para>
      Finally, locales can be selected for an individual query.  Again, this
      uses SQL collation objects.  This could be used to change the sort order
      based on run-time choices or for ad-hoc experimentation.
     </para>
    </listitem>
   </orderedlist>
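
   <para>
    The last two levels can be sketched in SQL as follows.  The table
    <structname>products</structname> is hypothetical, and the collation
    <literal>"fr_FR"</literal> is assumed to exist, which depends on the
    locales installed on the server.
<programlisting>
-- Column level: comparisons on this column use "fr_FR" unless overridden.
CREATE TABLE products (name text COLLATE "fr_FR");

-- Query level: override the column's collation for this sort only.
SELECT name FROM products ORDER BY name COLLATE "C";
</programlisting>
   </para>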
  </sect2>

  <sect2 id="locale-providers">
   <title>Locale Providers</title>

   <para>
    A locale provider specifies which library defines the locale behavior for
    collations and character classifications.
   </para>

   <para>
    The commands and tools that select the locale settings, as described
    above, each have an option to select the locale provider. Here is an
    example to initialize a database cluster using the ICU provider:
<programlisting>
initdb --locale-provider=icu --icu-locale=en
</programlisting>
    See the description of the respective commands and programs for
    details.  Note that you can mix locale providers at different
    granularities, for example use <literal>libc</literal> by default for the
    cluster but have one database that uses the <literal>icu</literal>
    provider, and then have collation objects using either provider within
    those databases.
   </para>
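
   <para>
    A sketch of such mixing in SQL, assuming a UTF-8 cluster built with ICU
    support and an installed libc locale <literal>de_DE.UTF-8</literal>; all
    object names are hypothetical:
<programlisting>
-- A database that uses ICU, whatever the cluster's default provider is.
CREATE DATABASE app_icu
    TEMPLATE template0
    LOCALE_PROVIDER icu
    ICU_LOCALE 'en-US';

-- Inside any database, collation objects can use either provider.
CREATE COLLATION german_libc (provider = libc, locale = 'de_DE.UTF-8');
CREATE COLLATION german_icu (provider = icu, locale = 'de-DE');
</programlisting>
   </para>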

   <para>
    Regardless of the locale provider, the operating system is still used to
    provide some locale-aware behavior, such as messages (see <xref
    linkend="guc-lc-messages"/>).
   </para>

   <para>
    The available locale providers are listed below:
   </para>

   <variablelist>
    <varlistentry>
     <term><literal>builtin</literal></term>
     <listitem>
      <para>
       The <literal>builtin</literal> provider uses built-in operations. Only
       the <literal>C</literal> and <literal>C.UTF-8</literal> locales are
       supported for this provider.
      </para>
      <para>
       The <literal>C</literal> locale behavior is identical to the
       <literal>C</literal> locale in the libc provider. When using this
       locale, the behavior may depend on the database encoding.
      </para>
      <para>
       The <literal>C.UTF-8</literal> locale is available only when the
       database encoding is <literal>UTF-8</literal>, and its behavior is
       based on Unicode. The collation uses the code point values only. The
       regular expression character classes are based on the "POSIX
       Compatible" semantics, and the case mapping is the "simple" variant.
      </para>
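      <para>
       A brief sketch of the difference between the two locales, assuming a
       <literal>UTF-8</literal> database; the collation name
       <literal>c_utf8_demo</literal> is hypothetical:
<programlisting>
CREATE COLLATION c_utf8_demo (provider = builtin, locale = 'C.UTF-8');

-- Under "C", upper() maps only ASCII letters, so 'é' is left unchanged;
-- under C.UTF-8 the Unicode simple case mapping applies to it as well.
SELECT upper('é' COLLATE "C")         AS c_upper,
       upper('é' COLLATE c_utf8_demo) AS c_utf8_upper;
</programlisting>
      </para>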
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><literal>icu</literal></term>
     <listitem>
      <para>
       The <literal>icu</literal> provider uses the external
       ICU<indexterm><primary>ICU</primary></indexterm>
       library. <productname>PostgreSQL</productname> must have been
       configured with ICU support.
      </para>
      <para>
       ICU provides collation and character classification behavior that is
       independent of the operating system and database encoding, which is
       preferable if you expect to transition to other platforms without any
       change in results. <literal>LC_COLLATE</literal> and
       <literal>LC_CTYPE</literal> can be set independently of the ICU
       locale.
      </para>
      <note>
       <para>
        For the ICU provider, results may depend on the version of the ICU
        library used, as it is updated to reflect changes in natural language
        over time.
       </para>
      </note>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><literal>libc</literal></term>
     <listitem>
      <para>
       The <literal>libc</literal> provider uses the operating system's C
       library. The collation and character classification behavior is
       controlled by the settings <literal>LC_COLLATE</literal> and
       <literal>LC_CTYPE</literal>, so they cannot be set independently.
      </para>
      <note>
       <para>
        The same locale name may have different behavior on different
        platforms when using the libc provider.
       </para>
      </note>
     </listitem>
    </varlistentry>
   </variablelist>
  </sect2>

  <sect2 id="icu-locales">
   <title>ICU Locales</title>

   <sect3 id="icu-locale-names">
    <title>ICU Locale Names</title>

    <para>
     The ICU format for the locale name is a <link
     linkend="icu-language-tag">Language Tag</link>.

<programlisting>
CREATE COLLATION mycollation1 (provider = icu, locale = 'ja-JP');
CREATE COLLATION mycollation2 (provider = icu, locale = 'fr');
</programlisting>
    </para>
   </sect3>

   <sect3 id="icu-canonicalization">
    <title>Locale Canonicalization and Validation</title>
    <para>
     When defining a new ICU collation object or database with ICU as the
     provider, the given locale name is transformed ("canonicalized") into a
     language tag if not already in that form. For instance,

<screen>
CREATE COLLATION mycollation3 (provider = icu, locale = 'en-US-u-kn-true');
NOTICE:  using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8');
NOTICE:  using standard form "de-DE" for locale "de_DE.utf8"
</screen>

     If you see this notice, ensure that the resulting <symbol>provider</symbol> and
     <symbol>locale</symbol> are what you intended. For consistent results
     when using the ICU provider, specify the canonical <link
     linkend="icu-language-tag">language tag</link> instead of relying on the
     transformation.
    </para>
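
    <para>
     One way to confirm what was actually stored is to look the collation up
     in the system catalog.  This sketch assumes
     <productname>PostgreSQL</productname> 17, where the column is named
     <structfield>colllocale</structfield>; releases 15 and 16 call it
     <structfield>colliculocale</structfield>.
<programlisting>
-- collprovider 'i' = ICU; colllocale holds the canonicalized language tag.
SELECT collname, collprovider, colllocale
FROM pg_collation
WHERE collname IN ('mycollation3', 'mycollation4');
</programlisting>
    </para>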

    <para>
     A locale with no language name, or the special language name
     <literal>root</literal>, is transformed to have the language
     <literal>und</literal> ("undefined").
    </para>

    <para>
     ICU can transform most libc locale names, as well as some other formats,
     into language tags for easier transition to ICU. If a libc locale name is
     used in ICU, it may not have precisely the same behavior as in libc.
    </para>

    <para>
     If there is a problem interpreting the locale name, or if the locale name
     represents a language or region that ICU does not recognize, you will see
     the following warning:

<screen>
CREATE COLLATION nonsense (provider = icu, locale = 'nonsense');
WARNING:  ICU locale "nonsense" has unknown language "nonsense"
HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
CREATE COLLATION
</screen>

     <xref linkend="guc-icu-validation-level"/> controls how the message is
     reported. Unless set to <literal>ERROR</literal>, the collation will
     still be created, but the behavior may not be what the user intended.
    </para>
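
    <para>
     For example, to make unrecognized ICU locale names fail instead of only
     producing a warning in the current session (the collation name
     <literal>nonsense2</literal> is hypothetical):
<programlisting>
SET icu_validation_level = 'error';

-- Now rejected with an error, so no collation object is created.
CREATE COLLATION nonsense2 (provider = icu, locale = 'nonsense');

RESET icu_validation_level;
</programlisting>
    </para>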
   </sect3>

   <sect3 id="icu-language-tag">
    <title>Language Tag</title>

    <para>
     A language tag, defined in BCP 47, is a standardized identifier used to
     identify languages, regions, and other information about a locale.
    </para>

    <para>
     Basic language tags are simply
     <replaceable>language</replaceable><literal>-</literal><replaceable>region</replaceable>,
     or even just <replaceable>language</replaceable>. The
     <replaceable>language</replaceable> is a language code
     (e.g. <literal>fr</literal> for French), and
     <replaceable>region</replaceable> is a region code
     (e.g. <literal>CA</literal> for Canada). Examples:
     <literal>ja-JP</literal>, <literal>de</literal>, or
     <literal>fr-CA</literal>.
    </para>

    <para>
     Collation settings may be included in the language tag to customize
     collation behavior. ICU allows extensive customization, such as
     sensitivity (or insensitivity) to accents, case, and punctuation;
     treatment of digits within text; and many other options to satisfy a
     variety of uses.
    </para>

    <para>
     To include this additional collation information in a language tag,
     append <literal>-u</literal>, which indicates there are additional
     collation settings, followed by one or more
     <literal>-</literal><replaceable>key</replaceable><literal>-</literal><replaceable>value</replaceable>
     pairs. The <replaceable>key</replaceable> is the key for a <link
     linkend="icu-collation-settings">collation setting</link> and
     <replaceable>value</replaceable> is a valid value for that setting. For
     boolean settings, the <literal>-</literal><replaceable>key</replaceable>
     may be specified without a corresponding
     <literal>-</literal><replaceable>value</replaceable>, which implies a
     value of <literal>true</literal>.
    </para>

    <para>
     For example, the language tag <literal>en-US-u-kn-ks-level2</literal>
     means the locale with the English language in the US region, with
     collation settings <literal>kn</literal> set to <literal>true</literal>
     and <literal>ks</literal> set to <literal>level2</literal>. Those
     settings mean the collation will be case-insensitive and treat a sequence
     of digits as a single number:

<screen>
CREATE COLLATION mycollation5 (provider = icu, deterministic = false, locale = 'en-US-u-kn-ks-level2');
SELECT 'aB' = 'Ab' COLLATE mycollation5 as result;
 result
--------
 t
(1 row)

SELECT 'N-45' &lt; 'N-123' COLLATE mycollation5 as result;
 result
--------
 t
(1 row)
</screen>
    </para>

    <para>
     See <xref linkend="icu-custom-collations"/> for details and additional
     examples of using language tags with custom collation information for the
     locale.
    </para>
   </sect3>
  </sect2>

  <sect2 id="locale-problems">
   <title>Problems</title>

   <para>
    If locale support doesn't work according to the explanation above,
    check that the locale support in your operating system is
    correctly configured.  To check what locales are installed on your
    system, you can use the command <literal>locale -a</literal> if
    your operating system provides it.
   </para>

   <para>
    Check that <productname>PostgreSQL</productname> is actually using the locale
    that you think it is.  The <envar>LC_COLLATE</envar> and <envar>LC_CTYPE</envar>
    settings are determined when a database is created, and cannot be
    changed except by creating a new database.  Other locale
    settings including <envar>LC_MESSAGES</envar> and <envar>LC_MONETARY</envar>
    are initially determined by the environment the server is started
    in, but can be changed on-the-fly.  You can check the active locale
    settings using the <command>SHOW</command> command.
   </para>
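
   <para>
    For instance, the following commands show the current values of two
    run-time settings, and the locale settings fixed for the current
    database (the column <structfield>datlocprovider</structfield> assumes
    <productname>PostgreSQL</productname> 15 or later):
<programlisting>
SHOW lc_messages;
SHOW lc_monetary;

-- LC_COLLATE and LC_CTYPE are stored per database in pg_database.
SELECT datname, datlocprovider, datcollate, datctype
FROM pg_database
WHERE datname = current_database();
</programlisting>
   </para>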

   <para>
    The directory <filename>src/test/locale</filename> in the source
    distribution contains a test suite for
    <productname>PostgreSQL</productname>'s locale support.
   </para>

   <para>
    Client applications that handle server-side errors by parsing the
    text of the error message will obviously have problems when the
    server's messages are in a different language.  Authors of such
    applications are advised to make use of the error code scheme
    instead.
   </para>
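
   <para>
    Error codes (<literal>SQLSTATE</literal> values) are not translated, so
    they stay the same whatever the message language.  As a server-side
    sketch of the same idea, a PL/pgSQL handler can match a condition name
    instead of message text; client drivers typically expose the same code
    through their own error objects.
<programlisting>
DO $$
BEGIN
    PERFORM 1 / 0;
EXCEPTION
    -- division_by_zero corresponds to SQLSTATE 22012 and does not depend
    -- on the lc_messages setting.
    WHEN division_by_zero THEN
        RAISE NOTICE 'caught error with SQLSTATE %', SQLSTATE;
END
$$;
</programlisting>
   </para>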

   <para>
    Maintaining catalogs of message translations requires the ongoing
    efforts of many volunteers who want to see
    <productname>PostgreSQL</productname> speak their preferred language well.
    If messages in your language are currently not available or not fully
    translated, your assistance would be appreciated.  If you want to
    help, refer to <xref linkend="nls"/> or write to the developers'
    mailing list.
   </para>
  </sect2>
 </sect1>


 <sect1 id="collation">
  <title>Collation Support</title>

  <indexterm zone="collation"><primary>collation</primary></indexterm>

  <para>
   The collation feature allows specifying the sort order and character
   classification behavior of data per-column, or even per-operation.
   This alleviates the restriction that the
   <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol> settings
   of a database cannot be changed after its creation.
  </para>

  <sect2 id="collation-concepts">
   <title>Concepts</title>

   <para>
    Conceptually, every expression of a collatable data type has a
    collation.  (The built-in collatable data types are
    <type>text</type>, <type>varchar</type>, and <type>char</type>.
    User-defined base types can also be marked collatable, and of course
    a <glossterm linkend="glossary-domain">domain</glossterm> over a
    collatable data type is collatable.)  If the
    expression is a column reference, the collation of the expression is the
    defined collation of the column.  If the expression is a constant, the
    collation is the default collation of the data type of the
    constant.  The collation of a more complex expression is derived
    from the collations of its inputs, as described below.
   </para>
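
   <para>
    The function <function>pg_collation_for</function> reports the collation
    of its argument, which makes these rules easy to observe.  A sketch with a
    hypothetical table <structname>coll_demo</structname>:
<programlisting>
CREATE TABLE coll_demo (a text COLLATE "C", b text);
INSERT INTO coll_demo VALUES ('x', 'y');

SELECT pg_collation_for(a)           AS column_a,  -- defined collation of a
       pg_collation_for(b)           AS column_b,  -- no COLLATE clause: default
       pg_collation_for('foo'::text) AS constant,  -- default collation of text
       pg_collation_for(a || 'foo')  AS derived    -- derived from the inputs
FROM coll_demo;
</programlisting>
   </para>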
 665
 666    <para>
 667     The collation of an expression can be the <quote>default</quote>
 668     collation, which means the locale settings defined for the
 669     database.  It is also possible for an expression's collation to be
 670     indeterminate.  In such cases, ordering operations and other
 671     operations that need to know the collation will fail.
 672    </para>
 673
 674    <para>
 675     When the database system has to perform an ordering or a character
 676     classification, it uses the collation of the input expression.  This
 677     happens, for example, with <literal>ORDER BY</literal> clauses
 678     and function or operator calls such as <literal>&lt;</literal>.
 679     The collation to apply for an <literal>ORDER BY</literal> clause
 680     is simply the collation of the sort key.  The collation to apply for a
 681     function or operator call is derived from the arguments, as described
 682     below.  In addition to comparison operators, collations are taken into
 683     account by functions that convert between lower and upper case
 684     letters, such as <function>lower</function>, <function>upper</function>, and
 685     <function>initcap</function>; by pattern matching operators; and by
 686     <function>to_char</function> and related functions.
 687    </para>
 688
 689    <para>
 690     For a function or operator call, the collation that is derived by
 691     examining the argument collations is used at run time for performing
 692     the specified operation.  If the result of the function or operator
 693     call is of a collatable data type, the collation is also used at parse
 694     time as the defined collation of the function or operator expression,
 695     in case there is a surrounding expression that requires knowledge of
 696     its collation.
 697    </para>
 698
 699    <para>
 700     The <firstterm>collation derivation</firstterm> of an expression can be
 701     implicit or explicit.  This distinction affects how collations are
 702     combined when multiple different collations appear in an
 703     expression.  An explicit collation derivation occurs when a
 704     <literal>COLLATE</literal> clause is used; all other collation
 705     derivations are implicit.  When multiple collations need to be
 706     combined, for example in a function call, the following rules are
 707     used:
 708
 709     <orderedlist>
 710      <listitem>
 711       <para>
 712        If any input expression has an explicit collation derivation, then
 713        all explicitly derived collations among the input expressions must be
 714        the same, otherwise an error is raised.  If any explicitly
 715        derived collation is present, that is the result of the
 716        collation combination.
 717       </para>
 718      </listitem>
 719
 720      <listitem>
 721       <para>
 722        Otherwise, all input expressions must have the same implicit
 723        collation derivation or the default collation.  If any non-default
 724        collation is present, that is the result of the collation combination.
 725        Otherwise, the result is the default collation.
 726       </para>
 727      </listitem>
 728
 729      <listitem>
 730       <para>
 731        If there are conflicting non-default implicit collations among the
 732        input expressions, then the combination is deemed to have indeterminate
 733        collation.  This is not an error condition unless the particular
 734        function being invoked requires knowledge of the collation it should
       apply.  If it does, an error will be raised at run time.
 736       </para>
 737      </listitem>
 738     </orderedlist>
 739
 740     For example, consider this table definition:
 741 <programlisting>
 742 CREATE TABLE test1 (
 743     a text COLLATE "de_DE",
 744     b text COLLATE "es_ES",
 745     ...
 746 );
 747 </programlisting>
 748
 749     Then in
 750 <programlisting>
 751 SELECT a &lt; 'foo' FROM test1;
 752 </programlisting>
 753     the <literal>&lt;</literal> comparison is performed according to
 754     <literal>de_DE</literal> rules, because the expression combines an
 755     implicitly derived collation with the default collation.  But in
 756 <programlisting>
 757 SELECT a &lt; ('foo' COLLATE "fr_FR") FROM test1;
 758 </programlisting>
 759     the comparison is performed using <literal>fr_FR</literal> rules,
 760     because the explicit collation derivation overrides the implicit one.
 761     Furthermore, given
 762 <programlisting>
 763 SELECT a &lt; b FROM test1;
 764 </programlisting>
 765     the parser cannot determine which collation to apply, since the
 766     <structfield>a</structfield> and <structfield>b</structfield> columns have conflicting
 767     implicit collations.  Since the <literal>&lt;</literal> operator
 768     does need to know which collation to use, this will result in an
 769     error.  The error can be resolved by attaching an explicit collation
 770     specifier to either input expression, thus:
 771 <programlisting>
 772 SELECT a &lt; b COLLATE "de_DE" FROM test1;
 773 </programlisting>
 774     or equivalently
 775 <programlisting>
 776 SELECT a COLLATE "de_DE" &lt; b FROM test1;
 777 </programlisting>
 778     On the other hand, the structurally similar case
 779 <programlisting>
 780 SELECT a || b FROM test1;
 781 </programlisting>
 782     does not result in an error, because the <literal>||</literal> operator
 783     does not care about collations: its result is the same regardless
 784     of the collation.
 785    </para>
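   <para>
    One way to observe these rules directly is the function
    <function>pg_collation_for</function>, which reports the collation
    derived for its argument expression, or null if no collation was
    derived.  The queries below are a sketch that assumes the
    <structname>test1</structname> table defined above; the comments state
    what the combination rules predict, not captured output:
<programlisting>
-- assumes the test1 table (columns a COLLATE "de_DE", b COLLATE "es_ES")
SELECT pg_collation_for(a) FROM test1;                             -- implicit: de_DE
SELECT pg_collation_for(a || 'foo') FROM test1;                    -- implicit beats default: de_DE
SELECT pg_collation_for(a || ('foo' COLLATE "fr_FR")) FROM test1;  -- explicit wins: fr_FR
</programlisting>
   </para>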
 786
 787    <para>
 788     The collation assigned to a function or operator's combined input
 789     expressions is also considered to apply to the function or operator's
 790     result, if the function or operator delivers a result of a collatable
 791     data type.  So, in
 792 <programlisting>
 793 SELECT * FROM test1 ORDER BY a || 'foo';
 794 </programlisting>
 795     the ordering will be done according to <literal>de_DE</literal> rules.
 796     But this query:
 797 <programlisting>
 798 SELECT * FROM test1 ORDER BY a || b;
 799 </programlisting>
 800     results in an error, because even though the <literal>||</literal> operator
 801     doesn't need to know a collation, the <literal>ORDER BY</literal> clause does.
 802     As before, the conflict can be resolved with an explicit collation
 803     specifier:
 804 <programlisting>
 805 SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
 806 </programlisting>
 807    </para>
 808   </sect2>
 809
 810   <sect2 id="collation-managing">
 811    <title>Managing Collations</title>
 812
 813    <para>
 814     A collation is an SQL schema object that maps an SQL name to locales
 815     provided by libraries installed in the operating system.  A collation
 816     definition has a <firstterm>provider</firstterm> that specifies which
 817     library supplies the locale data.  One standard provider name
 818     is <literal>libc</literal>, which uses the locales provided by the
 819     operating system C library.  These are the locales used by most tools
 820     provided by the operating system.  Another provider
 821     is <literal>icu</literal>, which uses the external
 822     ICU<indexterm><primary>ICU</primary></indexterm> library.  ICU locales can only be
 823     used if support for ICU was configured when PostgreSQL was built.
 824    </para>
 825
 826    <para>
 827     A collation object provided by <literal>libc</literal> maps to a
 828     combination of <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>
 829     settings, as accepted by the <literal>setlocale()</literal> system library call.  (As
 830     the name would suggest, the main purpose of a collation is to set
 831     <symbol>LC_COLLATE</symbol>, which controls the sort order.  But
 832     it is rarely necessary in practice to have an
 833     <symbol>LC_CTYPE</symbol> setting that is different from
 834     <symbol>LC_COLLATE</symbol>, so it is more convenient to collect
 835     these under one concept than to create another infrastructure for
 836     setting <symbol>LC_CTYPE</symbol> per expression.)  Also,
 837     a <literal>libc</literal> collation
 838     is tied to a character set encoding (see <xref linkend="multibyte"/>).
 839     The same collation name may exist for different encodings.
 840    </para>
 841
 842    <para>
 843     A collation object provided by <literal>icu</literal> maps to a named
 844     collator provided by the ICU library.  ICU does not support
 845     separate <quote>collate</quote> and <quote>ctype</quote> settings, so
 846     they are always the same.  Also, ICU collations are independent of the
 847     encoding, so there is always only one ICU collation of a given name in
 848     a database.
 849    </para>
 850
 851    <sect3 id="collation-managing-standard">
 852     <title>Standard Collations</title>
 853
 854    <para>
 855     On all platforms, the following collations are supported:
 856
 857     <variablelist>
 858      <varlistentry>
 859       <term><literal>unicode</literal></term>
 860       <listitem>
 861        <para>
 862         This SQL standard collation sorts using the Unicode Collation
 863         Algorithm with the Default Unicode Collation Element Table.  It is
 864         available in all encodings.  ICU support is required to use this
 865         collation, and behavior may change if <productname>PostgreSQL</productname> is built with a
 866         different version of ICU.  (This collation has the same behavior as
 867         the ICU root locale; see <xref
 868         linkend="collation-managing-predefined-icu-und-x-icu"/>.)
 869        </para>
 870       </listitem>
 871      </varlistentry>
 872
 873      <varlistentry>
 874       <term><literal>ucs_basic</literal></term>
 875       <listitem>
 876        <para>
 877         This SQL standard collation sorts using the Unicode code point values
 878         rather than natural language order, and only the ASCII letters
 879         <quote><literal>A</literal></quote> through
 880         <quote><literal>Z</literal></quote> are treated as letters.  The
 881         behavior is efficient and stable across all versions.  Only available
 882         for encoding <literal>UTF8</literal>.  (This collation has the same
 883         behavior as the libc locale specification <literal>C</literal> in
 884         <literal>UTF8</literal> encoding.)
 885        </para>
 886       </listitem>
 887      </varlistentry>
 888
 889      <varlistentry>
 890       <term><literal>pg_c_utf8</literal></term>
 891       <listitem>
 892        <para>
 893         This collation sorts by Unicode code point values rather than natural
 894         language order.  For the functions <function>lower</function>,
 895         <function>initcap</function>, and <function>upper</function>, it uses
 896         Unicode simple case mapping.  For pattern matching (including regular
 897         expressions), it uses the POSIX Compatible variant of Unicode <ulink
 898         url="https://www.unicode.org/reports/tr18/#Compatibility_Properties">Compatibility
 899         Properties</ulink>.  Behavior is efficient and stable within a
 900         <productname>PostgreSQL</productname> major version.  This collation is
 901         only available for encoding <literal>UTF8</literal>.
 902        </para>
 903       </listitem>
 904      </varlistentry>
 905
 906      <varlistentry>
 907       <term><literal>C</literal> (equivalent to <literal>POSIX</literal>)</term>
 908       <listitem>
 909        <para>
 910         The <literal>C</literal> and <literal>POSIX</literal> collations are
 911         based on <quote>traditional C</quote> behavior.  They sort by byte
 912         values rather than natural language order, and only the ASCII letters
 913         <quote><literal>A</literal></quote> through
 914         <quote><literal>Z</literal></quote> are treated as letters.  The
 915         behavior is efficient and stable across all versions for a given
 916         database encoding, but behavior may vary between different database
 917         encodings.
 918        </para>
 919       </listitem>
 920      </varlistentry>
 921
 922      <varlistentry>
 923       <term><literal>default</literal></term>
 924       <listitem>
 925        <para>
 926         The <literal>default</literal> collation selects the locale specified
 927         at database creation time.
 928        </para>
 929       </listitem>
 930      </varlistentry>
 931     </variablelist>
 932    </para>
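   <para>
    The differences among the standard collations show up in simple
    comparisons and in case mapping.  This sketch assumes a database with
    encoding <literal>UTF8</literal> and, for <literal>unicode</literal>, a
    server built with ICU support:
<programlisting>
-- assumes a UTF8 database; "unicode" additionally requires ICU
-- byte or code point order: 'a' (U+0061) sorts after 'B' (U+0042)
SELECT 'a' &lt; 'B' COLLATE "C";          -- false
SELECT 'a' &lt; 'B' COLLATE "pg_c_utf8";  -- false
-- natural language order under the Unicode Collation Algorithm
SELECT 'a' &lt; 'B' COLLATE "unicode";    -- true

-- case mapping: ASCII letters only versus Unicode simple case mapping
SELECT lower('É' COLLATE "ucs_basic");  -- returns É unchanged
SELECT lower('É' COLLATE "pg_c_utf8");  -- returns é
</programlisting>
   </para>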
 933
 934    <para>
 935     Additional collations may be available depending on operating system
 936     support.  The efficiency and stability of these additional collations
 937     depend on the collation provider, the provider version, and the locale.
 938    </para>
 939   </sect3>
 940
 941   <sect3 id="collation-managing-predefined">
 942    <title>Predefined Collations</title>
 943
 944    <para>
 945     If the operating system provides support for using multiple locales
 946     within a single program (<function>newlocale</function> and related functions),
 947     or if support for ICU is configured,
 948     then when a database cluster is initialized, <command>initdb</command>
 949     populates the system catalog <literal>pg_collation</literal> with
 950     collations based on all the locales it finds in the operating
 951     system at the time.
 952    </para>
 953
 954    <para>
 955     To inspect the currently available locales, use the query <literal>SELECT
 956     * FROM pg_collation</literal>, or the command <command>\dOS+</command>
 957     in <application>psql</application>.
 958    </para>
 959
 960   <sect4 id="collation-managing-predefined-libc">
 961    <title>libc Collations</title>
 962
 963    <para>
 964     For example, the operating system might
 965     provide a locale named <literal>de_DE.utf8</literal>.
 966     <command>initdb</command> would then create a collation named
 967     <literal>de_DE.utf8</literal> for encoding <literal>UTF8</literal>
 968     that has both <symbol>LC_COLLATE</symbol> and
 969     <symbol>LC_CTYPE</symbol> set to <literal>de_DE.utf8</literal>.
 970     It will also create a collation with the <literal>.utf8</literal>
 971     tag stripped off the name.  So you could also use the collation
 972     under the name <literal>de_DE</literal>, which is less cumbersome
 973     to write and makes the name less encoding-dependent.  Note that,
 974     nevertheless, the initial set of collation names is
 975     platform-dependent.
 976    </para>
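   <para>
    Assuming the operating system provided <literal>de_DE.utf8</literal>
    when <command>initdb</command> ran, the full and the stripped name
    should map to the same underlying locale, which can be checked like
    this:
<programlisting>
-- assumes the OS locale de_DE.utf8 existed at initdb time
SELECT collname, collcollate, collctype
FROM pg_collation
WHERE collname IN ('de_DE', 'de_DE.utf8');
</programlisting>
   </para>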
 977
 978    <para>
    The default set of collations provided by <literal>libc</literal> maps
 980     directly to the locales installed in the operating system, which can be
 981     listed using the command <literal>locale -a</literal>.  In case
 982     a <literal>libc</literal> collation is needed that has different values
 983     for <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>, or if new
 984     locales are installed in the operating system after the database system
 985     was initialized, then a new collation may be created using
 986     the <xref linkend="sql-createcollation"/> command.
 987     New operating system locales can also be imported en masse using
 988     the <link linkend="functions-admin-collation"><function>pg_import_system_collations()</function></link> function.
 989    </para>
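   <para>
    For example, a collation that sorts by German rules but uses
    <literal>C</literal> character classification could be created, and
    locales installed after <command>initdb</command> could be imported into
    <literal>pg_catalog</literal> (the import function is restricted to
    superusers).  The collation name <literal>german_sort_c_ctype</literal>
    and the locale name <literal>de_DE.utf8</literal> are assumptions for
    illustration:
<programlisting>
-- hypothetical collation name; assumes the OS locale de_DE.utf8 exists
CREATE COLLATION german_sort_c_ctype (provider = libc, lc_collate = 'de_DE.utf8', lc_ctype = 'C');

-- returns the number of new collations created
SELECT pg_import_system_collations('pg_catalog');
</programlisting>
   </para>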
 990
 991    <para>
 992     Within any particular database, only collations that use that
 993     database's encoding are of interest.  Other entries in
 994     <literal>pg_collation</literal> are ignored.  Thus, a stripped collation
 995     name such as <literal>de_DE</literal> can be considered unique
 996     within a given database even though it would not be unique globally.
 997     Use of the stripped collation names is recommended, since it will
 998     make one fewer thing you need to change if you decide to change to
 999     another database encoding.  Note however that the <literal>default</literal>,
1000     <literal>C</literal>, and <literal>POSIX</literal> collations can be used regardless of
1001     the database encoding.
1002    </para>
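   <para>
    As a sketch, the collations relevant to the current database, namely
    those for its encoding plus the encoding-independent ones
    (<structfield>collencoding</structfield> = -1), can be listed with:
<programlisting>
-- collations usable in the current database
SELECT collname
FROM pg_collation
WHERE collencoding IN (-1, (SELECT encoding FROM pg_database
                            WHERE datname = current_database()))
ORDER BY collname;
</programlisting>
   </para>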
1003
1004    <para>
1005     <productname>PostgreSQL</productname> considers distinct collation
1006     objects to be incompatible even when they have identical properties.
1007     Thus for example,
1008 <programlisting>
1009 SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1;
1010 </programlisting>
1011     will draw an error even though the <literal>C</literal> and <literal>POSIX</literal>
1012     collations have identical behaviors.  Mixing stripped and non-stripped
1013     collation names is therefore not recommended.
1014    </para>
1015   </sect4>
1016
1017   <sect4 id="collation-managing-predefined-icu">
1018    <title>ICU Collations</title>
1019
1020    <para>
1021     With ICU, it is not sensible to enumerate all possible locale names.  ICU
1022     uses a particular naming system for locales, but there are many more ways
1023     to name a locale than there are actually distinct locales.
1024     <command>initdb</command> uses the ICU APIs to extract a set of distinct
1025     locales to populate the initial set of collations.  Collations provided by
1026     ICU are created in the SQL environment with names in BCP 47 language tag
1027     format, with a <quote>private use</quote>
1028     extension <literal>-x-icu</literal> appended, to distinguish them from
1029     libc locales.
1030    </para>
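   <para>
    To see which ICU collation names <command>initdb</command> actually
    created, for example the German ones, a query along these lines can be
    used; the result depends on the ICU version:
<programlisting>
-- assumes ICU support; lists ICU collations whose names start with "de"
SELECT collname
FROM pg_collation
WHERE collprovider = 'i' AND collname LIKE 'de%-x-icu'
ORDER BY collname;
</programlisting>
   </para>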
1031
1032    <para>
1033     Here are some example collations that might be created:
1034
1035     <variablelist>
1036      <varlistentry id="collation-managing-predefined-icu-de-x-icu">
1037       <term><literal>de-x-icu</literal></term>
1038       <listitem>
1039        <para>German collation, default variant</para>
1040       </listitem>
1041      </varlistentry>
1042
1043      <varlistentry id="collation-managing-predefined-icu-de-at-x-icu">
1044       <term><literal>de-AT-x-icu</literal></term>
1045       <listitem>
1046        <para>German collation for Austria, default variant</para>
1047        <para>
1048         (There are also, say, <literal>de-DE-x-icu</literal>
1049         or <literal>de-CH-x-icu</literal>, but as of this writing, they are
1050         equivalent to <literal>de-x-icu</literal>.)
1051        </para>
1052       </listitem>
1053      </varlistentry>
1054
1055      <varlistentry id="collation-managing-predefined-icu-und-x-icu">
1056       <term><literal>und-x-icu</literal> (for <quote>undefined</quote>)</term>
1057       <listitem>
1058        <para>
1059         ICU <quote>root</quote> collation.  Use this to get a reasonable
1060         language-agnostic sort order.
1061        </para>
1062       </listitem>
1063      </varlistentry>
1064     </variablelist>
1065    </para>
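   <para>
    For instance, the root collation orders letters alphabetically and, in
    its default configuration, places lower case before upper case for
    otherwise equal letters, whereas <literal>C</literal> orders by byte
    value.  This sketch assumes ICU support; the expected orders are shown in
    the comments:
<programlisting>
-- assumes ICU support for "und-x-icu"
SELECT x FROM (VALUES ('b'), ('B'), ('a'), ('A')) AS t(x) ORDER BY x COLLATE "und-x-icu";
-- a, A, b, B
SELECT x FROM (VALUES ('b'), ('B'), ('a'), ('A')) AS t(x) ORDER BY x COLLATE "C";
-- A, B, a, b
</programlisting>
   </para>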
1066
1067    <para>
1068     Some (less frequently used) encodings are not supported by ICU.  When the
1069     database encoding is one of these, ICU collation entries
1070     in <literal>pg_collation</literal> are ignored.  Attempting to use one
1071     will draw an error along the lines of <quote>collation "de-x-icu" for
1072     encoding "WIN874" does not exist</quote>.
1073    </para>
1074   </sect4>
1075   </sect3>
1076
1077   <sect3 id="collation-create">
1078    <title>Creating New Collation Objects</title>
1079
1080    <para>
1081     If the standard and predefined collations are not sufficient, users can
1082     create their own collation objects using the SQL
1083     command <xref linkend="sql-createcollation"/>.
1084    </para>
1085
1086    <para>
1087     The standard and predefined collations are in the
1088     schema <literal>pg_catalog</literal>, like all predefined objects.
1089     User-defined collations should be created in user schemas.  This also
1090     ensures that they are saved by <command>pg_dump</command>.
1091    </para>
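   <para>
    A minimal sketch of creating a collation in a user schema and referring
    to it by a schema-qualified name; the schema name
    <literal>app</literal> is hypothetical, and the locale
    <literal>de_DE</literal> must exist in the operating system:
<programlisting>
-- hypothetical schema "app"; assumes the OS locale de_DE exists
CREATE SCHEMA app;
CREATE COLLATION app.german (provider = libc, locale = 'de_DE');
SELECT * FROM test1 ORDER BY a COLLATE app.german;
</programlisting>
   </para>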
1092
1093    <sect4 id="collation-managing-create-libc">
1094     <title>libc Collations</title>
1095
1096     <para>
1097      New libc collations can be created like this:
1098 <programlisting>
1099 CREATE COLLATION german (provider = libc, locale = 'de_DE');
1100 </programlisting>
1101      The exact values that are acceptable for the <literal>locale</literal>
1102      clause in this command depend on the operating system.  On Unix-like
1103      systems, the command <literal>locale -a</literal> will show a list.
1104     </para>
1105
1106     <para>
1107      Since the predefined libc collations already include all collations
     defined in the operating system when the database cluster is
     initialized, it is not often necessary to manually create new ones.
     Reasons to do so include wanting a different naming system (in which
     case see also <xref linkend="collation-copy"/>) or an operating system
     upgrade that provides new locale definitions (in which case see
     also <link linkend="functions-admin-collation"><function>pg_import_system_collations()</function></link>).
1114     </para>
1115    </sect4>
1116
1117    <sect4 id="collation-managing-create-icu">
1118     <title>ICU Collations</title>
1119
1120     <para>
     ICU collations can be created like this:
1122
1123 <programlisting>
1124 CREATE COLLATION german (provider = icu, locale = 'de-DE');
1125 </programlisting>
1126
1127      ICU locales are specified as a BCP 47 <link
1128      linkend="icu-language-tag">Language Tag</link>, but can also accept most
1129      libc-style locale names. If possible, libc-style locale names are
1130      transformed into language tags.
1131     </para>
1132     <para>
1133      New ICU collations can customize collation behavior extensively by
1134      including collation attributes in the language tag. See <xref
1135      linkend="icu-custom-collations"/> for details and examples.
1136     </para>
1137    </sect4>
1138    <sect4 id="collation-copy">
1139    <title>Copying Collations</title>
1140
1141    <para>
1142     The command <xref linkend="sql-createcollation"/> can also be used to
    create a new collation from an existing collation.  This can be useful
    for using operating-system-independent collation names in applications,
    creating compatibility names, or using an ICU-provided collation under a
    more readable name.  For example:
1147 <programlisting>
1148 CREATE COLLATION german FROM "de_DE";
1149 CREATE COLLATION french FROM "fr-x-icu";
1150 </programlisting>
1151    </para>
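   <para>
    A copy is a distinct collation object, so, as described above for the
    <literal>C</literal> and <literal>POSIX</literal> collations, it cannot
    be mixed with its source in one comparison even though the two behave
    identically.  This sketch assumes the collation <literal>german</literal>
    was created from <literal>"de_DE"</literal> as shown:
<programlisting>
-- assumes: CREATE COLLATION german FROM "de_DE";
SELECT 'a' COLLATE german &lt; 'b' COLLATE german;   -- works
SELECT 'a' COLLATE german &lt; 'b' COLLATE "de_DE";  -- error: collation mismatch
</programlisting>
   </para>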
1152    </sect4>
1153    </sect3>
1154
1155    <sect3 id="collation-nondeterministic">
1156     <title>Nondeterministic Collations</title>
1157
1158     <para>
1159      A collation is either <firstterm>deterministic</firstterm> or
1160      <firstterm>nondeterministic</firstterm>.  A deterministic collation uses
1161      deterministic comparisons, which means that it considers strings to be
1162      equal only if they consist of the same byte sequence.  Nondeterministic
1163      comparison may determine strings to be equal even if they consist of
1164      different bytes.  Typical situations include case-insensitive comparison,
1165      accent-insensitive comparison, as well as comparison of strings in
1166      different Unicode normal forms.  It is up to the collation provider to
1167      actually implement such insensitive comparisons; the deterministic flag
1168      only determines whether ties are to be broken using bytewise comparison.
1169      See also <ulink url="https://www.unicode.org/reports/tr10">Unicode Technical
1170      Standard 10</ulink> for more information on the terminology.
1171     </para>
1172
1173     <para>
1174      To create a nondeterministic collation, specify the property
1175      <literal>deterministic = false</literal> to <command>CREATE
1176      COLLATION</command>, for example:
1177 <programlisting>
1178 CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
1179 </programlisting>
1180      This example would use the standard Unicode collation in a
1181      nondeterministic way.  In particular, this would allow strings in
1182      different normal forms to be compared correctly.  More interesting
     examples make use of the ICU customization facilities explained in <xref linkend="icu-custom-collations"/>.
1184      For example:
1185 <programlisting>
1186 CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
1187 CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
1188 </programlisting>
1189     </para>
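    <para>
     Once created, such collations can be used in comparisons and in
     constraints.  This sketch assumes ICU support and the collations
     <literal>case_insensitive</literal> and <literal>ignore_accents</literal>
     defined above; the table <literal>users</literal> is hypothetical:
<programlisting>
-- assumes ICU and the collations case_insensitive and ignore_accents above
SELECT 'abc' = 'ABC' COLLATE case_insensitive;       -- true
SELECT 'resume' = 'résumé' COLLATE ignore_accents;   -- true

-- hypothetical table: uniqueness uses the collation's notion of equality
CREATE TABLE users (name text COLLATE case_insensitive UNIQUE);
INSERT INTO users VALUES ('Alice');
INSERT INTO users VALUES ('ALICE');  -- fails with a unique violation
</programlisting>
    </para>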
1190
1191     <para>
     All standard and predefined collations are deterministic, and all
     user-defined collations are deterministic by default.  While
1194      nondeterministic collations give a more <quote>correct</quote> behavior,
1195      especially when considering the full power of Unicode and its many
1196      special cases, they also have some drawbacks.  Foremost, their use leads
1197      to a performance penalty.  Note, in particular, that B-tree cannot use
1198      deduplication with indexes that use a nondeterministic collation.  Also,
1199      certain operations are not possible with nondeterministic collations,
1200      such as some pattern matching operations.  Therefore, they should be used
1201      only in cases where they are specifically wanted.
1202     </para>
1203
1204     <tip>
1205      <para>
      To deal with text in different Unicode normalization forms, another
      option is to use the function <function>normalize</function> and the
      expression <literal>is normalized</literal> to preprocess or check the
      strings, instead of using nondeterministic collations.  Each approach
      has different trade-offs.
1211      </para>
1212     </tip>
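    <para>
     A sketch of the normalization approach, assuming a database with
     encoding <literal>UTF8</literal>.  Here <literal>U&amp;'\00E9'</literal>
     is a precomposed <quote>é</quote> and
     <literal>U&amp;'\0065\0301'</literal> is <quote>e</quote> followed by a
     combining acute accent:
<programlisting>
-- assumes a UTF8 database (required by normalize)
SELECT U&amp;'\00E9' = U&amp;'\0065\0301';                                 -- false (deterministic default collation)
SELECT normalize(U&amp;'\00E9', NFC) = normalize(U&amp;'\0065\0301', NFC); -- true
SELECT U&amp;'\0065\0301' IS NFC NORMALIZED;                             -- false
</programlisting>
    </para>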
1213    </sect3>
1214   </sect2>
1215
1216   <sect2 id="icu-custom-collations">
1217    <title>ICU Custom Collations</title>
1218
1219    <para>
1220     ICU allows extensive control over collation behavior by defining new
1221     collations with collation settings as a part of the language tag. These
1222     settings can modify the collation order to suit a variety of needs. For
1223     instance:
1224
1225 <programlisting>
1226 -- ignore differences in accents and case
1227 CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
1228 SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
1229 SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
1230
1231 -- upper case letters sort before lower case.
1232 CREATE COLLATION upper_first (provider = icu, locale = 'und-u-kf-upper');
1233 SELECT 'B' &lt; 'b' COLLATE upper_first; -- true
1234
1235 -- treat digits numerically and ignore punctuation
1236 CREATE COLLATION num_ignore_punct (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-kn');
1237 SELECT 'id-45' &lt; 'id-123' COLLATE num_ignore_punct; -- true
1238 SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
1239 </programlisting>
1240
1241     Many of the available options are described in <xref
1242     linkend="icu-collation-settings"/>, or see <xref
1243     linkend="icu-external-references"/> for more details.
1244    </para>
1245
1246    <sect3 id="icu-collation-comparison-levels">
1247     <title>ICU Comparison Levels</title>
1248
1249     <para>
1250      Comparison of two strings (collation) in ICU is determined by a
1251      multi-level process, where textual features are grouped into
     <quote>levels</quote>. Treatment of each level is controlled by the <link
1253      linkend="icu-collation-settings-table">collation settings</link>. Higher
1254      levels correspond to finer textual features.
1255     </para>
1256
1257     <para>
1258      <xref linkend="icu-collation-levels"/> shows which textual feature
1259      differences are considered significant when determining equality at the
1260      given level. The Unicode character <literal>U+2063</literal> is an
     invisible separator, and as seen in the table, is ignored at all
1262      levels of comparison less than <literal>identic</literal>.
1263     </para>
1264
1265      <table id="icu-collation-levels">
1266       <title>ICU Collation Levels</title>
1267       <tgroup cols="8">
1268        <colspec colname="col1" colwidth="1*"/>
1269        <colspec colname="col2" colwidth="1.25*"/>
1270        <colspec colname="col3" colwidth="1*"/>
1271        <colspec colname="col4" colwidth="1*"/>
1272        <colspec colname="col5" colwidth="1*"/>
1273        <colspec colname="col6" colwidth="1*"/>
1274        <colspec colname="col7" colwidth="1*"/>
1275        <colspec colname="col8" colwidth="1*"/>
1276
1277        <thead>
1278         <row>
1279          <entry>Level</entry>
1280          <entry>Description</entry>
1281          <entry><literal>'f' = 'f'</literal></entry>
1282          <entry><literal>'ab' = U&amp;'a\2063b'</literal></entry>
1283          <entry><literal>'x-y' = 'x_y'</literal></entry>
1284          <entry><literal>'g' = 'G'</literal></entry>
1285          <entry><literal>'n' = 'ñ'</literal></entry>
1286          <entry><literal>'y' = 'z'</literal></entry>
1287         </row>
1288        </thead>
1289
1290        <tbody>
1291         <row>
1292          <entry>level1</entry>
1293          <entry>Base Character</entry>
1294          <entry><literal>true</literal></entry>
1295          <entry><literal>true</literal></entry>
1296          <entry><literal>true</literal></entry>
1297          <entry><literal>true</literal></entry>
1298          <entry><literal>true</literal></entry>
1299          <entry><literal>false</literal></entry>
1300         </row>
1301         <row>
1302          <entry>level2</entry>
1303          <entry>Accents</entry>
1304          <entry><literal>true</literal></entry>
1305          <entry><literal>true</literal></entry>
1306          <entry><literal>true</literal></entry>
1307          <entry><literal>true</literal></entry>
1308          <entry><literal>false</literal></entry>
1309          <entry><literal>false</literal></entry>
1310         </row>
1311         <row>
1312          <entry>level3</entry>
1313          <entry>Case/Variants</entry>
1314          <entry><literal>true</literal></entry>
1315          <entry><literal>true</literal></entry>
1316          <entry><literal>true</literal></entry>
1317          <entry><literal>false</literal></entry>
1318          <entry><literal>false</literal></entry>
1319          <entry><literal>false</literal></entry>
1320         </row>
1321         <row>
1322          <entry>level4</entry>
1323          <entry>Punctuation<footnote><para>only with
1324          <literal>ka-shifted</literal>; see <xref
1325          linkend="icu-collation-settings-table"/></para></footnote></entry>
1326          <entry><literal>true</literal></entry>
1327          <entry><literal>true</literal></entry>
1328          <entry><literal>false</literal></entry>
1329          <entry><literal>false</literal></entry>
1330          <entry><literal>false</literal></entry>
1331          <entry><literal>false</literal></entry>
1332         </row>
1333         <row>
1334          <entry>identic</entry>
1335          <entry>All</entry>
1336          <entry><literal>true</literal></entry>
1337          <entry><literal>false</literal></entry>
1338          <entry><literal>false</literal></entry>
1339          <entry><literal>false</literal></entry>
1340          <entry><literal>false</literal></entry>
1341          <entry><literal>false</literal></entry>
1342         </row>
1343        </tbody>
1344       </tgroup>
1345      </table>
1346
1347     <para>
1348      At every level, even with full normalization off, basic normalization is
1349      performed. For example, <literal>'á'</literal> may be composed of the
1350      code points <literal>U&amp;'\0061\0301'</literal> or the single code
1351      point <literal>U&amp;'\00E1'</literal>, and those sequences will be
1352      considered equal even at the <literal>identic</literal> level. To treat
1353      any difference in code point representation as distinct, use a collation
1354      created with <symbol>deterministic</symbol> set to
1355      <literal>true</literal>.
1356     </para>
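    <para>
     The following sketch illustrates this, assuming ICU support and a
     <literal>UTF8</literal> database; the collation names
     <literal>nd_identic</literal> and <literal>det_root</literal> are
     hypothetical:
<programlisting>
-- hypothetical collations; assumes ICU and a UTF8 database
CREATE COLLATION nd_identic (provider = icu, deterministic = false, locale = 'und-u-ks-identic');
CREATE COLLATION det_root (provider = icu, deterministic = true, locale = 'und');

-- 'á' as a + combining acute versus the precomposed code point
SELECT U&amp;'\0061\0301' = U&amp;'\00E1' COLLATE nd_identic; -- true
SELECT U&amp;'\0061\0301' = U&amp;'\00E1' COLLATE det_root;   -- false
</programlisting>
    </para>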
1357
1358     <sect4 id="icu-collation-level-examples">
1359      <title>Collation Level Examples</title>
1360
1361 <programlisting>
1362 CREATE COLLATION level3 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level3');
1363 CREATE COLLATION level4 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level4');
1364 CREATE COLLATION identic (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-identic');
1365
1366 -- invisible separator ignored at all levels except identic
1367 SELECT 'ab' = U&amp;'a\2063b' COLLATE level4; -- true
1368 SELECT 'ab' = U&amp;'a\2063b' COLLATE identic; -- false
1369
-- punctuation ignored at level3 but not at level4
1371 SELECT 'x-y' = 'x_y' COLLATE level3; -- true
1372 SELECT 'x-y' = 'x_y' COLLATE level4; -- false
1373 </programlisting>
1374
1375     </sect4>
1376    </sect3>
1377
1378    <sect3 id="icu-collation-settings">
1379     <title>Collation Settings for an ICU Locale</title>
1380
1381     <para>
1382      <xref linkend="icu-collation-settings-table"/> shows the available
1383      collation settings, which can be used as part of a language tag to
1384      customize a collation.
1385     </para>
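    <para>
     Several settings can be combined in one language tag by appending them
     as key-value pairs after <literal>-u-</literal>; a key given without a
     value, such as <literal>kn</literal>, means <literal>true</literal>.  As
     a sketch (the collation name <literal>ci_numeric</literal> is
     hypothetical and ICU support is assumed), this collation ignores case
     and compares digit sequences numerically:
<programlisting>
-- hypothetical collation; assumes ICU support
CREATE COLLATION ci_numeric (provider = icu, deterministic = false, locale = 'und-u-kn-ks-level2');
SELECT 'File10' &gt; 'file9' COLLATE ci_numeric;   -- true
SELECT 'File10' = 'file10' COLLATE ci_numeric;  -- true
</programlisting>
    </para>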
1386
1387      <table id="icu-collation-settings-table">
1388       <title>ICU Collation Settings</title>
1389       <tgroup cols="4">
1390        <colspec colname="col1" colwidth="1*"/>
1391        <colspec colname="col2" colwidth="2*"/>
1392        <colspec colname="col3" colwidth="2*"/>
1393        <colspec colname="col4" colwidth="5*"/>
1394
1395        <thead>
1396         <row>
1397          <entry>Key</entry>
1398          <entry>Values</entry>
1399          <entry>Default</entry>
1400          <entry>Description</entry>
1401         </row>
1402        </thead>
1403
1404        <tbody>
1405         <row>
1406          <entry><literal>co</literal></entry>
1407          <entry><literal>emoji</literal>, <literal>phonebk</literal>, <literal>standard</literal>, <replaceable>...</replaceable></entry>
1408          <entry><literal>standard</literal></entry>
1409          <entry>
1410           Collation type. See <xref linkend="icu-external-references"/> for additional options and details.
1411          </entry>
1412         </row>
1413
1414         <row>
1415          <entry><literal>ka</literal></entry>
1416          <entry><literal>noignore</literal>, <literal>shifted</literal></entry>
1417          <entry><literal>noignore</literal></entry>
1418          <entry>
1419           If set to <literal>shifted</literal>, causes some characters
1420           (e.g. punctuation or space) to be ignored in comparison. Key
1421           <literal>ks</literal> must be set to <literal>level3</literal> or
1422           lower to take effect. Set key <literal>kv</literal> to control which
1423           character classes are ignored.
1424          </entry>
1425         </row>
1426
1427         <row>
1428          <entry><literal>kb</literal></entry>
1429          <entry><literal>true</literal>, <literal>false</literal></entry>
1430          <entry><literal>false</literal></entry>
1431          <entry>
1432           Backwards comparison for the level 2 differences. For example,
1433           locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
1434           before <literal>'aé'</literal>.
1435          </entry>
1436         </row>
1437
1438         <row>
1439          <entry><literal>kc</literal></entry>
1440          <entry><literal>true</literal>, <literal>false</literal></entry>
1441          <entry><literal>false</literal></entry>
1442          <entry>
1443           <para>
1444            Separates case into a "level 2.5" that falls between accents and
1445            other level 3 features.
1446           </para>
1447           <para>
1448            If set to <literal>true</literal> and <literal>ks</literal> is set
1449            to <literal>level1</literal>, will ignore accents but take case
1450            into account.
1451           </para>
1452          </entry>
1453         </row>
1454
1455         <row>
1456          <entry><literal>kf</literal></entry>
1457          <entry>
1458           <literal>upper</literal>, <literal>lower</literal>,
1459           <literal>false</literal>
1460          </entry>
1461          <entry><literal>false</literal></entry>
1462          <entry>
1463           If set to <literal>upper</literal>, upper case sorts before lower
1464           case. If set to <literal>lower</literal>, lower case sorts before
1465           upper case. If set to <literal>false</literal>, the sort depends on
1466           the rules of the locale.
1467          </entry>
1468         </row>
1469
1470         <row>
1471          <entry><literal>kn</literal></entry>
1472          <entry><literal>true</literal>, <literal>false</literal></entry>
1473          <entry><literal>false</literal></entry>
1474          <entry>
1475           If set to <literal>true</literal>, numbers within a string are
1476           treated as a single numeric value rather than a sequence of
1477           digits. For example, <literal>'id-45'</literal> sorts before
1478           <literal>'id-123'</literal>.
1479          </entry>
1480         </row>
1481
1482         <row>
1483          <entry><literal>kk</literal></entry>
1484          <entry><literal>true</literal>, <literal>false</literal></entry>
1485          <entry><literal>false</literal></entry>
1486          <entry>
1487           <para>
1488            Enable full normalization; may affect performance. Basic
1489            normalization is performed even when set to
1490            <literal>false</literal>. Locales for languages that require full
1491            normalization typically enable it by default.
1492           </para>
1493           <para>
1494            Full normalization is important in some cases, such as when
1495            multiple accents are applied to a single character. For example,
1496            the code point sequences <literal>U&amp;'\0065\0323\0302'</literal>
1497            and <literal>U&amp;'\0065\0302\0323'</literal> represent
1498            an <literal>e</literal> with circumflex and dot-below accents
1499            applied in different orders. With full normalization
1500            on, these code point sequences are treated as equal; otherwise they
1501            are unequal.
1502           </para>
1503          </entry>
1504         </row>
1505
1506         <row>
1507          <entry><literal>kr</literal></entry>
1508          <entry>
1509           <literal>space</literal>, <literal>punct</literal>,
1510           <literal>symbol</literal>, <literal>currency</literal>,
1511           <literal>digit</literal>, <replaceable>script-id</replaceable>
1512          </entry>
1513          <entry></entry>
1514          <entry>
1515           <para>
1516            Set to one or more of the valid values, or any BCP 47
1517            <replaceable>script-id</replaceable>, e.g. <literal>latn</literal>
1518            ("Latin") or <literal>grek</literal> ("Greek"). Multiple values are
1519            separated by "<literal>-</literal>".
1520           </para>
1521           <para>
1522            Redefines the ordering of classes of characters; those characters
1523            belonging to a class earlier in the list sort before characters
1524            belonging to a class later in the list. For instance, the value
1525            <literal>digit-currency-space</literal> (as part of a language tag
1526            like <literal>und-u-kr-digit-currency-space</literal>) sorts
1527            punctuation before digits and spaces.
1528           </para>
1529          </entry>
1530         </row>
1531
1532         <row>
1533          <entry><literal>ks</literal></entry>
1534          <entry><literal>level1</literal>, <literal>level2</literal>, <literal>level3</literal>, <literal>level4</literal>, <literal>identic</literal></entry>
1535          <entry><literal>level3</literal></entry>
1536          <entry>
1537           Sensitivity (or "strength") when determining equality, with
1538           <literal>level1</literal> the least sensitive to differences and
1539           <literal>identic</literal> the most sensitive to differences. See
1540           <xref linkend="icu-collation-levels"/> for details.
1541          </entry>
1542         </row>
1543
1544         <row>
1545          <entry><literal>kv</literal></entry>
1546          <entry>
1547           <literal>space</literal>, <literal>punct</literal>,
1548           <literal>symbol</literal>, <literal>currency</literal>
1549          </entry>
1550          <entry><literal>punct</literal></entry>
1551          <entry>
1552           Classes of characters ignored during comparison at level 3. Setting
1553           to a later value includes earlier values;
1554           e.g. <literal>symbol</literal> also includes
1555           <literal>punct</literal> and <literal>space</literal> in the
1556           characters to be ignored. Key <literal>ka</literal> must be set to
1557           <literal>shifted</literal> and key <literal>ks</literal> must be set
1558           to <literal>level3</literal> or lower to take effect.
1559          </entry>
1560         </row>
1561        </tbody>
1562       </tgroup>
1563      </table>
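    <para>
     As an illustration of the <literal>kn</literal> key, the following
     sketch compares the strings from the table with and without numeric
     ordering.  The collation name <literal>num_order</literal> is
     hypothetical.  The second query assumes
     <productname>PostgreSQL</productname> was built with ICU support, so
     that the predefined <literal>und-x-icu</literal> collation exists:
<programlisting>
-- num_order is an example name; 'kn-true' enables numeric ordering
CREATE COLLATION num_order (provider = icu, locale = 'en-u-kn-true');

SELECT 'id-45' &lt; 'id-123' COLLATE num_order;    -- true: 45 &lt; 123 as numbers
SELECT 'id-45' &lt; 'id-123' COLLATE "und-x-icu";  -- false: digit '4' sorts after '1'
</programlisting>
     Because <literal>kn</literal> only changes the sort order of strings
     that are otherwise unequal, it takes effect even with a deterministic
     collation.
    </para>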
1564
1565     <para>
1566      Defaults may depend on locale. The above table is not meant to be
1567      complete. See <xref linkend="icu-external-references"/> for additional
1568      options and details.
1569     </para>
1570
1571     <note>
1572      <para>
1573       For many collation settings, you must create the collation with
1574       <option>deterministic</option> set to <literal>false</literal> for the
1575       setting to have the desired effect (see <xref
1576       linkend="collation-nondeterministic"/>). Additionally, some settings
1577       only take effect when the key <literal>ka</literal> is set to
1578       <literal>shifted</literal> (see <xref
1579       linkend="icu-collation-settings-table"/>).
1580      </para>
1581     </note>
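    <para>
     For example, the following sketch creates a nondeterministic collation
     with <literal>ka</literal> set to <literal>shifted</literal>, so that
     punctuation (the default <literal>kv</literal> class) is ignored when
     comparing for equality.  The name <literal>ignore_punct</literal> is
     only illustrative, and the second query assumes the predefined
     <literal>und-x-icu</literal> collation is available:
<programlisting>
-- ignore_punct is an example name; ka-shifted makes punctuation ignorable
CREATE COLLATION ignore_punct (provider = icu, deterministic = false, locale = 'und-u-ka-shifted');

SELECT '.foo.' = 'foo' COLLATE ignore_punct;  -- true
SELECT '.foo.' = 'foo' COLLATE "und-x-icu";   -- false: deterministic collation
</programlisting>
    </para>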
1582    </sect3>
1583
1584    <sect3 id="icu-locale-examples">
1585     <title>Collation Settings Examples</title>
1586
1587      <variablelist>
1588       <varlistentry id="collation-managing-create-icu-de-u-co-phonebk-x-icu">
1589        <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term>
1590        <listitem>
1591         <para>German collation with phone book collation type</para>
1592        </listitem>
1593       </varlistentry>
1594
1595       <varlistentry id="collation-managing-create-icu-und-u-co-emoji-x-icu">
1596        <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</literal></term>
1597        <listitem>
1598         <para>
1599          Root collation with Emoji collation type, per Unicode Technical Standard #51
1600         </para>
1601        </listitem>
1602       </varlistentry>
1603
1604       <varlistentry id="collation-managing-create-icu-en-u-kr-grek-latn">
1605        <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');</literal></term>
1606        <listitem>
1607         <para>
1608          Sort Greek letters before Latin ones.  (The default is Latin before Greek.)
1609         </para>
1610        </listitem>
1611       </varlistentry>
1612
1613       <varlistentry id="collation-managing-create-icu-en-u-kf-upper">
1614        <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</literal></term>
1615        <listitem>
1616         <para>
1617          Sort upper-case letters before lower-case letters.  (The default is
1618          lower-case letters first.)
1619         </para>
1620        </listitem>
1621       </varlistentry>
1622
1623       <varlistentry id="collation-managing-create-icu-en-u-kf-upper-kr-grek-latn">
1624        <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</literal></term>
1625        <listitem>
1626         <para>
1627          Combines both of the above options.
1628         </para>
1629        </listitem>
1630       </varlistentry>
1631      </variablelist>
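     <para>
      To see the effect of a setting, sort a few values with one of the
      collations above.  This sketch assumes the
      <literal>upperfirst</literal> collation has been created as shown:
<programlisting>
SELECT c
FROM (VALUES ('b'), ('a'), ('B'), ('A')) AS x(c)
ORDER BY c COLLATE upperfirst;
 c
---
 A
 a
 B
 b
</programlisting>
      Case is only a tertiary (level 3) difference, so <literal>kf</literal>
      decides the order only between strings that are otherwise equal;
      <literal>A</literal> and <literal>a</literal> still sort before
      <literal>B</literal> and <literal>b</literal>.
     </para>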
1632    </sect3>
1633
1634    <sect3 id="icu-tailoring-rules">
1635     <title>ICU Tailoring Rules</title>
1636
1637     <para>
1638      If the options provided by the collation settings shown above are not
1639      sufficient, the order of collation elements can be changed with tailoring
1640      rules, whose syntax is detailed at <ulink
1641      url="https://unicode-org.github.io/icu/userguide/collation/customization/"></ulink>.
1642     </para>
1643
1644     <para>
1645      This small example creates a collation based on the root locale with a
1646      tailoring rule:
1647 <programlisting>
1648 <![CDATA[CREATE COLLATION custom (provider = icu, locale = 'und', rules = '&V << w <<< W');]]>
1649 </programlisting>
1650      With this rule, the letter <quote>W</quote> is sorted after
1651      <quote>V</quote>, but is treated as a secondary difference similar to an
1652      accent.  Rules like this are contained in the locale definitions of some
1653      languages.  (Of course, if a locale definition already contains the
1654      desired rules, then they don't need to be specified again explicitly.)
1655     </para>
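    <para>
     A query like the following sketch, which assumes the
     <literal>custom</literal> collation above has been created, shows the
     effect of the rule: <literal>wa</literal> sorts between
     <literal>va</literal> and <literal>vb</literal>, because
     <literal>w</literal> now differs from <literal>v</literal> only at the
     secondary level, while <literal>a</literal> and <literal>b</literal>
     differ at the primary level:
<programlisting>
SELECT c
FROM (VALUES ('vb'), ('wa'), ('va')) AS x(c)
ORDER BY c COLLATE custom;
 c
----
 va
 wa
 vb
</programlisting>
    </para>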
1656
1657     <para>
1658      Here is a more complex example.  The following statement sets up a
1659      collation named <literal>ebcdic</literal> with rules to sort US-ASCII
1660      characters in the order of the EBCDIC encoding.
1661
1662 <programlisting>
1663 <![CDATA[CREATE COLLATION ebcdic (provider = icu, locale = 'und',
1664 rules = $$
1665 & ' ' < '.' < '<' < '(' < '+' < \|
1666 < '&' < '!' < '$' < '*' < ')' < ';'
1667 < '-' < '/' < ',' < '%' < '_' < '>' < '?'
1668 < '`' < ':' < '#' < '@' < \' < '=' < '"'
1669 <*a-r < '~' <*s-z < '^' < '[' < ']'
1670 < '{' <*A-I < '}' <*J-R < '\' <*S-Z <*0-9
1671 $$);]]>
1672
1673 SELECT c
1674 FROM (VALUES ('a'), ('b'), ('A'), ('B'), ('1'), ('2'), ('!'), ('^')) AS x(c)
1675 ORDER BY c COLLATE ebcdic;
1676  c
1677 ---
1678  !
1679  a
1680  b
1681  ^
1682  A
1683  B
1684  1
1685  2
1686 </programlisting>
1687     </para>
1688    </sect3>
1689
1690    <sect3 id="icu-external-references">
1691     <title>External References for ICU</title>
1692
1693     <para>
1694      This section (<xref linkend="icu-custom-collations"/>) is only a brief
1695      overview of ICU behavior and language tags. Refer to the following
1696      documents for technical details, additional options, and new behavior:
1697     </para>
1698
1699     <itemizedlist>
1700      <listitem>
1701       <para>
1702        <ulink url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode Technical Standard #35</ulink>
1703       </para>
1704      </listitem>
1705      <listitem>
1706       <para>
1707        <ulink url="https://www.rfc-editor.org/info/bcp47">BCP 47</ulink>
1708       </para>
1709      </listitem>
1710      <listitem>
1711       <para>
1712        <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR repository</ulink>
1713       </para>
1714      </listitem>
1715      <listitem>
1716       <para>
1717        <ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink>
1718       </para>
1719      </listitem>
1720      <listitem>
1721       <para>
1722        <ulink url="https://unicode-org.github.io/icu/userguide/collation/"></ulink>
1723       </para>
1724      </listitem>
1725     </itemizedlist>
1726    </sect3>
1727   </sect2>
1728  </sect1>
1729
1730  <sect1 id="multibyte">
1731   <title>Character Set Support</title>
1732
1733   <indexterm zone="multibyte"><primary>character set</primary></indexterm>
1734
1735   <para>
1736    The character set support in <productname>PostgreSQL</productname>
1737    allows you to store text in a variety of character sets (also called
1738    encodings), including
1739    single-byte character sets such as the ISO 8859 series and
1740    multiple-byte character sets such as <acronym>EUC</acronym> (Extended Unix
1741    Code), UTF-8, and Mule internal code.  All supported character sets
1742    can be used transparently by clients, but a few are not supported
1743    for use within the server (that is, as a server-side encoding).
1744    The default character set is selected while
1745    initializing your <productname>PostgreSQL</productname> database
1746    cluster using <command>initdb</command>.  It can be overridden when you
1747    create a database, so you can have multiple
1748    databases each with a different character set.
1749   </para>
1750
1751   <para>
1752    An important restriction, however, is that each database's character set
1753    must be compatible with the database's <envar>LC_CTYPE</envar> (character
1754    classification) and <envar>LC_COLLATE</envar> (string sort order) locale
1755    settings. For <literal>C</literal> or
1756    <literal>POSIX</literal> locale, any character set is allowed, but for other
1757    libc-provided locales there is only one character set that will work
1758    correctly.
1759    (On Windows, however, UTF-8 encoding can be used with any locale.)
1760    If you have ICU support configured, ICU-provided locales can be used
1761    with most but not all server-side encodings.
1762   </para>
1763
1764    <sect2 id="multibyte-charset-supported">
1765     <title>Supported Character Sets</title>
1766
1767     <para>
1768      <xref linkend="charset-table"/> shows the character sets available
1769      for use in <productname>PostgreSQL</productname>.
1770     </para>
1771
1772      <table id="charset-table">
1773       <title><productname>PostgreSQL</productname> Character Sets</title>
1774       <tgroup cols="7">
1775        <colspec colname="col1" colwidth="3*"/>
1776        <colspec colname="col2" colwidth="2*"/>
1777        <colspec colname="col3" colwidth="2*"/>
1778        <colspec colname="col4" colwidth="1.25*"/>
1779        <colspec colname="col5" colwidth="1*"/>
1780        <colspec colname="col6" colwidth="1*"/>
1781        <colspec colname="col7" colwidth="2*"/>
1782        <thead>
1783         <row>
1784          <entry>Name</entry>
1785          <entry>Description</entry>
1786          <entry>Language</entry>
1787          <entry>Server?</entry>
1788          <entry>ICU?</entry>
1789          <!--
1790           The Bytes/Char field is populated by looking at the values returned
1791           by pg_wchar_table.mblen function for each encoding.
1792          -->
1793          <entry>Bytes/&zwsp;Char</entry>
1794          <entry>Aliases</entry>
1795         </row>
1796        </thead>
1797        <tbody>
1798         <row>
1799          <entry><literal>BIG5</literal></entry>
1800          <entry>Big Five</entry>
1801          <entry>Traditional Chinese</entry>
1802          <entry>No</entry>
1803          <entry>No</entry>
1804          <entry>1&ndash;2</entry>
1805          <entry><literal>WIN950</literal>, <literal>Windows950</literal></entry>
1806         </row>
1807         <row>
1808          <entry><literal>EUC_CN</literal></entry>
1809          <entry>Extended UNIX Code-CN</entry>
1810          <entry>Simplified Chinese</entry>
1811          <entry>Yes</entry>
1812          <entry>Yes</entry>
1813          <entry>1&ndash;3</entry>
1814          <entry></entry>
1815         </row>
1816         <row>
1817          <entry><literal>EUC_JP</literal></entry>
1818          <entry>Extended UNIX Code-JP</entry>
1819          <entry>Japanese</entry>
1820          <entry>Yes</entry>
1821          <entry>Yes</entry>
1822          <entry>1&ndash;3</entry>
1823          <entry></entry>
1824         </row>
1825         <row>
1826          <entry><literal>EUC_JIS_2004</literal></entry>
1827          <entry>Extended UNIX Code-JP, JIS X 0213</entry>
1828          <entry>Japanese</entry>
1829          <entry>Yes</entry>
1830          <entry>No</entry>
1831          <entry>1&ndash;3</entry>
1832          <entry></entry>
1833         </row>
1834         <row>
1835          <entry><literal>EUC_KR</literal></entry>
1836          <entry>Extended UNIX Code-KR</entry>
1837          <entry>Korean</entry>
1838          <entry>Yes</entry>
1839          <entry>Yes</entry>
1840          <entry>1&ndash;3</entry>
1841          <entry></entry>
1842         </row>
1843         <row>
1844          <entry><literal>EUC_TW</literal></entry>
1845          <entry>Extended UNIX Code-TW</entry>
1846          <entry>Traditional Chinese, Taiwanese</entry>
1847          <entry>Yes</entry>
1848          <entry>Yes</entry>
1849          <entry>1&ndash;3</entry>
1850          <entry></entry>
1851         </row>
1852         <row>
1853          <entry><literal>GB18030</literal></entry>
1854          <entry>National Standard</entry>
1855          <entry>Chinese</entry>
1856          <entry>No</entry>
1857          <entry>No</entry>
1858          <entry>1&ndash;4</entry>
1859          <entry></entry>
1860         </row>
1861         <row>
1862          <entry><literal>GBK</literal></entry>
1863          <entry>Extended National Standard</entry>
1864          <entry>Simplified Chinese</entry>
1865          <entry>No</entry>
1866          <entry>No</entry>
1867          <entry>1&ndash;2</entry>
1868          <entry><literal>WIN936</literal>, <literal>Windows936</literal></entry>
1869         </row>
1870         <row>
1871          <entry><literal>ISO_8859_5</literal></entry>
1872          <entry>ISO 8859-5, <acronym>ECMA</acronym> 113</entry>
1873          <entry>Latin/Cyrillic</entry>
1874          <entry>Yes</entry>
1875          <entry>Yes</entry>
1876          <entry>1</entry>
1877          <entry></entry>
1878         </row>
1879         <row>
1880          <entry><literal>ISO_8859_6</literal></entry>
1881          <entry>ISO 8859-6, <acronym>ECMA</acronym> 114</entry>
1882          <entry>Latin/Arabic</entry>
1883          <entry>Yes</entry>
1884          <entry>Yes</entry>
1885          <entry>1</entry>
1886          <entry></entry>
1887         </row>
1888         <row>
1889          <entry><literal>ISO_8859_7</literal></entry>
1890          <entry>ISO 8859-7, <acronym>ECMA</acronym> 118</entry>
1891          <entry>Latin/Greek</entry>
1892          <entry>Yes</entry>
1893          <entry>Yes</entry>
1894          <entry>1</entry>
1895          <entry></entry>
1896         </row>
1897         <row>
1898          <entry><literal>ISO_8859_8</literal></entry>
1899          <entry>ISO 8859-8, <acronym>ECMA</acronym> 121</entry>
1900          <entry>Latin/Hebrew</entry>
1901          <entry>Yes</entry>
1902          <entry>Yes</entry>
1903          <entry>1</entry>
1904          <entry></entry>
1905         </row>
1906         <row>
1907          <entry><literal>JOHAB</literal></entry>
1908          <entry><acronym>JOHAB</acronym></entry>
1909          <entry>Korean (Hangul)</entry>
1910          <entry>No</entry>
1911          <entry>No</entry>
1912          <entry>1&ndash;3</entry>
1913          <entry></entry>
1914         </row>
1915         <row>
1916          <entry><literal>KOI8R</literal></entry>
1917          <entry><acronym>KOI</acronym>8-R</entry>
1918          <entry>Cyrillic (Russian)</entry>
1919          <entry>Yes</entry>
1920          <entry>Yes</entry>
1921          <entry>1</entry>
1922          <entry><literal>KOI8</literal></entry>
1923         </row>
1924         <row>
1925          <entry><literal>KOI8U</literal></entry>
1926          <entry><acronym>KOI</acronym>8-U</entry>
1927          <entry>Cyrillic (Ukrainian)</entry>
1928          <entry>Yes</entry>
1929          <entry>Yes</entry>
1930          <entry>1</entry>
1931          <entry></entry>
1932         </row>
1933         <row>
1934          <entry><literal>LATIN1</literal></entry>
1935          <entry>ISO 8859-1, <acronym>ECMA</acronym> 94</entry>
1936          <entry>Western European</entry>
1937          <entry>Yes</entry>
1938          <entry>Yes</entry>
1939          <entry>1</entry>
1940          <entry><literal>ISO88591</literal></entry>
1941         </row>
1942         <row>
1943          <entry><literal>LATIN2</literal></entry>
1944          <entry>ISO 8859-2, <acronym>ECMA</acronym> 94</entry>
1945          <entry>Central European</entry>
1946          <entry>Yes</entry>
1947          <entry>Yes</entry>
1948          <entry>1</entry>
1949          <entry><literal>ISO88592</literal></entry>
1950         </row>
1951         <row>
1952          <entry><literal>LATIN3</literal></entry>
1953          <entry>ISO 8859-3, <acronym>ECMA</acronym> 94</entry>
1954          <entry>South European</entry>
1955          <entry>Yes</entry>
1956          <entry>Yes</entry>
1957          <entry>1</entry>
1958          <entry><literal>ISO88593</literal></entry>
1959         </row>
1960         <row>
1961          <entry><literal>LATIN4</literal></entry>
1962          <entry>ISO 8859-4, <acronym>ECMA</acronym> 94</entry>
1963          <entry>North European</entry>
1964          <entry>Yes</entry>
1965          <entry>Yes</entry>
1966          <entry>1</entry>
1967          <entry><literal>ISO88594</literal></entry>
1968         </row>
1969         <row>
1970          <entry><literal>LATIN5</literal></entry>
1971          <entry>ISO 8859-9, <acronym>ECMA</acronym> 128</entry>
1972          <entry>Turkish</entry>
1973          <entry>Yes</entry>
1974          <entry>Yes</entry>
1975          <entry>1</entry>
1976          <entry><literal>ISO88599</literal></entry>
1977         </row>
1978         <row>
1979          <entry><literal>LATIN6</literal></entry>
1980          <entry>ISO 8859-10, <acronym>ECMA</acronym> 144</entry>
1981          <entry>Nordic</entry>
1982          <entry>Yes</entry>
1983          <entry>Yes</entry>
1984          <entry>1</entry>
1985          <entry><literal>ISO885910</literal></entry>
1986         </row>
1987         <row>
1988          <entry><literal>LATIN7</literal></entry>
1989          <entry>ISO 8859-13</entry>
1990          <entry>Baltic</entry>
1991          <entry>Yes</entry>
1992          <entry>Yes</entry>
1993          <entry>1</entry>
1994          <entry><literal>ISO885913</literal></entry>
1995         </row>
1996         <row>
1997          <entry><literal>LATIN8</literal></entry>
1998          <entry>ISO 8859-14</entry>
1999          <entry>Celtic</entry>
2000          <entry>Yes</entry>
2001          <entry>Yes</entry>
2002          <entry>1</entry>
2003          <entry><literal>ISO885914</literal></entry>
2004         </row>
2005         <row>
2006          <entry><literal>LATIN9</literal></entry>
2007          <entry>ISO 8859-15</entry>
2008          <entry>LATIN1 with Euro and accents</entry>
2009          <entry>Yes</entry>
2010          <entry>Yes</entry>
2011          <entry>1</entry>
2012          <entry><literal>ISO885915</literal></entry>
2013         </row>
2014         <row>
2015          <entry><literal>LATIN10</literal></entry>
2016          <entry>ISO 8859-16, <acronym>ASRO</acronym> SR 14111</entry>
2017          <entry>Romanian</entry>
2018          <entry>Yes</entry>
2019          <entry>No</entry>
2020          <entry>1</entry>
2021          <entry><literal>ISO885916</literal></entry>
2022         </row>
2023         <row>
2024          <entry><literal>MULE_INTERNAL</literal></entry>
2025          <entry>Mule internal code</entry>
2026          <entry>Multilingual Emacs</entry>
2027          <entry>Yes</entry>
2028          <entry>No</entry>
2029          <entry>1&ndash;4</entry>
2030          <entry></entry>
2031         </row>
2032         <row>
2033          <entry><literal>SJIS</literal></entry>
2034          <entry>Shift JIS</entry>
2035          <entry>Japanese</entry>
2036          <entry>No</entry>
2037          <entry>No</entry>
2038          <entry>1&ndash;2</entry>
2039          <entry><literal>Mskanji</literal>, <literal>ShiftJIS</literal>, <literal>WIN932</literal>, <literal>Windows932</literal></entry>
2040         </row>
2041         <row>
2042          <entry><literal>SHIFT_JIS_2004</literal></entry>
2043          <entry>Shift JIS, JIS X 0213</entry>
2044          <entry>Japanese</entry>
2045          <entry>No</entry>
2046          <entry>No</entry>
2047          <entry>1&ndash;2</entry>
2048          <entry></entry>
2049         </row>
2050         <row>
2051          <entry><literal>SQL_ASCII</literal></entry>
2052          <entry>unspecified (see text)</entry>
2053          <entry><emphasis>any</emphasis></entry>
2054          <entry>Yes</entry>
2055          <entry>No</entry>
2056          <entry>1</entry>
2057          <entry></entry>
2058         </row>
2059         <row>
2060          <entry><literal>UHC</literal></entry>
2061          <entry>Unified Hangul Code</entry>
2062          <entry>Korean</entry>
2063          <entry>No</entry>
2064          <entry>No</entry>
2065          <entry>1&ndash;2</entry>
2066          <entry><literal>WIN949</literal>, <literal>Windows949</literal></entry>
2067         </row>
2068         <row>
2069          <entry><literal>UTF8</literal></entry>
2070          <entry>Unicode, 8-bit</entry>
2071          <entry><emphasis>all</emphasis></entry>
2072          <entry>Yes</entry>
2073          <entry>Yes</entry>
2074          <entry>1&ndash;4</entry>
2075          <entry><literal>Unicode</literal></entry>
2076         </row>
2077         <row>
2078          <entry><literal>WIN866</literal></entry>
2079          <entry>Windows CP866</entry>
2080          <entry>Cyrillic</entry>
2081          <entry>Yes</entry>
2082          <entry>Yes</entry>
2083          <entry>1</entry>
2084          <entry><literal>ALT</literal></entry>
2085         </row>
2086         <row>
2087          <entry><literal>WIN874</literal></entry>
2088          <entry>Windows CP874</entry>
2089          <entry>Thai</entry>
2090          <entry>Yes</entry>
2091          <entry>No</entry>
2092          <entry>1</entry>
2093          <entry></entry>
2094         </row>
2095         <row>
2096          <entry><literal>WIN1250</literal></entry>
2097          <entry>Windows CP1250</entry>
2098          <entry>Central European</entry>
2099          <entry>Yes</entry>
2100          <entry>Yes</entry>
2101          <entry>1</entry>
2102          <entry></entry>
2103         </row>
2104         <row>
2105          <entry><literal>WIN1251</literal></entry>
2106          <entry>Windows CP1251</entry>
2107          <entry>Cyrillic</entry>
2108          <entry>Yes</entry>
2109          <entry>Yes</entry>
2110          <entry>1</entry>
2111          <entry><literal>WIN</literal></entry>
2112         </row>
2113         <row>
2114          <entry><literal>WIN1252</literal></entry>
2115          <entry>Windows CP1252</entry>
2116          <entry>Western European</entry>
2117          <entry>Yes</entry>
2118          <entry>Yes</entry>
2119          <entry>1</entry>
2120          <entry></entry>
2121         </row>
2122         <row>
2123          <entry><literal>WIN1253</literal></entry>
2124          <entry>Windows CP1253</entry>
2125          <entry>Greek</entry>
2126          <entry>Yes</entry>
2127          <entry>Yes</entry>
2128          <entry>1</entry>
2129          <entry></entry>
2130         </row>
2131         <row>
2132          <entry><literal>WIN1254</literal></entry>
2133          <entry>Windows CP1254</entry>
2134          <entry>Turkish</entry>
2135          <entry>Yes</entry>
2136          <entry>Yes</entry>
2137          <entry>1</entry>
2138          <entry></entry>
2139         </row>
2140         <row>
2141          <entry><literal>WIN1255</literal></entry>
2142          <entry>Windows CP1255</entry>
2143          <entry>Hebrew</entry>
2144          <entry>Yes</entry>
2145          <entry>Yes</entry>
2146          <entry>1</entry>
2147          <entry></entry>
2148         </row>
2149         <row>
2150          <entry><literal>WIN1256</literal></entry>
2151          <entry>Windows CP1256</entry>
2152          <entry>Arabic</entry>
2153          <entry>Yes</entry>
2154          <entry>Yes</entry>
2155          <entry>1</entry>
2156          <entry></entry>
2157         </row>
2158         <row>
2159          <entry><literal>WIN1257</literal></entry>
2160          <entry>Windows CP1257</entry>
2161          <entry>Baltic</entry>
2162          <entry>Yes</entry>
2163          <entry>Yes</entry>
2164          <entry>1</entry>
2165          <entry></entry>
2166         </row>
2167         <row>
2168          <entry><literal>WIN1258</literal></entry>
2169          <entry>Windows CP1258</entry>
2170          <entry>Vietnamese</entry>
2171          <entry>Yes</entry>
2172          <entry>Yes</entry>
2173          <entry>1</entry>
2174          <entry><literal>ABC</literal>, <literal>TCVN</literal>, <literal>TCVN5712</literal>, <literal>VSCII</literal></entry>
2175         </row>
2176        </tbody>
2177       </tgroup>
2178      </table>
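     <para>
      The aliases and the byte counts in the table can be checked from SQL.
      This sketch assumes it is run in a database whose encoding is
      <literal>UTF8</literal>, so that the literal <literal>'é'</literal>
      (U+00E9) can be represented:
<programlisting>
-- an alias resolves to the canonical encoding name
SELECT pg_encoding_to_char(pg_char_to_encoding('Unicode'));  -- UTF8
-- the same character occupies a different number of bytes per encoding
SELECT octet_length(convert_to('é', 'UTF8'));                -- 2
SELECT octet_length(convert_to('é', 'LATIN1'));              -- 1
</programlisting>
     </para>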
2179
2180      <para>
2181       Not all client <acronym>API</acronym>s support all the listed character sets. For example, the
2182       <productname>PostgreSQL</productname>
2183       JDBC driver does not support <literal>MULE_INTERNAL</literal>, <literal>LATIN6</literal>,
2184       <literal>LATIN8</literal>, and <literal>LATIN10</literal>.
2185      </para>
2186
2187      <para>
2188       The <literal>SQL_ASCII</literal> setting behaves considerably differently
2189       from the other settings.  When the server character set is
2190       <literal>SQL_ASCII</literal>, the server interprets byte values 0&ndash;127
2191       according to the ASCII standard, while byte values 128&ndash;255 are taken
2192       as uninterpreted characters.  No encoding conversion will be done when
2193       the setting is <literal>SQL_ASCII</literal>.  Thus, this setting is not so
2194       much a declaration that a specific encoding is in use, as a declaration
2195       of ignorance about the encoding.  In most cases, if you are
2196       working with any non-ASCII data, it is unwise to use the
2197       <literal>SQL_ASCII</literal> setting because
2198       <productname>PostgreSQL</productname> will be unable to help you by
2199       converting or validating non-ASCII characters.
2200      </para>
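     <para>
      Because <literal>SQL_ASCII</literal> performs no conversion or
      validation, it can be worth checking which encodings a session is
      actually using, for example:
<programlisting>
SHOW server_encoding;  -- encoding of the current database
SHOW client_encoding;  -- encoding the server expects from this client
</programlisting>
     </para>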
2201     </sect2>
2202
2203    <sect2 id="multibyte-setting">
2204     <title>Setting the Character Set</title>
2205
2206     <para>
2207      <command>initdb</command> defines the default character set (encoding)
2208      for a <productname>PostgreSQL</productname> cluster. For example,
2209
2210 <screen>
2211 initdb -E EUC_JP
2212 </screen>
2213
2214      sets the default character set to
2215      <literal>EUC_JP</literal> (Extended Unix Code for Japanese).  You
2216      can use <option>--encoding</option> instead of
2217      <option>-E</option> if you prefer longer option strings.
2218      If no <option>-E</option> or <option>--encoding</option> option is
2219      given, <command>initdb</command> attempts to determine the appropriate
2220      encoding to use based on the specified or default locale.
2221     </para>
2222
2223     <para>
2224      You can specify a non-default encoding at database creation time,
2225      provided that the encoding is compatible with the selected locale:
2226
2227 <screen>
2228 createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
2229 </screen>
2230
     This will create a database named <literal>korean</literal> that
     uses the character set <literal>EUC_KR</literal> and locale <literal>ko_KR</literal>.
2233      Another way to accomplish this is to use this SQL command:
2234
2235 <programlisting>
2236 CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
2237 </programlisting>
2238
2239      Notice that the above commands specify copying the <literal>template0</literal>
2240      database.  When copying any other database, the encoding and locale
2241      settings cannot be changed from those of the source database, because
2242      that might result in corrupt data.  For more information see
2243      <xref linkend="manage-ag-templatedbs"/>.
2244     </para>
2245
2246     <para>
2247      The encoding for a database is stored in the system catalog
2248      <literal>pg_database</literal>.  You can see it by using the
2249      <command>psql</command> <option>-l</option> option or the
2250      <command>\l</command> command.
2251
2252 <screen>
2253 $ <userinput>psql -l</userinput>
2254                                          List of databases
2255    Name    |  Owner   | Encoding  |  Collation  |    Ctype    |          Access Privileges
2256 -----------+----------+-----------+-------------+-------------+-------------------------------------
2257  clocaledb | hlinnaka | SQL_ASCII | C           | C           |
2258  englishdb | hlinnaka | UTF8      | en_GB.UTF8  | en_GB.UTF8  |
2259  japanese  | hlinnaka | UTF8      | ja_JP.UTF8  | ja_JP.UTF8  |
2260  korean    | hlinnaka | EUC_KR    | ko_KR.euckr | ko_KR.euckr |
2261  postgres  | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  |
2262  template0 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
2263  template1 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
2264 (7 rows)
2265 </screen>
2266     </para>
2267
2268     <important>
2269      <para>
2270       On most modern operating systems, <productname>PostgreSQL</productname>
2271       can determine which character set is implied by the <envar>LC_CTYPE</envar>
2272       setting, and it will enforce that only the matching database encoding is
2273       used.  On older systems it is your responsibility to ensure that you use
2274       the encoding expected by the locale you have selected.  A mistake in
2275       this area is likely to lead to strange behavior of locale-dependent
2276       operations such as sorting.
2277      </para>
2278
2279      <para>
2280       <productname>PostgreSQL</productname> will allow superusers to create
2281       databases with <literal>SQL_ASCII</literal> encoding even when
2282       <envar>LC_CTYPE</envar> is not <literal>C</literal> or <literal>POSIX</literal>.  As noted
2283       above, <literal>SQL_ASCII</literal> does not enforce that the data stored in
2284       the database has any particular encoding, and so this choice poses risks
2285       of locale-dependent misbehavior.  Using this combination of settings is
2286       deprecated and may someday be forbidden altogether.
2287      </para>
2288     </important>
2289    </sect2>
2290
2291    <sect2 id="multibyte-automatic-conversion">
2292     <title>Automatic Character Set Conversion Between Server and Client</title>
2293
2294     <para>
2295      <productname>PostgreSQL</productname> supports automatic character
2296      set conversion between server and client for many combinations of
2297      character sets (<xref linkend="multibyte-conversions-supported"/>
2298      shows which ones).
2299     </para>
2300
2301     <para>
2302      To enable automatic character set conversion, you have to
2303      tell <productname>PostgreSQL</productname> the character set
2304      (encoding) you would like to use in the client. There are several
2305      ways to accomplish this:
2306
2307      <itemizedlist>
2308       <listitem>
2309        <para>
2310         Using the <command>\encoding</command> command in
2311         <application>psql</application>.
2312         <command>\encoding</command> allows you to change client
2313         encoding on the fly. For
2314         example, to change the encoding to <literal>SJIS</literal>, type:
2315
2316 <programlisting>
2317 \encoding SJIS
2318 </programlisting>
2319        </para>
2320       </listitem>
2321
2322       <listitem>
2323        <para>
2324         <application>libpq</application> (<xref linkend="libpq-control"/>) has functions to control the client encoding.
2325        </para>
2326       </listitem>
2327
2328       <listitem>
2329        <para>
2330         Using <command>SET client_encoding TO</command>.
2331
2332         Setting the client encoding can be done with this SQL command:
2333
2334 <programlisting>
2335 SET CLIENT_ENCODING TO '<replaceable>value</replaceable>';
2336 </programlisting>
2337
2338         Also you can use the standard SQL syntax <literal>SET NAMES</literal>
2339         for this purpose:
2340
2341 <programlisting>
2342 SET NAMES '<replaceable>value</replaceable>';
2343 </programlisting>
2344
2345         To query the current client encoding:
2346
2347 <programlisting>
2348 SHOW client_encoding;
2349 </programlisting>
2350
2351         To return to the default encoding:
2352
2353 <programlisting>
2354 RESET client_encoding;
2355 </programlisting>
2356        </para>
2357       </listitem>
2358
2359       <listitem>
2360        <para>
2361         Using <envar>PGCLIENTENCODING</envar>. If the environment variable
2362         <envar>PGCLIENTENCODING</envar> is defined in the client's
2363         environment, that client encoding is automatically selected
2364         when a connection to the server is made.  (This can
2365         subsequently be overridden using any of the other methods
2366         mentioned above.)
2367        </para>
2368       </listitem>
2369
2370       <listitem>
2371       <para>
2372        Using the configuration variable <xref
2373        linkend="guc-client-encoding"/>. If the
2374        <varname>client_encoding</varname> variable is set, that client
2375        encoding is automatically selected when a connection to the
2376        server is made.  (This can subsequently be overridden using any
2377        of the other methods mentioned above.)
2378        </para>
2379       </listitem>
2380
2381      </itemizedlist>
2382     </para>
2383
2384     <para>
2385      If the conversion of a particular character is not possible
2386      &mdash; suppose you chose <literal>EUC_JP</literal> for the
2387      server and <literal>LATIN1</literal> for the client, and some
2388      Japanese characters are returned that do not have a representation in
2389      <literal>LATIN1</literal> &mdash; an error is reported.
2390     </para>
2391
2392     <para>
2393      If the client character set is defined as <literal>SQL_ASCII</literal>,
2394      encoding conversion is disabled, regardless of the server's character
2395      set.  (However, if the server's character set is
2396      not <literal>SQL_ASCII</literal>, the server will still check that
2397      incoming data is valid for that encoding; so the net effect is as
2398      though the client character set were the same as the server's.)
2399      Just as for the server, use of <literal>SQL_ASCII</literal> is unwise
2400      unless you are working with all-ASCII data.
2401     </para>
2402    </sect2>
2403
2404    <sect2 id="multibyte-conversions-supported">
2405     <title>Available Character Set Conversions</title>
2406
2407     <para>
2408      <productname>PostgreSQL</productname> allows conversion between any
2409      two character sets for which a conversion function is listed in the
2410      <link linkend="catalog-pg-conversion"><structname>pg_conversion</structname></link>
2411      system catalog.  <productname>PostgreSQL</productname> comes with
2412      some predefined conversions, as summarized in
2413      <xref linkend="multibyte-translation-table"/> and shown in more
2414      detail in <xref linkend="builtin-conversions-table"/>.  You can
2415      create a new conversion using the SQL command
2416      <xref linkend="sql-createconversion"/>.  (To be used for automatic
2417      client/server conversions, a conversion must be marked
2418      as <quote>default</quote> for its character set pair.)
2419     </para>
2420
2421     <table id="multibyte-translation-table">
2422      <title>Built-in Client/Server Character Set Conversions</title>
2423      <tgroup cols="2">
2424       <colspec colname="col1" colwidth="1*"/>
2425       <colspec colname="col2" colwidth="3*"/>
2426       <thead>
2427        <row>
2428         <entry>Server Character Set</entry>
2429         <entry>Available Client Character Sets</entry>
2430        </row>
2431       </thead>
2432       <tbody>
2433        <row>
2434         <entry><literal>BIG5</literal></entry>
2435         <entry><emphasis>not supported as a server encoding</emphasis>
2436         </entry>
2437        </row>
2438        <row>
2439         <entry><literal>EUC_CN</literal></entry>
2440         <entry><emphasis>EUC_CN</emphasis>,
2441         <literal>MULE_INTERNAL</literal>,
2442         <literal>UTF8</literal>
2443         </entry>
2444        </row>
2445        <row>
2446         <entry><literal>EUC_JP</literal></entry>
2447         <entry><emphasis>EUC_JP</emphasis>,
2448         <literal>MULE_INTERNAL</literal>,
2449         <literal>SJIS</literal>,
2450         <literal>UTF8</literal>
2451         </entry>
2452        </row>
2453        <row>
2454         <entry><literal>EUC_JIS_2004</literal></entry>
2455         <entry><emphasis>EUC_JIS_2004</emphasis>,
2456         <literal>SHIFT_JIS_2004</literal>,
2457         <literal>UTF8</literal>
2458         </entry>
2459        </row>
2460        <row>
2461         <entry><literal>EUC_KR</literal></entry>
2462         <entry><emphasis>EUC_KR</emphasis>,
2463         <literal>MULE_INTERNAL</literal>,
2464         <literal>UTF8</literal>
2465         </entry>
2466        </row>
2467        <row>
2468         <entry><literal>EUC_TW</literal></entry>
2469         <entry><emphasis>EUC_TW</emphasis>,
2470         <literal>BIG5</literal>,
2471         <literal>MULE_INTERNAL</literal>,
2472         <literal>UTF8</literal>
2473         </entry>
2474        </row>
2475        <row>
2476         <entry><literal>GB18030</literal></entry>
2477         <entry><emphasis>not supported as a server encoding</emphasis>
2478         </entry>
2479        </row>
2480        <row>
2481         <entry><literal>GBK</literal></entry>
2482         <entry><emphasis>not supported as a server encoding</emphasis>
2483         </entry>
2484        </row>
2485        <row>
2486         <entry><literal>ISO_8859_5</literal></entry>
2487         <entry><emphasis>ISO_8859_5</emphasis>,
2488         <literal>KOI8R</literal>,
2489         <literal>MULE_INTERNAL</literal>,
2490         <literal>UTF8</literal>,
2491         <literal>WIN866</literal>,
2492         <literal>WIN1251</literal>
2493         </entry>
2494        </row>
2495        <row>
2496         <entry><literal>ISO_8859_6</literal></entry>
2497         <entry><emphasis>ISO_8859_6</emphasis>,
2498         <literal>UTF8</literal>
2499         </entry>
2500        </row>
2501        <row>
2502         <entry><literal>ISO_8859_7</literal></entry>
2503         <entry><emphasis>ISO_8859_7</emphasis>,
2504         <literal>UTF8</literal>
2505         </entry>
2506        </row>
2507        <row>
2508         <entry><literal>ISO_8859_8</literal></entry>
2509         <entry><emphasis>ISO_8859_8</emphasis>,
2510         <literal>UTF8</literal>
2511         </entry>
2512        </row>
2513        <row>
2514         <entry><literal>JOHAB</literal></entry>
2515         <entry><emphasis>not supported as a server encoding</emphasis>
2516         </entry>
2517        </row>
2518        <row>
2519         <entry><literal>KOI8R</literal></entry>
2520         <entry><emphasis>KOI8R</emphasis>,
2521         <literal>ISO_8859_5</literal>,
2522         <literal>MULE_INTERNAL</literal>,
2523         <literal>UTF8</literal>,
2524         <literal>WIN866</literal>,
2525         <literal>WIN1251</literal>
2526         </entry>
2527        </row>
2528        <row>
2529         <entry><literal>KOI8U</literal></entry>
2530         <entry><emphasis>KOI8U</emphasis>,
2531         <literal>UTF8</literal>
2532         </entry>
2533        </row>
2534        <row>
2535         <entry><literal>LATIN1</literal></entry>
2536         <entry><emphasis>LATIN1</emphasis>,
2537         <literal>MULE_INTERNAL</literal>,
2538         <literal>UTF8</literal>
2539         </entry>
2540        </row>
2541        <row>
2542         <entry><literal>LATIN2</literal></entry>
2543         <entry><emphasis>LATIN2</emphasis>,
2544         <literal>MULE_INTERNAL</literal>,
2545         <literal>UTF8</literal>,
2546         <literal>WIN1250</literal>
2547         </entry>
2548        </row>
2549        <row>
2550         <entry><literal>LATIN3</literal></entry>
2551         <entry><emphasis>LATIN3</emphasis>,
2552         <literal>MULE_INTERNAL</literal>,
2553         <literal>UTF8</literal>
2554         </entry>
2555        </row>
2556        <row>
2557         <entry><literal>LATIN4</literal></entry>
2558         <entry><emphasis>LATIN4</emphasis>,
2559         <literal>MULE_INTERNAL</literal>,
2560         <literal>UTF8</literal>
2561         </entry>
2562        </row>
2563        <row>
2564         <entry><literal>LATIN5</literal></entry>
2565         <entry><emphasis>LATIN5</emphasis>,
2566         <literal>UTF8</literal>
2567         </entry>
2568        </row>
2569        <row>
2570         <entry><literal>LATIN6</literal></entry>
2571         <entry><emphasis>LATIN6</emphasis>,
2572         <literal>UTF8</literal>
2573         </entry>
2574        </row>
2575        <row>
2576         <entry><literal>LATIN7</literal></entry>
2577         <entry><emphasis>LATIN7</emphasis>,
2578         <literal>UTF8</literal>
2579         </entry>
2580        </row>
2581        <row>
2582         <entry><literal>LATIN8</literal></entry>
2583         <entry><emphasis>LATIN8</emphasis>,
2584         <literal>UTF8</literal>
2585         </entry>
2586        </row>
2587        <row>
2588         <entry><literal>LATIN9</literal></entry>
2589         <entry><emphasis>LATIN9</emphasis>,
2590         <literal>UTF8</literal>
2591         </entry>
2592        </row>
2593        <row>
2594         <entry><literal>LATIN10</literal></entry>
2595         <entry><emphasis>LATIN10</emphasis>,
2596         <literal>UTF8</literal>
2597         </entry>
2598        </row>
2599        <row>
2600         <entry><literal>MULE_INTERNAL</literal></entry>
2601         <entry><emphasis>MULE_INTERNAL</emphasis>,
2602          <literal>BIG5</literal>,
2603          <literal>EUC_CN</literal>,
2604          <literal>EUC_JP</literal>,
2605          <literal>EUC_KR</literal>,
2606          <literal>EUC_TW</literal>,
2607          <literal>ISO_8859_5</literal>,
2608          <literal>KOI8R</literal>,
2609          <literal>LATIN1</literal> to <literal>LATIN4</literal>,
2610          <literal>SJIS</literal>,
2611          <literal>WIN866</literal>,
2612          <literal>WIN1250</literal>,
2613          <literal>WIN1251</literal>
2614         </entry>
2615        </row>
2616        <row>
2617         <entry><literal>SJIS</literal></entry>
2618         <entry><emphasis>not supported as a server encoding</emphasis>
2619         </entry>
2620        </row>
2621        <row>
2622         <entry><literal>SHIFT_JIS_2004</literal></entry>
2623         <entry><emphasis>not supported as a server encoding</emphasis>
2624         </entry>
2625        </row>
2626        <row>
2627         <entry><literal>SQL_ASCII</literal></entry>
2628         <entry><emphasis>any (no conversion will be performed)</emphasis>
2629         </entry>
2630        </row>
2631        <row>
2632         <entry><literal>UHC</literal></entry>
2633         <entry><emphasis>not supported as a server encoding</emphasis>
2634         </entry>
2635        </row>
2636        <row>
2637         <entry><literal>UTF8</literal></entry>
2638         <entry><emphasis>all supported encodings</emphasis>
2639         </entry>
2640        </row>
2641        <row>
2642         <entry><literal>WIN866</literal></entry>
2643         <entry><emphasis>WIN866</emphasis>,
2644          <literal>ISO_8859_5</literal>,
2645          <literal>KOI8R</literal>,
2646          <literal>MULE_INTERNAL</literal>,
2647          <literal>UTF8</literal>,
2648          <literal>WIN1251</literal>
2649         </entry>
2650        </row>
2651        <row>
2652         <entry><literal>WIN874</literal></entry>
2653         <entry><emphasis>WIN874</emphasis>,
2654         <literal>UTF8</literal>
2655         </entry>
2656        </row>
2657        <row>
2658         <entry><literal>WIN1250</literal></entry>
2659         <entry><emphasis>WIN1250</emphasis>,
2660          <literal>LATIN2</literal>,
2661          <literal>MULE_INTERNAL</literal>,
2662          <literal>UTF8</literal>
2663         </entry>
2664        </row>
2665        <row>
2666         <entry><literal>WIN1251</literal></entry>
2667         <entry><emphasis>WIN1251</emphasis>,
2668          <literal>ISO_8859_5</literal>,
2669          <literal>KOI8R</literal>,
2670          <literal>MULE_INTERNAL</literal>,
2671          <literal>UTF8</literal>,
2672          <literal>WIN866</literal>
2673         </entry>
2674        </row>
2675        <row>
2676         <entry><literal>WIN1252</literal></entry>
2677         <entry><emphasis>WIN1252</emphasis>,
2678          <literal>UTF8</literal>
2679         </entry>
2680        </row>
2681        <row>
2682         <entry><literal>WIN1253</literal></entry>
2683         <entry><emphasis>WIN1253</emphasis>,
2684          <literal>UTF8</literal>
2685         </entry>
2686        </row>
2687        <row>
2688         <entry><literal>WIN1254</literal></entry>
2689         <entry><emphasis>WIN1254</emphasis>,
2690          <literal>UTF8</literal>
2691         </entry>
2692        </row>
2693        <row>
2694         <entry><literal>WIN1255</literal></entry>
2695         <entry><emphasis>WIN1255</emphasis>,
2696          <literal>UTF8</literal>
2697         </entry>
2698        </row>
2699        <row>
2700         <entry><literal>WIN1256</literal></entry>
2701         <entry><emphasis>WIN1256</emphasis>,
2702         <literal>UTF8</literal>
2703         </entry>
2704        </row>
2705        <row>
2706         <entry><literal>WIN1257</literal></entry>
2707         <entry><emphasis>WIN1257</emphasis>,
2708          <literal>UTF8</literal>
2709         </entry>
2710        </row>
2711        <row>
2712         <entry><literal>WIN1258</literal></entry>
2713         <entry><emphasis>WIN1258</emphasis>,
2714         <literal>UTF8</literal>
2715         </entry>
2716        </row>
2717       </tbody>
2718      </tgroup>
2719     </table>
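    <para>
     The conversion names in <xref linkend="builtin-conversions-table"/> are
     derived mechanically from the official encoding names, as described in
     the footnote to that table: non-alphanumeric characters become
     underscores, and the source and destination names are joined with
     <literal>_to_</literal>.  The following is a minimal sketch of that rule,
     assuming a POSIX shell with <command>sed</command> and
     <command>tr</command>.  The helper names are hypothetical, the
     IANA-style input spellings (such as <literal>ISO-8859-5</literal> or
     <literal>KOI8-R</literal>) are an assumption, and the lowercasing step
     is inferred from the listed names rather than stated by the rule.
     Conversions involving <literal>MULE_INTERNAL</literal> use the
     abbreviation <literal>mic</literal>, which this sketch does not derive.
    </para>

```shell
# norm_encoding NAME (hypothetical helper): replace every non-alphanumeric
# character with "_" and lowercase the result.
norm_encoding() {
  printf '%s\n' "$1" | LC_ALL=C sed 's/[^A-Za-z0-9]/_/g' | LC_ALL=C tr 'A-Z' 'a-z'
}

# conv_name SRC DST (hypothetical helper): print SRC_to_DST built from the
# normalized forms of both official encoding names.
conv_name() {
  printf '%s_to_%s\n' "$(norm_encoding "$1")" "$(norm_encoding "$2")"
}
```

    <para>
     For example, <literal>conv_name ISO-8859-5 KOI8-R</literal> prints
     <literal>iso_8859_5_to_koi8_r</literal>, and
     <literal>conv_name Big5 EUC-TW</literal> prints
     <literal>big5_to_euc_tw</literal>; both names appear in the table.
    </para>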
2720
2721     <table id="builtin-conversions-table">
2722      <title>All Built-in Character Set Conversions</title>
2723      <tgroup cols="3">
2724       <colspec colname="col1" colwidth="2*"/>
2725       <colspec colname="col2" colwidth="1*"/>
2726       <colspec colname="col3" colwidth="1*"/>
2727       <thead>
2728        <row>
2729         <entry>Conversion Name
2730          <footnote>
2731           <para>
2732            The conversion names follow a standard naming scheme: The
2733            official name of the source encoding with all
2734            non-alphanumeric characters replaced by underscores, followed
2735            by <literal>_to_</literal>, followed by the similarly processed
2736            destination encoding name.  Therefore, these names sometimes
2737            deviate from the customary encoding names shown in
2738            <xref linkend="charset-table"/>.
2739           </para>
2740          </footnote>
2741         </entry>
2742         <entry>Source Encoding</entry>
2743         <entry>Destination Encoding</entry>
2744        </row>
2745       </thead>
2746
2747       <tbody>
2748        <row>
2749         <entry><literal>big5_to_euc_tw</literal></entry>
2750         <entry><literal>BIG5</literal></entry>
2751         <entry><literal>EUC_TW</literal></entry>
2752        </row>
2753        <row>
2754         <entry><literal>big5_to_mic</literal></entry>
2755         <entry><literal>BIG5</literal></entry>
2756         <entry><literal>MULE_INTERNAL</literal></entry>
2757        </row>
2758        <row>
2759         <entry><literal>big5_to_utf8</literal></entry>
2760         <entry><literal>BIG5</literal></entry>
2761         <entry><literal>UTF8</literal></entry>
2762        </row>
2763        <row>
2764         <entry><literal>euc_cn_to_mic</literal></entry>
2765         <entry><literal>EUC_CN</literal></entry>
2766         <entry><literal>MULE_INTERNAL</literal></entry>
2767        </row>
2768        <row>
2769         <entry><literal>euc_cn_to_utf8</literal></entry>
2770         <entry><literal>EUC_CN</literal></entry>
2771         <entry><literal>UTF8</literal></entry>
2772        </row>
2773        <row>
2774         <entry><literal>euc_jp_to_mic</literal></entry>
2775         <entry><literal>EUC_JP</literal></entry>
2776         <entry><literal>MULE_INTERNAL</literal></entry>
2777        </row>
2778        <row>
2779         <entry><literal>euc_jp_to_sjis</literal></entry>
2780         <entry><literal>EUC_JP</literal></entry>
2781         <entry><literal>SJIS</literal></entry>
2782        </row>
2783        <row>
2784         <entry><literal>euc_jp_to_utf8</literal></entry>
2785         <entry><literal>EUC_JP</literal></entry>
2786         <entry><literal>UTF8</literal></entry>
2787        </row>
2788        <row>
2789         <entry><literal>euc_kr_to_mic</literal></entry>
2790         <entry><literal>EUC_KR</literal></entry>
2791         <entry><literal>MULE_INTERNAL</literal></entry>
2792        </row>
2793        <row>
2794         <entry><literal>euc_kr_to_utf8</literal></entry>
2795         <entry><literal>EUC_KR</literal></entry>
2796         <entry><literal>UTF8</literal></entry>
2797        </row>
2798        <row>
2799         <entry><literal>euc_tw_to_big5</literal></entry>
2800         <entry><literal>EUC_TW</literal></entry>
2801         <entry><literal>BIG5</literal></entry>
2802        </row>
2803        <row>
2804         <entry><literal>euc_tw_to_mic</literal></entry>
2805         <entry><literal>EUC_TW</literal></entry>
2806         <entry><literal>MULE_INTERNAL</literal></entry>
2807        </row>
2808        <row>
2809         <entry><literal>euc_tw_to_utf8</literal></entry>
2810         <entry><literal>EUC_TW</literal></entry>
2811         <entry><literal>UTF8</literal></entry>
2812        </row>
2813        <row>
2814         <entry><literal>gb18030_to_utf8</literal></entry>
2815         <entry><literal>GB18030</literal></entry>
2816         <entry><literal>UTF8</literal></entry>
2817        </row>
2818        <row>
2819         <entry><literal>gbk_to_utf8</literal></entry>
2820         <entry><literal>GBK</literal></entry>
2821         <entry><literal>UTF8</literal></entry>
2822        </row>
2823        <row>
2824         <entry><literal>iso_8859_10_to_utf8</literal></entry>
2825         <entry><literal>LATIN6</literal></entry>
2826         <entry><literal>UTF8</literal></entry>
2827        </row>
2828        <row>
2829         <entry><literal>iso_8859_13_to_utf8</literal></entry>
2830         <entry><literal>LATIN7</literal></entry>
2831         <entry><literal>UTF8</literal></entry>
2832        </row>
2833        <row>
2834         <entry><literal>iso_8859_14_to_utf8</literal></entry>
2835         <entry><literal>LATIN8</literal></entry>
2836         <entry><literal>UTF8</literal></entry>
2837        </row>
2838        <row>
2839         <entry><literal>iso_8859_15_to_utf8</literal></entry>
2840         <entry><literal>LATIN9</literal></entry>
2841         <entry><literal>UTF8</literal></entry>
2842        </row>
2843        <row>
2844         <entry><literal>iso_8859_16_to_utf8</literal></entry>
2845         <entry><literal>LATIN10</literal></entry>
2846         <entry><literal>UTF8</literal></entry>
2847        </row>
2848        <row>
2849         <entry><literal>iso_8859_1_to_mic</literal></entry>
2850         <entry><literal>LATIN1</literal></entry>
2851         <entry><literal>MULE_INTERNAL</literal></entry>
2852        </row>
2853        <row>
2854         <entry><literal>iso_8859_1_to_utf8</literal></entry>
2855         <entry><literal>LATIN1</literal></entry>
2856         <entry><literal>UTF8</literal></entry>
2857        </row>
2858        <row>
2859         <entry><literal>iso_8859_2_to_mic</literal></entry>
2860         <entry><literal>LATIN2</literal></entry>
2861         <entry><literal>MULE_INTERNAL</literal></entry>
2862        </row>
2863        <row>
2864         <entry><literal>iso_8859_2_to_utf8</literal></entry>
2865         <entry><literal>LATIN2</literal></entry>
2866         <entry><literal>UTF8</literal></entry>
2867        </row>
2868        <row>
2869         <entry><literal>iso_8859_2_to_windows_1250</literal></entry>
2870         <entry><literal>LATIN2</literal></entry>
2871         <entry><literal>WIN1250</literal></entry>
2872        </row>
2873        <row>
2874         <entry><literal>iso_8859_3_to_mic</literal></entry>
2875         <entry><literal>LATIN3</literal></entry>
2876         <entry><literal>MULE_INTERNAL</literal></entry>
2877        </row>
2878        <row>
2879         <entry><literal>iso_8859_3_to_utf8</literal></entry>
2880         <entry><literal>LATIN3</literal></entry>
2881         <entry><literal>UTF8</literal></entry>
2882        </row>
2883        <row>
2884         <entry><literal>iso_8859_4_to_mic</literal></entry>
2885         <entry><literal>LATIN4</literal></entry>
2886         <entry><literal>MULE_INTERNAL</literal></entry>
2887        </row>
2888        <row>
2889         <entry><literal>iso_8859_4_to_utf8</literal></entry>
2890         <entry><literal>LATIN4</literal></entry>
2891         <entry><literal>UTF8</literal></entry>
2892        </row>
2893        <row>
2894         <entry><literal>iso_8859_5_to_koi8_r</literal></entry>
2895         <entry><literal>ISO_8859_5</literal></entry>
2896         <entry><literal>KOI8R</literal></entry>
2897        </row>
2898        <row>
2899         <entry><literal>iso_8859_5_to_mic</literal></entry>
2900         <entry><literal>ISO_8859_5</literal></entry>
2901         <entry><literal>MULE_INTERNAL</literal></entry>
2902        </row>
2903        <row>
2904         <entry><literal>iso_8859_5_to_utf8</literal></entry>
2905         <entry><literal>ISO_8859_5</literal></entry>
2906         <entry><literal>UTF8</literal></entry>
2907        </row>
2908        <row>
2909         <entry><literal>iso_8859_5_to_windows_1251</literal></entry>
2910         <entry><literal>ISO_8859_5</literal></entry>
2911         <entry><literal>WIN1251</literal></entry>
2912        </row>
2913        <row>
2914         <entry><literal>iso_8859_5_to_windows_866</literal></entry>
2915         <entry><literal>ISO_8859_5</literal></entry>
2916         <entry><literal>WIN866</literal></entry>
2917        </row>
2918        <row>
2919         <entry><literal>iso_8859_6_to_utf8</literal></entry>
2920         <entry><literal>ISO_8859_6</literal></entry>
2921         <entry><literal>UTF8</literal></entry>
2922        </row>
2923        <row>
2924         <entry><literal>iso_8859_7_to_utf8</literal></entry>
2925         <entry><literal>ISO_8859_7</literal></entry>
2926         <entry><literal>UTF8</literal></entry>
2927        </row>
2928        <row>
2929         <entry><literal>iso_8859_8_to_utf8</literal></entry>
2930         <entry><literal>ISO_8859_8</literal></entry>
2931         <entry><literal>UTF8</literal></entry>
2932        </row>
2933        <row>
2934         <entry><literal>iso_8859_9_to_utf8</literal></entry>
2935         <entry><literal>LATIN5</literal></entry>
2936         <entry><literal>UTF8</literal></entry>
2937        </row>
2938        <row>
2939         <entry><literal>johab_to_utf8</literal></entry>
2940         <entry><literal>JOHAB</literal></entry>
2941         <entry><literal>UTF8</literal></entry>
2942        </row>
2943        <row>
2944         <entry><literal>koi8_r_to_iso_8859_5</literal></entry>
2945         <entry><literal>KOI8R</literal></entry>
2946         <entry><literal>ISO_8859_5</literal></entry>
2947        </row>
2948        <row>
2949         <entry><literal>koi8_r_to_mic</literal></entry>
2950         <entry><literal>KOI8R</literal></entry>
2951         <entry><literal>MULE_INTERNAL</literal></entry>
2952        </row>
2953        <row>
2954         <entry><literal>koi8_r_to_utf8</literal></entry>
2955         <entry><literal>KOI8R</literal></entry>
2956         <entry><literal>UTF8</literal></entry>
2957        </row>
2958        <row>
2959         <entry><literal>koi8_r_to_windows_1251</literal></entry>
2960         <entry><literal>KOI8R</literal></entry>
2961         <entry><literal>WIN1251</literal></entry>
2962        </row>
2963        <row>
2964         <entry><literal>koi8_r_to_windows_866</literal></entry>
2965         <entry><literal>KOI8R</literal></entry>
2966         <entry><literal>WIN866</literal></entry>
2967        </row>
2968        <row>
2969         <entry><literal>koi8_u_to_utf8</literal></entry>
2970         <entry><literal>KOI8U</literal></entry>
2971         <entry><literal>UTF8</literal></entry>
2972        </row>
2973        <row>
2974         <entry><literal>mic_to_big5</literal></entry>
2975         <entry><literal>MULE_INTERNAL</literal></entry>
2976         <entry><literal>BIG5</literal></entry>
2977        </row>
2978        <row>
2979         <entry><literal>mic_to_euc_cn</literal></entry>
2980         <entry><literal>MULE_INTERNAL</literal></entry>
2981         <entry><literal>EUC_CN</literal></entry>
2982        </row>
2983        <row>
2984         <entry><literal>mic_to_euc_jp</literal></entry>
2985         <entry><literal>MULE_INTERNAL</literal></entry>
2986         <entry><literal>EUC_JP</literal></entry>
2987        </row>
2988        <row>
2989         <entry><literal>mic_to_euc_kr</literal></entry>
2990         <entry><literal>MULE_INTERNAL</literal></entry>
2991         <entry><literal>EUC_KR</literal></entry>
2992        </row>
2993        <row>
2994         <entry><literal>mic_to_euc_tw</literal></entry>
2995         <entry><literal>MULE_INTERNAL</literal></entry>
2996         <entry><literal>EUC_TW</literal></entry>
2997        </row>
2998        <row>
2999         <entry><literal>mic_to_iso_8859_1</literal></entry>
3000         <entry><literal>MULE_INTERNAL</literal></entry>
3001         <entry><literal>LATIN1</literal></entry>
3002        </row>
3003        <row>
3004         <entry><literal>mic_to_iso_8859_2</literal></entry>
3005         <entry><literal>MULE_INTERNAL</literal></entry>
3006         <entry><literal>LATIN2</literal></entry>
3007        </row>
3008        <row>
3009         <entry><literal>mic_to_iso_8859_3</literal></entry>
3010         <entry><literal>MULE_INTERNAL</literal></entry>
3011         <entry><literal>LATIN3</literal></entry>
3012        </row>
3013        <row>
3014         <entry><literal>mic_to_iso_8859_4</literal></entry>
3015         <entry><literal>MULE_INTERNAL</literal></entry>
3016         <entry><literal>LATIN4</literal></entry>
3017        </row>
3018        <row>
3019         <entry><literal>mic_to_iso_8859_5</literal></entry>
3020         <entry><literal>MULE_INTERNAL</literal></entry>
3021         <entry><literal>ISO_8859_5</literal></entry>
3022        </row>
3023        <row>
3024         <entry><literal>mic_to_koi8_r</literal></entry>
3025         <entry><literal>MULE_INTERNAL</literal></entry>
3026         <entry><literal>KOI8R</literal></entry>
3027        </row>
3028        <row>
3029         <entry><literal>mic_to_sjis</literal></entry>
3030         <entry><literal>MULE_INTERNAL</literal></entry>
3031         <entry><literal>SJIS</literal></entry>
3032        </row>
3033        <row>
3034         <entry><literal>mic_to_windows_1250</literal></entry>
3035         <entry><literal>MULE_INTERNAL</literal></entry>
3036         <entry><literal>WIN1250</literal></entry>
3037        </row>
3038        <row>
3039         <entry><literal>mic_to_windows_1251</literal></entry>
3040         <entry><literal>MULE_INTERNAL</literal></entry>
3041         <entry><literal>WIN1251</literal></entry>
3042        </row>
3043        <row>
3044         <entry><literal>mic_to_windows_866</literal></entry>
3045         <entry><literal>MULE_INTERNAL</literal></entry>
3046         <entry><literal>WIN866</literal></entry>
3047        </row>
3048        <row>
3049         <entry><literal>sjis_to_euc_jp</literal></entry>
3050         <entry><literal>SJIS</literal></entry>
3051         <entry><literal>EUC_JP</literal></entry>
3052        </row>
3053        <row>
3054         <entry><literal>sjis_to_mic</literal></entry>
3055         <entry><literal>SJIS</literal></entry>
3056         <entry><literal>MULE_INTERNAL</literal></entry>
3057        </row>
3058        <row>
3059         <entry><literal>sjis_to_utf8</literal></entry>
3060         <entry><literal>SJIS</literal></entry>
3061         <entry><literal>UTF8</literal></entry>
3062        </row>
3063        <row>
3064         <entry><literal>windows_1258_to_utf8</literal></entry>
3065         <entry><literal>WIN1258</literal></entry>
3066         <entry><literal>UTF8</literal></entry>
3067        </row>
3068        <row>
3069         <entry><literal>uhc_to_utf8</literal></entry>
3070         <entry><literal>UHC</literal></entry>
3071         <entry><literal>UTF8</literal></entry>
3072        </row>
3073        <row>
3074         <entry><literal>utf8_to_big5</literal></entry>
3075         <entry><literal>UTF8</literal></entry>
3076         <entry><literal>BIG5</literal></entry>
3077        </row>
3078        <row>
3079         <entry><literal>utf8_to_euc_cn</literal></entry>
3080         <entry><literal>UTF8</literal></entry>
3081         <entry><literal>EUC_CN</literal></entry>
3082        </row>
3083        <row>
3084         <entry><literal>utf8_to_euc_jp</literal></entry>
3085         <entry><literal>UTF8</literal></entry>
3086         <entry><literal>EUC_JP</literal></entry>
3087        </row>
3088        <row>
3089         <entry><literal>utf8_to_euc_kr</literal></entry>
3090         <entry><literal>UTF8</literal></entry>
3091         <entry><literal>EUC_KR</literal></entry>
3092        </row>
3093        <row>
3094         <entry><literal>utf8_to_euc_tw</literal></entry>
3095         <entry><literal>UTF8</literal></entry>
3096         <entry><literal>EUC_TW</literal></entry>
3097        </row>
3098        <row>
3099         <entry><literal>utf8_to_gb18030</literal></entry>
3100         <entry><literal>UTF8</literal></entry>
3101         <entry><literal>GB18030</literal></entry>
3102        </row>
3103        <row>
3104         <entry><literal>utf8_to_gbk</literal></entry>
3105         <entry><literal>UTF8</literal></entry>
3106         <entry><literal>GBK</literal></entry>
3107        </row>
3108        <row>
3109         <entry><literal>utf8_to_iso_8859_1</literal></entry>
3110         <entry><literal>UTF8</literal></entry>
3111         <entry><literal>LATIN1</literal></entry>
3112        </row>
3113        <row>
3114         <entry><literal>utf8_to_iso_8859_10</literal></entry>
3115         <entry><literal>UTF8</literal></entry>
3116         <entry><literal>LATIN6</literal></entry>
3117        </row>
3118        <row>
3119         <entry><literal>utf8_to_iso_8859_13</literal></entry>
3120         <entry><literal>UTF8</literal></entry>
3121         <entry><literal>LATIN7</literal></entry>
3122        </row>
3123        <row>
3124         <entry><literal>utf8_to_iso_8859_14</literal></entry>
3125         <entry><literal>UTF8</literal></entry>
3126         <entry><literal>LATIN8</literal></entry>
3127        </row>
3128        <row>
3129         <entry><literal>utf8_to_iso_8859_15</literal></entry>
3130         <entry><literal>UTF8</literal></entry>
3131         <entry><literal>LATIN9</literal></entry>
3132        </row>
3133        <row>
3134         <entry><literal>utf8_to_iso_8859_16</literal></entry>
3135         <entry><literal>UTF8</literal></entry>
3136         <entry><literal>LATIN10</literal></entry>
3137        </row>
3138        <row>
3139         <entry><literal>utf8_to_iso_8859_2</literal></entry>
3140         <entry><literal>UTF8</literal></entry>
3141         <entry><literal>LATIN2</literal></entry>
3142        </row>
3143        <row>
3144         <entry><literal>utf8_to_iso_8859_3</literal></entry>
3145         <entry><literal>UTF8</literal></entry>
3146         <entry><literal>LATIN3</literal></entry>
3147        </row>
3148        <row>
3149         <entry><literal>utf8_to_iso_8859_4</literal></entry>
3150         <entry><literal>UTF8</literal></entry>
3151         <entry><literal>LATIN4</literal></entry>
3152        </row>
3153        <row>
3154         <entry><literal>utf8_to_iso_8859_5</literal></entry>
3155         <entry><literal>UTF8</literal></entry>
3156         <entry><literal>ISO_8859_5</literal></entry>
3157        </row>
3158        <row>
3159         <entry><literal>utf8_to_iso_8859_6</literal></entry>
3160         <entry><literal>UTF8</literal></entry>
3161         <entry><literal>ISO_8859_6</literal></entry>
3162        </row>
3163        <row>
3164         <entry><literal>utf8_to_iso_8859_7</literal></entry>
3165         <entry><literal>UTF8</literal></entry>
3166         <entry><literal>ISO_8859_7</literal></entry>
3167        </row>
3168        <row>
3169         <entry><literal>utf8_to_iso_8859_8</literal></entry>
3170         <entry><literal>UTF8</literal></entry>
3171         <entry><literal>ISO_8859_8</literal></entry>
3172        </row>
3173        <row>
3174         <entry><literal>utf8_to_iso_8859_9</literal></entry>
3175         <entry><literal>UTF8</literal></entry>
3176         <entry><literal>LATIN5</literal></entry>
3177        </row>
3178        <row>
3179         <entry><literal>utf8_to_johab</literal></entry>
3180         <entry><literal>UTF8</literal></entry>
3181         <entry><literal>JOHAB</literal></entry>
3182        </row>
3183        <row>
3184         <entry><literal>utf8_to_koi8_r</literal></entry>
3185         <entry><literal>UTF8</literal></entry>
3186         <entry><literal>KOI8R</literal></entry>
3187        </row>
3188        <row>
3189         <entry><literal>utf8_to_koi8_u</literal></entry>
3190         <entry><literal>UTF8</literal></entry>
3191         <entry><literal>KOI8U</literal></entry>
3192        </row>
3193        <row>
3194         <entry><literal>utf8_to_sjis</literal></entry>
3195         <entry><literal>UTF8</literal></entry>
3196         <entry><literal>SJIS</literal></entry>
3197        </row>
3198        <row>
3199         <entry><literal>utf8_to_windows_1258</literal></entry>
3200         <entry><literal>UTF8</literal></entry>
3201         <entry><literal>WIN1258</literal></entry>
3202        </row>
3203        <row>
3204         <entry><literal>utf8_to_uhc</literal></entry>
3205         <entry><literal>UTF8</literal></entry>
3206         <entry><literal>UHC</literal></entry>
3207        </row>
3208        <row>
3209         <entry><literal>utf8_to_windows_1250</literal></entry>
3210         <entry><literal>UTF8</literal></entry>
3211         <entry><literal>WIN1250</literal></entry>
3212        </row>
3213        <row>
3214         <entry><literal>utf8_to_windows_1251</literal></entry>
3215         <entry><literal>UTF8</literal></entry>
3216         <entry><literal>WIN1251</literal></entry>
3217        </row>
3218        <row>
3219         <entry><literal>utf8_to_windows_1252</literal></entry>
3220         <entry><literal>UTF8</literal></entry>
3221         <entry><literal>WIN1252</literal></entry>
3222        </row>
3223        <row>
3224         <entry><literal>utf8_to_windows_1253</literal></entry>
3225         <entry><literal>UTF8</literal></entry>
3226         <entry><literal>WIN1253</literal></entry>
3227        </row>
3228        <row>
3229         <entry><literal>utf8_to_windows_1254</literal></entry>
3230         <entry><literal>UTF8</literal></entry>
3231         <entry><literal>WIN1254</literal></entry>
3232        </row>
3233        <row>
3234         <entry><literal>utf8_to_windows_1255</literal></entry>
3235         <entry><literal>UTF8</literal></entry>
3236         <entry><literal>WIN1255</literal></entry>
3237        </row>
3238        <row>
3239         <entry><literal>utf8_to_windows_1256</literal></entry>
3240         <entry><literal>UTF8</literal></entry>
3241         <entry><literal>WIN1256</literal></entry>
3242        </row>
3243        <row>
3244         <entry><literal>utf8_to_windows_1257</literal></entry>
3245         <entry><literal>UTF8</literal></entry>
3246         <entry><literal>WIN1257</literal></entry>
3247        </row>
3248        <row>
3249         <entry><literal>utf8_to_windows_866</literal></entry>
3250         <entry><literal>UTF8</literal></entry>
3251         <entry><literal>WIN866</literal></entry>
3252        </row>
3253        <row>
3254         <entry><literal>utf8_to_windows_874</literal></entry>
3255         <entry><literal>UTF8</literal></entry>
3256         <entry><literal>WIN874</literal></entry>
3257        </row>
3258        <row>
3259         <entry><literal>windows_1250_to_iso_8859_2</literal></entry>
3260         <entry><literal>WIN1250</literal></entry>
3261         <entry><literal>LATIN2</literal></entry>
3262        </row>
3263        <row>
3264         <entry><literal>windows_1250_to_mic</literal></entry>
3265         <entry><literal>WIN1250</literal></entry>
3266         <entry><literal>MULE_INTERNAL</literal></entry>
3267        </row>
3268        <row>
3269         <entry><literal>windows_1250_to_utf8</literal></entry>
3270         <entry><literal>WIN1250</literal></entry>
3271         <entry><literal>UTF8</literal></entry>
3272        </row>
3273        <row>
3274         <entry><literal>windows_1251_to_iso_8859_5</literal></entry>
3275         <entry><literal>WIN1251</literal></entry>
3276         <entry><literal>ISO_8859_5</literal></entry>
3277        </row>
3278        <row>
3279         <entry><literal>windows_1251_to_koi8_r</literal></entry>
3280         <entry><literal>WIN1251</literal></entry>
3281         <entry><literal>KOI8R</literal></entry>
3282        </row>
3283        <row>
3284         <entry><literal>windows_1251_to_mic</literal></entry>
3285         <entry><literal>WIN1251</literal></entry>
3286         <entry><literal>MULE_INTERNAL</literal></entry>
3287        </row>
3288        <row>
3289         <entry><literal>windows_1251_to_utf8</literal></entry>
3290         <entry><literal>WIN1251</literal></entry>
3291         <entry><literal>UTF8</literal></entry>
3292        </row>
3293        <row>
3294         <entry><literal>windows_1251_to_windows_866</literal></entry>
3295         <entry><literal>WIN1251</literal></entry>
3296         <entry><literal>WIN866</literal></entry>
3297        </row>
3298        <row>
3299         <entry><literal>windows_1252_to_utf8</literal></entry>
3300         <entry><literal>WIN1252</literal></entry>
3301         <entry><literal>UTF8</literal></entry>
3302        </row>
3303        <row>
3304         <entry><literal>windows_1256_to_utf8</literal></entry>
3305         <entry><literal>WIN1256</literal></entry>
3306         <entry><literal>UTF8</literal></entry>
3307        </row>
3308        <row>
3309         <entry><literal>windows_866_to_iso_8859_5</literal></entry>
3310         <entry><literal>WIN866</literal></entry>
3311         <entry><literal>ISO_8859_5</literal></entry>
3312        </row>
3313        <row>
3314         <entry><literal>windows_866_to_koi8_r</literal></entry>
3315         <entry><literal>WIN866</literal></entry>
3316         <entry><literal>KOI8R</literal></entry>
3317        </row>
3318        <row>
3319         <entry><literal>windows_866_to_mic</literal></entry>
3320         <entry><literal>WIN866</literal></entry>
3321         <entry><literal>MULE_INTERNAL</literal></entry>
3322        </row>
3323        <row>
3324         <entry><literal>windows_866_to_utf8</literal></entry>
3325         <entry><literal>WIN866</literal></entry>
3326         <entry><literal>UTF8</literal></entry>
3327        </row>
3328        <row>
3329         <entry><literal>windows_866_to_windows_1251</literal></entry>
3330         <entry><literal>WIN866</literal></entry>
3331         <entry><literal>WIN1251</literal></entry>
3332        </row>
3333        <row>
3334         <entry><literal>windows_874_to_utf8</literal></entry>
3335         <entry><literal>WIN874</literal></entry>
3336         <entry><literal>UTF8</literal></entry>
3337        </row>
3338        <row>
3339         <entry><literal>euc_jis_2004_to_utf8</literal></entry>
3340         <entry><literal>EUC_JIS_2004</literal></entry>
3341         <entry><literal>UTF8</literal></entry>
3342        </row>
3343        <row>
3344         <entry><literal>utf8_to_euc_jis_2004</literal></entry>
3345         <entry><literal>UTF8</literal></entry>
3346         <entry><literal>EUC_JIS_2004</literal></entry>
3347        </row>
3348        <row>
3349         <entry><literal>shift_jis_2004_to_utf8</literal></entry>
3350         <entry><literal>SHIFT_JIS_2004</literal></entry>
3351         <entry><literal>UTF8</literal></entry>
3352        </row>
3353        <row>
3354         <entry><literal>utf8_to_shift_jis_2004</literal></entry>
3355         <entry><literal>UTF8</literal></entry>
3356         <entry><literal>SHIFT_JIS_2004</literal></entry>
3357        </row>
3358        <row>
3359         <entry><literal>euc_jis_2004_to_shift_jis_2004</literal></entry>
3360         <entry><literal>EUC_JIS_2004</literal></entry>
3361         <entry><literal>SHIFT_JIS_2004</literal></entry>
3362        </row>
3363        <row>
3364         <entry><literal>shift_jis_2004_to_euc_jis_2004</literal></entry>
3365         <entry><literal>SHIFT_JIS_2004</literal></entry>
3366         <entry><literal>EUC_JIS_2004</literal></entry>
3367        </row>
3368       </tbody>
3369      </tgroup>
3370     </table>
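    <para>
     Each row of the table names a conversion together with its source and
     destination encodings. As a rough illustration of what such a conversion
     does, the following Python sketch re-encodes Cyrillic text the way
     <literal>windows_1251_to_koi8_r</literal> would. It assumes that Python's
     standard codecs <literal>cp1251</literal>, <literal>koi8_r</literal>, and
     <literal>latin_1</literal> correspond to <literal>WIN1251</literal>,
     <literal>KOI8R</literal>, and <literal>LATIN1</literal>. The helper name
     <literal>convert_bytes</literal> is hypothetical, and the sketch pivots
     through Unicode, which is not necessarily how the server's conversion
     functions work internally.
    </para>

```python
# Illustration only: Python's standard codecs stand in for PostgreSQL
# encodings (assumption: "cp1251" ~ WIN1251, "koi8_r" ~ KOI8R,
# "latin_1" ~ LATIN1). No PostgreSQL conversion function is called here.


def convert_bytes(data: bytes, src_codec: str, dst_codec: str) -> bytes:
    """Re-encode bytes from one character set into another (hypothetical helper).

    The bytes are decoded to Unicode and then encoded in the destination
    character set. A character with no equivalent in the destination
    raises UnicodeEncodeError.
    """
    return data.decode(src_codec).encode(dst_codec)


if __name__ == "__main__":
    text = "Привет"
    win1251 = text.encode("cp1251")
    koi8r = convert_bytes(win1251, "cp1251", "koi8_r")

    # Same characters, different byte values in each encoding.
    print("WIN1251:", win1251.hex())
    print("KOI8R:  ", koi8r.hex())

    # Converting back recovers the original WIN1251 bytes.
    assert convert_bytes(koi8r, "koi8_r", "cp1251") == win1251

    # Cyrillic letters have no equivalent in Latin-1, so this conversion fails.
    try:
        convert_bytes(win1251, "cp1251", "latin_1")
    except UnicodeEncodeError as exc:
        print("no equivalent in LATIN1:", exc.reason)
```

    <para>
     Inside the database, the functions <function>convert</function>,
     <function>convert_from</function>, and <function>convert_to</function>
     perform this kind of conversion using the default conversion for the
     given pair of encodings.
    </para>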
3371    </sect2>
3372
3373    <sect2 id="multibyte-further-reading">
3374     <title>Further Reading</title>
3375
3376     <para>
3377      These are good sources to start learning about various kinds of encoding
3378      systems.
3379
3380      <variablelist>
3381       <varlistentry>
3382        <term><citetitle>CJKV Information Processing: Chinese, Japanese, Korean &amp; Vietnamese Computing</citetitle></term>
3383
3384        <listitem>
3385         <para>
3386          Contains detailed explanations of <literal>EUC_JP</literal>,
3387          <literal>EUC_CN</literal>, <literal>EUC_KR</literal>,
3388          and <literal>EUC_TW</literal>.
3389         </para>
3390        </listitem>
3391       </varlistentry>
3392
3393       <varlistentry>
3394        <term><ulink url="https://www.unicode.org/"></ulink></term>
3395
3396        <listitem>
3397         <para>
3398          The web site of the Unicode Consortium.
3399         </para>
3400        </listitem>
3401       </varlistentry>
3402
3403       <varlistentry>
3404        <term><ulink url="https://datatracker.ietf.org/doc/html/rfc3629">RFC 3629</ulink></term>
3405
3406        <listitem>
3407         <para>
3408          <acronym>UTF</acronym>-8 (8-bit UCS/Unicode Transformation
3409          Format) is defined here.
3410         </para>
3411        </listitem>
3412       </varlistentry>
3413      </variablelist>
3414     </para>
3415    </sect2>
3416
3417   </sect1>
3418
3419 </chapter>