   1 *multibyte.txt* For Vim version 5.8.  Last change: 2000 Jun 07
   2
   3
   4                   VIM REFERENCE MANUAL    by Bram Moolenaar et al.
   5
   6
   7 Multi-byte support                              *multibyte* *multi-byte*
   8
   9                                                 *Chinese* *Japanese* *Korean*
  10 There are languages which have many characters that cannot be represented
  11 using one byte (one octet).  These are Chinese (simplified or traditional),
  12 Japanese and Korean.  These languages use more than one byte to represent a
  13 character.
  14
  15 This is limited information on the support in Vim to edit files that use more
  16 than one byte per character.  Actually, only two-byte codes are currently
  17 supported.
  18
  19 Also see |+multi_byte| and |'fileencoding'|.
  20
  21 1. Introduction                         |multibyte-intro|
  22 2. Compiling                            |multibyte-compiling|
  23 3. Display (X fontset support)          |multibyte-display|
  24 4. Input (XIM support)                  |multibyte-input|
  25 5. UTF-8 in XFree86 xterm               |UTF8-xterm|
  26
  27 ==============================================================================
  28 1. Introduction                                         *multibyte-intro*
  29
  30 LOCALE
  31                                                         *locale-multibyte*
  32 There are a number of languages in the world, and there are at least as
  33 many different cultures and environments as there are languages.  A
  34 linguistic environment corresponding to an area is called a "|locale|".  The
  35 POSIX standard defines the concept of a |locale|, which includes a lot of
  36 information about the |charset|, collating order for sorting, date format,
  37 currency format and so on.
  38
  39 Your system needs to support the |locale| system and the language |locale| of
  40 your choice.  Some systems have only a few language |locale|s, so the |locale|
  41 of the language you want to use may not be on your system.  If so, you have to
  42 add the language |locale|.  But on some systems, it is not possible to add
  43 other |locale|s.  In this case, install X |locale|s by installing X compiled
  44 with X_LOCALE.  Add "-DX_LOCALE" to the CFLAGS if your X lib supports X_LOCALE.
  45 For example, when you are using a Linux system and want to use Japanese, set
  46 up your system in one of the following ways:
  47     - libc5     + X compiled with X_LOCALE
  48     - glibc-2.0 + libwcsmbs + X compiled without X_LOCALE
  49     - glibc-2.1 + locale-ja + X compiled without X_LOCALE
  50
  51 The location in which the |locale|s are installed varies from system to
  52 system, for example "/usr/share/locale" or "/usr/lib/locale".  See your
  53 system's setlocale() man page.
  54
  55                                         *locale-name* *$LANG-multibyte*
  56 The format of a |locale| name is:
  57     language[_territory[.codeset]]
  58 Territory means the country, codeset means the |charset|.  For example, the
  59 |locale| name "ja_JP.eucJP" means the language is Japanese, the country is
  60 Japan, the codeset is EUC-JP.  But it also could be "ja", "ja_JP.EUC",
  61 "ja_JP.ujis", etc.  Unfortunately, the |locale| name for a specific
  62 language, territory and codeset is not standardized and depends on your
  63 system.  This name is used as the value of the LANG environment variable.
  64 When you want to use Korean and the |locale| name is "ko", do this:
  65     sh:  export LANG=ko
  66     csh: setenv LANG ko
  67
  68 Examples of locale names:
  69     |charset|       language              |locale-name|
  70     GB2312          Chinese (simplified)  zh_CN.EUC, zh_CN.GB2312
  71     Big5            Chinese (traditional) zh_TW.BIG5, zh_TW.Big5
  72     CNS-11643       Chinese (traditional) zh_TW
  73     EUC-JP          Japanese              ja, ja_JP.EUC, ja_JP.ujis, ja_JP.eucJP
  74     Shift_JIS       Japanese              ja_JP.SJIS, ja_JP.Shift_JIS
  75     EUC-KR          Korean                ko, ko_KR.EUC
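
The locale name format above can be taken apart mechanically.  The following
is a minimal sketch in Python, not something Vim itself does; the function
name parse_locale_name and the exact character classes in the pattern are
assumptions made for illustration, and real systems may accept other
spellings.

```python
import re

# Pattern for language[_territory[.codeset]] as described above.
# The allowed characters per part are an assumption for this sketch.
LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z]+)"
    r"(?:\.(?P<codeset>[A-Za-z0-9_-]+))?)?$"
)


def parse_locale_name(name):
    """Return (language, territory, codeset); missing parts are None."""
    match = LOCALE_RE.match(name)
    if match is None:
        raise ValueError("not a locale name: %r" % name)
    return (match.group("language"),
            match.group("territory"),
            match.group("codeset"))


if __name__ == "__main__":
    for name in ("ja_JP.eucJP", "zh_TW.Big5", "ko"):
        print(name, "->", parse_locale_name(name))
```

A name such as "ko" yields only the language; territory and codeset are then
left to the system's defaults.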
  76
  77 Even if your system does not have the multibyte language |locale| of your
  78 choice, or does not have a sufficient implementation of the locale, Vim can
  79 still handle the multibyte languages.  Add the "--enable-broken-locale" flag
  80 at compile time.
  81
  82
  83 CODED CHARACTER SET (CCS)
  84                                         *coded-character-set* *CCS*
  85 |CCS| is a mapping from a set of characters to a set of integers.  For
  86 example, ((65, A), (66, B), (67, C)) is a |CCS| and ((0x41, A), (0x42, B),
  87 (0x43, C)) is also a |CCS|.  Examples of |CCS| are ISO 10646, US-ASCII,
  88 ISO-8859 series, JIS X 0208, JIS X 0201, KS C 5601 (KS X 1001) and KS C 5636
  89 (KS X 1003).
  90
  91 The term "integer" means code point or character number and is different from
  92 octets or bit combinations.
  93
  94 Typically, a |CCS| is a character table.  The column/line position, written
  95 as a hexadecimal number, is the code point of the character.  For example,
  96 the US-ASCII CCS has an 8x16 character table; the column numbers run from 0
  97 to 7 and the line numbers from 0 to F.  The code point of the character at
  98 4/1 is 0x41.
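
The column/line arithmetic can be sketched as follows.  This is only an
illustration of the US-ASCII table layout described above; the function name
code_point is made up for the example.

```python
def code_point(column, line):
    """Combine a column/line position in the 8x16 table into a code point.

    The column is the high hexadecimal digit, the line the low one.
    """
    if not (0 <= column <= 7 and 0 <= line <= 0xF):
        raise ValueError("position outside the 8x16 US-ASCII table")
    return (column << 4) | line


if __name__ == "__main__":
    point = code_point(4, 1)
    print("4/1 -> 0x%02X (%s)" % (point, chr(point)))
```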
  99
 100
 101 CHARACTER ENCODING SCHEME (CES)
 102
 103                                         *character-encode-scheme* *CES*
 104 |CES| is a mapping from a sequence of elements in one or more |CCS|es to a
 105 sequence of octets.  Examples of |CES| are EUC-JP, EUC-KR, EUC-CN (GB 2312),
 106 EUC-TW (CNS-11643), ISO-2022-JP, ISO-2022-KR, ISO-2022-CN, UTF-8, etc.
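
To see that the same character becomes different octets under different
encodings, here is a small sketch using the codecs shipped with Python.  It
assumes a Python build that includes the CJK codecs; the helper name octets
is made up for the example.  Note that a Python codec bundles the |CCS| and
the |CES| together.

```python
# HIRAGANA LETTER A: code point 0x2422 in JIS X 0208, U+3042 in ISO 10646.
CHAR = "\u3042"


def octets(text, codec):
    """Return the encoded octets of text as space-separated hex."""
    return " ".join("%02X" % byte for byte in text.encode(codec))


if __name__ == "__main__":
    # EUC-JP sets the high bit of both JIS X 0208 octets: 24 22 -> A4 A2.
    for codec in ("euc_jp", "shift_jis", "iso2022_jp", "utf_8"):
        print("%-11s %s" % (codec, octets(CHAR, codec)))
```

ISO-2022-JP stays within 7 bits by wrapping the two octets in escape
sequences, which is why its output is longer than the others.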
 107
 108
 109 CHARSET
 110                                                         *charset*
 111 |charset| is a method of converting a sequence of octets into a sequence of
 112 characters, the combination of one or more |CCS|es and a |CES|.  For example,
 113 ISO-2022-JP |charset| is the combination of ASCII, JIS X 0201, JIS X 0208
 114 |CCS|es and ISO-2022-JP |CES|.  Examples of |charset| are US-ASCII, ISO-8859
 115 series, GB2312, EUC-JP, EUC-KR, Shift_JIS, Big5, UTF-8, etc.
 116
 117 Note that this is not a term used by other standards bodies, such as ISO, but
 118 a term defined in RFC 2130.  The term "codeset" in POSIX has the same meaning
 119 as |charset| here.  |charset| does not mean character set (a set of
 120 characters) and the term "character repertoire" means a collection of distinct
 121 characters.  There are historical reasons, see RFC 2130.
 122
 123                                                 *charset-conversion*
 124 One language can have several |charset|s.  For example, Japanese has the
 125 ISO-2022-JP, EUC-JP and Shift_JIS |charset|s.  ISO-2022-JP is used mainly
 126 for internet messages, because it is a 7-bit encoding.  EUC-JP is mainly
 127 used on Unix, Shift_JIS is mainly used on Windows and MacOS.
 128
 129 Vim does not convert automatically to the locale's |charset| at display time.
 130 So, if a file's |charset| differs from your locale's |charset|, the file is
 131 not displayed correctly.  You must find out the file's |charset| in some way
 132 (guessing, using some utilities, etc.) and convert it to the locale's
 133 |charset| manually.
 134
 135 Useful utilities for converting the |charset|:
 136     Japanese:       nkf
 137         Nkf is "Network Kanji code conversion Filter".  A distinctive
 138         feature of nkf is that it guesses the input Kanji code.  So, you don't
 139         need to know what the input file's |charset| is.  To convert to
 140         EUC-JP from ISO-2022-JP or Shift_JIS, simply use the following command
 141         in Vim:
 142             :%!nkf -e
 143         Nkf can be found at:
 144         http://www.sfc.wide.ad.jp/~max/FreeBSD/ports/distfiles/nkf-1.62.tar.gz
 145     Chinese:        hc
 146         Hc is "Hanzi Converter".  Hc converts a GB file to a Big5 file, or a
 147         Big5 file to a GB file.  Hc can be found at:
 148         ftp://ftp.cuhk.hk/pub/chinese/ifcss/software/unix/convert/hc-30.tar.gz
 149     Korean:         hmconv
 150         Hmconv is a Korean code conversion utility especially for E-mail.  It
 151         can convert between EUC-KR and ISO-2022-KR.  Hmconv can be found at:
 152         ftp://ftp.kaist.ac.kr/pub/hangul/code/hmconv/hmconv1.0pl3
 153     Multilingual:   lv
 154         Lv is a Powerful Multilingual File Viewer.  It can also be used as a
 155         |charset| converter.  Supported |charset|s: ISO-2022-CN, ISO-2022-JP,
 156         ISO-2022-KR, EUC-CN, EUC-JP, EUC-KR, EUC-TW, UTF-7, UTF-8, ISO-8859
 157         series, Shift_JIS, Big5 and HZ.  Lv can be found at:
 158         http://www.ff.iij4u.or.jp/~nrt/freeware/lv4493.tar.gz
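
When the source |charset| is already known, the conversion itself is a decode
followed by an encode.  The sketch below shows this with Python's standard
codecs; it is not how nkf or the other utilities work, and unlike nkf it does
not guess the input |charset|.  The function name convert_charset is made up
for the example.

```python
def convert_charset(data, source, target):
    """Decode octets from one charset and re-encode them in another.

    Raises UnicodeError if data is not valid in the source charset or
    cannot be represented in the target charset.
    """
    return data.decode(source).encode(target)


if __name__ == "__main__":
    # Two Kanji (U+65E5 U+672C) stored as Shift_JIS octets.
    sjis = "\u65e5\u672c".encode("shift_jis")
    euc = convert_charset(sjis, "shift_jis", "euc_jp")
    print(" ".join("%02X" % byte for byte in euc))
```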
 159
 160
 161 X LOGICAL FONT DESCRIPTION (XLFD)
 162                                                         *XLFD*
 163 XLFD is the X font name and contains the information about the font size,
 164 |CCS|, etc.  The name is in this format:
 165
 166 FOUNDRY-FAMILY-WEIGHT-SLANT-WIDTH-STYLE-PIXEL-POINT-X-Y-SPACE-AVE-CR-CE
 167
 168 Each field means:
 169
 170 - FOUNDRY:  FOUNDRY field.  The company that created the font.
 171 - FAMILY:   FAMILY_NAME field.  Basic font family name.  (helvetica, gothic,
 172             times, etc)
 173 - WEIGHT:   WEIGHT_NAME field.  How thick the letters are.  (light, medium,
 174             bold, etc)
 175 - SLANT:    SLANT field.
 176                 r:  Roman
 177                 i:  Italic
 178                 o:  Oblique
 179                 ri: Reverse Italic
 180                 ro: Reverse Oblique
 181                 ot: Other
 182                 number: Scaled font
 183 - WIDTH:    SETWIDTH_NAME field.  Width of characters.  (normal, condensed,
 184             narrow, double wide)
 185 - STYLE:    ADD_STYLE_NAME field.  Extra info to describe font.  (Serif, Sans
 186             Serif, Informal, Decorated, etc)
 187 - PIXEL:    PIXEL_SIZE field.  Height, in pixels, of characters.
 188 - POINT:    POINT_SIZE field.  Ten times height of characters in points.
 189 - X:        RESOLUTION_X field.  X resolution (dots per inch).
 190 - Y:        RESOLUTION_Y field.  Y resolution (dots per inch).
 191 - SPACE:    SPACING field.
 192                 p:  Proportional
 193                 m:  Monospaced
 194                 c:  CharCell
 195 - AVE:      AVERAGE_WIDTH field.  Ten times average width in pixels.
 196 - CR:       CHARSET_REGISTRY field.  Indicates the name of the font |CCS| name.
 197 - CE:       CHARSET_ENCODING field.  In some CCSes, such as the ISO-8859
 198             series, this field is part of the |CCS| name.  In other CCSes,
 199             such as JIS X 0208, code points have the same value as GL if this
 200             field is 0, and as GR if it is 1.
 201
 202 For example, a 16-dot font corresponding to JIS X 0208 is written like
 203 this:
 204     -misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0
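
The field list above can be applied to a font name with a few lines of code.
This is a minimal sketch; the short field labels are the ones used in this
file, and the function name parse_xlfd is made up for the example.  It
assumes a complete name with all 14 fields and a leading hyphen, as in the
example above.

```python
XLFD_FIELDS = ("FOUNDRY", "FAMILY", "WEIGHT", "SLANT", "WIDTH", "STYLE",
               "PIXEL", "POINT", "X", "Y", "SPACE", "AVE", "CR", "CE")


def parse_xlfd(name):
    """Split a complete XLFD font name into a dict of its 14 fields."""
    if not name.startswith("-"):
        raise ValueError("an XLFD name starts with a hyphen: %r" % name)
    values = name[1:].split("-")
    if len(values) != len(XLFD_FIELDS):
        raise ValueError("expected 14 fields, got %d" % len(values))
    return dict(zip(XLFD_FIELDS, values))


if __name__ == "__main__":
    font = "-misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0"
    for field, value in parse_xlfd(font).items():
        print("%-8s %s" % (field, value or "(empty)"))
```

In this name the STYLE field is empty, which is why two hyphens appear next
to each other.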
 205
 206
 207 X FONTSET
 208                                                 *fontset* *xfontset*
 209 A |CCS| is typically associated with one font.  Languages which must manage
 210 multiple |CCS|es need to manage multiple fonts.  In X11R5, FontSet was
 211 introduced for the internationalization of the output API.  By using this,
 212 Xlib takes care of switching fonts and of the display.  Up to X11R4,
 213 applications had to manage this themselves.
 214
 215 The |locale| database has the information about the |charset| of the |locale|:
 216 which |CCS|(es) are needed and which |CES| the locale uses.  When you use a
 217 locale which must manage multiple |CCS|es, you have to specify a font for each
 218 |CCS| in the 'guifontset' option.
 219
 220 Example:
 221     |charset| language              |CCS|es
 222     GB2312    Chinese (simplified)  ISO-8859-1 and GB 2312
 223     Big5      Chinese (traditional) ISO-8859-1 and Big5
 224     CNS-11643 Chinese (traditional) ISO-8859-1, CNS 11643-1 and CNS 11643-2
 225     EUC-JP    Japanese              JIS X 0201 and JIS X 0208
 226     EUC-KR    Korean                ISO-8859-1 and KS C 5601 (KS X 1001)
 227
 228 The |XLFD| contains the name of the |CCS|.  So, by searching in fonts.dir,
 229 you can find the fonts for a |CCS|.  The fonts.dir file is in each fonts
 230 directory (e.g. /usr/X11R6/lib/X11/fonts/*); the format of the file is:
 231     First line:  the number of fonts which are contained in this fonts.dir
 232     Other lines: FILENAME  |XLFD|
 233 Or, you can search for fonts using the xlsfonts command.  For example, when
 234 you're searching for the fonts for KS C 5601:
 235 >   xlsfonts | grep ksc5601
 236 will show you a list of them.
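
Searching a fonts.dir file for a |CCS| can be sketched like this.  The sample
text and its file names are invented for the example, and the function name
find_fonts is hypothetical; the code only relies on the file format
described above.

```python
def find_fonts(fonts_dir_text, registry):
    """Return (filename, xlfd) pairs whose CHARSET_REGISTRY starts with
    registry, given the text of a fonts.dir file."""
    lines = fonts_dir_text.splitlines()
    count = int(lines[0])            # first line: number of fonts
    found = []
    for line in lines[1:1 + count]:
        parts = line.split(None, 1)  # FILENAME, then the XLFD
        if len(parts) != 2:
            continue
        filename, xlfd = parts
        fields = xlfd.strip().split("-")
        # CHARSET_REGISTRY is the second-to-last field of the XLFD.
        if len(fields) >= 2 and fields[-2].startswith(registry):
            found.append((filename, xlfd.strip()))
    return found


# Invented sample data, for illustration only.
SAMPLE = """2
hangul16.pcf -example-gothic-medium-r-normal--16-160-75-75-c-160-ksc5601.1987-0
latin16.pcf -example-fixed-medium-r-normal--16-160-75-75-c-80-iso8859-1
"""

if __name__ == "__main__":
    for filename, xlfd in find_fonts(SAMPLE, "ksc5601"):
        print(filename, xlfd)
```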
 237
 238                                                 *base_font_name_list*
 239 In 'guifontset' option and ~/.Xdefaults, you specify the
 240 |base_font_name_list|, which is a list of |XLFD| font names that Xlib uses to
 241 load the fonts needed for the |locale|.  The base font names are a
 242 comma-separated list.
 243
 244 For example, the ja_JP.eucJP |locale| requires the JIS X 0201 and JIS X 0208
 245 |CCS|es.  You could supply a |base_font_name_list| that explicitly specifies
 246 the charsets, like:
 247
 248 guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0,
 249     \-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0
 250
 251 Alternatively, the user could supply a base font name list that omits the
 252 |CCS| name, letting Xlib select font characters required for the locale. For
 253 example:
 254
 255 guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140,
 256     \-misc-fixed-medium-r-normal--14-130-75-75-c-70
 257
 258 Alternatively, the user could supply a single base font name that allows Xlib
 259 to select from all available fonts.  For example:
 260
 261 guifontset=-misc-fixed-medium-r-normal--14-*
 262
 263 Alternatively, the user could specify the alias name.  See fonts.alias in
 264 the fonts directory.
 265
 266 guifontset=k14,r14
 267
 268 Note that in East Asian fonts, the standard character cell is square.  When
 269 mixing a Latin font and an East Asian font, the East Asian font width should
 270 be twice the Latin font width.  Also, GVIM needs fixed-width fonts.
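
For fully specified |XLFD| names, the width rule can be checked from the
AVERAGE_WIDTH field.  This is only a sketch with made-up function names; it
assumes complete names such as the explicit JIS X 0208 / JIS X 0201 pair
shown earlier, and it does not work for wildcard names or aliases.

```python
def average_width(xlfd):
    """Return the AVERAGE_WIDTH field (tenths of a pixel) of a full XLFD."""
    # Index 0 is the empty string before the leading hyphen, so the 12th
    # field (AVE) is at index 12.
    return int(xlfd.split("-")[12])


def widths_match(east_asian, latin):
    """True if the East Asian font is exactly twice as wide as the Latin."""
    return average_width(east_asian) == 2 * average_width(latin)


if __name__ == "__main__":
    kanji = "-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0"
    roman = "-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0"
    print(widths_match(kanji, roman))
```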
 271
 272
 273 X INPUT METHOD (XIM)                            *XIM* *xim* *x-input-method*
 274
 275 XIM (X Input Method) is an international input module for X.  There are two
 276 kinds of structures, Xlib unit type and |IM-server| (Input-Method server) type.
 277 |IM-server| type is suitable for complex inputting, like CJK inputting.
 278
 279 - IM-server
 280                                                         *IM-server*
 281   In |IM-server| type input structures, input events are handled in one of
 282   two ways: the FrontEnd system or the BackEnd system.  In the FrontEnd
 283   system, input events are caught by the |IM-server| first, then the
 284   |IM-server| gives the application the result of the input.  The BackEnd
 285   system works in the reverse order.  MS Windows adopts the BackEnd system.
 286   In X, most |IM-server|s adopt the FrontEnd system.  The drawback of the
 287   BackEnd system is the large overhead in communication, but it provides safe
 288   synchronization with no restrictions on applications.
 289
 290   For example, xwnmo and kinput2 are Japanese |IM-server|s; both use the
 291   FrontEnd system.  Xwnmo is distributed with Wnn (see below), kinput2 can be
 292   found at: ftp://ftp.sra.co.jp/pub/x11/kinput2/
 293
 294   For Chinese, there's a great XIM server named "xcin", with which you can
 295   input both Traditional and Simplified Chinese characters.  It can also
 296   accept other locales if you make a correct input table.  Xcin can be found
 297   at: http://xcin.linux.org.tw/
 298
 299 - Conversion Server
 300                                                         *conversion-server*
 301   Some systems need an additional server: a conversion server.  Most Japanese
 302   |IM-server|s need one, a Kana-Kanji conversion server.  For Chinese
 303   inputting, it depends on the input method; some methods need a PinYin or
 304   ZhuYin to HanZi conversion server.  For Korean inputting, if you want to
 305   input Hanja, a Hangul-Hanja conversion server is needed.
 306
 307   For example, the Japanese inputting process is divided into 2 steps: first
 308   pre-inputting Hira-gana, then Kana-Kanji conversion.  There are very many
 309   Kanji characters (6349 Kanji characters are defined in JIS X 0208), while
 310   there are only 76 Hira-gana characters.  So, first, we pre-input the text as
 311   pronounced in Hira-gana, second, we convert Hira-gana to Kanji or Kata-Kana,
 312   if needed.  There are several Kana-Kanji conversion servers, such as jserver
 313   (distributed with Wnn, see below) and canna.  Canna can be found at:
 314   ftp://ftp.nec.co.jp/pub/Canna/
 315
 316 There is a good input system: Wnn4.2.  Wnn 4.2 contains:
 317     xwnmo (|multilingualized| |IM-server|)
 318     jserver (Japanese Kana-Kanji conversion server)
 319     cserver (Chinese PinYin or ZhuYin to simplified HanZi conversion server)
 320     tserver (Chinese PinYin or ZhuYin to traditional HanZi conversion server)
 321     kserver (Hangul-Hanja conversion server)
 322 Wnn 4.2 can be found at:
 323     ftp://ftp.FreeBSD.ORG/pub/FreeBSD/ports/distfiles/Wnn4.2.tar.gz
 324
 325
 326 - Input Style
 327                                                         *xim-input-style*
 328   Inputting CJK requires four areas:
 329
 330       1. The area to display the input in progress.
 331       2. The area to display input mode.
 332       3. The area to display the next candidate for the selection.
 333       4. The area to display other tools.
 334
 335   The third area is needed when converting.  For example, in Japanese
 336   inputting, multiple Kanji characters could have the same pronunciation, so
 337   a sequence of Hira-gana characters could map to several different sequences
 338   of Kanji characters.
 339
 340   The first and second areas are defined in international input of X with the
 341   names of "Preedit Area", "Status Area" respectively.  The third and fourth
 342   areas are not defined and are left to be managed by the |IM-server|.  In the
 343   international input, four input styles have been defined using combinations
 344   of Preedit Area and Status Area: |OnTheSpot|, |OffTheSpot|, |OverTheSpot|
 345   and |Root|.
 346
 347   Currently, GUI Vim supports three styles: |OverTheSpot|, |OffTheSpot| and
 348   |Root|.
 349
 350 *.  on-the-spot                                         *OnTheSpot*
 351     Preedit Area and Status Area are displayed by the client application in
 352     the area of the application.  The client application is directed by the
 353     |IM-server| to display all pre-edit data at the location of text
 354     insertion.  The client registers callbacks invoked by the input method
 355     during pre-editing.
 356 *.  over-the-spot                                       *OverTheSpot*
 357     Status Area is created in a fixed position within the area of the
 358     application; in the case of Vim, the position is the additional status
 359     line.  Preedit Area is made at the present input position of the
 360     application.  The input method displays pre-edit data in a window which it
 361     brings up directly over the text insertion position.
 362 *.  off-the-spot                                        *OffTheSpot*
 363     Preedit Area and Status Area are displayed in the area of the application;
 364     in the case of Vim, the area is the additional status line.  The client
 365     application provides display windows for the pre-edit data to the input
 366     method, which displays into them directly.
 367 *.  root-window                                         *Root*
 368     Preedit Area and Status Area are displayed outside of the area of the
 369     application.  The input method displays all pre-edit data in a separate
 370     area of the screen in a window specific to the input method.
 371
 372
 373 LOCALIZATION, INTERNATIONALIZATION AND MULTILINGUALIZATION
 374
 375                                         *localized* *Localization* *L10N*
 376 Localization (L10N)             To fit a system or an application with a
 377                                 specific language.
 378                             *internationalized* *Internationalization* *I18N*
 379 Internationalization (I18N)     To enable a system or an application to fit
 380                                 with a specific language according to the
 381                                 |locale|.
 382                             *multilingualized* *Multilingualization* *M17N*
 383 Multilingualization (M17N)      To enable a system or an application to be
 384                                 able to use multiple languages at the same
 385                                 time.
 386 For example, JVim (Japanized version Vim 3.0) is a |localized| application for
 387 Japanese.  Cxterm (|localized| xterm for Chinese), kterm (|localized| xterm
 388 for Japanese) and hanterm (|localized| xterm for Korean) are also |localized|
 389 applications.  Gnome is an |internationalized| application.  It can be
 390 |localized| for many languages according to the |locale|.  Mule (Multilingual
 391 Enhancement for GNU Emacs) is a |multilingualized| application.  It can handle
 392 multiple |charset|s and can maintain a mixture of languages in a single
 393 buffer.
 394
 395 Vim is an |internationalized| application.  So, you can change the language
 396 by specifying the |locale| and some options at start time.
 397
 398 ==============================================================================
 399 2. Compiling                                            *multibyte-compiling*
 400
 401 -.  Before you start to compile Vim, be sure that your system has the language
 402     |locale| of your choice.  You might need to add "-DX_LOCALE" to CFLAGS.
 403
 404 -.  Compiling Vim:
 405 >       ./configure --with-x --enable-multibyte --enable-fontset --enable-xim
 406 >       make
 407
 408 -.  You can use multi-byte in the Vim GUI, which fully supports the
 409     |+multi_byte| feature.  If you only use console Vim, low-level multibyte
 410     input/output depends on your console.  For example, if you run Vim in an
 411     xterm, you should use a |localized| xterm or an xterm which supports
 412     |XIM|.  |localized| xterms are, for example, kterm (Kanji term) and
 413     hanterm (for Korean).  Known |XIM| supporting xterms are Eterm
 414     (Enlightened terminal) and rxvt.
 415
 416 ==============================================================================
 417 3. Display                                              *multibyte-display*
 418
 419 Note that Display and Input are independent.  It is possible to see your
 420 language even though you have no input method for it.
 421
 422 Multibyte output uses the |xfontset| feature.
 423
 424 -.  Be sure that your system has the fonts corresponding to the |CCS|es which
 425     the |locale| needs to manage.  See: |xfontset|.
 426
 427 -.  The following is required to use a multibyte language.
 428
 429     If needed, insert the lines below in your $HOME/.Xdefaults file.
 430     The GTK+ version of GUI Vim does not use .Xdefaults, thus this change is
 431     not needed for the GTK+ version.
 432
 433     These 3 lines are specific for Vim:
 434
 435         Vim.font: |base_font_name_list|
 436         Vim*fontSet: |base_font_name_list|
 437         Vim*fontList: your_language_font:
 438
 439         Note: Vim.font is for text area.
 440               Vim*fontSet is for menu.
 441               Vim*fontList is for menu (for Motif GUI)
 442
 443         For example, when you are using Japanese and a 14-dot font,
 444
 445 >       Vim.font: -misc-fixed-medium-r-normal--14-*
 446 >       Vim*fontSet: -misc-fixed-medium-r-normal--14-*
 447 >       Vim*fontList: -misc-fixed-medium-r-normal--14-*
 448 >
 449         or
 450
 451 >       Vim.font: k14,r14
 452 >       Vim.fontSet: k14,r14
 453 >       Vim.fontList: k14
 454
 455     You should set the 'guifontset' option to display a multi-byte language.
 456     Example:
 457
 458         :set guifontset=|base_font_name_list|
 459
 460         For example, when you are using Japanese and a 14-dot font,
 461
 462 >       set guifontset=-misc-fixed-medium-r-normal--14-*
 463
 464         or
 465
 466 >       set guifontset=k14,r14
 467
 468         Note: You cannot use IM unless you specify 'guifontset'.
 469               Therefore, Latin users also have to set 'guifontset' if they
 470               use IM.
 471
 472     You should not set 'guifont'.  If it is set, Vim ignores 'guifontset'.
 473     This means Vim runs without fontset support and you can see only English;
 474     multi-byte characters are displayed corrupted.
 475
 476     After the |+xfontset| feature is enabled as explained above, Vim does not
 477     allow using 'font'.  For example, if you use:
 478 >      :set guifontset=eng_font,your_font
 479     in your .gvimrc, then you should use for highlighting:
 480 >      :hi Comment font=another_eng_font,another_your_font
 481     If you use
 482 >      :hi Comment font=another_eng_font
 483     Vim will also try to use it as a fontset.  So, if it cannot display your
 484     |locale| dependent codeset, you will see an error message.
 485
 486 -.  In your .vimrc, add this:
 487 >       set fileencoding=korea
 488     You can change "korea" to some other name such as japan or taiwan.
 489     See |'fileencoding'| for the supported encodings.
 490
 491 -.  If a file's charset is different from your |locale|'s charset, you need to
 492     convert the charset.  See |charset-conversion|.
 493
 494 ==============================================================================
 495 4. Input (XIM, X Input Method support)                  *multibyte-input*
 496
 497 Note that Display and Input are independent.  It is possible to see your
 498 language even though you have no input method for it.  But when your Display
 499 method doesn't match your Input method, the text will be displayed incorrectly.
 500
 501 -.  To input your language you should run the |IM-server| which supports your
 502     language, and a |conversion-server| if needed.  Multibyte input uses the
 503     |XIM| feature.
 504
 505     The next 3 lines are common to all X applications which use |XIM|.
 506     If you already use |XIM|, you can skip this.
 507
 508 >       *international: True
 509 >       *.inputMethod: your_input_server_name
 510 >       *.preeditType: your_input_style
 511
 512         Note: your_input_server_name is your |IM-server| name (check your
 513               |IM-server| manual).
 514               your_input_style is one of |OverTheSpot|, |OffTheSpot|, |Root|.
 515               See also |xim-input-style|.
 516               *international may not be necessary if you use X11R6.
 517               *.inputMethod and *.preeditType are optional if you use X11R6.
 518
 519         For example, when you are using kinput2 as |IM-server|,
 520
 521 >       *international: True
 522 >       *.inputMethod: kinput2
 523 >       *.preeditType: OverTheSpot
 524
 525     When using |OverTheSpot|, GUI Vim always connects to the IM Server even in
 526     Normal mode, so you can input your language with commands like "f" and
 527     "r".  But when using one of the other two methods, GUI Vim connects to the
 528     IM Server only if it is not in Normal mode.
 529
 530     If your IM Server does not support |OverTheSpot|, and if you want to use
 531     your language with some Normal mode command like "f" or "r", then you
 532     should use a |localized| xterm or an xterm which supports |XIM|.
 533
 534 -.  If needed, you can set the XMODIFIERS environment variable.
 535
 536         sh:  export XMODIFIERS="@im=input_server_name"
 537         csh: setenv XMODIFIERS "@im=input_server_name"
 538
 539         For example, when you are using kinput2 as |IM-server| and sh,
 540
 541 >       export XMODIFIERS="@im=kinput2"
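
To check which |IM-server| name the current environment requests, the value
of XMODIFIERS can be inspected.  The sketch below assumes the value is a
sequence of "@category=value" parts, of which only "im" is used here; the
function name input_method_name is made up for the example.

```python
import os


def input_method_name(xmodifiers):
    """Return the value of the "im" modifier, or None if it is absent."""
    for part in xmodifiers.split("@"):
        key, sep, value = part.partition("=")
        if sep and key == "im":
            return value
    return None


if __name__ == "__main__":
    # Prints None when XMODIFIERS is unset or has no "im" part.
    print(input_method_name(os.environ.get("XMODIFIERS", "")))
```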
 542
 543
 544 Contributions specifically for the multi-byte features by:
 545         Chi-Deok Hwang <hwang@mizi.co.kr>
 546         Sung-Hyun Nam <namsh@lgic.co.kr>
 547         K.Nagano <nagano@atese.advantest.co.jp>
 548         Taro Muraoka  <koron@tka.att.ne.jp>
 549         Yasuhiro Matsumoto <mattn@mail.goo.ne.jp>
 550
 551 ==============================================================================
 552 5. UTF-8 in XFree86 xterm                               *UTF8-xterm*
 553
 554 This is a short explanation of how to use UTF-8 character encoding in the
 555 xterm that comes with XFree86 by Thomas Dickey (text by Markus Kuhn).
 556
 557 NOTE: Editing and viewing UTF-8 text in Vim does not work as expected yet!
 558
 559 Get the latest xterm version, which now has UTF-8 support:
 560
 561         http://www.clark.net/pub/dickey/xterm/xterm.tar.gz
 562
 563 Compile it with "./configure --enable-wide-chars ; make"
 564
 565 Also get the ISO 10646-1 version of the 6x13 font, which is available at
 566
 567         http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz
 568
 569 and install the font as described in the README file.
 570
 571 Now start xterm with
 572
 573 >  xterm -u8 -fn -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1
 574
 575 and you will have a working UTF-8 terminal emulator. Try both
 576
 577 >  cat utf-8-demo.txt
 578 >  vim utf-8-demo.txt
 579
 580 with the demo text that comes with ucs-fonts.tar.gz in order to see
 581 whether there are any problems with UTF-8 in your xterm.