Minimum Profit character encoding support
=========================================

This document describes the character encodings supported by the
Minimum Profit text editor and the autodetection tests it performs.

None (default locale)
---------------------

The following steps are performed on input:

 * If a utf BOM (byte order mark) is found, the document encoding is set
   to the matching one of `utf-8bom', `utf-16le', `utf-16be', `utf-32le'
   or `utf-32be';
 * Otherwise, if a valid utf-8 multibyte sequence is detected, the
   document encoding is set to `utf-8';
 * Otherwise, if a character with bit 7 set is found (that is, a
   non-ASCII byte) that does not form valid utf-8, the document encoding
   is set to `8bit';
 * In any other case, no encoding is forced, and the file is read using
   the locale conversion functions.
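The input steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Minimum Profit's own code: the function name `detect_encoding' is hypothetical, a return value of None stands for "no encoding is forced", and an "explicit utf-8 sequence" is assumed to mean that the data contains non-ASCII bytes and decodes cleanly as utf-8 as a whole.

```python
# Hypothetical sketch of the default-locale input autodetection;
# names and return values are illustrative, not Minimum Profit's API.

# Longer BOMs are checked first, because the utf-32le BOM (FF FE 00 00)
# begins with the utf-16le BOM (FF FE).
BOMS = [
    (b"\xff\xfe\x00\x00", "utf-32le"),
    (b"\x00\x00\xfe\xff", "utf-32be"),
    (b"\xef\xbb\xbf", "utf-8bom"),
    (b"\xff\xfe", "utf-16le"),
    (b"\xfe\xff", "utf-16be"),
]


def detect_encoding(data: bytes):
    """Return the encoding to force, or None to use the locale functions."""
    # Step 1: a BOM decides the encoding outright.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name

    # Pure ASCII (bit 7 never set): last step, no encoding is forced.
    if all(byte < 0x80 for byte in data):
        return None

    # Steps 2 and 3: non-ASCII bytes are present; are they valid utf-8?
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "8bit"
    return "utf-8"
```

Because of the whole-buffer assumption, a file mixing valid utf-8 sequences with stray 8-bit bytes is classified as `8bit' by this sketch; the editor's real test may treat such input differently.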

On output, the document is saved using the locale conversion functions.

utf-8
-----

The following steps are performed on input:

 * If a utf-8 BOM is found, the document encoding is set to `utf-8bom';
 * In any other case, utf-8 is assumed as the character encoding and any
   invalid byte sequence is converted to the `?' character.

On output, it saves the document using the utf-8 encoding without a BOM
prefix.
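A minimal Python sketch of the `utf-8' rules follows. The names `load_utf8', `save_utf8' and the error handler `mp_question_mark' are hypothetical. Python's built-in `replace' error handler would insert U+FFFD, so a small custom handler is registered to produce `?' instead; it emits one `?' per invalid sequence as delimited by Python's decoder, which may not match how Minimum Profit splits invalid input.

```python
import codecs

UTF8_BOM = b"\xef\xbb\xbf"

# Hypothetical error handler: replace each invalid sequence reported by
# the decoder (err.start to err.end) with a single '?' and resume after it.
codecs.register_error("mp_question_mark", lambda err: ("?", err.end))


def load_utf8(data: bytes) -> tuple[str, str]:
    """Decode under the `utf-8' rules; return (encoding, text)."""
    encoding = "utf-8"
    if data.startswith(UTF8_BOM):
        # A BOM switches the document to `utf-8bom'.
        encoding = "utf-8bom"
        data = data[len(UTF8_BOM):]
    return encoding, data.decode("utf-8", errors="mp_question_mark")


def save_utf8(text: str) -> bytes:
    """Encode as utf-8 without a BOM prefix."""
    return text.encode("utf-8")
```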

utf-8bom
--------

On input, if no utf-8 BOM is found, the data is still decoded as utf-8,
but the document encoding stays `utf-8bom' instead of being changed to
`utf-8'.

On output, it saves the document using the utf-8 encoding with a BOM
prefix.
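The `utf-8bom' behaviour can be sketched in Python as below; the function names are hypothetical. The BOM is simply the three bytes EF BB BF, and because the encoding is kept as `utf-8bom' even when the BOM is missing on input, a document loaded this way gains a BOM when saved.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def load_utf8bom(data: bytes) -> tuple[str, str]:
    """Decode under `utf-8bom': the BOM is optional, the encoding is kept.

    Invalid byte handling is left out of this sketch (strict decoding
    raises UnicodeDecodeError).
    """
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return "utf-8bom", data.decode("utf-8")


def save_utf8bom(text: str) -> bytes:
    """Encode as utf-8 prefixed with the EF BB BF byte order mark."""
    return UTF8_BOM + text.encode("utf-8")
```

Python's standard `utf-8-sig' codec implements the same pair of rules: decoding strips an optional BOM and encoding always writes one.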

8bit
----

No character conversion is done on either input or output.

iso8859-1
---------

Characters are treated as being encoded using the iso8859-1 character set.
Since every iso8859-1 byte value equals the Unicode code point of the
character it encodes (U+0000 to U+00FF), no real conversion is done, and
this mode is effectively identical to `8bit'.

Aliases: `latin1'.
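A short Python sketch (hypothetical function names) of why no real conversion is needed: decoding maps each byte to the code point of the same value, and encoding is the exact inverse, so every byte sequence survives a load/save round trip unchanged, just as in `8bit' mode.

```python
def load_latin1(data: bytes) -> str:
    """Map every byte to the code point with the same value (U+0000-U+00FF)."""
    return "".join(chr(byte) for byte in data)


def save_latin1(text: str) -> bytes:
    """Inverse mapping; raises ValueError for characters above U+00FF."""
    return bytes(ord(ch) for ch in text)
```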

utf-16
------

On input, it tries to determine the endianness of the document by reading
the BOM; if a valid one is found, the encoding is set to `utf-16le' or
`utf-16be'; if none is found, `utf-16le' is assumed.

On output, it behaves like `utf-16le'.

Aliases: `ucs-2'.
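The `utf-16' BOM check can be sketched in Python as follows; `load_utf16' is a hypothetical name. The BOM bytes are FF FE for little endian and FE FF for big endian.

```python
def load_utf16(data: bytes) -> tuple[str, str]:
    """Resolve `utf-16' to utf-16le or utf-16be via the BOM and decode.

    Returns (resolved encoding name, decoded text without the BOM).
    """
    if data.startswith(b"\xff\xfe"):
        return "utf-16le", data[2:].decode("utf-16-le")
    if data.startswith(b"\xfe\xff"):
        return "utf-16be", data[2:].decode("utf-16-be")
    # No BOM: little endian is assumed.
    return "utf-16le", data.decode("utf-16-le")
```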

utf-16le
--------

On input, the document is assumed to be encoded as utf-16 little endian.

On output, it saves the document using the utf-16 little endian encoding
with a BOM prefix.

Aliases: `ucs-2le'.

utf-16be
--------

On input, the document is assumed to be encoded as utf-16 big endian.

On output, it saves the document using the utf-16 big endian encoding
with a BOM prefix.

Aliases: `ucs-2be'.
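Both `utf-16le' and `utf-16be' write a BOM followed by 16-bit code units. The hypothetical `save_utf16' below spells the encoding out instead of relying on Python's codec: the BOM is U+FEFF written in the chosen byte order, and characters above U+FFFF are split into a surrogate pair.

```python
def save_utf16(text: str, byteorder: str = "little") -> bytes:
    """Encode text as utf-16 in byteorder ("little" or "big") with a BOM."""
    out = bytearray()
    for ch in "\ufeff" + text:  # U+FEFF becomes FF FE or FE FF
        cp = ord(ch)
        if cp > 0xFFFF:
            # Outside the Basic Multilingual Plane: use a surrogate pair.
            cp -= 0x10000
            units = (0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF))
        else:
            units = (cp,)
        for unit in units:
            out += unit.to_bytes(2, byteorder)
    return bytes(out)
```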

utf-32
------

On input, it tries to determine the endianness of the document by reading
the BOM; if a valid one is found, the encoding is set to `utf-32le' or
`utf-32be'; if none is found, `utf-32le' is assumed.

On output, it behaves like `utf-32le'.

Aliases: `ucs-4'.
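A matching Python sketch for `utf-32' (hypothetical name `load_utf32'): the BOM is FF FE 00 00 for little endian and 00 00 FE FF for big endian, and data without a BOM falls back to little endian.

```python
def load_utf32(data: bytes) -> tuple[str, str]:
    """Resolve `utf-32' to utf-32le or utf-32be via the BOM and decode.

    Returns (resolved encoding name, decoded text without the BOM).
    """
    if data.startswith(b"\xff\xfe\x00\x00"):
        return "utf-32le", data[4:].decode("utf-32-le")
    if data.startswith(b"\x00\x00\xfe\xff"):
        return "utf-32be", data[4:].decode("utf-32-be")
    # No BOM: little endian is assumed.
    return "utf-32le", data.decode("utf-32-le")
```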

utf-32le
--------

On input, the document is assumed to be encoded as utf-32 little endian.

On output, it saves the document using the utf-32 little endian encoding
with a BOM prefix.

Aliases: `ucs-4le'.

utf-32be
--------

On input, the document is assumed to be encoded as utf-32 big endian.

On output, it saves the document using the utf-32 big endian encoding
with a BOM prefix.

Aliases: `ucs-4be'.
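In utf-32 every code point occupies exactly one 4-byte unit, so the output side of `utf-32le' and `utf-32be' reduces to writing U+FEFF and then each character's code point in the chosen byte order. `save_utf32' is a hypothetical name used only for this sketch.

```python
def save_utf32(text: str, byteorder: str = "little") -> bytes:
    """Encode text as utf-32 in byteorder ("little" or "big") with a BOM.

    Each code point, starting with the U+FEFF BOM, becomes one 4-byte unit.
    """
    return b"".join(ord(ch).to_bytes(4, byteorder) for ch in "\ufeff" + text)
```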

Iconv support
-------------

If Minimum Profit is compiled with support for the `iconv' library, many
more encodings are available. Their names are not listed here, as they
depend on the underlying system; where available, the `iconv --list'
command prints the names it supports.

End of line markers
-------------------

Though not directly related to character encodings, the Minimum Profit text
editor remembers the end of line marker found inside each document and
uses it when saving the document afterwards. This helps in maintaining
document compatibility and portability. This behaviour can be disabled by
setting the `mp.config.keep_eol' configuration directive to 0.
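The remember-and-reuse behaviour can be sketched in Python as follows. This is only an illustration: `detect_eol' and `save_text' are hypothetical names, only CRLF and LF markers are considered, the `keep_eol' parameter merely stands in for `mp.config.keep_eol', and the LF fallback used when it is disabled is an assumption.

```python
# Hypothetical sketch of end of line preservation; not Minimum Profit code.
# Assumption: only CRLF and LF markers are recognized.


def detect_eol(text: str, default: str = "\n") -> str:
    """Return the first end of line marker in text (CRLF or LF), or default."""
    pos = text.find("\n")
    if pos == -1:
        return default
    return "\r\n" if pos > 0 and text[pos - 1] == "\r" else "\n"


def save_text(lines, remembered_eol: str, keep_eol: bool = True,
              fallback_eol: str = "\n") -> str:
    """Join lines with the remembered marker, or with fallback_eol when
    keep_eol (standing in for mp.config.keep_eol) is disabled.
    No marker is appended after the last line."""
    eol = remembered_eol if keep_eol else fallback_eol
    return eol.join(lines)
```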

----
Angel Ortega <angel@triptico.com>