gnu/dist/gettext/gettext-tools/doc/gettext_5.html

   1 <HTML>
   2 <HEAD>
   3 <!-- This HTML file has been created by texi2html 1.52a
   4      from gettext.texi on 11 April 2005 -->
   5
   6 <TITLE>GNU gettext utilities - 5  Creating a New PO File</TITLE>
   7 </HEAD>
   8 <BODY>
   9 Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_4.html">previous</A>, <A HREF="gettext_6.html">next</A>, <A HREF="gettext_22.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
  10 <P><HR><P>
  11
  12
  13 <H1><A NAME="SEC32" HREF="gettext_toc.html#TOC32">5  Creating a New PO File</A></H1>
  14 <P>
  15 <A NAME="IDX259"></A>
  16
  17 </P>
  18 <P>
  19 When starting a new translation, the translator creates a file called
  20 <TT>`<VAR>LANG</VAR>.po&acute;</TT>, as a copy of the <TT>`<VAR>package</VAR>.pot&acute;</TT> template
  21 file with modifications in the initial comments (at the beginning of the file)
  22 and in the header entry (the first entry, near the beginning of the file).
  23
  24 </P>
  25 <P>
  26 The easiest way to do so is to use the <SAMP>`msginit&acute;</SAMP> program.
  27 For example:
  28
  29 </P>
  30
  31 <PRE>
  32 $ cd <VAR>PACKAGE</VAR>-<VAR>VERSION</VAR>
  33 $ cd po
  34 $ msginit
  35 </PRE>
  36
  37 <P>
  38 The alternative way is to do the copy and modifications by hand.
  39 To do so, the translator copies <TT>`<VAR>package</VAR>.pot&acute;</TT> to
  40 <TT>`<VAR>LANG</VAR>.po&acute;</TT>.  Then she modifies the initial comments and
  41 the header entry of this file.
  42
  43 </P>
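<P>
As a rough illustration of the manual route, the following Python sketch copies
a template to a new PO file and replaces the placeholder title in the initial
comments.  The helper name <CODE>start_po_by_hand</CODE>, the file names and the
replacement title are assumptions made for this example; they are not part of
gettext, and the remaining placeholders and header fields still have to be
edited afterwards.
</P>
<PRE>
```python
import shutil
from pathlib import Path


def start_po_by_hand(pot_path, po_path, title):
    """Copy a POT template to a new PO file and replace the title placeholder.

    Hypothetical helper for illustration only; msginit does considerably more.
    """
    shutil.copyfile(pot_path, po_path)
    po = Path(po_path)
    text = po.read_text(encoding="utf-8")
    # Replace only the first occurrence: the placeholder lives in the
    # initial comments at the top of the file.
    text = text.replace("SOME DESCRIPTIVE TITLE", title, 1)
    po.write_text(text, encoding="utf-8")
    return po
```
</PRE>
<P>
A call such as <CODE>start_po_by_hand("hello.pot", "de.po", "German translations for hello")</CODE>
leaves the template untouched and writes the modified copy next to it.
</P>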
  44
  45
  46
  47 <H2><A NAME="SEC33" HREF="gettext_toc.html#TOC33">5.1  Invoking the <CODE>msginit</CODE> Program</A></H2>
  48
  49 <P>
  50 <A NAME="IDX260"></A>
  51 <A NAME="IDX261"></A>
  52
  53 <PRE>
  54 msginit [<VAR>option</VAR>]
  55 </PRE>
  56
  57 <P>
  58 <A NAME="IDX262"></A>
  59 <A NAME="IDX263"></A>
  60 The <CODE>msginit</CODE> program creates a new PO file, initializing the meta
  61 information with values from the user's environment.
  62
  63 </P>
  64
  65
  66 <H3><A NAME="SEC34" HREF="gettext_toc.html#TOC34">5.1.1  Input file location</A></H3>
  67
  68 <DL COMPACT>
  69
  70 <DT><SAMP>`-i <VAR>inputfile</VAR>&acute;</SAMP>
  71 <DD>
  72 <DT><SAMP>`--input=<VAR>inputfile</VAR>&acute;</SAMP>
  73 <DD>
  74 <A NAME="IDX264"></A>
  75 <A NAME="IDX265"></A>
  76 Input POT file.
  77
  78 </DL>
  79
  80 <P>
  81 If no <VAR>inputfile</VAR> is given, the current directory is searched for the
  82 POT file.  If <VAR>inputfile</VAR> is <SAMP>`-&acute;</SAMP>, standard input is read.
  83
  84 </P>
  85
  86
  87 <H3><A NAME="SEC35" HREF="gettext_toc.html#TOC35">5.1.2  Output file location</A></H3>
  88
  89 <DL COMPACT>
  90
  91 <DT><SAMP>`-o <VAR>file</VAR>&acute;</SAMP>
  92 <DD>
  93 <DT><SAMP>`--output-file=<VAR>file</VAR>&acute;</SAMP>
  94 <DD>
  95 <A NAME="IDX266"></A>
  96 <A NAME="IDX267"></A>
  97 Write output to specified PO file.
  98
  99 </DL>
 100
 101 <P>
 102 If no output file is given, the output file name is derived from the
 103 <SAMP>`--locale&acute;</SAMP> option or the user's locale setting.  If <VAR>file</VAR> is <SAMP>`-&acute;</SAMP>, the results are written to
 104 standard output.
 105
 106 </P>
 107
 108
 109 <H3><A NAME="SEC36" HREF="gettext_toc.html#TOC36">5.1.3  Input file syntax</A></H3>
 110
 111 <DL COMPACT>
 112
 113 <DT><SAMP>`-P&acute;</SAMP>
 114 <DD>
 115 <DT><SAMP>`--properties-input&acute;</SAMP>
 116 <DD>
 117 <A NAME="IDX268"></A>
 118 <A NAME="IDX269"></A>
 119 Assume the input file is a Java ResourceBundle in Java <CODE>.properties</CODE>
 120 syntax, not in PO file syntax.
 121
 122 <DT><SAMP>`--stringtable-input&acute;</SAMP>
 123 <DD>
 124 <A NAME="IDX270"></A>
 125 Assume the input file is a NeXTstep/GNUstep localized resource file in
 126 <CODE>.strings</CODE> syntax, not in PO file syntax.
 127
 128 </DL>
 129
 130
 131
 132 <H3><A NAME="SEC37" HREF="gettext_toc.html#TOC37">5.1.4  Output details</A></H3>
 133
 134 <DL COMPACT>
 135
 136 <DT><SAMP>`-l <VAR>ll_CC</VAR>&acute;</SAMP>
 137 <DD>
 138 <DT><SAMP>`--locale=<VAR>ll_CC</VAR>&acute;</SAMP>
 139 <DD>
 140 <A NAME="IDX271"></A>
 141 <A NAME="IDX272"></A>
 142 Set target locale.  <VAR>ll</VAR> should be a language code, and <VAR>CC</VAR> should
 143 be a country code.  The command <SAMP>`locale -a&acute;</SAMP> can be used to output a list
 144 of all installed locales.  The default is the user's locale setting.
 145
 146 <DT><SAMP>`--no-translator&acute;</SAMP>
 147 <DD>
 148 <A NAME="IDX273"></A>
 149 Declares that the PO file will not have a human translator and is instead
 150 automatically generated.
 151
 152 <DT><SAMP>`-p&acute;</SAMP>
 153 <DD>
 154 <DT><SAMP>`--properties-output&acute;</SAMP>
 155 <DD>
 156 <A NAME="IDX274"></A>
 157 <A NAME="IDX275"></A>
 158 Write out a Java ResourceBundle in Java <CODE>.properties</CODE> syntax.  Note
 159 that this file format doesn't support plural forms and silently drops
 160 obsolete messages.
 161
 162 <DT><SAMP>`--stringtable-output&acute;</SAMP>
 163 <DD>
 164 <A NAME="IDX276"></A>
 165 Write out a NeXTstep/GNUstep localized resource file in <CODE>.strings</CODE> syntax.
 166 Note that this file format doesn't support plural forms.
 167
 168 <DT><SAMP>`-w <VAR>number</VAR>&acute;</SAMP>
 169 <DD>
 170 <DT><SAMP>`--width=<VAR>number</VAR>&acute;</SAMP>
 171 <DD>
 172 <A NAME="IDX277"></A>
 173 <A NAME="IDX278"></A>
 174 Set the output page width.  Long strings in the output files will be
 175 split across multiple lines in order to ensure that each line's width
 176 (= number of screen columns) is less than or equal to the given <VAR>number</VAR>.
 177
 178 <DT><SAMP>`--no-wrap&acute;</SAMP>
 179 <DD>
 180 <A NAME="IDX279"></A>
 181 Do not break long message lines.  Message lines whose width exceeds the
 182 output page width will not be split into several lines.  Only file reference
 183 lines which are wider than the output page width will be split.
 184
 185 </DL>
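<P>
The options from the preceding subsections can be combined in a single
invocation.  The Python sketch below only assembles the argument list and does
not run anything; the helper name <CODE>msginit_command</CODE> and the file names
are assumptions made for this illustration, and only long options documented
above are used.
</P>
<PRE>
```python
def msginit_command(input_pot=None, output_po=None, locale=None,
                    no_translator=False, width=None):
    """Build an argument list for msginit from the options described above."""
    cmd = ["msginit"]
    if input_pot is not None:
        cmd.append("--input=" + input_pot)
    if output_po is not None:
        cmd.append("--output-file=" + output_po)
    if locale is not None:
        cmd.append("--locale=" + locale)
    if no_translator:
        cmd.append("--no-translator")
    if width is not None:
        cmd.append("--width=" + str(width))
    return cmd


# Example: an automatically generated German catalog wrapped at 80 columns.
command = msginit_command("hello.pot", "de.po", "de_DE",
                          no_translator=True, width=80)
print(" ".join(command))
```
</PRE>
<P>
On a system where <CODE>msginit</CODE> is installed, the resulting list could be
handed to <CODE>subprocess.run(command, check=True)</CODE>.
</P>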
 186
 187
 188
 189 <H3><A NAME="SEC38" HREF="gettext_toc.html#TOC38">5.1.5  Informative output</A></H3>
 190
 191 <DL COMPACT>
 192
 193 <DT><SAMP>`-h&acute;</SAMP>
 194 <DD>
 195 <DT><SAMP>`--help&acute;</SAMP>
 196 <DD>
 197 <A NAME="IDX280"></A>
 198 <A NAME="IDX281"></A>
 199 Display this help and exit.
 200
 201 <DT><SAMP>`-V&acute;</SAMP>
 202 <DD>
 203 <DT><SAMP>`--version&acute;</SAMP>
 204 <DD>
 205 <A NAME="IDX282"></A>
 206 <A NAME="IDX283"></A>
 207 Output version information and exit.
 208
 209 </DL>
 210
 211
 212
 213 <H2><A NAME="SEC39" HREF="gettext_toc.html#TOC39">5.2  Filling in the Header Entry</A></H2>
 214 <P>
 215 <A NAME="IDX284"></A>
 216
 217 </P>
 218 <P>
 219 The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
 220 "FIRST AUTHOR &#60;EMAIL@ADDRESS&#62;, YEAR" ought to be replaced by sensible
 221 information.  This can be done in any text editor; if you use Emacs
 222 and it has switched to PO mode automatically (because it has recognized
 223 the file's suffix), you can disable PO mode by typing <KBD>M-x fundamental-mode</KBD>.
 224
 225 </P>
 226 <P>
 227 The header entry itself can be modified using PO mode: in Emacs,
 228 type <KBD>M-x po-mode RET</KBD> and then <KBD>RET</KBD> again to start editing the
 229 entry.  You should fill in the following fields.
 230
 231 </P>
 232 <DL COMPACT>
 233
 234 <DT>Project-Id-Version
 235 <DD>
 236 This is the name and version of the package.
 237
 238 <DT>Report-Msgid-Bugs-To
 239 <DD>
 240 This has already been filled in by <CODE>xgettext</CODE>.  It contains an email
 241 address or URL where you can report bugs in the untranslated strings:
 242
 243
 244 <UL>
 245 <LI>Strings which are not entire sentences, see the maintainer guidelines
 246
 247 in section <A HREF="gettext_3.html#SEC15">3.2  Preparing Translatable Strings</A>.
 248 <LI>Strings which use unclear terms or require additional context to be
 249
 250 understood.
 251 <LI>Strings which make invalid assumptions about notation of date, time or
 252
 253 money.
 254 <LI>Pluralisation problems.
 255
 256 <LI>Incorrect English spelling.
 257
 258 <LI>Incorrect formatting.
 259
 260 </UL>
 261
 262 <DT>POT-Creation-Date
 263 <DD>
 264 This has already been filled in by <CODE>xgettext</CODE>.
 265
 266 <DT>PO-Revision-Date
 267 <DD>
 268 You don't need to fill this in.  It will be filled in by Emacs PO mode
 269 when you save the file.
 270
 271 <DT>Last-Translator
 272 <DD>
 273 Fill in your name and email address (without double quotes).
 274
 275 <DT>Language-Team
 276 <DD>
 277 Fill in the English name of the language, and the email address or
 278 homepage URL of the language team you are part of.
 279
 280 Before starting a translation, it is a good idea to get in touch with
 281 your translation team, not only to make sure you don't do duplicated work,
 282 but also to coordinate difficult linguistic issues.
 283
 284 <A NAME="IDX285"></A>
 285 In the Free Translation Project, each translation team has its own mailing
 286 list.  The up-to-date list of teams can be found at the Free Translation
 287 Project's homepage, <A HREF="http://www.iro.umontreal.ca/contrib/po/HTML/">http://www.iro.umontreal.ca/contrib/po/HTML/</A>,
 288 in the "National teams" area.
 289
 290 <DT>Content-Type
 291 <DD>
 292 <A NAME="IDX286"></A>
 293 <A NAME="IDX287"></A>
 294 Replace <SAMP>`CHARSET&acute;</SAMP> with the character encoding used for your language,
 295 in your locale, or UTF-8.  This field is needed for correct operation of the
 296 <CODE>msgmerge</CODE> and <CODE>msgfmt</CODE> programs, as well as for users whose
 297 locale's character encoding differs from yours (see section <A HREF="gettext_10.html#SEC168">10.2.4  How to specify the output character set <CODE>gettext</CODE> uses</A>).
 298
 299 <A NAME="IDX288"></A>
 300 You get the character encoding of your locale by running the shell command
 301 <SAMP>`locale charmap&acute;</SAMP>.  If the result is <SAMP>`C&acute;</SAMP> or <SAMP>`ANSI_X3.4-1968&acute;</SAMP>,
 302 which is equivalent to <SAMP>`ASCII&acute;</SAMP> (= <SAMP>`US-ASCII&acute;</SAMP>), it means that your
 303 locale is not correctly configured.  In this case, ask your translation
 304 team which charset to use.  <SAMP>`ASCII&acute;</SAMP> is not usable for any language
 305 except Latin.
 306
 307 <A NAME="IDX289"></A>
 308 Because the PO files must be portable to operating systems with less advanced
 309 internationalization facilities, the character encodings that can be used
 310 are limited to those supported by both GNU <CODE>libc</CODE> and GNU
 311 <CODE>libiconv</CODE>.  These are:
 312 <CODE>ASCII</CODE>, <CODE>ISO-8859-1</CODE>, <CODE>ISO-8859-2</CODE>, <CODE>ISO-8859-3</CODE>,
 313 <CODE>ISO-8859-4</CODE>, <CODE>ISO-8859-5</CODE>, <CODE>ISO-8859-6</CODE>, <CODE>ISO-8859-7</CODE>,
 314 <CODE>ISO-8859-8</CODE>, <CODE>ISO-8859-9</CODE>, <CODE>ISO-8859-13</CODE>, <CODE>ISO-8859-14</CODE>,
 315 <CODE>ISO-8859-15</CODE>,
 316 <CODE>KOI8-R</CODE>, <CODE>KOI8-U</CODE>, <CODE>KOI8-T</CODE>,
 317 <CODE>CP850</CODE>, <CODE>CP866</CODE>, <CODE>CP874</CODE>,
 318 <CODE>CP932</CODE>, <CODE>CP949</CODE>, <CODE>CP950</CODE>, <CODE>CP1250</CODE>, <CODE>CP1251</CODE>,
 319 <CODE>CP1252</CODE>, <CODE>CP1253</CODE>, <CODE>CP1254</CODE>, <CODE>CP1255</CODE>, <CODE>CP1256</CODE>,
 320 <CODE>CP1257</CODE>, <CODE>GB2312</CODE>, <CODE>EUC-JP</CODE>, <CODE>EUC-KR</CODE>, <CODE>EUC-TW</CODE>,
 321 <CODE>BIG5</CODE>, <CODE>BIG5-HKSCS</CODE>, <CODE>GBK</CODE>, <CODE>GB18030</CODE>, <CODE>SHIFT_JIS</CODE>,
 322 <CODE>JOHAB</CODE>, <CODE>TIS-620</CODE>, <CODE>VISCII</CODE>, <CODE>GEORGIAN-PS</CODE>, <CODE>UTF-8</CODE>.
 323
 324 <A NAME="IDX290"></A>
 325 In the GNU system, the following encodings are frequently used for the
 326 corresponding languages.
 327
 328 <A NAME="IDX291"></A>
 329
 330 <UL>
 331 <LI><CODE>ISO-8859-1</CODE> for
 332
 333 Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch,
 334 English, Estonian, Faroese, Finnish, French, Galician, German,
 335 Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx,
 336 Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek,
 337 Walloon,
 338 <LI><CODE>ISO-8859-2</CODE> for
 339
 340 Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak,
 341 Slovenian,
 342 <LI><CODE>ISO-8859-3</CODE> for Maltese,
 343
 344 <LI><CODE>ISO-8859-5</CODE> for Macedonian, Serbian,
 345
 346 <LI><CODE>ISO-8859-6</CODE> for Arabic,
 347
 348 <LI><CODE>ISO-8859-7</CODE> for Greek,
 349
 350 <LI><CODE>ISO-8859-8</CODE> for Hebrew,
 351
 352 <LI><CODE>ISO-8859-9</CODE> for Turkish,
 353
 354 <LI><CODE>ISO-8859-13</CODE> for Latvian, Lithuanian, Maori,
 355
 356 <LI><CODE>ISO-8859-14</CODE> for Welsh,
 357
 358 <LI><CODE>ISO-8859-15</CODE> for
 359
 360 Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
 361 Italian, Portuguese, Spanish, Swedish, Walloon,
 362 <LI><CODE>KOI8-R</CODE> for Russian,
 363
 364 <LI><CODE>KOI8-U</CODE> for Ukrainian,
 365
 366 <LI><CODE>KOI8-T</CODE> for Tajik,
 367
 368 <LI><CODE>CP1251</CODE> for Bulgarian, Belarusian,
 369
 370 <LI><CODE>GB2312</CODE>, <CODE>GBK</CODE>, <CODE>GB18030</CODE>
 371
 372 for simplified writing of Chinese,
 373 <LI><CODE>BIG5</CODE>, <CODE>BIG5-HKSCS</CODE>
 374
 375 for traditional writing of Chinese,
 376 <LI><CODE>EUC-JP</CODE> for Japanese,
 377
 378 <LI><CODE>EUC-KR</CODE> for Korean,
 379
 380 <LI><CODE>TIS-620</CODE> for Thai,
 381
 382 <LI><CODE>GEORGIAN-PS</CODE> for Georgian,
 383
 384 <LI><CODE>UTF-8</CODE> for any language, including those listed above.
 385
 386 </UL>
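
A candidate value for the <SAMP>`CHARSET&acute;</SAMP> placeholder can be checked
against the list of portable encodings given earlier in this entry.  The Python
sketch below is only an illustration: the names <CODE>PORTABLE_CHARSETS</CODE>,
<CODE>is_portable_charset</CODE> and <CODE>looks_unconfigured</CODE> are made up
for this example, and the set simply repeats the encodings listed above.

<PRE>
```python
# Encodings listed above as supported by both GNU libc and GNU libiconv.
PORTABLE_CHARSETS = frozenset("""
ASCII ISO-8859-1 ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6
ISO-8859-7 ISO-8859-8 ISO-8859-9 ISO-8859-13 ISO-8859-14 ISO-8859-15
KOI8-R KOI8-U KOI8-T CP850 CP866 CP874 CP932 CP949 CP950
CP1250 CP1251 CP1252 CP1253 CP1254 CP1255 CP1256 CP1257
GB2312 EUC-JP EUC-KR EUC-TW BIG5 BIG5-HKSCS GBK GB18030 SHIFT_JIS
JOHAB TIS-620 VISCII GEORGIAN-PS UTF-8
""".split())


def is_portable_charset(name):
    """Return True if name is one of the portable encodings (any letter case)."""
    return name.upper() in PORTABLE_CHARSETS


def looks_unconfigured(charmap):
    """Return True for `locale charmap` results that hint at a misconfigured locale."""
    return charmap in ("C", "ANSI_X3.4-1968")


print(is_portable_charset("utf-8"))
print(is_portable_charset("ISO-8859-10"))
```
</PRE>

The first call prints <CODE>True</CODE> and the second prints <CODE>False</CODE>,
because ISO-8859-10 is not among the encodings listed above.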
 387
 388 <A NAME="IDX292"></A>
 389 <A NAME="IDX293"></A>
 390 When single quote characters or double quote characters are used in
 391 translations for your language, and your locale's encoding is one of the
 392 ISO-8859-* charsets, it is best if you create your PO files in UTF-8
 393 encoding, instead of your locale's encoding.  This is because in UTF-8
 394 the real quote characters can be represented (single quote characters:
 395 U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of
 396 the ISO-8859-* charsets has them all.  Users in UTF-8 locales will see the
 397 real quote characters, whereas users in ISO-8859-* locales will see the
 398 vertical apostrophe and the vertical double quote instead (because that's
 399 what the character set conversion will transliterate them to).
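
The difference can be seen with a short Python sketch that tries to encode
the four quote characters.  The helper name <CODE>can_encode</CODE> is an
assumption made for this illustration, and only two of the ISO-8859-* charsets
are tried here.

<PRE>
```python
# Left/right single quote and left/right double quote, as named in the text.
QUOTES = "\u2018\u2019\u201c\u201d"


def can_encode(text, charset):
    """Return True if every character of text is representable in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode(QUOTES, "utf-8"))        # True
print(can_encode(QUOTES, "iso-8859-1"))   # False
print(can_encode(QUOTES, "iso-8859-15"))  # False
```
</PRE>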
 400
 401 <A NAME="IDX294"></A>
 402 To enter such quote characters under X11, you can change your keyboard
 403 mapping using the <CODE>xmodmap</CODE> program.  The X11 names of the quote
 404 characters are "leftsinglequotemark", "rightsinglequotemark",
 405 "leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
 406 "doublelowquotemark".
 407
 408 Note that only recent versions of GNU Emacs support the UTF-8 encoding:
 409 Emacs 20 with Mule-UCS, and Emacs 21.  As of January 2001, XEmacs doesn't
 410 support the UTF-8 encoding.
 411
 412 The character encoding name can be written in either upper or lower case.
 413 Usually upper case is preferred.
 414
 415 <DT>Content-Transfer-Encoding
 416 <DD>
 417 Set this to <CODE>8bit</CODE>.
 418
 419 <DT>Plural-Forms
 420 <DD>
 421 This field is optional.  It is only needed if the PO file has plural forms.
 422 You can find them by searching for the <SAMP>`msgid_plural&acute;</SAMP> keyword.  The
 423 format of the plural forms field is described in section <A HREF="gettext_10.html#SEC169">10.2.5  Additional functions for plural forms</A>.
 424 </DL>
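<P>
Whether the Plural-Forms field described above is needed can be checked
mechanically by searching for the <SAMP>`msgid_plural&acute;</SAMP> keyword.  The
following Python sketch is one way to do that; the function name
<CODE>needs_plural_forms</CODE> is hypothetical, and the check only looks at
lines that begin with the keyword, so comments and obsolete entries are ignored.
</P>
<PRE>
```python
import re


def needs_plural_forms(po_text):
    """Return True if any active entry in the PO text uses msgid_plural."""
    # Match the keyword only at the start of a line, so that comment lines
    # and obsolete entries (prefixed with "#~") do not count.
    return re.search(r"^msgid_plural\b", po_text, flags=re.MULTILINE) is not None


sample = 'msgid "%d file"\nmsgid_plural "%d files"\nmsgstr[0] ""\nmsgstr[1] ""\n'
print(needs_plural_forms(sample))
```
</PRE>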
 425
 428 </BODY>
 429 </HTML>