<!-- This HTML file has been created by texi2html 1.52a
from gettext.texi on 11 April 2005 -->
<TITLE>GNU gettext utilities - 5 Creating a New PO File</TITLE>
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_4.html">previous</A>, <A HREF="gettext_6.html">next</A>, <A HREF="gettext_22.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
<H1><A NAME="SEC32" HREF="gettext_toc.html#TOC32">5 Creating a New PO File</A></H1>
When starting a new translation, the translator creates a file called
<TT>`<VAR>LANG</VAR>.po´</TT> as a copy of the <TT>`<VAR>package</VAR>.pot´</TT> template
file, with modifications in the initial comments (at the beginning of the file)
and in the header entry (the first entry, near the beginning of the file).
The easiest way to do so is to use the <SAMP>`msginit´</SAMP> program. For example:
<PRE>
$ cd <VAR>PACKAGE</VAR>-<VAR>VERSION</VAR>
$ cd po
$ msginit
</PRE>
Alternatively, the copy and the modifications can be done by hand.
To do so, the translator copies <TT>`<VAR>package</VAR>.pot´</TT> to
<TT>`<VAR>LANG</VAR>.po´</TT>, then modifies the initial comments and
the header entry of this file.
<H2><A NAME="SEC33" HREF="gettext_toc.html#TOC33">5.1 Invoking the <CODE>msginit</CODE> Program</A></H2>
<PRE>
msginit [<VAR>option</VAR>]
</PRE>
The <CODE>msginit</CODE> program creates a new PO file, initializing the meta
information with values from the user's environment.
<H3><A NAME="SEC34" HREF="gettext_toc.html#TOC34">5.1.1 Input file location</A></H3>
<DT><SAMP>`-i <VAR>inputfile</VAR>´</SAMP>
<DT><SAMP>`--input=<VAR>inputfile</VAR>´</SAMP>
Input POT file.
If no <VAR>inputfile</VAR> is given, the current directory is searched for the
POT file. If it is <SAMP>`-´</SAMP>, standard input is read.
<H3><A NAME="SEC35" HREF="gettext_toc.html#TOC35">5.1.2 Output file location</A></H3>
<DT><SAMP>`-o <VAR>file</VAR>´</SAMP>
<DT><SAMP>`--output-file=<VAR>file</VAR>´</SAMP>
Write output to the specified PO file.
If no output file is given, the output file name depends on the <SAMP>`--locale´</SAMP> option or the
user's locale setting. If it is <SAMP>`-´</SAMP>, the results are written to
standard output.
<H3><A NAME="SEC36" HREF="gettext_toc.html#TOC36">5.1.3 Input file syntax</A></H3>
<DT><SAMP>`-P´</SAMP>
<DT><SAMP>`--properties-input´</SAMP>
<A NAME="IDX268"></A>
<A NAME="IDX269"></A>
Assume the input file is a Java ResourceBundle in Java <CODE>.properties</CODE>
syntax, not in PO file syntax.
<DT><SAMP>`--stringtable-input´</SAMP>
<A NAME="IDX270"></A>
Assume the input file is a NeXTstep/GNUstep localized resource file in
<CODE>.strings</CODE> syntax, not in PO file syntax.
<H3><A NAME="SEC37" HREF="gettext_toc.html#TOC37">5.1.4 Output details</A></H3>
<DT><SAMP>`-l <VAR>ll_CC</VAR>´</SAMP>
<DT><SAMP>`--locale=<VAR>ll_CC</VAR>´</SAMP>
<A NAME="IDX271"></A>
<A NAME="IDX272"></A>
Set the target locale. <VAR>ll</VAR> should be a language code, and <VAR>CC</VAR> should
be a country code. The command <SAMP>`locale -a´</SAMP> can be used to output a list
of all installed locales. The default is the user's locale setting.
<DT><SAMP>`--no-translator´</SAMP>
<A NAME="IDX273"></A>
Declare that the PO file will not have a human translator and is instead
automatically generated.
<DT><SAMP>`-p´</SAMP>
<DT><SAMP>`--properties-output´</SAMP>
<A NAME="IDX274"></A>
<A NAME="IDX275"></A>
Write out a Java ResourceBundle in Java <CODE>.properties</CODE> syntax. Note
that this file format doesn't support plural forms and silently drops
obsolete messages.
<DT><SAMP>`--stringtable-output´</SAMP>
<A NAME="IDX276"></A>
Write out a NeXTstep/GNUstep localized resource file in <CODE>.strings</CODE> syntax.
Note that this file format doesn't support plural forms.
<DT><SAMP>`-w <VAR>number</VAR>´</SAMP>
<DT><SAMP>`--width=<VAR>number</VAR>´</SAMP>
<A NAME="IDX277"></A>
<A NAME="IDX278"></A>
Set the output page width. Long strings in the output files will be
split across multiple lines in order to ensure that each line's width
(= number of screen columns) is less than or equal to the given <VAR>number</VAR>.
<DT><SAMP>`--no-wrap´</SAMP>
<A NAME="IDX279"></A>
Do not break long message lines. Message lines whose width exceeds the
output page width will not be split into several lines. Only file reference
lines which are wider than the output page width will be split.
<H3><A NAME="SEC38" HREF="gettext_toc.html#TOC38">5.1.5 Informative output</A></H3>
<DT><SAMP>`-h´</SAMP>
<DT><SAMP>`--help´</SAMP>
<A NAME="IDX280"></A>
<A NAME="IDX281"></A>
Display this help and exit.
<DT><SAMP>`-V´</SAMP>
<DT><SAMP>`--version´</SAMP>
<A NAME="IDX282"></A>
<A NAME="IDX283"></A>
Output version information and exit.
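The options of section 5.1 can be combined on one command line. The following is a minimal sketch, not part of the gettext distribution, that assembles such a command line in Python using only the long options documented above. The helper name <CODE>msginit_command</CODE> and the file names <TT>hello.pot</TT> and <TT>de.po</TT> are hypothetical.

```python
import shlex


def msginit_command(input_file=None, output_file=None, locale=None,
                    no_translator=False, width=None):
    """Build an argument list for msginit (hypothetical helper).

    Every argument is optional, mirroring the synopsis `msginit [option]`.
    """
    argv = ["msginit"]
    if input_file is not None:
        argv.append("--input=" + input_file)          # 5.1.1
    if output_file is not None:
        argv.append("--output-file=" + output_file)   # 5.1.2
    if locale is not None:
        argv.append("--locale=" + locale)             # 5.1.4, form ll_CC
    if no_translator:
        argv.append("--no-translator")                # 5.1.4
    if width is not None:
        argv.append("--width=" + str(width))          # 5.1.4
    return argv


# Hypothetical file names, for illustration only.
argv = msginit_command(input_file="hello.pot", output_file="de.po",
                       locale="de_DE")
print(shlex.join(argv))
```

On a system where <CODE>msginit</CODE> is installed, the resulting list could be passed to <CODE>subprocess.run(argv, check=True)</CODE>. Building a list instead of a single string avoids shell quoting problems with unusual file names.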
<H2><A NAME="SEC39" HREF="gettext_toc.html#TOC39">5.2 Filling in the Header Entry</A></H2>
<A NAME="IDX284"></A>
The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
"FIRST AUTHOR &lt;EMAIL@ADDRESS&gt;, YEAR" ought to be replaced by sensible
information. This can be done in any text editor; if Emacs is used
and it switched to PO mode automatically (because it has recognized
the file's suffix), you can disable PO mode by typing <KBD>M-x fundamental-mode</KBD>.
Modifying the header entry can already be done using PO mode: in Emacs,
type <KBD>M-x po-mode RET</KBD> and then <KBD>RET</KBD> again to start editing the
entry. You should fill in the following fields.
<DT>Project-Id-Version
This is the name and version of the package.
<DT>Report-Msgid-Bugs-To
This has already been filled in by <CODE>xgettext</CODE>. It contains an email
address or URL where you can report bugs in the untranslated strings:
<LI>Strings which are not entire sentences, see the maintainer guidelines
in section <A HREF="gettext_3.html#SEC15">3.2 Preparing Translatable Strings</A>.
<LI>Strings which use unclear terms or require additional context to be
understood.
<LI>Strings which make invalid assumptions about notation of date, time or
money.
<LI>Pluralisation problems.
<LI>Incorrect English spelling.
<LI>Incorrect formatting.
<DT>POT-Creation-Date
This has already been filled in by <CODE>xgettext</CODE>.
<DT>PO-Revision-Date
You don't need to fill this in. It will be filled by the Emacs PO mode
when you save the file.
<DT>Last-Translator
Fill in your name and email address (without double quotes).
<DT>Language-Team
Fill in the English name of the language, and the email address or
homepage URL of the language team you are part of.
Before starting a translation, it is a good idea to get in touch with
your translation team, not only to make sure you don't duplicate work,
but also to coordinate difficult linguistic issues.
<A NAME="IDX285"></A>
In the Free Translation Project, each translation team has its own mailing
list. The up-to-date list of teams can be found at the Free Translation
Project's homepage, <A HREF="http://www.iro.umontreal.ca/contrib/po/HTML/">http://www.iro.umontreal.ca/contrib/po/HTML/</A>,
in the "National teams" area.
<DT>Content-Type
<A NAME="IDX286"></A>
<A NAME="IDX287"></A>
Replace <SAMP>`CHARSET´</SAMP> with the character encoding used for your language,
in your locale, or UTF-8. This field is needed for correct operation of the
<CODE>msgmerge</CODE> and <CODE>msgfmt</CODE> programs, as well as for users whose
locale's character encoding differs from yours (see section <A HREF="gettext_10.html#SEC168">10.2.4 How to specify the output character set <CODE>gettext</CODE> uses</A>).
<A NAME="IDX288"></A>
You get the character encoding of your locale by running the shell command
<SAMP>`locale charmap´</SAMP>. If the result is <SAMP>`C´</SAMP> or <SAMP>`ANSI_X3.4-1968´</SAMP>,
which is equivalent to <SAMP>`ASCII´</SAMP> (= <SAMP>`US-ASCII´</SAMP>), it means that your
locale is not correctly configured. In this case, ask your translation
team which charset to use. <SAMP>`ASCII´</SAMP> is not usable for any language
except Latin.
<A NAME="IDX289"></A>
Because the PO files must be portable to operating systems with less advanced
internationalization facilities, the character encodings that can be used
are limited to those supported by both GNU <CODE>libc</CODE> and GNU
<CODE>libiconv</CODE>. These are:
<CODE>ASCII</CODE>, <CODE>ISO-8859-1</CODE>, <CODE>ISO-8859-2</CODE>, <CODE>ISO-8859-3</CODE>,
<CODE>ISO-8859-4</CODE>, <CODE>ISO-8859-5</CODE>, <CODE>ISO-8859-6</CODE>, <CODE>ISO-8859-7</CODE>,
<CODE>ISO-8859-8</CODE>, <CODE>ISO-8859-9</CODE>, <CODE>ISO-8859-13</CODE>, <CODE>ISO-8859-14</CODE>,
<CODE>ISO-8859-15</CODE>,
<CODE>KOI8-R</CODE>, <CODE>KOI8-U</CODE>, <CODE>KOI8-T</CODE>,
<CODE>CP850</CODE>, <CODE>CP866</CODE>, <CODE>CP874</CODE>,
<CODE>CP932</CODE>, <CODE>CP949</CODE>, <CODE>CP950</CODE>, <CODE>CP1250</CODE>, <CODE>CP1251</CODE>,
<CODE>CP1252</CODE>, <CODE>CP1253</CODE>, <CODE>CP1254</CODE>, <CODE>CP1255</CODE>, <CODE>CP1256</CODE>,
<CODE>CP1257</CODE>, <CODE>GB2312</CODE>, <CODE>EUC-JP</CODE>, <CODE>EUC-KR</CODE>, <CODE>EUC-TW</CODE>,
<CODE>BIG5</CODE>, <CODE>BIG5-HKSCS</CODE>, <CODE>GBK</CODE>, <CODE>GB18030</CODE>, <CODE>SHIFT_JIS</CODE>,
<CODE>JOHAB</CODE>, <CODE>TIS-620</CODE>, <CODE>VISCII</CODE>, <CODE>GEORGIAN-PS</CODE>, <CODE>UTF-8</CODE>.
<A NAME="IDX290"></A>
In the GNU system, the following encodings are frequently used for the
corresponding languages.
<A NAME="IDX291"></A>
<LI><CODE>ISO-8859-1</CODE> for
Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch,
English, Estonian, Faroese, Finnish, French, Galician, German,
Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx,
Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek,
<LI><CODE>ISO-8859-2</CODE> for
Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak,
<LI><CODE>ISO-8859-3</CODE> for Maltese,
<LI><CODE>ISO-8859-5</CODE> for Macedonian, Serbian,
<LI><CODE>ISO-8859-6</CODE> for Arabic,
<LI><CODE>ISO-8859-7</CODE> for Greek,
<LI><CODE>ISO-8859-8</CODE> for Hebrew,
<LI><CODE>ISO-8859-9</CODE> for Turkish,
<LI><CODE>ISO-8859-13</CODE> for Latvian, Lithuanian, Maori,
<LI><CODE>ISO-8859-14</CODE> for Welsh,
<LI><CODE>ISO-8859-15</CODE> for
Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
Italian, Portuguese, Spanish, Swedish, Walloon,
<LI><CODE>KOI8-R</CODE> for Russian,
<LI><CODE>KOI8-U</CODE> for Ukrainian,
<LI><CODE>KOI8-T</CODE> for Tajik,
<LI><CODE>CP1251</CODE> for Bulgarian, Byelorussian,
<LI><CODE>GB2312</CODE>, <CODE>GBK</CODE>, <CODE>GB18030</CODE>
for simplified writing of Chinese,
<LI><CODE>BIG5</CODE>, <CODE>BIG5-HKSCS</CODE>
for traditional writing of Chinese,
<LI><CODE>EUC-JP</CODE> for Japanese,
<LI><CODE>EUC-KR</CODE> for Korean,
<LI><CODE>TIS-620</CODE> for Thai,
<LI><CODE>GEORGIAN-PS</CODE> for Georgian,
<LI><CODE>UTF-8</CODE> for any language, including those listed above.
<A NAME="IDX292"></A>
<A NAME="IDX293"></A>
When single quote characters or double quote characters are used in
translations for your language, and your locale's encoding is one of the
ISO-8859-* charsets, it is best if you create your PO files in UTF-8
encoding, instead of your locale's encoding. This is because in UTF-8
the real quote characters can be represented (single quote characters:
U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of
the ISO-8859-* charsets has them all. Users in UTF-8 locales will see the
real quote characters, whereas users in ISO-8859-* locales will see the
vertical apostrophe and the vertical double quote instead (because that's
what the character set conversion will transliterate them to).
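The encoding difference described above is easy to observe. The sketch below, in Python, checks whether the four quote characters can be encoded in a given charset, and imitates the transliteration step with a small hand-written table. The table is only a stand-in chosen for this illustration; it is not the mechanism the character set conversion actually uses, and the helper names are hypothetical.

```python
# The four "real" quote characters named above.
QUOTES = "\u2018\u2019\u201c\u201d"


def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Hand-written stand-in for transliteration: curly quotes become the
# vertical apostrophe and the vertical double quote.
FALLBACK = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def transliterate_quotes(text):
    return text.translate(FALLBACK)


message = "\u201cfile\u201d not found"
print(can_encode(message, "utf-8"))       # True
print(can_encode(message, "iso-8859-1"))  # False
print(transliterate_quotes(message))      # "file" not found
```

A PO file stored in UTF-8 therefore keeps the real quote characters, and only the display in an ISO-8859-* locale falls back to the vertical forms.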
<A NAME="IDX294"></A>
To enter such quote characters under X11, you can change your keyboard
mapping using the <CODE>xmodmap</CODE> program. The X11 names of the quote
characters are "leftsinglequotemark", "rightsinglequotemark",
"leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
"doublelowquotemark".
Note that only recent versions of GNU Emacs support the UTF-8 encoding:
Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn't
support the UTF-8 encoding.
The character encoding name can be written in either upper or lower case.
Usually upper case is preferred.
<DT>Content-Transfer-Encoding
Set this to <CODE>8bit</CODE>.
<DT>Plural-Forms
This field is optional. It is only needed if the PO file has plural forms.
You can find them by searching for the <SAMP>`msgid_plural´</SAMP> keyword. The
format of the plural forms field is described in section <A HREF="gettext_10.html#SEC169">10.2.5 Additional functions for plural forms</A>.
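The field-by-field instructions of this section can be summarized in a short Python sketch. It works on the decoded text of a header entry, not on the quoted form stored in the PO file. Everything specific in it is an assumption made for illustration: the sample template text, the helper names <CODE>fill_header</CODE> and <CODE>has_plural_forms</CODE>, and the translator and team addresses.

```python
import re

# Sample header text as it might look before editing (assumed values).
HEADER_TEMPLATE = (
    "Project-Id-Version: PACKAGE VERSION\n"
    "Report-Msgid-Bugs-To: bugs@example.org\n"
    "POT-Creation-Date: 2005-04-11 12:00+0200\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "Content-Type: text/plain; charset=CHARSET\n"
    "Content-Transfer-Encoding: 8bit\n"
)


def fill_header(header, values):
    """Replace the named fields; leave every other field untouched."""
    lines = []
    for line in header.splitlines():
        name, sep, _ = line.partition(": ")
        if sep and name in values:
            line = name + ": " + values[name]
        lines.append(line)
    return "\n".join(lines) + "\n"


def has_plural_forms(po_text):
    """True if any entry uses msgid_plural, so Plural-Forms is needed."""
    return re.search(r"^msgid_plural\b", po_text, re.MULTILINE) is not None


# Hypothetical translator and team, for illustration only.
filled = fill_header(HEADER_TEMPLATE, {
    "Project-Id-Version": "hello 1.0",
    "Last-Translator": "Jane Doe <jane@example.org>",
    "Language-Team": "German <de@example.org>",
    "Content-Type": "text/plain; charset=UTF-8",
})
print(filled)
```

The fields that <CODE>xgettext</CODE> or PO mode fill in (<CODE>Report-Msgid-Bugs-To</CODE>, <CODE>POT-Creation-Date</CODE>, <CODE>PO-Revision-Date</CODE>) are deliberately left alone, matching the advice above.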