1 <HTML>
2 <HEAD>
3 <!-- This HTML file has been created by texi2html 1.52a
4 from gettext.texi on 11 April 2005 -->
6 <TITLE>GNU gettext utilities - 3 Preparing Program Sources</TITLE>
7 </HEAD>
8 <BODY>
9 Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_22.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
10 <P><HR><P>
13 <H1><A NAME="SEC13" HREF="gettext_toc.html#TOC13">3 Preparing Program Sources</A></H1>
14 <P>
15 <A NAME="IDX150"></A>
17 </P>
19 <P>
20 For the programmer, changes to the C source code fall into three
21 categories. First, you have to make the localization functions
22 known to all modules needing message translation. Second, you should
23 properly trigger the operation of GNU <CODE>gettext</CODE> when the program
24 initializes, usually from the <CODE>main</CODE> function. Last, you should
25 identify and especially mark all constant strings in your program
26 needing translation.
28 </P>
29 <P>
30 Presuming that your set of programs, or package, has been adjusted
31 so all needed GNU <CODE>gettext</CODE> files are available, and your
32 <TT>`Makefile&acute;</TT> files are adjusted (see section <A HREF="gettext_12.html#SEC192">12 The Maintainer's View</A>), each C module
33 having translated C strings should contain the line:
35 </P>
36 <P>
37 <A NAME="IDX151"></A>
39 <PRE>
40 #include &#60;libintl.h&#62;
41 </PRE>
43 <P>
44 Similarly, each C module containing <CODE>printf()</CODE>/<CODE>fprintf()</CODE>/...
45 calls with a format string that could be a translated C string (even if
46 the C string comes from a different C module) should contain the line:
48 </P>
50 <PRE>
51 #include &#60;libintl.h&#62;
52 </PRE>
54 <P>
55 The remaining changes to your C sources are discussed in the further
56 sections of this chapter.
58 </P>
62 <H2><A NAME="SEC14" HREF="gettext_toc.html#TOC14">3.1 Triggering <CODE>gettext</CODE> Operations</A></H2>
64 <P>
65 <A NAME="IDX152"></A>
66 The initialization of locale data should be done with more or less
67 the same code in every program, as demonstrated below:
69 </P>
71 <PRE>
int
main (int argc, char *argv[])
{
  ...
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  ...
}
81 </PRE>
83 <P>
84 <VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by
85 <TT>`config.h&acute;</TT> or by the Makefile. For now consult the <CODE>gettext</CODE>
86 or <CODE>hello</CODE> sources for more information.
88 </P>
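<P>
A minimal sketch of this initialization as a reusable function. The helper
name <CODE>init_i18n</CODE> and the fallback values for <VAR>PACKAGE</VAR> and
<VAR>LOCALEDIR</VAR> are placeholders chosen for illustration; a real package
would take both macros from its build system.
</P>

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Placeholder fallbacks for illustration only; config.h or the
   Makefile normally defines both macros.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Hypothetical helper: select the user's locale and the message
   domain.  Returns 0 on success, -1 if libintl ran out of memory.  */
static int
init_i18n (void)
{
  /* A NULL result means the environment names an unsupported locale;
     the program then simply stays in the "C" locale.  */
  setlocale (LC_ALL, "");

  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return -1;
  if (textdomain (PACKAGE) == NULL)
    return -1;
  return 0;
}
```

<P>
Calling <CODE>init_i18n ()</CODE> once at the start of <CODE>main</CODE> is
enough; later <CODE>gettext</CODE> calls look up messages in the selected domain.
</P>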
89 <P>
90 <A NAME="IDX153"></A>
91 <A NAME="IDX154"></A>
The use of <CODE>LC_ALL</CODE> might not be appropriate for you.
<CODE>LC_ALL</CODE> includes all locale categories and especially
<CODE>LC_CTYPE</CODE>. This latter category determines the character
classes reported by <CODE>isalnum</CODE> and the other functions from
<TT>`ctype.h&acute;</TT>, and a locale dependent answer can be wrong,
especially for programs that process some kind of input language.
For example, a source file containing the &ccedil; (c-cedilla) character
would then be accepted in France but rejected in the U.S.
101 </P>
Some systems also have problems with parsing numbers using the
<CODE>scanf</CODE> functions if a locale other than the <CODE>"C"</CODE> locale is used.
The standards say that formats in addition to the one known in the
<CODE>"C"</CODE> locale may be recognized. But some systems seem to reject
numbers in the <CODE>"C"</CODE> locale format. In some situations, the
notation itself may make it impossible to tell whether a number is
in the <CODE>"C"</CODE> locale format or in the local format. This can happen
if thousands separator characters are used. Some locales define this
character, following national conventions, as <CODE>'.'</CODE>, which is the
same character that the <CODE>"C"</CODE> locale uses for the decimal point.
115 </P>
117 So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the
118 code above by a sequence of <CODE>setlocale</CODE> lines
120 </P>
122 <PRE>
125 setlocale (LC_CTYPE, "");
126 setlocale (LC_MESSAGES, "");
129 </PRE>
132 <A NAME="IDX155"></A>
133 <A NAME="IDX156"></A>
134 <A NAME="IDX157"></A>
135 <A NAME="IDX158"></A>
136 <A NAME="IDX159"></A>
137 <A NAME="IDX160"></A>
138 <A NAME="IDX161"></A>
139 On all POSIX conformant systems the locale categories <CODE>LC_CTYPE</CODE>,
140 <CODE>LC_MESSAGES</CODE>, <CODE>LC_COLLATE</CODE>, <CODE>LC_MONETARY</CODE>,
141 <CODE>LC_NUMERIC</CODE>, and <CODE>LC_TIME</CODE> are available. On some systems
142 which are only ISO C compliant, <CODE>LC_MESSAGES</CODE> is missing, but
143 a substitute for it is defined in GNU gettext's <CODE>&#60;libintl.h&#62;</CODE>.
145 </P>
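<P>
A sketch of the selective setup, assuming a POSIX system (or GNU
<CODE>&#60;libintl.h&#62;</CODE>) that defines <CODE>LC_MESSAGES</CODE>. The
helper name <CODE>init_selected_categories</CODE> is hypothetical.
</P>

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: take only character classification and message
   language from the environment.  Returns 0 if both categories were
   set, -1 otherwise.  */
static int
init_selected_categories (void)
{
  int status = 0;

  if (setlocale (LC_CTYPE, "") == NULL)
    status = -1;
  if (setlocale (LC_MESSAGES, "") == NULL)
    status = -1;
  return status;
}
```

<P>
Because <CODE>LC_NUMERIC</CODE> is never touched, it keeps its startup value
<CODE>"C"</CODE>, so the <CODE>scanf</CODE> and <CODE>printf</CODE> families
continue to use <CODE>'.'</CODE> as the decimal point.
</P>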
147 Note that changing the <CODE>LC_CTYPE</CODE> also affects the functions
148 declared in the <CODE>&#60;ctype.h&#62;</CODE> standard header. If this is not
149 desirable in your application (for example in a compiler's parser),
150 you can use a set of substitute functions which hardwire the C locale,
151 such as found in the <CODE>&#60;c-ctype.h&#62;</CODE> and <CODE>&#60;c-ctype.c&#62;</CODE> files
152 in the gettext source distribution.
154 </P>
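<P>
To illustrate what "hardwiring the C locale" means, here is a small sketch of
such a substitute function. It is not the code from the gettext distribution;
the name <CODE>ascii_isalnum</CODE> is hypothetical, and the letter ranges
assume an ASCII-compatible execution character set.
</P>

```c
#include <stdbool.h>

/* Hypothetical locale-independent replacement for isalnum: the answer
   depends only on the character code, never on LC_CTYPE.  Assumes an
   ASCII-compatible character set, where letters are contiguous.  */
static bool
ascii_isalnum (int c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'a' && c <= 'z')
         || (c >= 'A' && c <= 'Z');
}
```

<P>
A parser built on such a function classifies a byte like 0xE7 (&ccedil; in
ISO-8859-1) the same way regardless of the user's locale.
</P>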
It is also possible to switch the locale back and forth between the
environment dependent locale and the C locale, but this approach is
158 normally avoided because a <CODE>setlocale</CODE> call is expensive,
159 because it is tedious to determine the places where a locale switch
160 is needed in a large program's source, and because switching a locale
161 is not multithread-safe.
163 </P>
166 <H2><A NAME="SEC15" HREF="gettext_toc.html#TOC15">3.2 Preparing Translatable Strings</A></H2>
169 <A NAME="IDX162"></A>
Before strings can be marked for translation, they sometimes need to
171 be adjusted. Usually preparing a string for translation is done right
172 before marking it, during the marking phase which is described in the
173 next sections. What you have to keep in mind while doing that is the
174 following.
176 </P>
178 <UL>
179 <LI>
181 Decent English style.
183 <LI>
185 Entire sentences.
187 <LI>
189 Split at paragraphs.
191 <LI>
193 Use format strings instead of string concatenation.
194 </UL>
197 Let's look at some examples of these guidelines.
199 </P>
201 <A NAME="IDX163"></A>
202 Translatable strings should be in good English style. If slang language
203 with abbreviations and shortcuts is used, often translators will not
204 understand the message and will produce very inappropriate translations.
206 </P>
208 <PRE>
209 "%s: is parameter\n"
210 </PRE>
213 This is nearly untranslatable: Is the displayed item <EM>a</EM> parameter or
214 <EM>the</EM> parameter?
216 </P>
218 <PRE>
219 "No match"
220 </PRE>
The ambiguity in this message makes it incomprehensible: Is the program
224 attempting to set something on fire? Does it mean "The given object does
225 not match the template"? Does it mean "The template does not fit for any
226 of the objects"?
228 </P>
230 <A NAME="IDX164"></A>
231 In both cases, adding more words to the message will help both the
232 translator and the English speaking user.
234 </P>
236 <A NAME="IDX165"></A>
237 Translatable strings should be entire sentences. It is often not possible
238 to translate single verbs or adjectives in a substitutable way.
240 </P>
242 <PRE>
243 printf ("File %s is %s protected", filename, rw ? "write" : "read");
244 </PRE>
247 Most translators will not look at the source and will thus only see the
248 string <CODE>"File %s is %s protected"</CODE>, which is unintelligible. Change
249 this to
251 </P>
253 <PRE>
254 printf (rw ? "File %s is write protected" : "File %s is read protected",
255 filename);
256 </PRE>
259 This way the translator will not only understand the message, she will
260 also be able to find the appropriate grammatical construction. The French
261 translator for example translates "write protected" like "protected
262 against writing".
264 </P>
Entire sentences are also important because in many languages, the
declension of some word in a sentence depends on the gender or the
number (singular/plural) of another part of the sentence. There are
usually more interdependencies between words than in English. The
consequence is that asking a translator to translate two half-sentences
and then combining these two half-sentences through dumb string concatenation
will not work for many languages, even though it would work for English.
That's why translators need to handle entire sentences.
275 </P>
277 Often sentences don't fit into a single line. If a sentence is output
278 using two subsequent <CODE>printf</CODE> statements, like this
280 </P>
282 <PRE>
283 printf ("Locale charset \"%s\" is different from\n", lcharset);
284 printf ("input file charset \"%s\".\n", fcharset);
285 </PRE>
288 the translator would have to translate two half sentences, but nothing
289 in the POT file would tell her that the two half sentences belong together.
290 It is necessary to merge the two <CODE>printf</CODE> statements so that the
291 translator can handle the entire sentence at once and decide at which
292 place to insert a line break in the translation (if at all):
294 </P>
296 <PRE>
297 printf ("Locale charset \"%s\" is different from\n\
298 input file charset \"%s\".\n", lcharset, fcharset);
299 </PRE>
302 You may now ask: how about two or more adjacent sentences? Like in this case:
304 </P>
306 <PRE>
307 puts ("Apollo 13 scenario: Stack overflow handling failed.");
308 puts ("On the next stack overflow we will crash!!!");
309 </PRE>
Should these two statements be merged into a single one? I would recommend
merging them if the two sentences are related to each other, because then it
is easier for the translator to understand and translate both. On
the other hand, if one of the two messages is a stereotypic one, occurring
in other places as well, you will do a favour to the translator by not
merging the two. (Identical messages occurring in several places are
combined by xgettext, so the translator has to handle them once only.)
320 </P>
322 <A NAME="IDX166"></A>
323 Translatable strings should be limited to one paragraph; don't let a
324 single message be longer than ten lines. The reason is that when the
325 translatable string changes, the translator is faced with the task of
326 updating the entire translated string. Maybe only a single word will
327 have changed in the English string, but the translator doesn't see that
328 (with the current translation tools), therefore she has to proofread
329 the entire message.
331 </P>
333 <A NAME="IDX167"></A>
334 Many GNU programs have a <SAMP>`--help&acute;</SAMP> output that extends over several
335 screen pages. It is a courtesy towards the translators to split such a
336 message into several ones of five to ten lines each. While doing that,
337 you can also attempt to split the documented options into groups,
338 such as the input options, the output options, and the informative
339 output options. This will help every user to find the option he is
340 looking for.
342 </P>
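<P>
One way to split a long help text is sketched below. The option names and the
helper <CODE>print_help</CODE> are invented for illustration and do not belong
to any real program; each option group is its own short message.
</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical option groups for an imaginary program; each group is
   a separate, short translatable message.  Returns 0 on success,
   EOF if writing failed.  */
static int
print_help (FILE *stream)
{
  if (fputs (gettext ("Input options:\n"
                      "  -i, --input=FILE    read messages from FILE\n"),
             stream) == EOF)
    return EOF;
  if (fputs (gettext ("Output options:\n"
                      "  -o, --output=FILE   write result to FILE\n"),
             stream) == EOF)
    return EOF;
  if (fputs (gettext ("Informative output:\n"
                      "  -h, --help          display this help and exit\n"),
             stream) == EOF)
    return EOF;
  return 0;
}
```

<P>
When one option later changes, only the message for its group becomes
outdated, so the translator rereads a few lines instead of the whole help text.
</P>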
344 <A NAME="IDX168"></A>
345 <A NAME="IDX169"></A>
346 Hardcoded string concatenation is sometimes used to construct English
347 strings:
349 </P>
351 <PRE>
352 strcpy (s, "Replace ");
353 strcat (s, object1);
354 strcat (s, " with ");
355 strcat (s, object2);
356 strcat (s, "?");
357 </PRE>
360 In order to present to the translator only entire sentences, and also
361 because in some languages the translator might want to swap the order
362 of <CODE>object1</CODE> and <CODE>object2</CODE>, it is necessary to change this
363 to use a format string:
365 </P>
367 <PRE>
368 sprintf (s, "Replace %s with %s?", object1, object2);
369 </PRE>
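<P>
A slightly more defensive sketch of the same idea, assuming the caller
supplies the buffer. The helper name <CODE>format_replace_prompt</CODE> is
hypothetical; <CODE>snprintf</CODE> is used because a translation may be
longer than the English text.
</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: build the prompt in BUF (SIZE bytes).
   Returns 0 on success, -1 if the result did not fit or on error.  */
static int
format_replace_prompt (char *buf, size_t size,
                       const char *object1, const char *object2)
{
  int n = snprintf (buf, size, gettext ("Replace %s with %s?"),
                    object1, object2);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

<P>
The translator sees the whole sentence <CODE>"Replace %s with %s?"</CODE> as
one message, and the code no longer depends on English word order.
</P>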
372 <A NAME="IDX170"></A>
373 A similar case is compile time concatenation of strings. The ISO C 99
374 include file <CODE>&#60;inttypes.h&#62;</CODE> contains a macro <CODE>PRId64</CODE> that
375 can be used as a formatting directive for outputting an <SAMP>`int64_t&acute;</SAMP>
376 integer through <CODE>printf</CODE>. It expands to a constant string, usually
377 "d" or "ld" or "lld" or something like this, depending on the platform.
378 Assume you have code like
380 </P>
382 <PRE>
383 printf ("The amount is %0" PRId64 "\n", number);
384 </PRE>
387 The <CODE>gettext</CODE> tools and library have special support for these
388 <CODE>&#60;inttypes.h&#62;</CODE> macros. You can therefore simply write
390 </P>
392 <PRE>
393 printf (gettext ("The amount is %0" PRId64 "\n"), number);
394 </PRE>
397 The PO file will contain the string "The amount is %0&#60;PRId64&#62;\n".
398 The translators will provide a translation containing "%0&#60;PRId64&#62;"
399 as well, and at runtime the <CODE>gettext</CODE> function's result will
400 contain the appropriate constant string, "d" or "ld" or "lld".
402 </P>
404 This works only for the predefined <CODE>&#60;inttypes.h&#62;</CODE> macros. If
405 you have defined your own similar macros, let's say <SAMP>`MYPRId64&acute;</SAMP>,
406 that are not known to <CODE>xgettext</CODE>, the solution for this problem
407 is to change the code like this:
409 </P>
411 <PRE>
412 char buf1[100];
413 sprintf (buf1, "%0" MYPRId64, number);
414 printf (gettext ("The amount is %s\n"), buf1);
415 </PRE>
This means that you put the platform dependent code in one statement, and the
internationalization code in a different statement. Note that a buffer length
of 100 is safe, because all available hardware integer types are limited to
128 bits, and to print a 128 bit integer one needs at most 54 characters,
regardless of whether it is printed in decimal, octal or hexadecimal.
424 </P>
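<P>
The two-statement approach can be sketched as a complete function. Here
<CODE>MYPRId64</CODE> is simply defined as <CODE>PRId64</CODE> to stand in for
a project-specific macro, and the helper name <CODE>format_amount</CODE> is
hypothetical.
</P>

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a project-specific macro that xgettext does not know.  */
#define MYPRId64 PRId64

/* Hypothetical helper: write the message into OUT (SIZE bytes).
   Returns 0 on success, -1 if the result did not fit or on error.  */
static int
format_amount (char *out, size_t size, int64_t number)
{
  char buf1[100];
  int n;

  /* Platform dependent part: not translatable.  */
  snprintf (buf1, sizeof buf1, "%0" MYPRId64, number);
  /* Internationalized part: translators only see "%s".  */
  n = snprintf (out, size, gettext ("The amount is %s\n"), buf1);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

<P>
The message extracted for translators is <CODE>"The amount is %s\n"</CODE>,
which is identical on every platform.
</P>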
426 <A NAME="IDX171"></A>
427 <A NAME="IDX172"></A>
All this applies to other programming languages as well. For example, in
Java and C#, string concatenation is very frequently used, because it is a
compiler built-in operator. Like in C, in Java, you would change
432 </P>
434 <PRE>
435 System.out.println("Replace "+object1+" with "+object2+"?");
436 </PRE>
439 into a statement involving a format string:
441 </P>
443 <PRE>
444 System.out.println(
445 MessageFormat.format("Replace {0} with {1}?",
446 new Object[] { object1, object2 }));
447 </PRE>
450 Similarly, in C#, you would change
452 </P>
454 <PRE>
455 Console.WriteLine("Replace "+object1+" with "+object2+"?");
456 </PRE>
459 into a statement involving a format string:
461 </P>
463 <PRE>
464 Console.WriteLine(
465 String.Format("Replace {0} with {1}?", object1, object2));
466 </PRE>
470 <H2><A NAME="SEC16" HREF="gettext_toc.html#TOC16">3.3 How Marks Appear in Sources</A></H2>
472 <A NAME="IDX173"></A>
474 </P>
476 All strings requiring translation should be marked in the C sources. Marking
477 is done in such a way that each translatable string appears to be
478 the sole argument of some function or preprocessor macro. There are
479 only a few such possible functions or macros meant for translation,
480 and their names are said to be marking keywords. The marking is
481 attached to strings themselves, rather than to what we do with them.
482 This approach has more uses. A blatant example is an error message
483 produced by formatting. The format string needs translation, as
484 well as some strings inserted through some <SAMP>`%s&acute;</SAMP> specification
485 in the format, while the result from <CODE>sprintf</CODE> may have so many
486 different instances that it is impractical to list them all in some
487 <SAMP>`error_string_out()&acute;</SAMP> routine, say.
489 </P>
This marking operation has two goals. The first goal of marking
is to trigger the retrieval of the translation at run time.
The keywords are possibly resolved into a routine able to dynamically
return the proper translation, as far as possible or wanted, for the
argument string. Most localizable strings are found in executable
496 positions, that is, attached to variables or given as parameters to
497 functions. But this is not universal usage, and some translatable
498 strings appear in structured initializations. See section <A HREF="gettext_3.html#SEC19">3.6 Special Cases of Translatable Strings</A>.
500 </P>
502 The second goal of the marking operation is to help <CODE>xgettext</CODE>
503 at properly extracting all translatable strings when it scans a set
504 of program sources and produces PO file templates.
506 </P>
The canonical keyword for marking translatable strings is
<SAMP>`gettext&acute;</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE>
510 package. For packages making only light use of the <SAMP>`gettext&acute;</SAMP>
511 keyword, macro or function, it is easily used <EM>as is</EM>. However,
512 for packages using the <CODE>gettext</CODE> interface more heavily, it
513 is usually more convenient to give the main keyword a shorter, less
514 obtrusive name. Indeed, the keyword might appear on a lot of strings
515 all over the package, and programmers usually do not want nor need
516 their program sources to remind them forcefully, all the time, that they
517 are internationalized. Further, a long keyword has the disadvantage
518 of using more horizontal space, forcing more indentation work on
519 sources for those trying to keep them within 79 or 80 columns.
521 </P>
523 <A NAME="IDX174"></A>
524 Many packages use <SAMP>`_&acute;</SAMP> (a simple underline) as a keyword,
525 and write <SAMP>`_("Translatable string")&acute;</SAMP> instead of <SAMP>`gettext
("Translatable string")&acute;</SAMP>. Further, the rule from the GNU coding
standards requiring a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
529 So, the textual overhead per translatable string is reduced to
530 only three characters: the underline and the two parentheses.
531 However, even if GNU <CODE>gettext</CODE> uses this convention internally,
532 it does not offer it officially. The real, genuine keyword is truly
533 <SAMP>`gettext&acute;</SAMP> indeed. It is fairly easy for those wanting to use
534 <SAMP>`_&acute;</SAMP> instead of <SAMP>`gettext&acute;</SAMP> to declare:
536 </P>
538 <PRE>
539 #include &#60;libintl.h&#62;
540 #define _(String) gettext (String)
541 </PRE>
544 instead of merely using <SAMP>`#include &#60;libintl.h&#62;&acute;</SAMP>.
546 </P>
548 Later on, the maintenance is relatively easy. If, as a programmer,
549 you add or modify a string, you will have to ask yourself if the
new or altered string requires translation, and include it within
<SAMP>`_()&acute;</SAMP> if you think it should be translated. <SAMP>`"%s: %d"&acute;</SAMP> is
an example of a string <EM>not</EM> requiring translation!
554 </P>
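<P>
A short sketch of the convention in use. The helper names
<CODE>status_message</CODE> and <CODE>format_counter</CODE> and the message
texts are invented for illustration.
</P>

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Hypothetical helper returning a user-visible message: both
   alternatives are complete sentences, so both are marked.  */
static const char *
status_message (int ok)
{
  return ok ? _("Operation succeeded") : _("Operation failed");
}

/* Hypothetical helper: the layout string "%s: %d\n" has no words to
   translate, so it stays unmarked.  */
static int
format_counter (char *buf, size_t size, const char *name, int value)
{
  return snprintf (buf, size, "%s: %d\n", name, value);
}
```

<P>
Only the two sentences in <CODE>status_message</CODE> end up in the PO file
template; the unmarked layout string is ignored by <CODE>xgettext</CODE>.
</P>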
557 <H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">3.4 Marking Translatable Strings</A></H2>
559 <A NAME="IDX175"></A>
561 </P>
563 In PO mode, one set of features is meant more for the programmer than
564 for the translator, and allows him to interactively mark which strings,
565 in a set of program sources, are translatable, and which are not.
566 Even if it is a fairly easy job for a programmer to find and mark
567 such strings by other means, using any editor of his choice, PO mode
568 makes this work more comfortable. Further, this gives translators
569 who feel a little like programmers, or programmers who feel a little
570 like translators, a tool letting them work at marking translatable
strings in the program sources, while simultaneously producing a set of
translations in some language, for the package being internationalized.
574 </P>
576 <A NAME="IDX176"></A>
The set of program sources targeted by the PO mode commands described
here should have an Emacs tags table constructed for your project,
579 prior to using these PO file commands. This is easy to do. In any
580 shell window, change the directory to the root of your project, then
581 execute a command resembling:
583 </P>
585 <PRE>
586 etags src/*.[hc] lib/*.[hc]
587 </PRE>
590 presuming here you want to process all <TT>`.h&acute;</TT> and <TT>`.c&acute;</TT> files
591 from the <TT>`src/&acute;</TT> and <TT>`lib/&acute;</TT> directories. This command will
592 explore all said files and create a <TT>`TAGS&acute;</TT> file in your root
593 directory, somewhat summarizing the contents using a special file
594 format Emacs can understand.
596 </P>
598 <A NAME="IDX177"></A>
599 For packages following the GNU coding standards, there is
600 a make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in
601 all directories and for all files containing source code.
603 </P>
Once your <TT>`TAGS&acute;</TT> file is ready, the following commands assist
the programmer in marking translatable strings in his set of sources.
607 But these commands are necessarily driven from within a PO file
608 window, and it is likely that you do not even have such a PO file yet.
609 This is not a problem at all, as you may safely open a new, empty PO
610 file, mainly for using these commands. This empty PO file will slowly
611 fill in while you mark strings as translatable in your program sources.
613 </P>
614 <DL COMPACT>
616 <DT><KBD>,</KBD>
617 <DD>
618 <A NAME="IDX178"></A>
619 Search through program sources for a string which looks like a
620 candidate for translation (<CODE>po-tags-search</CODE>).
622 <DT><KBD>M-,</KBD>
623 <DD>
624 <A NAME="IDX179"></A>
625 Mark the last string found with <SAMP>`_()&acute;</SAMP> (<CODE>po-mark-translatable</CODE>).
627 <DT><KBD>M-.</KBD>
628 <DD>
629 <A NAME="IDX180"></A>
630 Mark the last string found with a keyword taken from a set of possible
631 keywords. This command with a prefix allows some management of these
632 keywords (<CODE>po-select-mark-and-mark</CODE>).
634 </DL>
637 <A NAME="IDX181"></A>
638 The <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next
639 occurrence of a string which looks like a possible candidate for
640 translation, and displays the program source in another Emacs window,
641 positioned in such a way that the string is near the top of this other
642 window. If the string is too big to fit whole in this window, it is
643 positioned so only its end is shown. In any case, the cursor
644 is left in the PO file window. If the shown string would be better
645 presented differently in different native languages, you may mark it
646 using <KBD>M-,</KBD> or <KBD>M-.</KBD>. Otherwise, you might rather ignore it
647 and skip to the next string by merely repeating the <KBD>,</KBD> command.
649 </P>
651 A string is a good candidate for translation if it contains a sequence
652 of three or more letters. A string containing at most two letters in
653 a row will be considered as a candidate if it has more letters than
654 non-letters. The command disregards strings containing no letters,
655 or isolated letters only. It also disregards strings within comments,
656 or strings already marked with some keyword PO mode knows (see below).
658 </P>
660 If you have never told Emacs about some <TT>`TAGS&acute;</TT> file to use, the
661 command will request that you specify one from the minibuffer, the
662 first time you use the command. You may later change your <TT>`TAGS&acute;</TT>
663 file by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>,
664 which will ask you to name the precise <TT>`TAGS&acute;</TT> file you want
665 to use. See section `Tag Tables' in <CITE>The Emacs Editor</CITE>.
667 </P>
669 Each time you use the <KBD>,</KBD> command, the search resumes from where it was
670 left by the previous search, and goes through all program sources,
671 obeying the <TT>`TAGS&acute;</TT> file, until all sources have been processed.
However, by giving a prefix argument to the command (<KBD>C-u ,</KBD>),
you may request that the search be restarted all over again
674 from the first program source; but in this case, strings that you
675 recently marked as translatable will be automatically skipped.
677 </P>
Using this <KBD>,</KBD> command does not prevent the use of other regular
Emacs tags commands. For example, regular <CODE>tags-search</CODE> or
<CODE>tags-query-replace</CODE> commands may be used without disrupting the
independent <KBD>,</KBD> search sequence. However, as implemented, the
<EM>initial</EM> <KBD>,</KBD> command (or a <KBD>,</KBD> command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.
687 </P>
689 <A NAME="IDX182"></A>
690 <A NAME="IDX183"></A>
691 The <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the
692 recently found string with the <SAMP>`_&acute;</SAMP> keyword. The <KBD>M-.</KBD>
693 (<CODE>po-select-mark-and-mark</CODE>) command will request that you type
694 one keyword from the minibuffer and use that keyword for marking
695 the string. Both commands will automatically create a new PO file
696 untranslated entry for the string being marked, and make it the
697 current entry (making it easy for you to immediately proceed to its
698 translation, if you feel like doing it right away). It is possible
699 that the modifications made to the program source by <KBD>M-,</KBD> or
700 <KBD>M-.</KBD> render some source line longer than 80 columns, forcing you
701 to break and re-indent this line differently. You may use the <KBD>O</KBD>
702 command from PO mode, or any other window changing command from
703 Emacs, to break out into the program source window, and do any
704 needed adjustments. You will have to use some regular Emacs command
705 to return the cursor to the PO file window, if you want command
706 <KBD>,</KBD> for the next string, say.
708 </P>
710 The <KBD>M-.</KBD> command has a few built-in speedups, so you do not
711 have to explicitly type all keywords all the time. The first such
712 speedup is that you are presented with a <EM>preferred</EM> keyword,
713 which you may accept by merely typing <KBD><KBD>RET</KBD></KBD> at the prompt.
714 The second speedup is that you may type any non-ambiguous prefix of the
715 keyword you really mean, and the command will complete it automatically
716 for you. This also means that PO mode has to <EM>know</EM> all
717 your possible keywords, and that it will not accept mistyped keywords.
719 </P>
721 If you reply <KBD>?</KBD> to the keyword request, the command gives a
722 list of all known keywords, from which you may choose. When the
723 command is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits
724 updating any program source or PO file buffer, and does some simple
725 keyword management instead. In this case, the command asks for a
726 keyword, written in full, which becomes a new allowed keyword for
727 later <KBD>M-.</KBD> commands. Moreover, this new keyword automatically
728 becomes the <EM>preferred</EM> keyword for later commands. By typing
729 an already known keyword in response to <KBD>C-u M-.</KBD>, one merely
730 changes the <EM>preferred</EM> keyword and does nothing more.
732 </P>
734 All keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command
735 when scanning for strings, and strings already marked by any of those
736 known keywords are automatically skipped. If many PO files are opened
737 simultaneously, each one has its own independent set of known keywords.
There is no provision in PO mode, currently, for deleting a known
keyword; you have to quit the file (maybe using <KBD>q</KBD>) and reopen
it afresh. When a PO file is newly brought up in an Emacs window, only
<SAMP>`gettext&acute;</SAMP> and <SAMP>`_&acute;</SAMP> are known as keywords, and <SAMP>`gettext&acute;</SAMP>
is preferred for the <KBD>M-.</KBD> command. In fact, it is not useful to
prefer <SAMP>`_&acute;</SAMP>, as this one is already built into the <KBD>M-,</KBD> command.
745 </P>
748 <H2><A NAME="SEC18" HREF="gettext_toc.html#TOC18">3.5 Special Comments preceding Keywords</A></H2>
751 <A NAME="IDX184"></A>
752 In C programs strings are often used within calls of functions from the
753 <CODE>printf</CODE> family. The special thing about these format strings is
754 that they can contain format specifiers introduced with <KBD>%</KBD>. Assume
755 we have the code
757 </P>
759 <PRE>
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
761 </PRE>
764 A possible German translation for the above string might be:
766 </P>
768 <PRE>
769 "%d Zeichen lang ist die Zeichenkette `%s'"
770 </PRE>
A C programmer, even if he cannot speak German, will recognize that
there is something wrong here. The order of the two format specifiers
has changed, but of course the order of the arguments in the <CODE>printf</CODE> call has not.
This will most probably lead to problems because now the length of the
string is regarded as the address.
779 </P>
To prevent errors at runtime caused by translations, the <CODE>msgfmt</CODE>
tool can check statically whether the arguments in the original and the
translation string match in type and number. If this is not the case
and the <SAMP>`-c&acute;</SAMP> option has been passed to <CODE>msgfmt</CODE>, <CODE>msgfmt</CODE>
will give an error and refuse to produce a MO file. Thus consistent
use of <SAMP>`msgfmt -c&acute;</SAMP> will catch the error, so that it cannot
cause problems at runtime.
789 </P>
For the word order in the above German translation to be correct, one
would have to write
794 </P>
796 <PRE>
797 "%2$d Zeichen lang ist die Zeichenkette `%1$s'"
798 </PRE>
801 The routines in <CODE>msgfmt</CODE> know about this special notation.
803 </P>
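<P>
The effect of the numbered notation can be tried out directly. In the sketch
below the German string is hardcoded as a stand-in for what
<CODE>gettext</CODE> would return from a catalog, and the helper name
<CODE>describe_string</CODE> is hypothetical. Numbered arguments are a POSIX
extension to the <CODE>printf</CODE> family, not part of ISO C, so this
assumes a C library that supports them.
</P>

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for what gettext would return from a German catalog.  */
static const char german_format[] =
  "%2$d Zeichen lang ist die Zeichenkette `%1$s'";

/* Hypothetical helper: format the description of S into BUF.  */
static int
describe_string (char *buf, size_t size, const char *s)
{
  /* Arguments stay in source order: string first, length second.
     The numbers in the format pick the right one for each directive.  */
  return snprintf (buf, size, german_format, s, (int) strlen (s));
}
```

<P>
The calling code is unchanged from the English version; only the format
string decides in which order the two arguments appear in the output.
</P>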
Because not all strings in a program are format strings, it is not
useful for <CODE>msgfmt</CODE> to test all the strings in the <TT>`.po&acute;</TT> file.
Doing so might cause problems because a string might contain what looks
like a format specifier even though it is never used in <CODE>printf</CODE>.
810 </P>
Therefore <CODE>xgettext</CODE> adds a special tag to those messages it
813 thinks might be a format string. There is no absolute rule for this,
814 only a heuristic. In the <TT>`.po&acute;</TT> file the entry is marked using the
815 <CODE>c-format</CODE> flag in the <CODE>#,</CODE> comment line (see section <A HREF="gettext_2.html#SEC9">2.2 The Format of PO Files</A>).
817 </P>
819 <A NAME="IDX185"></A>
820 <A NAME="IDX186"></A>
The careful reader now might say that this again can cause problems.
The heuristic might guess wrong. This is true, and therefore
<CODE>xgettext</CODE> knows about a special kind of comment which lets
the programmer take over the decision. If, on the same line as the
<CODE>gettext</CODE> keyword or on the line immediately preceding it,
the <CODE>xgettext</CODE> program finds a comment containing the words
<CODE>xgettext:c-format</CODE>, it will mark the string in any case with
the <CODE>c-format</CODE> flag. This kind of comment should be used when
<CODE>xgettext</CODE> does not recognize the string as a format string but
it really is one and it should be tested. Please note that when the
comment is on the same line as the <CODE>gettext</CODE> keyword, it must be
before the string to be translated.
834 </P>
<P>
This situation happens quite often. The <CODE>printf</CODE> function is often
called with strings which do not contain a format specifier. Of course
one would normally use <CODE>fputs</CODE>, but it does happen. In this case
<CODE>xgettext</CODE> does not recognize the string as a format string, but what
happens if the translation introduces a valid format specifier? The
<CODE>printf</CODE> function will try to access one of its arguments, but none
exists because the original code does not pass any.

</P>
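<P>
As a minimal sketch of such a comment, the following C fragment passes a
string without any format specifier to <CODE>snprintf</CODE>. The helper name
<CODE>report_done</CODE> and the message text are made up for illustration.
The comment on the line before the <CODE>gettext</CODE> call asks
<CODE>xgettext</CODE> to set the <CODE>c-format</CODE> flag anyway, so that a
translation which introduces a specifier can be caught when the catalog is checked.
</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: formats a status message into buf and returns
   the number of characters snprintf would have written.  */
int
report_done (char *buf, size_t size)
{
  /* The string contains no format specifier, so the heuristic may not
     flag it.  The comment on the line directly before the gettext call
     forces the c-format flag.  */
  /* xgettext:c-format */
  return snprintf (buf, size, gettext ("Operation finished.\n"));
}
```

<P>
Without a message catalog, <CODE>gettext</CODE> returns its argument unchanged,
so the buffer then simply holds the original English text.
</P>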
<P>
Of course <CODE>xgettext</CODE> could also make the wrong decision the other way
round, i.e. mark a string as a format string which actually is not
one. In this case <CODE>msgfmt</CODE> might give spurious warnings and
could prevent the <TT>`.po&acute;</TT> file from being compiled. The method to prevent
this wrong decision is similar to the one used above, only the comment
to use must contain the string <CODE>xgettext:no-c-format</CODE>.

</P>
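<P>
The opposite case can be sketched in the same way. In the following
fragment the text <CODE>"% c"</CODE> has the shape of a <CODE>printf</CODE>
directive (a space flag followed by a <CODE>c</CODE> conversion), so the
heuristic might flag the string, although it is never used as a format string.
The helper name <CODE>capacity_notice</CODE> and the message are assumptions
made for this example only.
</P>

```c
#include <libintl.h>

/* Hypothetical helper: returns a notice that callers print verbatim.  */
const char *
capacity_notice (void)
{
  /* The string is only ever written with fputs, never used as a
     printf format, so the format check is switched off for it.  */
  /* xgettext:no-c-format */
  return gettext ("Disk usage is at 100% capacity.\n");
}
```

<P>
A caller would print the result with <CODE>fputs (capacity_notice (), stdout)</CODE>,
not with <CODE>printf</CODE>.
</P>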
<P>
If a string is marked with <CODE>c-format</CODE> and this is not correct, the
user can find out who is responsible for the decision. See
section <A HREF="gettext_4.html#SEC23">4.1 Invoking the <CODE>xgettext</CODE> Program</A> to see how the <CODE>--debug</CODE> option can be
used for solving this problem.

</P>
<H2><A NAME="SEC19" HREF="gettext_toc.html#TOC19">3.6 Special Cases of Translatable Strings</A></H2>

<P>
<A NAME="IDX187"></A>
The attentive reader might now point out that it is not always possible
to mark translatable strings with <CODE>gettext</CODE> or a similar function.
Consider the following case:

</P>
<PRE>
{
  static const char *messages[] = {
    "some very meaningful message",
    "and another one"
  };
  const char *string;
  ...
  string
    = index &#62; 1 ? "a default message" : messages[index];

  fputs (string, stdout);
  ...
}
</PRE>
<P>
While it is no problem to mark the string <CODE>"a default message"</CODE>, it
is not possible to mark the string initializers for <CODE>messages</CODE> with a
<CODE>gettext</CODE> call, because the initializers of a static array must be
constant expressions. What is to be done? We have to fulfill two tasks. First, we have to mark the
strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext_4.html#SEC23">4.1 Invoking the <CODE>xgettext</CODE> Program</A>)
can find them, and second, we have to translate the strings at runtime
before printing them.

</P>
<P>
The first task can be fulfilled by creating a new keyword, which names a
no-op. For the second we have to mark all access points to a string
from the array. So one solution can look like this:

</P>
<PRE>
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  ...
  string
    = index &#62; 1 ? gettext ("a default message") : gettext (messages[index]);

  fputs (string, stdout);
  ...
}
</PRE>
<P>
Please convince yourself that the string which is written by
<CODE>fputs</CODE> is translated in any case. How to make <CODE>xgettext</CODE> aware of
the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext_4.html#SEC23">4.1 Invoking the <CODE>xgettext</CODE> Program</A>.

</P>
<P>
The above is of course not the only solution. You could also use
the following one:

</P>
<PRE>
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  ...
  string
    = index &#62; 1 ? gettext_noop ("a default message") : messages[index];

  fputs (gettext (string), stdout);
  ...
}
</PRE>
<P>
But this has a drawback. The programmer has to take care to use
<CODE>gettext_noop</CODE>, not <CODE>gettext</CODE>, for the string <CODE>"a default message"</CODE>.
With <CODE>gettext</CODE>, the already translated string would be passed to
<CODE>gettext</CODE> a second time, which could in rare cases have unpredictable results.

</P>
<P>
One advantage is that you need no control flow analysis to make
sure the output is really translated in every case. But this analysis is
generally not very difficult. Where it does become difficult, you can
use this second method.

</P>
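<P>
The first method can also be written as a complete, compilable unit. The
following is only a sketch: the accessor name <CODE>lookup_message</CODE> is
hypothetical, and the range test uses the array size instead of the literal
<CODE>index &#62; 1</CODE> of the fragments above.
</P>

```c
#include <libintl.h>
#include <stddef.h>

/* Marks a string for extraction by xgettext without translating it.  */
#define gettext_noop(String) String

static const char *const messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Hypothetical accessor: returns the translation of messages[index],
   or of a default message when index is out of range.  */
const char *
lookup_message (size_t index)
{
  size_t count = sizeof messages / sizeof messages[0];

  /* Every path passes through gettext exactly once.  */
  return index < count ? gettext (messages[index])
                       : gettext ("a default message");
}
```

<P>
Because the translation happens at the single access point, callers can
hand the result straight to <CODE>fputs</CODE> without calling
<CODE>gettext</CODE> again.
</P>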
<H2><A NAME="SEC20" HREF="gettext_toc.html#TOC20">3.7 Marking Proper Names for Translation</A></H2>

<P>
Should names of persons, cities, locations etc. be marked for translation
or not? People who only know languages that can be written with Latin
letters (English, Spanish, French, German, etc.) are tempted to say "no",
because names usually do not change when transported between these languages.
However, in general when translating from one script to another, names
are translated too, usually phonetically or by transliteration. For
example, Russian or Greek names are converted to the Latin alphabet when
being translated to English, and English or French names are converted
to the Katakana script when being translated to Japanese. This is
necessary because the speakers of the target language in general cannot
read the script the name is originally written in.

</P>
<P>
As a programmer, you should therefore make sure that names are marked
for translation, with a special comment telling the translators that it
is a proper name and how to pronounce it. Like this:

</P>
<PRE>
printf (_("Written by %s.\n"),
        /* TRANSLATORS: This is a proper name.  See the gettext
           manual, section Names.  Note this is actually a non-ASCII
           name: The first name is (with Unicode escapes)
           "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
           Pronunciation is like "fraa-swa pee-nar".  */
        _("Francois Pinard"));
</PRE>
<P>
As a translator, you should use some care when translating names, because
it is frustrating if people see their names mutilated or distorted. If
your language uses the Latin script, all you need to do is to reproduce
the name as perfectly as you can within the usual character set of your
language. In this particular case, this means providing a translation
containing the c-cedilla character. If your language uses a different
script and the people speaking it don't usually read Latin words, it means
transliteration; but you should still give, in parentheses, the original
writing of the name, for the sake of the people that do read the Latin
script. Here is an example, using Greek as the target script:

</P>
<PRE>
#. This is a proper name.  See the gettext
#. manual, section Names.  Note this is actually a non-ASCII
#. name: The first name is (with Unicode escapes)
#. "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
#. Pronunciation is like "fraa-swa pee-nar".
msgid "Francois Pinard"
msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho"
       " (Francois Pinard)"
</PRE>
<P>
Because translation of names is such a sensitive domain, it is a good
idea to test your translation before submitting it.

</P>
<P>
The translation project <A HREF="http://sourceforge.net/projects/translation">http://sourceforge.net/projects/translation</A>
has set up a POT file and translation domain consisting of program author
names, with better facilities for the translator than those presented here.
Namely, there the original name is written directly in Unicode (rather
than with Unicode escapes or HTML entities), and the pronunciation is
denoted using the International Phonetic Alphabet (see
<A HREF="http://www.wikipedia.org/wiki/International_Phonetic_Alphabet">http://www.wikipedia.org/wiki/International_Phonetic_Alphabet</A>).

</P>
<P>
However, we don't recommend this approach for all POT files in all packages,
because this would force translators to use PO files in UTF-8 encoding,
which, in the current state of software (as of 2003), is a major hassle
for translators using GNU Emacs or XEmacs with po-mode.

</P>
<H2><A NAME="SEC21" HREF="gettext_toc.html#TOC21">3.8 Preparing Library Sources</A></H2>

<P>
When you are preparing a library, not a program, for the use of
<CODE>gettext</CODE>, only a few details are different. Here we assume that
the library has a translation domain and a POT file of its own. (If
it uses the translation domain and POT file of the main program, then
the previous sections apply without changes.)

</P>
<OL>
<LI>

The library code doesn't call <CODE>setlocale (LC_ALL, "")</CODE>. It's the
responsibility of the main program to set the locale. The library's
documentation should mention this fact, so that developers of programs
using the library are aware of it.

<LI>

The library code doesn't call <CODE>textdomain (PACKAGE)</CODE>, because it
would interfere with the text domain set by the main program.

<LI>

The initialization code for a program was

<PRE>
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
</PRE>

For a library it is reduced to

<PRE>
  bindtextdomain (PACKAGE, LOCALEDIR);
</PRE>

If your library's API doesn't already have an initialization function,
you need to create one, containing at least the <CODE>bindtextdomain</CODE>
invocation. However, you usually don't need to export and document this
initialization function: It is sufficient that all entry points of the
library call the initialization function if it hasn't been called before.
The typical idiom used to achieve this is a static boolean variable that
indicates whether the initialization function has been called. Like this:

<PRE>
static bool libfoo_initialized;

static void
libfoo_initialize (void)
{
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* This function is part of the exported API.  */
struct foo *
create_foo (...)
{
  /* Must ensure the initialization is performed.  */
  if (!libfoo_initialized)
    libfoo_initialize ();
  ...
}

/* This function is part of the exported API.  The argument must be
   non-NULL and have been created through create_foo().  */
int
foo_refcount (struct foo *argument)
{
  /* No need to invoke the initialization function here, because
     create_foo() must already have been called before.  */
  ...
}
</PRE>

<LI>

The usual declaration of the <SAMP>`_&acute;</SAMP> macro in each source file was

<PRE>
#include &#60;libintl.h&#62;
#define _(String) gettext (String)
</PRE>

for a program. For a library, which has its own translation domain,
it reads like this:

<PRE>
#include &#60;libintl.h&#62;
#define _(String) dgettext (PACKAGE, String)
</PRE>

In other words, <CODE>dgettext</CODE> is used instead of <CODE>gettext</CODE>.
Similarly, the <CODE>dngettext</CODE> function should be used in place of the
<CODE>ngettext</CODE> function.
</OL>
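<P>
Put together, the four points above might look as follows in one library
source file. This is a sketch under stated assumptions: the values of
<CODE>PACKAGE</CODE> and <CODE>LOCALEDIR</CODE> are placeholders that a real
build would normally supply, and the entry points <CODE>foo_describe</CODE>
and <CODE>foo_item_label</CODE> are hypothetical.
</P>

```c
#include <libintl.h>
#include <stdbool.h>

/* Assumed values for illustration only; a real build would normally
   define these through the build system.  */
#define PACKAGE "libfoo"
#define LOCALEDIR "/usr/share/locale"

/* Look up messages in the library's own domain, not the program's.  */
#define _(String) dgettext (PACKAGE, String)

static bool libfoo_initialized;

/* No setlocale and no textdomain call here: both belong to the
   main program.  */
static void
libfoo_initialize (void)
{
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* Hypothetical exported entry point: describes a status code.  */
const char *
foo_describe (int status)
{
  if (!libfoo_initialized)
    libfoo_initialize ();

  return status == 0 ? _("success") : _("failure");
}

/* Hypothetical exported entry point: returns a format string for a
   count, using the domain-specific plural function.  */
const char *
foo_item_label (unsigned long count)
{
  if (!libfoo_initialized)
    libfoo_initialize ();

  return dngettext (PACKAGE, "%lu item", "%lu items", count);
}
```

<P>
Note that the plain boolean flag is not thread-safe as written. A library
whose entry points may be called from several threads would need to guard
the initialization accordingly.
</P>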
</BODY>
</HTML>