gnu/dist/gettext/gettext-tools/doc/gettext_3.html

   1 <HTML>
   2 <HEAD>
   3 <!-- This HTML file has been created by texi2html 1.52a
   4      from gettext.texi on 11 April 2005 -->
   5
   6 <TITLE>GNU gettext utilities - 3  Preparing Program Sources</TITLE>
   7 </HEAD>
   8 <BODY>
   9 Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_22.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
  10 <P><HR><P>
  11
  12
  13 <H1><A NAME="SEC13" HREF="gettext_toc.html#TOC13">3  Preparing Program Sources</A></H1>
  14 <P>
  15 <A NAME="IDX150"></A>
  16
  17 </P>
  18
  19 <P>
  20 For the programmer, changes to the C source code fall into three
  21 categories.  First, you have to make the localization functions
  22 known to all modules needing message translation.  Second, you should
  23 properly trigger the operation of GNU <CODE>gettext</CODE> when the program
  24 initializes, usually from the <CODE>main</CODE> function.  Last, you should
  25 identify and especially mark all constant strings in your program
  26 needing translation.
  27
  28 </P>
  29 <P>
  30 Presuming that your set of programs, or package, has been adjusted
  31 so all needed GNU <CODE>gettext</CODE> files are available, and your
  32 <TT>`Makefile&acute;</TT> files are adjusted (see section <A HREF="gettext_12.html#SEC192">12  The Maintainer's View</A>), each C module
  33 having translated C strings should contain the line:
  34
  35 </P>
  36 <P>
  37 <A NAME="IDX151"></A>
  38
  39 <PRE>
  40 #include &#60;libintl.h&#62;
  41 </PRE>
  42
  43 <P>
  44 Similarly, each C module containing <CODE>printf()</CODE>/<CODE>fprintf()</CODE>/...
  45 calls with a format string that could be a translated C string (even if
  46 the C string comes from a different C module) should contain the line:
  47
  48 </P>
  49
  50 <PRE>
  51 #include &#60;libintl.h&#62;
  52 </PRE>
  53
  54 <P>
  55 The remaining changes to your C sources are discussed in the further
  56 sections of this chapter.
  57
  58 </P>
  59
  60
  61
  62 <H2><A NAME="SEC14" HREF="gettext_toc.html#TOC14">3.1  Triggering <CODE>gettext</CODE> Operations</A></H2>
  63
  64 <P>
  65 <A NAME="IDX152"></A>
  66 The initialization of locale data should be done with more or less
  67 the same code in every program, as demonstrated below:
  68
  69 </P>
  70
  71 <PRE>
  72 int
  73 main (int argc, char *argv[])
  74 {
  75   ...
  76   setlocale (LC_ALL, "");
  77   bindtextdomain (PACKAGE, LOCALEDIR);
  78   textdomain (PACKAGE);
  79   ...
  80 }
  81 </PRE>
  82
  83 <P>
  84 <VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by
  85 <TT>`config.h&acute;</TT> or by the Makefile.  For now consult the <CODE>gettext</CODE>
  86 or <CODE>hello</CODE> sources for more information.
  87
  88 </P>
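<P>
The same initialization can be sketched as a small helper with error
checking added.  The helper name <CODE>init_i18n</CODE> and the fallback
values for <VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> are assumptions made
for illustration only; a real package would take both values from
<TT>`config.h&acute;</TT> or the Makefile.
</P>

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Fallbacks for illustration only; normally supplied by config.h
   or by -D options in the Makefile.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Hypothetical helper: returns 0 on success, -1 if the message
   domain could not be set up.  */
static int
init_i18n (void)
{
  /* Adopt the locale named by the environment.  The result is not
     checked here: if the call fails, the program simply keeps
     running in the "C" locale.  */
  setlocale (LC_ALL, "");

  /* Tell gettext where the message catalogs of PACKAGE live.  */
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return -1;

  /* Make PACKAGE the default domain for plain gettext () calls.  */
  if (textdomain (PACKAGE) == NULL)
    return -1;

  return 0;
}
```

<P>
Calling <CODE>init_i18n ()</CODE> once at the start of <CODE>main</CODE>
keeps the three calls together and gives the program a single place to
report a failed setup.
</P>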
  89 <P>
  90 <A NAME="IDX153"></A>
  91 <A NAME="IDX154"></A>
The use of <CODE>LC_ALL</CODE> might not be appropriate for you.
<CODE>LC_ALL</CODE> includes all locale categories and especially
<CODE>LC_CTYPE</CODE>.  This latter category determines the character
classes reported by <CODE>isalnum</CODE> and the other functions from
<TT>`ctype.h&acute;</TT>, and a locale-dependent answer can be wrong,
especially for programs that process some kind of input language.  For
example, it would mean that source code using the &ccedil; (c-cedilla)
character is runnable in France but not in the U.S.
 100
 101 </P>
 102 <P>
Some systems also have problems with parsing numbers using the
<CODE>scanf</CODE> functions if a locale other than the <CODE>"C"</CODE>
locale is in effect.  The standards say that formats in addition to the
one known in the <CODE>"C"</CODE> locale might be recognized.  But some
systems seem to reject numbers in the <CODE>"C"</CODE> locale format.  In
some situations, the notation itself might also make it impossible to
recognize whether a number is in the <CODE>"C"</CODE> locale format or the
local format.  This can happen if thousands separator characters are used:
some locales define this character, according to the national
conventions, as <CODE>'.'</CODE>, which is the same character used in the
<CODE>"C"</CODE> locale to denote the decimal point.
 114
 115 </P>
 116 <P>
 117 So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the
 118 code above by a sequence of <CODE>setlocale</CODE> lines
 119
 120 </P>
 121
 122 <PRE>
 123 {
 124   ...
 125   setlocale (LC_CTYPE, "");
 126   setlocale (LC_MESSAGES, "");
 127   ...
 128 }
 129 </PRE>
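<P>
The effect of setting only these two categories can be sketched as
follows.  The helper names are hypothetical.  Because
<CODE>LC_NUMERIC</CODE> is never touched, it keeps its startup value, the
<CODE>"C"</CODE> locale, so number parsing still expects <CODE>'.'</CODE>
as the decimal point.
</P>

```c
#include <libintl.h>   /* may supply LC_MESSAGES where <locale.h> lacks it */
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: localize character handling and messages,
   and leave every other category in the "C" locale.  */
static void
set_partial_locale (void)
{
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
}

/* Hypothetical helper: parse a number written in "C" locale
   notation.  This works as long as LC_NUMERIC was left alone.  */
static double
parse_c_number (const char *text)
{
  return strtod (text, NULL);
}
```

<P>
After <CODE>set_partial_locale ()</CODE>, a call to
<CODE>setlocale (LC_NUMERIC, NULL)</CODE> can be used to check which
locale is in effect for numbers.
</P>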
 130
 131 <P>
 132 <A NAME="IDX155"></A>
 133 <A NAME="IDX156"></A>
 134 <A NAME="IDX157"></A>
 135 <A NAME="IDX158"></A>
 136 <A NAME="IDX159"></A>
 137 <A NAME="IDX160"></A>
 138 <A NAME="IDX161"></A>
 139 On all POSIX conformant systems the locale categories <CODE>LC_CTYPE</CODE>,
 140 <CODE>LC_MESSAGES</CODE>, <CODE>LC_COLLATE</CODE>, <CODE>LC_MONETARY</CODE>,
 141 <CODE>LC_NUMERIC</CODE>, and <CODE>LC_TIME</CODE> are available.  On some systems
 142 which are only ISO C compliant, <CODE>LC_MESSAGES</CODE> is missing, but
 143 a substitute for it is defined in GNU gettext's <CODE>&#60;libintl.h&#62;</CODE>.
 144
 145 </P>
 146 <P>
 147 Note that changing the <CODE>LC_CTYPE</CODE> also affects the functions
 148 declared in the <CODE>&#60;ctype.h&#62;</CODE> standard header.  If this is not
 149 desirable in your application (for example in a compiler's parser),
 150 you can use a set of substitute functions which hardwire the C locale,
 151 such as found in the <CODE>&#60;c-ctype.h&#62;</CODE> and <CODE>&#60;c-ctype.c&#62;</CODE> files
 152 in the gettext source distribution.
 153
 154 </P>
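<P>
A minimal sketch of such substitute functions follows.  The names
<CODE>ascii_isalpha</CODE>, <CODE>ascii_isdigit</CODE> and
<CODE>ascii_isalnum</CODE> are hypothetical and are not the ones used in
the gettext sources.  The sketch also assumes an ASCII-compatible
execution character set, in which the letters form contiguous ranges.
</P>

```c
#include <stdbool.h>

/* Classify characters as the "C" locale would, without consulting
   the current LC_CTYPE setting.  */
static bool
ascii_isalpha (int c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool
ascii_isdigit (int c)
{
  return c >= '0' && c <= '9';
}

static bool
ascii_isalnum (int c)
{
  return ascii_isalpha (c) || ascii_isdigit (c);
}
```

<P>
A parser built on these functions rejects a byte such as 0xE7 (the
&ccedil; in ISO-8859-1) no matter which locale the user has selected.
</P>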
 155 <P>
It is also possible to switch the locale back and forth between the
environment dependent locale and the C locale, but this approach is
normally avoided because a <CODE>setlocale</CODE> call is expensive,
because it is tedious to determine the places where a locale switch
is needed in a large program's source, and because switching a locale
is not multithread-safe.
 162
 163 </P>
 164
 165
 166 <H2><A NAME="SEC15" HREF="gettext_toc.html#TOC15">3.2  Preparing Translatable Strings</A></H2>
 167
 168 <P>
 169 <A NAME="IDX162"></A>
Before strings can be marked for translation, they sometimes need to
be adjusted.  Usually preparing a string for translation is done right
before marking it, during the marking phase which is described in the
next sections.  While doing that, keep the following guidelines in mind.
 175
 176 </P>
 177
 178 <UL>
 179 <LI>
 180
 181 Decent English style.
 182
 183 <LI>
 184
 185 Entire sentences.
 186
 187 <LI>
 188
 189 Split at paragraphs.
 190
 191 <LI>
 192
 193 Use format strings instead of string concatenation.
 194 </UL>
 195
 196 <P>
 197 Let's look at some examples of these guidelines.
 198
 199 </P>
 200 <P>
 201 <A NAME="IDX163"></A>
 202 Translatable strings should be in good English style.  If slang language
 203 with abbreviations and shortcuts is used, often translators will not
 204 understand the message and will produce very inappropriate translations.
 205
 206 </P>
 207
 208 <PRE>
 209 "%s: is parameter\n"
 210 </PRE>
 211
 212 <P>
 213 This is nearly untranslatable: Is the displayed item <EM>a</EM> parameter or
 214 <EM>the</EM> parameter?
 215
 216 </P>
 217
 218 <PRE>
 219 "No match"
 220 </PRE>
 221
 222 <P>
The ambiguity in this message makes it incomprehensible: Is the program
attempting to set something on fire? Does it mean "The given object does
not match the template"? Does it mean "The template does not fit any
of the objects"?
 227
 228 </P>
 229 <P>
 230 <A NAME="IDX164"></A>
 231 In both cases, adding more words to the message will help both the
 232 translator and the English speaking user.
 233
 234 </P>
 235 <P>
 236 <A NAME="IDX165"></A>
 237 Translatable strings should be entire sentences.  It is often not possible
 238 to translate single verbs or adjectives in a substitutable way.
 239
 240 </P>
 241
 242 <PRE>
 243 printf ("File %s is %s protected", filename, rw ? "write" : "read");
 244 </PRE>
 245
 246 <P>
 247 Most translators will not look at the source and will thus only see the
 248 string <CODE>"File %s is %s protected"</CODE>, which is unintelligible.  Change
 249 this to
 250
 251 </P>
 252
 253 <PRE>
 254 printf (rw ? "File %s is write protected" : "File %s is read protected",
 255         filename);
 256 </PRE>
 257
 258 <P>
 259 This way the translator will not only understand the message, she will
 260 also be able to find the appropriate grammatical construction.  The French
 261 translator for example translates "write protected" like "protected
 262 against writing".
 263
 264 </P>
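<P>
The same choice can be sketched with both complete sentences passed
through <CODE>gettext</CODE>, so that each one appears as its own message
for the translator.  The helper name <CODE>protection_format</CODE> is
hypothetical.
</P>

```c
#include <libintl.h>

/* Hypothetical helper: return the complete, translatable sentence
   that matches the kind of protection.  */
static const char *
protection_format (int rw)
{
  return rw ? gettext ("File %s is write protected")
            : gettext ("File %s is read protected");
}
```

<P>
The result is then used as a format string, as in
<CODE>printf (protection_format (rw), filename)</CODE>.  When no
translation is available, <CODE>gettext</CODE> returns its argument
unchanged.
</P>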
 265 <P>
Entire sentences are also important because in many languages, the
declension of some word in a sentence depends on the gender or the
number (singular/plural) of another part of the sentence.  There are
usually more interdependencies between words than in English.  The
consequence is that asking a translator to translate two half-sentences
and then combining them through dumb string concatenation
will not work for many languages, even though it would work for English.
That's why translators need to handle entire sentences.
 274
 275 </P>
 276 <P>
 277 Often sentences don't fit into a single line.  If a sentence is output
 278 using two subsequent <CODE>printf</CODE> statements, like this
 279
 280 </P>
 281
 282 <PRE>
 283 printf ("Locale charset \"%s\" is different from\n", lcharset);
 284 printf ("input file charset \"%s\".\n", fcharset);
 285 </PRE>
 286
 287 <P>
 288 the translator would have to translate two half sentences, but nothing
 289 in the POT file would tell her that the two half sentences belong together.
 290 It is necessary to merge the two <CODE>printf</CODE> statements so that the
 291 translator can handle the entire sentence at once and decide at which
 292 place to insert a line break in the translation (if at all):
 293
 294 </P>
 295
 296 <PRE>
 297 printf ("Locale charset \"%s\" is different from\n\
 298 input file charset \"%s\".\n", lcharset, fcharset);
 299 </PRE>
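<P>
As an alternative to the backslash-newline continuation, the sentence can
be written as adjacent string literals, which an ISO C compiler joins into
one string.  The sketch below assumes that the extraction tool joins
adjacent literals in the same way.  The helper name
<CODE>format_charset_warning</CODE> is hypothetical.
</P>

```c
#include <stdio.h>

/* Hypothetical helper: write the whole sentence into BUF with a
   single format string, so that it forms a single message.  */
static int
format_charset_warning (char *buf, size_t size,
                        const char *lcharset, const char *fcharset)
{
  return snprintf (buf, size,
                   "Locale charset \"%s\" is different from\n"
                   "input file charset \"%s\".\n",
                   lcharset, fcharset);
}
```

<P>
This form keeps the source indented normally, while the program still
sees one string containing both lines.
</P>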
 300
 301 <P>
 302 You may now ask: how about two or more adjacent sentences? Like in this case:
 303
 304 </P>
 305
 306 <PRE>
 307 puts ("Apollo 13 scenario: Stack overflow handling failed.");
 308 puts ("On the next stack overflow we will crash!!!");
 309 </PRE>
 310
 311 <P>
Should these two statements be merged into a single one? I would recommend
merging them if the two sentences are related to each other, because then it
is easier for the translator to understand and translate both.  On
the other hand, if one of the two messages is a stereotypical one, occurring
in other places as well, you will do the translator a favour by not
merging the two.  (Identical messages occurring in several places are
combined by xgettext, so the translator has to handle them only once.)
 319
 320 </P>
 321 <P>
 322 <A NAME="IDX166"></A>
 323 Translatable strings should be limited to one paragraph; don't let a
 324 single message be longer than ten lines.  The reason is that when the
 325 translatable string changes, the translator is faced with the task of
 326 updating the entire translated string.  Maybe only a single word will
 327 have changed in the English string, but the translator doesn't see that
 328 (with the current translation tools), therefore she has to proofread
 329 the entire message.
 330
 331 </P>
 332 <P>
 333 <A NAME="IDX167"></A>
 334 Many GNU programs have a <SAMP>`--help&acute;</SAMP> output that extends over several
 335 screen pages.  It is a courtesy towards the translators to split such a
 336 message into several ones of five to ten lines each.  While doing that,
 337 you can also attempt to split the documented options into groups,
 338 such as the input options, the output options, and the informative
 339 output options.  This will help every user to find the option he is
 340 looking for.
 341
 342 </P>
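<P>
A minimal sketch of such a split help text follows.  The option names and
descriptions are invented for illustration, and the helper name
<CODE>print_help</CODE> is hypothetical.  Each option group is a separate
message of a few lines.
</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: print the help text group by group.
   Returns 0 on success, -1 if writing to OUT failed.  */
static int
print_help (FILE *out)
{
  /* Input options form one message.  */
  if (fputs (gettext ("Input options:\n"
                      "  -i, --input=FILE     read from FILE\n"
                      "  -D, --directory=DIR  search DIR for input files\n"),
             out) == EOF)
    return -1;

  /* Output options form a second message.  */
  if (fputs (gettext ("Output options:\n"
                      "  -o, --output=FILE    write to FILE\n"),
             out) == EOF)
    return -1;

  /* Informative output forms a third message.  */
  if (fputs (gettext ("Informative output:\n"
                      "  -h, --help           display this help and exit\n"
                      "  -V, --version        output version information\n"),
             out) == EOF)
    return -1;

  return 0;
}
```

<P>
If one option description changes later, only the message of that group
has to be proofread again by the translator.
</P>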
 343 <P>
 344 <A NAME="IDX168"></A>
 345 <A NAME="IDX169"></A>
 346 Hardcoded string concatenation is sometimes used to construct English
 347 strings:
 348
 349 </P>
 350
 351 <PRE>
 352 strcpy (s, "Replace ");
 353 strcat (s, object1);
 354 strcat (s, " with ");
 355 strcat (s, object2);
 356 strcat (s, "?");
 357 </PRE>
 358
 359 <P>
 360 In order to present to the translator only entire sentences, and also
 361 because in some languages the translator might want to swap the order
 362 of <CODE>object1</CODE> and <CODE>object2</CODE>, it is necessary to change this
 363 to use a format string:
 364
 365 </P>
 366
 367 <PRE>
 368 sprintf (s, "Replace %s with %s?", object1, object2);
 369 </PRE>
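<P>
A bounded variant of the same call can be sketched with
<CODE>snprintf</CODE>, with the sentence passed through
<CODE>gettext</CODE>.  The helper name <CODE>format_replace_prompt</CODE>
is hypothetical.
</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: build the prompt from one complete,
   translatable sentence.  Returns what snprintf returns.  */
static int
format_replace_prompt (char *buf, size_t size,
                       const char *object1, const char *object2)
{
  return snprintf (buf, size, gettext ("Replace %s with %s?"),
                   object1, object2);
}
```

<P>
Unlike <CODE>sprintf</CODE>, <CODE>snprintf</CODE> never writes more than
<CODE>size</CODE> bytes.  This matters because a translation may be longer
than the English sentence.
</P>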
 370
 371 <P>
 372 <A NAME="IDX170"></A>
 373 A similar case is compile time concatenation of strings.  The ISO C 99
 374 include file <CODE>&#60;inttypes.h&#62;</CODE> contains a macro <CODE>PRId64</CODE> that
 375 can be used as a formatting directive for outputting an <SAMP>`int64_t&acute;</SAMP>
 376 integer through <CODE>printf</CODE>.  It expands to a constant string, usually
 377 "d" or "ld" or "lld" or something like this, depending on the platform.
 378 Assume you have code like
 379
 380 </P>
 381
 382 <PRE>
 383 printf ("The amount is %0" PRId64 "\n", number);
 384 </PRE>
 385
 386 <P>
 387 The <CODE>gettext</CODE> tools and library have special support for these
 388 <CODE>&#60;inttypes.h&#62;</CODE> macros.  You can therefore simply write
 389
 390 </P>
 391
 392 <PRE>
 393 printf (gettext ("The amount is %0" PRId64 "\n"), number);
 394 </PRE>
 395
 396 <P>
 397 The PO file will contain the string "The amount is %0&#60;PRId64&#62;\n".
 398 The translators will provide a translation containing "%0&#60;PRId64&#62;"
 399 as well, and at runtime the <CODE>gettext</CODE> function's result will
 400 contain the appropriate constant string, "d" or "ld" or "lld".
 401
 402 </P>
 403 <P>
 404 This works only for the predefined <CODE>&#60;inttypes.h&#62;</CODE> macros.  If
 405 you have defined your own similar macros, let's say <SAMP>`MYPRId64&acute;</SAMP>,
 406 that are not known to <CODE>xgettext</CODE>, the solution for this problem
 407 is to change the code like this:
 408
 409 </P>
 410
 411 <PRE>
 412 char buf1[100];
 413 sprintf (buf1, "%0" MYPRId64, number);
 414 printf (gettext ("The amount is %s\n"), buf1);
 415 </PRE>
 416
 417 <P>
That is, you put the platform dependent code in one statement, and the
internationalization code in a different statement.  Note that a buffer length
of 100 is safe, because all available hardware integer types are limited to
128 bits, and to print a 128 bit integer one needs at most 54 characters,
regardless of whether it is printed in decimal, octal or hexadecimal.
 423
 424 </P>
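<P>
Put together as a compilable sketch, the two statements might look as
follows.  The definition of <SAMP>`MYPRId64&acute;</SAMP> in terms of
<CODE>PRId64</CODE> and the helper name <CODE>format_amount</CODE> are
assumptions made for illustration.
</P>

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical project-specific macro, assumed here to be unknown
   to the extraction tools.  */
#define MYPRId64 PRId64

/* Hypothetical helper: format NUMBER in a platform dependent step,
   then insert the result into the translatable sentence.  */
static int
format_amount (char *out, size_t size, int64_t number)
{
  char buf1[100];

  /* Platform dependent part: never seen by the translator.  */
  snprintf (buf1, sizeof buf1, "%0" MYPRId64, number);

  /* Internationalized part: a plain %s directive.  */
  return snprintf (out, size, gettext ("The amount is %s\n"), buf1);
}
```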
 425 <P>
 426 <A NAME="IDX171"></A>
 427 <A NAME="IDX172"></A>
All this applies to other programming languages as well.  For example, in
Java and C#, string concatenation is very frequently used, because it is a
compiler built-in operator.  Like in C, in Java, you would change
 431
 432 </P>
 433
 434 <PRE>
 435 System.out.println("Replace "+object1+" with "+object2+"?");
 436 </PRE>
 437
 438 <P>
 439 into a statement involving a format string:
 440
 441 </P>
 442
 443 <PRE>
 444 System.out.println(
 445     MessageFormat.format("Replace {0} with {1}?",
 446                          new Object[] { object1, object2 }));
 447 </PRE>
 448
 449 <P>
 450 Similarly, in C#, you would change
 451
 452 </P>
 453
 454 <PRE>
 455 Console.WriteLine("Replace "+object1+" with "+object2+"?");
 456 </PRE>
 457
 458 <P>
 459 into a statement involving a format string:
 460
 461 </P>
 462
 463 <PRE>
 464 Console.WriteLine(
 465     String.Format("Replace {0} with {1}?", object1, object2));
 466 </PRE>
 467
 468
 469
 470 <H2><A NAME="SEC16" HREF="gettext_toc.html#TOC16">3.3  How Marks Appear in Sources</A></H2>
 471 <P>
 472 <A NAME="IDX173"></A>
 473
 474 </P>
 475 <P>
 476 All strings requiring translation should be marked in the C sources.  Marking
 477 is done in such a way that each translatable string appears to be
 478 the sole argument of some function or preprocessor macro.  There are
 479 only a few such possible functions or macros meant for translation,
 480 and their names are said to be marking keywords.  The marking is
 481 attached to strings themselves, rather than to what we do with them.
 482 This approach has more uses.  A blatant example is an error message
 483 produced by formatting.  The format string needs translation, as
 484 well as some strings inserted through some <SAMP>`%s&acute;</SAMP> specification
 485 in the format, while the result from <CODE>sprintf</CODE> may have so many
 486 different instances that it is impractical to list them all in some
 487 <SAMP>`error_string_out()&acute;</SAMP> routine, say.
 488
 489 </P>
 490 <P>
This marking operation has two goals.  The first goal of marking
is to trigger the retrieval of the translation at run time.
The keyword is possibly resolved into a routine able to dynamically
return the proper translation, as far as possible or wanted, for the
argument string.  Most localizable strings are found in executable
positions, that is, attached to variables or given as parameters to
functions.  But this is not universal usage, and some translatable
strings appear in structured initializations.  See section <A HREF="gettext_3.html#SEC19">3.6  Special Cases of Translatable Strings</A>.
 499
 500 </P>
 501 <P>
The second goal of the marking operation is to help <CODE>xgettext</CODE>
in properly extracting all translatable strings when it scans a set
of program sources and produces PO file templates.
 505
 506 </P>
 507 <P>
The canonical keyword for marking translatable strings is
<SAMP>`gettext&acute;</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE>
package.  For packages making only light use of the <SAMP>`gettext&acute;</SAMP>
keyword, macro or function, it is easily used <EM>as is</EM>.  However,
for packages using the <CODE>gettext</CODE> interface more heavily, it
is usually more convenient to give the main keyword a shorter, less
obtrusive name.  Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
their program sources to remind them forcefully, all the time, that they
are internationalized.  Further, a long keyword has the disadvantage
of using more horizontal space, forcing more indentation work on
sources for those trying to keep them within 79 or 80 columns.
 520
 521 </P>
 522 <P>
 523 <A NAME="IDX174"></A>
Many packages use <SAMP>`_&acute;</SAMP> (a simple underline) as a keyword,
and write <SAMP>`_("Translatable string")&acute;</SAMP> instead of <SAMP>`gettext
("Translatable string")&acute;</SAMP>.  Further, the coding rule from the GNU
standards that requires a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
So, the textual overhead per translatable string is reduced to
only three characters: the underline and the two parentheses.
However, even if GNU <CODE>gettext</CODE> uses this convention internally,
it does not offer it officially.  The real, genuine keyword is truly
<SAMP>`gettext&acute;</SAMP> indeed.  It is fairly easy for those wanting to use
<SAMP>`_&acute;</SAMP> instead of <SAMP>`gettext&acute;</SAMP> to declare:
 535
 536 </P>
 537
 538 <PRE>
 539 #include &#60;libintl.h&#62;
 540 #define _(String) gettext (String)
 541 </PRE>
 542
 543 <P>
 544 instead of merely using <SAMP>`#include &#60;libintl.h&#62;&acute;</SAMP>.
 545
 546 </P>
 547 <P>
Later on, the maintenance is relatively easy.  If, as a programmer,
you add or modify a string, you will have to ask yourself if the
new or altered string requires translation, and include it within
<SAMP>`_()&acute;</SAMP> if you think it should be translated.  <SAMP>`"%s: %d"&acute;</SAMP> is
an example of a string <EM>not</EM> requiring translation!
 553
 554 </P>
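<P>
The difference between a marked and an unmarked string can be sketched as
follows.  The helper name <CODE>format_report</CODE> and the message text
are hypothetical.
</P>

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Hypothetical helper: one purely structural format string, left
   unmarked, and one user-visible sentence, marked with _().  */
static int
format_report (char *buf, size_t size, const char *name, int count)
{
  /* "%s: %d\n%s\n" carries no natural-language text of its own,
     so it does not require translation.  The sentence inserted
     through the second %s does, so it is wrapped in _().  */
  return snprintf (buf, size, "%s: %d\n%s\n",
                   name, count, _("Processing complete."));
}
```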
 555
 556
 557 <H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">3.4  Marking Translatable Strings</A></H2>
 558 <P>
 559 <A NAME="IDX175"></A>
 560
 561 </P>
 562 <P>
In PO mode, one set of features is meant more for the programmer than
for the translator, and allows him to interactively mark which strings,
in a set of program sources, are translatable, and which are not.
Even if it is a fairly easy job for a programmer to find and mark
such strings by other means, using any editor of his choice, PO mode
makes this work more comfortable.  Further, this gives translators
who feel a little like programmers, or programmers who feel a little
like translators, a tool letting them work at marking translatable
strings in the program sources, while simultaneously producing a set of
translations in some language, for the package being internationalized.
 573
 574 </P>
 575 <P>
 576 <A NAME="IDX176"></A>
Before using the PO mode commands described here, you should construct
an Emacs tags table for the set of program sources they target.
This is easy to do.  In any
shell window, change the directory to the root of your project, then
execute a command resembling:
 582
 583 </P>
 584
 585 <PRE>
 586 etags src/*.[hc] lib/*.[hc]
 587 </PRE>
 588
 589 <P>
 590 presuming here you want to process all <TT>`.h&acute;</TT> and <TT>`.c&acute;</TT> files
 591 from the <TT>`src/&acute;</TT> and <TT>`lib/&acute;</TT> directories.  This command will
 592 explore all said files and create a <TT>`TAGS&acute;</TT> file in your root
 593 directory, somewhat summarizing the contents using a special file
 594 format Emacs can understand.
 595
 596 </P>
 597 <P>
 598 <A NAME="IDX177"></A>
 599 For packages following the GNU coding standards, there is
 600 a make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in
 601 all directories and for all files containing source code.
 602
 603 </P>
 604 <P>
Once your <TT>`TAGS&acute;</TT> file is ready, the following commands assist
the programmer in marking translatable strings in his set of sources.
But these commands are necessarily driven from within a PO file
window, and it is likely that you do not even have such a PO file yet.
This is not a problem at all, as you may safely open a new, empty PO
file, mainly for using these commands.  This empty PO file will slowly
fill in while you mark strings as translatable in your program sources.
 612
 613 </P>
 614 <DL COMPACT>
 615
 616 <DT><KBD>,</KBD>
 617 <DD>
 618 <A NAME="IDX178"></A>
 619 Search through program sources for a string which looks like a
 620 candidate for translation (<CODE>po-tags-search</CODE>).
 621
 622 <DT><KBD>M-,</KBD>
 623 <DD>
 624 <A NAME="IDX179"></A>
 625 Mark the last string found with <SAMP>`_()&acute;</SAMP> (<CODE>po-mark-translatable</CODE>).
 626
 627 <DT><KBD>M-.</KBD>
 628 <DD>
 629 <A NAME="IDX180"></A>
 630 Mark the last string found with a keyword taken from a set of possible
 631 keywords.  This command with a prefix allows some management of these
 632 keywords (<CODE>po-select-mark-and-mark</CODE>).
 633
 634 </DL>
 635
 636 <P>
 637 <A NAME="IDX181"></A>
 638 The <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next
 639 occurrence of a string which looks like a possible candidate for
 640 translation, and displays the program source in another Emacs window,
 641 positioned in such a way that the string is near the top of this other
 642 window.  If the string is too big to fit whole in this window, it is
 643 positioned so only its end is shown.  In any case, the cursor
 644 is left in the PO file window.  If the shown string would be better
 645 presented differently in different native languages, you may mark it
 646 using <KBD>M-,</KBD> or <KBD>M-.</KBD>.  Otherwise, you might rather ignore it
 647 and skip to the next string by merely repeating the <KBD>,</KBD> command.
 648
 649 </P>
 650 <P>
 651 A string is a good candidate for translation if it contains a sequence
 652 of three or more letters.  A string containing at most two letters in
 653 a row will be considered as a candidate if it has more letters than
 654 non-letters.  The command disregards strings containing no letters,
 655 or isolated letters only.  It also disregards strings within comments,
 656 or strings already marked with some keyword PO mode knows (see below).
 657
 658 </P>
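<P>
The letter-counting rule described above can be approximated as follows.
This is only a sketch of the rule as stated in the text, written in C for
illustration.  It is not the PO mode implementation, it ignores comments
and already marked strings, and the function names are hypothetical.
</P>

```c
#include <stdbool.h>
#include <stddef.h>

/* ASCII letters only, independent of the current locale.  */
static bool
is_ascii_letter (char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Hypothetical approximation of the candidate test:
   - a run of three or more letters makes a candidate;
   - with runs of at most two letters, the string is a candidate
     only if it has more letters than non-letters;
   - no letters, or isolated letters only, is never a candidate.  */
static bool
looks_translatable (const char *s)
{
  size_t letters = 0, others = 0, run = 0, longest = 0;

  for (; *s != '\0'; s++)
    {
      if (is_ascii_letter (*s))
        {
          letters++;
          run++;
          if (run > longest)
            longest = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest >= 3)
    return true;
  if (longest == 2)
    return letters > others;
  return false;
}
```

<P>
Under this approximation a string such as <SAMP>`"%s: %d"&acute;</SAMP>,
which contains isolated letters only, is not offered as a candidate.
</P>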
 659 <P>
 660 If you have never told Emacs about some <TT>`TAGS&acute;</TT> file to use, the
 661 command will request that you specify one from the minibuffer, the
 662 first time you use the command.  You may later change your <TT>`TAGS&acute;</TT>
 663 file by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>,
 664 which will ask you to name the precise <TT>`TAGS&acute;</TT> file you want
 665 to use.  See section `Tag Tables' in <CITE>The Emacs Editor</CITE>.
 666
 667 </P>
 668 <P>
Each time you use the <KBD>,</KBD> command, the search resumes from where it was
left by the previous search, and goes through all program sources,
obeying the <TT>`TAGS&acute;</TT> file, until all sources have been processed.
However, by giving a prefix argument to the command (<KBD>C-u ,</KBD>),
you may request that the search be restarted all over again
from the first program source; but in this case, strings that you
recently marked as translatable will be automatically skipped.
 676
 677 </P>
 678 <P>
Using this <KBD>,</KBD> command does not prevent the use of other regular
Emacs tags commands.  For example, regular <CODE>tags-search</CODE> or
<CODE>tags-query-replace</CODE> commands may be used without disrupting the
independent <KBD>,</KBD> search sequence.  However, as implemented, the
<EM>initial</EM> <KBD>,</KBD> command (or the <KBD>,</KBD> command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.
 686
 687 </P>
 688 <P>
 689 <A NAME="IDX182"></A>
 690 <A NAME="IDX183"></A>
 691 The <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the
 692 recently found string with the <SAMP>`_&acute;</SAMP> keyword.  The <KBD>M-.</KBD>
 693 (<CODE>po-select-mark-and-mark</CODE>) command will request that you type
 694 one keyword from the minibuffer and use that keyword for marking
 695 the string.  Both commands will automatically create a new PO file
 696 untranslated entry for the string being marked, and make it the
 697 current entry (making it easy for you to immediately proceed to its
 698 translation, if you feel like doing it right away).  It is possible
 699 that the modifications made to the program source by <KBD>M-,</KBD> or
 700 <KBD>M-.</KBD> render some source line longer than 80 columns, forcing you
 701 to break and re-indent this line differently.  You may use the <KBD>O</KBD>
 702 command from PO mode, or any other window changing command from
 703 Emacs, to break out into the program source window, and do any
 704 needed adjustments.  You will have to use some regular Emacs command
 705 to return the cursor to the PO file window, if you want command
 706 <KBD>,</KBD> for the next string, say.
 707
 708 </P>
 709 <P>
 710 The <KBD>M-.</KBD> command has a few built-in speedups, so you do not
 711 have to explicitly type all keywords all the time.  The first such
 712 speedup is that you are presented with a <EM>preferred</EM> keyword,
 713 which you may accept by merely typing <KBD><KBD>RET</KBD></KBD> at the prompt.
 714 The second speedup is that you may type any non-ambiguous prefix of the
 715 keyword you really mean, and the command will complete it automatically
 716 for you.  This also means that PO mode has to <EM>know</EM> all
 717 your possible keywords, and that it will not accept mistyped keywords.
 718
 719 </P>
 720 <P>
 721 If you reply <KBD>?</KBD> to the keyword request, the command gives a
 722 list of all known keywords, from which you may choose.  When the
 723 command is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits
 724 updating any program source or PO file buffer, and does some simple
 725 keyword management instead.  In this case, the command asks for a
 726 keyword, written in full, which becomes a new allowed keyword for
 727 later <KBD>M-.</KBD> commands.  Moreover, this new keyword automatically
 728 becomes the <EM>preferred</EM> keyword for later commands.  By typing
 729 an already known keyword in response to <KBD>C-u M-.</KBD>, one merely
 730 changes the <EM>preferred</EM> keyword and does nothing more.
 731
 732 </P>
 733 <P>
All keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command
when scanning for strings, and strings already marked by any of those
known keywords are automatically skipped.  If many PO files are opened
simultaneously, each one has its own independent set of known keywords.
There is no provision in PO mode, currently, for deleting a known
keyword; you have to quit the file (maybe using <KBD>q</KBD>) and reopen
it afresh.  When a PO file is newly brought up in an Emacs window, only
<SAMP>`gettext&acute;</SAMP> and <SAMP>`_&acute;</SAMP> are known as keywords, and <SAMP>`gettext&acute;</SAMP>
is preferred for the <KBD>M-.</KBD> command.  In fact, it is not useful to
prefer <SAMP>`_&acute;</SAMP>, as this one is already built into the <KBD>M-,</KBD> command.
 744
 745 </P>
 746
 747
 748 <H2><A NAME="SEC18" HREF="gettext_toc.html#TOC18">3.5  Special Comments preceding Keywords</A></H2>
 749
 750 <P>
 751 <A NAME="IDX184"></A>
 752 In C programs strings are often used within calls of functions from the
 753 <CODE>printf</CODE> family.  The special thing about these format strings is
 754 that they can contain format specifiers introduced with <KBD>%</KBD>.  Assume
 755 we have the code
 756
 757 </P>
 758
 759 <PRE>
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
 761 </PRE>
 762
 763 <P>
 764 A possible German translation for the above string might be:
 765
 766 </P>
 767
 768 <PRE>
 769 "%d Zeichen lang ist die Zeichenkette `%s'"
 770 </PRE>
 771
 772 <P>
A C programmer, even if he cannot speak German, will recognize that
there is something wrong here.  The order of the two format specifiers
has changed, but of course the order of the arguments in the
<CODE>printf</CODE> call has not.
This will most probably lead to problems because now the length of the
string is regarded as the address.
 778
 779 </P>
 780 <P>
To prevent errors at runtime caused by translations, the <CODE>msgfmt</CODE>
tool can check statically whether the arguments in the original and the
translation string match in type and number.  If this is not the case
and the <SAMP>`-c&acute;</SAMP> option has been passed to <CODE>msgfmt</CODE>, <CODE>msgfmt</CODE>
will give an error and refuse to produce a MO file.  Thus consistent
use of <SAMP>`msgfmt -c&acute;</SAMP> will catch the error, so that it cannot
cause problems at runtime.
 788
 789 </P>
 790 <P>
To keep the word order of the above German translation and still be
correct, one would have to write
 793
 794 </P>
 795
 796 <PRE>
 797 "%2$d Zeichen lang ist die Zeichenkette `%1$s'"
 798 </PRE>
 799
 800 <P>
 801 The routines in <CODE>msgfmt</CODE> know about this special notation.
 802
 803 </P>
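<P>
As a minimal sketch of how the two translations behave at runtime, the
following code formats a string with either argument order.  The helper
name <CODE>describe_string</CODE> is hypothetical and not part of
<CODE>gettext</CODE>.  The sketch also assumes a C library whose
<CODE>printf</CODE> family supports the numbered-argument notation, which
is a POSIX extension rather than part of ISO C.

</P>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats S and its length into BUF using FORMAT.
   FORMAT must consume one string and one int; with the numbered
   notation (%1$s, %2$d) it may refer to them in either order.  */
static int
describe_string (char *buf, size_t size, const char *format, const char *s)
{
  /* The arguments are always passed in the order of the original
     message: first the string, then its length.  */
  return snprintf (buf, size, format, s, (int) strlen (s));
}
```

<P>
Passing the original message uses the arguments in their natural order,
while the translated message with <CODE>%2$d</CODE> and <CODE>%1$s</CODE>
picks the same arguments by position, so the calling code does not need to
change.

</P>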
 804 <P>
Because not all strings in a program are format strings, it is not
useful for <CODE>msgfmt</CODE> to test all the strings in the <TT>`.po&acute;</TT> file.
Doing so might cause problems, because a string might contain what looks
like a format specifier without ever being used in <CODE>printf</CODE>.
 809
 810 </P>
 811 <P>
Therefore <CODE>xgettext</CODE> adds a special tag to those messages it
thinks might be a format string.  There is no absolute rule for this,
 814 only a heuristic.  In the <TT>`.po&acute;</TT> file the entry is marked using the
 815 <CODE>c-format</CODE> flag in the <CODE>#,</CODE> comment line (see section <A HREF="gettext_2.html#SEC9">2.2  The Format of PO Files</A>).
 816
 817 </P>
 818 <P>
 819 <A NAME="IDX185"></A>
 820 <A NAME="IDX186"></A>
The careful reader now might say that this again can cause problems.
The heuristic might guess wrong.  This is true, and therefore
<CODE>xgettext</CODE> knows about a special kind of comment which lets
the programmer take over the decision.  If the <CODE>xgettext</CODE> program
finds a comment containing the words <CODE>xgettext:c-format</CODE> on the
same line as the <CODE>gettext</CODE> keyword or on the immediately
preceding line, it will mark the string in any case with
the <CODE>c-format</CODE> flag.  This kind of comment should be used when
<CODE>xgettext</CODE> does not recognize the string as a format string but
it really is one and it should be tested.  Please note that when the
comment is on the same line as the <CODE>gettext</CODE> keyword, it must
come before the string to be translated.
 833
 834 </P>
 835 <P>
 836 This situation happens quite often.  The <CODE>printf</CODE> function is often
 837 called with strings which do not contain a format specifier.  Of course
 838 one would normally use <CODE>fputs</CODE> but it does happen.  In this case
 839 <CODE>xgettext</CODE> does not recognize this as a format string but what
 840 happens if the translation introduces a valid format specifier?  The
 841 <CODE>printf</CODE> function will try to access one of the parameters but none
 842 exists because the original code does not pass any parameters.
 843
 844 </P>
 845 <P>
<CODE>xgettext</CODE> of course could make a wrong decision the other way
round, i.e. a string marked as a format string actually is not a format
string.  In this case <CODE>msgfmt</CODE> might give too many warnings and
would prevent the <TT>`.po&acute;</TT> file from being compiled.  The method to prevent
this wrong decision is similar to the one used above, only the comment
to use must contain the string <CODE>xgettext:no-c-format</CODE>.
 852
 853 </P>
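<P>
The two kinds of comment might be used as in the following sketch.  The
function names and messages are hypothetical and chosen only for
illustration.  The first message contains no format specifier but is
passed to <CODE>printf</CODE>, so the programmer forces the
<CODE>c-format</CODE> flag.  The second contains a percent sign that could
be mistaken for a format specifier, but it is only ever passed to
<CODE>fputs</CODE>.

</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical function: the message has no format specifier, yet it
   is used as a printf format, so a translation must not add one.  */
static int
print_done (void)
{
  /* xgettext:c-format */
  return printf (gettext ("Operation finished\n"));
}

/* Hypothetical function: "% o" looks like a format specifier, but the
   message is never interpreted as a format string.  */
static int
print_progress (FILE *stream)
{
  /* xgettext:no-c-format */
  return fputs (gettext ("100% of the files were processed\n"), stream);
}
```

<P>
In both functions the comment sits on the line immediately preceding the
<CODE>gettext</CODE> keyword, which is one of the two placements described
above.

</P>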
 854 <P>
 855 If a string is marked with <CODE>c-format</CODE> and this is not correct the
 856 user can find out who is responsible for the decision.  See
 857 section <A HREF="gettext_4.html#SEC23">4.1  Invoking the <CODE>xgettext</CODE> Program</A> to see how the <CODE>--debug</CODE> option can be
 858 used for solving this problem.
 859
 860 </P>
 861
 862
 863 <H2><A NAME="SEC19" HREF="gettext_toc.html#TOC19">3.6  Special Cases of Translatable Strings</A></H2>
 864
 865 <P>
 866 <A NAME="IDX187"></A>
The attentive reader might now point out that it is not always possible
to mark translatable strings with <CODE>gettext</CODE> or something similar.
Consider the following case:
 870
 871 </P>
 872
 873 <PRE>
{
  static const char *messages[] = {
    "some very meaningful message",
    "and another one"
  };
  const char *string;
  ...
  string
    = index &#62; 1 ? "a default message" : messages[index];

  fputs (string, stdout);
  ...
}
 887 </PRE>
 888
 889 <P>
While it is no problem to mark the string <CODE>"a default message"</CODE>, it
is not possible to mark the string initializers for <CODE>messages</CODE>,
because the initializer of a static array must be a constant expression and
cannot contain a function call.
What is to be done?  We have to fulfill two tasks.  First we have to mark the
strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext_4.html#SEC23">4.1  Invoking the <CODE>xgettext</CODE> Program</A>)
can find them, and second we have to translate the strings at runtime
before printing them.
 896
 897 </P>
 898 <P>
 899 The first task can be fulfilled by creating a new keyword, which names a
 900 no-op.  For the second we have to mark all access points to a string
 901 from the array.  So one solution can look like this:
 902
 903 </P>
 904
 905 <PRE>
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  ...
  string
    = index &#62; 1 ? gettext ("a default message") : gettext (messages[index]);

  fputs (string, stdout);
  ...
}
 921 </PRE>
 922
 923 <P>
Please convince yourself that the string which is written by
<CODE>fputs</CODE> is translated in any case.  How to make <CODE>xgettext</CODE> aware of
the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext_4.html#SEC23">4.1  Invoking the <CODE>xgettext</CODE> Program</A>.
 927
 928 </P>
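<P>
The fragment above can be turned into a small compilable sketch, under
some assumptions.  The function name <CODE>message_for</CODE> is
hypothetical, and the sketch checks the index against the array size
instead of comparing it with a fixed constant.

</P>

```c
#include <libintl.h>
#include <stddef.h>

/* Marks a string for extraction by xgettext without translating it.  */
#define gettext_noop(String) String

/* Hypothetical helper: returns the translation of the message selected
   by INDEX, or of a default message when INDEX is out of range.  */
static const char *
message_for (size_t index)
{
  static const char *const messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  size_t count = sizeof messages / sizeof messages[0];

  /* Every access point translates the string it hands out.  */
  return index < count
         ? gettext (messages[index])
         : gettext ("a default message");
}
```

<P>
When no message catalog is installed for the current locale,
<CODE>gettext</CODE> returns its argument unchanged, so the function then
yields the original English strings.

</P>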
 929 <P>
The above is of course not the only solution.  You could also use
the following one:
 932
 933 </P>
 934
 935 <PRE>
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  ...
  string
    = index &#62; 1 ? gettext_noop ("a default message") : messages[index];

  fputs (gettext (string), stdout);
  ...
}
 951 </PRE>
 952
 953 <P>
But this has a drawback.  The programmer has to take care to use
<CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>.
Using <CODE>gettext</CODE> there instead could in rare cases have
unpredictable results.
 957
 958 </P>
 959 <P>
One advantage is that you need not perform control flow analysis to make
sure the output is really translated in any case.  But this analysis is
generally not very difficult.  If it should be difficult in some
situation, you can use this second method there.
 964
 965 </P>
 966
 967
 968 <H2><A NAME="SEC20" HREF="gettext_toc.html#TOC20">3.7  Marking Proper Names for Translation</A></H2>
 969
 970 <P>
 971 Should names of persons, cities, locations etc. be marked for translation
 972 or not?  People who only know languages that can be written with Latin
 973 letters (English, Spanish, French, German, etc.) are tempted to say "no",
 974 because names usually do not change when transported between these languages.
 975 However, in general when translating from one script to another, names
 976 are translated too, usually phonetically or by transliteration.  For
 977 example, Russian or Greek names are converted to the Latin alphabet when
 978 being translated to English, and English or French names are converted
 979 to the Katakana script when being translated to Japanese.  This is
 980 necessary because the speakers of the target language in general cannot
 981 read the script the name is originally written in.
 982
 983 </P>
 984 <P>
 985 As a programmer, you should therefore make sure that names are marked
 986 for translation, with a special comment telling the translators that it
 987 is a proper name and how to pronounce it.  Like this:
 988
 989 </P>
 990
 991 <PRE>
printf (_("Written by %s.\n"),
        /* TRANSLATORS: This is a proper name.  See the gettext
           manual, section Names.  Note this is actually a non-ASCII
           name: The first name is (with Unicode escapes)
           "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
           Pronunciation is like "fraa-swa pee-nar".  */
        _("Francois Pinard"));
 999 </PRE>
1000
1001 <P>
As a translator, you should use some care when translating names, because
it is frustrating if people see their names mutilated or distorted.  If
your language uses the Latin script, all you need to do is to reproduce
the name as perfectly as you can within the usual character set of your
language.  In this particular case, this means providing a translation
containing the c-cedilla character.  If your language uses a different
script and the people speaking it don't usually read Latin words, it means
transliteration; but you should still give, in parentheses, the original
writing of the name -- for the sake of the people who do read the Latin
script.  Here is an example, using Greek as the target script:
1012
1013 </P>
1014
1015 <PRE>
#. This is a proper name.  See the gettext
#. manual, section Names.  Note this is actually a non-ASCII
#. name: The first name is (with Unicode escapes)
#. "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
#. Pronunciation is like "fraa-swa pee-nar".
msgid "Francois Pinard"
msgstr "&phi;&rho;&alpha;&sigma;&omicron;&alpha; &pi;&iota;&nu;&alpha;&rho;"
       " (Francois Pinard)"
1024 </PRE>
1025
1026 <P>
1027 Because translation of names is such a sensitive domain, it is a good
1028 idea to test your translation before submitting it.
1029
1030 </P>
1031 <P>
1032 The translation project <A HREF="http://sourceforge.net/projects/translation">http://sourceforge.net/projects/translation</A>
1033 has set up a POT file and translation domain consisting of program author
1034 names, with better facilities for the translator than those presented here.
Namely, there the original name is written directly in Unicode (rather
than with Unicode escapes or HTML entities), and the pronunciation is
denoted using the International Phonetic Alphabet (see
1038 <A HREF="http://www.wikipedia.org/wiki/International_Phonetic_Alphabet">http://www.wikipedia.org/wiki/International_Phonetic_Alphabet</A>).
1039
1040 </P>
1041 <P>
1042 However, we don't recommend this approach for all POT files in all packages,
1043 because this would force translators to use PO files in UTF-8 encoding,
1044 which is - in the current state of software (as of 2003) - a major hassle
1045 for translators using GNU Emacs or XEmacs with po-mode.
1046
1047 </P>
1048
1049
1050 <H2><A NAME="SEC21" HREF="gettext_toc.html#TOC21">3.8  Preparing Library Sources</A></H2>
1051
1052 <P>
1053 When you are preparing a library, not a program, for the use of
1054 <CODE>gettext</CODE>, only a few details are different.  Here we assume that
1055 the library has a translation domain and a POT file of its own.  (If
1056 it uses the translation domain and POT file of the main program, then
1057 the previous sections apply without changes.)
1058
1059 </P>
1060
1061 <OL>
1062 <LI>
1063
1064 The library code doesn't call <CODE>setlocale (LC_ALL, "")</CODE>.  It's the
1065 responsibility of the main program to set the locale.  The library's
1066 documentation should mention this fact, so that developers of programs
1067 using the library are aware of it.
1068
1069 <LI>
1070
1071 The library code doesn't call <CODE>textdomain (PACKAGE)</CODE>, because it
1072 would interfere with the text domain set by the main program.
1073
1074 <LI>
1075
1076 The initialization code for a program was
1077
1078
1079 <PRE>
1080   setlocale (LC_ALL, "");
1081   bindtextdomain (PACKAGE, LOCALEDIR);
1082   textdomain (PACKAGE);
1083 </PRE>
1084
1085 For a library it is reduced to
1086
1087
1088 <PRE>
1089   bindtextdomain (PACKAGE, LOCALEDIR);
1090 </PRE>
1091
1092 If your library's API doesn't already have an initialization function,
1093 you need to create one, containing at least the <CODE>bindtextdomain</CODE>
1094 invocation.  However, you usually don't need to export and document this
1095 initialization function: It is sufficient that all entry points of the
1096 library call the initialization function if it hasn't been called before.
1097 The typical idiom used to achieve this is a static boolean variable that
1098 indicates whether the initialization function has been called. Like this:
1099
1100
1101 <PRE>
1102 static bool libfoo_initialized;
1103
1104 static void
1105 libfoo_initialize (void)
1106 {
1107   bindtextdomain (PACKAGE, LOCALEDIR);
1108   libfoo_initialized = true;
1109 }
1110
1111 /* This function is part of the exported API.  */
1112 struct foo *
1113 create_foo (...)
1114 {
1115   /* Must ensure the initialization is performed.  */
1116   if (!libfoo_initialized)
1117     libfoo_initialize ();
1118   ...
1119 }
1120
1121 /* This function is part of the exported API.  The argument must be
1122    non-NULL and have been created through create_foo().  */
1123 int
1124 foo_refcount (struct foo *argument)
1125 {
1126   /* No need to invoke the initialization function here, because
1127      create_foo() must already have been called before.  */
1128   ...
1129 }
1130 </PRE>
1131
1132 <LI>
1133
1134 The usual declaration of the <SAMP>`_&acute;</SAMP> macro in each source file was
1135
1136
1137 <PRE>
1138 #include &#60;libintl.h&#62;
1139 #define _(String) gettext (String)
1140 </PRE>
1141
1142 for a program.  For a library, which has its own translation domain,
1143 it reads like this:
1144
1145
1146 <PRE>
1147 #include &#60;libintl.h&#62;
1148 #define _(String) dgettext (PACKAGE, String)
1149 </PRE>
1150
In other words, <CODE>dgettext</CODE> is used instead of <CODE>gettext</CODE>.
Similarly, the <CODE>dngettext</CODE> function should be used in place of the
<CODE>ngettext</CODE> function.
1154 </OL>
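<P>
The points in the list above can be combined into one short sketch of a
library source file.  The values given for <CODE>PACKAGE</CODE> and
<CODE>LOCALEDIR</CODE> and the function <CODE>libfoo_format_count</CODE>
are assumptions made for illustration; a real library would normally get
the two macros from its build system.

</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Assumed values, used only when the build system defines nothing.  */
#ifndef PACKAGE
# define PACKAGE "libfoo"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* The library looks up messages in its own domain.  */
#define _(String) dgettext (PACKAGE, String)

static int libfoo_initialized;

static void
libfoo_initialize (void)
{
  /* No setlocale and no textdomain call: those belong to the program.  */
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = 1;
}

/* Hypothetical exported function: writes a count of items into BUF,
   taking the singular or plural form from the library's domain.  */
int
libfoo_format_count (char *buf, size_t size, unsigned long n)
{
  if (!libfoo_initialized)
    libfoo_initialize ();
  return snprintf (buf, size,
                   dngettext (PACKAGE, "%lu item", "%lu items", n), n);
}
```

<P>
Note that the static flag in this sketch is not protected against
concurrent first calls from several threads.

</P>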
1155
1158 </BODY>
1159 </HTML>