manual/string.texi

   1 @node String and Array Utilities, Extended Characters, Character Handling, Top
   2 @chapter String and Array Utilities
   3
   4 Operations on strings (or arrays of characters) are an important part of
   5 many programs.  The GNU C library provides an extensive set of string
   6 utility functions, including functions for copying, concatenating,
   7 comparing, and searching strings.  Many of these functions can also
   8 operate on arbitrary regions of storage; for example, the @code{memcpy}
   9 function can be used to copy the contents of any kind of array.
  10
  11 It's fairly common for beginning C programmers to ``reinvent the wheel''
  12 by duplicating this functionality in their own code, but it pays to
  13 become familiar with the library functions and to make use of them,
  14 since this offers benefits in maintenance, efficiency, and portability.
  15
  16 For instance, you could easily compare one string to another in two
  17 lines of C code, but if you use the built-in @code{strcmp} function,
  18 you're less likely to make a mistake.  And, since these library
  19 functions are typically highly optimized, your program may run faster
  20 too.
  21
  22 @menu
  23 * Representation of Strings::   Introduction to basic concepts.
  24 * String/Array Conventions::    Whether to use a string function or an
  25                                  arbitrary array function.
  26 * String Length::               Determining the length of a string.
  27 * Copying and Concatenation::   Functions to copy the contents of strings
  28                                  and arrays.
  29 * String/Array Comparison::     Functions for byte-wise and character-wise
  30                                  comparison.
  31 * Collation Functions::         Functions for collating strings.
  32 * Search Functions::            Searching for a specific element or substring.
  33 * Finding Tokens in a String::  Splitting a string into tokens by looking
  34                                  for delimiters.
  35 * Encode Binary Data::          Encoding and Decoding of Binary Data.
  36 * Argz and Envz Vectors::       Null-separated string vectors.
  37 @end menu
  38
  39 @node Representation of Strings
  40 @section Representation of Strings
  41 @cindex string, representation of
  42
  43 This section is a quick summary of string concepts for beginning C
  44 programmers.  It describes how character strings are represented in C
  45 and some common pitfalls.  If you are already familiar with this
  46 material, you can skip this section.
  47
  48 @cindex string
  49 @cindex null character
  50 A @dfn{string} is an array of @code{char} objects.  But string-valued
  51 variables are usually declared to be pointers of type @code{char *}.
  52 Such variables do not include space for the text of a string; that has
  53 to be stored somewhere else---in an array variable, a string constant,
  54 or dynamically allocated memory (@pxref{Memory Allocation}).  It's up to
  55 you to store the address of the chosen memory space into the pointer
  56 variable.  Alternatively you can store a @dfn{null pointer} in the
  57 pointer variable.  The null pointer does not point anywhere, so
  58 attempting to reference the string it points to gets an error.
  59
  60 By convention, a @dfn{null character}, @code{'\0'}, marks the end of a
  61 string.  For example, in testing to see whether the @code{char *}
  62 variable @var{p} points to a null character marking the end of a string,
  63 you can write @code{!*@var{p}} or @code{*@var{p} == '\0'}.
  64
  65 A null character is quite different conceptually from a null pointer,
  66 although both are represented by the integer @code{0}.
  67
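The distinction can be sketched with a small loop.  The helper name
@code{count_chars} is hypothetical and only illustrates the two tests;
in real code you would call @code{strlen} instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the characters that precede the
   terminating null character.  */
size_t
count_chars (const char *p)
{
  size_t n = 0;

  assert (p != NULL);      /* A null pointer points to no string at all.  */
  while (*p != '\0')       /* A null character marks the end of one.  */
    {
      p++;
      n++;
    }
  return n;
}
```

An empty string is a valid string whose first character is the null
character, so @code{count_chars ("")} is zero, whereas passing a null
pointer is an error.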
  68 @cindex string literal
  69 @dfn{String literals} appear in C program source as strings of
  70 characters between double-quote characters (@samp{"}).  In @w{ISO C},
  71 string literals can also be formed by @dfn{string concatenation}:
  72 @code{"a" "b"} is the same as @code{"ab"}.  Modification of string
  73 literals is not allowed by the GNU C compiler, because literals
  74 are placed in read-only storage.
  75
  76 Character arrays that are declared @code{const} cannot be modified
  77 either.  It's generally good style to declare non-modifiable string
  78 pointers to be of type @code{const char *}, since this often allows the
  79 C compiler to detect accidental modifications as well as providing some
  80 amount of documentation about what your program intends to do with the
  81 string.
  82
  83 The amount of memory allocated for the character array may extend past
  84 the null character that normally marks the end of the string.  In this
  85 document, the term @dfn{allocation size} is always used to refer to the
  86 total amount of memory allocated for the string, while the term
  87 @dfn{length} refers to the number of characters up to (but not
  88 including) the terminating null character.
  89 @cindex length of string
  90 @cindex allocation size of string
  91 @cindex size of string
  92 @cindex string length
  93 @cindex string allocation
  94
  95 A notorious source of program bugs is trying to put more characters in a
  96 string than fit in its allocated size.  When writing code that extends
  97 strings or moves characters into a pre-allocated array, you should be
  98 very careful to keep track of the length of the text and make explicit
  99 checks for overflowing the array.  Many of the library functions
 100 @emph{do not} do this for you!  Remember also that you need to allocate
 101 an extra byte to hold the null character that marks the end of the
 102 string.
 103
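As a minimal sketch of the last point, the hypothetical helper
@code{copy_string} below allocates one byte more than the length of the
string so that the terminating null character fits.  The name is an
illustration only and is not part of the library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated copy of S, or a
   null pointer if no memory is available.  */
char *
copy_string (const char *s)
{
  size_t length;
  char *copy;

  assert (s != NULL);
  length = strlen (s);              /* Excludes the null character.  */
  copy = malloc (length + 1);       /* One extra byte for the terminator.  */
  if (copy != NULL)
    memcpy (copy, s, length + 1);   /* Copies the terminator as well.  */
  return copy;
}
```

Here the length of the copy is @code{strlen (s)} while its allocation
size is @code{strlen (s) + 1}.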
 104 @node String/Array Conventions
 105 @section String and Array Conventions
 106
 107 This chapter describes both functions that work on arbitrary arrays or
 108 blocks of memory, and functions that are specific to null-terminated
 109 arrays of characters.
 110
 111 Functions that operate on arbitrary blocks of memory have names
 112 beginning with @samp{mem} (such as @code{memcpy}) and invariably take an
 113 argument which specifies the size (in bytes) of the block of memory to
 114 operate on.  The array arguments and return values for these functions
 115 have type @code{void *}, and as a matter of style, the elements of these
 116 arrays are referred to as ``bytes''.  You can pass any kind of pointer
 117 to these functions, and the @code{sizeof} operator is useful in
 118 computing the value for the size argument.
 119
 120 In contrast, functions that operate specifically on strings have names
 121 beginning with @samp{str} (such as @code{strcpy}) and look for a null
 122 character to terminate the string instead of requiring an explicit size
 123 argument to be passed.  (Some of these functions accept a specified
 124 maximum length, but they also check for premature termination with a
 125 null character.)  The array arguments and return values for these
 126 functions have type @code{char *}, and the array elements are referred
 127 to as ``characters''.
 128
 129 In many cases, there are both @samp{mem} and @samp{str} versions of a
 130 function.  The one that is more appropriate to use depends on the exact
 131 situation.  When your program is manipulating arbitrary arrays or blocks of
 132 storage, then you should always use the @samp{mem} functions.  On the
 133 other hand, when you are manipulating null-terminated strings it is
 134 usually more convenient to use the @samp{str} functions, unless you
 135 already know the length of the string in advance.
 136
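The two conventions might be contrasted as follows.  Both helper names
are hypothetical; the sketch only shows where the size argument comes
from in each case.

```c
#include <assert.h>
#include <string.h>

/* Copy COUNT ints.  The size passed to memcpy is in bytes, so it is
   computed with sizeof.  */
void
copy_ints (int *to, const int *from, size_t count)
{
  assert (to != NULL && from != NULL);
  memcpy (to, from, count * sizeof (int));
}

/* Copy a string of unknown length.  strcpy stops after the
   terminating null character; TO must be large enough to hold it.  */
void
copy_name (char *to, const char *from)
{
  assert (to != NULL && from != NULL);
  strcpy (to, from);
}
```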
 137 @node String Length
 138 @section String Length
 139
 140 You can get the length of a string using the @code{strlen} function.
 141 This function is declared in the header file @file{string.h}.
 142 @pindex string.h
 143
 144 @comment string.h
 145 @comment ISO
 146 @deftypefun size_t strlen (const char *@var{s})
 147 The @code{strlen} function returns the length of the null-terminated
 148 string @var{s}.  (In other words, it returns the offset of the terminating
 149 null character within the array.)
 150
 151 For example,
 152 @smallexample
 153 strlen ("hello, world")
 154     @result{} 12
 155 @end smallexample
 156
 157 When applied to a character array, the @code{strlen} function returns
 158 the length of the string stored there, not its allocation size.  You can
 159 get the allocation size of the character array that holds a string using
 160 the @code{sizeof} operator:
 161
 162 @smallexample
 163 char string[32] = "hello, world";
 164 sizeof (string)
 165     @result{} 32
 166 strlen (string)
 167     @result{} 12
 168 @end smallexample
 169 @end deftypefun
 170
 171 @node Copying and Concatenation
 172 @section Copying and Concatenation
 173
 174 You can use the functions described in this section to copy the contents
 175 of strings and arrays, or to append the contents of one string to
 176 another.  These functions are declared in the header file
 177 @file{string.h}.
 178 @pindex string.h
 179 @cindex copying strings and arrays
 180 @cindex string copy functions
 181 @cindex array copy functions
 182 @cindex concatenating strings
 183 @cindex string concatenation functions
 184
 185 A helpful way to remember the ordering of the arguments to the functions
 186 in this section is that it corresponds to an assignment expression, with
  187 the destination array specified to the left of the source array.  Most
  188 of these functions return the address of the destination array.
 189
 190 Most of these functions do not work properly if the source and
 191 destination arrays overlap.  For example, if the beginning of the
 192 destination array overlaps the end of the source array, the original
 193 contents of that part of the source array may get overwritten before it
 194 is copied.  Even worse, in the case of the string functions, the null
 195 character marking the end of the string may be lost, and the copy
 196 function might get stuck in a loop trashing all the memory allocated to
 197 your program.
 198
 199 All functions that have problems copying between overlapping arrays are
 200 explicitly identified in this manual.  In addition to functions in this
 201 section, there are a few others like @code{sprintf} (@pxref{Formatted
 202 Output Functions}) and @code{scanf} (@pxref{Formatted Input
 203 Functions}).
 204
 205 @comment string.h
 206 @comment ISO
 207 @deftypefun {void *} memcpy (void *@var{to}, const void *@var{from}, size_t @var{size})
 208 The @code{memcpy} function copies @var{size} bytes from the object
 209 beginning at @var{from} into the object beginning at @var{to}.  The
 210 behavior of this function is undefined if the two arrays @var{to} and
 211 @var{from} overlap; use @code{memmove} instead if overlapping is possible.
 212
 213 The value returned by @code{memcpy} is the value of @var{to}.
 214
 215 Here is an example of how you might use @code{memcpy} to copy the
 216 contents of an array:
 217
 218 @smallexample
 219 struct foo *oldarray, *newarray;
 220 int arraysize;
 221 @dots{}
  222 memcpy (newarray, oldarray, arraysize * sizeof (struct foo));
 223 @end smallexample
 224 @end deftypefun
 225
 226 @comment string.h
 227 @comment ISO
 228 @deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size})
 229 @code{memmove} copies the @var{size} bytes at @var{from} into the
 230 @var{size} bytes at @var{to}, even if those two blocks of space
 231 overlap.  In the case of overlap, @code{memmove} is careful to copy the
 232 original values of the bytes in the block at @var{from}, including those
 233 bytes which also belong to the block at @var{to}.
 234 @end deftypefun
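A typical case of overlap is closing a gap inside a single array.  The
sketch below uses a hypothetical helper, @code{remove_element}, to show
why @code{memmove} rather than @code{memcpy} is needed there.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove the element at INDEX from ARRAY, which
   holds COUNT ints, by sliding the tail down one place.  The source
   and destination blocks overlap, so memmove is required.  */
void
remove_element (int *array, size_t count, size_t index)
{
  assert (array != NULL && index < count);
  memmove (array + index, array + index + 1,
           (count - index - 1) * sizeof (int));
}
```

After the call, the first @code{count - 1} elements are valid; the last
slot still holds its old value.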
 235
 236 @comment string.h
 237 @comment SVID
 238 @deftypefun {void *} memccpy (void *@var{to}, const void *@var{from}, int @var{c}, size_t @var{size})
 239 This function copies no more than @var{size} bytes from @var{from} to
 240 @var{to}, stopping if a byte matching @var{c} is found.  The return
 241 value is a pointer into @var{to} one byte past where @var{c} was copied,
 242 or a null pointer if no byte matching @var{c} appeared in the first
 243 @var{size} bytes of @var{from}.
 244 @end deftypefun
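One possible use of @code{memccpy} is to copy a single line out of a
buffer.  The helper @code{copy_line} is hypothetical, and the feature
test macro is an assumption about how the declaration is requested on
the system at hand.

```c
#define _XOPEN_SOURCE 700   /* Assumed way to request memccpy.  */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy bytes from FROM into TO up to and
   including the first newline, examining no more than SIZE bytes.
   Return the number of bytes copied, or 0 if no newline was found
   (in which case SIZE bytes have still been copied).  */
size_t
copy_line (char *to, const char *from, size_t size)
{
  char *end;

  assert (to != NULL && from != NULL);
  end = memccpy (to, from, '\n', size);
  return end != NULL ? (size_t) (end - to) : 0;
}
```

Note that @code{memccpy} does not add a null character; the caller must
terminate the result if it is to be used as a string.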
 245
 246 @comment string.h
 247 @comment ISO
 248 @deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size})
 249 This function copies the value of @var{c} (converted to an
 250 @code{unsigned char}) into each of the first @var{size} bytes of the
 251 object beginning at @var{block}.  It returns the value of @var{block}.
 252 @end deftypefun
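As a small illustration, @code{memset} can fill a character array with
a repeated byte.  The helper @code{make_rule} is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: fill the first WIDTH bytes of LINE with
   dashes and terminate it.  LINE must have room for WIDTH + 1
   bytes.  Returns LINE.  */
char *
make_rule (char *line, size_t width)
{
  assert (line != NULL);
  memset (line, '-', width);   /* Does not write a terminator.  */
  line[width] = '\0';
  return line;
}
```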
 253
 254 @comment string.h
 255 @comment ISO
 256 @deftypefun {char *} strcpy (char *@var{to}, const char *@var{from})
 257 This copies characters from the string @var{from} (up to and including
 258 the terminating null character) into the string @var{to}.  Like
 259 @code{memcpy}, this function has undefined results if the strings
 260 overlap.  The return value is the value of @var{to}.
 261 @end deftypefun
 262
 263 @comment string.h
 264 @comment ISO
 265 @deftypefun {char *} strncpy (char *@var{to}, const char *@var{from}, size_t @var{size})
 266 This function is similar to @code{strcpy} but always copies exactly
 267 @var{size} characters into @var{to}.
 268
 269 If the length of @var{from} is more than @var{size}, then @code{strncpy}
 270 copies just the first @var{size} characters.  Note that in this case
 271 there is no null terminator written into @var{to}.
 272
 273 If the length of @var{from} is less than @var{size}, then @code{strncpy}
 274 copies all of @var{from}, followed by enough null characters to add up
 275 to @var{size} characters in all.  This behavior is rarely useful, but it
 276 is specified by the @w{ISO C} standard.
 277
 278 The behavior of @code{strncpy} is undefined if the strings overlap.
 279
 280 Using @code{strncpy} as opposed to @code{strcpy} is a way to avoid bugs
 281 relating to writing past the end of the allocated space for @var{to}.
 282 However, it can also make your program much slower in one common case:
 283 copying a string which is probably small into a potentially large buffer.
 284 In this case, @var{size} may be large, and when it is, @code{strncpy} will
 285 waste a considerable amount of time copying null characters.
 286 @end deftypefun
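Because @code{strncpy} may leave the destination unterminated, a common
precaution is to store the null character explicitly.  The following is
a sketch under that assumption; @code{copy_bounded} is a hypothetical
name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy FROM into the SIZE-byte array TO,
   truncating if necessary, and always leave TO null-terminated.
   SIZE must be at least 1.  Returns TO.  */
char *
copy_bounded (char *to, const char *from, size_t size)
{
  assert (to != NULL && from != NULL && size > 0);
  strncpy (to, from, size - 1);   /* May leave TO unterminated...  */
  to[size - 1] = '\0';            /* ...so terminate it explicitly.  */
  return to;
}
```

When @var{from} is short, the call still writes all @code{size} bytes of
@var{to}, which is the cost described above.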
 287
 288 @comment string.h
 289 @comment SVID
 290 @deftypefun {char *} strdup (const char *@var{s})
 291 This function copies the null-terminated string @var{s} into a newly
 292 allocated string.  The string is allocated using @code{malloc}; see
 293 @ref{Unconstrained Allocation}.  If @code{malloc} cannot allocate space
 294 for the new string, @code{strdup} returns a null pointer.  Otherwise it
 295 returns a pointer to the new string.
 296 @end deftypefun
 297
 298 @comment string.h
 299 @comment GNU
 300 @deftypefun {char *} strndup (const char *@var{s}, size_t @var{size})
 301 This function is similar to @code{strdup} but always copies at most
 302 @var{size} characters into the newly allocated string.
 303
 304 If the length of @var{s} is more than @var{size}, then @code{strndup}
 305 copies just the first @var{size} characters and adds a closing null
 306 terminator.  Otherwise all characters are copied and the string is
 307 terminated.
 308
  309 This function differs from @code{strncpy} in that it always
  310 terminates the destination string.
 311 @end deftypefun
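For instance, @code{strndup} can extract a leading part of a string
without modifying the original.  The helper @code{first_component} is
hypothetical; @code{strcspn} is the standard function that measures the
initial span containing none of the given characters.

```c
#define _GNU_SOURCE   /* strndup is documented here as a GNU function.  */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the part of
   PATH before the first slash, or of all of PATH if it contains
   none.  Returns a null pointer if allocation fails.  The caller
   frees the result.  */
char *
first_component (const char *path)
{
  assert (path != NULL);
  return strndup (path, strcspn (path, "/"));
}
```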
 312
 313 @comment string.h
 314 @comment Unknown origin
 315 @deftypefun {char *} stpcpy (char *@var{to}, const char *@var{from})
 316 This function is like @code{strcpy}, except that it returns a pointer to
 317 the end of the string @var{to} (that is, the address of the terminating
 318 null character) rather than the beginning.
 319
 320 For example, this program uses @code{stpcpy} to concatenate @samp{foo}
 321 and @samp{bar} to produce @samp{foobar}, which it then prints.
 322
 323 @smallexample
 324 @include stpcpy.c.texi
 325 @end smallexample
 326
 327 This function is not part of the ISO or POSIX standards, and is not
 328 customary on Unix systems, but we did not invent it either.  Perhaps it
 329 comes from MS-DOG.
 330
 331 Its behavior is undefined if the strings overlap.
 332 @end deftypefun
 333
 334 @comment string.h
 335 @comment GNU
 336 @deftypefun {char *} stpncpy (char *@var{to}, const char *@var{from}, size_t @var{size})
  337 This function is similar to @code{stpcpy} but always copies exactly
  338 @var{size} characters into @var{to}.
  339
  340 If the length of @var{from} is more than @var{size}, then @code{stpncpy}
  341 copies just the first @var{size} characters and returns a pointer to the
  342 character directly following the one which was copied last.  Note that in
  343 this case there is no null terminator written into @var{to}.
  344
  345 If the length of @var{from} is less than @var{size}, then @code{stpncpy}
  346 copies all of @var{from}, followed by enough null characters to add up
  347 to @var{size} characters in all.  This behavior is rarely useful, but it
  348 is provided for contexts that rely on the same padding behavior of
  349 @code{strncpy}.  In this case @code{stpncpy} returns a pointer to the
  350 @emph{first} written null character.
  351
  352 This function is not part of ISO or POSIX but was found useful while
  353 developing the GNU C Library itself.
  354
  355 Its behavior is undefined if the strings overlap.
 356 @end deftypefun
 357
 358 @comment string.h
 359 @comment GNU
 360 @deftypefun {char *} strdupa (const char *@var{s})
 361 This function is similar to @code{strdup} but allocates the new string
  362 using @code{alloca} instead of @code{malloc}
  363 (@pxref{Variable Size Automatic}).  This means of course the returned
  364 string has the same limitations as any block of memory allocated using
  365 @code{alloca}.
  366
  367 For obvious reasons @code{strdupa} is implemented only as a macro; that
  368 is, you cannot get the address of this function.  Despite this limitation
  369 it is a useful function.  The following code shows a situation where
  370 using @code{malloc} would be a lot more expensive.
 371
 372 @smallexample
 373 @include strdupa.c.texi
 374 @end smallexample
 375
 376 Please note that calling @code{strtok} using @var{path} directly is
 377 illegal.
 378
 379 This function is only available if GNU CC is used.
 380 @end deftypefun
 381
 382 @comment string.h
 383 @comment GNU
 384 @deftypefun {char *} strndupa (const char *@var{s}, size_t @var{size})
 385 This function is similar to @code{strndup} but like @code{strdupa} it
 386 allocates the new string using @code{alloca}
  387 (@pxref{Variable Size Automatic}).  The same advantages and limitations
 388 of @code{strdupa} are valid for @code{strndupa}, too.
 389
 390 This function is implemented only as a macro which means one cannot
 391 get the address of it.
 392
 393 @code{strndupa} is only available if GNU CC is used.
 394 @end deftypefun
 395
 396 @comment string.h
 397 @comment ISO
 398 @deftypefun {char *} strcat (char *@var{to}, const char *@var{from})
 399 The @code{strcat} function is similar to @code{strcpy}, except that the
 400 characters from @var{from} are concatenated or appended to the end of
 401 @var{to}, instead of overwriting it.  That is, the first character from
 402 @var{from} overwrites the null character marking the end of @var{to}.
 403
 404 An equivalent definition for @code{strcat} would be:
 405
 406 @smallexample
 407 char *
 408 strcat (char *to, const char *from)
 409 @{
 410   strcpy (to + strlen (to), from);
 411   return to;
 412 @}
 413 @end smallexample
 414
 415 This function has undefined results if the strings overlap.
 416 @end deftypefun
 417
 418 @comment string.h
 419 @comment ISO
 420 @deftypefun {char *} strncat (char *@var{to}, const char *@var{from}, size_t @var{size})
 421 This function is like @code{strcat} except that not more than @var{size}
 422 characters from @var{from} are appended to the end of @var{to}.  A
 423 single null character is also always appended to @var{to}, so the total
 424 allocated size of @var{to} must be at least @code{@var{size} + 1} bytes
 425 longer than its initial length.
 426
 427 The @code{strncat} function could be implemented like this:
 428
 429 @smallexample
 430 @group
 431 char *
 432 strncat (char *to, const char *from, size_t size)
 433 @{
  434   *stpncpy (to + strlen (to), from, size) = '\0';
 435   return to;
 436 @}
 437 @end group
 438 @end smallexample
 439
 440 The behavior of @code{strncat} is undefined if the strings overlap.
 441 @end deftypefun
 442
 443 Here is an example showing the use of @code{strncpy} and @code{strncat}.
 444 Notice how, in the call to @code{strncat}, the @var{size} parameter
 445 is computed to avoid overflowing the character array @code{buffer}.
 446
 447 @smallexample
 448 @include strncat.c.texi
 449 @end smallexample
 450
 451 @noindent
 452 The output produced by this program looks like:
 453
 454 @smallexample
 455 hello
 456 hello, wo
 457 @end smallexample
 458
 459 @comment string.h
 460 @comment BSD
  461 @deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
 462 This is a partially obsolete alternative for @code{memmove}, derived from
 463 BSD.  Note that it is not quite equivalent to @code{memmove}, because the
 464 arguments are not in the same order.
 465 @end deftypefun
 466
 467 @comment string.h
 468 @comment BSD
  469 @deftypefun void bzero (void *@var{block}, size_t @var{size})
 470 This is a partially obsolete alternative for @code{memset}, derived from
 471 BSD.  Note that it is not as general as @code{memset}, because the only
 472 value it can store is zero.
 473 @end deftypefun
 474
 475 @node String/Array Comparison
 476 @section String/Array Comparison
 477 @cindex comparing strings and arrays
 478 @cindex string comparison functions
 479 @cindex array comparison functions
 480 @cindex predicates on strings
 481 @cindex predicates on arrays
 482
 483 You can use the functions in this section to perform comparisons on the
 484 contents of strings and arrays.  As well as checking for equality, these
 485 functions can also be used as the ordering functions for sorting
 486 operations.  @xref{Searching and Sorting}, for an example of this.
 487
 488 Unlike most comparison operations in C, the string comparison functions
 489 return a nonzero value if the strings are @emph{not} equivalent rather
 490 than if they are.  The sign of the value indicates the relative ordering
 491 of the first characters in the strings that are not equivalent:  a
 492 negative value indicates that the first string is ``less'' than the
 493 second, while a positive value indicates that the first string is
 494 ``greater''.
 495
 496 The most common use of these functions is to check only for equality.
 497 This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}.
 498
 499 All of these functions are declared in the header file @file{string.h}.
 500 @pindex string.h
 501
 502 @comment string.h
 503 @comment ISO
 504 @deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
 505 The function @code{memcmp} compares the @var{size} bytes of memory
 506 beginning at @var{a1} against the @var{size} bytes of memory beginning
 507 at @var{a2}.  The value returned has the same sign as the difference
 508 between the first differing pair of bytes (interpreted as @code{unsigned
 509 char} objects, then promoted to @code{int}).
 510
 511 If the contents of the two blocks are equal, @code{memcmp} returns
 512 @code{0}.
 513 @end deftypefun
 514
 515 On arbitrary arrays, the @code{memcmp} function is mostly useful for
 516 testing equality.  It usually isn't meaningful to do byte-wise ordering
 517 comparisons on arrays of things other than bytes.  For example, a
 518 byte-wise comparison on the bytes that make up floating-point numbers
 519 isn't likely to tell you anything about the relationship between the
 520 values of the floating-point numbers.
 521
 522 You should also be careful about using @code{memcmp} to compare objects
 523 that can contain ``holes'', such as the padding inserted into structure
 524 objects to enforce alignment requirements, extra space at the end of
 525 unions, and extra characters at the ends of strings whose length is less
 526 than their allocated size.  The contents of these ``holes'' are
 527 indeterminate and may cause strange behavior when performing byte-wise
 528 comparisons.  For more predictable results, perform an explicit
 529 component-wise comparison.
 530
 531 For example, given a structure type definition like:
 532
 533 @smallexample
 534 struct foo
 535   @{
 536     unsigned char tag;
 537     union
 538       @{
 539         double f;
 540         long i;
 541         char *p;
 542       @} value;
 543   @};
 544 @end smallexample
 545
 546 @noindent
 547 you are better off writing a specialized comparison function to compare
 548 @code{struct foo} objects instead of comparing them with @code{memcmp}.
 549
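A minimal sketch of such a function follows.  The structure definition
does not say how @code{tag} selects a member of the union, so the three
tag values below are assumptions made for the example, as is the name
@code{foo_equal}.

```c
#include <assert.h>
#include <string.h>

/* Assumed tag values; the original definition does not specify them.  */
enum { FOO_DOUBLE, FOO_LONG, FOO_STRING };

struct foo
  {
    unsigned char tag;
    union
      {
        double f;
        long i;
        char *p;
      } value;
  };

/* Return nonzero if A and B are equal.  Only the tag and the union
   member that the tag selects are examined, so padding bytes and
   unused parts of the union are never read.  */
int
foo_equal (const struct foo *a, const struct foo *b)
{
  assert (a != NULL && b != NULL);
  if (a->tag != b->tag)
    return 0;
  switch (a->tag)
    {
    case FOO_DOUBLE:
      return a->value.f == b->value.f;
    case FOO_LONG:
      return a->value.i == b->value.i;
    case FOO_STRING:
      return strcmp (a->value.p, b->value.p) == 0;
    default:
      return 0;
    }
}
```

In this sketch two @code{FOO_STRING} objects compare equal when the
strings they point to have the same contents, even if the pointers
differ, which @code{memcmp} could not detect.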
 550 @comment string.h
 551 @comment ISO
 552 @deftypefun int strcmp (const char *@var{s1}, const char *@var{s2})
 553 The @code{strcmp} function compares the string @var{s1} against
 554 @var{s2}, returning a value that has the same sign as the difference
 555 between the first differing pair of characters (interpreted as
 556 @code{unsigned char} objects, then promoted to @code{int}).
 557
 558 If the two strings are equal, @code{strcmp} returns @code{0}.
 559
 560 A consequence of the ordering used by @code{strcmp} is that if @var{s1}
 561 is an initial substring of @var{s2}, then @var{s1} is considered to be
 562 ``less than'' @var{s2}.
 563 @end deftypefun
 564
 565 @comment string.h
 566 @comment BSD
 567 @deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2})
 568 This function is like @code{strcmp}, except that differences in case
 569 are ignored.
 570
 571 @code{strcasecmp} is derived from BSD.
 572 @end deftypefun
 573
 574 @comment string.h
 575 @comment BSD
 576 @deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
 577 This function is like @code{strncmp}, except that differences in case
 578 are ignored.
 579
 580 @code{strncasecmp} is a GNU extension.
 581 @end deftypefun
 582
 583 @comment string.h
 584 @comment ISO
 585 @deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size})
  586 This function is similar to @code{strcmp}, except that no more than
 587 @var{size} characters are compared.  In other words, if the two strings are
 588 the same in their first @var{size} characters, the return value is zero.
 589 @end deftypefun
 590
 591 Here are some examples showing the use of @code{strcmp} and @code{strncmp}.
 592 These examples assume the use of the ASCII character set.  (If some
 593 other character set---say, EBCDIC---is used instead, then the glyphs
 594 are associated with different numeric codes, and the return values
 595 and ordering may differ.)
 596
 597 @smallexample
 598 strcmp ("hello", "hello")
 599     @result{} 0    /* @r{These two strings are the same.} */
 600 strcmp ("hello", "Hello")
 601     @result{} 32   /* @r{Comparisons are case-sensitive.} */
 602 strcmp ("hello", "world")
 603     @result{} -15  /* @r{The character @code{'h'} comes before @code{'w'}.} */
 604 strcmp ("hello", "hello, world")
 605     @result{} -44  /* @r{Comparing a null character against a comma.} */
 606 strncmp ("hello", "hello, world", 5)
 607     @result{} 0    /* @r{The initial 5 characters are the same.} */
 608 strncmp ("hello, world", "hello, stupid world!!!", 5)
 609     @result{} 0    /* @r{The initial 5 characters are the same.} */
 610 @end smallexample
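The magnitudes shown above are differences of character codes, but the
descriptions of @code{strcmp} and @code{strncmp} promise only the sign
of the result.  Portable code should therefore test the sign rather
than a particular value.  The helper @code{compare_sign} below is a
hypothetical sketch of that practice.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reduce the result of strcmp to -1, 0 or 1 so
   that callers depend only on its sign.  */
int
compare_sign (const char *s1, const char *s2)
{
  int result;

  assert (s1 != NULL && s2 != NULL);
  result = strcmp (s1, s2);
  return (result > 0) - (result < 0);
}
```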
 611
 612 @comment string.h
 613 @comment GNU
 614 @deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2})
  615 The @code{strverscmp} function compares the string @var{s1} against
  616 @var{s2}, considering them as holding indices/version numbers.  The return
  617 value follows the same conventions as found in the @code{strcmp}
  618 function.  In fact, if @var{s1} and @var{s2} contain no digits,
  619 @code{strverscmp} behaves like @code{strcmp}.
  620
  621 Basically, we compare strings normally (character by character), until
  622 we find a digit in each string; then we enter a special comparison
  623 mode, where each sequence of digits is taken as a whole.  If we reach the
  624 end of these two parts without noticing a difference, we return to the
  625 standard comparison mode.  There are two types of numeric parts:
  626 ``integral'' and ``fractional'' (the latter begin with a @samp{0}).  The types
  627 of the numeric parts affect the way we sort them:
  628
  629 @itemize @bullet
  630 @item
  631 integral/integral: we compare values as you would expect.
  632
  633 @item
  634 fractional/integral: the fractional part is less than the integral one.
  635 Again, no surprise.
  636
  637 @item
  638 fractional/fractional: things become a bit more complex.
  639 If the common prefix contains only leading zeroes, the longest part is less
  640 than the other one; else the comparison behaves normally.
  641 @end itemize
 642
 643 @smallexample
 644 strverscmp ("no digit", "no digit")
  645     @result{} 0    /* @r{same behavior as strcmp.} */
  646 strverscmp ("item#99", "item#100")
  647     @result{} <0   /* @r{same prefix, but 99 < 100.} */
  648 strverscmp ("alpha1", "alpha001")
  649     @result{} >0   /* @r{fractional part inferior to integral one.} */
  650 strverscmp ("part1_f012", "part1_f01")
  651     @result{} >0   /* @r{two fractional parts.} */
  652 strverscmp ("foo.009", "foo.0")
  653     @result{} <0   /* @r{same, but with leading zeroes only.} */
 654 @end smallexample
 655
  656 This function is especially useful when dealing with filename sorting,
 657 because filenames frequently hold indices/version numbers.
 658
 659 @code{strverscmp} is a GNU extension.
 660 @end deftypefun
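As a sketch of the filename-sorting use, @code{strverscmp} can serve as
the ordering function for @code{qsort}.  The names
@code{compare_versions} and @code{sort_by_version} are hypothetical, and
the example assumes the GNU C library is in use.

```c
#define _GNU_SOURCE   /* strverscmp is a GNU extension.  */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort.  Each element of the array being
   sorted is a pointer to a string.  */
static int
compare_versions (const void *v1, const void *v2)
{
  const char *const *p1 = v1;
  const char *const *p2 = v2;

  return strverscmp (*p1, *p2);
}

/* Hypothetical helper: sort NAMES, an array of COUNT strings, so
   that embedded numbers are ordered by value.  */
void
sort_by_version (const char **names, size_t count)
{
  assert (names != NULL);
  qsort (names, count, sizeof (const char *), compare_versions);
}
```

With @code{strcmp} as the ordering function, @samp{file10.txt} would
sort before @samp{file9.txt}; with @code{strverscmp} it sorts after.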
 661
 662 @comment string.h
 663 @comment BSD
 664 @deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
 665 This is an obsolete alias for @code{memcmp}, derived from BSD.
 666 @end deftypefun
 667
 668 @node Collation Functions
 669 @section Collation Functions
 670
 671 @cindex collating strings
 672 @cindex string collation functions
 673
 674 In some locales, the conventions for lexicographic ordering differ from
 675 the strict numeric ordering of character codes.  For example, in Spanish
 676 most glyphs with diacritical marks such as accents are not considered
 677 distinct letters for the purposes of collation.  On the other hand, the
 678 two-character sequence @samp{ll} is treated as a single letter that is
 679 collated immediately after @samp{l}.
 680
 681 You can use the functions @code{strcoll} and @code{strxfrm} (declared in
 682 the header file @file{string.h}) to compare strings using a collation
 683 ordering appropriate for the current locale.  The locale used by these
 684 functions in particular can be specified by setting the locale for the
 685 @code{LC_COLLATE} category; see @ref{Locales}.
 686 @pindex string.h
 687
 688 In the standard C locale, the collation sequence for @code{strcoll} is
 689 the same as that for @code{strcmp}.
 690
 691 Effectively, the way these functions work is by applying a mapping to
 692 transform the characters in a string to a byte sequence that represents
 693 the string's position in the collating sequence of the current locale.
 694 Comparing two such byte sequences in a simple fashion is equivalent to
 695 comparing the strings with the locale's collating sequence.
 696
 697 The function @code{strcoll} performs this translation implicitly, in
 698 order to do one comparison.  By contrast, @code{strxfrm} performs the
 699 mapping explicitly.  If you are making multiple comparisons using the
 700 same string or set of strings, it is likely to be more efficient to use
 701 @code{strxfrm} to transform all the strings just once, and subsequently
 702 compare the transformed strings with @code{strcmp}.
 703
 704 @comment string.h
 705 @comment ISO
 706 @deftypefun int strcoll (const char *@var{s1}, const char *@var{s2})
 707 The @code{strcoll} function is similar to @code{strcmp} but uses the
 708 collating sequence of the current locale for collation (the
 709 @code{LC_COLLATE} locale).
 710 @end deftypefun
 711
 712 Here is an example of sorting an array of strings, using @code{strcoll}
 713 to compare them.  The actual sort algorithm is not written here; it
 714 comes from @code{qsort} (@pxref{Array Sort Function}).  The job of the
 715 code shown here is to say how to compare the strings while sorting them.
 716 (Later on in this section, we will show a way to do this more
 717 efficiently using @code{strxfrm}.)
 718
 719 @smallexample
 720 /* @r{This is the comparison function used with @code{qsort}.} */
 721
 722 int
 723 compare_elements (const void *p1, const void *p2)
 724 @{
 725   return strcoll (*(char *const *) p1, *(char *const *) p2);
 726 @}
 727
 728 /* @r{This is the entry point---the function to sort}
 729    @r{strings using the locale's collating sequence.} */
 730
 731 void
 732 sort_strings (char **array, int nstrings)
 733 @{
 734   /* @r{Sort @code{array} by comparing the strings.} */
 735   qsort (array, nstrings,
 736          sizeof (char *), compare_elements);
 737 @}
 738 @end smallexample
 739
 740 @cindex converting string to collation order
 741 @comment string.h
 742 @comment ISO
 743 @deftypefun size_t strxfrm (char *@var{to}, const char *@var{from}, size_t @var{size})
 744 The function @code{strxfrm} transforms @var{from} using the collation
 745 transformation determined by the locale currently selected for
 746 collation, and stores the transformed string in the array @var{to}.  Up
 747 to @var{size} characters (including a terminating null character) are
 748 stored.
 749
 750 The behavior is undefined if the strings @var{to} and @var{from}
 751 overlap; see @ref{Copying and Concatenation}.
 752
 753 The return value is the length of the entire transformed string.  This
 754 value is not affected by the value of @var{size}, but if it is greater
 755 than or equal to @var{size}, it means that the transformed string did not
 756 entirely fit in the array @var{to}.  In this case, only as much of the
 757 string as actually fits was stored.  To get the whole transformed
 758 string, call @code{strxfrm} again with a bigger output array.
 759
 760 The transformed string may be longer than the original string, and it
 761 may also be shorter.
 762
 763 If @var{size} is zero, no characters are stored in @var{to}.  In this
 764 case, @code{strxfrm} simply returns the number of characters that would
 765 be the length of the transformed string.  This is useful for determining
 766 what size string to allocate.  It does not matter what @var{to} is if
 767 @var{size} is zero; @var{to} may even be a null pointer.
 768 @end deftypefun
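
The size-zero query described above can be used to allocate a buffer of
exactly the right size before transforming.  The following is only a
minimal sketch: the helper name @code{xfrm_dup} is hypothetical (it is
not a library function), and it uses plain @code{malloc}, so the caller
has to check for a null result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the
   transformed form of S, or NULL if memory is exhausted.  */
char *
xfrm_dup (const char *s)
{
  /* With a size of zero nothing is stored; only the length of
     the transformed string is returned.  */
  size_t needed = strxfrm (NULL, s, 0);
  char *buf = malloc (needed + 1);   /* +1 for the terminating null.  */

  if (buf != NULL)
    strxfrm (buf, s, needed + 1);
  return buf;
}
```

Two strings transformed this way can then be compared with
@code{strcmp}, and the result has the same sign as @code{strcoll}
applied to the originals.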
 769
 770 Here is an example of how you can use @code{strxfrm} when
 771 you plan to do many comparisons.  It does the same thing as the previous
 772 example, but much faster, because it has to transform each string only
 773 once, no matter how many times it is compared with other strings.  Even
 774 the time needed to allocate and free storage is much less than the time
 775 we save, when there are many strings.
 776
 777 @smallexample
 778 struct sorter @{ char *input; char *transformed; @};
 779
 780 /* @r{This is the comparison function used with @code{qsort}}
 781    @r{to sort an array of @code{struct sorter}.} */
 782 int
 783 compare_elements (const void *p1, const void *p2)
 784 @{
 785   return strcmp (((const struct sorter *) p1)->transformed,
 786                  ((const struct sorter *) p2)->transformed);
 787 @}
 788
 789 /* @r{This is the entry point---the function to sort}
 790    @r{strings using the locale's collating sequence.} */
 791
 792 void
 793 sort_strings_fast (char **array, int nstrings)
 794 @{
 795   struct sorter temp_array[nstrings];
 796   int i;
 797
 798   /* @r{Set up @code{temp_array}.  Each element contains}
 799      @r{one input string and its transformed string.} */
 800   for (i = 0; i < nstrings; i++)
 801     @{
 802       size_t length = strlen (array[i]) * 2;
 803       char *transformed;
 804       size_t transformed_length;
 805
 806       temp_array[i].input = array[i];
 807
 808       /* @r{First try a buffer perhaps big enough.}  */
 809       transformed = (char *) xmalloc (length);
 810
 811       /* @r{Transform @code{array[i]}.}  */
 812       transformed_length = strxfrm (transformed, array[i], length);
 813
 814       /* @r{If the buffer was not large enough, resize it}
 815          @r{and try again.}  */
 816       if (transformed_length >= length)
 817         @{
 818           /* @r{Allocate the needed space. +1 for terminating}
 819              @r{@code{NUL} character.}  */
 820           transformed = (char *) xrealloc (transformed,
 821                                            transformed_length + 1);
 822
 823           /* @r{The return value is not interesting because we know}
 824              @r{how long the transformed string is.}  */
 825           (void) strxfrm (transformed, array[i], transformed_length + 1);
 826         @}
 827
 828       temp_array[i].transformed = transformed;
 829     @}
 830
 831   /* @r{Sort @code{temp_array} by comparing transformed strings.} */
 832   qsort (temp_array, nstrings,
 833          sizeof (struct sorter), compare_elements);
 834
 835   /* @r{Put the elements back in the permanent array}
 836      @r{in their sorted order.} */
 837   for (i = 0; i < nstrings; i++)
 838     array[i] = temp_array[i].input;
 839
 840   /* @r{Free the strings we allocated.} */
 841   for (i = 0; i < nstrings; i++)
 842     free (temp_array[i].transformed);
 843 @}
 844 @end smallexample
 845
 846 @strong{Compatibility Note:}  The string collation functions are a new
 847 feature of @w{ISO C 89}.  Older C dialects have no equivalent feature.
 848
 849 @node Search Functions
 850 @section Search Functions
 851
 852 This section describes library functions which perform various kinds
 853 of searching operations on strings and arrays.  These functions are
 854 declared in the header file @file{string.h}.
 855 @pindex string.h
 856 @cindex search functions (for strings)
 857 @cindex string search functions
 858
 859 @comment string.h
 860 @comment ISO
 861 @deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size})
 862 This function finds the first occurrence of the byte @var{c} (converted
 863 to an @code{unsigned char}) in the initial @var{size} bytes of the
 864 object beginning at @var{block}.  The return value is a pointer to the
 865 located byte, or a null pointer if no match was found.
 866 @end deftypefun
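
Because @code{memchr} takes an explicit size, it can search data that
contains embedded null characters.  As a small illustration (the helper
name @code{line_length} is an assumption made for this sketch), the
following finds where the first line of a buffer ends:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the number of bytes in BUF (of length
   LEN) that precede the first newline, or LEN if there is none.  */
size_t
line_length (const char *buf, size_t len)
{
  const char *nl = memchr (buf, '\n', len);

  return nl != NULL ? (size_t) (nl - buf) : len;
}
```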
 867
 868 @comment string.h
 869 @comment ISO
 870 @deftypefun {char *} strchr (const char *@var{string}, int @var{c})
 871 The @code{strchr} function finds the first occurrence of the character
 872 @var{c} (converted to a @code{char}) in the null-terminated string
 873 beginning at @var{string}.  The return value is a pointer to the located
 874 character, or a null pointer if no match was found.
 875
 876 For example,
 877 @smallexample
 878 strchr ("hello, world", 'l')
 879     @result{} "llo, world"
 880 strchr ("hello, world", '?')
 881     @result{} NULL
 882 @end smallexample
 883
 884 The terminating null character is considered to be part of the string,
 885 so you can use this function to get a pointer to the end of a string by
 886 specifying a null character as the value of the @var{c} argument.
 887 @end deftypefun
 888
 889 @comment string.h
 890 @comment BSD
 891 @deftypefun {char *} index (const char *@var{string}, int @var{c})
 892 @code{index} is another name for @code{strchr}; they are exactly the same.
 893 New code should always use @code{strchr} since this name is defined in
 894 @w{ISO C} while @code{index} is a BSD invention which never was available
 895 on @w{System V} derived systems.
 896 @end deftypefun
 897
 898 @comment string.h
 899 @comment ISO
 900 @deftypefun {char *} strrchr (const char *@var{string}, int @var{c})
 901 The function @code{strrchr} is like @code{strchr}, except that it searches
 902 backwards from the end of the string @var{string} (instead of forwards
 903 from the front).
 904
 905 For example,
 906 @smallexample
 907 strrchr ("hello, world", 'l')
 908     @result{} "ld"
 909 @end smallexample
 910 @end deftypefun
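
A typical use of @code{strrchr} is to pick out the last component of a
file name.  This is only a sketch; the helper name
@code{last_component} is hypothetical, and it does not treat trailing
slashes specially.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the part of PATH that
   follows the last slash, or PATH itself if it contains none.  */
const char *
last_component (const char *path)
{
  const char *slash = strrchr (path, '/');

  return slash != NULL ? slash + 1 : path;
}
```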
 911
 912 @comment string.h
 913 @comment BSD
 914 @deftypefun {char *} rindex (const char *@var{string}, int @var{c})
 915 @code{rindex} is another name for @code{strrchr}; they are exactly the same.
 916 New code should always use @code{strrchr} since this name is defined in
 917 @w{ISO C} while @code{rindex} is a BSD invention which never was available
 918 on @w{System V} derived systems.
 919 @end deftypefun
 920
 921 @comment string.h
 922 @comment ISO
 923 @deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle})
 924 This is like @code{strchr}, except that it searches @var{haystack} for a
 925 substring @var{needle} rather than just a single character.  It
 926 returns a pointer into the string @var{haystack} that is the first
 927 character of the substring, or a null pointer if no match was found.  If
 928 @var{needle} is an empty string, the function returns @var{haystack}.
 929
 930 For example,
 931 @smallexample
 932 strstr ("hello, world", "l")
 933     @result{} "llo, world"
 934 strstr ("hello, world", "wo")
 935     @result{} "world"
 936 @end smallexample
 937 @end deftypefun
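
Since @code{strstr} returns a pointer into @var{haystack}, repeated
calls can walk through all the matches.  The sketch below counts
non-overlapping occurrences; the helper name @code{count_substring} and
its treatment of an empty @var{needle} are choices made for this
illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-overlapping occurrences of
   NEEDLE in HAYSTACK.  An empty NEEDLE is reported as zero matches,
   since strstr would otherwise match at every position.  */
size_t
count_substring (const char *haystack, const char *needle)
{
  size_t needle_len = strlen (needle);
  size_t count = 0;

  if (needle_len == 0)
    return 0;

  while ((haystack = strstr (haystack, needle)) != NULL)
    {
      count++;
      haystack += needle_len;   /* Resume after this match.  */
    }
  return count;
}
```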
 938
 939
 940 @comment string.h
 941 @comment GNU
 942 @deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len})
 943 This is like @code{strstr}, but @var{needle} and @var{haystack} are byte
 944 arrays rather than null-terminated strings.  @var{needle-len} is the
 945 length of @var{needle} and @var{haystack-len} is the length of
 946 @var{haystack}.@refill
 947
 948 This function is a GNU extension.
 949 @end deftypefun
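
As an illustration of searching binary data, the following sketch
reports the offset of a byte pattern within a block.  The helper name
@code{find_signature} is hypothetical, and @code{_GNU_SOURCE} is
defined on the assumption that it is needed to make the declaration of
@code{memmem} visible.

```c
#define _GNU_SOURCE          /* memmem is a GNU extension.  */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first occurrence of
   the SIG_LEN-byte block SIG within DATA, or -1 if it is absent.  */
long
find_signature (const void *data, size_t data_len,
                const void *sig, size_t sig_len)
{
  const char *hit = memmem (data, data_len, sig, sig_len);

  return hit != NULL ? (long) (hit - (const char *) data) : -1L;
}
```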
 950
 951 @comment string.h
 952 @comment ISO
 953 @deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset})
 954 The @code{strspn} (``string span'') function returns the length of the
 955 initial substring of @var{string} that consists entirely of characters that
 956 are members of the set specified by the string @var{skipset}.  The order
 957 of the characters in @var{skipset} is not important.
 958
 959 For example,
 960 @smallexample
 961 strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
 962     @result{} 5
 963 @end smallexample
 964 @end deftypefun
 965
 966 @comment string.h
 967 @comment ISO
 968 @deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset})
 969 The @code{strcspn} (``string complement span'') function returns the length
 970 of the initial substring of @var{string} that consists entirely of characters
 971 that are @emph{not} members of the set specified by the string @var{stopset}.
 972 (In other words, it returns the offset of the first character in @var{string}
 973 that is a member of the set @var{stopset}.)
 974
 975 For example,
 976 @smallexample
 977 strcspn ("hello, world", " \t\n,.;!?")
 978     @result{} 5
 979 @end smallexample
 980 @end deftypefun
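
Used together, @code{strspn} and @code{strcspn} can delimit a word
without modifying the string.  The following is a sketch under the
assumption that words are separated by spaces, tabs, and newlines; the
helper name @code{first_word} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: locate the first word in LINE.  Store its
   length in *LEN and return a pointer to its first character.
   The string itself is not modified.  */
const char *
first_word (const char *line, size_t *len)
{
  static const char blanks[] = " \t\n";

  line += strspn (line, blanks);     /* Skip leading blanks.  */
  *len = strcspn (line, blanks);     /* Measure up to the next blank.  */
  return line;
}
```

If the line contains only blanks, the returned pointer addresses the
terminating null character and the stored length is zero.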
 981
 982 @comment string.h
 983 @comment ISO
 984 @deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset})
 985 The @code{strpbrk} (``string pointer break'') function is related to
 986 @code{strcspn}, except that it returns a pointer to the first character
 987 in @var{string} that is a member of the set @var{stopset} instead of the
 988 length of the initial substring.  It returns a null pointer if no such
 989 character from @var{stopset} is found.
 990
 991 @c @group  Invalid outside the example.
 992 For example,
 993
 994 @smallexample
 995 strpbrk ("hello, world", " \t\n,.;!?")
 996     @result{} ", world"
 997 @end smallexample
 998 @c @end group
 999 @end deftypefun
1000
1001 @node Finding Tokens in a String
1002 @section Finding Tokens in a String
1003
1004 @cindex tokenizing strings
1005 @cindex breaking a string into tokens
1006 @cindex parsing tokens from a string
1007 It's fairly common for programs to have a need to do some simple kinds
1008 of lexical analysis and parsing, such as splitting a command string up
1009 into tokens.  You can do this with the @code{strtok} function, declared
1010 in the header file @file{string.h}.
1011 @pindex string.h
1012
1013 @comment string.h
1014 @comment ISO
1015 @deftypefun {char *} strtok (char *@var{newstring}, const char *@var{delimiters})
1016 A string can be split into tokens by making a series of calls to the
1017 function @code{strtok}.
1018
1019 The string to be split up is passed as the @var{newstring} argument on
1020 the first call only.  The @code{strtok} function uses this to set up
1021 some internal state information.  Subsequent calls to get additional
1022 tokens from the same string are indicated by passing a null pointer as
1023 the @var{newstring} argument.  Calling @code{strtok} with another
1024 non-null @var{newstring} argument reinitializes the state information.
1025 It is guaranteed that no other library function ever calls @code{strtok}
1026 behind your back (which would mess up this internal state information).
1027
1028 The @var{delimiters} argument is a string that specifies a set of delimiters
1029 that may surround the token being extracted.  All the initial characters
1030 that are members of this set are discarded.  The first character that is
1031 @emph{not} a member of this set of delimiters marks the beginning of the
1032 next token.  The end of the token is found by looking for the next
1033 character that is a member of the delimiter set.  This character in the
1034 original string @var{newstring} is overwritten by a null character, and the
1035 pointer to the beginning of the token in @var{newstring} is returned.
1036
1037 On the next call to @code{strtok}, the searching begins at the next
1038 character beyond the one that marked the end of the previous token.
1039 Note that the set of delimiters @var{delimiters} does not have to be the
1040 same on every call in a series of calls to @code{strtok}.
1041
1042 If the end of the string @var{newstring} is reached, or if the remainder of
1043 the string consists only of delimiter characters, @code{strtok} returns
1044 a null pointer.
1045 @end deftypefun
1046
1047 @strong{Warning:} Since @code{strtok} alters the string it is parsing,
1048 you should always copy the string to a temporary buffer before parsing it with
1049 @code{strtok}.  If you allow @code{strtok} to modify a string that came
1050 from another part of your program, you are asking for trouble; that
1051 string may be part of a data structure that could be used for other
1052 purposes during the parsing, when alteration by @code{strtok} makes the
1053 data structure temporarily inaccurate.
1054
1055 The string that you are operating on might even be a constant.  Then
1056 when @code{strtok} tries to modify it, your program will get a fatal
1057 signal for writing in read-only memory.  @xref{Program Error Signals}.
1058
1059 This is a special case of a general principle: if a part of a program
1060 does not have as its purpose the modification of a certain data
1061 structure, then it is error-prone to modify the data structure
1062 temporarily.
1063
1064 The function @code{strtok} is not reentrant.  @xref{Nonreentrancy}, for
1065 a discussion of where and why reentrancy is important.
1066
1067 Here is a simple example showing the use of @code{strtok}.
1068
1069 @comment Yes, this example has been tested.
1070 @smallexample
1071 #include <string.h>
1072 #include <stddef.h>
1073
1074 @dots{}
1075
1076 const char string[] = "words separated by spaces -- and, punctuation!";
1077 const char delimiters[] = " .,;:!-";
1078 char *token, *cp;
1079
1080 @dots{}
1081
1082 cp = strdupa (string);                /* Make writable copy.  */
1083 token = strtok (cp, delimiters);      /* token => "words" */
1084 token = strtok (NULL, delimiters);    /* token => "separated" */
1085 token = strtok (NULL, delimiters);    /* token => "by" */
1086 token = strtok (NULL, delimiters);    /* token => "spaces" */
1087 token = strtok (NULL, delimiters);    /* token => "and" */
1088 token = strtok (NULL, delimiters);    /* token => "punctuation" */
1089 token = strtok (NULL, delimiters);    /* token => NULL */
1090 @end smallexample
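
In practice the calls are usually made in a loop that stops when
@code{strtok} returns a null pointer.  A minimal sketch follows; the
helper name @code{split_tokens} and its fixed-capacity output array are
assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split the writable string BUF at any of the
   characters in DELIMITERS, storing up to MAX token pointers in
   TOKENS.  Return the number of tokens stored.  BUF is modified.  */
size_t
split_tokens (char *buf, const char *delimiters,
              char **tokens, size_t max)
{
  size_t n = 0;
  char *token;

  for (token = strtok (buf, delimiters);
       token != NULL && n < max;
       token = strtok (NULL, delimiters))
    tokens[n++] = token;

  return n;
}
```

The stored pointers refer to positions inside @code{buf}, so they
remain valid only as long as that buffer does.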
1091
1092 The GNU C library contains two more functions for tokenizing a string
1093 which overcome the limitation of non-reentrancy.
1094
1095 @comment string.h
1096 @comment POSIX
1097 @deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr})
1098 Just like @code{strtok} this function splits the string into several
1099 tokens which can be accessed by successive calls to @code{strtok_r}.
1100 The difference is that the information about the next token is not set
1101 up in some internal state information.  Instead the caller has to
1102 provide another argument @var{save_ptr} which is a pointer to a string
1103 pointer.  Calling @code{strtok_r} with a null pointer for
1104 @var{newstring} and leaving @var{save_ptr} between the calls unchanged
1105 does the job without limiting reentrancy.
1106
1107 This function is defined in POSIX.1 and can be found on many systems
1108 which support multi-threading.
1109 @end deftypefun
1110
1111 @comment string.h
1112 @comment BSD
1113 @deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter})
1114 A second reentrant approach avoids the additional first argument; the
1115 user must initialize the moving pointer.  Successive calls of
1116 @code{strsep} move the pointer along the tokens separated by
1117 @var{delimiter}, returning the address of the next token and updating
1118 @var{string_ptr} to point to the beginning of the token after it.  Unlike
1119 @code{strtok}, it returns an empty string between adjacent delimiters.
1120
1121 This function was introduced in 4.3BSD and therefore is widely available.
1122 @end deftypefun
1123
1124 Here is how the above example looks when @code{strsep} is used.
1125
1126 @comment Yes, this example has been tested.
1127 @smallexample
1128 #include <string.h>
1129 #include <stddef.h>
1130
1131 @dots{}
1132
1133 const char string[] = "words separated by spaces -- and, punctuation!";
1134 const char delimiters[] = " .,;:!-";
1135 char *running;
1136 char *token;
1137
1138 @dots{}
1139
1140 running = strdupa (string);
1141 token = strsep (&running, delimiters);    /* token => "words" */
1142 token = strsep (&running, delimiters);    /* token => "separated" */
1143 token = strsep (&running, delimiters);    /* token => "by" */
1144 token = strsep (&running, delimiters);    /* token => "spaces" */
1145 token = strsep (&running, delimiters);    /* token => "" */
1146 @dots{}    /* @r{Six more calls:} "", "", "and", "", "punctuation", "" */
1147 token = strsep (&running, delimiters);    /* token => NULL */
1148 @end smallexample
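
Since @code{strsep} reports an empty token between two adjacent
delimiters, it suits records in which empty fields are meaningful.  The
following sketch splits a colon-separated record; the helper name
@code{split_fields} is hypothetical, and @code{_DEFAULT_SOURCE} is
defined on the assumption that it is needed to expose the declaration.

```c
#define _DEFAULT_SOURCE      /* For strsep on GNU systems.  */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split the writable string RECORD at each
   colon, storing up to MAX field pointers in FIELDS.  Empty fields
   are kept.  Return the number of fields stored.  */
size_t
split_fields (char *record, char **fields, size_t max)
{
  size_t n = 0;
  char *running = record;
  char *field;

  while (n < max && (field = strsep (&running, ":")) != NULL)
    fields[n++] = field;

  return n;
}
```

A caller that wants the behavior of @code{strtok} instead can simply
skip any field whose first character is the null character.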
1149
1150 @node Encode Binary Data
1151 @section Encode Binary Data
1152
1153 To store or transfer binary data in environments which only support text
1154 one has to encode the binary data by mapping the input bytes to
1155 characters in the range allowed for storing or transferring.  SVID
1156 systems (and nowadays XPG compliant systems) have such a function in the
1157 C library.
1158
1159 @comment stdlib.h
1160 @comment XPG
1161 @deftypefun {char *} l64a (long int @var{n})
1162 This function encodes an input value with 32 bits using characters from
1163 the basic character set.  Groups of 6 bits are encoded using the
1164 following table:
1165
1166 @multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx}
1167 @item              @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7
1168 @item       0      @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1}
1169                    @tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5}
1170 @item       8      @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9}
1171                    @tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D}
1172 @item       16     @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H}
1173                    @tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L}
1174 @item       24     @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P}
1175                    @tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T}
1176 @item       32     @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X}
1177                    @tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b}
1178 @item       40     @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f}
1179                    @tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j}
1180 @item       48     @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n}
1181                    @tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r}
1182 @item       56     @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v}
1183                    @tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z}
1184 @end multitable
1185
1186 The function returns a pointer to a static buffer which contains the
1187 string representing the encoding of @var{n}.  To encode a series of
1188 bytes the user should append the new string to the destination buffer.
1189 @emph{Warning:} Since a static buffer is used this function should not
1190 be used in multi-threaded programs.  There is no thread-safe alternative
1191 to this function in the C library.
1192 @end deftypefun
1193
1194 On its own, the @code{l64a} function is not very useful.  To encode arbitrary
1195 sequences of bytes one needs some more code and this could look like
1196 this:
1197
1198 @smallexample
1199 char *
1200 encode (const void *buf, size_t len)
1201 @{
1202   /* @r{We know in advance how long the buffer has to be.} */
1203   unsigned char *in = (unsigned char *) buf;
1204   char *out = malloc (6 + ((len + 3) / 4) * 6 + 1);
1205   char *cp = out;
1206
1207   /* @r{Encode the length.} */
1208   memcpy (cp, l64a (len), 6);
1209   cp += 6;
1210
1211   while (len > 3)
1212     @{
1213       unsigned long int n = *in++;
1214       n = (n << 8) | *in++;
1215       n = (n << 8) | *in++;
1216       n = (n << 8) | *in++;
1217       len -= 4;
1218       /* @r{Using `htonl' is necessary so that the data can be}
1219          @r{decoded even on machines with different byte order.} */
1220       memcpy (cp, l64a (htonl (n)), 6);
1221       cp += 6;
1222     @}
1223   if (len > 0)
1224     @{
1225       unsigned long int n = *in++;
1226       if (--len > 0)
1227         @{
1228           n = (n << 8) | *in++;
1229           if (--len > 0)
1230             n = (n << 8) | *in;
1231         @}
1232       memcpy (cp, l64a (htonl (n)), 6);
1233       cp += 6;
1234     @}
1235   *cp = '\0';
1236   return out;
1237 @}
1238 @end smallexample
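
One detail deserves care when copying a fixed six bytes as above:
@code{l64a} can return a string shorter than six characters (an empty
string for zero), because high-order zero groups are omitted.  A
possible way to always emit a full group is to pad by hand with
@samp{.}, the encoding of zero.  The helper name @code{l64a_padded} is
hypothetical, and @code{_DEFAULT_SOURCE} is defined on the assumption
that it is needed to expose the declarations.

```c
#define _DEFAULT_SOURCE      /* For l64a on GNU systems.  */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: store exactly six characters encoding the
   low-order 32 bits of N at OUT, padding a short result from l64a
   with '.' (the encoding of zero).  No null terminator is written.  */
void
l64a_padded (char *out, long int n)
{
  const char *s = l64a (n);
  size_t len = strlen (s);

  memcpy (out, s, len);
  memset (out + len, '.', 6 - len);
}
```

The padding goes at the end because the least significant group is
encoded first, so trailing @samp{.} characters stand for high-order
zero bits.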
1239
1240 It is strange that the library does not provide the complete
1241 functionality needed but so be it.  There are some other encoding
1242 methods which are much more widely used (UU encoding, Base64 encoding).
1243 Generally, it is better to use one of these encodings.
1244
1245 To decode data produced with @code{l64a} the following function should be
1246 used.
1247
1248 @comment stdlib.h
1249 @comment XPG
1250 @deftypefun {long int} a64l (const char *@var{string})
1251 The parameter @var{string} should contain a string which was produced by
1252 a call to @code{l64a}.  The function processes up to 6 characters and
1253 decodes the characters it finds according to the table above.  It stops
1254 decoding at the first character not in the conversion table, so if the
1255 data has been broken into lines the caller must skip over the
1256 end-of-line characters.
1257
1258 The decoded number is returned at the end as a @code{long int} value.
1259 Consecutive calls to this function are possible but the caller must make
1260 sure the buffer pointer is updated after each call to @code{a64l} since
1261 this function does not modify the buffer pointer.  Every call consumes 6
1262 characters.
1263 @end deftypefun
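
Advancing the buffer pointer by hand might look like the following
sketch.  The helper name @code{decode_group} is hypothetical, and the
code assumes the input consists of full six-character groups.

```c
#define _DEFAULT_SOURCE      /* For a64l on GNU systems.  */
#include <stdlib.h>

/* Hypothetical helper: decode the six-character group at *CURSOR
   and advance *CURSOR past it.  The caller must ensure that at
   least six characters remain.  */
long int
decode_group (const char **cursor)
{
  long int value = a64l (*cursor);   /* Reads at most six characters.  */

  *cursor += 6;
  return value;
}
```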
1264
1265 @node Argz and Envz Vectors
1266 @section Argz and Envz Vectors
1267
1268 @cindex argz vectors (string vectors)
1269 @cindex string vectors, null-character separated
1270 @cindex argument vectors, null-character separated
1271 @dfn{Argz vectors} are vectors of strings in a contiguous block of
1272 memory, each element separated from its neighbors by null characters
1273 (@code{'\0'}).
1274
1275 @cindex envz vectors (environment vectors)
1276 @cindex environment vectors, null-character separated
1277 @dfn{Envz vectors} are an extension of argz vectors where each element is a
1278 name-value pair, separated by a @code{'='} character (as in a Unix
1279 environment).
1280
1281 @menu
1282 * Argz Functions::              Operations on argz vectors.
1283 * Envz Functions::              Additional operations on environment vectors.
1284 @end menu
1285
1286 @node Argz Functions, Envz Functions, , Argz and Envz Vectors
1287 @subsection Argz Functions
1288
1289 Each argz vector is represented by a pointer to the first element, of
1290 type @code{char *}, and a size, of type @code{size_t}, both of which can
1291 be initialized to @code{0} to represent an empty argz vector.  All argz
1292 functions accept either a pointer and a size argument, or pointers to
1293 them, if they will be modified.
1294
1295 The argz functions use @code{malloc}/@code{realloc} to allocate/grow
1296 argz vectors, and so any argz vector created using these functions may
1297 be freed by using @code{free}; conversely, any argz function that may
1298 grow a string expects that string to have been allocated using
1299 @code{malloc} (those argz functions that only examine their arguments or
1300 modify them in place will work on any sort of memory).
1301 @xref{Unconstrained Allocation}.
1302
1303 All argz functions that do memory allocation have a return type of
1304 @code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an
1305 allocation error occurs.
1306
1307 @pindex argz.h
1308 These functions are declared in the standard include file @file{argz.h}.
1309
1310 @comment argz.h
1311 @comment GNU
1312 @deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len})
1313 The @code{argz_create} function converts the Unix-style argument vector
1314 @var{argv} (a vector of pointers to normal C strings, terminated by
1315 @code{(char *)0}; @pxref{Program Arguments}) into an argz vector with
1316 the same elements, which is returned in @var{argz} and @var{argz_len}.
1317 @end deftypefun
1318
1319 @comment argz.h
1320 @comment GNU
1321 @deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len})
1322 The @code{argz_create_sep} function converts the null-terminated string
1323 @var{string} into an argz vector (returned in @var{argz} and
1324 @var{argz_len}) by splitting it into elements at every occurrence of the
1325 character @var{sep}.
1326 @end deftypefun
1327
1328 @comment argz.h
1329 @comment GNU
1330 @deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len})
1331 Returns the number of elements in the argz vector @var{argz} and
1332 @var{argz_len}.
1333 @end deftypefun
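
As a brief illustration of creating, measuring, and freeing an argz
vector, the following sketch counts the components of a search path.
The helper name @code{count_path_components} is hypothetical, and
reporting an allocation failure as zero is a simplification made for
the example.

```c
#define _GNU_SOURCE          /* The argz functions are GNU extensions.  */
#include <argz.h>
#include <stdlib.h>

/* Hypothetical helper: return the number of colon-separated
   components in PATH, or 0 if memory is exhausted.  */
size_t
count_path_components (const char *path)
{
  char *argz = NULL;
  size_t argz_len = 0;
  size_t count;

  if (argz_create_sep (path, ':', &argz, &argz_len) != 0)
    return 0;

  count = argz_count (argz, argz_len);
  free (argz);               /* argz vectors are allocated with malloc.  */
  return count;
}
```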
1334
1335 @comment argz.h
1336 @comment GNU
1337 @deftypefun {void} argz_extract (char *@var{argz}, size_t @var{argz_len}, char **@var{argv})
1338 The @code{argz_extract} function converts the argz vector @var{argz} and
1339 @var{argz_len} into a Unix-style argument vector stored in @var{argv},
1340 by putting pointers to every element in @var{argz} into successive
1341 positions in @var{argv}, followed by a terminator of @code{0}.
1342 @var{Argv} must be pre-allocated with enough space to hold all the
1343 elements in @var{argz} plus the terminating @code{(char *)0}
1344 (@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)}
1345 bytes should be enough).  Note that the string pointers stored into
1346 @var{argv} point into @var{argz}---they are not copies---and so
1347 @var{argz} must be copied if it will be changed while @var{argv} is
1348 still active.  This function is useful for passing the elements in
1349 @var{argz} to an exec function (@pxref{Executing a File}).
1350 @end deftypefun
1351
1352 @comment argz.h
1353 @comment GNU
1354 @deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep})
1355 The @code{argz_stringify} function converts @var{argz} into a normal string with
1356 the elements separated by the character @var{sep}, by replacing each
1357 @code{'\0'} inside @var{argz} (except the last one, which terminates the
1358 string) with @var{sep}.  This is handy for printing @var{argz} in a
1359 readable manner.
1360 @end deftypefun
1361
1362 @comment argz.h
1363 @comment GNU
1364 @deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str})
1365 The @code{argz_add} function adds the string @var{str} to the end of the
1366 argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and
1367 @code{*@var{argz_len}} accordingly.
1368 @end deftypefun
1369
1370 @comment argz.h
1371 @comment GNU
1372 @deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim})
1373 The @code{argz_add_sep} function is similar to @code{argz_add}, but
1374 @var{str} is split into separate elements in the result at occurrences of
1375 the character @var{delim}.  This is useful, for instance, for
1376 adding the components of a Unix search path to an argz vector, by using
1377 a value of @code{':'} for @var{delim}.
1378 @end deftypefun
1379
1380 @comment argz.h
1381 @comment GNU
1382 @deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len})
1383 The @code{argz_append} function appends @var{buf_len} bytes starting at
1384 @var{buf} to the argz vector @code{*@var{argz}}, reallocating
1385 @code{*@var{argz}} to accommodate it, and adding @var{buf_len} to
1386 @code{*@var{argz_len}}.
1387 @end deftypefun
1388
1389 @comment argz.h
1390 @comment GNU
1391 @deftypefun {error_t} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry})
1392 If @var{entry} points to the beginning of one of the elements in the
1393 argz vector @code{*@var{argz}}, the @code{argz_delete} function will
1394 remove this entry and reallocate @code{*@var{argz}}, modifying
1395 @code{*@var{argz}} and @code{*@var{argz_len}} accordingly.  Note that as
1396 destructive argz functions usually reallocate their argz argument,
1397 pointers into argz vectors such as @var{entry} will then become invalid.
1398 @end deftypefun
1399
1400 @comment argz.h
1401 @comment GNU
1402 @deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry})
1403 The @code{argz_insert} function inserts the string @var{entry} into the
1404 argz vector @code{*@var{argz}} at a point just before the existing
1405 element pointed to by @var{before}, reallocating @code{*@var{argz}} and
1406 updating @code{*@var{argz}} and @code{*@var{argz_len}}.  If @var{before}
1407 is @code{0}, @var{entry} is added to the end instead (as if by
1408 @code{argz_add}).  Since the first element is in fact the same as
1409 @code{*@var{argz}}, passing in @code{*@var{argz}} as the value of
1410 @var{before} will result in @var{entry} being inserted at the beginning.
1411 @end deftypefun
1412
1413 @comment argz.h
1414 @comment GNU
1415 @deftypefun {char *} argz_next (char *@var{argz}, size_t @var{argz_len}, const char *@var{entry})
1416 The @code{argz_next} function provides a convenient way of iterating
1417 over the elements in the argz vector @var{argz}.  It returns a pointer
1418 to the next element in @var{argz} after the element @var{entry}, or
1419 @code{0} if there are no elements following @var{entry}.  If @var{entry}
1420 is @code{0}, the first element of @var{argz} is returned.
1421
1422 This behavior suggests two styles of iteration:
1423
1424 @smallexample
1425     char *entry = 0;
1426     while ((entry = argz_next (@var{argz}, @var{argz_len}, entry)))
1427       @var{action};
1428 @end smallexample
1429
(the double parentheses are necessary to suppress warnings from some C
compilers about what they consider a questionable @code{while}-test) and:
1432
1433 @smallexample
1434     char *entry;
1435     for (entry = @var{argz};
1436          entry;
1437          entry = argz_next (@var{argz}, @var{argz_len}, entry))
1438       @var{action};
1439 @end smallexample
1440
1441 Note that the latter depends on @var{argz} having a value of @code{0} if
1442 it is empty (rather than a pointer to an empty block of memory); this
1443 invariant is maintained for argz vectors created by the functions here.
1444 @end deftypefun
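
For a complete function built on the first style of iteration, consider
the sketch below.  The helper @code{longest_element} is hypothetical and
simply remembers the longest element seen so far.

```c
#include <argz.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the longest element of the
   argz vector, or 0 if the vector is empty.  */
char *
longest_element (char *argz, size_t argz_len)
{
  char *best = 0;
  char *entry = 0;
  while ((entry = argz_next (argz, argz_len, entry)))
    if (best == 0 || strlen (entry) > strlen (best))
      best = entry;
  return best;
}
```

Because this style starts from an @var{entry} of @code{0}, it does not
rely on an empty vector being represented by a null pointer.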
1445
1446 @comment argz.h
1447 @comment GNU
1448 @deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}})
Replace any occurrences of the string @var{str} in @var{argz} with
@var{with}, reallocating @var{argz} as necessary.  If
@var{replace_count} is non-zero, @code{*@var{replace_count}} will be
incremented by the number of replacements performed.
1453 @end deftypefun
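
Since the counter is incremented rather than set, a caller that wants
the count for a single call has to clear it first.  The hypothetical
wrapper @code{replace_counted} below sketches this; its name is an
assumption, and it returns the result as a plain @code{int}.

```c
#include <argz.h>

/* Hypothetical wrapper: replace STR with WITH in *ARGZ and store the
   count for this call alone in *COUNT.  Returns 0 or ENOMEM.  */
int
replace_counted (char **argz, size_t *argz_len,
                 const char *str, const char *with, unsigned *count)
{
  *count = 0;   /* argz_replace adds to the counter, so start at zero */
  return argz_replace (argz, argz_len, str, with, count);
}
```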
1454
1455 @node Envz Functions, , Argz Functions, Argz and Envz Vectors
1456 @subsection Envz Functions
1457
1458 Envz vectors are just argz vectors with additional constraints on the form
1459 of each element; as such, argz functions can also be used on them, where it
1460 makes sense.
1461
1462 Each element in an envz vector is a name-value pair, separated by a @code{'='}
1463 character; if multiple @code{'='} characters are present in an element, those
1464 after the first are considered part of the value, and treated like all other
1465 non-@code{'\0'} characters.
1466
If @emph{no} @code{'='} characters are present in an element, that element is
considered the name of a ``null'' entry, as distinct from an entry with an
empty value: @code{envz_get} will return @code{0} if given the name of a null
entry, whereas an entry with an empty value would result in a value of
@code{""}; @code{envz_entry} will still find such entries, however.  Null
entries can be removed with the @code{envz_strip} function.
1473
1474 As with argz functions, envz functions that may allocate memory (and thus
1475 fail) have a return type of @code{error_t}, and return either @code{0} or
1476 @code{ENOMEM}.
1477
1478 @pindex envz.h
1479 These functions are declared in the standard include file @file{envz.h}.
1480
1481 @comment envz.h
1482 @comment GNU
1483 @deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
1484 The @code{envz_entry} function finds the entry in @var{envz} with the name
1485 @var{name}, and returns a pointer to the whole entry---that is, the argz
1486 element which begins with @var{name} followed by a @code{'='} character.  If
1487 there is no entry with that name, @code{0} is returned.
1488 @end deftypefun
1489
1490 @comment envz.h
1491 @comment GNU
1492 @deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
1493 The @code{envz_get} function finds the entry in @var{envz} with the name
1494 @var{name} (like @code{envz_entry}), and returns a pointer to the value
1495 portion of that entry (following the @code{'='}).  If there is no entry with
1496 that name (or only a null entry), @code{0} is returned.
1497 @end deftypefun
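
The difference between a missing name, a null entry, and an entry with
a value can be told apart by combining @code{envz_entry} and
@code{envz_get}.  The helper @code{classify_name} below is a
hypothetical sketch, and its return codes are assumptions chosen for
the example.

```c
#include <envz.h>

/* Hypothetical helper: classify NAME in ENVZ.
   Returns 0 if there is no entry for NAME,
           1 if NAME is a null entry (no '=' in the element),
           2 if NAME has a value, which may be the empty string.  */
int
classify_name (const char *envz, size_t envz_len, const char *name)
{
  if (envz_entry (envz, envz_len, name) == 0)
    return 0;
  if (envz_get (envz, envz_len, name) == 0)
    return 1;
  return 2;
}
```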
1498
1499 @comment envz.h
1500 @comment GNU
1501 @deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value})
1502 The @code{envz_add} function adds an entry to @code{*@var{envz}}
1503 (updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name
1504 @var{name}, and value @var{value}.  If an entry with the same name
1505 already exists in @var{envz}, it is removed first.  If @var{value} is
@code{0}, then the new entry will be the special null type of entry
1507 (mentioned above).
1508 @end deftypefun
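
Because an existing entry with the same name is removed first,
@code{envz_add} can also serve to update a value.  The hypothetical
helper @code{set_number} sketched below formats an integer and stores
it under @var{name}; the helper name and the buffer size are
assumptions.

```c
#include <envz.h>
#include <stdio.h>

/* Hypothetical helper: set NAME to the decimal representation of N,
   replacing any previous entry for NAME.  Returns 0 or ENOMEM.  */
int
set_number (char **envz, size_t *envz_len, const char *name, long n)
{
  char buf[32];   /* assumed large enough for any long value */
  snprintf (buf, sizeof buf, "%ld", n);
  return envz_add (envz, envz_len, name, buf);
}
```

Calling the helper twice with the same name leaves a single entry
holding the most recent value.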
1509
1510 @comment envz.h
1511 @comment GNU
1512 @deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override})
1513 The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz},
1514 as if with @code{envz_add}, updating @code{*@var{envz}} and
1515 @code{*@var{envz_len}}.  If @var{override} is true, then values in @var{envz2}
1516 will supersede those with the same name in @var{envz}, otherwise not.
1517
1518 Null entries are treated just like other entries in this respect, so a null
1519 entry in @var{envz} can prevent an entry of the same name in @var{envz2} from
1520 being added to @var{envz}, if @var{override} is false.
1521 @end deftypefun
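
One plausible use of a false @var{override} is to fill in default
values.  The helper @code{apply_defaults} below is a hypothetical
sketch; the names @code{defaults} and @code{defaults_len} are
assumptions.

```c
#include <envz.h>

/* Hypothetical helper: add each entry of DEFAULTS to *ENVZ unless
   *ENVZ already has an entry, null or otherwise, with the same name.
   Returns 0 or ENOMEM.  */
int
apply_defaults (char **envz, size_t *envz_len,
                const char *defaults, size_t defaults_len)
{
  /* A final argument of 0 means existing entries are kept.  */
  return envz_merge (envz, envz_len, defaults, defaults_len, 0);
}
```

Note that a null entry in @code{*@var{envz}} blocks the corresponding
default, so @code{envz_get} continues to return @code{0} for that name.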
1522
1523 @comment envz.h
1524 @comment GNU
1525 @deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len})
1526 The @code{envz_strip} function removes any null entries from @var{envz},
1527 updating @code{*@var{envz}} and @code{*@var{envz_len}}.
1528 @end deftypefun
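
As envz vectors are also argz vectors, @code{argz_count} can be used to
observe the effect of @code{envz_strip}.  The helper
@code{strip_and_count} below is a hypothetical sketch that reports how
many null entries were removed.

```c
#include <envz.h>

/* Hypothetical helper: remove all null entries from *ENVZ and return
   the number of entries that were removed.  */
size_t
strip_and_count (char **envz, size_t *envz_len)
{
  size_t before = argz_count (*envz, *envz_len);
  envz_strip (envz, envz_len);
  return before - argz_count (*envz, *envz_len);
}
```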