doc/textutils.texi

   1 \input texinfo
   2 @c %**start of header
   3 @setfilename textutils.info
   4 @settitle GNU text utilities
   5 @c %**end of header
   6
   7 @include version.texi
   8
   9 @c Define new indices.
  10 @defcodeindex op
  11
  12 @c Put everything in one index (arbitrarily chosen to be the concept index).
  13 @syncodeindex fn cp
  14 @syncodeindex ky cp
  15 @syncodeindex op cp
  16 @syncodeindex pg cp
  17 @syncodeindex vr cp
  18
  19 @ifinfo
  20 @format
  21 START-INFO-DIR-ENTRY
  22 * Text utilities: (textutils).          GNU text utilities.
  23 * cat: (textutils)cat invocation.               Concatenate and write files.
  24 * cksum: (textutils)cksum invocation.           Print @sc{POSIX} CRC checksum.
  25 * comm: (textutils)comm invocation.             Compare sorted files by line.
  26 * csplit: (textutils)csplit invocation.         Split by context.
  27 * cut: (textutils)cut invocation.               Print selected parts of lines.
  28 * expand: (textutils)expand invocation.         Convert tabs to spaces.
  29 * fmt: (textutils)fmt invocation.               Reformat paragraph text.
  30 * fold: (textutils)fold invocation.             Wrap long input lines.
  31 * head: (textutils)head invocation.             Output the first part of files.
  32 * join: (textutils)join invocation.             Join lines on a common field.
  33 * md5sum: (textutils)md5sum invocation.         Print or check message-digests.
  34 * nl: (textutils)nl invocation.                 Number lines and write files.
  35 * od: (textutils)od invocation.                 Dump files in octal, etc.
  36 * paste: (textutils)paste invocation.           Merge lines of files.
  37 * pr: (textutils)pr invocation.                 Paginate or columnate files.
  38 * ptx: (textutils)ptx invocation.               Produce permuted indexes.
  39 * sort: (textutils)sort invocation.             Sort text files.
  40 * split: (textutils)split invocation.           Split into fixed-size pieces.
  41 * sum: (textutils)sum invocation.               Print traditional checksum.
  42 * tac: (textutils)tac invocation.               Reverse files.
  43 * tail: (textutils)tail invocation.             Output the last part of files.
  44 * tr: (textutils)tr invocation.                 Translate characters.
  45 * unexpand: (textutils)unexpand invocation.     Convert spaces to tabs.
  46 * uniq: (textutils)uniq invocation.             Uniqify files.
  47 * wc: (textutils)wc invocation.                 Byte, word, and line counts.
  48 END-INFO-DIR-ENTRY
  49 @end format
  50 @end ifinfo
  51
  52 @ifinfo
  53 This file documents the GNU text utilities.
  54
  55 Copyright (C) 1994, 95, 96 Free Software Foundation, Inc.
  56
  57 Permission is granted to make and distribute verbatim copies of
  58 this manual provided the copyright notice and this permission notice
  59 are preserved on all copies.
  60
  61 @ignore
  62 Permission is granted to process this file through TeX and print the
  63 results, provided the printed document carries copying permission
  64 notice identical to this one except for the removal of this paragraph
  65 (this paragraph not being relevant to the printed manual).
  66
  67 @end ignore
  68 Permission is granted to copy and distribute modified versions of this
  69 manual under the conditions for verbatim copying, provided that the entire
  70 resulting derived work is distributed under the terms of a permission
  71 notice identical to this one.
  72
  73 Permission is granted to copy and distribute translations of this manual
  74 into another language, under the above conditions for modified versions,
  75 except that this permission notice may be stated in a translation approved
  76 by the Foundation.
  77 @end ifinfo
  78
  79 @titlepage
  80 @title GNU @code{textutils}
  81 @subtitle A set of text utilities
  82 @subtitle for version @value{VERSION}, @value{UPDATED}
  83 @author David MacKenzie et al.
  84
  85 @page
  86 @vskip 0pt plus 1filll
  87 Copyright @copyright{} 1994, 95, 96 Free Software Foundation, Inc.
  88
  89 Permission is granted to make and distribute verbatim copies of
  90 this manual provided the copyright notice and this permission notice
  91 are preserved on all copies.
  92
  93 Permission is granted to copy and distribute modified versions of this
  94 manual under the conditions for verbatim copying, provided that the entire
  95 resulting derived work is distributed under the terms of a permission
  96 notice identical to this one.
  97
  98 Permission is granted to copy and distribute translations of this manual
  99 into another language, under the above conditions for modified versions,
 100 except that this permission notice may be stated in a translation approved
 101 by the Foundation.
 102 @end titlepage
 103
 104
 105 @ifinfo
 106 @node Top
 107 @top GNU text utilities
 108
 109 @cindex text utilities
 110 @cindex utilities for text handling
 111
 112 This manual documents version @value{VERSION} of the GNU text utilities.
 113
 114 @menu
 115 * Introduction::                       Caveats, overview, and authors.
 116 * Common options::                     Common options.
 117 * Output of entire files::             cat tac nl od
 118 * Formatting file contents::           fmt pr fold
 119 * Output of parts of files::           head tail split csplit
 120 * Summarizing files::                  wc sum cksum md5sum
  121 * Operating on sorted files::          sort uniq comm ptx
 122 * Operating on fields within a line::  cut paste join
 123 * Operating on characters::            tr expand unexpand
 124 * Opening the software toolbox::       The software tools philosophy.
 125 * Index::                              General index.
 126
 127 @detailmenu
 128  --- The Detailed Node Listing ---
 129
 130 Output of entire files
 131
 132 * cat invocation::              Concatenate and write files.
 133 * tac invocation::              Concatenate and write files in reverse.
 134 * nl invocation::               Number lines and write files.
 135 * od invocation::               Write files in octal or other formats.
 136
 137 Formatting file contents
 138
 139 * fmt invocation::              Reformat paragraph text.
 140 * pr invocation::               Paginate or columnate files for printing.
 141 * fold invocation::             Wrap input lines to fit in specified width.
 142
 143 Output of parts of files
 144
 145 * head invocation::             Output the first part of files.
 146 * tail invocation::             Output the last part of files.
 147 * split invocation::            Split a file into fixed-size pieces.
 148 * csplit invocation::           Split a file into context-determined pieces.
 149
 150 Summarizing files
 151
 152 * wc invocation::               Print byte, word, and line counts.
 153 * sum invocation::              Print checksum and block counts.
 154 * cksum invocation::            Print CRC checksum and byte counts.
 155 * md5sum invocation::           Print or check message-digests.
 156
 157 Operating on sorted files
 158
 159 * sort invocation::             Sort text files.
 160 * uniq invocation::             Uniqify files.
 161 * comm invocation::             Compare two sorted files line by line.
 162 * ptx invocation::              Produce a permuted index of file contents.
 163
 164 @code{ptx}: Produce permuted indexes
 165
 166 * General options in ptx::      Options which affect general program behaviour.
 167 * Charset selection in ptx::    Underlying character set considerations.
 168 * Input processing in ptx::     Input fields, contexts, and keyword selection.
 169 * Output formatting in ptx::    Types of output format, and sizing the fields.
  170 * Compatibility in ptx::        The GNU extensions to @code{ptx}.
 171
 172 Operating on fields within a line
 173
 174 * cut invocation::              Print selected parts of lines.
 175 * paste invocation::            Merge lines of files.
 176 * join invocation::             Join lines on a common field.
 177
 178 Operating on characters
 179
 180 * tr invocation::               Translate, squeeze, and/or delete characters.
 181 * expand invocation::           Convert tabs to spaces.
 182 * unexpand invocation::         Convert spaces to tabs.
 183
 184 @code{tr}: Translate, squeeze, and/or delete characters
 185
 186 * Character sets::              Specifying sets of characters.
  187 * Translating::                 Changing one character to another.
 188 * Squeezing::                   Squeezing repeats and deleting.
 189 * Warnings in tr::              Warning messages.
 190
 191 Opening the software toolbox
 192
 193 * Toolbox introduction::        Toolbox introduction
 194 * I/O redirection::             I/O redirection
 195 * The who command::             The @code{who} command
 196 * The cut command::             The @code{cut} command
 197 * The sort command::            The @code{sort} command
 198 * The uniq command::            The @code{uniq} command
 199 * Putting the tools together::  Putting the tools together
 200
 201 @end detailmenu
 202 @end menu
 203
 204 @end ifinfo
 205
 206
 207 @node Introduction
 208 @chapter Introduction
 209
 210 @cindex introduction
 211
  212 This manual is incomplete: it makes no attempt to explain basic concepts
  213 in a way suitable for novices.  If you are interested, please get
  214 involved in improving this manual.  The entire GNU community will
  215 benefit.
 216
 217 @cindex POSIX.2
 218 The GNU text utilities are mostly compatible with the @sc{POSIX.2} standard.
 219
 220 @c This paragraph appears in all of fileutils.texi, textutils.texi, and
 221 @c sh-utils.texi too -- so be sure to keep them consistent.
 222 @cindex bugs, reporting
 223 Please report bugs to @email{bug-textutils@@gnu.org}.  Remember
 224 to include the version number, machine architecture, input files, and
 225 any other information needed to reproduce the bug: your input, what you
 226 expected, what you got, and why it is wrong.  Diffs are welcome, but
 227 please include a description of the problem as well, since this is
 228 sometimes difficult to infer. @xref{Bugs, , , gcc, GNU CC}.
 229
 230 This manual was originally derived from the Unix man pages in the
 231 distribution, which were written by David MacKenzie and updated by Jim
 232 Meyering.  What you are reading now is the authoritative documentation
  233 for these utilities; the man pages are no longer being maintained.
 234 The original @code{fmt} man page was written by Ross Paterson.
 235 Fran@,{c}ois Pinard did the initial conversion to Texinfo format.
 236 Karl Berry did the indexing, some reorganization, and editing of the results.
 237 Richard Stallman contributed his usual invaluable insights to the
 238 overall process.
 239
 240
 241 @node Common options
 242 @chapter Common options
 243
 244 @cindex common options
 245
  246 Certain options are available in all these programs.  Rather than
  247 repeating identical descriptions for each program, this manual
  248 describes them here.  (In fact, every GNU program accepts (or should accept)
  249 these options.)
 250
 251 A few of these programs take arbitrary strings as arguments.  In those
 252 cases, @samp{--help} and @samp{--version} are taken as these options
 253 only if there is one and exactly one command line argument.
 254
 255 @table @samp
 256
 257 @item --help
 258 @opindex --help
 259 @cindex help, online
 260 Print a usage message listing all available options, then exit successfully.
 261
 262 @item --version
 263 @opindex --version
 264 @cindex version number, finding
 265 Print the version number, then exit successfully.
 266
 267 @end table
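
As a quick check of the behavior described above, the following sketch
runs @code{cat} with each common option and records the exit status.
The variable names are purely illustrative, and the text each option
prints differs between versions, so it is discarded here.

```shell
# Both options write to standard output and then exit successfully.
cat --help > /dev/null
help_status=$?

cat --version > /dev/null
version_status=$?

echo "help: $help_status, version: $version_status"
```

Any of the other programs in this manual could be substituted for
@code{cat} in this sketch.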
 268
 269
 270 @node Output of entire files
 271 @chapter Output of entire files
 272
 273 @cindex output of entire files
 274 @cindex entire files, output of
 275
 276 These commands read and write entire files, possibly transforming them
 277 in some way.
 278
 279 @menu
 280 * cat invocation::              Concatenate and write files.
 281 * tac invocation::              Concatenate and write files in reverse.
 282 * nl invocation::               Number lines and write files.
 283 * od invocation::               Write files in octal or other formats.
 284 @end menu
 285
 286 @node cat invocation
 287 @section @code{cat}: Concatenate and write files
 288
 289 @pindex cat
 290 @cindex concatenate and write files
 291 @cindex copying files
 292
 293 @code{cat} copies each @var{file} (@samp{-} means standard input), or
 294 standard input if none are given, to standard output.  Synopsis:
 295
 296 @example
  297 cat [@var{option}]@dots{} [@var{file}]@dots{}
 298 @end example
 299
 300 The program accepts the following options.  Also see @ref{Common options}.
 301
 302 @table @samp
 303
 304 @item -A
 305 @itemx --show-all
 306 @opindex -A
 307 @opindex --show-all
 308 Equivalent to @samp{-vET}.
 309
 310 @item -b
 311 @itemx --number-nonblank
 312 @opindex -b
 313 @opindex --number-nonblank
 314 Number all nonblank output lines, starting with 1.
 315
 316 @item -e
 317 @opindex -e
 318 Equivalent to @samp{-vE}.
 319
 320 @item -E
 321 @itemx --show-ends
 322 @opindex -E
 323 @opindex --show-ends
 324 Display a @samp{$} after the end of each line.
 325
 326 @item -n
 327 @itemx --number
 328 @opindex -n
 329 @opindex --number
 330 Number all output lines, starting with 1.
 331
 332 @item -s
 333 @itemx --squeeze-blank
 334 @opindex -s
 335 @opindex --squeeze-blank
 336 @cindex squeezing blank lines
 337 Replace multiple adjacent blank lines with a single blank line.
 338
 339 @item -t
 340 @opindex -t
 341 Equivalent to @samp{-vT}.
 342
 343 @item -T
 344 @itemx --show-tabs
 345 @opindex -T
 346 @opindex --show-tabs
 347 Display @key{TAB} characters as @samp{^I}.
 348
 349 @item -u
 350 @opindex -u
 351 Ignored; for Unix compatibility.
 352
 353 @item -v
 354 @itemx --show-nonprinting
 355 @opindex -v
 356 @opindex --show-nonprinting
 357 Display control characters except for @key{LFD} and @key{TAB} using
 358 @samp{^} notation and precede characters that have the high bit set
 359 with @samp{M-}.
 360
 361 @end table
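
A small sketch of two of the options above, assuming a shell with
@code{printf} available; the sample input and variable names are
invented for illustration.

```shell
# -A (the same as -vET) shows each tab as ^I and marks each line end with $.
shown=$(printf 'one\ttwo\n' | cat -A)
echo "$shown"

# -s replaces each run of adjacent blank lines with a single blank line.
squeezed=$(printf 'first\n\n\n\nsecond\n' | cat -s)
echo "$squeezed"
```

The first command makes otherwise invisible characters visible, which is
often the quickest way to see why two files that look alike differ.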
 362
 363
 364 @node tac invocation
 365 @section @code{tac}: Concatenate and write files in reverse
 366
 367 @pindex tac
 368 @cindex reversing files
 369
 370 @code{tac} copies each @var{file} (@samp{-} means standard input), or
 371 standard input if none are given, to standard output, reversing the
 372 records (lines by default) in each separately.  Synopsis:
 373
 374 @example
 375 tac [@var{option}]@dots{} [@var{file}]@dots{}
 376 @end example
 377
 378 @dfn{Records} are separated by instances of a string (newline by
 379 default).  By default, this separator string is attached to the end of
 380 the record that it follows in the file.
 381
 382 The program accepts the following options.  Also see @ref{Common options}.
 383
 384 @table @samp
 385
 386 @item -b
 387 @itemx --before
 388 @opindex -b
 389 @opindex --before
 390 The separator is attached to the beginning of the record that it
 391 precedes in the file.
 392
 393 @item -r
 394 @itemx --regex
 395 @opindex -r
 396 @opindex --regex
 397 Treat the separator string as a regular expression.
 398
 399 @item -s @var{separator}
 400 @itemx --separator=@var{separator}
 401 @opindex -s
 402 @opindex --separator
 403 Use @var{separator} as the record separator, instead of newline.
 404
 405 @end table
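
The following sketch illustrates the default line-based reversal and the
@samp{-s} option; the input strings and variable names are made up for
the example.

```shell
# Default: records are lines, written out last to first.
reversed=$(printf 'a\nb\nc\n' | tac)
echo "$reversed"

# With -s, each record ends at a comma rather than at a newline.
# The separator stays attached to the end of the record it follows.
by_comma=$(printf 'a,b,c,' | tac -s ,)
echo "$by_comma"
```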
 406
 407
 408 @node nl invocation
 409 @section @code{nl}: Number lines and write files
 410
 411 @pindex nl
 412 @cindex numbering lines
 413 @cindex line numbering
 414
 415 @code{nl} writes each @var{file} (@samp{-} means standard input), or
 416 standard input if none are given, to standard output, with line numbers
 417 added to some or all of the lines.  Synopsis:
 418
 419 @example
 420 nl [@var{option}]@dots{} [@var{file}]@dots{}
 421 @end example
 422
 423 @cindex logical pages, numbering on
 424 @code{nl} decomposes its input into (logical) pages; by default, the
 425 line number is reset to 1 at the top of each logical page.  @code{nl}
 426 treats all of the input files as a single document; it does not reset
 427 line numbers or logical pages between files.
 428
 429 @cindex headers, numbering
 430 @cindex body, numbering
 431 @cindex footers, numbering
 432 A logical page consists of three sections: header, body, and footer.
 433 Any of the sections can be empty.  Each can be numbered in a different
 434 style from the others.
 435
 436 The beginnings of the sections of logical pages are indicated in the
 437 input file by a line containing exactly one of these delimiter strings:
 438
 439 @table @samp
 440 @item \:\:\:
 441 start of header;
 442 @item \:\:
 443 start of body;
 444 @item \:
 445 start of footer.
 446 @end table
 447
 448 The two characters from which these strings are made can be changed from
 449 @samp{\} and @samp{:} via options (see below), but the pattern and
 450 length of each string cannot be changed.
 451
 452 A section delimiter is replaced by an empty line on output.  Any text
 453 that comes before the first section delimiter string in the input file
 454 is considered to be part of a body section, so @code{nl} treats a
 455 file that contains no section delimiters as a single body section.
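
The section delimiters described above can be tried with a short,
invented input.  This is only a sketch: it builds one logical page with
a header, two body lines, and a footer, and then counts the lines that
received a number under the default numbering styles.

```shell
# In the printf format, \\: produces the two characters \: in the input.
page=$(printf '\\:\\:\\:\nTitle\n\\:\\:\nfirst\nsecond\n\\:\nEnd\n' | nl)
echo "$page"

# Under the default styles only the body lines are numbered.
numbered=$(printf '%s\n' "$page" | grep -c '[0-9]')
echo "$numbered"
```

The header and footer lines pass through unnumbered, and each delimiter
line appears as an empty line in the output.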
 456
 457 The program accepts the following options.  Also see @ref{Common options}.
 458
 459 @table @samp
 460
 461 @item -b @var{style}
 462 @itemx --body-numbering=@var{style}
 463 @opindex -b
 464 @opindex --body-numbering
 465 Select the numbering style for lines in the body section of each
 466 logical page.  When a line is not numbered, the current line number
 467 is not incremented, but the line number separator character is still
 468 prepended to the line.  The styles are:
 469
 470 @table @samp
 471 @item a
 472 number all lines,
 473 @item t
 474 number only nonempty lines (default for body),
 475 @item n
 476 do not number lines (default for header and footer),
 477 @item p@var{regexp}
 478 number only lines that contain a match for @var{regexp}.
 479 @end table
 480
 481 @item -d @var{cd}
 482 @itemx --section-delimiter=@var{cd}
 483 @opindex -d
 484 @opindex --section-delimiter
 485 @cindex section delimiters of pages
 486 Set the section delimiter characters to @var{cd}; default is
 487 @samp{\:}. If only @var{c} is given, the second remains @samp{:}.
 488 (Remember to protect @samp{\} or other metacharacters from shell
 489 expansion with quotes or extra backslashes.)
 490
 491 @item -f @var{style}
 492 @itemx --footer-numbering=@var{style}
 493 @opindex -f
 494 @opindex --footer-numbering
 495 Analogous to @samp{--body-numbering}.
 496
 497 @item -h @var{style}
 498 @itemx --header-numbering=@var{style}
 499 @opindex -h
 500 @opindex --header-numbering
 501 Analogous to @samp{--body-numbering}.
 502
 503 @item -i @var{number}
 504 @itemx --page-increment=@var{number}
 505 @opindex -i
 506 @opindex --page-increment
 507 Increment line numbers by @var{number} (default 1).
 508
 509 @item -l @var{number}
 510 @itemx --join-blank-lines=@var{number}
 511 @opindex -l
 512 @opindex --join-blank-lines
 513 @cindex empty lines, numbering
 514 @cindex blank lines, numbering
 515 Consider @var{number} (default 1) consecutive empty lines to be one
 516 logical line for numbering, and only number the last one.  Where fewer
 517 than @var{number} consecutive empty lines occur, do not number them.
 518 An empty line is one that contains no characters, not even spaces
 519 or tabs.
 520
 521 @item -n @var{format}
 522 @itemx --number-format=@var{format}
 523 @opindex -n
 524 @opindex --number-format
 525 Select the line numbering format (default is @code{rn}):
 526
 527 @table @samp
 528 @item ln
 529 @opindex ln @r{format for @code{nl}}
 530 left justified, no leading zeros;
 531 @item rn
 532 @opindex rn @r{format for @code{nl}}
 533 right justified, no leading zeros;
 534 @item rz
 535 @opindex rz @r{format for @code{nl}}
 536 right justified, leading zeros.
 537 @end table
 538
 539 @item -p
 540 @itemx --no-renumber
 541 @opindex -p
 542 @opindex --no-renumber
 543 Do not reset the line number at the start of a logical page.
 544
 545 @item -s @var{string}
 546 @itemx --number-separator=@var{string}
 547 @opindex -s
 548 @opindex --number-separator
 549 Separate the line number from the text line in the output with
 550 @var{string} (default is @key{TAB}).
 551
 552 @item -v @var{number}
 553 @itemx --starting-line-number=@var{number}
 554 @opindex -v
 555 @opindex --starting-line-number
 556 Set the initial line number on each logical page to @var{number} (default 1).
 557
 558 @item -w @var{number}
 559 @itemx --number-width=@var{number}
 560 @opindex -w
 561 @opindex --number-width
 562 Use @var{number} characters for line numbers (default 6).
 563
 564 @end table
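
As an illustration of how the formatting options combine, here is a
sketch using invented input; the variable names are not part of
@code{nl}.

```shell
# Number every line (-b a) with zero-padded, 3-character numbers
# (-n rz -w 3), separated from the text by a colon and a space (-s).
listing=$(printf 'alpha\n\nbeta\n' | nl -b a -n rz -w 3 -s ': ')
echo "$listing"

# Start at 10 (-v) and count in steps of 5 (-i), using 2-character numbers.
stepped=$(printf 'x\ny\n' | nl -v 10 -i 5 -w 2 -s ' ')
echo "$stepped"
```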
 565
 566
 567 @node od invocation
 568 @section @code{od}: Write files in octal or other formats
 569
 570 @pindex od
 571 @cindex octal dump of files
 572 @cindex hex dump of files
 573 @cindex ASCII dump of files
 574 @cindex file contents, dumping unambiguously
 575
 576 @code{od} writes an unambiguous representation of each @var{file}
 577 (@samp{-} means standard input), or standard input if none are given.
 578 Synopsis:
 579
 580 @example
 581 od [@var{option}]@dots{} [@var{file}]@dots{}
 582 od -C [@var{file}] [[+]@var{offset} [[+]@var{label}]]
 583 @end example
 584
 585 Each line of output consists of the offset in the input, followed by
 586 groups of data from the file. By default, @code{od} prints the offset in
 587 octal, and each group of file data is two bytes of input printed as a
 588 single octal number.
 589
 590 The program accepts the following options.  Also see @ref{Common options}.
 591
 592 @table @samp
 593
 594 @item -A @var{radix}
 595 @itemx --address-radix=@var{radix}
 596 @opindex -A
 597 @opindex --address-radix
 598 @cindex radix for file offsets
 599 @cindex file offset radix
 600 Select the base in which file offsets are printed.  @var{radix} can
 601 be one of the following:
 602
 603 @table @samp
 604 @item d
 605 decimal;
 606 @item o
 607 octal;
 608 @item x
 609 hexadecimal;
 610 @item n
 611 none (do not print offsets).
 612 @end table
 613
 614 The default is octal.
 615
 616 @item -j @var{bytes}
 617 @itemx --skip-bytes=@var{bytes}
 618 @opindex -j
 619 @opindex --skip-bytes
 620 Skip @var{bytes} input bytes before formatting and writing.  If
 621 @var{bytes} begins with @samp{0x} or @samp{0X}, it is interpreted in
 622 hexadecimal; otherwise, if it begins with @samp{0}, in octal; otherwise,
 623 in decimal.  Appending @samp{b} multiplies @var{bytes} by 512, @samp{k}
 624 by 1024, and @samp{m} by 1048576.
 625
 626 @item -N @var{bytes}
 627 @itemx --read-bytes=@var{bytes}
 628 @opindex -N
 629 @opindex --read-bytes
 630 Output at most @var{bytes} bytes of the input.  Prefixes and suffixes on
  631 @var{bytes} are interpreted as for the @samp{-j} option.
 632
 633 @item -s [@var{n}]
 634 @itemx --strings[=@var{n}]
 635 @opindex -s
 636 @opindex --strings
 637 @cindex string constants, outputting
 638 Instead of the normal output, output only @dfn{string constants}: at
 639 least @var{n} (3 by default) consecutive ASCII graphic characters,
 640 followed by a null (zero) byte.
 641
 642 @item -t @var{type}
 643 @itemx --format=@var{type}
 644 @opindex -t
 645 @opindex --format
 646 Select the format in which to output the file data.  @var{type} is a
 647 string of one or more of the below type indicator characters.  If you
 648 include more than one type indicator character in a single @var{type}
 649 string, or use this option more than once, @code{od} writes one copy
 650 of each output line using each of the data types that you specified,
 651 in the order that you specified.
 652
 653 Adding a trailing ``z'' to any type specification appends a display
 654 of the ASCII character representation of the printable characters
 655 to the output line generated by the type specification.
 656
 657 @table @samp
 658 @item a
 659 named character,
 660 @item c
 661 ASCII character or backslash escape,
 662 @item d
 663 signed decimal,
 664 @item f
 665 floating point,
 666 @item o
 667 octal,
 668 @item u
 669 unsigned decimal,
 670 @item x
 671 hexadecimal.
 672 @end table
 673
 674 The type @code{a} outputs things like @samp{sp} for space, @samp{nl} for
 675 newline, and @samp{nul} for a null (zero) byte.  Type @code{c} outputs
  676 @samp{ }, @samp{\n}, and @samp{\0}, respectively.
 677
 678 @cindex type size
 679 Except for types @samp{a} and @samp{c}, you can specify the number
 680 of bytes to use in interpreting each number in the given data type
 681 by following the type indicator character with a decimal integer.
  682 Alternatively, you can specify the size of one of the C compiler's
 683 built-in data types by following the type indicator character with
 684 one of the following characters.  For integers (@samp{d}, @samp{o},
 685 @samp{u}, @samp{x}):
 686
 687 @table @samp
 688 @item C
 689 char,
 690 @item S
 691 short,
 692 @item I
 693 int,
 694 @item L
 695 long.
 696 @end table
 697
  698 For floating point (@samp{f}):
  699
  700 @table @samp
 701 @item F
 702 float,
 703 @item D
 704 double,
 705 @item L
 706 long double.
 707 @end table
 708
 709 @item -v
 710 @itemx --output-duplicates
 711 @opindex -v
 712 @opindex --output-duplicates
 713 Output consecutive lines that are identical.  By default, when two or
 714 more consecutive output lines would be identical, @code{od} outputs only
 715 the first line, and puts just an asterisk on the following line to
 716 indicate the elision.
 717
 718 @item -w[@var{n}]
 719 @itemx --width[=@var{n}]
 720 @opindex -w
 721 @opindex --width
  722 Dump @var{n} input bytes per output line.  This must be a multiple of
 723 the least common multiple of the sizes associated with the specified
 724 output types.  If @var{n} is omitted, the default is 32.  If this option
 725 is not given at all, the default is 16.
 726
 727 @end table
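
The following sketch shows a few of these options working together on
short, invented inputs.  Offsets are suppressed with @samp{-A n} so that
only the data groups remain; the exact spacing between groups may vary
between implementations.

```shell
# Dump three bytes as one-byte hexadecimal values, without offsets.
hex=$(printf 'AB\n' | od -A n -t x1)
echo "$hex"

# Skip two input bytes (-j), then dump at most three (-N).
slice=$(printf 'abcdef' | od -A n -t x1 -j 2 -N 3)
echo "$slice"
```

The first dump contains the byte values of @samp{A}, @samp{B}, and
newline; the second contains those of @samp{c}, @samp{d}, and @samp{e}.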
 728
 729 The next several options map the old, pre-@sc{POSIX} format specification
 730 options to the corresponding @sc{POSIX} format specs.  GNU @code{od} accepts
 731 any combination of old- and new-style options.  Format specification
 732 options accumulate.
 733
 734 @table @samp
 735
 736 @item -a
 737 @opindex -a
 738 Output as named characters.  Equivalent to @samp{-ta}.
 739
 740 @item -b
 741 @opindex -b
 742 Output as octal bytes.  Equivalent to @samp{-toC}.
 743
 744 @item -c
 745 @opindex -c
 746 Output as ASCII characters or backslash escapes.  Equivalent to
 747 @samp{-tc}.
 748
 749 @item -d
 750 @opindex -d
 751 Output as unsigned decimal shorts.  Equivalent to @samp{-tu2}.
 752
 753 @item -f
 754 @opindex -f
 755 Output as floats.  Equivalent to @samp{-tfF}.
 756
 757 @item -h
 758 @opindex -h
 759 Output as hexadecimal shorts.  Equivalent to @samp{-tx2}.
 760
 761 @item -i
 762 @opindex -i
 763 Output as decimal shorts.  Equivalent to @samp{-td2}.
 764
 765 @item -l
 766 @opindex -l
 767 Output as decimal longs.  Equivalent to @samp{-td4}.
 768
 769 @item -o
 770 @opindex -o
 771 Output as octal shorts.  Equivalent to @samp{-to2}.
 772
 773 @item -x
 774 @opindex -x
 775 Output as hexadecimal shorts.  Equivalent to @samp{-tx2}.
 776
 777 @item -C
 778 @itemx --traditional
 779 @opindex --traditional
 780 Recognize the pre-POSIX non-option arguments that traditional @code{od}
 781 accepted.  The following syntax:
 782
 783 @example
 784 od --traditional [@var{file}] [[+]@var{offset}[.][b] [[+]@var{label}[.][b]]]
 785 @end example
 786
 787 @noindent
 788 can be used to specify at most one file and optional arguments
 789 specifying an offset and a pseudo-start address, @var{label}.  By
 790 default, @var{offset} is interpreted as an octal number specifying how
 791 many input bytes to skip before formatting and writing.  The optional
 792 trailing decimal point forces the interpretation of @var{offset} as a
  793 decimal number.  If no decimal point is specified and the offset begins with
 794 @samp{0x} or @samp{0X} it is interpreted as a hexadecimal number.  If
 795 there is a trailing @samp{b}, the number of bytes skipped will be
 796 @var{offset} multiplied by 512.  The @var{label} argument is interpreted
 797 just like @var{offset}, but it specifies an initial pseudo-address.  The
 798 pseudo-addresses are displayed in parentheses following any normal
 799 address.
 800
 801 @end table
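
One way to convince yourself of the equivalences listed above is to
compare the two forms on the same input.  This is a sketch with an
arbitrary sample string; the variable names are illustrative only.

```shell
sample='GNU text utilities'

# Each old-style option should produce the same dump as its -t form.
old_b=$(printf '%s\n' "$sample" | od -b)
new_b=$(printf '%s\n' "$sample" | od -t oC)

old_x=$(printf '%s\n' "$sample" | od -x)
new_x=$(printf '%s\n' "$sample" | od -t x2)

[ "$old_b" = "$new_b" ] && echo "-b matches -t oC"
[ "$old_x" = "$new_x" ] && echo "-x matches -t x2"
```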
 802
 803
 804 @node Formatting file contents
 805 @chapter Formatting file contents
 806
 807 @cindex formatting file contents
 808
 809 These commands reformat the contents of files.
 810
 811 @menu
 812 * fmt invocation::              Reformat paragraph text.
 813 * pr invocation::               Paginate or columnate files for printing.
 814 * fold invocation::             Wrap input lines to fit in specified width.
 815 @end menu
 816
 817
 818 @node fmt invocation
 819 @section @code{fmt}: Reformat paragraph text
 820
 821 @pindex fmt
 822 @cindex reformatting paragraph text
 823 @cindex paragraphs, reformatting
 824 @cindex text, reformatting
 825
 826 @code{fmt} fills and joins lines to produce output lines of (at most)
 827 a given number of characters (75 by default).  Synopsis:
 828
 829 @example
 830 fmt [@var{option}]@dots{} [@var{file}]@dots{}
 831 @end example
 832
 833 @code{fmt} reads from the specified @var{file} arguments (or standard
 834 input if none are given), and writes to standard output.
 835
 836 By default, blank lines, spaces between words, and indentation are
 837 preserved in the output; successive input lines with different
 838 indentation are not joined; tabs are expanded on input and introduced on
 839 output.
 840
 841 @cindex line-breaking
 842 @cindex sentences and line-breaking
 843 @cindex Knuth, Donald E.
 844 @cindex Plass, Michael F.
 845 @code{fmt} prefers breaking lines at the end of a sentence, and tries to
 846 avoid line breaks after the first word of a sentence or before the last
 847 word of a sentence.  A @dfn{sentence break} is defined as either the end
 848 of a paragraph or a word ending in any of @samp{.?!}, followed by two
 849 spaces or end of line, ignoring any intervening parentheses or quotes.
 850 Like @TeX{}, @code{fmt} reads entire ``paragraphs'' before choosing line
 851 breaks; the algorithm is a variant of that in ``Breaking Paragraphs Into
 852 Lines'' (Donald E. Knuth and Michael F. Plass, @cite{Software---Practice
 853 and Experience}, 11 (1981), 1119--1184).
 854
 855 The program accepts the following options.  Also see @ref{Common options}.
 856
 857 @table @samp
 858
 859 @item -c
 860 @itemx --crown-margin
 861 @opindex -c
 862 @opindex --crown-margin
 863 @cindex crown margin
 864 @dfn{Crown margin} mode: preserve the indentation of the first two
 865 lines within a paragraph, and align the left margin of each subsequent
 866 line with that of the second line.
 867
 868 @item -t
 869 @itemx --tagged-paragraph
 870 @opindex -t
 871 @opindex --tagged-paragraph
 872 @cindex tagged paragraphs
 873 @dfn{Tagged paragraph} mode: like crown margin mode, except that if
 874 indentation of the first line of a paragraph is the same as the
 875 indentation of the second, the first line is treated as a one-line
 876 paragraph.
 877
 878 @item -s
 879 @itemx --split-only
 880 @opindex -s
 881 @opindex --split-only
  882 Split lines only.  Do not join short lines to form longer ones.  This
  883 prevents sample lines of code and other such ``formatted'' text from
  884 being unduly combined.
 885
 886 @item -u
 887 @itemx --uniform-spacing
 888 @opindex -u
 889 @opindex --uniform-spacing
 890 Uniform spacing.  Reduce spacing between words to one space, and spacing
 891 between sentences to two spaces.
 892
 893 @item -@var{width}
 894 @itemx -w @var{width}
 895 @itemx --width=@var{width}
 896 @opindex -@var{width}
 897 @opindex -w
 898 @opindex --width
 899 Fill output lines up to @var{width} characters (default 75).  @code{fmt}
 900 initially tries to make lines about 7% shorter than this, to give it
 901 room to balance line lengths.
 902
 903 @item -p @var{prefix}
 904 @itemx --prefix=@var{prefix}
 905 Only lines beginning with @var{prefix} (possibly preceded by whitespace)
 906 are subject to formatting. The prefix and any preceding whitespace are
 907 stripped for the formatting and then re-attached to each formatted output
 908 line.  One use is to format certain kinds of program comments, while
 909 leaving the code unchanged.
 910
 911 @end table
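
As a rough illustration of the options above, the following shell sketch
compares default filling with @samp{-s} and @samp{-w}.  The file names
@file{para.txt} and @file{long.txt} are hypothetical scratch files, and
the comments describe what GNU @code{fmt} is expected to do, not
captured output.

@example
# Hypothetical scratch files, created only for this illustration.
printf 'short\nlines\n' > para.txt
printf 'aaa bbb ccc ddd eee fff ggg hhh\n' > long.txt

joined=$(fmt para.txt)          # default: the two short lines are joined
split_only=$(fmt -s para.txt)   # -s: short lines are left unjoined
narrow=$(fmt -w 12 long.txt)    # -w 12: no output line exceeds 12 characters
@end example

@noindent
Capturing the results in shell variables is only a convenience for
comparing them; in ordinary use the output would go straight to a file
or a pipe.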
 912
 913
 914 @node pr invocation
 915 @section @code{pr}: Paginate or columnate files for printing
 916
 917 @pindex pr
 918 @cindex printing, preparing files for
 919 @cindex multicolumn output, generating
 920 @cindex merging files in parallel
 921
 922 @code{pr} writes each @var{file} (@samp{-} means standard input), or
 923 standard input if none are given, to standard output, paginating and
 924 optionally outputting in multicolumn format; optionally merges all
 925 @var{file}s, printing all in parallel, one per column.  Synopsis:
 926
 927 @example
 928 pr [@var{option}]@dots{} [@var{file}]@dots{}
 929 @end example
 930
 931 By default, a 5-line header is printed: two blank lines; a line with the
 932 date, the file name, and the page count; and two more blank lines.  A
 933 footer of five blank lines is also printed.  With the @samp{-f} option, a
 934 3-line header is printed: the leading two blank lines are omitted, and no
 935 footer is used.  The default @var{page_length} in both cases is 66 lines.
 936 The text line of the header takes up the full @var{page_width} in the
 937 form @samp{yy-mm-dd HH:MM string Page nnnn}, where @samp{string} is
 938 centered.
 939
 940 Form feeds in the input cause page breaks in the output. Multiple form
 941 feeds produce empty pages.
 942
 943 Columns have equal width and are separated by an optional string (default
 944 is a space).  In multicolumn output, lines are truncated to the page width
 945 (default 72) unless you use the @samp{-j} option.  For single-column output,
 946 no line truncation occurs by default; use the @samp{-w} option to truncate
 947 lines in that case.
 948
 949 The program accepts the following options.  Also see @ref{Common options}.
 950
 951 @table @samp
 952
 953 @item +@var{first_page}[:@var{last_page}]
 954 @itemx --pages=@var{first_page}[:@var{last_page}]
 955 @opindex +@var{first_page}[:@var{last_page}]
 956 @opindex --pages
 957 Begin printing with page @var{first_page} and stop with
 958 @var{last_page}.  A missing @samp{:@var{last_page}} implies end of file.  While
 959 estimating the number of skipped pages, each form feed in the input file
 960 results in a new page.  Page counting with and without
 961 @samp{+@var{first_page}} is identical.  By default, counting starts with the
 962 first page of the input file (not the first page printed).  Line numbering may be
 963 altered by the @samp{-N} option.
 964
 965 @item -@var{column}
 966 @itemx --columns=@var{column}
 967 @opindex -@var{column}
 968 @opindex --columns
 969 @cindex down columns
 970 With each single @var{file}, produce @var{column}-column output and
 971 print columns down. The column width is automatically estimated from
 972 @var{page_width}. This option might well cause some columns to be
 973 truncated. The number of lines in the columns on each page will be
 974 balanced. @samp{-@var{column}} may not be used with @samp{-m} option.
 975
 976 @item -a
 977 @itemx --across
 978 @opindex -a
 979 @opindex --across
 980 @cindex across columns
 981 With each single @var{file}, print columns across rather than down.
 982 @var{column} must be greater than one.
 983
 984 @item -c
 985 @itemx --show-control-chars
 986 @opindex -c
 987 @opindex --show-control-chars
 988 Print control characters using hat notation (e.g., @samp{^G}); print
 989 other unprintable characters in octal backslash notation.  By default,
 990 unprintable characters are not changed.
 991
 992 @item -d
 993 @itemx --double-space
 994 @opindex -d
 995 @opindex --double-space
 996 @cindex double spacing
 997 Double space the output.
 998
 999 @item -e[@var{in-tabchar}[@var{in-tabwidth}]]
1000 @itemx --expand-tabs[=@var{in-tabchar}[@var{in-tabwidth}]]
1001 @opindex -e
1002 @opindex --expand-tabs
1003 @cindex input tabs
1004 Expand tabs to spaces on input.  Optional argument @var{in-tabchar} is
1005 the input tab character (default is @key{TAB}).  Second optional
1006 argument @var{in-tabwidth} is the input tab character's width (default
1007 is 8).
1008
1009 @item -f
1010 @itemx -F
1011 @itemx --form-feed
1012 @opindex -F
1013 @opindex -f
1014 @opindex --form-feed
1015 Use a form feed instead of newlines to separate output pages.  The default
1016 page length of 66 lines is not altered, but the number of lines of text
1017 per page changes from 56 to 63.
1018
1019
1020 @item -h @var{header}
1021 @itemx --header=@var{header}
1022 @opindex -h
1023 @opindex --header
1024 Replace the file name in the header with the centered string
1025 @var{header}.  Left-hand-side truncation (marked by a @samp{*}) may occur
1026 if the total header line @samp{yy-mm-dd HH:MM @var{header} Page nnnn}
1027 becomes wider than @var{page_width}.  @samp{-h ""} prints a blank header
1028 line.  Don't use @samp{-h""}: a space between the @samp{-h} option and
1029 its argument is always required.
1030
1031 @item -i[@var{out-tabchar}[@var{out-tabwidth}]]
1032 @itemx --output-tabs[=@var{out-tabchar}[@var{out-tabwidth}]]
1033 @opindex -i
1034 @opindex --output-tabs
1035 @cindex output tabs
1036 Replace spaces with tabs on output.  Optional argument @var{out-tabchar}
1037 is the output tab character (default is @key{TAB}).  Second optional
1038 argument @var{out-tabwidth} is the output tab character's width (default
1039 is 8).
1040
1041 @item -j
1042 @itemx --join-lines
1043 @opindex -j
1044 @opindex --join-lines
1045 Merge lines of full length. Used together with the column options
1046 @samp{-@var{column}}, @samp{-a -@var{column}} or @samp{-m}. Turns off
1047 @samp{-w} line truncation; no column alignment used; may be used with
1048 @samp{-s[@var{separator}]}.
1049
1050
1051 @item -l @var{page_length}
1052 @itemx --length=@var{page_length}
1053 @opindex -l
1054 @opindex --length
1055 Set the page length to @var{page_length} (default 66) lines.  If
1056 @var{page_length} is less than or equal to 10 (or to 3 with @samp{-f}),
1057 the headers and footers are omitted, and all form feeds set in input
1058 files are eliminated, as if the @samp{-T} option had been given.
1059
1060 @item -m
1061 @itemx --merge
1062 @opindex -m
1063 @opindex --merge
1064 Merge and print all @var{file}s in parallel, one in each column. If a
1065 line is too long to fit in a column, it is truncated (but see
1066 @samp{-j}). @samp{-s[@var{separator}]} may be used. Empty pages in some
1067 @var{file}s (form feeds set) produce empty columns, still marked by
1068 @var{separator}. Completely empty common pages show no separators or
1069 line numbers. The default header becomes
1070 @samp{yy-mm-dd HH:MM <blanks> Page nnnn}; may be used with
1071 @samp{-h @var{header}} to fill up the middle part.
1072
1073
1074 @item -n[@var{number-separator}[@var{digits}]]
1075 @itemx --number-lines[=@var{number-separator}[@var{digits}]]
1076 @opindex -n
1077 @opindex --number-lines
1078 Precede each column with a line number; with parallel @var{file}s
1079 (@samp{-m}), precede each line with only one line number.  Optional argument
1080 @var{number-separator} is the character to print after each number
1081 (default is @key{TAB}).  Optional argument @var{digits} is the number of
1082 digits per line number (default is 5). Default line counting starts with
1083 first line of the input file (not with the first line printed, see
1084 @samp{-N}).
1085
1086 @item -N @var{line_number}
1087 @itemx --first-line-number=@var{line_number}
1088 @opindex -N
1089 @opindex --first-line-number
1090 Start line counting with number @var{line_number} at the first line of
1091 the first page printed.
1092
1093 @item -o @var{n}
1094 @itemx --indent=@var{n}
1095 @opindex -o
1096 @opindex --indent
1097 @cindex indenting lines
1098 @cindex left margin
1099 Indent each line with a margin @var{n} spaces wide (default is zero), i.e., set
1100 the left margin.  The total page width is @var{n} plus the width set
1101 with the @samp{-w} option.
1102
1103 @item -r
1104 @itemx --no-file-warnings
1105 @opindex -r
1106 @opindex --no-file-warnings
1107 Do not print a warning message when an argument @var{file} cannot be
1108 opened.  (The exit status will still be nonzero, however.)
1109
1110 @item -s[@var{separator}]
1111 @itemx --separator[=@var{separator}]
1112 @opindex -s
1113 @opindex --separator
1114 Separate columns by a string @var{separator}.  Don't use
1115 @samp{-s @var{separator}}: no space is allowed between the flag and its
1116 argument.  If this option is omitted altogether, the default is @key{TAB}
1117 together with the @samp{-j} option and a space otherwise (same as
1118 @samp{-s" "}).  With @samp{-s} only, no separator is used (same as
1119 @samp{-s""}).  @samp{-s} does not affect line truncation or column alignment.
1120
1121 @item -t
1122 @itemx --omit-header
1123 @opindex -t
1124 @opindex --omit-header
1125 Do not print the usual header [and footer] on each page, and do not fill
1126 out the bottoms of pages (with blank lines or a form feed).  No page
1127 structure is produced, but form feeds set in the input files are retained.  The
1128 predefined page layout is not changed.  @samp{-t} or @samp{-T} may be
1129 useful together with other options; e.g., @samp{-t -e4} expands
1130 @key{TAB} in the input file to 4 spaces but makes no other changes.
1131 Use of @samp{-t} overrides @samp{-h}.
1132
1133 @item -T
1134 @itemx --omit-pagination
1135 @opindex -T
1136 @opindex --omit-pagination
1137 Do not print header [and footer].  In addition, eliminate all form feeds
1138 set in the input files.
1139
1140 @item -v
1141 @itemx --show-nonprinting
1142 @opindex -v
1143 @opindex --show-nonprinting
1144 Print unprintable characters in octal backslash notation.
1145
1146 @item -w @var{page_width}
1147 @itemx --width=@var{page_width}
1148 @opindex -w
1149 @opindex --width
1150 Set the page width to @var{page_width} (default 72) characters.
1151 With or without @samp{-w}, header lines are always truncated to
1152 @var{page_width} characters. With @samp{-w}, text lines are truncated,
1153 unless @samp{-j} is used. Without @samp{-w} together with one of the
1154 column options @samp{-@var{column}}, @samp{-a -@var{column}} or
1155 @samp{-m}, default truncation of text lines to 72 characters is used.
1156 Without @samp{-w} and without any of the column options, no line
1157 truncation is used. That's equivalent to @samp{-w 72 -j}.
1158
1159 @end table
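
The column options are easier to compare side by side.  The following
shell sketch assumes three hypothetical input files (@file{nums.txt},
@file{left.txt} and @file{right.txt}) and uses @samp{-t} to suppress
headers and footers; the comments show the expected arrangement of the
values, not the exact column padding.

@example
# Hypothetical input files for the illustration.
printf '%s\n' 1 2 3 4 5 6 > nums.txt
printf '%s\n' a b > left.txt
printf '%s\n' x y > right.txt

down=$(pr -t -3 nums.txt)              # filled down:   1 3 5 / 2 4 6
across=$(pr -t -a -3 nums.txt)         # filled across: 1 2 3 / 4 5 6
merged=$(pr -t -m left.txt right.txt)  # one file per column: a x / b y
@end example

@noindent
With only six input lines, the balanced columns contain two lines each,
which makes the difference between printing down and across easy to see.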
1160
1161
1162 @node fold invocation
1163 @section @code{fold}: Wrap input lines to fit in specified width
1164
1165 @pindex fold
1166 @cindex wrapping long input lines
1167 @cindex folding long input lines
1168
1169 @code{fold} writes each @var{file} (@samp{-} means standard input), or
1170 standard input if none are given, to standard output, breaking long
1171 lines.  Synopsis:
1172
1173 @example
1174 fold [@var{option}]@dots{} [@var{file}]@dots{}
1175 @end example
1176
1177 By default, @code{fold} breaks lines wider than 80 columns. The output
1178 is split into as many lines as necessary.
1179
1180 @cindex screen columns
1181 @code{fold} counts screen columns by default; thus, a tab may count more
1182 than one column, backspace decreases the column count, and carriage
1183 return sets the column to zero.
1184
1185 The program accepts the following options.  Also see @ref{Common options}.
1186
1187 @table @samp
1188
1189 @item -b
1190 @itemx --bytes
1191 @opindex -b
1192 @opindex --bytes
1193 Count bytes rather than columns, so that tabs, backspaces, and carriage
1194 returns are each counted as taking up one column, just like other
1195 characters.
1196
1197 @item -s
1198 @itemx --spaces
1199 @opindex -s
1200 @opindex --spaces
1201 Break at word boundaries: the line is broken after the last blank before
1202 the maximum line length.  If the line contains no such blanks, the line
1203 is broken at the maximum line length as usual.
1204
1205 @item -w @var{width}
1206 @itemx --width=@var{width}
1207 @opindex -w
1208 @opindex --width
1209 Use a maximum line length of @var{width} columns instead of 80.
1210
1211 @end table
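
A small sketch of the difference between plain folding and @samp{-s},
assuming two hypothetical one-line files, @file{wide.txt} and
@file{words.txt}:

@example
# Hypothetical input files for the illustration.
printf 'abcdefghij\n' > wide.txt
printf 'aa bb cc dd\n' > words.txt

hard=$(fold -w 4 wide.txt)       # broken every 4 columns: abcd / efgh / ij
soft=$(fold -s -w 6 words.txt)   # broken after a blank, so no word is split
@end example

@noindent
In the second command, the break falls after the blank that follows
@samp{bb}, so the output has two lines and the first ends in a space.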
1212
1213
1214 @node Output of parts of files
1215 @chapter Output of parts of files
1216
1217 @cindex output of parts of files
1218 @cindex parts of files, output of
1219
1220 These commands output pieces of the input.
1221
1222 @menu
1223 * head invocation::             Output the first part of files.
1224 * tail invocation::             Output the last part of files.
1225 * split invocation::            Split a file into fixed-size pieces.
1226 * csplit invocation::           Split a file into context-determined pieces.
1227 @end menu
1228
1229 @node head invocation
1230 @section @code{head}: Output the first part of files
1231
1232 @pindex head
1233 @cindex initial part of files, outputting
1234 @cindex first part of files, outputting
1235
1236 @code{head} prints the first part (10 lines by default) of each
1237 @var{file}; it reads from standard input if no files are given or
1238 when given a @var{file} of @samp{-}.  Synopses:
1239
1240 @example
1241 head [@var{option}]@dots{} [@var{file}]@dots{}
1242 head -@var{number} [@var{option}]@dots{} [@var{file}]@dots{}
1243 @end example
1244
1245 If more than one @var{file} is specified, @code{head} prints a
1246 one-line header consisting of
1247 @example
1248 ==> @var{file name} <==
1249 @end example
1250 @noindent
1251 before the output for each @var{file}.
1252
1253 @code{head} accepts two option formats: the new one, in which numbers
1254 are arguments to the options (@samp{-q -n 1}), and the old one, in which
1255 the number precedes any option letters (@samp{-1q}).
1256
1257 The program accepts the following options.  Also see @ref{Common options}.
1258
1259 @table @samp
1260
1261 @item -@var{count}@var{options}
1262 @opindex -@var{count}
1263 This option is only recognized if it is specified first.  @var{count} is
1264 a decimal number optionally followed by a size letter (@samp{b},
1265 @samp{k}, @samp{m}) as in @code{-c}, or @samp{l} to mean count by lines,
1266 or other option letters (@samp{cqv}).
1267
1268 @item -c @var{bytes}
1269 @itemx --bytes=@var{bytes}
1270 @opindex -c
1271 @opindex --bytes
1272 Print the first @var{bytes} bytes, instead of initial lines.  Appending
1273 @samp{b} multiplies @var{bytes} by 512, @samp{k} by 1024, and @samp{m}
1274 by 1048576.
1275
1276 @item -n @var{n}
1277 @itemx --lines=@var{n}
1278 @opindex -n
1279 @opindex --lines
1280 Output the first @var{n} lines.
1281
1282 @item -q
1283 @itemx --quiet
1284 @itemx --silent
1285 @opindex -q
1286 @opindex --quiet
1287 @opindex --silent
1288 Never print file name headers.
1289
1290 @item -v
1291 @itemx --verbose
1292 @opindex -v
1293 @opindex --verbose
1294 Always print file name headers.
1295
1296 @end table
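
The two option formats and the byte count can be sketched as follows,
assuming a hypothetical four-line file @file{list.txt}:

@example
# Hypothetical input file for the illustration.
printf '%s\n' one two three four > list.txt

first_two=$(head -n 2 list.txt)    # new format: the lines "one" and "two"
old_form=$(head -2 list.txt)       # old format of the same request
first_bytes=$(head -c 3 list.txt)  # the first three bytes: "one"
@end example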
1297
1298
1299 @node tail invocation
1300 @section @code{tail}: Output the last part of files
1301
1302 @pindex tail
1303 @cindex last part of files, outputting
1304
1305 @code{tail} prints the last part (10 lines by default) of each
1306 @var{file}; it reads from standard input if no files are given or
1307 when given a @var{file} of @samp{-}.  Synopses:
1308
1309 @example
1310 tail [@var{option}]@dots{} [@var{file}]@dots{}
1311 tail -@var{number} [@var{option}]@dots{} [@var{file}]@dots{}
1312 tail +@var{number} [@var{option}]@dots{} [@var{file}]@dots{}
1313 @end example
1314
1315 If more than one @var{file} is specified, @code{tail} prints a
1316 one-line header consisting of
1317 @example
1318 ==> @var{file name} <==
1319 @end example
1320 @noindent
1321 before the output for each @var{file}.
1322
1323 @cindex BSD @code{tail}
1324 GNU @code{tail} can output any amount of data (some other versions of
1325 @code{tail} cannot).  It also has no @samp{-r} option (print in
1326 reverse), since reversing a file is really a different job from printing
1327 the end of a file; BSD @code{tail} (which is the one with @code{-r}) can
1328 only reverse files that are at most as large as its buffer, which is
1329 typically 32k.  A more reliable and versatile way to reverse files is
1330 the GNU @code{tac} command.
1331
1332 @code{tail} accepts two option formats: the new one, in which numbers
1333 are arguments to the options (@samp{-n 1}), and the old one, in which
1334 the number precedes any option letters (@samp{-1} or @samp{+1}).
1335
1336 If any option-argument is a number @var{n} starting with a @samp{+},
1337 @code{tail} begins printing with the @var{n}th item from the start of
1338 each file, instead of from the end.
1339
1340 The program accepts the following options.  Also see @ref{Common options}.
1341
1342 @table @samp
1343
1344 @item -@var{count}
1345 @itemx +@var{count}
1346 @opindex -@var{count}
1347 @opindex +@var{count}
1348 This option is only recognized if it is specified first.  @var{count} is
1349 a decimal number optionally followed by a size letter (@samp{b},
1350 @samp{k}, @samp{m}) as in @code{-c}, or @samp{l} to mean count by lines,
1351 or other option letters (@samp{cfqv}).
1352
1353 @item -c @var{bytes}
1354 @itemx --bytes=@var{bytes}
1355 @opindex -c
1356 @opindex --bytes
1357 Output the last @var{bytes} bytes, instead of final lines.  Appending
1358 @samp{b} multiplies @var{bytes} by 512, @samp{k} by 1024, and @samp{m}
1359 by 1048576.
1360
1361 @item -f
1362 @itemx --follow
1363 @opindex -f
1364 @opindex --follow
1365 @cindex growing files
1366 Loop forever trying to read more characters at the end of the file,
1367 presumably because the file is growing.  Ignored if reading from a pipe.
1368 If more than one file is given, @code{tail} prints a header whenever it
1369 gets output from a different file, to indicate which file that output is
1370 from.
1371
1372 @item -n @var{n}
1373 @itemx --lines=@var{n}
1374 @opindex -n
1375 @opindex --lines
1376 Output the last @var{n} lines.
1377
1378 @item -q
1379 @itemx --quiet
1380 @itemx --silent
1381 @opindex -q
1382 @opindex --quiet
1383 @opindex --silent
1384 Never print file name headers.
1385
1386 @item -v
1387 @itemx --verbose
1388 @opindex -v
1389 @opindex --verbose
1390 Always print file name headers.
1391
1392 @end table
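
Counting from the end and counting from the start can be compared with
a short sketch, again assuming a hypothetical four-line file
@file{list.txt}:

@example
# Hypothetical input file for the illustration.
printf '%s\n' one two three four > list.txt

last_two=$(tail -n 2 list.txt)      # from the end: "three" and "four"
from_second=$(tail -n +2 list.txt)  # from the start: line 2 onwards
last_bytes=$(tail -c 5 list.txt)    # the last five bytes: "four" and a newline
@end example

@noindent
The leading @samp{+} in @samp{-n +2} is what switches @code{tail} from
counting backwards to counting from the beginning of the file.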
1393
1394
1395 @node split invocation
1396 @section @code{split}: Split a file into fixed-size pieces
1397
1398 @pindex split
1399 @cindex splitting a file into pieces
1400 @cindex pieces, splitting a file into
1401
1402 @code{split} creates output files containing consecutive sections of
1403 @var{input} (standard input if none is given or @var{input} is
1404 @samp{-}).  Synopsis:
1405
1406 @example
1407 split [@var{option}] [@var{input} [@var{prefix}]]
1408 @end example
1409
1410 By default, @code{split} puts 1000 lines of @var{input} (or whatever is
1411 left over for the last section), into each output file.
1412
1413 @cindex output file name prefix
1414 The output files' names consist of @var{prefix} (@samp{x} by default)
1415 followed by a group of letters @samp{aa}, @samp{ab}, and so on, such
1416 that concatenating the output files in sorted order by file name produces
1417 the original input file.  (If more than 676 output files are required,
1418 @code{split} uses @samp{zaa}, @samp{zab}, etc.)
1419
1420 The program accepts the following options.  Also see @ref{Common options}.
1421
1422 @table @samp
1423
1424 @item -@var{lines}
1425 @itemx -l @var{lines}
1426 @itemx --lines=@var{lines}
1427 @opindex -l
1428 @opindex --lines
1429 Put @var{lines} lines of @var{input} into each output file.
1430
1431 @item -b @var{bytes}
1432 @itemx --bytes=@var{bytes}
1433 @opindex -b
1434 @opindex --bytes
1435 Put @var{bytes} bytes of @var{input} into each output file.
1436 Appending @samp{b} multiplies @var{bytes} by 512, @samp{k} by 1024, and
1437 @samp{m} by 1048576.
1438
1439 @item -C @var{bytes}
1440 @itemx --line-bytes=@var{bytes}
1441 @opindex -C
1442 @opindex --line-bytes
1443 Put into each output file as many complete lines of @var{input} as
1444 possible without exceeding @var{bytes} bytes.  For lines longer than
1445 @var{bytes} bytes, put @var{bytes} bytes into each output file until
1446 less than @var{bytes} bytes of the line are left, then continue
1447 normally.  @var{bytes} has the same format as for the @samp{--bytes}
1448 option.
1449
1450 @item --verbose
1451 @opindex --verbose
1452 Write a diagnostic to standard error just before each output file is opened.
1453
1454 @end table
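
The naming scheme and the size suffixes can be sketched as follows.  The
input files @file{input.txt} and @file{blob.bin} and the prefixes
@samp{part.} and @samp{chunk.} are hypothetical names chosen for this
illustration.

@example
# Hypothetical five-line input, split two lines at a time.
printf '%s\n' 1 2 3 4 5 > input.txt
split -l 2 input.txt part.               # writes part.aa, part.ab, part.ac
rebuilt=$(cat part.aa part.ab part.ac)   # same content as input.txt

# Hypothetical 3000-byte input, split into 1024-byte pieces.
printf '%3000s' '' > blob.bin
split -b 1k blob.bin chunk.              # 1024 + 1024 + 952 bytes
@end example

@noindent
The last piece holds whatever is left over: here one line for
@file{part.ac} and 3000 - 2 * 1024 = 952 bytes for @file{chunk.ac}.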
1455
1456
1457 @node csplit invocation
1458 @section @code{csplit}: Split a file into context-determined pieces
1459
1460 @pindex csplit
1461 @cindex context splitting
1462 @cindex splitting a file into pieces by context
1463
1464 @code{csplit} creates zero or more output files containing sections of
1465 @var{input} (standard input if @var{input} is @samp{-}).  Synopsis:
1466
1467 @example
1468 csplit [@var{option}]@dots{} @var{input} @var{pattern}@dots{}
1469 @end example
1470
1471 The contents of the output files are determined by the @var{pattern}
1472 arguments, as detailed below.  An error occurs if a @var{pattern}
1473 argument refers to a nonexistent line of the input file (e.g., if no
1474 remaining line matches a given regular expression).  After every
1475 @var{pattern} has been matched, any remaining input is copied into one
1476 last output file.
1477
1478 By default, @code{csplit} prints the number of bytes written to each
1479 output file after it has been created.
1480
1481 The types of pattern arguments are:
1482
1483 @table @samp
1484
1485 @item @var{n}
1486 Create an output file containing the input up to but not including line
1487 @var{n} (a positive integer).  If followed by a repeat count, also
1488 create an output file containing the next @var{n} lines of the input
1489 file once for each repeat.
1490
1491 @item /@var{regexp}/[@var{offset}]
1492 Create an output file containing the current line up to (but not
1493 including) the next line of the input file that contains a match for
1494 @var{regexp}.  The optional @var{offset} is a @samp{+} or @samp{-}
1495 followed by a positive integer.  If it is given, the input up to the
1496 matching line plus or minus @var{offset} is put into the output file,
1497 and the line after that begins the next section of input.
1498
1499 @item %@var{regexp}%[@var{offset}]
1500 Like the previous type, except that it does not create an output
1501 file, so that section of the input file is effectively ignored.
1502
1503 @item @{@var{repeat-count}@}
1504 Repeat the previous pattern @var{repeat-count} additional
1505 times. @var{repeat-count} can either be a positive integer or an
1506 asterisk, meaning repeat as many times as necessary until the input is
1507 exhausted.
1508
1509 @end table
1510
1511 The output files' names consist of a prefix (@samp{xx} by default)
1512 followed by a suffix.  By default, the suffix is an ascending sequence
1513 of two-digit decimal numbers from @samp{00} up to @samp{99}.  In any
1514 case, concatenating the output files in sorted order by filename
1515 produces the original input file.
1516
1517 By default, if @code{csplit} encounters an error or receives a hangup,
1518 interrupt, quit, or terminate signal, it removes any output files
1519 that it has created so far before it exits.
1520
1521 The program accepts the following options.  Also see @ref{Common options}.
1522
1523 @table @samp
1524
1525 @item -f @var{prefix}
1526 @itemx --prefix=@var{prefix}
1527 @opindex -f
1528 @opindex --prefix
1529 @cindex output file name prefix
1530 Use @var{prefix} as the output file name prefix.
1531
1532 @item -b @var{suffix}
1533 @itemx --suffix=@var{suffix}
1534 @opindex -b
1535 @opindex --suffix
1536 @cindex output file name suffix
1537 Use @var{suffix} as the output file name suffix.  When this option is
1538 specified, the suffix string must include exactly one
1539 @code{printf(3)}-style conversion specification, possibly including
1540 format specification flags, a field width, a precision specification,
1541 or all of these kinds of modifiers.  The format letter must convert a
1542 binary integer argument to readable form; thus, only @samp{d}, @samp{i},
1543 @samp{u}, @samp{o}, @samp{x}, and @samp{X} conversions are allowed.  The
1544 entire @var{suffix} is given (with the current output file number) to
1545 @code{sprintf(3)} to form the file name suffixes for each of the
1546 individual output files in turn.  If this option is used, the
1547 @samp{--digits} option is ignored.
1548
1549 @item -n @var{digits}
1550 @itemx --digits=@var{digits}
1551 @opindex -n
1552 @opindex --digits
1553 Use output file names containing numbers that are @var{digits} digits
1554 long instead of the default 2.
1555
1556 @item -k
1557 @itemx --keep-files
1558 @opindex -k
1559 @opindex --keep-files
1560 Do not remove output files when errors are encountered.
1561
1562 @item -z
1563 @itemx --elide-empty-files
1564 @opindex -z
1565 @opindex --elide-empty-files
1566 Suppress the generation of zero-length output files.  (In cases where
1567 the section delimiters of the input file are supposed to mark the first
1568 lines of each of the sections, the first output file will generally be a
1569 zero-length file unless you use this option.)  The output file sequence
1570 numbers always run consecutively starting from 0, even when this option
1571 is specified.
1572
1573 @item -s
1574 @itemx -q
1575 @itemx --silent
1576 @itemx --quiet
1577 @opindex -s
1578 @opindex -q
1579 @opindex --silent
1580 @opindex --quiet
1581 Do not print counts of output file sizes.
1582
1583 @end table
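
As a minimal sketch of regular-expression patterns, the following
commands cut a hypothetical file @file{notes.txt} at the first line
starting with @samp{B} and again at the first line starting with
@samp{C}.  The prefix @samp{sec} is an arbitrary choice for this
illustration.

@example
# Hypothetical five-line input.
printf '%s\n' A1 A2 B1 C1 C2 > notes.txt

csplit -s -f sec notes.txt '/^B/' '/^C/'
# sec00 holds A1 and A2 (everything before the first match),
# sec01 holds B1,
# sec02 holds C1 and C2 (the remaining input).
@end example

@noindent
Two patterns produce three output files, because the input left over
after the last pattern is copied into one final file.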
1584
1585
1586 @node Summarizing files
1587 @chapter Summarizing files
1588
1589 @cindex summarizing files
1590
1591 These commands generate just a few numbers representing entire
1592 contents of files.
1593
1594 @menu
1595 * wc invocation::               Print byte, word, and line counts.
1596 * sum invocation::              Print checksum and block counts.
1597 * cksum invocation::            Print CRC checksum and byte counts.
1598 * md5sum invocation::           Print or check message-digests.
1599 @end menu
1600
1601
1602 @node wc invocation
1603 @section @code{wc}: Print byte, word, and line counts
1604
1605 @pindex wc
1606 @cindex byte count
1607 @cindex word count
1608 @cindex line count
1609
1610 @code{wc} counts the number of bytes, whitespace-separated words, and
1611 newlines in each given @var{file}, or standard input if none are given
1612 or for a @var{file} of @samp{-}.  Synopsis:
1613
1614 @example
1615 wc [@var{option}]@dots{} [@var{file}]@dots{}
1616 @end example
1617
1618 @cindex total counts
1619 @code{wc} prints one line of counts for each file, and if the file was
1620 given as an argument, it prints the file name following the counts.  If
1621 more than one @var{file} is given, @code{wc} prints a final line
1622 containing the cumulative counts, with the file name @file{total}.  The
1623 counts are printed in this order: newlines, words, bytes.
1624
1625 By default, @code{wc} prints all three counts.  Options can specify
1626 that only certain counts be printed.  Options do not undo others
1627 previously given, so
1628
1629 @example
1630 wc --bytes --words
1631 @end example
1632
1633 @noindent
1634 prints both the byte counts and the word counts.
1635
1636 With the @code{--max-line-length} option, @code{wc} prints the length
1637 of the longest line per file, and if there is more than one file it
1638 prints the maximum (not the sum) of those lengths.
1639
1640 The program accepts the following options.  Also see @ref{Common options}.
1641
1642 @table @samp
1643
1644 @item -c
1645 @itemx --bytes
1646 @itemx --chars
1647 @opindex -c
1648 @opindex --bytes
1649 @opindex --chars
1650 Print only the byte counts.
1651
1652 @item -w
1653 @itemx --words
1654 @opindex -w
1655 @opindex --words
1656 Print only the word counts.
1657
1658 @item -l
1659 @itemx --lines
1660 @opindex -l
1661 @opindex --lines
1662 Print only the newline counts.
1663
1664 @item -L
1665 @itemx --max-line-length
1666 @opindex -L
1667 @opindex --max-line-length
1668 Print only the maximum line lengths.
1669
1670 @end table
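
The individual counts can be checked by hand on a small input.  This
sketch assumes a hypothetical two-line file @file{words.txt} and reads
it from standard input so that no file name is printed.

@example
# Hypothetical input: "one two" (8 bytes with newline) and "three" (6 bytes).
printf 'one two\nthree\n' > words.txt

lines=$(wc -l < words.txt)     # 2 newlines
words=$(wc -w < words.txt)     # 3 whitespace-separated words
bytes=$(wc -c < words.txt)     # 8 + 6 = 14 bytes
longest=$(wc -L < words.txt)   # 7, the length of "one two"
@end example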
1671
1672
1673 @node sum invocation
1674 @section @code{sum}: Print checksum and block counts
1675
1676 @pindex sum
1677 @cindex 16-bit checksum
1678 @cindex checksum, 16-bit
1679
1680 @code{sum} computes a 16-bit checksum for each given @var{file}, or
1681 standard input if none are given or for a @var{file} of @samp{-}.  Synopsis:
1682
1683 @example
1684 sum [@var{option}]@dots{} [@var{file}]@dots{}
1685 @end example
1686
1687 @code{sum} prints the checksum for each @var{file} followed by the
1688 number of blocks in the file (rounded up).  If more than one @var{file}
1689 is given, file names are also printed (by default).  (With the
1690 @samp{--sysv} option, the corresponding file names are printed when there is
1691 at least one file argument.)
1692
1693 By default, GNU @code{sum} computes checksums using an algorithm
1694 compatible with BSD @code{sum} and prints file sizes in units of
1695 1024-byte blocks.
1696
1697 The program accepts the following options.  Also see @ref{Common options}.
1698
1699 @table @samp
1700
1701 @item -r
1702 @opindex -r
1703 @cindex BSD @code{sum}
1704 Use the default (BSD compatible) algorithm.  This option is included for
1705 compatibility with the System V @code{sum}.  Unless @samp{-s} was also
1706 given, it has no effect.
1707
1708 @item -s
1709 @itemx --sysv
1710 @opindex -s
1711 @opindex --sysv
1712 @cindex System V @code{sum}
1713 Compute checksums using an algorithm compatible with System V
1714 @code{sum}'s default, and print file sizes in units of 512-byte blocks.
1715
1716 @end table
1717
1718 @code{sum} is provided for compatibility; the @code{cksum} program (see
1719 next section) is preferable in new applications.
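
The difference in block size between the two algorithms can be seen on
a file that is one byte larger than 1024 bytes.  This sketch assumes a
hypothetical file @file{blob.bin} and reads it from standard input, so
that the output consists of just the checksum and the block count.

@example
# Hypothetical file of exactly 1025 bytes.
printf '%1025s' '' > blob.bin

set -- $(sum < blob.bin)       # default (BSD-compatible) algorithm
bsd_blocks=$2                  # 2 blocks of 1024 bytes, rounded up
set -- $(sum -s < blob.bin)    # System V algorithm
sysv_blocks=$2                 # 3 blocks of 512 bytes, rounded up
@end example

@noindent
The checksums themselves differ as well, since the two algorithms are
not compatible; only the block counts are shown here.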
1720
1721
1722 @node cksum invocation
1723 @section @code{cksum}: Print CRC checksum and byte counts
1724
1725 @pindex cksum
1726 @cindex cyclic redundancy check
1727 @cindex CRC checksum
1728
1729 @code{cksum} computes a cyclic redundancy check (CRC) checksum for each
1730 given @var{file}, or standard input if none are given or for a
1731 @var{file} of @samp{-}.  Synopsis:
1732
1733 @example
1734 cksum [@var{option}]@dots{} [@var{file}]@dots{}
1735 @end example
1736
1737 @code{cksum} prints the CRC checksum for each file along with the number
1738 of bytes in the file, and the filename unless no arguments were given.
1739
1740 @code{cksum} is typically used to ensure that files
1741 transferred by unreliable means (e.g., netnews) have not been corrupted,
1742 by comparing the @code{cksum} output for the received files with the
1743 @code{cksum} output for the original files (typically given in the
1744 distribution).
1745
1746 The CRC algorithm is specified by the @sc{POSIX.2} standard.  It is not
1747 compatible with the BSD or System V @code{sum} algorithms (see the
1748 previous section); it is more robust.
1749
1750 The only options are @samp{--help} and @samp{--version}.  @xref{Common
1751 options}.
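
The comparison described above might look like the following sketch.
The file names are hypothetical, and @code{cp} merely stands in for
whatever transfer is being checked.

@example
# Hypothetical original file and its transferred copy.
printf 'hello\n' > original.txt
cp original.txt received.txt

sent=$(cksum < original.txt)   # checksum and byte count of the original
got=$(cksum < received.txt)    # checksum and byte count of the copy
[ "$sent" = "$got" ] && echo 'checksums match'
@end example

@noindent
Reading from standard input keeps the file name out of the output, so
the two results can be compared as plain strings.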
1752
1753
1754 @node md5sum invocation
1755 @section @code{md5sum}: Print or check message-digests
1756
1757 @pindex md5sum
1758 @cindex 128-bit checksum
1759 @cindex checksum, 128-bit
1760 @cindex fingerprint, 128-bit
1761 @cindex message-digest, 128-bit
1762
1763 @code{md5sum} computes a 128-bit checksum (or @dfn{fingerprint} or
1764 @dfn{message-digest}) for each specified @var{file}.
1765 If a @var{file} is specified as @samp{-} or if no files are given,
1766 @code{md5sum} computes the checksum for the standard input.
1767 @code{md5sum} can also determine whether a file and checksum are
1768 consistent.  Synopses:
1769
1770 @example
1771 md5sum [@var{option}]@dots{} [@var{file}]@dots{}
1772 md5sum [@var{option}]@dots{} --check [@var{file}]
1773 @end example
1774
1775 For each @var{file}, @samp{md5sum} outputs the MD5 checksum, a flag
1776 indicating a binary or text input file, and the filename.
1777 If @var{file} is omitted or specified as @samp{-}, standard input is read.
1778
1779 The program accepts the following options.  Also see @ref{Common options}.
1780
1781 @table @samp
1782
1783 @item -b
1784 @itemx --binary
1785 @opindex -b
1786 @opindex --binary
1787 @cindex binary input files
1788 Treat all input files as binary.  This option has no effect on Unix
1789 systems, since they don't distinguish between binary and text files.
1790 This option is useful on systems that have different internal and
1791 external character representations.
1792
1793 @item -c
1794 @itemx --check
1795 Read filenames and checksum information from the single @var{file}
1796 (or from stdin if no @var{file} was specified) and report whether
1797 each named file and the corresponding checksum data are consistent.
1798 The input to this mode of @code{md5sum} is usually the output of
1799 a prior, checksum-generating run of @samp{md5sum}.
1800 Each valid line of input consists of an MD5 checksum, a binary/text
1801 flag, and then a filename.
1802 Binary files are marked with @samp{*}, text with @samp{ }.
1803 For each such line, @code{md5sum} reads the named file and computes its
1804 MD5 checksum.  Then, if the computed message digest does not match the
1805 one on the line with the filename, the file is noted as having
1806 failed the test.  Otherwise, the file passes the test.
1807 By default, for each valid line, one line is written to standard
1808 output indicating whether the named file passed the test.
1809 After all checks have been performed, if there were any failures,
1810 a warning is issued to standard error.
1811 Use the @samp{--status} option to inhibit that output.
1812 If any listed file cannot be opened or read, if any valid line has
1813 an MD5 checksum inconsistent with the associated file, or if no valid
1814 line is found, @code{md5sum} exits with nonzero status.  Otherwise,
1815 it exits successfully.
1816
@item --status
1818 @opindex --status
1819 @cindex verifying MD5 checksums
1820 This option is useful only when verifying checksums.
1821 When verifying checksums, don't generate the default one-line-per-file
1822 diagnostic and don't output the warning summarizing any failures.
1823 Failures to open or read a file still evoke individual diagnostics to
1824 standard error.
1825 If all listed files are readable and are consistent with the associated
1826 MD5 checksums, exit successfully.  Otherwise exit with a status code
1827 indicating there was a failure.
1828
1829 @item -t
1830 @itemx --text
1831 @opindex -t
1832 @opindex --text
1833 @cindex text input files
1834 Treat all input files as text files.  This is the reverse of
1835 @samp{--binary}.
1836
1837 @item -w
1838 @itemx --warn
1839 @opindex -w
1840 @opindex --warn
1841 @cindex verifying MD5 checksums
1842 When verifying checksums, warn about improperly formatted MD5 checksum lines.
1843 This option is useful only if all but a few lines in the checked input
1844 are valid.
1845
1846 @end table
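
The generate-then-verify cycle described under @samp{--check} and
@samp{--status} might look like the following sketch.  The temporary
directory and the file names @file{a.txt}, @file{b.txt} and
@file{sums.md5} are hypothetical; only the exit status is examined,
since @samp{--status} suppresses the per-file output.

```shell
# Illustration only: the directory and file names are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'first file\n' > a.txt
printf 'second file\n' > b.txt

# Generate the checksum list, one line per file.
md5sum a.txt b.txt > sums.md5

# Verify silently; only the exit status reports the result.
if md5sum --check --status sums.md5; then
    intact=0
else
    intact=1
fi

# Alter one file and verify again; the status is now nonzero.
printf 'tampered\n' >> b.txt
if md5sum --check --status sums.md5; then
    altered=0
else
    altered=1
fi

echo "intact=$intact altered=$altered"
cd / && rm -r "$dir"
```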
1847
1848
1849 @node Operating on sorted files
1850 @chapter Operating on sorted files
1851
1852 @cindex operating on sorted files
1853 @cindex sorted files, operations on
1854
1855 These commands work with (or produce) sorted files.
1856
1857 @menu
1858 * sort invocation::             Sort text files.
1859 * uniq invocation::             Uniqify files.
1860 * comm invocation::             Compare two sorted files line by line.
* ptx invocation::              Produce permuted indexes.
1862 @end menu
1863
1864
1865 @node sort invocation
1866 @section @code{sort}: Sort text files
1867
1868 @pindex sort
1869 @cindex sorting files
1870
1871 @code{sort} sorts, merges, or compares all the lines from the given
1872 files, or standard input if none are given or for a @var{file} of
1873 @samp{-}.  By default, @code{sort} writes the results to standard
1874 output.  Synopsis:
1875
1876 @example
1877 sort [@var{option}]@dots{} [@var{file}]@dots{}
1878 @end example
1879
1880 @code{sort} has three modes of operation: sort (the default), merge,
1881 and check for sortedness.  The following options change the operation
1882 mode:
1883
1884 @table @samp
1885
1886 @item -c
1887 @opindex -c
1888 @cindex checking for sortedness
1889 Check whether the given files are already sorted: if they are not all
1890 sorted, print an error message and exit with a status of 1.
1891 Otherwise, exit successfully.
1892
1893 @item -m
1894 @opindex -m
1895 @cindex merging sorted files
Merge the given files by sorting them as a group.  Each input file must
already be individually sorted.  Sorting instead of merging always
works; merging is provided because it is faster when the inputs are
already sorted.
1900
1901 @end table
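
The two alternative modes can be tried on small files, as in the
sketch below.  The file names and their contents are made up for the
example, and setting @code{LC_ALL} to @samp{C} is an assumption that
pins the collating sequence on systems where @code{sort} honors the
locale.

```shell
export LC_ALL=C   # assumption: fixes the collating sequence
dir=$(mktemp -d)
printf 'apple\ncherry\n' > "$dir/one"
printf 'banana\ndate\n' > "$dir/two"

# Merge two files that are each already sorted.
merged=$(sort -m "$dir/one" "$dir/two")
echo "$merged"

# Check mode writes no sorted output; an unsorted file yields status 1.
printf 'b\na\n' > "$dir/unsorted"
if sort -c "$dir/unsorted" 2>/dev/null; then
    check=0
else
    check=$?
fi
echo "sort -c status: $check"
rm -r "$dir"
```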
1902
1903 A pair of lines is compared as follows: if any key fields have been
1904 specified, @code{sort} compares each pair of fields, in the order
1905 specified on the command line, according to the associated ordering
1906 options, until a difference is found or no fields are left.
1907
1908 If any of the global options @samp{Mbdfinr} are given but no key fields
1909 are specified, @code{sort} compares the entire lines according to the
1910 global options.
1911
1912 Finally, as a last resort when all keys compare equal (or if no
1913 ordering options were specified at all), @code{sort} compares the lines
1914 byte by byte in machine collating sequence.  The last resort comparison
1915 honors the @samp{-r} global option.  The @samp{-s} (stable) option
1916 disables this last-resort comparison so that lines in which all fields
1917 compare equal are left in their original relative order.  If no fields
1918 or global options are specified, @samp{-s} has no effect.
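
The effect of the last-resort comparison, and of disabling it with
@samp{-s}, can be seen with two lines whose only key compares equal.
The sample lines are invented for this sketch, and @code{LC_ALL} is
set to @samp{C} on the assumption that the local @code{sort} is
locale-aware.

```shell
export LC_ALL=C   # assumption: byte-order collation
# Both lines have the same second field, so the key compares equal.
# Default: the tie is broken by comparing the whole lines byte by byte.
plain=$(printf 'b 1\na 1\n' | sort -k 2,2)
# With -s the tie is left alone and the input order is kept.
stable=$(printf 'b 1\na 1\n' | sort -s -k 2,2)
echo "$plain"
echo "$stable"
```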
1919
1920 GNU @code{sort} (as specified for all GNU utilities) has no limits on
1921 input line length or restrictions on bytes allowed within lines.  In
1922 addition, if the final byte of an input file is not a newline, GNU
1923 @code{sort} silently supplies one.
1924
1925 Upon any error, @code{sort} exits with a status of @samp{2}.
1926
1927 @vindex TMPDIR
1928 If the environment variable @code{TMPDIR} is set, @code{sort} uses its
1929 value as the directory for temporary files instead of @file{/tmp}.  The
1930 @samp{-T @var{tempdir}} option in turn overrides the environment
1931 variable.
1932
1933 The following options affect the ordering of output lines.  They may be
1934 specified globally or as part of a specific key field.  If no key
1935 fields are specified, global options apply to comparison of entire
1936 lines; otherwise the global options are inherited by key fields that do
1937 not specify any special options of their own.
1938
1939 @table @samp
1940
1941 @item -b
1942 @opindex -b
1943 @cindex blanks, ignoring leading
1944 Ignore leading blanks when finding sort keys in each line.
1945
1946 @item -d
1947 @opindex -d
1948 @cindex phone directory order
1949 @cindex telephone directory order
1950 Sort in @dfn{phone directory} order: ignore all characters except
1951 letters, digits and blanks when sorting.
1952
1953 @item -f
1954 @opindex -f
1955 @cindex case folding
1956 Fold lowercase characters into the equivalent uppercase characters when
1957 sorting so that, for example, @samp{b} and @samp{B} sort as equal.
1958
1959 @item -g
1960 @opindex -g
1961 @cindex general numeric sort
Sort numerically, but use @code{strtod} to arrive at the numeric values.
This allows floating point numbers to be specified in scientific notation,
like @code{1.0e-34} and @code{10e100}.  Use this option only if there
is no alternative; it is much slower than @samp{-n}, and numbers with
1966 too many significant digits will be compared as if they had been
1967 truncated.  In addition, numbers outside the range of representable
1968 double precision floating point numbers are treated as if they were
1969 zeroes; overflow and underflow are not reported.
1970
1971 @item -i
1972 @opindex -i
1973 @cindex unprintable characters, ignoring
1974 Ignore characters outside the printable ASCII range 040-0176 octal
1975 (inclusive) when sorting.
1976
1977 @item -M
1978 @opindex -M
1979 @cindex months, sorting by
1980 An initial string, consisting of any amount of whitespace, followed
1981 by three letters abbreviating a month name, is folded to UPPER case and
1982 compared in the order @samp{JAN} < @samp{FEB} < @dots{} < @samp{DEC}.
1983 Invalid names compare low to valid names.
1984
1985 @item -n
1986 @opindex -n
1987 @cindex numeric sort
1988 Sort numerically: the number begins each line; specifically, it consists
1989 of optional whitespace, an optional @samp{-} sign, and zero or more
1990 digits, optionally followed by a decimal point and zero or more digits.
1991
1992 @code{sort -n} uses what might be considered an unconventional method
1993 to compare strings representing floating point numbers.  Rather than
1994 first converting each string to the C @code{double} type and then
1995 comparing those values, sort aligns the decimal points in the two
1996 strings and compares the strings a character at a time.  One benefit
1997 of using this approach is its speed.  In practice this is much more
1998 efficient than performing the two corresponding string-to-double (or even
1999 string-to-integer) conversions and then comparing doubles.  In addition,
2000 there is no corresponding loss of precision.  Converting each string to
2001 @code{double} before comparison would limit precision to about 16 digits
2002 on most systems.
2003
2004 Neither a leading @samp{+} nor exponential notation is recognized.
2005 To compare such strings numerically, use the @samp{-g} option.
2006
2007 @item -r
2008 @opindex -r
2009 @cindex reverse sorting
2010 Reverse the result of comparison, so that lines with greater key values
2011 appear earlier in the output instead of later.
2012
2013 @end table
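
The difference between @samp{-n} and @samp{-g} shows up as soon as
exponential notation appears in the input.  The following sketch uses
made-up values; @code{LC_ALL} is set to @samp{C} as an assumption so
that the decimal point and collation are predictable.

```shell
export LC_ALL=C   # assumption: predictable numeric and byte ordering
# -n stops reading at the "e", so 1e3 is taken as 1 and sorts before 20.
numeric=$(printf '20\n1e3\n' | sort -n)
# -g converts with strtod, so 1e3 is 1000 and sorts after 20.
general=$(printf '20\n1e3\n' | sort -g)
# -r reverses the result of the numeric comparison.
reversed=$(printf '3\n10\n2\n' | sort -nr)
echo "$numeric"
echo "$general"
echo "$reversed"
```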
2014
2015 Other options are:
2016
2017 @table @samp
2018
2019 @item -o @var{output-file}
2020 @opindex -o
2021 @cindex overwriting of input, allowed
2022 Write output to @var{output-file} instead of standard output.
2023 If @var{output-file} is one of the input files, @code{sort} copies
2024 it to a temporary file before sorting and writing the output to
2025 @var{output-file}.
2026
2027 @item -t @var{separator}
2028 @opindex -t
2029 @cindex field separator character
2030 Use character @var{separator} as the field separator when finding the
2031 sort keys in each line.  By default, fields are separated by the empty
2032 string between a non-whitespace character and a whitespace character.
2033 That is, given the input line @w{@samp{ foo bar}}, @code{sort} breaks it
2034 into fields @w{@samp{ foo}} and @w{@samp{ bar}}.  The field separator is
2035 not considered to be part of either the field preceding or the field
2036 following.
2037
2038 @item -u
2039 @opindex -u
2040 @cindex uniqifying output
2041 For the default case or the @samp{-m} option, only output the first
2042 of a sequence of lines that compare equal.  For the @samp{-c} option,
2043 check that no pair of consecutive lines compares equal.
2044
2045 @item -k @var{pos1}[,@var{pos2}]
2046 @opindex -k
2047 @cindex sort field
2048 The recommended, @sc{POSIX}, option for specifying a sort field.  The field
2049 consists of the line between @var{pos1} and @var{pos2} (or the end of
2050 the line, if @var{pos2} is omitted), inclusive.  Fields and character
2051 positions are numbered starting with 1.  See below.
2052
2053 @item -z
2054 @opindex -z
2055 @cindex sort zero-terminated lines
Treat the input as a set of lines, each terminated by a zero byte (the
@sc{ASCII} @sc{NUL} (Null) character) instead of an @sc{ASCII} @sc{LF}
(Line Feed).  This option can be useful in conjunction with
@samp{perl -0} or @samp{find -print0} and @samp{xargs -0}, which do the
same in order to reliably handle arbitrary pathnames (even those which
contain Line Feed characters).
2062
2063 @item +@var{pos1}[-@var{pos2}]
2064 The obsolete, traditional option for specifying a sort field.  The field
2065 consists of the line between @var{pos1} and up to but @emph{not including}
2066 @var{pos2} (or the end of the line if @var{pos2} is omitted).  Fields
2067 and character positions are numbered starting with 0.  See below.
2068
2069 @end table
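
A few of these options are exercised in the sketch below.  The file
name @file{list} and all sample data are hypothetical, and
@code{LC_ALL} is set to @samp{C} on the assumption that the local
@code{sort} is locale-aware.

```shell
export LC_ALL=C   # assumption: byte-order collation
dir=$(mktemp -d)
printf 'pear\napple\npear\n' > "$dir/list"

# -o may name an input file; -u keeps one line from each run of equals.
sort -u -o "$dir/list" "$dir/list"
unique=$(cat "$dir/list")

# -t picks the separator; -k 2,2 limits the key to the second field.
by_field=$(printf 'x:b\ny:a\n' | sort -t : -k 2,2)

# -z reads and writes NUL-terminated lines; tr makes them visible here.
zero=$(printf 'b\0a\0' | sort -z | tr '\0' '\n')

echo "$unique"
echo "$by_field"
echo "$zero"
rm -r "$dir"
```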
2070
2071 In addition, when GNU @code{sort} is invoked with exactly one argument,
2072 options @samp{--help} and @samp{--version} are recognized.  @xref{Common
2073 options}.
2074
2075 Historical (BSD and System V) implementations of @code{sort} have
2076 differed in their interpretation of some options, particularly
2077 @samp{-b}, @samp{-f}, and @samp{-n}.  GNU sort follows the @sc{POSIX}
2078 behavior, which is usually (but not always!) like the System V behavior.
2079 According to @sc{POSIX}, @samp{-n} no longer implies @samp{-b}.  For
2080 consistency, @samp{-M} has been changed in the same way.  This may
2081 affect the meaning of character positions in field specifications in
2082 obscure cases.  The only fix is to add an explicit @samp{-b}.
2083
2084 A position in a sort field specified with the @samp{-k} or @samp{+}
2085 option has the form @samp{@var{f}.@var{c}}, where @var{f} is the number
2086 of the field to use and @var{c} is the number of the first character
2087 from the beginning of the field (for @samp{+@var{pos}}) or from the end
2088 of the previous field (for @samp{-@var{pos}}).  If the @samp{.@var{c}}
2089 is omitted, it is taken to be the first character in the field.  If the
2090 @samp{-b} option was specified, the @samp{.@var{c}} part of a field
2091 specification is counted from the first nonblank character of the field
2092 (for @samp{+@var{pos}}) or from the first nonblank character following
2093 the previous field (for @samp{-@var{pos}}).
2094
2095 A sort key option may also have any of the option letters @samp{Mbdfinr}
2096 appended to it, in which case the global ordering options are not used
2097 for that particular field.  The @samp{-b} option may be independently
2098 attached to either or both of the @samp{+@var{pos}} and
2099 @samp{-@var{pos}} parts of a field specification, and if it is inherited
2100 from the global options it will be attached to both.
2101 Keys may span multiple fields.
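
The following sketch illustrates how leading blanks and character
offsets affect a key given with @samp{-k}.  The sample lines are
invented, and @code{LC_ALL} is set to @samp{C} as an assumption so
that a blank sorts before any letter.

```shell
export LC_ALL=C   # assumption: byte-order collation
# Field two of the first line starts with two blanks, of the second with one.
# Without b the blanks are part of the key, and a blank sorts before "a".
with_blanks=$(printf 'x  b\nx a\n' | sort -k 2,2)
# With b on the start position, the key starts at the first nonblank.
skip_blanks=$(printf 'x  b\nx a\n' | sort -k 2b,2)
# A character offset: the key is only the second character of field one.
offset=$(printf 'zb\nya\nxc\n' | sort -k 1.2,1.2)
echo "$with_blanks"
echo "$skip_blanks"
echo "$offset"
```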
2102
2103 Here are some examples to illustrate various combinations of options.
2104 In them, the @sc{POSIX} @samp{-k} option is used to specify sort keys rather
2105 than the obsolete @samp{+@var{pos1}-@var{pos2}} syntax.
2106
2107 @itemize @bullet
2108
2109 @item
2110 Sort in descending (reverse) numeric order.
2111
2112 @example
2113 sort -nr
2114 @end example
2115
@item
Sort alphabetically, omitting the first and second fields.
2117 This uses a single key composed of the characters beginning
2118 at the start of field three and extending to the end of each line.
2119
2120 @example
2121 sort -k3
2122 @end example
2123
2124 @item
2125 Sort numerically on the second field and resolve ties by sorting
2126 alphabetically on the third and fourth characters of field five.
2127 Use @samp{:} as the field delimiter.
2128
2129 @example
2130 sort -t : -k 2,2n -k 5.3,5.4
2131 @end example
2132
Note that if you had written @samp{-k 2} instead of @samp{-k 2,2},
@code{sort} would have used all characters beginning in the second field
2135 and extending to the end of the line as the primary @emph{numeric}
2136 key.  For the large majority of applications, treating keys spanning
2137 more than one field as numeric will not do what you expect.
2138
2139 Also note that the @samp{n} modifier was applied to the field-end
2140 specifier for the first key.  It would have been equivalent to
2141 specify @samp{-k 2n,2} or @samp{-k 2n,2n}.  All modifiers except
2142 @samp{b} apply to the associated @emph{field}, regardless of whether
2143 the modifier character is attached to the field-start and/or the
2144 field-end part of the key specifier.
2145
2146 @item
2147 Sort the password file on the fifth field and ignore any
2148 leading white space.  Sort lines with equal values in field five
2149 on the numeric user ID in field three.
2150
2151 @example
2152 sort -t : -k 5b,5 -k 3,3n /etc/passwd
2153 @end example
2154
2155 An alternative is to use the global numeric modifier @samp{-n}.
2156
2157 @example
2158 sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
2159 @end example
2160
Finally, to ignore both leading and trailing white space, you
could have applied the @samp{b} modifier to the field-end specifier
for the first key,

@example
sort -t : -n -k 5b,5b -k 3,3 /etc/passwd
@end example

or used the global @samp{-b} modifier instead of @samp{-n}
and an explicit @samp{n} with the second key specifier.

@example
sort -t : -b -k 5,5 -k 3,3n /etc/passwd
@end example

@item
Generate a tags file in case-insensitive sorted order.

@example
find src -type f -print0 | sort -t / -z -f | xargs -0 etags --append
@end example

The use of @samp{-print0}, @samp{-z}, and @samp{-0} in this case means
that pathnames that contain Line Feed characters will not get broken up
by the sort operation.
2185
2186 @end itemize
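
To see the two-key example at work, it can be run on a few
colon-separated records.  The records below are invented for this
sketch, and @code{LC_ALL} is set to @samp{C} as an assumption to make
the alphabetic tie-break predictable.

```shell
export LC_ALL=C   # assumption: byte-order collation
# Hypothetical records of the form name:count:x:y:code
sorted=$(printf 'a:10:x:y:zzab\nb:9:x:y:zzaa\nc:10:x:y:zzaa\n' |
    sort -t : -k 2,2n -k 5.3,5.4)
echo "$sorted"
```

The record with count 9 comes first.  The two records with count 10
tie on the numeric key and are ordered by the third and fourth
characters of field five, @samp{aa} before @samp{ab}.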
2187
2188
2189 @node uniq invocation
2190 @section @code{uniq}: Uniqify files
2191
2192 @pindex uniq
2193 @cindex uniqify files
2194
@code{uniq} writes the unique lines in the given @var{input}, or
2196 standard input if nothing is given or for an @var{input} name of
2197 @samp{-}.  Synopsis:
2198
2199 @example
2200 uniq [@var{option}]@dots{} [@var{input} [@var{output}]]
2201 @end example
2202
2203 By default, @code{uniq} prints the unique lines in a sorted file, i.e.,
2204 discards all but one of identical successive lines.  Optionally, it can
2205 instead show only lines that appear exactly once, or lines that appear
2206 more than once.
2207
2208 The input must be sorted.  If your input is not sorted, perhaps you want
2209 to use @code{sort -u}.
2210
2211 If no @var{output} file is specified, @code{uniq} writes to standard
2212 output.
2213
2214 The program accepts the following options.  Also see @ref{Common options}.
2215
2216 @table @samp
2217
2218 @item -@var{n}
2219 @itemx -f @var{n}
2220 @itemx --skip-fields=@var{n}
2221 @opindex -@var{n}
2222 @opindex -f
2223 @opindex --skip-fields
2224 Skip @var{n} fields on each line before checking for uniqueness.  Fields
2225 are sequences of non-space non-tab characters that are separated from
each other by at least one space or tab.
2227
2228 @item +@var{n}
2229 @itemx -s @var{n}
2230 @itemx --skip-chars=@var{n}
2231 @opindex +@var{n}
2232 @opindex -s
2233 @opindex --skip-chars
2234 Skip @var{n} characters before checking for uniqueness.  If you use both
2235 the field and character skipping options, fields are skipped over first.
2236
2237 @item -c
2238 @itemx --count
2239 @opindex -c
2240 @opindex --count
2241 Print the number of times each line occurred along with the line.
2242
2243 @item -i
2244 @itemx --ignore-case
2245 @opindex -i
2246 @opindex --ignore-case
2247 Ignore differences in case when comparing lines.
2248
2249 @item -d
2250 @itemx --repeated
2251 @opindex -d
2252 @opindex --repeated
2253 @cindex duplicate lines, outputting
2254 Print only duplicate lines.
2255
2256 @item -u
2257 @itemx --unique
2258 @opindex -u
2259 @opindex --unique
2260 @cindex unique lines, outputting
2261 Print only unique lines.
2262
2263 @item -w @var{n}
2264 @itemx --check-chars=@var{n}
2265 @opindex -w
2266 @opindex --check-chars
2267 Compare @var{n} characters on each line (after skipping any specified
2268 fields and characters).  By default the entire rest of the lines are
2269 compared.
2270
2271 @end table
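
The default behaviour and the @samp{-d}, @samp{-u} and @samp{-f}
options can be compared on a small sorted input, as in this sketch.
The file name @file{sorted} and the sample lines are hypothetical.

```shell
export LC_ALL=C   # assumption: plain byte comparison of lines
dir=$(mktemp -d)
printf 'a\na\nb\nc\nc\nc\n' > "$dir/sorted"

# Default: one copy of each run of identical lines.
each=$(uniq "$dir/sorted")
# -d: only lines that are repeated.
dups=$(uniq -d "$dir/sorted")
# -u: only lines that are not repeated.
singles=$(uniq -u "$dir/sorted")
# -f 1: skip the first field before comparing.
skipped=$(printf '1 x\n2 x\n3 y\n' | uniq -f 1)

echo "$each"
echo "$dups"
echo "$singles"
echo "$skipped"
rm -r "$dir"
```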
2272
2273
2274 @node comm invocation
2275 @section @code{comm}: Compare two sorted files line by line
2276
2277 @pindex comm
2278 @cindex line-by-line comparison
2279 @cindex comparing sorted files
2280
2281 @code{comm} writes to standard output lines that are common, and lines
2282 that are unique, to two input files; a file name of @samp{-} means
2283 standard input.  Synopsis:
2284
2285 @example
2286 comm [@var{option}]@dots{} @var{file1} @var{file2}
2287 @end example
2288
2289 The input files must be sorted before @code{comm} can be used.
2290
2291 @cindex differing lines
2292 @cindex common lines
With no options, @code{comm} produces three-column output.  Column one
2294 contains lines unique to @var{file1}, column two contains lines unique
2295 to @var{file2}, and column three contains lines common to both files.
2296 Columns are separated by @key{TAB}.
2297 @c FIXME: when there's an option to supply an alternative separator
2298 @c string, append `by default' to the above sentence.
2299
2300 @opindex -1
2301 @opindex -2
2302 @opindex -3
2303 The options @samp{-1}, @samp{-2}, and @samp{-3} suppress printing of
2304 the corresponding columns.  Also see @ref{Common options}.
2305
2306 Unlike some other comparison utilities, @code{comm} has an exit
2307 status that does not depend on the result of the comparison.
2308 Upon normal completion @code{comm} produces an exit code of zero.
2309 If there is an error it exits with nonzero status.
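
Suppressing two of the three columns leaves a single list, which is
the usual way to extract common or differing lines.  In the sketch
below the file names @file{file1} and @file{file2} and their contents
are made up; @code{LC_ALL} is set to @samp{C} as an assumption so
that the inputs count as sorted.

```shell
export LC_ALL=C   # assumption: byte-order collation
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/file1"
printf 'b\nc\nd\n' > "$dir/file2"

# Suppress columns one and two: lines common to both files.
both=$(comm -12 "$dir/file1" "$dir/file2")
# Suppress columns two and three: lines only in the first file.
only1=$(comm -23 "$dir/file1" "$dir/file2")
# Suppress columns one and three: lines only in the second file.
only2=$(comm -13 "$dir/file1" "$dir/file2")

echo "$both"
echo "$only1"
echo "$only2"
rm -r "$dir"
```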
2310
2311
2312 @node ptx invocation
2313 @section @code{ptx}: Produce permuted indexes
2314
2315 @pindex ptx
2316
@code{ptx} reads a text file and produces a permuted index, with
each keyword in its context.  The calling sketch is one of:
2319
2320 @example
2321 ptx [@var{option} @dots{}] [@var{file} @dots{}]
2322 ptx -G [@var{option} @dots{}] [@var{input} [@var{output}]]
2323 @end example
2324
The @samp{-G} (or its equivalent: @samp{--traditional}) option disables
all GNU extensions and reverts to traditional mode, thus introducing some
limitations and changing several of the program's default option values.
When @samp{-G} is not specified, GNU extensions are always enabled.  GNU
extensions to @code{ptx} are documented wherever appropriate in this
document.  @xref{Compatibility in ptx}, for an explicit list of them.

Individual options are explained in the following sections.
2333
When GNU extensions are enabled, there may be zero, one or several
@var{file} arguments after the options.  If there is no @var{file}, the
program reads the standard input.  If there are one or more, they name
the input files, which are all read in turn, as if all the
2338 input files were concatenated.  However, there is a full contextual
2339 break between each file and, when automatic referencing is requested,
2340 file names and line numbers refer to individual text input files.  In
2341 all cases, the program produces the permuted index onto the standard
2342 output.
2343
2344 When GNU extensions are @emph{not} enabled, that is, when the program
2345 operates in traditional mode, there may be zero, one or two parameters
besides the options.  If there are no parameters, the program reads the
2347 standard input and produces the permuted index onto the standard output.
2348 If there is only one parameter, it names the text @var{input} to be read
2349 instead of the standard input.  If two parameters are given, they give
2350 respectively the name of the @var{input} file to read and the name of
2351 the @var{output} file to produce.  @emph{Be very careful} to note that,
in this case, the contents of the file given by the second parameter are
2353 destroyed.  This behaviour is dictated only by System V @code{ptx}
2354 compatibility, because GNU Standards discourage output parameters not
2355 introduced by an option.
2356
2357 Note that for @emph{any} file named as the value of an option or as an
2358 input text file, a single dash @kbd{-} may be used, in which case
2359 standard input is assumed.  However, it would not make sense to use this
2360 convention more than once per program invocation.
2361
2362 @menu
2363 * General options in ptx::      Options which affect general program behaviour.
2364 * Charset selection in ptx::    Underlying character set considerations.
2365 * Input processing in ptx::     Input fields, contexts, and keyword selection.
2366 * Output formatting in ptx::    Types of output format, and sizing the fields.
2367 * Compatibility in ptx::
2368 @end menu
2369
2370
2371 @node General options in ptx
2372 @subsection General options
2373
2374 @table @code
2375
2376 @item -C
2377 @itemx --copyright
Print a short note about the copyright and copying conditions, then
exit without further processing.

@item -G
@itemx --traditional
As already explained, this option disables all GNU extensions to
@code{ptx} and switches to traditional mode.

@item --help
Print a short help message on standard output, then exit without further
processing.

@item --version
Print the program version on standard output, then exit without further
processing.
2393
2394 @end table
2395
2396
2397 @node Charset selection in ptx
2398 @subsection Charset selection
2399
As it is set up now, the program assumes that the input file is coded
using 8-bit ISO 8859-1 code, also known as the Latin-1 character set,
@emph{unless} it is compiled for MS-DOS, in which case it uses the
character set of the IBM-PC.  (GNU @code{ptx} is not known to work on
smaller MS-DOS machines anymore.)  Compared to 7-bit ASCII, the set of
characters which are letters is then different, and this alters the
behaviour of regular expression matching.  Thus, the default regular
expression for a keyword allows foreign or diacriticized letters.
Keyword sorting, however, is still crude; it obeys the underlying
character set ordering quite blindly.
2410
2411 @table @code
2412
2413 @item -f
2414 @itemx --ignore-case
2415 Fold lower case letters to upper case for sorting.
2416
2417 @end table
2418
2419
2420 @node Input processing in ptx
2421 @subsection Word selection and input processing
2422
2423 @table @code
2424
2425 @item -b @var{file}
@itemx --break-file=@var{file}

This option provides an alternative to option @code{-W} for describing
which characters make up words.  It introduces the name of a
file which contains a list of characters which can@emph{not} be part of
a word; this file is called the @dfn{Break file}.  Any character which
is not part of the Break file is a word constituent.  If both options
@code{-b} and @code{-W} are specified, then @code{-W} has precedence and
@code{-b} is ignored.
2435
2436 When GNU extensions are enabled, the only way to avoid newline as a
2437 break character is to write all the break characters in the file with no
2438 newline at all, not even at the end of the file.  When GNU extensions
2439 are disabled, spaces, tabs and newlines are always considered as break
2440 characters even if not included in the Break file.
2441
2442 @item -i @var{file}
2443 @itemx --ignore-file=@var{file}
2444
2445 The file associated with this option contains a list of words which will
2446 never be taken as keywords in concordance output.  It is called the
2447 @dfn{Ignore file}.  The file contains exactly one word in each line; the
2448 end of line separation of words is not subject to the value of the
2449 @code{-S} option.
2450
2451 There is a default Ignore file used by @code{ptx} when this option is
2452 not specified, usually found in @file{/usr/local/lib/eign} if this has
2453 not been changed at installation time.  If you want to deactivate the
2454 default Ignore file, specify @code{/dev/null} instead.
2455
2456 @item -o @var{file}
2457 @itemx --only-file=@var{file}
2458
2459 The file associated with this option contains a list of words which will
be retained in concordance output; any word not mentioned in this file
2461 is ignored.  The file is called the @dfn{Only file}.  The file contains
2462 exactly one word in each line; the end of line separation of words is
2463 not subject to the value of the @code{-S} option.
2464
There is no default for the Only file.  When there are both an
Only file and an Ignore file, a word is considered a keyword
only if it is listed in the Only file and not listed in the Ignore file.
2468
2469 @item -r
2470 @itemx --references
2471
On each input line, the leading sequence of non-white-space characters is
taken to be a reference that identifies this input
line in the produced permuted index.  @xref{Output formatting in ptx}, for
more information about reference production.  Using this option changes
the default value for option @code{-S}.
2477
2478 Using this option, the program does not try very hard to remove
2479 references from contexts in output, but it succeeds in doing so
2480 @emph{when} the context ends exactly at the newline.  If option
2481 @code{-r} is used with @code{-S} default value, or when GNU extensions
2482 are disabled, this condition is always met and references are completely
2483 excluded from the output contexts.
2484
2485 @item -S @var{regexp}
2486 @itemx --sentence-regexp=@var{regexp}
2487
2488 This option selects which regular expression will describe the end of a
2489 line or the end of a sentence.  In fact, there is other distinction
2490 between end of lines or end of sentences than the effect of this regular
2491 expression, and input line boundaries have no special significance
outside this option.  By default, when GNU extensions are enabled and
the @code{-r} option is not used, ends of sentences are used.  In this
case, the precise @var{regexp} is imported from GNU Emacs:
2495
2496 @example
2497 [.?!][]\"')@}]*\\($\\|\t\\|  \\)[ \t\n]*
2498 @end example
2499
Whenever GNU extensions are disabled or the @code{-r} option is used,
ends of lines are used; in this case, the default @var{regexp} is just:
2502
2503 @example
2504 \n
2505 @end example
2506
Using an empty @var{regexp} is equivalent to completely disabling end of line or end
2508 of sentence recognition.  In this case, the whole file is considered to
2509 be a single big line or sentence.  The user might want to disallow all
2510 truncation flag generation as well, through option @code{-F ""}.
2511 @xref{Regexps, , Syntax of Regular Expressions, emacs, The GNU Emacs
2512 Manual}.
2513
2514 When the keywords happen to be near the beginning of the input line or
2515 sentence, this often creates an unused area at the beginning of the
2516 output context line; when the keywords happen to be near the end of the
2517 input line or sentence, this often creates an unused area at the end of
2518 the output context line.  The program tries to fill those unused areas
2519 by wrapping around context in them; the tail of the input line or
2520 sentence is used to fill the unused area on the left of the output line;
2521 the head of the input line or sentence is used to fill the unused area
2522 on the right of the output line.
2523
2524 As a matter of convenience to the user, many usual backslashed escape
2525 sequences, as found in the C language, are recognized and converted to
2526 the corresponding characters by @code{ptx} itself.
2527
2528 @item -W @var{regexp}
2529 @itemx --word-regexp=@var{regexp}
2530
2531 This option selects which regular expression will describe each keyword.
2532 By default, if GNU extensions are enabled, a word is a sequence of
2533 letters; the @var{regexp} used is @code{\w+}.  When GNU extensions are
2534 disabled, a word is by default anything which ends with a space, a tab
2535 or a newline; the @var{regexp} used is @code{[^ \t\n]+}.
2536
An empty @var{regexp} is equivalent to not using this option, so that the
default applies.  @xref{Regexps, , Syntax of Regular Expressions, emacs,
2539 The GNU Emacs Manual}.
2540
2541 As a matter of convenience to the user, many usual backslashed escape
2542 sequences, as found in the C language, are recognized and converted to
2543 the corresponding characters by @code{ptx} itself.
2544
2545 @end table
2546
2547
2548 @node Output formatting in ptx
2549 @subsection Output formatting
2550
Output format is mainly controlled by the @code{-O} and @code{-T} options,
described in the table below.  When neither @code{-O} nor @code{-T} is
selected, and if GNU extensions are enabled, the program chooses an
output format suited for a dumb terminal.  Each keyword occurrence is
output to the center of one line, surrounded by its left and right
contexts.  Each field is properly justified, so the concordance output
can be read easily.  As a special feature, if automatic
2558 references are selected by option @code{-A} and are output before the
2559 left context, that is, if option @code{-R} is @emph{not} selected, then
2560 a colon is added after the reference; this nicely interfaces with GNU
2561 Emacs @code{next-error} processing.  In this default output format, each
2562 white space character, like newline and tab, is merely changed to
2563 exactly one space, with no special attempt to compress consecutive
2564 spaces.  This might change in the future.  Except for those white space
2565 characters, every other character of the underlying set of 256
2566 characters is transmitted verbatim.
2567
2568 Output format is further controlled by the following options.
2569
2570 @table @code
2571
2572 @item -g @var{number}
2573 @itemx --gap-size=@var{number}
2574
2575 Select the size of the minimum white gap between the fields on the output
2576 line.
2577
2578 @item -w @var{number}
2579 @itemx --width=@var{number}
2580
Select the maximum output width of each final line.  If references are
used, they are included in or excluded from the maximum output width
depending on option @code{-R}.  If @code{-R} is not
selected, that is, when references are output before the left context,
the maximum output width takes into account the maximum length of all
references.  If @code{-R} is selected, that is, when references are
output after the right context, the maximum output width does not take
into account the space taken by references, nor the gap that precedes
them.
2590
2591 @item -A
2592 @itemx --auto-reference
2593
2594 Select automatic references.  Each input line will have an automatic
2595 reference made up of the file name and the line ordinal, with a single
2596 colon between them.  However, the file name will be empty when standard
2597 input is being read.  If both @code{-A} and @code{-r} are selected, then
2598 the input reference is still read and skipped, but the automatic
2599 reference is used at output time, overriding the input reference.
2600
2601 @item -R
2602 @itemx --right-side-refs
2603
In the default output format, when option @code{-R} is not used, any
references produced by the effect of options @code{-r} or @code{-A} are
placed at the beginning of each output line, before the left context.  In
the default output format, when option @code{-R} is specified, references
are instead placed at the far right of output lines, after the right
context.  For any other output format, option @code{-R} is almost
2610 ignored, except for the fact that the width of references is @emph{not}
2611 taken into account in total output width given by @code{-w} whenever
2612 @code{-R} is selected.
2613
2614 This option is automatically selected whenever GNU extensions are
2615 disabled.
2616
2617 @item -F @var{string}
@itemx --flag-truncation=@var{string}
2619
2620 This option will request that any truncation in the output be reported
2621 using the string @var{string}.  Most output fields theoretically extend
2622 towards the beginning or the end of the current line, or current
2623 sentence, as selected with option @code{-S}.  But there is a maximum
2624 allowed output line width, changeable through option @code{-w}, which is
further divided into space for various output fields.  When a field
cannot extend all the way to the beginning or the end of the current
line and still fit in its allotted space, it is truncated.  By default,
the string used is a single slash, as in @code{-F /}.
2629
2630 @var{string} may have more than one character, as in @code{-F ...}.
2631 Also, in the particular case @var{string} is empty (@code{-F ""}),
2632 truncation flagging is disabled, and no truncation marks are appended in
2633 this case.
2634
2635 As a matter of convenience to the user, many usual backslashed escape
2636 sequences, as found in the C language, are recognized and converted to
2637 the corresponding characters by @code{ptx} itself.
2638
2639 @item -M @var{string}
2640 @itemx --macro-name=@var{string}
2641
2642 Select another @var{string} to be used instead of @samp{xx}, while
2643 generating output suitable for @code{nroff}, @code{troff} or @TeX{}.
2644
2645 @item -O
2646 @itemx --format=roff
2647
2648 Choose an output format suitable for @code{nroff} or @code{troff}
2649 processing.  Each output line will look like:
2650
2651 @example
2652 .xx "@var{tail}" "@var{before}" "@var{keyword_and_after}" "@var{head}" "@var{ref}"
2653 @end example
2654
2655 so it will be possible to write an @samp{.xx} roff macro to take care of
2656 the output typesetting.  This is the default output format when GNU
2657 extensions are disabled.  Option @samp{-M} might be used to change
2658 @samp{xx} to another macro name.
2659
2660 In this output format, each non-graphical character, like newline and
2661 tab, is merely changed to exactly one space, with no special attempt to
2662 compress consecutive spaces.  Each quote character: @kbd{"} is doubled
2663 so it will be correctly processed by @code{nroff} or @code{troff}.
2664
2665 @item -T
2666 @itemx --format=tex
2667
2668 Choose an output format suitable for @TeX{} processing.  Each output
2669 line will look like:
2670
2671 @example
2672 \xx @{@var{tail}@}@{@var{before}@}@{@var{keyword}@}@{@var{after}@}@{@var{head}@}@{@var{ref}@}
2673 @end example
2674
2675 @noindent
so it will be possible to write a @code{\xx} definition to take
2677 care of the output typesetting.  Note that when references are not being
2678 produced, that is, neither option @code{-A} nor option @code{-r} is
2679 selected, the last parameter of each @code{\xx} call is inhibited.
2680 Option @samp{-M} might be used to change @samp{xx} to another macro
2681 name.
2682
2683 In this output format, some special characters, like @kbd{$}, @kbd{%},
2684 @kbd{&}, @kbd{#} and @kbd{_} are automatically protected with a
2685 backslash.  Curly brackets @kbd{@{}, @kbd{@}} are also protected with a
2686 backslash, but also enclosed in a pair of dollar signs to force
2687 mathematical mode.  The backslash itself produces the sequence
2688 @code{\backslash@{@}}.  Circumflex and tilde diacritics produce the
2689 sequence @code{^\@{ @}} and @code{~\@{ @}} respectively.  Other
2690 diacriticized characters of the underlying character set produce an
2691 appropriate @TeX{} sequence as far as possible.  The other non-graphical
characters, like newline and tab, and all other characters which are
2693 not part of ASCII, are merely changed to exactly one space, with no
2694 special attempt to compress consecutive spaces.  Let me know how to
2695 improve this special character processing for @TeX{}.
2696
2697 @end table
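
As a rough sketch of how the @code{-O}, @code{-T} and @code{-M} options
described above combine, the commands below index one sample sentence
(made up for illustration).  No output is reproduced here because the
exact field contents depend on the width and gap settings; the only
property relied upon is that each line starts with the macro call.

@example
# roff-style output: every line is a call to the .xx macro.
printf 'the quick brown fox\n' | ptx -O

# The same index, with the macro renamed from xx to KW.
printf 'the quick brown fox\n' | ptx -O -M KW

# TeX-style output: every line is a call to \xx.
printf 'the quick brown fox\n' | ptx -T
@end example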
2698
2699
2700 @node Compatibility in ptx
2701 @subsection The GNU extensions to @code{ptx}
2702
2703 This version of @code{ptx} contains a few features which do not exist in
2704 System V @code{ptx}.  These extra features are suppressed by using the
2705 @samp{-G} command line option, unless overridden by other command line
2706 options.  Some GNU extensions cannot be recovered by overriding, so the
2707 simple rule is to avoid @samp{-G} if you care about GNU extensions.
2708 Here are the differences between this program and System V @code{ptx}.
2709
2710 @itemize @bullet
2711
2712 @item
This program can read many input files at once, and it always writes the
resulting concordance on standard output.  On the other hand, System V
@code{ptx} reads only one file and produces the result on standard output
or, if a second @var{file} parameter is given on the command line, to that
@var{file}.
2718
2719 Having output parameters not introduced by options is a quite dangerous
2720 practice which GNU avoids as far as possible.  So, for using @code{ptx}
2721 portably between GNU and System V, you should pay attention to always
2722 use it with a single input file, and always expect the result on
2723 standard output.  You might also want to automatically configure in a
2724 @samp{-G} option to @code{ptx} calls in products using @code{ptx}, if
2725 the configurator finds that the installed @code{ptx} accepts @samp{-G}.
2726
2727 @item
2728 The only options available in System V @code{ptx} are options @samp{-b},
2729 @samp{-f}, @samp{-g}, @samp{-i}, @samp{-o}, @samp{-r}, @samp{-t} and
2730 @samp{-w}.  All other options are GNU extensions and are not repeated in
2731 this enumeration.  Moreover, some options have a slightly different
2732 meaning when GNU extensions are enabled, as explained below.
2733
2734 @item
2735 By default, concordance output is not formatted for @code{troff} or
2736 @code{nroff}.  It is rather formatted for a dumb terminal.  @code{troff}
2737 or @code{nroff} output may still be selected through option @code{-O}.
2738
2739 @item
2740 Unless @code{-R} option is used, the maximum reference width is
2741 subtracted from the total output line width.  With GNU extensions
2742 disabled, width of references is not taken into account in the output
2743 line width computations.
2744
2745 @item
2746 All 256 characters, even @kbd{NUL}s, are always read and processed from
2747 input file with no adverse effect, even if GNU extensions are disabled.
2748 However, System V @code{ptx} does not accept 8-bit characters, a few
control characters are rejected, and the tilde @kbd{~} is condemned.
2750
2751 @item
2752 Input line length is only limited by available memory, even if GNU
2753 extensions are disabled.  However, System V @code{ptx} processes only
2754 the first 200 characters in each line.
2755
2756 @item
2757 The break (non-word) characters default to be every character except all
2758 letters of the underlying character set, diacriticized or not.  When GNU
2759 extensions are disabled, the break characters default to space, tab and
2760 newline only.
2761
2762 @item
2763 The program makes better use of output line width.  If GNU extensions
2764 are disabled, the program rather tries to imitate System V @code{ptx},
2765 but still, there are some slight disposition glitches this program does
2766 not completely reproduce.
2767
2768 @item
2769 The user can specify both an Ignore file and an Only file.  This is not
2770 allowed with System V @code{ptx}.
2771
2772 @end itemize
2773
2774
2775 @node Operating on fields within a line
2776 @chapter Operating on fields within a line
2777
2778 @menu
2779 * cut invocation::              Print selected parts of lines.
2780 * paste invocation::            Merge lines of files.
2781 * join invocation::             Join lines on a common field.
2782 @end menu
2783
2784
2785 @node cut invocation
2786 @section @code{cut}: Print selected parts of lines
2787
2788 @pindex cut
2789 @code{cut} writes to standard output selected parts of each line of each
2790 input file, or standard input if no files are given or for a file name of
2791 @samp{-}.  Synopsis:
2792
2793 @example
2794 cut [@var{option}]@dots{} [@var{file}]@dots{}
2795 @end example
2796
2797 In the table which follows, the @var{byte-list}, @var{character-list},
2798 and @var{field-list} are one or more numbers or ranges (two numbers
2799 separated by a dash) separated by commas.  Bytes, characters, and
fields are numbered starting at 1.  Incomplete ranges may be
2801 given: @samp{-@var{m}} means @samp{1-@var{m}}; @samp{@var{n}-} means
2802 @samp{@var{n}} through end of line or last field.
2803
2804 The program accepts the following options.  Also see @ref{Common
2805 options}.
2806
2807 @table @samp
2808
2809 @item -b @var{byte-list}
2810 @itemx --bytes=@var{byte-list}
2811 @opindex -b
2812 @opindex --bytes
2813 Print only the bytes in positions listed in @var{byte-list}.  Tabs and
2814 backspaces are treated like any other character; they take up 1 byte.
2815
2816 @item -c @var{character-list}
2817 @itemx --characters=@var{character-list}
2818 @opindex -c
2819 @opindex --characters
2820 Print only characters in positions listed in @var{character-list}.
2821 The same as @samp{-b} for now, but internationalization will change
2822 that.  Tabs and backspaces are treated like any other character; they
2823 take up 1 character.
2824
2825 @item -f @var{field-list}
2826 @itemx --fields=@var{field-list}
2827 @opindex -f
2828 @opindex --fields
2829 Print only the fields listed in @var{field-list}.  Fields are
2830 separated by a @key{TAB} by default.
2831
2832 @item -d @var{delim}
2833 @itemx --delimiter=@var{delim}
2834 @opindex -d
2835 @opindex --delimiter
2836 For @samp{-f}, fields are separated by the first character in @var{delim}
2837 (default is @key{TAB}).
2838
2839 @item -n
2840 @opindex -n
2841 Do not split multi-byte characters (no-op for now).
2842
2843 @item -s
2844 @itemx --only-delimited
2845 @opindex -s
2846 @opindex --only-delimited
2847 For @samp{-f}, do not print lines that do not contain the field separator
2848 character.
2849
2850 @end table
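
As a minimal sketch of the list syntax and options described above, the
following commands select fields and bytes from sample input.  The
sample data is made up for illustration, and the results noted in the
comments assume single-byte characters.

@example
# Fields 1 and 3 of colon-separated input.
printf 'root:x:0:0\n' | cut -d : -f 1,3      # prints root:0

# Open-ended range: bytes 3 through the end of the line.
printf 'abcdef\n' | cut -b 3-                # prints cdef

# With -s, the line that lacks the delimiter is not printed.
printf 'a:b\nplain\n' | cut -s -d : -f 2     # prints b
@end example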
2851
2852
2853 @node paste invocation
2854 @section @code{paste}: Merge lines of files
2855
2856 @pindex paste
2857 @cindex merging files
2858
2859 @code{paste} writes to standard output lines consisting of sequentially
2860 corresponding lines of each given file, separated by @key{TAB}.
2861 Standard input is used for a file name of @samp{-} or if no input files
2862 are given.
2863
2864 Synopsis:
2865
2866 @example
2867 paste [@var{option}]@dots{} [@var{file}]@dots{}
2868 @end example
2869
2870 The program accepts the following options.  Also see @ref{Common options}.
2871
2872 @table @samp
2873
2874 @item -s
2875 @itemx --serial
2876 @opindex -s
2877 @opindex --serial
2878 Paste the lines of one file at a time rather than one line from each
2879 file.
2880
2881 @item -d @var{delim-list}
2882 @itemx --delimiters @var{delim-list}
2883 @opindex -d
2884 @opindex --delimiters
2885 Consecutively use the characters in @var{delim-list} instead of
2886 @key{TAB} to separate merged lines.  When @var{delim-list} is
2887 exhausted, start again at its beginning.
2888
2889 @end table
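
The following sketch illustrates the default merge and the two options
above.  The file names @file{nums} and @file{lets} and the temporary
directory are assumptions chosen only for this example.

@example
tmp=$(mktemp -d)
printf '1\n2\n3\n' > "$tmp/nums"
printf 'a\nb\nc\n' > "$tmp/lets"

# Corresponding lines joined by a TAB: 1<TAB>a, 2<TAB>b, 3<TAB>c.
paste "$tmp/nums" "$tmp/lets"

# Same merge with a colon as the delimiter: 1:a, 2:b, 3:c.
paste -d : "$tmp/nums" "$tmp/lets"

# Serial mode, cycling through the delimiter list ",:".
paste -s -d ,: "$tmp/nums"                   # prints 1,2:3

rm -r "$tmp"
@end example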
2890
2891
2892 @node join invocation
2893 @section @code{join}: Join lines on a common field
2894
2895 @pindex join
2896 @cindex common field, joining on
2897
2898 @code{join} writes to standard output a line for each pair of input
2899 lines that have identical join fields.  Synopsis:
2900
2901 @example
2902 join [@var{option}]@dots{} @var{file1} @var{file2}
2903 @end example
2904
2905 Either @var{file1} or @var{file2} (but not both) can be @samp{-},
2906 meaning standard input.  @var{file1} and @var{file2} should be already
2907 sorted in increasing order (not numerically) on the join fields; unless
2908 the @samp{-t} option is given, they should be sorted ignoring blanks at
2909 the start of the join field, as in @code{sort -b}.  If the
2910 @samp{--ignore-case} option is given, lines should be sorted without
2911 regard to the case of characters in the join field, as in @code{sort -f}.
2912
2913 The defaults are: the join field is the first field in each line;
2914 fields in the input are separated by one or more blanks, with leading
2915 blanks on the line ignored; fields in the output are separated by a
2916 space; each output line consists of the join field, the remaining
2917 fields from @var{file1}, then the remaining fields from @var{file2}.
2918
2919 The program accepts the following options.  Also see @ref{Common options}.
2920
2921 @table @samp
2922
2923 @item -a @var{file-number}
2924 @opindex -a
2925 Print a line for each unpairable line in file @var{file-number} (either
2926 @samp{1} or @samp{2}), in addition to the normal output.
2927
2928 @item -e @var{string}
2929 @opindex -e
2930 Replace those output fields that are missing in the input with
2931 @var{string}.
2932
2933 @item -i
2934 @itemx --ignore-case
2935 @opindex -i
2936 @opindex --ignore-case
2937 Ignore differences in case when comparing keys.
2938 With this option, the lines of the input files must be ordered in the same way.
2939 Use @samp{sort -f} to produce this ordering.
2940
2941 @item -1 @var{field}
2942 @itemx -j1 @var{field}
2943 @opindex -1
2944 @opindex -j1
2945 Join on field @var{field} (a positive integer) of file 1.
2946
2947 @item -2 @var{field}
2948 @itemx -j2 @var{field}
2949 @opindex -2
2950 @opindex -j2
2951 Join on field @var{field} (a positive integer) of file 2.
2952
2953 @item -j @var{field}
2954 Equivalent to @samp{-1 @var{field} -2 @var{field}}.
2955
2956 @item -o @var{field-list}@dots{}
2957 Construct each output line according to the format in @var{field-list}.
2958 Each element in @var{field-list} is either the single character @samp{0} or
2959 has the form @var{m.n} where the file number, @var{m}, is @samp{1} or
2960 @samp{2} and @var{n} is a positive field number.
2961
2962 A field specification of @samp{0} denotes the join field.
2963 In most cases, the functionality of the @samp{0} field spec
2964 may be reproduced using the explicit @var{m.n} that corresponds
2965 to the join field.  However, when printing unpairable lines
2966 (using either of the @samp{-a} or @samp{-v} options), there is no way
2967 to specify the join field using @var{m.n} in @var{field-list}
2968 if there are unpairable lines in both files.
2969 To give @code{join} that functionality, @sc{POSIX} invented the @samp{0}
2970 field specification notation.
2971
2972 The elements in @var{field-list}
2973 are separated by commas or blanks.  Multiple @var{field-list}
2974 arguments can be given after a single @samp{-o} option; the values
2975 of all lists given with @samp{-o} are concatenated together.
All output lines, including those printed because of any @samp{-a} or
@samp{-v} option, are subject to the specified @var{field-list}.
2978
2979 @item -t @var{char}
2980 Use character @var{char} as the input and output field separator.
2981
2982 @item -v @var{file-number}
2983 Print a line for each unpairable line in file @var{file-number}
2984 (either @samp{1} or @samp{2}), instead of the normal output.
2985
2986 @end table
2987
2988 In addition, when GNU @code{join} is invoked with exactly one argument,
2989 options @samp{--help} and @samp{--version} are recognized.  @xref{Common
2990 options}.
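
As an illustration of the defaults and of the @samp{-a}, @samp{-e} and
@samp{-o} options, here is a small sketch.  The file names
@file{left} and @file{right}, their contents, and the placeholder
string @samp{NA} are assumptions made up for this example; both files
are already sorted on the join field.

@example
tmp=$(mktemp -d)
printf '1 apple\n2 banana\n3 cherry\n' > "$tmp/left"
printf '1 red\n3 dark\n4 blue\n' > "$tmp/right"

# Default: one output line per pair of lines with equal join fields.
join "$tmp/left" "$tmp/right"
# prints:
#   1 apple red
#   3 cherry dark

# Also print unpairable lines from both files, using a fixed layout
# and filling missing fields with NA.
join -a 1 -a 2 -e NA -o 0,1.2,2.2 "$tmp/left" "$tmp/right"
# prints:
#   1 apple red
#   2 banana NA
#   3 cherry dark
#   4 NA blue

rm -r "$tmp"
@end example

The @samp{0} field specification is what lets the last command print
the join field for lines that come from either file.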
2991
2992
2993 @node Operating on characters
2994 @chapter Operating on characters
2995
2996 @cindex operating on characters
2997
These commands operate on individual characters.
2999
3000 @menu
3001 * tr invocation::               Translate, squeeze, and/or delete characters.
3002 * expand invocation::           Convert tabs to spaces.
3003 * unexpand invocation::         Convert spaces to tabs.
3004 @end menu
3005
3006
3007 @node tr invocation
3008 @section @code{tr}: Translate, squeeze, and/or delete characters
3009
3010 @pindex tr
3011
3012 Synopsis:
3013
3014 @example
3015 tr [@var{option}]@dots{} @var{set1} [@var{set2}]
3016 @end example
3017
3018 @code{tr} copies standard input to standard output, performing
3019 one of the following operations:
3020
3021 @itemize @bullet
3022 @item
3023 translate, and optionally squeeze repeated characters in the result,
3024 @item
3025 squeeze repeated characters,
3026 @item
3027 delete characters,
3028 @item
3029 delete characters, then squeeze repeated characters from the result.
3030 @end itemize
3031
3032 The @var{set1} and (if given) @var{set2} arguments define ordered
3033 sets of characters, referred to below as @var{set1} and @var{set2}.  These
3034 sets are the characters of the input that @code{tr} operates on.
3035 The @samp{--complement} (@samp{-c}) option replaces @var{set1} with its
3036 complement (all of the characters that are not in @var{set1}).
3037
3038 @menu
3039 * Character sets::              Specifying sets of characters.
* Translating::                 Changing one character to another.
3041 * Squeezing::                   Squeezing repeats and deleting.
3042 * Warnings in tr::              Warning messages.
3043 @end menu
3044
3045
3046 @node Character sets
3047 @subsection Specifying sets of characters
3048
3049 @cindex specifying sets of characters
3050
3051 The format of the @var{set1} and @var{set2} arguments resembles
3052 the format of regular expressions; however, they are not regular
3053 expressions, only lists of characters.  Most characters simply
3054 represent themselves in these strings, but the strings can contain
3055 the shorthands listed below, for convenience.  Some of them can be
3056 used only in @var{set1} or @var{set2}, as noted below.
3057
3058 @table @asis
3059
3060 @item Backslash escapes
3061 @cindex backslash escapes
3062
3063 A backslash followed by a character not listed below causes an error
3064 message.
3065
3066 @table @samp
3067 @item \a
3068 Control-G.
3069 @item \b
3070 Control-H.
3071 @item \f
3072 Control-L.
3073 @item \n
3074 Control-J.
3075 @item \r
3076 Control-M.
3077 @item \t
3078 Control-I.
3079 @item \v
3080 Control-K.
3081 @item \@var{ooo}
3082 The character with the value given by @var{ooo}, which is 1 to 3
octal digits.
3084 @item \\
3085 A backslash.
3086 @end table
3087
3088 @item Ranges
3089 @cindex ranges
3090
3091 The notation @samp{@var{m}-@var{n}} expands to all of the characters
3092 from @var{m} through @var{n}, in ascending order.  @var{m} should
3093 collate before @var{n}; if it doesn't, an error results.  As an example,
3094 @samp{0-9} is the same as @samp{0123456789}.  Although GNU @code{tr}
3095 does not support the System V syntax that uses square brackets to
3096 enclose ranges, translations specified in that format will still work as
long as the brackets in @var{set1} correspond to identical brackets
in @var{set2}.
3099
3100 @item Repeated characters
3101 @cindex repeated characters
3102
3103 The notation @samp{[@var{c}*@var{n}]} in @var{set2} expands to @var{n}
3104 copies of character @var{c}.  Thus, @samp{[y*6]} is the same as
@samp{yyyyyy}.  The notation @samp{[@var{c}*]} in @var{set2} expands
3106 to as many copies of @var{c} as are needed to make @var{set2} as long as
3107 @var{set1}.  If @var{n} begins with @samp{0}, it is interpreted in
3108 octal, otherwise in decimal.
3109
3110 @item Character classes
@cindex character classes
3112
3113 The notation @samp{[:@var{class}:]} expands to all of the characters in
3114 the (predefined) class @var{class}.  The characters expand in no
3115 particular order, except for the @code{upper} and @code{lower} classes,
3116 which expand in ascending order.  When the @samp{--delete} (@samp{-d})
3117 and @samp{--squeeze-repeats} (@samp{-s}) options are both given, any
3118 character class can be used in @var{set2}.  Otherwise, only the
3119 character classes @code{lower} and @code{upper} are accepted in
3120 @var{set2}, and then only if the corresponding character class
3121 (@code{upper} and @code{lower}, respectively) is specified in the same
3122 relative position in @var{set1}.  Doing this specifies case conversion.
3123 The class names are given below; an error results when an invalid class
3124 name is given.
3125
3126 @table @code
3127 @item alnum
3128 @opindex alnum
3129 Letters and digits.
3130 @item alpha
3131 @opindex alpha
3132 Letters.
3133 @item blank
3134 @opindex blank
3135 Horizontal whitespace.
3136 @item cntrl
3137 @opindex cntrl
3138 Control characters.
3139 @item digit
3140 @opindex digit
3141 Digits.
3142 @item graph
3143 @opindex graph
3144 Printable characters, not including space.
3145 @item lower
3146 @opindex lower
3147 Lowercase letters.
3148 @item print
3149 @opindex print
3150 Printable characters, including space.
3151 @item punct
3152 @opindex punct
3153 Punctuation characters.
3154 @item space
3155 @opindex space
3156 Horizontal or vertical whitespace.
3157 @item upper
3158 @opindex upper
3159 Uppercase letters.
3160 @item xdigit
3161 @opindex xdigit
3162 Hexadecimal digits.
3163 @end table
3164
3165 @item Equivalence classes
3166 @cindex equivalence classes
3167
3168 The syntax @samp{[=@var{c}=]} expands to all of the characters that are
3169 equivalent to @var{c}, in no particular order.  Equivalence classes are
3170 a relatively recent invention intended to support non-English alphabets.
3171 But there seems to be no standard way to define them or determine their
3172 contents.  Therefore, they are not fully implemented in GNU @code{tr};
3173 each character's equivalence class consists only of that character,
3174 which is of no particular use.
3175
3176 @end table
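
The shorthands above can be tried out with a few one-line commands.
This is only a sketch on made-up input; the results in the comments
assume the usual ASCII character set.

@example
# A range in set1 mapped onto a repeat construct in set2:
# every digit becomes a number sign.
printf 'tel 555 0199\n' | tr 0-9 '[#*]'      # prints tel ### ####

# An octal escape (040 is a space) and a backslash escape.
printf 'a b\n' | tr '\040' '\n'              # prints a and b on two lines

# A character class used with --delete.
printf 'a1 b2\n' | tr -d '[:digit:]'         # prints a b
@end example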
3177
3178
3179 @node Translating
3180 @subsection Translating
3181
3182 @cindex translating characters
3183
3184 @code{tr} performs translation when @var{set1} and @var{set2} are
3185 both given and the @samp{--delete} (@samp{-d}) option is not given.
3186 @code{tr} translates each character of its input that is in @var{set1}
3187 to the corresponding character in @var{set2}.  Characters not in
3188 @var{set1} are passed through unchanged.  When a character appears more
3189 than once in @var{set1} and the corresponding characters in @var{set2}
3190 are not all the same, only the final one is used.  For example, these
3191 two commands are equivalent:
3192
3193 @example
3194 tr aaa xyz
3195 tr a z
3196 @end example
3197
3198 A common use of @code{tr} is to convert lowercase characters to
3199 uppercase.  This can be done in many ways.  Here are three of them:
3200
3201 @example
3202 tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
3203 tr a-z A-Z
3204 tr '[:lower:]' '[:upper:]'
3205 @end example
3206
3207 When @code{tr} is performing translation, @var{set1} and @var{set2}
3208 typically have the same length.  If @var{set1} is shorter than
3209 @var{set2}, the extra characters at the end of @var{set2} are ignored.
3210
3211 On the other hand, making @var{set1} longer than @var{set2} is not
3212 portable; @sc{POSIX.2} says that the result is undefined.  In this situation,
3213 BSD @code{tr} pads @var{set2} to the length of @var{set1} by repeating
3214 the last character of @var{set2} as many times as necessary.  System V
3215 @code{tr} truncates @var{set1} to the length of @var{set2}.
3216
3217 By default, GNU @code{tr} handles this case like BSD @code{tr}.  When
3218 the @samp{--truncate-set1} (@samp{-t}) option is given, GNU @code{tr}
3219 handles this case like the System V @code{tr} instead.  This option is
3220 ignored for operations other than translation.
3221
3222 Acting like System V @code{tr} in this case breaks the relatively common
3223 BSD idiom:
3224
3225 @example
3226 tr -cs A-Za-z0-9 '\012'
3227 @end example
3228
3229 @noindent
3230 because it converts only zero bytes (the first element in the
3231 complement of @var{set1}), rather than all non-alphanumerics, to
3232 newlines.
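
The difference between the two behaviors can be seen on a small input.
The following is a sketch of what GNU @code{tr} is expected to print
when @var{set1} is longer than @var{set2}:

@example
# Default (BSD-like): set2 is padded with its last character.
printf 'abcd\n' | tr abcd xy                 # prints xyyy

# With -t (System V-like): set1 is truncated to the length of set2.
printf 'abcd\n' | tr -t abcd xy              # prints xycd

# A character repeated in set1: only the final mapping is used.
printf 'a\n' | tr aaa xyz                    # prints z
@end example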
3233
3234
3235 @node Squeezing
3236 @subsection Squeezing repeats and deleting
3237
3238 @cindex squeezing repeat characters
3239 @cindex deleting characters
3240
3241 When given just the @samp{--delete} (@samp{-d}) option, @code{tr}
3242 removes any input characters that are in @var{set1}.
3243
3244 When given just the @samp{--squeeze-repeats} (@samp{-s}) option,
3245 @code{tr} replaces each input sequence of a repeated character that
3246 is in @var{set1} with a single occurrence of that character.
3247
3248 When given both @samp{--delete} and @samp{--squeeze-repeats}, @code{tr}
3249 first performs any deletions using @var{set1}, then squeezes repeats
3250 from any remaining characters using @var{set2}.
3251
3252 The @samp{--squeeze-repeats} option may also be used when translating,
3253 in which case @code{tr} first performs translation, then squeezes
3254 repeats from any remaining characters using @var{set2}.
3255
3256 Here are some examples to illustrate various combinations of options:
3257
3258 @itemize @bullet
3259
3260 @item
3261 Remove all zero bytes:
3262
3263 @example
3264 tr -d '\000'
3265 @end example
3266
3267 @item
3268 Put all words on lines by themselves.  This converts all
3269 non-alphanumeric characters to newlines, then squeezes each string
3270 of repeated newlines into a single newline:
3271
3272 @example
3273 tr -cs '[a-zA-Z0-9]' '[\n*]'
3274 @end example
3275
3276 @item
3277 Convert each sequence of repeated newlines to a single newline:
3278
3279 @example
3280 tr -s '\n'
3281 @end example
3282
3283 @item
3284 Find doubled occurrences of words in a document.
3285 For example, people often write ``the the'' with the duplicated words
separated by a newline.  The Bourne shell script below works first
3287 by converting each sequence of punctuation and blank characters to a
3288 single newline.  That puts each ``word'' on a line by itself.
3289 Next it maps all uppercase characters to lower case, and finally it
3290 runs @code{uniq} with the @samp{-d} option to print out only the words
3291 that were adjacent duplicates.
3292
3293 @example
3294 #!/bin/sh
3295 cat "$@@" \
3296   | tr -s '[:punct:][:blank:]' '\n' \
3297   | tr '[:upper:]' '[:lower:]' \
3298   | uniq -d
3299 @end example
3300
3301 @end itemize
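
To make the roles of @var{set1} and @var{set2} in each combination
concrete, here is a short sketch on made-up input, one command per
case described at the start of this section:

@example
# Delete only: characters in set1 are removed.
printf 'banana\n' | tr -d a                  # prints bnn

# Squeeze only: repeats of characters in set1 are collapsed.
printf 'aabbcc\n' | tr -s ab                 # prints abcc

# Delete using set1, then squeeze using set2.
printf 'aabbccdd\n' | tr -ds a c             # prints bbcdd

# Translate, then squeeze using set2.
printf 'a  b   c\n' | tr -s ' ' '\n'         # prints a, b, c on three lines
@end example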
3302
3303
3304 @node Warnings in tr
3305 @subsection Warning messages
3306
3307 @vindex POSIXLY_CORRECT
3308 Setting the environment variable @code{POSIXLY_CORRECT} turns off the
3309 following warning and error messages, for strict compliance with
3310 @sc{POSIX.2}.  Otherwise, the following diagnostics are issued:
3311
3312 @enumerate
3313
3314 @item
3315 When the @samp{--delete} option is given but @samp{--squeeze-repeats}
3316 is not, and @var{set2} is given, GNU @code{tr} by default prints
3317 a usage message and exits, because @var{set2} would not be used.
3318 The @sc{POSIX} specification says that @var{set2} must be ignored in
this case.  Silently ignoring arguments is a bad idea.
3320
3321 @item
3322 When an ambiguous octal escape is given.  For example, @samp{\400}
3323 is actually @samp{\40} followed by the digit @samp{0}, because the
3324 value 400 octal does not fit into a single byte.
3325
3326 @end enumerate
3327
3328 GNU @code{tr} does not provide complete BSD or System V compatibility.
3329 For example, it is impossible to disable interpretation of the @sc{POSIX}
3330 constructs @samp{[:alpha:]}, @samp{[=c=]}, and @samp{[c*10]}.  Also, GNU
3331 @code{tr} does not delete zero bytes automatically, unlike traditional
3332 Unix versions, which provide no way to preserve zero bytes.
3333
3334
3335 @node expand invocation
3336 @section @code{expand}: Convert tabs to spaces
3337
3338 @pindex expand
3339 @cindex tabs to spaces, converting
3340 @cindex converting tabs to spaces
3341
3342 @code{expand} writes the contents of each given @var{file}, or standard
3343 input if none are given or for a @var{file} of @samp{-}, to standard
3344 output, with tab characters converted to the appropriate number of
3345 spaces.  Synopsis:
3346
3347 @example
3348 expand [@var{option}]@dots{} [@var{file}]@dots{}
3349 @end example
3350
3351 By default, @code{expand} converts all tabs to spaces.  It preserves
3352 backspace characters in the output; they decrement the column count for
3353 tab calculations.  The default action is equivalent to @samp{-8} (set
3354 tabs every 8 columns).
3355
3356 The program accepts the following options.  Also see @ref{Common options}.
3357
3358 @table @samp
3359
3360 @item -@var{tab1}[,@var{tab2}]@dots{}
3361 @itemx -t @var{tab1}[,@var{tab2}]@dots{}
3362 @itemx --tabs=@var{tab1}[,@var{tab2}]@dots{}
3363 @opindex -@var{tab}
3364 @opindex -t
3365 @opindex --tabs
3366 @cindex tabstops, setting
3367 If only one tab stop is given, set the tabs @var{tab1} spaces apart
3368 (default is 8).  Otherwise, set the tabs at columns @var{tab1},
3369 @var{tab2}, @dots{} (numbered from 0), and replace any tabs beyond the
3370 last tabstop given with single spaces.  If the tabstops are specified
3371 with the @samp{-t} or @samp{--tabs} option, they can be separated by
3372 blanks as well as by commas.
3373
3374 @item -i
3375 @itemx --initial
3376 @opindex -i
3377 @opindex --initial
3378 @cindex initial tabs, converting
Only convert initial tabs (those that precede every character that is
neither a space nor a tab) on each line to spaces.
3381
3382 @end table
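
As a rough illustration of the tab-stop rules described in this section,
the following sketch feeds a few short lines to @code{expand}.  The
sample strings and variable names are invented for this example, and the
commands assume the GNU implementation is the one found on the search path.

```shell
# Hypothetical sample input; assumes GNU expand is on the PATH.
# One tab-stop value: stops every 4 columns, so the tab after "a"
# pads out to column 4.
every4=$(printf 'a\tb\n' | expand -t 4)

# A list of stops at columns 4 and 10; the tab after "c" lies beyond
# the last stop and becomes a single space.
listed=$(printf 'a\tb\tc\td\n' | expand -t 4,10)

# -i converts only the leading tab; the tab after "a" is left alone.
initial=$(printf '\ta\tb\n' | expand -i -t 4)

printf '[%s]\n' "$every4" "$listed" "$initial"
```

The brackets printed around each result only make the converted spaces
easier to see on a terminal.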
3383
3384
3385 @node unexpand invocation
3386 @section @code{unexpand}: Convert spaces to tabs
3387
3388 @pindex unexpand
3389
3390 @code{unexpand} writes the contents of each given @var{file}, or
3391 standard input if none are given or for a @var{file} of @samp{-}, to
3392 standard output, with strings of two or more space or tab characters
3393 converted to as many tabs as possible followed by as many spaces as are
3394 needed.  Synopsis:
3395
3396 @example
3397 unexpand [@var{option}]@dots{} [@var{file}]@dots{}
3398 @end example
3399
By default, @code{unexpand} converts only initial spaces and tabs (those
that precede all non-space, non-tab characters) on each line.  It
3402 preserves backspace characters in the output; they decrement the column
3403 count for tab calculations.  By default, tabs are set at every 8th
3404 column.
3405
3406 The program accepts the following options.  Also see @ref{Common options}.
3407
3408 @table @samp
3409
3410 @item -@var{tab1}[,@var{tab2}]@dots{}
3411 @itemx -t @var{tab1}[,@var{tab2}]@dots{}
3412 @itemx --tabs=@var{tab1}[,@var{tab2}]@dots{}
3413 @opindex -@var{tab}
3414 @opindex -t
3415 @opindex --tabs
3416 If only one tab stop is given, set the tabs @var{tab1} spaces apart
3417 instead of the default 8.  Otherwise, set the tabs at columns
3418 @var{tab1}, @var{tab2}, @dots{} (numbered from 0), and leave spaces and
3419 tabs beyond the tabstops given unchanged.  If the tabstops are specified
3420 with the @samp{-t} or @samp{--tabs} option, they can be separated by
3421 blanks as well as by commas.  This option implies the @samp{-a} option.
3422
3423 @item -a
3424 @itemx --all
3425 @opindex -a
3426 @opindex --all
3427 Convert all strings of two or more spaces or tabs, not just initial
3428 ones, to tabs.
3429
3430 @end table
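
The following sketch shows how the default behavior and the options
described in this section might differ on the same kind of input.  The
sample strings and variable names are invented for this example, and the
GNU implementation is assumed.

```shell
# Hypothetical sample input; assumes GNU unexpand is on the PATH.
# Default: eight leading spaces collapse into one tab.
leading=$(printf '        a\n' | unexpand)

# Default: the seven interior spaces are not initial, so they stay.
interior=$(printf 'a       b\n' | unexpand)

# -a also converts the interior run, which ends at the column-8 stop.
all=$(printf 'a       b\n' | unexpand -a)

# -t 4 implies -a: the three spaces after "a" end at column 4.
stops4=$(printf 'a   b\n' | unexpand -t 4)

# od -c makes the resulting tab characters visible as \t.
printf '%s\n' "$leading" "$interior" "$all" "$stops4" | od -c
```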
3431
3432 @c              What's GNU?
3433 @c              Arnold Robbins
3434 @node Opening the software toolbox
3435 @chapter Opening the software toolbox
3436
3437 This chapter originally appeared in @cite{Linux Journal}, volume 1,
3438 number 2, in the @cite{What's GNU?} column. It was written by Arnold
3439 Robbins.
3440
3441 @menu
3442 * Toolbox introduction::        Toolbox introduction
3443 * I/O redirection::             I/O redirection
3444 * The who command::             The @code{who} command
3445 * The cut command::             The @code{cut} command
3446 * The sort command::            The @code{sort} command
3447 * The uniq command::            The @code{uniq} command
3448 * Putting the tools together::  Putting the tools together
3449 @end menu
3450
3451
3452 @node Toolbox introduction
3453 @unnumberedsec Toolbox introduction
3454
3455 This month's column is only peripherally related to the GNU Project, in
3456 that it describes a number of the GNU tools on your Linux system and how they
3457 might be used.  What it's really about is the ``Software Tools'' philosophy
3458 of program development and usage.
3459
3460 The software tools philosophy was an important and integral concept
3461 in the initial design and development of Unix (of which Linux and GNU are
essentially clones).  Unfortunately, in the modern-day press of
3463 Internetworking and flashy GUIs, it seems to have fallen by the
3464 wayside.  This is a shame, since it provides a powerful mental model
3465 for solving many kinds of problems.
3466
3467 Many people carry a Swiss Army knife around in their pants pockets (or
3468 purse).  A Swiss Army knife is a handy tool to have: it has several knife
3469 blades, a screwdriver, tweezers, toothpick, nail file, corkscrew, and perhaps
3470 a number of other things on it.  For the everyday, small miscellaneous jobs
3471 where you need a simple, general purpose tool, it's just the thing.
3472
3473 On the other hand, an experienced carpenter doesn't build a house using
3474 a Swiss Army knife.  Instead, he has a toolbox chock full of specialized
3475 tools---a saw, a hammer, a screwdriver, a plane, and so on.  And he knows
3476 exactly when and where to use each tool; you won't catch him hammering nails
3477 with the handle of his screwdriver.
3478
3479 The Unix developers at Bell Labs were all professional programmers and trained
3480 computer scientists.  They had found that while a one-size-fits-all program
3481 might appeal to a user because there's only one program to use, in practice
3482 such programs are
3483
3484 @enumerate a
3485 @item
3486 difficult to write,
3487
3488 @item
3489 difficult to maintain and
3490 debug, and
3491
3492 @item
3493 difficult to extend to meet new situations.
3494 @end enumerate
3495
3496 Instead, they felt that programs should be specialized tools.  In short, each
3497 program ``should do one thing well.''  No more and no less.  Such programs are
3498 simpler to design, write, and get right---they only do one thing.
3499
Furthermore, they found that with the right machinery for hooking programs
together, the whole was greater than the sum of the parts.  By combining
3502 several special purpose programs, you could accomplish a specific task
3503 that none of the programs was designed for, and accomplish it much more
3504 quickly and easily than if you had to write a special purpose program.
3505 We will see some (classic) examples of this further on in the column.
(An important additional point was that, if you don't already have
something appropriate in the toolbox, you should take a detour and build
the software tools you need first.)
3509
3510 @node I/O redirection
3511 @unnumberedsec I/O redirection
3512
3513 Hopefully, you are familiar with the basics of I/O redirection in the
3514 shell, in particular the concepts of ``standard input,'' ``standard output,''
3515 and ``standard error''.  Briefly, ``standard input'' is a data source, where
3516 data comes from.  A program should not need to either know or care if the
3517 data source is a disk file, a keyboard, a magnetic tape, or even a punched
3518 card reader.  Similarly, ``standard output'' is a data sink, where data goes
3519 to.  The program should neither know nor care where this might be.
3520 Programs that only read their standard input, do something to the data,
3521 and then send it on, are called ``filters'', by analogy to filters in a
3522 water pipeline.
3523
3524 With the Unix shell, it's very easy to set up data pipelines:
3525
3526 @example
3527 program_to_create_data | filter1 | .... | filterN > final.pretty.data
3528 @end example
3529
3530 We start out by creating the raw data; each filter applies some successive
3531 transformation to the data, until by the time it comes out of the pipeline,
3532 it is in the desired form.
3533
3534 This is fine and good for standard input and standard output.  Where does the
standard error come into play?  Well, think about @code{filter1} in
3536 the pipeline above.  What happens if it encounters an error in the data it
3537 sees?  If it writes an error message to standard output, it will just
3538 disappear down the pipeline into @code{filter2}'s input, and the
3539 user will probably never see it.  So programs need a place where they can send
3540 error messages so that the user will notice them.  This is standard error,
3541 and it is usually connected to your console or window, even if you have
3542 redirected standard output of your program away from your screen.
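
The separation of the two output streams can be sketched as follows.
The parenthesized commands are only a stand-in for @code{filter1}, and
the message text and variable names are invented for this illustration.

```shell
# Hypothetical stand-in for filter1: one data line on standard output,
# one diagnostic on standard error.
errlog=$(mktemp)

# Only standard output travels down the pipe to tr; standard error is
# captured in a file here so that the two streams can be compared.
piped=$( (echo data; echo 'filter1: bad record' >&2) 2>"$errlog" | tr 'a-z' 'A-Z')
diag=$(cat "$errlog")
rm -f "$errlog"

echo "through the pipe:  $piped"
echo "on standard error: $diag"
```

Only the data line is upper-cased, because the diagnostic never enters
the pipe.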
3543
3544 For filter programs to work together, the format of the data has to be
3545 agreed upon.  The most straightforward and easiest format to use is simply
3546 lines of text.  Unix data files are generally just streams of bytes, with
3547 lines delimited by the @sc{ASCII} @sc{LF} (Line Feed) character,
3548 conventionally called a ``newline'' in the Unix literature. (This is
3549 @code{'\n'} if you're a C programmer.)  This is the format used by all
3550 the traditional filtering programs.  (Many earlier operating systems
3551 had elaborate facilities and special purpose programs for managing
3552 binary data.  Unix has always shied away from such things, under the
3553 philosophy that it's easiest to simply be able to view and edit your
3554 data with a text editor.)
3555
3556 OK, enough introduction. Let's take a look at some of the tools, and then
3557 we'll see how to hook them together in interesting ways.   In the following
3558 discussion, we will only present those command line options that interest
3559 us.  As you should always do, double check your system documentation
3560 for the full story.
3561
3562 @node The who command
3563 @unnumberedsec The @code{who} command
3564
3565 The first program is the @code{who} command.  By itself, it generates a
3566 list of the users who are currently logged in.  Although I'm writing
3567 this on a single-user system, we'll pretend that several people are
3568 logged in:
3569
3570 @example
3571 $ who
3572 arnold   console Jan 22 19:57
3573 miriam   ttyp0   Jan 23 14:19(:0.0)
3574 bill     ttyp1   Jan 21 09:32(:0.0)
3575 arnold   ttyp2   Jan 23 20:48(:0.0)
3576 @end example
3577
3578 Here, the @samp{$} is the usual shell prompt, at which I typed @code{who}.
3579 There are three people logged in, and I am logged in twice.  On traditional
3580 Unix systems, user names are never more than eight characters long.  This
3581 little bit of trivia will be useful later.  The output of @code{who} is nice,
3582 but the data is not all that exciting.
3583
3584 @node The cut command
3585 @unnumberedsec The @code{cut} command
3586
The next program we'll look at is the @code{cut} command.  This program
cuts out columns or fields of input data.  For example, we can tell it
to print just the login name and full name from the @file{/etc/passwd}
file.  The @file{/etc/passwd} file has seven fields, separated by
colons:
3592
3593 @example
3594 arnold:xyzzy:2076:10:Arnold D. Robbins:/home/arnold:/bin/ksh
3595 @end example
3596
To get the first and fifth fields, we would use @code{cut} like this:
3598
3599 @example
3600 $ cut -d: -f1,5 /etc/passwd
3601 root:Operator
3602 @dots{}
3603 arnold:Arnold D. Robbins
3604 miriam:Miriam A. Robbins
3605 @dots{}
3606 @end example
3607
3608 With the @samp{-c} option, @code{cut} will cut out specific characters
3609 (i.e., columns) in the input lines.  This command looks like it might be
3610 useful for data filtering.
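
To compare field selection with character selection without touching a
real password file, here is a small sketch that reuses the sample entry
shown earlier.  The variable names are invented for this illustration.

```shell
# The sample line is the /etc/passwd entry shown earlier in this section.
entry='arnold:xyzzy:2076:10:Arnold D. Robbins:/home/arnold:/bin/ksh'

# -d sets the field delimiter and -f picks fields 1 and 5.
fields=$(printf '%s\n' "$entry" | cut -d: -f1,5)

# -c picks character positions instead: here, columns 1 through 6.
chars=$(printf '%s\n' "$entry" | cut -c1-6)

echo "$fields"
echo "$chars"
```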
3611
3612
3613 @node The sort command
3614 @unnumberedsec The @code{sort} command
3615
3616 Next we'll look at the @code{sort} command.  This is one of the most
3617 powerful commands on a Unix-style system; one that you will often find
3618 yourself using when setting up fancy data plumbing. The @code{sort}
3619 command reads and sorts each file named on the command line.  It then
3620 merges the sorted data and writes it to standard output.  It will read
3621 standard input if no files are given on the command line (thus
3622 making it into a filter).  The sort is based on the machine collating
3623 sequence (@sc{ASCII}) or based on  user-supplied ordering criteria.
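
As a sketch of the difference between the default ordering and
user-supplied criteria, the following commands sort the same records two
ways.  The records are invented, and the @samp{-t} and @samp{-k} options
are the @sc{POSIX} spelling of a field-based sort key, which may differ
from the syntax accepted by older versions of @code{sort}.

```shell
# Hypothetical passwd-style records: name, placeholder, user ID, group ID.
records='miriam:x:112:10
arnold:x:2076:10
bill:x:30:10'

# Default ordering compares whole lines; LC_ALL=C pins the collating
# sequence to plain byte order.
by_name=$(printf '%s\n' "$records" | LC_ALL=C sort)

# User-supplied criteria: colon-separated fields, with field 3 as the
# key, compared numerically.
by_uid=$(printf '%s\n' "$records" | LC_ALL=C sort -t: -k3,3n)

echo "$by_name"
echo "$by_uid"
```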
3624
3625
3626 @node The uniq command
3627 @unnumberedsec The @code{uniq} command
3628
3629 Finally (at least for now), we'll look at the @code{uniq} program.  When
3630 sorting data, you will often end up with duplicate lines, lines that
3631 are identical.  Usually, all you need is one instance of each line.
3632 This is where @code{uniq} comes in. The @code{uniq} program reads its
3633 standard input, which it expects to be sorted.  It only prints out one
3634 copy of each duplicated line.  It does have several options.  Later on,
3635 we'll use the @samp{-c} option, which prints each unique line, preceded
3636 by a count of the number of times that line occurred in the input.
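
A minimal sketch of both behaviors, using an invented list that is
already sorted:

```shell
# Hypothetical, already sorted input: "arnold" appears twice.
sorted='arnold
arnold
bill
miriam'

# Plain uniq prints one copy of each line.
once=$(printf '%s\n' "$sorted" | uniq)

# -c prefixes each line with its count; sed strips the leading padding
# so that the result is easier to compare.
counted=$(printf '%s\n' "$sorted" | uniq -c | sed 's/^ *//')

echo "$once"
echo "$counted"
```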
3637
3638
3639 @node Putting the tools together
3640 @unnumberedsec Putting the tools together
3641
3642 Now, let's suppose this is a large BBS system with dozens of users
3643 logged in.  The management wants the SysOp to write a program that will
3644 generate a sorted list of logged in users.  Furthermore, even if a user
3645 is logged in multiple times, his or her name should only show up in the
3646 output once.
3647
3648 The SysOp could sit down with the system documentation and write a C
3649 program that did this. It would take perhaps a couple of hundred lines
3650 of code and about two hours to write it, test it, and debug it.
3651 However, knowing the software toolbox, the SysOp can instead start out
3652 by generating just a list of logged on users:
3653
3654 @example
3655 $ who | cut -c1-8
3656 arnold
3657 miriam
3658 bill
3659 arnold
3660 @end example
3661
3662 Next, sort the list:
3663
3664 @example
3665 $ who | cut -c1-8 | sort
3666 arnold
3667 arnold
3668 bill
3669 miriam
3670 @end example
3671
3672 Finally, run the sorted list through @code{uniq}, to weed out duplicates:
3673
3674 @example
3675 $ who | cut -c1-8 | sort | uniq
3676 arnold
3677 bill
3678 miriam
3679 @end example
3680
3681 The @code{sort} command actually has a @samp{-u} option that does what
3682 @code{uniq} does. However, @code{uniq} has other uses for which one
3683 cannot substitute @samp{sort -u}.
3684
3685 The SysOp puts this pipeline into a shell script, and makes it available for
3686 all the users on the system:
3687
3688 @example
3689 # cat > /usr/local/bin/listusers
3690 who | cut -c1-8 | sort | uniq
3691 ^D
3692 # chmod +x /usr/local/bin/listusers
3693 @end example
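
To try the same stages on a machine where nobody else is logged in, one
option is to replay a canned copy of the @code{who} output, as in this
sketch.  The variable names are invented, and the extra @code{tr} stage
is an addition for this illustration: it trims the blanks that
@samp{cut -c1-8} keeps after short names.

```shell
# Canned copy of the who output shown earlier, so no real logins are needed.
who_sample='arnold   console Jan 22 19:57
miriam   ttyp0   Jan 23 14:19(:0.0)
bill     ttyp1   Jan 21 09:32(:0.0)
arnold   ttyp2   Jan 23 20:48(:0.0)'

# Same stages as the listusers script, with printf standing in for who.
# cut -c1-8 keeps the padding after short names; tr -d ' ' trims it.
users=$(printf '%s\n' "$who_sample" | cut -c1-8 | tr -d ' ' | sort | uniq)

echo "$users"
```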
3694
3695 There are four major points to note here.  First, with just four
programs, on one command line, the SysOp was able to save about two
hours' worth of work.  Furthermore, the shell pipeline is just about as
3698 efficient as the C program would be, and it is much more efficient in
3699 terms of programmer time.  People time is much more expensive than
3700 computer time, and in our modern ``there's never enough time to do
3701 everything'' society, saving two hours of programmer time is no mean
3702 feat.
3703
3704 Second, it is also important to emphasize that with the
3705 @emph{combination} of the tools, it is possible to do a special
3706 purpose job never imagined by the authors of the individual programs.
3707
3708 Third, it is also valuable to build up your pipeline in stages, as we did here.
3709 This allows you to view the data at each stage in the pipeline, which helps
3710 you acquire the confidence that you are indeed using these tools correctly.
3711
3712 Finally, by bundling the pipeline in a shell script, other users can use
3713 your command, without having to remember the fancy plumbing you set up for
3714 them. In terms of how you run them, shell scripts and compiled programs are
3715 indistinguishable.
3716
3717 After the previous warm-up exercise, we'll look at two additional, more
3718 complicated pipelines.  For them, we need to introduce two more tools.
3719
3720 The first is the @code{tr} command, which stands for ``transliterate.''
3721 The @code{tr} command works on a character-by-character basis, changing
3722 characters. Normally it is used for things like mapping upper case to
3723 lower case:
3724
3725 @example
3726 $ echo ThIs ExAmPlE HaS MIXED case! | tr '[A-Z]' '[a-z]'
3727 this example has mixed case!
3728 @end example
3729
3730 There are several options of interest:
3731
3732 @table @samp
3733 @item -c
3734 work on the complement of the listed characters, i.e.,
3735 operations apply to characters not in the given set
3736
3737 @item -d
3738 delete characters in the first set from the output
3739
3740 @item -s
3741 squeeze repeated characters in the output into just one character.
3742 @end table
3743
3744 We will be using all three options in a moment.
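
Before combining them, it may help to see each option on its own.  The
input strings below are invented for this sketch.

```shell
# -s squeezes each run of repeated blanks down to one.
squeezed=$(echo 'hello     world' | tr -s ' ')

# -d deletes every character listed in the set.
deleted=$(echo 'hello, world!' | tr -d ',!')

# -c with -d deletes everything NOT in the set; digits and the newline
# are the only characters kept.
digits=$(echo 'room 101, floor 3' | tr -cd '0-9\n')

echo "$squeezed"
echo "$deleted"
echo "$digits"
```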
3745
3746 The other command we'll look at is @code{comm}.  The @code{comm}
3747 command takes two sorted input files as input data, and prints out the
3748 files' lines in three columns.  The output columns are the data lines
3749 unique to the first file, the data lines unique to the second file, and
3750 the data lines that are common to both.  The @samp{-1}, @samp{-2}, and
3751 @samp{-3} command line options omit the respective columns. (This is
3752 non-intuitive and takes a little getting used to.)  For example:
3753
3754 @example
3755 $ cat f1
3756 11111
3757 22222
3758 33333
3759 44444
3760 $ cat f2
3761 00000
3762 22222
3763 33333
3764 55555
3765 $ comm f1 f2
3766         00000
3767 11111
3768                 22222
3769                 33333
3770 44444
3771         55555
3772 @end example
3773
A single dash given as a filename tells @code{comm} to read standard input
instead of a regular file.
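
The column-suppressing options and the dash can be sketched with the
same two files.  The scratch directory and variable names are invented
for this illustration.

```shell
# Recreate the two sorted sample files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 11111 22222 33333 44444 > "$dir/f1"
printf '%s\n' 00000 22222 33333 55555 > "$dir/f2"

# -1 and -2 omit both "unique" columns, leaving the common lines.
both=$(comm -12 "$dir/f1" "$dir/f2")

# -2 and -3 leave just the lines unique to the first file.
only_f1=$(comm -23 "$dir/f1" "$dir/f2")

# A dash makes comm read the first file from standard input;
# -1 and -3 leave just the lines unique to the second file.
only_f2=$(comm -13 - "$dir/f2" < "$dir/f1")

rm -rf "$dir"
echo "$both"
echo "$only_f1"
echo "$only_f2"
```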
3776
3777 Now we're ready to build a fancy pipeline.  The first application is a word
3778 frequency counter.  This helps an author determine if he or she is over-using
3779 certain words.
3780
3781 The first step is to change the case of all the letters in our input file
to one case.  ``The'' and ``the'' are the same word for counting purposes.
3783
3784 @example
3785 $ tr '[A-Z]' '[a-z]' < whats.gnu | ...
3786 @end example
3787
3788 The next step is to get rid of punctuation.  Quoted words and unquoted words
3789 should be treated identically; it's easiest to just get the punctuation out of
3790 the way.
3791
3792 @example
3793 $ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' | ...
3794 @end example
3795
3796 The second @code{tr} command operates on the complement of the listed
3797 characters, which are all the letters, the digits, the underscore, and
3798 the blank.  The @samp{\012} represents the newline character; it has to
3799 be left alone.  (The ASCII TAB character should also be included for
3800 good measure in a production script.)
3801
3802 At this point, we have data consisting of words separated by blank space.
3803 The words only contain alphanumeric characters (and the underscore).  The
next step is to break the data apart so that we have one word per line.  This
3805 makes the counting operation much easier, as we will see shortly.
3806
3807 @example
3808 $ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
3809 > tr -s '[ ]' '\012' | ...
3810 @end example
3811
3812 This command turns blanks into newlines.  The @samp{-s} option squeezes
3813 multiple newline characters in the output into just one.  This helps us
3814 avoid blank lines. (The @samp{>} is the shell's ``secondary prompt.''
3815 This is what the shell prints when it notices you haven't finished
3816 typing in all of a command.)
3817
3818 We now have data consisting of one word per line, no punctuation, all one
3819 case.  We're ready to count each word:
3820
3821 @example
3822 $ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
3823 > tr -s '[ ]' '\012' | sort | uniq -c | ...
3824 @end example
3825
3826 At this point, the data might look something like this:
3827
3828 @example
3829   60 a
3830    2 able
3831    6 about
3832    1 above
3833    2 accomplish
3834    1 acquire
3835    1 actually
3836    2 additional
3837 @end example
3838
3839 The output is sorted by word, not by count!  What we want is the most
3840 frequently used words first.  Fortunately, this is easy to accomplish,
3841 with the help of two more @code{sort} options:
3842
3843 @table @samp
3844 @item -n
3845 do a numeric sort, not an ASCII one
3846
3847 @item -r
3848 reverse the order of the sort
3849 @end table
3850
3851 The final pipeline looks like this:
3852
3853 @example
3854 $ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
3855 > tr -s '[ ]' '\012' | sort | uniq -c | sort -nr
3856  156 the
3857   60 a
3858   58 to
3859   51 of
3860   51 and
3861  ...
3862 @end example
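
For readers without the original input file, here is a sketch of the
same six stages run on one invented sentence.  The character sets are
written without the surrounding brackets, which GNU @code{tr} would
otherwise treat as ordinary characters; that is a choice made for this
illustration, not a change to the pipeline above.

```shell
# Hypothetical sample text standing in for the whats.gnu file.
text='The cat and the dog. The end!'

# Fold case, drop punctuation, put one word per line, then count and rank.
counts=$(printf '%s\n' "$text" |
  tr 'A-Z' 'a-z' | tr -cd 'a-z0-9_ \012' |
  tr -s ' ' '\012' | sort | uniq -c | sort -nr)

# Most frequent word first, with the padding from uniq -c stripped.
top=$(printf '%s\n' "$counts" | head -n 1 | sed 's/^ *//')

echo "$counts"
echo "most frequent: $top"
```

The order among words that occur equally often depends on how
@code{sort} breaks ties, so only the first line is singled out here.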
3863
3864 Whew!  That's a lot to digest.  Yet, the same principles apply. With six
3865 commands, on two lines (really one long one split for convenience), we've
3866 created a program that does something interesting and useful, in much
3867 less time than we could have written a C program to do the same thing.
3868
3869 A minor modification to the above pipeline can give us a simple spelling
3870 checker!  To determine if you've spelled a word correctly, all you have to
3871 do is look it up in a dictionary.  If it is not there, then chances are
3872 that your spelling is incorrect.  So, we need a dictionary.  If you
3873 have the Slackware Linux distribution, you have the file
@file{/usr/lib/ispell/ispell.words}, which is a sorted, 38,400-word
3875 dictionary.
3876
3877 Now, how to compare our file with the dictionary?  As before, we generate
3878 a sorted list of words, one per line:
3879
3880 @example
3881 $ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
3882 > tr -s '[ ]' '\012' | sort -u | ...
3883 @end example
3884
3885 Now, all we need is a list of words that are @emph{not} in the
3886 dictionary.  Here is where the @code{comm} command comes in.
3887
3888 @example
3889 $ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
3890 > tr -s '[ ]' '\012' | sort -u |
3891 > comm -23 - /usr/lib/ispell/ispell.words
3892 @end example
3893
3894 The @samp{-2} and @samp{-3} options eliminate lines that are only in the
3895 dictionary (the second file), and lines that are in both files.  Lines
3896 only in the first file (standard input, our stream of words), are
3897 words that are not in the dictionary.  These are likely candidates for
3898 spelling errors.  This pipeline was the first cut at a production
3899 spelling checker on Unix.
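
The idea can be tried without a system dictionary by using a tiny
invented word list, as in the following sketch.  The temporary file, the
sample sentence and the variable names are all assumptions made for this
illustration.

```shell
# Hypothetical miniature dictionary, sorted, one word per line.
dict=$(mktemp)
printf '%s\n' brown fox quick the > "$dict"

# "brwon" is the deliberate misspelling in this sample text.
text='The quick brwon fox.'

# LC_ALL=C keeps sort and comm in agreement about the ordering.
suspects=$(printf '%s\n' "$text" |
  tr 'A-Z' 'a-z' | tr -cd 'a-z0-9_ \012' |
  tr -s ' ' '\012' | LC_ALL=C sort -u |
  LC_ALL=C comm -23 - "$dict")
rm -f "$dict"

echo "$suspects"
```

Both inputs to @code{comm} must be sorted under the same collating
rules, which is why the locale is pinned for the last two stages.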
3900
3901 There are some other tools that deserve brief mention.
3902
3903 @table @code
3904 @item grep
3905 search files for text that matches a regular expression
3906
3907 @item egrep
3908 like @code{grep}, but with more powerful regular expressions
3909
3910 @item wc
3911 count lines, words, characters
3912
3913 @item tee
3914 a T-fitting for data pipes, copies data to files and to standard output
3915
3916 @item sed
3917 the stream editor, an advanced tool
3918
3919 @item awk
3920 a data manipulation language, another advanced tool
3921 @end table
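
A brief sketch of four of these tools on an invented three-line sample
follows.  It is meant only as a first taste; each tool has many more
options than are shown.

```shell
# Hypothetical three-line sample.
sample='alpha one
beta two
alpha three'

# grep keeps the lines matching a regular expression.
matches=$(printf '%s\n' "$sample" | grep '^alpha')

# wc -l counts lines; tr removes the padding some versions add.
lines=$(printf '%s\n' "$sample" | wc -l | tr -d ' ')

# sed edits the stream: here it rewrites the start of matching lines.
edited=$(printf '%s\n' "$sample" | sed 's/^alpha/ALPHA/')

# tee copies the stream to a file while passing it along the pipe.
copy=$(mktemp)
passed=$(printf '%s\n' "$sample" | tee "$copy" | grep -c 'two')
saved=$(cat "$copy")
rm -f "$copy"

echo "$matches"
echo "$lines"
echo "$edited"
echo "$passed"
```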
3922
3923 The software tools philosophy also espoused the following bit of
3924 advice: ``Let someone else do the hard part.'' This means, take
3925 something that gives you most of what you need, and then massage it the
3926 rest of the way until it's in the form that you want.
3927
3928 To summarize:
3929
3930 @enumerate 1
3931 @item
3932 Each program should do one thing well. No more, no less.
3933
3934 @item
3935 Combining programs with appropriate plumbing leads to results where
3936 the whole is greater than the sum of the parts.  It also leads to novel
3937 uses of programs that the authors might never have imagined.
3938
3939 @item
3940 Programs should never print extraneous header or trailer data, since these
3941 could get sent on down a pipeline. (A point we didn't mention earlier.)
3942
3943 @item
3944 Let someone else do the hard part.
3945
3946 @item
3947 Know your toolbox! Use each program appropriately. If you don't have an
3948 appropriate tool, build one.
3949 @end enumerate
3950
3951 As of this writing, all the programs we've discussed are available via
3952 anonymous @code{ftp} from @code{prep.ai.mit.edu} as
@file{/pub/gnu/textutils-1.9.tar.gz}.@footnote{Version 1.9 was
3954 current when this column was written. Check the nearest GNU archive for
3955 the current version.}
3956
3957 None of what I have presented in this column is new. The Software Tools
3958 philosophy was first introduced in the book @cite{Software Tools},
3959 by Brian Kernighan and P.J. Plauger (Addison-Wesley, ISBN
3960 0-201-03669-X).   This book showed how to write and use software
3961 tools.   It was written in 1976, using a preprocessor for FORTRAN named
3962 @code{ratfor} (RATional FORtran).  At the time, C was not as ubiquitous
as it is now; FORTRAN was.  The last chapter presented a
@code{ratfor}-to-FORTRAN processor, written in @code{ratfor}.  @code{ratfor} looks an
3965 awful lot like C; if you know C, you won't have any problem following
3966 the code.
3967
3968 In 1981, the book was updated and made available as @cite{Software
3969 Tools in Pascal} (Addison-Wesley, ISBN 0-201-10342-7).  Both books
3970 remain in print, and are well worth reading if you're a programmer.
3971 They certainly made a major change in how I view programming.
3972
3973 Initially, the programs in both books were available (on 9-track tape)
3974 from Addison-Wesley.  Unfortunately, this is no longer the case,
3975 although you might be able to find copies floating around the Internet.
3976 For a number of years, there was an active Software Tools Users Group,
3977 whose members had ported the original @code{ratfor} programs to essentially
3978 every computer system with a FORTRAN compiler.  The popularity of the
3979 group waned in the middle '80s as Unix began to spread beyond universities.
3980
3981 With the current proliferation of GNU code and other clones of Unix programs,
3982 these programs now receive little attention; modern C versions are
3983 much more efficient and do more than these programs do.  Nevertheless, as
3984 exposition of good programming style, and evangelism for a still-valuable
3985 philosophy, these books are unparalleled, and I recommend them highly.
3986
3987 Acknowledgment: I would like to express my gratitude to Brian Kernighan
3988 of Bell Labs, the original Software Toolsmith, for reviewing this column.
3989
3990
3991 @node Index
3992 @unnumbered Index
3993
3994 @printindex cp
3995
3996 @contents
3997 @bye
3998
3999 @c Local variables:
4000 @c texinfo-column-for-description: 32
4001 @c End: