doc/textutils.texi

   1 \input texinfo
   2 @c %**start of header
   3 @setfilename textutils.info
   4 @settitle GNU text utilities
   5 @c %**end of header
   6
   7 @include version.texi
   8
   9 @c Define new indices.
  10 @defcodeindex op
  11
  12 @c Put everything in one index (arbitrarily chosen to be the concept index).
  13 @syncodeindex fn cp
  14 @syncodeindex ky cp
  15 @syncodeindex op cp
  16 @syncodeindex pg cp
  17 @syncodeindex vr cp
  18
  19 @ifinfo
  20 @format
  21 START-INFO-DIR-ENTRY
  22 * Text utilities: (textutils).          GNU text utilities.
  23 * cat: (textutils)cat invocation.               Concatenate and write files.
  24 * cksum: (textutils)cksum invocation.           Print @sc{POSIX} CRC checksum.
  25 * comm: (textutils)comm invocation.             Compare sorted files by line.
  26 * csplit: (textutils)csplit invocation.         Split by context.
  27 * cut: (textutils)cut invocation.               Print selected parts of lines.
  28 * expand: (textutils)expand invocation.         Convert tabs to spaces.
  29 * fmt: (textutils)fmt invocation.               Reformat paragraph text.
  30 * fold: (textutils)fold invocation.             Wrap long input lines.
  31 * head: (textutils)head invocation.             Output the first part of files.
  32 * join: (textutils)join invocation.             Join lines on a common field.
  33 * md5sum: (textutils)md5sum invocation.         Print or check message-digests.
  34 * nl: (textutils)nl invocation.                 Number lines and write files.
  35 * od: (textutils)od invocation.                 Dump files in octal, etc.
  36 * paste: (textutils)paste invocation.           Merge lines of files.
  37 * pr: (textutils)pr invocation.                 Paginate or columnate files.
  38 * sort: (textutils)sort invocation.             Sort text files.
  39 * split: (textutils)split invocation.           Split into fixed-size pieces.
  40 * sum: (textutils)sum invocation.               Print traditional checksum.
  41 * tac: (textutils)tac invocation.               Reverse files.
  42 * tail: (textutils)tail invocation.             Output the last part of files.
  43 * tr: (textutils)tr invocation.                 Translate characters.
  44 * unexpand: (textutils)unexpand invocation.     Convert spaces to tabs.
  45 * uniq: (textutils)uniq invocation.             Uniqify files.
  46 * wc: (textutils)wc invocation.                 Byte, word, and line counts.
  47 END-INFO-DIR-ENTRY
  48 @end format
  49 @end ifinfo
  50
  51 @ifinfo
  52 This file documents the GNU text utilities.
  53
  54 Copyright (C) 1994, 95, 96 Free Software Foundation, Inc.
  55
  56 Permission is granted to make and distribute verbatim copies of
  57 this manual provided the copyright notice and this permission notice
  58 are preserved on all copies.
  59
  60 @ignore
  61 Permission is granted to process this file through TeX and print the
  62 results, provided the printed document carries copying permission
  63 notice identical to this one except for the removal of this paragraph
  64 (this paragraph not being relevant to the printed manual).
  65
  66 @end ignore
  67 Permission is granted to copy and distribute modified versions of this
  68 manual under the conditions for verbatim copying, provided that the entire
  69 resulting derived work is distributed under the terms of a permission
  70 notice identical to this one.
  71
  72 Permission is granted to copy and distribute translations of this manual
  73 into another language, under the above conditions for modified versions,
  74 except that this permission notice may be stated in a translation approved
  75 by the Foundation.
  76 @end ifinfo
  77
  78 @titlepage
  79 @title GNU @code{textutils}
  80 @subtitle A set of text utilities
  81 @subtitle for version @value{VERSION}, @value{UPDATED}
  82 @author David MacKenzie et al.
  83
  84 @page
  85 @vskip 0pt plus 1filll
  86 Copyright @copyright{} 1994, 95, 96 Free Software Foundation, Inc.
  87
  88 Permission is granted to make and distribute verbatim copies of
  89 this manual provided the copyright notice and this permission notice
  90 are preserved on all copies.
  91
  92 Permission is granted to copy and distribute modified versions of this
  93 manual under the conditions for verbatim copying, provided that the entire
  94 resulting derived work is distributed under the terms of a permission
  95 notice identical to this one.
  96
  97 Permission is granted to copy and distribute translations of this manual
  98 into another language, under the above conditions for modified versions,
  99 except that this permission notice may be stated in a translation approved
 100 by the Foundation.
 101 @end titlepage
 102
 103
 104 @ifinfo
 105 @node Top
 106 @top GNU text utilities
 107
 108 @cindex text utilities
 109 @cindex utilities for text handling
 110
 111 This manual documents version @value{VERSION} of the GNU text utilities.
 112
 113 @menu
 114 * Introduction::                       Caveats, overview, and authors.
 115 * Common options::                     Common options.
 116 * Output of entire files::             cat tac nl od
 117 * Formatting file contents::           fmt pr fold
 118 * Output of parts of files::           head tail split csplit
 119 * Summarizing files::                  wc sum cksum md5sum
 120 * Operating on sorted files::          sort uniq comm
 121 * Operating on fields within a line::  cut paste join
 122 * Operating on characters::            tr expand unexpand
 123 * Opening the software toolbox::       The software tools philosophy.
 124 * Index::                              General index.
 125 @end menu
 126 @end ifinfo
 127
 128
 129 @node Introduction
 130 @chapter Introduction
 131
 132 @cindex introduction
 133
 134 This manual is incomplete: No attempt is made to explain basic concepts
 135 in a way suitable for novices.  Thus, if you are interested, please get
 136 involved in improving this manual.  The entire GNU community will
 137 benefit.
 138
 139 @cindex POSIX.2
 140 The GNU text utilities are mostly compatible with the @sc{POSIX.2} standard.
 141
 142 @c This paragraph appears in all of fileutils.texi, textutils.texi, and
 143 @c sh-utils.texi too -- so be sure to keep them consistent.
 144 @cindex bugs, reporting
 145 Please report bugs to @samp{textutils-bugs@@gnu.ai.mit.edu}.  Remember
 146 to include the version number, machine architecture, input files, and
 147 any other information needed to reproduce the bug: your input, what you
 148 expected, what you got, and why it is wrong.  Diffs are welcome, but
 149 please include a description of the problem as well, since this is
 150 sometimes difficult to infer. @xref{Bugs, , , gcc, GNU CC}.
 151
 152 This manual was originally derived from the Unix man pages in the
 153 distribution, which were written by David MacKenzie and updated by Jim
 154 Meyering.  What you are reading now is the authoritative documentation
  155 for these utilities; the man pages are no longer being maintained.
 156 The original @code{fmt} man page was written by Ross Paterson.
 157 Fran@,{c}ois Pinard did the initial conversion to Texinfo format.
 158 Karl Berry did the indexing, some reorganization, and editing of the results.
 159 Richard Stallman contributed his usual invaluable insights to the
 160 overall process.
 161
 162
 163 @node Common options
 164 @chapter Common options
 165
 166 @cindex common options
 167
  168 Certain options are available in all these programs.  Rather than
  169 repeating identical descriptions for each program, this chapter
  170 describes them once.  (In fact, every GNU program accepts, or should
  171 accept, these options.)
 172
 173 A few of these programs take arbitrary strings as arguments.  In those
 174 cases, @samp{--help} and @samp{--version} are taken as these options
  175 only if there is exactly one command line argument.
 176
 177 @table @samp
 178
 179 @item --help
 180 @opindex --help
 181 @cindex help, online
 182 Print a usage message listing all available options, then exit successfully.
 183
 184 @item --version
 185 @opindex --version
 186 @cindex version number, finding
 187 Print the version number, then exit successfully.
 188
 189 @end table
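
As a quick check, and assuming the GNU versions of these programs are
installed and found first in the search path, both options can be tried
on any of them; each prints its message and exits with status zero:

@example
# Print version information; the exit status should be 0 (success).
cat --version
echo "exit status: $?"

# Print the usage message of another utility in the package.
tac --help
@end example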
 190
 191
 192 @node Output of entire files
 193 @chapter Output of entire files
 194
 195 @cindex output of entire files
 196 @cindex entire files, output of
 197
 198 These commands read and write entire files, possibly transforming them
 199 in some way.
 200
 201 @menu
 202 * cat invocation::              Concatenate and write files.
 203 * tac invocation::              Concatenate and write files in reverse.
 204 * nl invocation::               Number lines and write files.
 205 * od invocation::               Write files in octal or other formats.
 206 @end menu
 207
 208 @node cat invocation
 209 @section @code{cat}: Concatenate and write files
 210
 211 @pindex cat
 212 @cindex concatenate and write files
 213 @cindex copying files
 214
 215 @code{cat} copies each @var{file} (@samp{-} means standard input), or
 216 standard input if none are given, to standard output.  Synopsis:
 217
 218 @example
  219 cat [@var{option}]@dots{} [@var{file}]@dots{}
 220 @end example
 221
 222 The program accepts the following options.  Also see @ref{Common options}.
 223
 224 @table @samp
 225
 226 @item -A
 227 @itemx --show-all
 228 @opindex -A
 229 @opindex --show-all
 230 Equivalent to @samp{-vET}.
 231
 232 @item -b
 233 @itemx --number-nonblank
 234 @opindex -b
 235 @opindex --number-nonblank
 236 Number all nonblank output lines, starting with 1.
 237
 238 @item -e
 239 @opindex -e
 240 Equivalent to @samp{-vE}.
 241
 242 @item -E
 243 @itemx --show-ends
 244 @opindex -E
 245 @opindex --show-ends
 246 Display a @samp{$} after the end of each line.
 247
 248 @item -n
 249 @itemx --number
 250 @opindex -n
 251 @opindex --number
 252 Number all output lines, starting with 1.
 253
 254 @item -s
 255 @itemx --squeeze-blank
 256 @opindex -s
 257 @opindex --squeeze-blank
 258 @cindex squeezing blank lines
 259 Replace multiple adjacent blank lines with a single blank line.
 260
 261 @item -t
 262 @opindex -t
 263 Equivalent to @samp{-vT}.
 264
 265 @item -T
 266 @itemx --show-tabs
 267 @opindex -T
 268 @opindex --show-tabs
 269 Display @key{TAB} characters as @samp{^I}.
 270
 271 @item -u
 272 @opindex -u
 273 Ignored; for Unix compatibility.
 274
 275 @item -v
 276 @itemx --show-nonprinting
 277 @opindex -v
 278 @opindex --show-nonprinting
 279 Display control characters except for @key{LFD} and @key{TAB} using
 280 @samp{^} notation and precede characters that have the high bit set
 281 with @samp{M-}.
 282
 283 @end table
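
The following sketch shows how some of these options combine.  It
assumes GNU @code{cat} and a shell whose @code{printf} understands
@samp{\n} and @samp{\t}; the sample text is made up for illustration.

@example
# Make tabs and line ends visible (-A is the same as -vET).
printf 'one\n\n\n\ttwo\n' | cat -A
# one$
# $
# $
# ^Itwo$

# Squeeze the run of blank lines, then number only the nonblank lines.
printf 'one\n\n\n\ttwo\n' | cat -s -b
@end example

In the second command, the two adjacent blank lines should come out as
a single unnumbered blank line between lines 1 and 2.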
 284
 285
 286 @node tac invocation
 287 @section @code{tac}: Concatenate and write files in reverse
 288
 289 @pindex tac
 290 @cindex reversing files
 291
 292 @code{tac} copies each @var{file} (@samp{-} means standard input), or
 293 standard input if none are given, to standard output, reversing the
 294 records (lines by default) in each separately.  Synopsis:
 295
 296 @example
 297 tac [@var{option}]@dots{} [@var{file}]@dots{}
 298 @end example
 299
 300 @dfn{Records} are separated by instances of a string (newline by
 301 default).  By default, this separator string is attached to the end of
 302 the record that it follows in the file.
 303
 304 The program accepts the following options.  Also see @ref{Common options}.
 305
 306 @table @samp
 307
 308 @item -b
 309 @itemx --before
 310 @opindex -b
 311 @opindex --before
 312 The separator is attached to the beginning of the record that it
 313 precedes in the file.
 314
 315 @item -r
 316 @itemx --regex
 317 @opindex -r
 318 @opindex --regex
 319 Treat the separator string as a regular expression.
 320
 321 @item -s @var{separator}
 322 @itemx --separator=@var{separator}
 323 @opindex -s
 324 @opindex --separator
 325 Use @var{separator} as the record separator, instead of newline.
 326
 327 @end table
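
As a small illustration, assuming GNU @code{tac} and made-up input, the
commands below reverse first by line and then by a comma separator:

@example
# Reverse the order of the lines.
printf 'a\nb\nc\n' | tac
# c
# b
# a

# Use a comma as the record separator; each separator stays attached
# to the end of the record it follows.
printf 'a,b,c,' | tac -s ,; echo
# c,b,a,
@end example

The trailing @code{echo} only adds a final newline, since the reversed
output of the second command does not end with one.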
 328
 329
 330 @node nl invocation
 331 @section @code{nl}: Number lines and write files
 332
 333 @pindex nl
 334 @cindex numbering lines
 335 @cindex line numbering
 336
 337 @code{nl} writes each @var{file} (@samp{-} means standard input), or
 338 standard input if none are given, to standard output, with line numbers
 339 added to some or all of the lines.  Synopsis:
 340
 341 @example
 342 nl [@var{option}]@dots{} [@var{file}]@dots{}
 343 @end example
 344
 345 @cindex logical pages, numbering on
 346 @code{nl} decomposes its input into (logical) pages; by default, the
 347 line number is reset to 1 at the top of each logical page.  @code{nl}
 348 treats all of the input files as a single document; it does not reset
 349 line numbers or logical pages between files.
 350
 351 @cindex headers, numbering
 352 @cindex body, numbering
 353 @cindex footers, numbering
 354 A logical page consists of three sections: header, body, and footer.
 355 Any of the sections can be empty.  Each can be numbered in a different
 356 style from the others.
 357
 358 The beginnings of the sections of logical pages are indicated in the
 359 input file by a line containing exactly one of these delimiter strings:
 360
 361 @table @samp
 362 @item \:\:\:
 363 start of header;
 364 @item \:\:
 365 start of body;
 366 @item \:
 367 start of footer.
 368 @end table
 369
 370 The two characters from which these strings are made can be changed from
 371 @samp{\} and @samp{:} via options (see below), but the pattern and
 372 length of each string cannot be changed.
 373
 374 A section delimiter is replaced by an empty line on output.  Any text
 375 that comes before the first section delimiter string in the input file
 376 is considered to be part of a body section, so @code{nl} treats a
 377 file that contains no section delimiters as a single body section.
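
The section delimiters described above can be tried with a sketch like
the following, which assumes GNU @code{nl}; the sample lines
(@samp{Title}, @samp{first}, and so on) are invented for illustration.

@example
# One logical page: a header line, two body lines, and a footer line.
# The %s format keeps printf from interpreting the backslashes.
printf '%s\n' '\:\:\:' 'Title' '\:\:' 'first' 'second' '\:' 'end' | nl
@end example

With the default numbering styles, only the body lines @samp{first} and
@samp{second} should receive numbers (1 and 2), and each of the three
delimiter lines should appear as an empty line in the output.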
 378
 379 The program accepts the following options.  Also see @ref{Common options}.
 380
 381 @table @samp
 382
 383 @item -b @var{style}
 384 @itemx --body-numbering=@var{style}
 385 @opindex -b
 386 @opindex --body-numbering
 387 Select the numbering style for lines in the body section of each
 388 logical page.  When a line is not numbered, the current line number
 389 is not incremented, but the line number separator character is still
 390 prepended to the line.  The styles are:
 391
 392 @table @samp
 393 @item a
 394 number all lines,
 395 @item t
 396 number only nonempty lines (default for body),
 397 @item n
 398 do not number lines (default for header and footer),
 399 @item p@var{regexp}
 400 number only lines that contain a match for @var{regexp}.
 401 @end table
 402
 403 @item -d @var{cd}
 404 @itemx --section-delimiter=@var{cd}
 405 @opindex -d
 406 @opindex --section-delimiter
 407 @cindex section delimiters of pages
 408 Set the section delimiter characters to @var{cd}; default is
 409 @samp{\:}. If only @var{c} is given, the second remains @samp{:}.
 410 (Remember to protect @samp{\} or other metacharacters from shell
 411 expansion with quotes or extra backslashes.)
 412
 413 @item -f @var{style}
 414 @itemx --footer-numbering=@var{style}
 415 @opindex -f
 416 @opindex --footer-numbering
 417 Analogous to @samp{--body-numbering}.
 418
 419 @item -h @var{style}
 420 @itemx --header-numbering=@var{style}
 421 @opindex -h
 422 @opindex --header-numbering
 423 Analogous to @samp{--body-numbering}.
 424
 425 @item -i @var{number}
 426 @itemx --page-increment=@var{number}
 427 @opindex -i
 428 @opindex --page-increment
 429 Increment line numbers by @var{number} (default 1).
 430
 431 @item -l @var{number}
 432 @itemx --join-blank-lines=@var{number}
 433 @opindex -l
 434 @opindex --join-blank-lines
 435 @cindex empty lines, numbering
 436 @cindex blank lines, numbering
 437 Consider @var{number} (default 1) consecutive empty lines to be one
 438 logical line for numbering, and only number the last one.  Where fewer
 439 than @var{number} consecutive empty lines occur, do not number them.
 440 An empty line is one that contains no characters, not even spaces
 441 or tabs.
 442
 443 @item -n @var{format}
 444 @itemx --number-format=@var{format}
 445 @opindex -n
 446 @opindex --number-format
 447 Select the line numbering format (default is @code{rn}):
 448
 449 @table @samp
 450 @item ln
 451 @opindex ln @r{format for @code{nl}}
 452 left justified, no leading zeros;
 453 @item rn
 454 @opindex rn @r{format for @code{nl}}
 455 right justified, no leading zeros;
 456 @item rz
 457 @opindex rz @r{format for @code{nl}}
 458 right justified, leading zeros.
 459 @end table
 460
 461 @item -p
 462 @itemx --no-renumber
 463 @opindex -p
 464 @opindex --no-renumber
 465 Do not reset the line number at the start of a logical page.
 466
 467 @item -s @var{string}
 468 @itemx --number-separator=@var{string}
 469 @opindex -s
 470 @opindex --number-separator
 471 Separate the line number from the text line in the output with
 472 @var{string} (default is @key{TAB}).
 473
 474 @item -v @var{number}
 475 @itemx --starting-line-number=@var{number}
 476 @opindex -v
 477 @opindex --starting-line-number
 478 Set the initial line number on each logical page to @var{number} (default 1).
 479
 480 @item -w @var{number}
 481 @itemx --number-width=@var{number}
 482 @opindex -w
 483 @opindex --number-width
 484 Use @var{number} characters for line numbers (default 6).
 485
 486 @end table
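
A few of these options are combined in the sketch below, which assumes
GNU @code{nl}; the input is made up for illustration.

@example
# Number every line (-b a), zero-padded to width 3, with ": " as separator.
printf 'a\n\nb\n' | nl -b a -n rz -w 3 -s ': '
# 001: a
# 002: 
# 003: b

# Start at 10 and count in steps of 5.
printf 'a\nb\nc\n' | nl -v 10 -i 5 -w 2 -s ' '
# 10 a
# 15 b
# 20 c
@end example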
 487
 488
 489 @node od invocation
 490 @section @code{od}: Write files in octal or other formats
 491
 492 @pindex od
 493 @cindex octal dump of files
 494 @cindex hex dump of files
 495 @cindex ASCII dump of files
 496 @cindex file contents, dumping unambiguously
 497
 498 @code{od} writes an unambiguous representation of each @var{file}
 499 (@samp{-} means standard input), or standard input if none are given.
 500 Synopsis:
 501
 502 @example
 503 od [@var{option}]@dots{} [@var{file}]@dots{}
 504 od -C [@var{file}] [[+]@var{offset} [[+]@var{label}]]
 505 @end example
 506
 507 Each line of output consists of the offset in the input, followed by
 508 groups of data from the file. By default, @code{od} prints the offset in
 509 octal, and each group of file data is two bytes of input printed as a
 510 single octal number.
 511
 512 The program accepts the following options.  Also see @ref{Common options}.
 513
 514 @table @samp
 515
 516 @item -A @var{radix}
 517 @itemx --address-radix=@var{radix}
 518 @opindex -A
 519 @opindex --address-radix
 520 @cindex radix for file offsets
 521 @cindex file offset radix
 522 Select the base in which file offsets are printed.  @var{radix} can
 523 be one of the following:
 524
 525 @table @samp
 526 @item d
 527 decimal;
 528 @item o
 529 octal;
 530 @item x
 531 hexadecimal;
 532 @item n
 533 none (do not print offsets).
 534 @end table
 535
 536 The default is octal.
 537
 538 @item -j @var{bytes}
 539 @itemx --skip-bytes=@var{bytes}
 540 @opindex -j
 541 @opindex --skip-bytes
 542 Skip @var{bytes} input bytes before formatting and writing.  If
 543 @var{bytes} begins with @samp{0x} or @samp{0X}, it is interpreted in
 544 hexadecimal; otherwise, if it begins with @samp{0}, in octal; otherwise,
 545 in decimal.  Appending @samp{b} multiplies @var{bytes} by 512, @samp{k}
 546 by 1024, and @samp{m} by 1048576.
 547
 548 @item -N @var{bytes}
 549 @itemx --read-bytes=@var{bytes}
 550 @opindex -N
 551 @opindex --read-bytes
 552 Output at most @var{bytes} bytes of the input.  Prefixes and suffixes on
  553 @var{bytes} are interpreted as for the @samp{-j} option.
 554
 555 @item -s [@var{n}]
 556 @itemx --strings[=@var{n}]
 557 @opindex -s
 558 @opindex --strings
 559 @cindex string constants, outputting
 560 Instead of the normal output, output only @dfn{string constants}: at
 561 least @var{n} (3 by default) consecutive ASCII graphic characters,
 562 followed by a null (zero) byte.
 563
 564 @item -t @var{type}
 565 @itemx --format=@var{type}
 566 @opindex -t
 567 @opindex --format
 568 Select the format in which to output the file data.  @var{type} is a
 569 string of one or more of the below type indicator characters.  If you
 570 include more than one type indicator character in a single @var{type}
 571 string, or use this option more than once, @code{od} writes one copy
 572 of each output line using each of the data types that you specified,
 573 in the order that you specified.
 574
 575 Adding a trailing ``z'' to any type specification appends a display
 576 of the ASCII character representation of the printable characters
 577 to the output line generated by the type specification.
 578
 579 @table @samp
 580 @item a
 581 named character,
 582 @item c
 583 ASCII character or backslash escape,
 584 @item d
 585 signed decimal,
 586 @item f
 587 floating point,
 588 @item o
 589 octal,
 590 @item u
 591 unsigned decimal,
 592 @item x
 593 hexadecimal.
 594 @end table
 595
  596 The type @samp{a} outputs things like @samp{sp} for space, @samp{nl} for
  597 newline, and @samp{nul} for a null (zero) byte.  Type @samp{c} outputs
  598 @samp{ }, @samp{\n}, and @samp{\0}, respectively.
 599
 600 @cindex type size
 601 Except for types @samp{a} and @samp{c}, you can specify the number
 602 of bytes to use in interpreting each number in the given data type
 603 by following the type indicator character with a decimal integer.
  604 Alternatively, you can specify the size of one of the C compiler's
 605 built-in data types by following the type indicator character with
 606 one of the following characters.  For integers (@samp{d}, @samp{o},
 607 @samp{u}, @samp{x}):
 608
 609 @table @samp
 610 @item C
 611 char,
 612 @item S
 613 short,
 614 @item I
 615 int,
 616 @item L
 617 long.
 618 @end table
 619
  620 For floating point (@samp{f}):
  621
  622 @table @samp
 623 @item F
 624 float,
 625 @item D
 626 double,
 627 @item L
 628 long double.
 629 @end table
 630
 631 @item -v
 632 @itemx --output-duplicates
 633 @opindex -v
 634 @opindex --output-duplicates
 635 Output consecutive lines that are identical.  By default, when two or
 636 more consecutive output lines would be identical, @code{od} outputs only
 637 the first line, and puts just an asterisk on the following line to
 638 indicate the elision.
 639
 640 @item -w[@var{n}]
 641 @itemx --width[=@var{n}]
 642 @opindex -w
 643 @opindex --width
  644 Dump @var{n} input bytes per output line.  This must be a multiple of
 645 the least common multiple of the sizes associated with the specified
 646 output types.  If @var{n} is omitted, the default is 32.  If this option
 647 is not given at all, the default is 16.
 648
 649 @end table
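
The sketch below exercises several of the options above.  It assumes
GNU @code{od}, whose exact column spacing may differ from other
implementations; the input bytes are made up for illustration.

@example
# One byte per group in hexadecimal, with decimal offsets.
printf 'ABC\n' | od -A d -t x1
# 0000000 41 42 43 0a
# 0000004

# Skip 2 bytes, then dump at most 3 bytes as characters, without offsets.
printf 'ABCDEF' | od -A n -t c -j 2 -N 3
#    C   D   E

# 48 identical bytes: repeated output lines collapse into a single "*".
printf '%048d' 0 | od -A d -t x1

# Add -v to print every output line instead.
printf '%048d' 0 | od -A d -t x1 -v
@end example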
 650
 651 The next several options map the old, pre-@sc{POSIX} format specification
 652 options to the corresponding @sc{POSIX} format specs.  GNU @code{od} accepts
 653 any combination of old- and new-style options.  Format specification
 654 options accumulate.
 655
 656 @table @samp
 657
 658 @item -a
 659 @opindex -a
 660 Output as named characters.  Equivalent to @samp{-ta}.
 661
 662 @item -b
 663 @opindex -b
 664 Output as octal bytes.  Equivalent to @samp{-toC}.
 665
 666 @item -c
 667 @opindex -c
 668 Output as ASCII characters or backslash escapes.  Equivalent to
 669 @samp{-tc}.
 670
 671 @item -d
 672 @opindex -d
 673 Output as unsigned decimal shorts.  Equivalent to @samp{-tu2}.
 674
 675 @item -f
 676 @opindex -f
 677 Output as floats.  Equivalent to @samp{-tfF}.
 678
 679 @item -h
 680 @opindex -h
 681 Output as hexadecimal shorts.  Equivalent to @samp{-tx2}.
 682
 683 @item -i
 684 @opindex -i
 685 Output as decimal shorts.  Equivalent to @samp{-td2}.
 686
 687 @item -l
 688 @opindex -l
 689 Output as decimal longs.  Equivalent to @samp{-td4}.
 690
 691 @item -o
 692 @opindex -o
 693 Output as octal shorts.  Equivalent to @samp{-to2}.
 694
 695 @item -x
 696 @opindex -x
 697 Output as hexadecimal shorts.  Equivalent to @samp{-tx2}.
 698
 699 @item -C
 700 @itemx --traditional
 701 @opindex --traditional
  702 Recognize the pre-@sc{POSIX} non-option arguments that traditional @code{od}
 703 accepted.  The following syntax:
 704
 705 @example
 706 od --traditional [@var{file}] [[+]@var{offset}[.][b] [[+]@var{label}[.][b]]]
 707 @end example
 708
 709 @noindent
 710 can be used to specify at most one file and optional arguments
 711 specifying an offset and a pseudo-start address, @var{label}.  By
 712 default, @var{offset} is interpreted as an octal number specifying how
 713 many input bytes to skip before formatting and writing.  The optional
 714 trailing decimal point forces the interpretation of @var{offset} as a
  715 decimal number.  If no decimal point is specified and the offset begins with
 716 @samp{0x} or @samp{0X} it is interpreted as a hexadecimal number.  If
 717 there is a trailing @samp{b}, the number of bytes skipped will be
 718 @var{offset} multiplied by 512.  The @var{label} argument is interpreted
 719 just like @var{offset}, but it specifies an initial pseudo-address.  The
 720 pseudo-addresses are displayed in parentheses following any normal
 721 address.
 722
 723 @end table
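
One way to check the equivalences listed above is to compare the output
of an old-style option with that of its @samp{-t} counterpart.  This
sketch assumes GNU @code{od}; the variable names @code{old} and
@code{new} are arbitrary.

@example
# Old-style -b should match -t oC.
old=$(printf 'hello\n' | od -b)
new=$(printf 'hello\n' | od -t oC)
[ "$old" = "$new" ] && echo 'same output for -b and -t oC'

# Old-style -x should match -t x2.
old=$(printf 'hello\n' | od -x)
new=$(printf 'hello\n' | od -t x2)
[ "$old" = "$new" ] && echo 'same output for -x and -t x2'
@end example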
 724
 725
 726 @node Formatting file contents
 727 @chapter Formatting file contents
 728
 729 @cindex formatting file contents
 730
 731 These commands reformat the contents of files.
 732
 733 @menu
 734 * fmt invocation::              Reformat paragraph text.
 735 * pr invocation::               Paginate or columnate files for printing.
 736 * fold invocation::             Wrap input lines to fit in specified width.
 737 @end menu
 738
 739
 740 @node fmt invocation
 741 @section @code{fmt}: Reformat paragraph text
 742
 743 @pindex fmt
 744 @cindex reformatting paragraph text
 745 @cindex paragraphs, reformatting
 746 @cindex text, reformatting
 747
 748 @code{fmt} fills and joins lines to produce output lines of (at most)
 749 a given number of characters (75 by default).  Synopsis:
 750
 751 @example
 752 fmt [@var{option}]@dots{} [@var{file}]@dots{}
 753 @end example
 754
 755 @code{fmt} reads from the specified @var{file} arguments (or standard
 756 input if none are given), and writes to standard output.
 757
 758 By default, blank lines, spaces between words, and indentation are
 759 preserved in the output; successive input lines with different
 760 indentation are not joined; tabs are expanded on input and introduced on
 761 output.
 762
 763 @cindex line-breaking
 764 @cindex sentences and line-breaking
 765 @cindex Knuth, Donald E.
 766 @cindex Plass, Michael F.
 767 @code{fmt} prefers breaking lines at the end of a sentence, and tries to
 768 avoid line breaks after the first word of a sentence or before the last
 769 word of a sentence.  A @dfn{sentence break} is defined as either the end
 770 of a paragraph or a word ending in any of @samp{.?!}, followed by two
 771 spaces or end of line, ignoring any intervening parentheses or quotes.
 772 Like @TeX{}, @code{fmt} reads entire ``paragraphs'' before choosing line
 773 breaks; the algorithm is a variant of that in ``Breaking Paragraphs Into
 774 Lines'' (Donald E. Knuth and Michael F. Plass, @cite{Software---Practice
 775 and Experience}, 11 (1981), 1119--1184).
 776
 777 The program accepts the following options.  Also see @ref{Common options}.
 778
 779 @table @samp
 780
 781 @item -c
 782 @itemx --crown-margin
 783 @opindex -c
 784 @opindex --crown-margin
 785 @cindex crown margin
 786 @dfn{Crown margin} mode: preserve the indentation of the first two
 787 lines within a paragraph, and align the left margin of each subsequent
 788 line with that of the second line.
 789
 790 @item -t
 791 @itemx --tagged-paragraph
 792 @opindex -t
 793 @opindex --tagged-paragraph
 794 @cindex tagged paragraphs
 795 @dfn{Tagged paragraph} mode: like crown margin mode, except that if
 796 indentation of the first line of a paragraph is the same as the
 797 indentation of the second, the first line is treated as a one-line
 798 paragraph.
 799
 800 @item -s
 801 @itemx --split-only
 802 @opindex -s
 803 @opindex --split-only
  804 Split lines only.  Do not join short lines to form longer ones.  This
  805 prevents sample lines of code and other such ``formatted'' text from
  806 being unduly combined.
 807
 808 @item -u
 809 @itemx --uniform-spacing
 810 @opindex -u
 811 @opindex --uniform-spacing
 812 Uniform spacing.  Reduce spacing between words to one space, and spacing
 813 between sentences to two spaces.
 814
 815 @item -@var{width}
 816 @itemx -w @var{width}
 817 @itemx --width=@var{width}
 818 @opindex -@var{width}
 819 @opindex -w
 820 @opindex --width
 821 Fill output lines up to @var{width} characters (default 75).  @code{fmt}
 822 initially tries to make lines about 7% shorter than this, to give it
 823 room to balance line lengths.
 824
 825 @item -p @var{prefix}
 826 @itemx --prefix=@var{prefix}
@opindex -p
@opindex --prefix
 827 Only lines beginning with @var{prefix} (possibly preceded by whitespace)
 828 are subject to formatting. The prefix and any preceding whitespace are
 829 stripped for the formatting and then re-attached to each formatted output
 830 line.  One use is to format certain kinds of program comments, while
 831 leaving the code unchanged.
 832
 833 @end table
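
The following sketch tries three of these options on made-up input.  It
assumes GNU @code{fmt}; because the line breaks are chosen by an
optimizing algorithm, no exact output is shown for the first and last
commands.

@example
# Refill a short paragraph so that no line exceeds 30 characters.
printf 'The quick brown fox\njumps over\nthe lazy dog.\n' | fmt -w 30

# Split-only mode leaves short lines alone instead of joining them.
printf 'short\nlines\n' | fmt -s
# short
# lines

# Reformat only the lines that begin with "#"; the code line is kept as is.
printf '# a comment line\n# another one\ncode here\n' | fmt -p '#'
@end example

In the last command, the two comment lines should be joined into a
single line that still begins with @samp{#}.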
 834
 835
 836 @node pr invocation
 837 @section @code{pr}: Paginate or columnate files for printing
 838
 839 @pindex pr
 840 @cindex printing, preparing files for
 841 @cindex multicolumn output, generating
 842 @cindex merging files in parallel
 843
 844 @code{pr} writes each @var{file} (@samp{-} means standard input), or
 845 standard input if none are given, to standard output, paginating and
 846 optionally outputting in multicolumn format; optionally merges all
 847 @var{file}s, printing all in parallel, one per column.  Synopsis:
 848
 849 @example
 850 pr [@var{option}]@dots{} [@var{file}]@dots{}
 851 @end example
 852
  853 By default, a 5-line header is printed: two blank lines; a line with the
  854 date, the file name, and the page count; and two more blank lines.  A
  855 footer of five blank lines is also printed.  With the @samp{-f} option, a
  856 3-line header is printed (the two leading blank lines are omitted) and
  857 no footer is used.  The default @var{page_length} in both cases is 66
  858 lines.  The text line of the header takes up the full @var{page_width}
  859 in the form @samp{yy-mm-dd HH:MM @var{string} Page nnnn}, where
  860 @var{string} is centered.
 861
 862 Form feeds in the input cause page breaks in the output. Multiple form
 863 feeds produce empty pages.
 864
  865 Columns have equal width and are separated by an optional string
  866 (default space).  In multicolumn output, lines are truncated to the
  867 line width (default 72) unless you use the @samp{-j} option.  For
  868 single-column output, no line truncation occurs by default; use the
  869 @samp{-w} option to truncate lines in that case.
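
The page layout described above can be checked with a sketch like the
following.  It assumes GNU @code{pr} with its default settings and uses
@code{sed} only to pick out the first two text lines after the header;
the header itself contains the current date, so it is not shown.

@example
# A one-line input still produces a full default page:
# 5 header lines + 56 text lines + 5 footer lines = 66 lines.
printf 'x\n' | pr | wc -l

# Two columns, filled down: a and b in the first column, c and d in
# the second.
printf '%s\n' a b c d | pr -2 | sed -n '6,7p'

# Two columns, filled across: a and b on the first text line.
printf '%s\n' a b c d | pr -2 -a | sed -n '6,7p'
@end example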
 870
 871 The program accepts the following options.  Also see @ref{Common options}.
 872
 873 @table @samp
 874
  875 @item +@var{first_page}[:@var{last_page}]
  876 @opindex +@var{first_page}[:@var{last_page}]
  877 Begin printing with page @var{first_page} and stop with
  878 @var{last_page}.  A missing @samp{:@var{last_page}} implies end of file.
  879 When counting the pages to skip, each form feed in the input file
  880 starts a new page.  Page counting with and without
  881 @samp{+@var{first_page}} is identical.  By default, counting starts with
  882 the first page of the input file (not the first page printed).  Page
  883 numbering may be altered by the @samp{-N} option.
 884
 885 @item -@var{column}
 886 @opindex -@var{column}
 887 @cindex down columns
 888 With each single @var{file}, produce @var{column}-column output and
 889 print columns down. The column width is automatically estimated from
 890 @var{page_width}. This option might well cause some columns to be
 891 truncated. The number of lines in the columns on each page will be
  892 balanced.  @samp{-@var{column}} may not be used with the @samp{-m} option.
 893
 894 @item -a
 895 @opindex -a
 896 @cindex across columns
 897 With each single @var{file}, print columns across rather than down.
 898 @var{column} must be greater than one.
 899
 900 @item -c
 901 @opindex -c
 902 Print control characters using hat notation (e.g., @samp{^G}); print
 903 other unprintable characters in octal backslash notation.  By default,
 904 unprintable characters are not changed.
 905
 906 @item -d
 907 @opindex -d
 908 @cindex double spacing
 909 Double space the output.
 910
 911 @item -e[@var{in-tabchar}[@var{in-tabwidth}]]
 912 @opindex -e
 913 @cindex input tabs
 914 Expand tabs to spaces on input.  Optional argument @var{in-tabchar} is
 915 the input tab character (default is @key{TAB}).  Second optional
 916 argument @var{in-tabwidth} is the input tab character's width (default
 917 is 8).
 918
 919 @item -f
 920 @itemx -F
 921 @opindex -F
 922 @opindex -f
 923 Use a form feed instead of newlines to separate output pages. Default
 924 page length of 66 lines is not altered. But the number of lines of text
 925 per page changes from 56 to 63 lines.
 926
 927
@item -h @var{header}
@opindex -h
Replace the file name in the header with the centered string
@var{header}. Left-hand-side truncation (marked by a @samp{*}) may occur
if the total header line @samp{yy-mm-dd HH:MM HEADER Page nnnn}
becomes larger than @var{page_width}. @samp{-h ""} prints a blank line
header. Don't use @samp{-h""}: a space between the @samp{-h} option and
its argument is always required.
 936
 937 @item -i[@var{out-tabchar}[@var{out-tabwidth}]]
 938 @opindex -i
 939 @cindex output tabs
 940 Replace spaces with tabs on output.  Optional argument @var{out-tabchar}
 941 is the output tab character (default is @key{TAB}).  Second optional
 942 argument @var{out-tabwidth} is the output tab character's width (default
 943 is 8).
 944
 945 @item -j
 946 @opindex -j
Merge lines of full length. Used together with the column options
@samp{-@var{column}}, @samp{-a -@var{column}} or @samp{-m}. This turns off
@samp{-w} line truncation, and no column alignment is used. It may be used with
@samp{-s[@var{separator}]}.
 951
 952
 953 @item -l @var{page_length}
 954 @opindex -l
Set the page length to @var{page_length} (default 66) lines. If
@var{page_length} is less than or equal to 10 (or to 3 with @samp{-f}),
the headers and footers are omitted, and all form feeds set in input
files are eliminated, as if the @samp{-T} option had been given.
 959
 960 @item -m
 961 @opindex -m
 962 Merge and print all @var{file}s in parallel, one in each column. If a
 963 line is too long to fit in a column, it is truncated (but see
 964 @samp{-j}). @samp{-s[@var{separator}]} may be used. Empty pages in some
 965 @var{file}s (form feeds set) produce empty columns, still marked by
 966 @var{separator}. Completely empty common pages show no separators or
 967 line numbers. The default header becomes
 968 @samp{yy-mm-dd HH:MM <blanks> Page nnnn}; may be used with
 969 @samp{-h @var{header}} to fill up the middle part.
 970
 971
 972 @item -n[@var{number-separator}[@var{digits}]]
 973 @opindex -n
 974 Precede each column with a line number; with parallel @var{file}s
 975 (@samp{-m}), precede only each line with a line number. Optional argument
 976 @var{number-separator} is the character to print after each number
 977 (default is @key{TAB}).  Optional argument @var{digits} is the number of
 978 digits per line number (default is 5). Default line counting starts with
 979 first line of the input file (not with the first line printed, see
 980 @samp{-N}).
 981
 982 @item -N @var{line_number}
 983 @opindex -N
Start line counting with number @var{line_number} at the first line of
the first page printed.
 986
 987 @item -o @var{n}
 988 @opindex -o
 989 @cindex indenting lines
 990 @cindex left margin
Indent each line with a margin @var{n} spaces wide (default is zero),
i.e., set the left margin.  The total page width is @var{n} plus the
width set with the @samp{-w} option.
 994
 995 @item -r
 996 @opindex -r
 997 Do not print a warning message when an argument @var{file} cannot be
 998 opened.  (The exit status will still be nonzero, however.)
 999
1000 @item -s[@var{separator}]
1001 @opindex -s
Separate columns by a string @var{separator}. Don't put a space between
the flag and its argument (that is, don't use @samp{-s @var{separator}}).
If this option is omitted altogether, the default is @key{TAB} together
with the @samp{-j} option and space otherwise (same as @samp{-s" "}). With
@samp{-s} only, no separator is used (same as @samp{-s""}). @samp{-s}
does not affect line truncation or column alignment.
1008
1009 @item -t
1010 @opindex -t
Do not print the usual header [and footer] on each page, and do not fill
out the bottoms of pages (with blank lines or a form feed). No page
structure is produced, but form feeds set in the input files are retained.
The predefined page layout is not changed. @samp{-t} or @samp{-T} may be
useful together with other options; e.g., @samp{-t -e4} expands each
@key{TAB} in the input file to 4 spaces but makes no other changes.
Use of @samp{-t} overrides @samp{-h}.
1018
1019 @item -T
1020 @opindex -T
1021 Do not print header [and footer]. In addition eliminate all form feeds
1022 set in the input files.
1023
1024 @item -v
1025 @opindex -v
1026 Print unprintable characters in octal backslash notation.
1027
1028 @item -w @var{page_width}
1029 @opindex -w
1030 Set the page width to @var{page_width} (default 72) characters.
1031 With/without @samp{-w}, header lines are always truncated to
1032 @var{page_width} characters. With @samp{-w}, text lines are truncated,
1033 unless @samp{-j} is used. Without @samp{-w} together with one of the
1034 column options @samp{-@var{column}}, @samp{-a -@var{column}} or
1035 @samp{-m}, default truncation of text lines to 72 characters is used.
1036 Without @samp{-w} and without any of the column options, no line
1037 truncation is used. That's equivalent to @samp{-w 72 -j}.
1038
1039 @end table
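
As a minimal sketch of the @samp{-t -e4} combination described above,
the following shell fragment pipes one line containing a tab through
@code{pr}.  The sample text and the variable name @code{expanded} are
illustrative assumptions only.

```shell
# -t suppresses the header, the footer, and the padding at the bottom of
# the page; -e4 expands input tabs to tab stops every 4 columns.
expanded=$(printf 'a\tb\n' | pr -t -e4)

# The tab after "a" (column 1) advances to column 4, so three spaces
# are emitted before "b".
printf '%s\n' "$expanded"
```

Because no column option is given, no line truncation takes place here.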
1040
1041
1042 @node fold invocation
1043 @section @code{fold}: Wrap input lines to fit in specified width
1044
1045 @pindex fold
1046 @cindex wrapping long input lines
1047 @cindex folding long input lines
1048
1049 @code{fold} writes each @var{file} (@samp{-} means standard input), or
1050 standard input if none are given, to standard output, breaking long
1051 lines.  Synopsis:
1052
1053 @example
1054 fold [@var{option}]@dots{} [@var{file}]@dots{}
1055 @end example
1056
1057 By default, @code{fold} breaks lines wider than 80 columns. The output
1058 is split into as many lines as necessary.
1059
1060 @cindex screen columns
1061 @code{fold} counts screen columns by default; thus, a tab may count more
1062 than one column, backspace decreases the column count, and carriage
1063 return sets the column to zero.
1064
1065 The program accepts the following options.  Also see @ref{Common options}.
1066
1067 @table @samp
1068
1069 @item -b
1070 @itemx --bytes
1071 @opindex -b
1072 @opindex --bytes
1073 Count bytes rather than columns, so that tabs, backspaces, and carriage
1074 returns are each counted as taking up one column, just like other
1075 characters.
1076
1077 @item -s
1078 @itemx --spaces
1079 @opindex -s
1080 @opindex --spaces
1081 Break at word boundaries: the line is broken after the last blank before
1082 the maximum line length.  If the line contains no such blanks, the line
1083 is broken at the maximum line length as usual.
1084
1085 @item -w @var{width}
1086 @itemx --width=@var{width}
1087 @opindex -w
1088 @opindex --width
1089 Use a maximum line length of @var{width} columns instead of 80.
1090
1091 @end table
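
The effect of @samp{-w} and @samp{-s} can be sketched with the shell
fragment below.  The sample sentence and the variable names are
assumptions made for illustration; @code{sed -n 2p} is used only to
pick out the second output line.

```shell
# Break a 10-character line into pieces of at most 5 columns.
folded=$(printf 'abcdefghij\n' | fold -w 5)
printf '%s\n' "$folded"

# Without -s, a 12-column fold cuts "brown" in the middle of the word.
hard_second=$(printf 'the quick brown fox\n' | fold -w 12 | sed -n 2p)

# With -s, the break moves back to just after the last blank that fits,
# so the second line starts with the whole word.
soft_second=$(printf 'the quick brown fox\n' | fold -s -w 12 | sed -n 2p)

printf '%s\n%s\n' "$hard_second" "$soft_second"
```

Note that with @samp{-s} the blank itself stays at the end of the first
output line.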
1092
1093
1094 @node Output of parts of files
1095 @chapter Output of parts of files
1096
1097 @cindex output of parts of files
1098 @cindex parts of files, output of
1099
1100 These commands output pieces of the input.
1101
1102 @menu
1103 * head invocation::             Output the first part of files.
1104 * tail invocation::             Output the last part of files.
1105 * split invocation::            Split a file into fixed-size pieces.
1106 * csplit invocation::           Split a file into context-determined pieces.
1107 @end menu
1108
1109 @node head invocation
1110 @section @code{head}: Output the first part of files
1111
1112 @pindex head
1113 @cindex initial part of files, outputting
1114 @cindex first part of files, outputting
1115
1116 @code{head} prints the first part (10 lines by default) of each
1117 @var{file}; it reads from standard input if no files are given or
1118 when given a @var{file} of @samp{-}.  Synopses:
1119
1120 @example
1121 head [@var{option}]@dots{} [@var{file}]@dots{}
1122 head -@var{number} [@var{option}]@dots{} [@var{file}]@dots{}
1123 @end example
1124
1125 If more than one @var{file} is specified, @code{head} prints a
1126 one-line header consisting of
1127 @example
1128 ==> @var{file name} <==
1129 @end example
1130 @noindent
1131 before the output for each @var{file}.
1132
1133 @code{head} accepts two option formats: the new one, in which numbers
1134 are arguments to the options (@samp{-q -n 1}), and the old one, in which
1135 the number precedes any option letters (@samp{-1q}).
1136
1137 The program accepts the following options.  Also see @ref{Common options}.
1138
1139 @table @samp
1140
1141 @item -@var{count}@var{options}
1142 @opindex -@var{count}
1143 This option is only recognized if it is specified first.  @var{count} is
1144 a decimal number optionally followed by a size letter (@samp{b},
1145 @samp{k}, @samp{m}) as in @code{-c}, or @samp{l} to mean count by lines,
1146 or other option letters (@samp{cqv}).
1147
1148 @item -c @var{bytes}
1149 @itemx --bytes=@var{bytes}
1150 @opindex -c
1151 @opindex --bytes
1152 Print the first @var{bytes} bytes, instead of initial lines.  Appending
1153 @samp{b} multiplies @var{bytes} by 512, @samp{k} by 1024, and @samp{m}
1154 by 1048576.
1155
@item -n @var{n}
1157 @itemx --lines=@var{n}
1158 @opindex -n
1159 @opindex --lines
1160 Output the first @var{n} lines.
1161
1162 @item -q
1163 @itemx --quiet
1164 @itemx --silent
1165 @opindex -q
1166 @opindex --quiet
1167 @opindex --silent
1168 Never print file name headers.
1169
1170 @item -v
1171 @itemx --verbose
1172 @opindex -v
1173 @opindex --verbose
1174 Always print file name headers.
1175
1176 @end table
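
A short sketch of the new-style options, counting first by lines and
then by bytes.  The input strings and variable names are illustrative
assumptions.

```shell
# Keep only the first two lines of a three-line input.
first_lines=$(printf 'one\ntwo\nthree\n' | head -n 2)

# Keep only the first three bytes, regardless of line structure.
first_bytes=$(printf 'abcdef' | head -c 3)

printf '%s\n' "$first_lines" "$first_bytes"
```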
1177
1178
1179 @node tail invocation
1180 @section @code{tail}: Output the last part of files
1181
1182 @pindex tail
1183 @cindex last part of files, outputting
1184
1185 @code{tail} prints the last part (10 lines by default) of each
1186 @var{file}; it reads from standard input if no files are given or
1187 when given a @var{file} of @samp{-}.  Synopses:
1188
1189 @example
1190 tail [@var{option}]@dots{} [@var{file}]@dots{}
1191 tail -@var{number} [@var{option}]@dots{} [@var{file}]@dots{}
1192 tail +@var{number} [@var{option}]@dots{} [@var{file}]@dots{}
1193 @end example
1194
1195 If more than one @var{file} is specified, @code{tail} prints a
1196 one-line header consisting of
1197 @example
1198 ==> @var{file name} <==
1199 @end example
1200 @noindent
1201 before the output for each @var{file}.
1202
1203 @cindex BSD @code{tail}
1204 GNU @code{tail} can output any amount of data (some other versions of
1205 @code{tail} cannot).  It also has no @samp{-r} option (print in
1206 reverse), since reversing a file is really a different job from printing
1207 the end of a file; BSD @code{tail} (which is the one with @code{-r}) can
1208 only reverse files that are at most as large as its buffer, which is
1209 typically 32k.  A more reliable and versatile way to reverse files is
1210 the GNU @code{tac} command.
1211
1212 @code{tail} accepts two option formats: the new one, in which numbers
1213 are arguments to the options (@samp{-n 1}), and the old one, in which
1214 the number precedes any option letters (@samp{-1} or @samp{+1}).
1215
1216 If any option-argument is a number @var{n} starting with a @samp{+},
1217 @code{tail} begins printing with the @var{n}th item from the start of
1218 each file, instead of from the end.
1219
1220 The program accepts the following options.  Also see @ref{Common options}.
1221
1222 @table @samp
1223
1224 @item -@var{count}
1225 @itemx +@var{count}
1226 @opindex -@var{count}
1227 @opindex +@var{count}
1228 This option is only recognized if it is specified first.  @var{count} is
1229 a decimal number optionally followed by a size letter (@samp{b},
1230 @samp{k}, @samp{m}) as in @code{-c}, or @samp{l} to mean count by lines,
1231 or other option letters (@samp{cfqv}).
1232
1233 @item -c @var{bytes}
1234 @itemx --bytes=@var{bytes}
1235 @opindex -c
1236 @opindex --bytes
1237 Output the last @var{bytes} bytes, instead of final lines.  Appending
1238 @samp{b} multiplies @var{bytes} by 512, @samp{k} by 1024, and @samp{m}
1239 by 1048576.
1240
1241 @item -f
1242 @itemx --follow
1243 @opindex -f
1244 @opindex --follow
1245 @cindex growing files
1246 Loop forever trying to read more characters at the end of the file,
1247 presumably because the file is growing.  Ignored if reading from a pipe.
1248 If more than one file is given, @code{tail} prints a header whenever it
1249 gets output from a different file, to indicate which file that output is
1250 from.
1251
@item -n @var{n}
1253 @itemx --lines=@var{n}
1254 @opindex -n
1255 @opindex --lines
1256 Output the last @var{n} lines.
1257
1258 @item -q
@itemx --quiet
1260 @itemx --silent
1261 @opindex -q
1262 @opindex --quiet
1263 @opindex --silent
1264 Never print file name headers.
1265
1266 @item -v
1267 @itemx --verbose
1268 @opindex -v
1269 @opindex --verbose
1270 Always print file name headers.
1271
1272 @end table
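
The difference between counting from the end and counting from the
start (a number beginning with @samp{+}) might be sketched as follows.
The four-line input and the variable names are assumptions chosen for
illustration.

```shell
# The last two lines of the input.
last_two=$(printf '1\n2\n3\n4\n' | tail -n 2)

# Everything from the second line onward: the leading "+" makes tail
# count from the start of the input instead of from the end.
from_second=$(printf '1\n2\n3\n4\n' | tail -n +2)

printf '%s\n' "$last_two" "$from_second"
```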
1273
1274
1275 @node split invocation
1276 @section @code{split}: Split a file into fixed-size pieces
1277
1278 @pindex split
1279 @cindex splitting a file into pieces
1280 @cindex pieces, splitting a file into
1281
1282 @code{split} creates output files containing consecutive sections of
1283 @var{input} (standard input if none is given or @var{input} is
1284 @samp{-}).  Synopsis:
1285
1286 @example
1287 split [@var{option}] [@var{input} [@var{prefix}]]
1288 @end example
1289
1290 By default, @code{split} puts 1000 lines of @var{input} (or whatever is
1291 left over for the last section), into each output file.
1292
1293 @cindex output file name prefix
1294 The output files' names consist of @var{prefix} (@samp{x} by default)
1295 followed by a group of letters @samp{aa}, @samp{ab}, and so on, such
1296 that concatenating the output files in sorted order by file name produces
1297 the original input file.  (If more than 676 output files are required,
1298 @code{split} uses @samp{zaa}, @samp{zab}, etc.)
1299
1300 The program accepts the following options.  Also see @ref{Common options}.
1301
1302 @table @samp
1303
1304 @item -@var{lines}
1305 @itemx -l @var{lines}
1306 @itemx --lines=@var{lines}
1307 @opindex -l
1308 @opindex --lines
1309 Put @var{lines} lines of @var{input} into each output file.
1310
1311 @item -b @var{bytes}
1312 @itemx --bytes=@var{bytes}
1313 @opindex -b
1314 @opindex --bytes
Put @var{bytes} bytes of @var{input} into each output file.
1316 Appending @samp{b} multiplies @var{bytes} by 512, @samp{k} by 1024, and
1317 @samp{m} by 1048576.
1318
1319 @item -C @var{bytes}
1320 @itemx --line-bytes=@var{bytes}
1321 @opindex -C
1322 @opindex --line-bytes
1323 Put into each output file as many complete lines of @var{input} as
1324 possible without exceeding @var{bytes} bytes.  For lines longer than
1325 @var{bytes} bytes, put @var{bytes} bytes into each output file until
1326 less than @var{bytes} bytes of the line are left, then continue
1327 normally.  @var{bytes} has the same format as for the @samp{--bytes}
1328 option.
1329
@item --verbose
1331 @opindex --verbose
1332 Write a diagnostic to standard error just before each output file is opened.
1333
1334 @end table
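
The naming scheme and the property that the pieces concatenate back to
the input can be sketched as below.  The temporary directory, the file
name @file{input.txt}, and the prefix @samp{part_} are hypothetical
choices; @code{mktemp} is assumed to be available.

```shell
workdir=$(mktemp -d)
printf '1\n2\n3\n4\n5\n' > "$workdir/input.txt"

# Two lines per piece; the last piece holds the single leftover line.
split -l 2 "$workdir/input.txt" "$workdir/part_"

# List the pieces in sorted order and glue them back together.
pieces=$(cd "$workdir" && echo part_*)
rejoined=$(cat "$workdir"/part_*)
original=$(cat "$workdir/input.txt")

printf '%s\n' "$pieces"
rm -rf "$workdir"
```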
1335
1336
1337 @node csplit invocation
1338 @section @code{csplit}: Split a file into context-determined pieces
1339
1340 @pindex csplit
1341 @cindex context splitting
1342 @cindex splitting a file into pieces by context
1343
1344 @code{csplit} creates zero or more output files containing sections of
1345 @var{input} (standard input if @var{input} is @samp{-}).  Synopsis:
1346
1347 @example
1348 csplit [@var{option}]@dots{} @var{input} @var{pattern}@dots{}
1349 @end example
1350
1351 The contents of the output files are determined by the @var{pattern}
1352 arguments, as detailed below.  An error occurs if a @var{pattern}
1353 argument refers to a nonexistent line of the input file (e.g., if no
1354 remaining line matches a given regular expression).  After every
1355 @var{pattern} has been matched, any remaining input is copied into one
1356 last output file.
1357
1358 By default, @code{csplit} prints the number of bytes written to each
1359 output file after it has been created.
1360
1361 The types of pattern arguments are:
1362
1363 @table @samp
1364
1365 @item @var{n}
1366 Create an output file containing the input up to but not including line
1367 @var{n} (a positive integer).  If followed by a repeat count, also
create an output file containing the next @var{n} lines of the input
1369 file once for each repeat.
1370
1371 @item /@var{regexp}/[@var{offset}]
1372 Create an output file containing the current line up to (but not
1373 including) the next line of the input file that contains a match for
1374 @var{regexp}.  The optional @var{offset} is a @samp{+} or @samp{-}
1375 followed by a positive integer.  If it is given, the input up to the
1376 matching line plus or minus @var{offset} is put into the output file,
1377 and the line after that begins the next section of input.
1378
1379 @item %@var{regexp}%[@var{offset}]
1380 Like the previous type, except that it does not create an output
1381 file, so that section of the input file is effectively ignored.
1382
1383 @item @{@var{repeat-count}@}
1384 Repeat the previous pattern @var{repeat-count} additional
1385 times. @var{repeat-count} can either be a positive integer or an
1386 asterisk, meaning repeat as many times as necessary until the input is
1387 exhausted.
1388
1389 @end table
1390
1391 The output files' names consist of a prefix (@samp{xx} by default)
1392 followed by a suffix.  By default, the suffix is an ascending sequence
of two-digit decimal numbers from @samp{00} up to @samp{99}.  In any
1394 case, concatenating the output files in sorted order by filename
1395 produces the original input file.
1396
1397 By default, if @code{csplit} encounters an error or receives a hangup,
1398 interrupt, quit, or terminate signal, it removes any output files
1399 that it has created so far before it exits.
1400
1401 The program accepts the following options.  Also see @ref{Common options}.
1402
1403 @table @samp
1404
1405 @item -f @var{prefix}
1406 @itemx --prefix=@var{prefix}
1407 @opindex -f
1408 @opindex --prefix
1409 @cindex output file name prefix
1410 Use @var{prefix} as the output file name prefix.
1411
1412 @item -b @var{suffix}
1413 @itemx --suffix=@var{suffix}
1414 @opindex -b
1415 @opindex --suffix
1416 @cindex output file name suffix
1417 Use @var{suffix} as the output file name suffix.  When this option is
1418 specified, the suffix string must include exactly one
1419 @code{printf(3)}-style conversion specification, possibly including
1420 format specification flags, a field width, a precision specifications,
1421 or all of these kinds of modifiers.  The format letter must convert a
1422 binary integer argument to readable form; thus, only @samp{d}, @samp{i},
1423 @samp{u}, @samp{o}, @samp{x}, and @samp{X} conversions are allowed.  The
1424 entire @var{suffix} is given (with the current output file number) to
1425 @code{sprintf(3)} to form the file name suffixes for each of the
1426 individual output files in turn.  If this option is used, the
1427 @samp{--digits} option is ignored.
1428
1429 @item -n @var{digits}
1430 @itemx --digits=@var{digits}
1431 @opindex -n
1432 @opindex --digits
1433 Use output file names containing numbers that are @var{digits} digits
1434 long instead of the default 2.
1435
1436 @item -k
1437 @itemx --keep-files
1438 @opindex -k
1439 @opindex --keep-files
1440 Do not remove output files when errors are encountered.
1441
1442 @item -z
1443 @itemx --elide-empty-files
1444 @opindex -z
1445 @opindex --elide-empty-files
1446 Suppress the generation of zero-length output files.  (In cases where
1447 the section delimiters of the input file are supposed to mark the first
1448 lines of each of the sections, the first output file will generally be a
1449 zero-length file unless you use this option.)  The output file sequence
1450 numbers always run consecutively starting from 0, even when this option
1451 is specified.
1452
1453 @item -s
1454 @itemx -q
1455 @itemx --silent
1456 @itemx --quiet
1457 @opindex -s
1458 @opindex -q
1459 @opindex --silent
1460 @opindex --quiet
1461 Do not print counts of output file sizes.
1462
1463 @end table
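
A minimal sketch of a regular-expression pattern combined with an
asterisk repeat count follows.  The six-line input, the prefix
@samp{piece_}, and the temporary directory are assumptions made for
illustration.

```shell
workdir=$(mktemp -d)
printf '1\n2\n3\n4\n5\n6\n' > "$workdir/input.txt"

# Start a new output file at every line ending in 3 or 5.
# -s suppresses the byte counts; -f sets the output file name prefix.
csplit -s -f "$workdir/piece_" "$workdir/input.txt" '/[35]$/' '{*}'

pieces=$(cd "$workdir" && echo piece_*)
second=$(cat "$workdir/piece_01")

printf '%s\n' "$pieces"
rm -rf "$workdir"
```

Each matching line begins a new piece, so the second file holds lines 3
and 4.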
1464
1465
1466 @node Summarizing files
1467 @chapter Summarizing files
1468
1469 @cindex summarizing files
1470
1471 These commands generate just a few numbers representing entire
1472 contents of files.
1473
1474 @menu
1475 * wc invocation::               Print byte, word, and line counts.
1476 * sum invocation::              Print checksum and block counts.
1477 * cksum invocation::            Print CRC checksum and byte counts.
1478 * md5sum invocation::           Print or check message-digests.
1479 @end menu
1480
1481
1482 @node wc invocation
1483 @section @code{wc}: Print byte, word, and line counts
1484
1485 @pindex wc
1486 @cindex byte count
1487 @cindex word count
1488 @cindex line count
1489
1490 @code{wc} counts the number of bytes, whitespace-separated words, and
1491 newlines in each given @var{file}, or standard input if none are given
1492 or for a @var{file} of @samp{-}.  Synopsis:
1493
1494 @example
1495 wc [@var{option}]@dots{} [@var{file}]@dots{}
1496 @end example
1497
1498 @cindex total counts
1499 @code{wc} prints one line of counts for each file, and if the file was
1500 given as an argument, it prints the file name following the counts.  If
1501 more than one @var{file} is given, @code{wc} prints a final line
1502 containing the cumulative counts, with the file name @file{total}.  The
1503 counts are printed in this order: newlines, words, bytes.
1504
1505 By default, @code{wc} prints all three counts.  Options can specify
1506 that only certain counts be printed.  Options do not undo others
1507 previously given, so
1508
1509 @example
1510 wc --bytes --words
1511 @end example
1512
1513 @noindent
1514 prints both the byte counts and the word counts.
1515
1516 With the @code{--max-line-length} option, @code{wc} prints the length
1517 of the longest line per file, and if there is more than one file it
1518 prints the maximum (not the sum) of those lengths.
1519
1520 The program accepts the following options.  Also see @ref{Common options}.
1521
1522 @table @samp
1523
1524 @item -c
1525 @itemx --bytes
1526 @itemx --chars
1527 @opindex -c
1528 @opindex --bytes
1529 @opindex --chars
1530 Print only the byte counts.
1531
1532 @item -w
1533 @itemx --words
1534 @opindex -w
1535 @opindex --words
1536 Print only the word counts.
1537
1538 @item -l
1539 @itemx --lines
1540 @opindex -l
1541 @opindex --lines
1542 Print only the newline counts.
1543
1544 @item -L
1545 @itemx --max-line-length
1546 @opindex -L
1547 @opindex --max-line-length
1548 Print only the maximum line lengths.
1549
1550 @end table
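
The three default counts can be requested one at a time, as in this
sketch.  The two-line sample and the variable names are assumptions;
the arithmetic expansion is used only to strip any padding that a
particular @code{wc} implementation may add.

```shell
# Two lines, three whitespace-separated words, 14 bytes in total.
lines=$(printf 'one two\nthree\n' | wc -l)
words=$(printf 'one two\nthree\n' | wc -w)
bytes=$(printf 'one two\nthree\n' | wc -c)

echo "lines=$((lines)) words=$((words)) bytes=$((bytes))"
```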
1551
1552
1553 @node sum invocation
1554 @section @code{sum}: Print checksum and block counts
1555
1556 @pindex sum
1557 @cindex 16-bit checksum
1558 @cindex checksum, 16-bit
1559
1560 @code{sum} computes a 16-bit checksum for each given @var{file}, or
1561 standard input if none are given or for a @var{file} of @samp{-}.  Synopsis:
1562
1563 @example
1564 sum [@var{option}]@dots{} [@var{file}]@dots{}
1565 @end example
1566
1567 @code{sum} prints the checksum for each @var{file} followed by the
1568 number of blocks in the file (rounded up).  If more than one @var{file}
1569 is given, file names are also printed (by default).  (With the
@samp{--sysv} option, corresponding file names are printed when there is
1571 at least one file argument.)
1572
1573 By default, GNU @code{sum} computes checksums using an algorithm
1574 compatible with BSD @code{sum} and prints file sizes in units of
1575 1024-byte blocks.
1576
1577 The program accepts the following options.  Also see @ref{Common options}.
1578
1579 @table @samp
1580
1581 @item -r
1582 @opindex -r
1583 @cindex BSD @code{sum}
1584 Use the default (BSD compatible) algorithm.  This option is included for
1585 compatibility with the System V @code{sum}.  Unless @samp{-s} was also
1586 given, it has no effect.
1587
1588 @item -s
1589 @itemx --sysv
1590 @opindex -s
1591 @opindex --sysv
1592 @cindex System V @code{sum}
1593 Compute checksums using an algorithm compatible with System V
1594 @code{sum}'s default, and print file sizes in units of 512-byte blocks.
1595
1596 @end table
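
As a small illustration of @samp{-s}, the fragment below checksums a
three-byte input.  It relies on the assumption that, for an input this
short, the System V style checksum is just the sum of the byte values;
the sample string is arbitrary.

```shell
# The bytes of "abc" are 97, 98 and 99, which add up to 294.
# The second field is the size in 512-byte blocks, rounded up.
sysv=$(printf 'abc' | sum -s)
printf '%s\n' "$sysv"
```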
1597
1598 @code{sum} is provided for compatibility; the @code{cksum} program (see
1599 next section) is preferable in new applications.
1600
1601
1602 @node cksum invocation
1603 @section @code{cksum}: Print CRC checksum and byte counts
1604
1605 @pindex cksum
1606 @cindex cyclic redundancy check
1607 @cindex CRC checksum
1608
1609 @code{cksum} computes a cyclic redundancy check (CRC) checksum for each
1610 given @var{file}, or standard input if none are given or for a
1611 @var{file} of @samp{-}.  Synopsis:
1612
1613 @example
1614 cksum [@var{option}]@dots{} [@var{file}]@dots{}
1615 @end example
1616
1617 @code{cksum} prints the CRC checksum for each file along with the number
1618 of bytes in the file, and the filename unless no arguments were given.
1619
1620 @code{cksum} is typically used to ensure that files
1621 transferred by unreliable means (e.g., netnews) have not been corrupted,
1622 by comparing the @code{cksum} output for the received files with the
1623 @code{cksum} output for the original files (typically given in the
1624 distribution).
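
The comparison described above might be sketched like this.  The file
names @file{original} and @file{received}, their contents, and the
simulated corruption are all hypothetical.

```shell
workdir=$(mktemp -d)
printf 'release notes\n' > "$workdir/original"
cp "$workdir/original" "$workdir/received"

# Reading from standard input keeps the file name out of the output,
# so the two results can be compared directly.
sent=$(cksum < "$workdir/original")
got=$(cksum < "$workdir/received")
if [ "$sent" = "$got" ]; then transfer=intact; else transfer=corrupted; fi

# Simulate a one-byte corruption of the received copy.
printf 'release n0tes\n' > "$workdir/received"
damaged=$(cksum < "$workdir/received")

echo "$transfer"
rm -rf "$workdir"
```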
1625
1626 The CRC algorithm is specified by the @sc{POSIX.2} standard.  It is not
1627 compatible with the BSD or System V @code{sum} algorithms (see the
1628 previous section); it is more robust.
1629
1630 The only options are @samp{--help} and @samp{--version}.  @xref{Common
1631 options}.
1632
1633
1634 @node md5sum invocation
1635 @section @code{md5sum}: Print or check message-digests
1636
1637 @pindex md5sum
1638 @cindex 128-bit checksum
1639 @cindex checksum, 128-bit
1640 @cindex fingerprint, 128-bit
1641 @cindex message-digest, 128-bit
1642
1643 @code{md5sum} computes a 128-bit checksum (or @dfn{fingerprint} or
1644 @dfn{message-digest}) for each specified @var{file}.
1645 If a @var{file} is specified as @samp{-} or if no files are given
1646 @code{md5sum} computes the checksum for the standard input.
1647 @code{md5sum} can also determine whether a file and checksum are
1648 consistent. Synopses:
1649
1650 @example
1651 md5sum [@var{option}]@dots{} [@var{file}]@dots{}
1652 md5sum [@var{option}]@dots{} --check [@var{file}]
1653 @end example
1654
1655 For each @var{file}, @samp{md5sum} outputs the MD5 checksum, a flag
1656 indicating a binary or text input file, and the filename.
1657 If @var{file} is omitted or specified as @samp{-}, standard input is read.
1658
1659 The program accepts the following options.  Also see @ref{Common options}.
1660
1661 @table @samp
1662
1663 @item -b
1664 @itemx --binary
1665 @opindex -b
1666 @opindex --binary
1667 @cindex binary input files
1668 Treat all input files as binary.  This option has no effect on Unix
1669 systems, since they don't distinguish between binary and text files.
1670 This option is useful on systems that have different internal and
1671 external character representations.
1672
@item -c
@itemx --check
@opindex -c
@opindex --check
Read filenames and checksum information from the single @var{file}
(or from standard input if no @var{file} was specified) and report whether
1677 each named file and the corresponding checksum data are consistent.
1678 The input to this mode of @code{md5sum} is usually the output of
1679 a prior, checksum-generating run of @samp{md5sum}.
1680 Each valid line of input consists of an MD5 checksum, a binary/text
1681 flag, and then a filename.
1682 Binary files are marked with @samp{*}, text with @samp{ }.
1683 For each such line, @code{md5sum} reads the named file and computes its
1684 MD5 checksum.  Then, if the computed message digest does not match the
1685 one on the line with the filename, the file is noted as having
1686 failed the test.  Otherwise, the file passes the test.
1687 By default, for each valid line, one line is written to standard
1688 output indicating whether the named file passed the test.
1689 After all checks have been performed, if there were any failures,
1690 a warning is issued to standard error.
1691 Use the @samp{--status} option to inhibit that output.
1692 If any listed file cannot be opened or read, if any valid line has
1693 an MD5 checksum inconsistent with the associated file, or if no valid
1694 line is found, @code{md5sum} exits with nonzero status.  Otherwise,
1695 it exits successfully.
1696
@item --status
1698 @opindex --status
1699 @cindex verifying MD5 checksums
1700 This option is useful only when verifying checksums.
1701 When verifying checksums, don't generate the default one-line-per-file
1702 diagnostic and don't output the warning summarizing any failures.
1703 Failures to open or read a file still evoke individual diagnostics to
1704 standard error.
1705 If all listed files are readable and are consistent with the associated
1706 MD5 checksums, exit successfully.  Otherwise exit with a status code
1707 indicating there was a failure.
1708
1709 @item -t
1710 @itemx --text
1711 @opindex -t
1712 @opindex --text
1713 @cindex text input files
1714 Treat all input files as text files.  This is the reverse of
1715 @samp{--binary}.
1716
1717 @item -w
1718 @itemx --warn
1719 @opindex -w
1720 @opindex --warn
1721 @cindex verifying MD5 checksums
1722 When verifying checksums, warn about improperly formatted MD5 checksum lines.
1723 This option is useful only if all but a few lines in the checked input
1724 are valid.
1725
1726 @end table
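
A sketch of the generate-then-verify cycle, using @samp{--check}
together with @samp{--status} so that only the exit status is reported.
The file names @file{data.txt} and @file{sums.md5} and the temporary
directory are assumptions for illustration.

```shell
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/data.txt"

# Record the checksum, then verify it against the unchanged file.
(cd "$workdir" && md5sum data.txt > sums.md5)
if (cd "$workdir" && md5sum --check --status sums.md5); then
  before=pass
else
  before=fail
fi

# Change the file; the recorded checksum no longer matches.
printf 'changed\n' > "$workdir/data.txt"
if (cd "$workdir" && md5sum --check --status sums.md5); then
  after=pass
else
  after=fail
fi

echo "before=$before after=$after"
rm -rf "$workdir"
```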
1727
1728
1729 @node Operating on sorted files
1730 @chapter Operating on sorted files
1731
1732 @cindex operating on sorted files
1733 @cindex sorted files, operations on
1734
1735 These commands work with (or produce) sorted files.
1736
1737 @menu
1738 * sort invocation::             Sort text files.
1739 * uniq invocation::             Uniqify files.
1740 * comm invocation::             Compare two sorted files line by line.
1741 @end menu
1742
1743
1744 @node sort invocation
1745 @section @code{sort}: Sort text files
1746
1747 @pindex sort
1748 @cindex sorting files
1749
1750 @code{sort} sorts, merges, or compares all the lines from the given
1751 files, or standard input if none are given or for a @var{file} of
1752 @samp{-}.  By default, @code{sort} writes the results to standard
1753 output.  Synopsis:
1754
1755 @example
1756 sort [@var{option}]@dots{} [@var{file}]@dots{}
1757 @end example
1758
1759 @code{sort} has three modes of operation: sort (the default), merge,
1760 and check for sortedness.  The following options change the operation
1761 mode:
1762
1763 @table @samp
1764
1765 @item -c
1766 @opindex -c
1767 @cindex checking for sortedness
1768 Check whether the given files are already sorted: if they are not all
1769 sorted, print an error message and exit with a status of 1.
1770 Otherwise, exit successfully.
1771
1772 @item -m
1773 @opindex -m
1774 @cindex merging sorted files
1775 Merge the given files by sorting them as a group.  Each input file must
1776 already be individually sorted.  Sorting instead of merging always
1777 works; merging is provided because it is faster when the inputs are
1778 already sorted.
1779
1780 @end table
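
As a rough illustration of the check and merge modes, one might try the
following.  The file names @file{one} and @file{two} and the scratch
directory are invented for this sketch, @code{mktemp} is assumed to be
available, and @code{LC_ALL=C} is assumed so that ordering is byte-based.

@example
# Check mode: the input is out of order, so sort exits with status 1.
printf 'b\na\n' | LC_ALL=C sort -c 2>/dev/null || echo "not sorted"

# Merge mode: both inputs are already sorted individually.
dir=$(mktemp -d)
printf 'a\nc\n' > "$dir/one"
printf 'b\nd\n' > "$dir/two"
LC_ALL=C sort -m "$dir/one" "$dir/two"   # prints a, b, c, d (one per line)
rm -r "$dir"
@end example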
1781
1782 A pair of lines is compared as follows: if any key fields have been
1783 specified, @code{sort} compares each pair of fields, in the order
1784 specified on the command line, according to the associated ordering
1785 options, until a difference is found or no fields are left.
1786
1787 If any of the global options @samp{Mbdfinr} are given but no key fields
1788 are specified, @code{sort} compares the entire lines according to the
1789 global options.
1790
1791 Finally, as a last resort when all keys compare equal (or if no
1792 ordering options were specified at all), @code{sort} compares the lines
1793 byte by byte in machine collating sequence.  The last resort comparison
1794 honors the @samp{-r} global option.  The @samp{-s} (stable) option
1795 disables this last-resort comparison so that lines in which all fields
1796 compare equal are left in their original relative order.  If no fields
1797 or global options are specified, @samp{-s} has no effect.
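
The effect of the last-resort comparison can be seen on a two-line
sample.  The input is invented for illustration, and @code{LC_ALL=C} is
assumed so that the comparison is byte-based.

@example
# Both lines have the same key (field 2), so the last-resort
# comparison of the whole lines puts "a 1" first.
printf 'b 1\na 1\n' | LC_ALL=C sort -k 2,2

# With -s the tie is left alone and the input order survives:
# "b 1" is printed before "a 1".
printf 'b 1\na 1\n' | LC_ALL=C sort -s -k 2,2
@end example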
1798
1799 GNU @code{sort} (as specified for all GNU utilities) has no limits on
1800 input line length or restrictions on bytes allowed within lines.  In
1801 addition, if the final byte of an input file is not a newline, GNU
1802 @code{sort} silently supplies one.
1803
1804 Upon any error, @code{sort} exits with a status of @samp{2}.
1805
1806 @vindex TMPDIR
1807 If the environment variable @code{TMPDIR} is set, @code{sort} uses its
1808 value as the directory for temporary files instead of @file{/tmp}.  The
1809 @samp{-T @var{tempdir}} option in turn overrides the environment
1810 variable.
1811
1812 The following options affect the ordering of output lines.  They may be
1813 specified globally or as part of a specific key field.  If no key
1814 fields are specified, global options apply to comparison of entire
1815 lines; otherwise the global options are inherited by key fields that do
1816 not specify any special options of their own.
1817
1818 @table @samp
1819
1820 @item -b
1821 @opindex -b
1822 @cindex blanks, ignoring leading
1823 Ignore leading blanks when finding sort keys in each line.
1824
1825 @item -d
1826 @opindex -d
1827 @cindex phone directory order
1828 @cindex telephone directory order
1829 Sort in @dfn{phone directory} order: ignore all characters except
1830 letters, digits and blanks when sorting.
1831
1832 @item -f
1833 @opindex -f
1834 @cindex case folding
1835 Fold lowercase characters into the equivalent uppercase characters when
1836 sorting so that, for example, @samp{b} and @samp{B} sort as equal.
1837
1838 @item -g
1839 @opindex -g
1840 @cindex general numeric sort
1841 Sort numerically, but use @code{strtod(3)} to arrive at the numeric values.
1842 This allows floating point numbers to be specified in scientific notation,
1843 like @code{1.0e-34} and @code{10e100}.  Use this option only if there
1844 is no alternative; it is much slower than @samp{-n}, and numbers with
1845 too many significant digits will be compared as if they had been
1846 truncated.  In addition, numbers outside the range of representable
1847 double precision floating point numbers are treated as if they were
1848 zeroes; overflow and underflow are not reported.
1849
1850 @item -i
1851 @opindex -i
1852 @cindex unprintable characters, ignoring
1853 Ignore characters outside the printable ASCII range 040-0176 octal
1854 (inclusive) when sorting.
1855
1856 @item -M
1857 @opindex -M
1858 @cindex months, sorting by
1859 An initial string, consisting of any amount of whitespace, followed
1860 by three letters abbreviating a month name, is folded to UPPER case and
1861 compared in the order @samp{JAN} < @samp{FEB} < @dots{} < @samp{DEC}.
1862 Invalid names compare low to valid names.
1863
1864 @item -n
1865 @opindex -n
1866 @cindex numeric sort
1867 Sort numerically: the number begins each line; specifically, it consists
1868 of optional whitespace, an optional @samp{-} sign, and zero or more
1869 digits, optionally followed by a decimal point and zero or more digits.
1870
1871 @code{sort -n} uses what might be considered an unconventional method
1872 to compare strings representing floating point numbers.  Rather than
1873 first converting each string to the C @code{double} type and then
1874 comparing those values, @code{sort} aligns the decimal points in the two
1875 strings and compares the strings a character at a time.  One benefit
1876 of using this approach is its speed.  In practice this is much more
1877 efficient than performing the two corresponding string-to-double (or even
1878 string-to-integer) conversions and then comparing doubles.  In addition,
1879 there is no corresponding loss of precision.  Converting each string to
1880 @code{double} before comparison would limit precision to about 16 digits
1881 on most systems.
1882
1883 Neither a leading @samp{+} nor exponential notation is recognized.
1884 To compare such strings numerically, use the @samp{-g} option.
1885
1886 @item -r
1887 @opindex -r
1888 @cindex reverse sorting
1889 Reverse the result of comparison, so that lines with greater key values
1890 appear earlier in the output instead of later.
1891
1892 @end table
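
The difference between @samp{-n} and @samp{-g} shows up as soon as the
input contains exponential notation.  The three sample values below are
made up for this sketch, and @code{LC_ALL=C} is assumed so that the
decimal point and ordering are locale-independent.

@example
# -n stops reading at the "e", so "1e3" counts as 1 and sorts first.
printf '20\n1e3\n3\n' | LC_ALL=C sort -n    # 1e3, 3, 20

# -g parses the exponent, so "1e3" is 1000 and sorts last.
printf '20\n1e3\n3\n' | LC_ALL=C sort -g    # 3, 20, 1e3
@end example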
1893
1894 Other options are:
1895
1896 @table @samp
1897
1898 @item -o @var{output-file}
1899 @opindex -o
1900 @cindex overwriting of input, allowed
1901 Write output to @var{output-file} instead of standard output.
1902 If @var{output-file} is one of the input files, @code{sort} copies
1903 it to a temporary file before sorting and writing the output to
1904 @var{output-file}.
1905
1906 @item -t @var{separator}
1907 @opindex -t
1908 @cindex field separator character
1909 Use character @var{separator} as the field separator when finding the
1910 sort keys in each line.  By default, fields are separated by the empty
1911 string between a non-whitespace character and a whitespace character.
1912 That is, given the input line @w{@samp{ foo bar}}, @code{sort} breaks it
1913 into fields @w{@samp{ foo}} and @w{@samp{ bar}}.  The field separator is
1914 not considered to be part of either the field preceding or the field
1915 following.
1916
1917 @item -u
1918 @opindex -u
1919 @cindex uniqifying output
1920 For the default case or the @samp{-m} option, only output the first
1921 of a sequence of lines that compare equal.  For the @samp{-c} option,
1922 check that no pair of consecutive lines compares equal.
1923
1924 @item -k @var{pos1}[,@var{pos2}]
1925 @opindex -k
1926 @cindex sort field
1927 The recommended, @sc{POSIX}, option for specifying a sort field.  The field
1928 consists of the line between @var{pos1} and @var{pos2} (or the end of
1929 the line, if @var{pos2} is omitted), inclusive.  Fields and character
1930 positions are numbered starting with 1.  See below.
1931
1932 @item -z
1933 @opindex -z
1934 @cindex sort zero-terminated lines
1935 Treat the input as a set of lines, each terminated by a zero byte (the
1936 @sc{ASCII} @sc{NUL} character) instead of an @sc{ASCII} @sc{LF} (Line Feed).
1937 This option can be useful in conjunction with @samp{perl -0} or
1938 @samp{find -print0} and @samp{xargs -0}, which do the same in order to
1939 reliably handle arbitrary pathnames (even those that contain Line Feed
1940 characters).
1941
1942 @item +@var{pos1}[-@var{pos2}]
1943 The obsolete, traditional option for specifying a sort field.  The field
1944 consists of the line between @var{pos1} and up to but @emph{not including}
1945 @var{pos2} (or the end of the line if @var{pos2} is omitted).  Fields
1946 and character positions are numbered starting with 0.  See below.
1947
1948 @end table
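
A short sketch of @samp{-u}, @samp{-o} and @samp{-z} together may help.
The temporary file and its contents are invented here, and
@code{mktemp} is assumed to be available.

@example
# Sort a file in place, dropping duplicate lines.  Naming the input
# as the output file is allowed.
tmp=$(mktemp)
printf 'pear\napple\npear\n' > "$tmp"
LC_ALL=C sort -u -o "$tmp" "$tmp"
cat "$tmp"                          # apple, pear
rm -f "$tmp"

# -z: records end in NUL instead of newline; tr makes them visible.
printf 'b\000a\000' | LC_ALL=C sort -z | tr '\000' '\n'   # a, b
@end example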
1949
1950 In addition, when GNU @code{sort} is invoked with exactly one argument,
1951 options @samp{--help} and @samp{--version} are recognized.  @xref{Common
1952 options}.
1953
1954 Historical (BSD and System V) implementations of @code{sort} have
1955 differed in their interpretation of some options, particularly
1956 @samp{-b}, @samp{-f}, and @samp{-n}.  GNU @code{sort} follows the @sc{POSIX}
1957 behavior, which is usually (but not always!) like the System V behavior.
1958 According to @sc{POSIX}, @samp{-n} no longer implies @samp{-b}.  For
1959 consistency, @samp{-M} has been changed in the same way.  This may
1960 affect the meaning of character positions in field specifications in
1961 obscure cases.  The only fix is to add an explicit @samp{-b}.
1962
1963 A position in a sort field specified with the @samp{-k} or @samp{+}
1964 option has the form @samp{@var{f}.@var{c}}, where @var{f} is the number
1965 of the field to use and @var{c} is the number of the first character
1966 from the beginning of the field (for @samp{+@var{pos}}) or from the end
1967 of the previous field (for @samp{-@var{pos}}).  If the @samp{.@var{c}}
1968 is omitted, it is taken to be the first character in the field.  If the
1969 @samp{-b} option was specified, the @samp{.@var{c}} part of a field
1970 specification is counted from the first nonblank character of the field
1971 (for @samp{+@var{pos}}) or from the first nonblank character following
1972 the previous field (for @samp{-@var{pos}}).
1973
1974 A sort key option may also have any of the option letters @samp{Mbdfinr}
1975 appended to it, in which case the global ordering options are not used
1976 for that particular field.  The @samp{-b} option may be independently
1977 attached to either or both of the @samp{+@var{pos}} and
1978 @samp{-@var{pos}} parts of a field specification, and if it is inherited
1979 from the global options it will be attached to both.
1980 Keys may span multiple fields.
1981
1982 Here are some examples to illustrate various combinations of options.
1983 In them, the @sc{POSIX} @samp{-k} option is used to specify sort keys rather
1984 than the obsolete @samp{+@var{pos1}-@var{pos2}} syntax.
1985
1986 @itemize @bullet
1987
1988 @item
1989 Sort in descending (reverse) numeric order.
1990
1991 @example
1992 sort -nr
1993 @end example
1994 @item
1995 Sort alphabetically, omitting the first and second fields.
1996 This uses a single key composed of the characters beginning
1997 at the start of field three and extending to the end of each line.
1998
1999 @example
2000 sort -k3
2001 @end example
2002
2003 @item
2004 Sort numerically on the second field and resolve ties by sorting
2005 alphabetically on the third and fourth characters of field five.
2006 Use @samp{:} as the field delimiter.
2007
2008 @example
2009 sort -t : -k 2,2n -k 5.3,5.4
2010 @end example
2011
2012 Note that if you had written @samp{-k 2} instead of @samp{-k 2,2},
2013 @code{sort} would have used all characters beginning in the second field
2014 and extending to the end of the line as the primary @emph{numeric}
2015 key.  For the large majority of applications, treating keys spanning
2016 more than one field as numeric will not do what you expect.
2017
2018 Also note that the @samp{n} modifier was applied to the field-end
2019 specifier for the first key.  It would have been equivalent to
2020 specify @samp{-k 2n,2} or @samp{-k 2n,2n}.  All modifiers except
2021 @samp{b} apply to the associated @emph{field}, regardless of whether
2022 the modifier character is attached to the field-start and/or the
2023 field-end part of the key specifier.
2024
2025 @item
2026 Sort the password file on the fifth field and ignore any
2027 leading white space.  Sort lines with equal values in field five
2028 on the numeric user ID in field three.
2029
2030 @example
2031 sort -t : -k 5b,5 -k 3,3n /etc/passwd
2032 @end example
2033
2034 An alternative is to use the global numeric modifier @samp{-n}.
2035
2036 @example
2037 sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
2038 @end example
2039
2040 Finally, to ignore both leading and trailing white space, you
2041 could have applied the @samp{b} modifier to the field-end specifier
2042 for the first key,
2043
2044 @example
2045 sort -t : -n -k 5b,5b -k 3,3 /etc/passwd
2046 @end example
2047
2048 or used the global @samp{-b} modifier instead of @samp{-n}
2049 and an explicit @samp{n} with the second key specifier.
2050
2051 @example
2052 sort -t : -b -k 5,5 -k 3,3n /etc/passwd
2053 @end example
2054
2055 @item
2056 Generate a tags file in case-insensitive sorted order.
2057 @example
2058 find src -type f -print0 | sort -t / -z -f | xargs -0 etags --append
2059 @end example
2060
2061 The use of @samp{-print0}, @samp{-z}, and @samp{-0} in this case means
2062 that pathnames that contain Line Feed characters will not get broken up
2063 by the sort operation.
2064
2065 @end itemize
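
To see key selection at work on concrete data, here is a small
self-contained run.  The three colon-separated records are invented
for this sketch, and @code{LC_ALL=C} is assumed so that ordering is
byte-based.

@example
# Primary key: field 2, compared numerically.
# Tie-breaker: field 3, compared as text.
printf 'ann:30:nyc\nbob:4:la\ncat:30:abq\n' |
  LC_ALL=C sort -t : -k 2,2n -k 3,3
# bob:4:la
# cat:30:abq
# ann:30:nyc
@end example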
2066
2067
2068 @node uniq invocation
2069 @section @code{uniq}: Uniqify files
2070
2071 @pindex uniq
2072 @cindex uniqify files
2073
2074 @code{uniq} writes the unique lines in the given @var{input}, or
2075 standard input if nothing is given or for an @var{input} name of
2076 @samp{-}.  Synopsis:
2077
2078 @example
2079 uniq [@var{option}]@dots{} [@var{input} [@var{output}]]
2080 @end example
2081
2082 By default, @code{uniq} prints the unique lines in a sorted file, i.e.,
2083 discards all but one of identical successive lines.  Optionally, it can
2084 instead show only lines that appear exactly once, or lines that appear
2085 more than once.
2086
2087 The input must be sorted.  If your input is not sorted, perhaps you want
2088 to use @code{sort -u}.
2089
2090 If no @var{output} file is specified, @code{uniq} writes to standard
2091 output.
2092
2093 The program accepts the following options.  Also see @ref{Common options}.
2094
2095 @table @samp
2096
2097 @item -@var{n}
2098 @itemx -f @var{n}
2099 @itemx --skip-fields=@var{n}
2100 @opindex -@var{n}
2101 @opindex -f
2102 @opindex --skip-fields
2103 Skip @var{n} fields on each line before checking for uniqueness.  Fields
2104 are sequences of non-space non-tab characters that are separated from
2105 each other by at least one space or tab.
2106
2107 @item +@var{n}
2108 @itemx -s @var{n}
2109 @itemx --skip-chars=@var{n}
2110 @opindex +@var{n}
2111 @opindex -s
2112 @opindex --skip-chars
2113 Skip @var{n} characters before checking for uniqueness.  If you use both
2114 the field and character skipping options, fields are skipped over first.
2115
2116 @item -c
2117 @itemx --count
2118 @opindex -c
2119 @opindex --count
2120 Print the number of times each line occurred along with the line.
2121
2122 @item -i
2123 @itemx --ignore-case
2124 @opindex -i
2125 @opindex --ignore-case
2126 Ignore differences in case when comparing lines.
2127
2128 @item -d
2129 @itemx --repeated
2130 @opindex -d
2131 @opindex --repeated
2132 @cindex duplicate lines, outputting
2133 Print only duplicate lines.
2134
2135 @item -u
2136 @itemx --unique
2137 @opindex -u
2138 @opindex --unique
2139 @cindex unique lines, outputting
2140 Print only unique lines.
2141
2142 @item -w @var{n}
2143 @itemx --check-chars=@var{n}
2144 @opindex -w
2145 @opindex --check-chars
2146 Compare @var{n} characters on each line (after skipping any specified
2147 fields and characters).  By default the entire rest of the lines are
2148 compared.
2149
2150 @end table
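
The options can be compared side by side on one small sorted input.
The sample lines are made up for illustration.

@example
# Lines that occur more than once: a, c
printf 'a\na\nb\nc\nc\nc\n' | uniq -d

# Lines that occur exactly once: b
printf 'a\na\nb\nc\nc\nc\n' | uniq -u

# Each distinct line preceded by its count: 2 a, 1 b, 3 c
printf 'a\na\nb\nc\nc\nc\n' | uniq -c

# Skip the first field before comparing: prints "x 1" and "z 2"
printf 'x 1\ny 1\nz 2\n' | uniq -f 1
@end example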
2151
2152
2153 @node comm invocation
2154 @section @code{comm}: Compare two sorted files line by line
2155
2156 @pindex comm
2157 @cindex line-by-line comparison
2158 @cindex comparing sorted files
2159
2160 @code{comm} writes to standard output lines that are common, and lines
2161 that are unique, to two input files; a file name of @samp{-} means
2162 standard input.  Synopsis:
2163
2164 @example
2165 comm [@var{option}]@dots{} @var{file1} @var{file2}
2166 @end example
2167
2168 The input files must be sorted before @code{comm} can be used.
2169
2170 @cindex differing lines
2171 @cindex common lines
2172 With no options, @code{comm} produces three-column output.  Column one
2173 contains lines unique to @var{file1}, column two contains lines unique
2174 to @var{file2}, and column three contains lines common to both files.
2175 Columns are separated by @key{TAB}.
2176 @c FIXME: when there's an option to supply an alternative separator
2177 @c string, append `by default' to the above sentence.
2178
2179 @opindex -1
2180 @opindex -2
2181 @opindex -3
2182 The options @samp{-1}, @samp{-2}, and @samp{-3} suppress printing of
2183 the corresponding columns.  Also see @ref{Common options}.
2184
2185 Unlike some other comparison utilities, @code{comm} has an exit
2186 status that does not depend on the result of the comparison.
2187 Upon normal completion @code{comm} produces an exit code of zero.
2188 If there is an error it exits with nonzero status.
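
A minimal sketch of the column-suppressing options follows.  The file
names @file{one} and @file{two} and their contents are invented,
@code{mktemp} is assumed to be available, and @code{LC_ALL=C} is assumed
so that the expected ordering is byte-based.

@example
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/one"
printf 'b\nc\nd\n' > "$dir/two"

# Suppress columns 1 and 2: only lines common to both files (b, c).
LC_ALL=C comm -12 "$dir/one" "$dir/two"

# Suppress columns 2 and 3: only lines unique to the first file (a).
LC_ALL=C comm -23 "$dir/one" "$dir/two"

rm -r "$dir"
@end example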
2189
2190
2191 @node Operating on fields within a line
2192 @chapter Operating on fields within a line
2193
2194 @menu
2195 * cut invocation::              Print selected parts of lines.
2196 * paste invocation::            Merge lines of files.
2197 * join invocation::             Join lines on a common field.
2198 @end menu
2199
2200
2201 @node cut invocation
2202 @section @code{cut}: Print selected parts of lines
2203
2204 @pindex cut
2205 @code{cut} writes to standard output selected parts of each line of each
2206 input file, or standard input if no files are given or for a file name of
2207 @samp{-}.  Synopsis:
2208
2209 @example
2210 cut [@var{option}]@dots{} [@var{file}]@dots{}
2211 @end example
2212
2213 In the table which follows, the @var{byte-list}, @var{character-list},
2214 and @var{field-list} are one or more numbers or ranges (two numbers
2215 separated by a dash) separated by commas.  Bytes, characters, and
2216 fields are numbered starting at 1.  Incomplete ranges may be
2217 given: @samp{-@var{m}} means @samp{1-@var{m}}; @samp{@var{n}-} means
2218 @samp{@var{n}} through end of line or last field.
2219
2220 The program accepts the following options.  Also see @ref{Common
2221 options}.
2222
2223 @table @samp
2224
2225 @item -b @var{byte-list}
2226 @itemx --bytes=@var{byte-list}
2227 @opindex -b
2228 @opindex --bytes
2229 Print only the bytes in positions listed in @var{byte-list}.  Tabs and
2230 backspaces are treated like any other character; they take up 1 byte.
2231
2232 @item -c @var{character-list}
2233 @itemx --characters=@var{character-list}
2234 @opindex -c
2235 @opindex --characters
2236 Print only characters in positions listed in @var{character-list}.
2237 The same as @samp{-b} for now, but internationalization will change
2238 that.  Tabs and backspaces are treated like any other character; they
2239 take up 1 character.
2240
2241 @item -f @var{field-list}
2242 @itemx --fields=@var{field-list}
2243 @opindex -f
2244 @opindex --fields
2245 Print only the fields listed in @var{field-list}.  Fields are
2246 separated by a @key{TAB} by default.
2247
2248 @item -d @var{delim}
2249 @itemx --delimiter=@var{delim}
2250 @opindex -d
2251 @opindex --delimiter
2252 For @samp{-f}, fields are separated by the first character in @var{delim}
2253 (default is @key{TAB}).
2254
2255 @item -n
2256 @opindex -n
2257 Do not split multi-byte characters (no-op for now).
2258
2259 @item -s
2260 @itemx --only-delimited
2261 @opindex -s
2262 @opindex --only-delimited
2263 For @samp{-f}, do not print lines that do not contain the field separator
2264 character.
2265
2266 @end table
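
The list syntax and the field options can be tried on inline input.
The sample strings here are invented for illustration.

@example
# Fields 1 and 3 of a colon-delimited line: a:c
printf 'a:b:c\n' | cut -d : -f 1,3

# Characters 2 through 4: ell
printf 'hello\n' | cut -c 2-4

# Open-ended range, character 4 to end of line: lo
printf 'hello\n' | cut -c 4-

# With -s, the line that has no delimiter is dropped: b
printf 'plain\na:b\n' | cut -d : -f 2 -s
@end example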
2267
2268
2269 @node paste invocation
2270 @section @code{paste}: Merge lines of files
2271
2272 @pindex paste
2273 @cindex merging files
2274
2275 @code{paste} writes to standard output lines consisting of sequentially
2276 corresponding lines of each given file, separated by @key{TAB}.
2277 Standard input is used for a file name of @samp{-} or if no input files
2278 are given.
2279
2280 Synopsis:
2281
2282 @example
2283 paste [@var{option}]@dots{} [@var{file}]@dots{}
2284 @end example
2285
2286 The program accepts the following options.  Also see @ref{Common options}.
2287
2288 @table @samp
2289
2290 @item -s
2291 @itemx --serial
2292 @opindex -s
2293 @opindex --serial
2294 Paste the lines of one file at a time rather than one line from each
2295 file.
2296
2297 @item -d @var{delim-list}
2298 @itemx --delimiters=@var{delim-list}
2299 @opindex -d
2300 @opindex --delimiters
2301 Consecutively use the characters in @var{delim-list} instead of
2302 @key{TAB} to separate merged lines.  When @var{delim-list} is
2303 exhausted, start again at its beginning.
2304
2305 @end table
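
As a brief sketch of @samp{-s} and of how @var{delim-list} is cycled
(the numbers are sample input only):

@example
# Join all lines of standard input with commas: 1,2,3,4
printf '1\n2\n3\n4\n' | paste -s -d , -

# Two delimiters are used in turn and then reused: 1,2;3,4
printf '1\n2\n3\n4\n' | paste -s -d ',;' -
@end example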
2306
2307
2308 @node join invocation
2309 @section @code{join}: Join lines on a common field
2310
2311 @pindex join
2312 @cindex common field, joining on
2313
2314 @code{join} writes to standard output a line for each pair of input
2315 lines that have identical join fields.  Synopsis:
2316
2317 @example
2318 join [@var{option}]@dots{} @var{file1} @var{file2}
2319 @end example
2320
2321 Either @var{file1} or @var{file2} (but not both) can be @samp{-},
2322 meaning standard input.  @var{file1} and @var{file2} should be already
2323 sorted in increasing order (not numerically) on the join fields; unless
2324 the @samp{-t} option is given, they should be sorted ignoring blanks at
2325 the start of the join field, as in @code{sort -b}.  If the
2326 @samp{--ignore-case} option is given, lines should be sorted without
2327 regard to the case of characters in the join field, as in @code{sort -f}.
2328
2329 The defaults are: the join field is the first field in each line;
2330 fields in the input are separated by one or more blanks, with leading
2331 blanks on the line ignored; fields in the output are separated by a
2332 space; each output line consists of the join field, the remaining
2333 fields from @var{file1}, then the remaining fields from @var{file2}.
2334
2335 The program accepts the following options.  Also see @ref{Common options}.
2336
2337 @table @samp
2338
2339 @item -a @var{file-number}
2340 @opindex -a
2341 Print a line for each unpairable line in file @var{file-number} (either
2342 @samp{1} or @samp{2}), in addition to the normal output.
2343
2344 @item -e @var{string}
2345 @opindex -e
2346 Replace those output fields that are missing in the input with
2347 @var{string}.
2348
2349 @item -i
2350 @itemx --ignore-case
2351 @opindex -i
2352 @opindex --ignore-case
2353 Ignore differences in case when comparing keys.
2354 With this option, the lines of the input files must be ordered in the same way.
2355 Use @samp{sort -f} to produce this ordering.
2356
2357 @item -1 @var{field}
2358 @itemx -j1 @var{field}
2359 @opindex -1
2360 @opindex -j1
2361 Join on field @var{field} (a positive integer) of file 1.
2362
2363 @item -2 @var{field}
2364 @itemx -j2 @var{field}
2365 @opindex -2
2366 @opindex -j2
2367 Join on field @var{field} (a positive integer) of file 2.
2368
2369 @item -j @var{field}
2370 Equivalent to @samp{-1 @var{field} -2 @var{field}}.
2371
2372 @item -o @var{field-list}@dots{}
2373 Construct each output line according to the format in @var{field-list}.
2374 Each element in @var{field-list} is either the single character @samp{0} or
2375 has the form @var{m.n} where the file number, @var{m}, is @samp{1} or
2376 @samp{2} and @var{n} is a positive field number.
2377
2378 A field specification of @samp{0} denotes the join field.
2379 In most cases, the functionality of the @samp{0} field spec
2380 may be reproduced using the explicit @var{m.n} that corresponds
2381 to the join field.  However, when printing unpairable lines
2382 (using either of the @samp{-a} or @samp{-v} options), there is no way
2383 to specify the join field using @var{m.n} in @var{field-list}
2384 if there are unpairable lines in both files.
2385 To give @code{join} that functionality, @sc{POSIX} invented the @samp{0}
2386 field specification notation.
2387
2388 The elements in @var{field-list}
2389 are separated by commas or blanks.  Multiple @var{field-list}
2390 arguments can be given after a single @samp{-o} option; the values
2391 of all lists given with @samp{-o} are concatenated together.
2392 All output lines (including those printed because of any @samp{-a} or
2393 @samp{-v} option) are subject to the specified @var{field-list}.
2394
2395 @item -t @var{char}
2396 Use character @var{char} as the input and output field separator.
2397
2398 @item -v @var{file-number}
2399 Print a line for each unpairable line in file @var{file-number}
2400 (either @samp{1} or @samp{2}), instead of the normal output.
2401
2402 @end table
2403
2404 In addition, when GNU @code{join} is invoked with exactly one argument,
2405 options @samp{--help} and @samp{--version} are recognized.  @xref{Common
2406 options}.
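
The following sketch shows a plain join and then the @samp{0} field
specification combined with @samp{-a} and @samp{-e}.  The files
@file{fruit} and @file{color} and their contents are invented for the
example, and @code{mktemp} is assumed to be available.

@example
dir=$(mktemp -d)
printf '1 apple\n2 banana\n' > "$dir/fruit"
printf '1 red\n3 green\n' > "$dir/color"

# Only key 1 appears in both files.
join "$dir/fruit" "$dir/color"
# 1 apple red

# Also print unpairable lines from both files, filling gaps with NONE.
# The 0 field spec prints the join field whichever file it came from.
join -a 1 -a 2 -e NONE -o 0,1.2,2.2 "$dir/fruit" "$dir/color"
# 1 apple red
# 2 banana NONE
# 3 NONE green

rm -r "$dir"
@end example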
2407
2408
2409 @node Operating on characters
2410 @chapter Operating on characters
2411
2412 @cindex operating on characters
2413
2414 These commands operate on individual characters.
2415
2416 @menu
2417 * tr invocation::               Translate, squeeze, and/or delete characters.
2418 * expand invocation::           Convert tabs to spaces.
2419 * unexpand invocation::         Convert spaces to tabs.
2420 @end menu
2421
2422
2423 @node tr invocation
2424 @section @code{tr}: Translate, squeeze, and/or delete characters
2425
2426 @pindex tr
2427
2428 Synopsis:
2429
2430 @example
2431 tr [@var{option}]@dots{} @var{set1} [@var{set2}]
2432 @end example
2433
2434 @code{tr} copies standard input to standard output, performing
2435 one of the following operations:
2436
2437 @itemize @bullet
2438 @item
2439 translate, and optionally squeeze repeated characters in the result,
2440 @item
2441 squeeze repeated characters,
2442 @item
2443 delete characters,
2444 @item
2445 delete characters, then squeeze repeated characters from the result.
2446 @end itemize
2447
2448 The @var{set1} and (if given) @var{set2} arguments define ordered
2449 sets of characters, referred to below as @var{set1} and @var{set2}.  These
2450 sets are the characters of the input that @code{tr} operates on.
2451 The @samp{--complement} (@samp{-c}) option replaces @var{set1} with its
2452 complement (all of the characters that are not in @var{set1}).
2453
2454 @menu
2455 * Character sets::              Specifying sets of characters.
2456 * Translating::                 Changing one character to another.
2457 * Squeezing::                   Squeezing repeats and deleting.
2458 * Warnings in tr::              Warning messages.
2459 @end menu
2460
2461
2462 @node Character sets
2463 @subsection Specifying sets of characters
2464
2465 @cindex specifying sets of characters
2466
2467 The format of the @var{set1} and @var{set2} arguments resembles
2468 the format of regular expressions; however, they are not regular
2469 expressions, only lists of characters.  Most characters simply
2470 represent themselves in these strings, but the strings can contain
2471 the shorthands listed below, for convenience.  Some of them can be
2472 used only in @var{set1} or @var{set2}, as noted below.
2473
2474 @table @asis
2475
2476 @item Backslash escapes
2477 @cindex backslash escapes
2478
2479 A backslash followed by a character not listed below causes an error
2480 message.
2481
2482 @table @samp
2483 @item \a
2484 Control-G.
2485 @item \b
2486 Control-H.
2487 @item \f
2488 Control-L.
2489 @item \n
2490 Control-J.
2491 @item \r
2492 Control-M.
2493 @item \t
2494 Control-I.
2495 @item \v
2496 Control-K.
2497 @item \@var{ooo}
2498 The character with the value given by @var{ooo}, which is 1 to 3
2499 octal digits.
2500 @item \\
2501 A backslash.
2502 @end table
2503
2504 @item Ranges
2505 @cindex ranges
2506
2507 The notation @samp{@var{m}-@var{n}} expands to all of the characters
2508 from @var{m} through @var{n}, in ascending order.  @var{m} should
2509 collate before @var{n}; if it doesn't, an error results.  As an example,
2510 @samp{0-9} is the same as @samp{0123456789}.  Although GNU @code{tr}
2511 does not support the System V syntax that uses square brackets to
2512 enclose ranges, translations specified in that format will still work as
2513 long as the brackets in @var{set1} correspond to identical brackets
2514 in @var{set2}.
2515
2516 @item Repeated characters
2517 @cindex repeated characters
2518
2519 The notation @samp{[@var{c}*@var{n}]} in @var{set2} expands to @var{n}
2520 copies of character @var{c}.  Thus, @samp{[y*6]} is the same as
2521 @samp{yyyyyy}.  The notation @samp{[@var{c}*]} in @var{set2} expands
2522 to as many copies of @var{c} as are needed to make @var{set2} as long as
2523 @var{set1}.  If @var{n} begins with @samp{0}, it is interpreted in
2524 octal, otherwise in decimal.
2525
2526 @item Character classes
2527 @cindex character classes
2528
2529 The notation @samp{[:@var{class}:]} expands to all of the characters in
2530 the (predefined) class @var{class}.  The characters expand in no
2531 particular order, except for the @code{upper} and @code{lower} classes,
2532 which expand in ascending order.  When the @samp{--delete} (@samp{-d})
2533 and @samp{--squeeze-repeats} (@samp{-s}) options are both given, any
2534 character class can be used in @var{set2}.  Otherwise, only the
2535 character classes @code{lower} and @code{upper} are accepted in
2536 @var{set2}, and then only if the corresponding character class
2537 (@code{upper} and @code{lower}, respectively) is specified in the same
2538 relative position in @var{set1}.  Doing this specifies case conversion.
2539 The class names are given below; an error results when an invalid class
2540 name is given.
2541
2542 @table @code
2543 @item alnum
2544 @opindex alnum
2545 Letters and digits.
2546 @item alpha
2547 @opindex alpha
2548 Letters.
2549 @item blank
2550 @opindex blank
2551 Horizontal whitespace.
2552 @item cntrl
2553 @opindex cntrl
2554 Control characters.
2555 @item digit
2556 @opindex digit
2557 Digits.
2558 @item graph
2559 @opindex graph
2560 Printable characters, not including space.
2561 @item lower
2562 @opindex lower
2563 Lowercase letters.
2564 @item print
2565 @opindex print
2566 Printable characters, including space.
2567 @item punct
2568 @opindex punct
2569 Punctuation characters.
2570 @item space
2571 @opindex space
2572 Horizontal or vertical whitespace.
2573 @item upper
2574 @opindex upper
2575 Uppercase letters.
2576 @item xdigit
2577 @opindex xdigit
2578 Hexadecimal digits.
2579 @end table
2580
2581 @item Equivalence classes
2582 @cindex equivalence classes
2583
2584 The syntax @samp{[=@var{c}=]} expands to all of the characters that are
2585 equivalent to @var{c}, in no particular order.  Equivalence classes are
2586 a relatively recent invention intended to support non-English alphabets.
2587 But there seems to be no standard way to define them or determine their
2588 contents.  Therefore, they are not fully implemented in GNU @code{tr};
2589 each character's equivalence class consists only of that character,
2590 which is of no particular use.
2591
2592 @end table
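
A few of these notations can be tried directly.  The input strings are
invented for this sketch, and @code{LC_ALL=C} is assumed so that ranges
and classes cover plain ASCII only.

@example
# Range: HELLO 123
printf 'hello 123\n' | LC_ALL=C tr 'a-z' 'A-Z'

# Character classes in matching positions (case conversion): HELLO 123
printf 'hello 123\n' | LC_ALL=C tr '[:lower:]' '[:upper:]'

# Repeat notation in set2, three copies of x: xxx
printf 'abc\n' | LC_ALL=C tr 'abc' '[x*3]'

# Backslash escape, tab to space: a b
printf 'a\tb\n' | LC_ALL=C tr '\t' ' '
@end example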
2593
2594
2595 @node Translating
2596 @subsection Translating
2597
2598 @cindex translating characters
2599
2600 @code{tr} performs translation when @var{set1} and @var{set2} are
2601 both given and the @samp{--delete} (@samp{-d}) option is not given.
2602 @code{tr} translates each character of its input that is in @var{set1}
2603 to the corresponding character in @var{set2}.  Characters not in
2604 @var{set1} are passed through unchanged.  When a character appears more
2605 than once in @var{set1} and the corresponding characters in @var{set2}
2606 are not all the same, only the final one is used.  For example, these
2607 two commands are equivalent:
2608
2609 @example
2610 tr aaa xyz
2611 tr a z
2612 @end example
2613
2614 A common use of @code{tr} is to convert lowercase characters to
2615 uppercase.  This can be done in many ways.  Here are three of them:
2616
2617 @example
2618 tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
2619 tr a-z A-Z
2620 tr '[:lower:]' '[:upper:]'
2621 @end example
2622
2623 When @code{tr} is performing translation, @var{set1} and @var{set2}
2624 typically have the same length.  If @var{set1} is shorter than
2625 @var{set2}, the extra characters at the end of @var{set2} are ignored.
2626
2627 On the other hand, making @var{set1} longer than @var{set2} is not
2628 portable; @sc{POSIX.2} says that the result is undefined.  In this situation,
2629 BSD @code{tr} pads @var{set2} to the length of @var{set1} by repeating
2630 the last character of @var{set2} as many times as necessary.  System V
2631 @code{tr} truncates @var{set1} to the length of @var{set2}.
2632
2633 By default, GNU @code{tr} handles this case like BSD @code{tr}.  When
2634 the @samp{--truncate-set1} (@samp{-t}) option is given, GNU @code{tr}
2635 handles this case like the System V @code{tr} instead.  This option is
2636 ignored for operations other than translation.
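
As a small illustration of the two behaviors, here is a sketch that
assumes GNU @code{tr}; the input string and the variable names are made
up for this example:

```shell
# set1 (abcde) is longer than set2 (xy).
# Default, BSD style: set2 is padded with its last character, as if it were xyyyy.
padded=$(printf 'abcde\n' | tr abcde xy)
# With -t, System V style: set1 is truncated to ab, so c, d and e pass through.
truncated=$(printf 'abcde\n' | tr -t abcde xy)
printf '%s\n%s\n' "$padded" "$truncated"
```

The first line printed should be @samp{xyyyy} and the second
@samp{xycde}.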
2637
2638 Acting like System V @code{tr} in this case breaks the relatively common
2639 BSD idiom:
2640
2641 @example
2642 tr -cs A-Za-z0-9 '\012'
2643 @end example
2644
2645 @noindent
2646 because it converts only zero bytes (the first element in the
2647 complement of @var{set1}), rather than all non-alphanumerics, to
2648 newlines.
2649
2650
2651 @node Squeezing
2652 @subsection Squeezing repeats and deleting
2653
2654 @cindex squeezing repeat characters
2655 @cindex deleting characters
2656
2657 When given just the @samp{--delete} (@samp{-d}) option, @code{tr}
2658 removes any input characters that are in @var{set1}.
2659
2660 When given just the @samp{--squeeze-repeats} (@samp{-s}) option,
2661 @code{tr} replaces each input sequence of a repeated character that
2662 is in @var{set1} with a single occurrence of that character.
2663
2664 When given both @samp{--delete} and @samp{--squeeze-repeats}, @code{tr}
2665 first performs any deletions using @var{set1}, then squeezes repeats
2666 from any remaining characters using @var{set2}.
2667
2668 The @samp{--squeeze-repeats} option may also be used when translating,
2669 in which case @code{tr} first performs translation, then squeezes
2670 repeats from any remaining characters using @var{set2}.
2671
2672 Here are some examples to illustrate various combinations of options:
2673
2674 @itemize @bullet
2675
2676 @item
2677 Remove all zero bytes:
2678
2679 @example
2680 tr -d '\000'
2681 @end example
2682
2683 @item
2684 Put all words on lines by themselves.  This converts all
2685 non-alphanumeric characters to newlines, then squeezes each string
2686 of repeated newlines into a single newline:
2687
2688 @example
2689 tr -cs '[a-zA-Z0-9]' '[\n*]'
2690 @end example
2691
2692 @item
2693 Convert each sequence of repeated newlines to a single newline:
2694
2695 @example
2696 tr -s '\n'
2697 @end example
2698
2699 @end itemize
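
The option combinations described above can also be tried on short
strings.  The following is only a sketch; the inputs and variable names
are invented for illustration:

```shell
# Delete every b, then squeeze runs of c (the squeeze step uses set2).
deleted_squeezed=$(printf 'aabbccdd\n' | tr -ds b c)
# Translate a and b to x, then squeeze runs of x (again using set2).
translated_squeezed=$(printf 'aabbcc\n' | tr -s ab xx)
# Squeeze only: each run of spaces collapses to a single space.
squeezed=$(printf 'a   b  c\n' | tr -s ' ')
printf '%s\n' "$deleted_squeezed" "$translated_squeezed" "$squeezed"
```

This should print @samp{aacdd}, @samp{xcc} and @samp{a b c}, one per
line.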
2700
2701
2702 @node Warnings in tr
2703 @subsection Warning messages
2704
2705 @vindex POSIXLY_CORRECT
2706 Setting the environment variable @code{POSIXLY_CORRECT} turns off the
2707 following warning and error messages, for strict compliance with
2708 @sc{POSIX.2}.  Otherwise, the following diagnostics are issued:
2709
2710 @enumerate
2711
2712 @item
2713 When the @samp{--delete} option is given but @samp{--squeeze-repeats}
2714 is not, and @var{set2} is given, GNU @code{tr} by default prints
2715 a usage message and exits, because @var{set2} would not be used.
2716 The @sc{POSIX} specification says that @var{set2} must be ignored in
2717 this case. Silently ignoring arguments is a bad idea.
2718
2719 @item
2720 When an ambiguous octal escape is given.  For example, @samp{\400}
2721 is actually @samp{\40} followed by the digit @samp{0}, because the
2722 value 400 octal does not fit into a single byte.
2723
2724 @end enumerate
2725
2726 GNU @code{tr} does not provide complete BSD or System V compatibility.
2727 For example, it is impossible to disable interpretation of the @sc{POSIX}
2728 constructs @samp{[:alpha:]}, @samp{[=c=]}, and @samp{[c*10]}.  Also, GNU
2729 @code{tr} does not delete zero bytes automatically, unlike traditional
2730 Unix versions, which provide no way to preserve zero bytes.
2731
2732
2733 @node expand invocation
2734 @section @code{expand}: Convert tabs to spaces
2735
2736 @pindex expand
2737 @cindex tabs to spaces, converting
2738 @cindex converting tabs to spaces
2739
2740 @code{expand} writes the contents of each given @var{file}, or standard
2741 input if none are given or for a @var{file} of @samp{-}, to standard
2742 output, with tab characters converted to the appropriate number of
2743 spaces.  Synopsis:
2744
2745 @example
2746 expand [@var{option}]@dots{} [@var{file}]@dots{}
2747 @end example
2748
2749 By default, @code{expand} converts all tabs to spaces.  It preserves
2750 backspace characters in the output; they decrement the column count for
2751 tab calculations.  The default action is equivalent to @samp{-8} (set
2752 tabs every 8 columns).
2753
2754 The program accepts the following options.  Also see @ref{Common options}.
2755
2756 @table @samp
2757
2758 @item -@var{tab1}[,@var{tab2}]@dots{}
2759 @itemx -t @var{tab1}[,@var{tab2}]@dots{}
2760 @itemx --tabs=@var{tab1}[,@var{tab2}]@dots{}
2761 @opindex -@var{tab}
2762 @opindex -t
2763 @opindex --tabs
2764 @cindex tabstops, setting
2765 If only one tab stop is given, set the tabs @var{tab1} spaces apart
2766 (default is 8).  Otherwise, set the tabs at columns @var{tab1},
2767 @var{tab2}, @dots{} (numbered from 0), and replace any tabs beyond the
2768 last tabstop given with single spaces.  If the tabstops are specified
2769 with the @samp{-t} or @samp{--tabs} option, they can be separated by
2770 blanks as well as by commas.
2771
2772 @item -i
2773 @itemx --initial
2774 @opindex -i
2775 @opindex --initial
2776 @cindex initial tabs, converting
2777 Only convert initial tabs (those that precede every non-space, non-tab
2778 character) on each line to spaces.
2779
2780 @end table
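
A brief sketch of these options in use follows.  The sample lines and
variable names are assumptions made for the example:

```shell
# One tab after "a": with the default stops, "b" lands in column 8.
default_stops=$(printf 'a\tb\n' | expand)
# Tab stops every 4 columns instead, so "b" lands in column 4.
every_four=$(printf 'a\tb\n' | expand -t 4)
# -i converts only the leading tab; the tab after "a" is kept as is.
initial_only=$(printf '\ta\tb\n' | expand -i -t 4)
printf '[%s]\n' "$default_stops" "$every_four" "$initial_only"
```

The brackets in the output only make the spacing easier to see; piping
the result through @samp{od -c} shows which tabs remain.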
2781
2782
2783 @node unexpand invocation
2784 @section @code{unexpand}: Convert spaces to tabs
2785
2786 @pindex unexpand
2787
2788 @code{unexpand} writes the contents of each given @var{file}, or
2789 standard input if none are given or for a @var{file} of @samp{-}, to
2790 standard output, with strings of two or more space or tab characters
2791 converted to as many tabs as possible followed by as many spaces as are
2792 needed.  Synopsis:
2793
2794 @example
2795 unexpand [@var{option}]@dots{} [@var{file}]@dots{}
2796 @end example
2797
2798 By default, @code{unexpand} converts only initial spaces and tabs (those
2799 that precede every non-space, non-tab character) on each line.  It
2800 preserves backspace characters in the output; they decrement the column
2801 count for tab calculations.  By default, tabs are set at every 8th
2802 column.
2803
2804 The program accepts the following options.  Also see @ref{Common options}.
2805
2806 @table @samp
2807
2808 @item -@var{tab1}[,@var{tab2}]@dots{}
2809 @itemx -t @var{tab1}[,@var{tab2}]@dots{}
2810 @itemx --tabs=@var{tab1}[,@var{tab2}]@dots{}
2811 @opindex -@var{tab}
2812 @opindex -t
2813 @opindex --tabs
2814 If only one tab stop is given, set the tabs @var{tab1} spaces apart
2815 instead of the default 8.  Otherwise, set the tabs at columns
2816 @var{tab1}, @var{tab2}, @dots{} (numbered from 0), and leave spaces and
2817 tabs beyond the tabstops given unchanged.  If the tabstops are specified
2818 with the @samp{-t} or @samp{--tabs} option, they can be separated by
2819 blanks as well as by commas.  This option implies the @samp{-a} option.
2820
2821 @item -a
2822 @itemx --all
2823 @opindex -a
2824 @opindex --all
2825 Convert all strings of two or more spaces or tabs, not just initial
2826 ones, to tabs.
2827
2828 @end table
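
The difference between the default behavior and @samp{-a} can be
sketched as follows, assuming GNU @code{unexpand} and the default tab
stops; the sample lines and variable names are invented:

```shell
# Eight leading spaces reach the first tab stop, so they become one tab.
leading=$(printf '        x\n' | unexpand)
# By default the seven spaces after "a" are not initial, so they are kept.
interior_default=$(printf 'a       b\n' | unexpand)
# With -a they are converted too: "a" is in column 0 and "b" in column 8.
interior_all=$(printf 'a       b\n' | unexpand -a)
printf '[%s]\n' "$leading" "$interior_default" "$interior_all"
```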
2829
2830
2831 @c              What's GNU?
2832 @c              Arnold Robbins
2833 @node Opening the software toolbox
2834 @chapter Opening the software toolbox
2835
2836 This chapter originally appeared in @cite{Linux Journal}, volume 1,
2837 number 2, in the @cite{What's GNU?} column. It was written by Arnold
2838 Robbins.
2839
2840 @menu
2841 * Toolbox introduction::
2842 * I/O redirection::
2843 * The who command::
2844 * The cut command::
2845 * The sort command::
2846 * The uniq command::
2847 * Putting the tools together::
2848 @end menu
2849
2850
2851 @node Toolbox introduction
2852 @unnumberedsec Toolbox introduction
2853
2854 This month's column is only peripherally related to the GNU Project, in
2855 that it describes a number of the GNU tools on your Linux system and how they
2856 might be used.  What it's really about is the ``Software Tools'' philosophy
2857 of program development and usage.
2858
2859 The software tools philosophy was an important and integral concept
2860 in the initial design and development of Unix (of which Linux and GNU are
2861 essentially clones).  Unfortunately, in the modern-day press of
2862 Internetworking and flashy GUIs, it seems to have fallen by the
2863 wayside.  This is a shame, since it provides a powerful mental model
2864 for solving many kinds of problems.
2865
2866 Many people carry a Swiss Army knife around in their pants pockets (or
2867 purse).  A Swiss Army knife is a handy tool to have: it has several knife
2868 blades, a screwdriver, tweezers, toothpick, nail file, corkscrew, and perhaps
2869 a number of other things on it.  For the everyday, small miscellaneous jobs
2870 where you need a simple, general purpose tool, it's just the thing.
2871
2872 On the other hand, an experienced carpenter doesn't build a house using
2873 a Swiss Army knife.  Instead, he has a toolbox chock full of specialized
2874 tools---a saw, a hammer, a screwdriver, a plane, and so on.  And he knows
2875 exactly when and where to use each tool; you won't catch him hammering nails
2876 with the handle of his screwdriver.
2877
2878 The Unix developers at Bell Labs were all professional programmers and trained
2879 computer scientists.  They had found that while a one-size-fits-all program
2880 might appeal to a user because there's only one program to use, in practice
2881 such programs are
2882
2883 @enumerate a
2884 @item
2885 difficult to write,
2886
2887 @item
2888 difficult to maintain and
2889 debug, and
2890
2891 @item
2892 difficult to extend to meet new situations.
2893 @end enumerate
2894
2895 Instead, they felt that programs should be specialized tools.  In short, each
2896 program ``should do one thing well.''  No more and no less.  Such programs are
2897 simpler to design, write, and get right---they only do one thing.
2898
2899 Furthermore, they found that with the right machinery for hooking programs
2900 together, the whole was greater than the sum of the parts.  By combining
2901 several special purpose programs, you could accomplish a specific task
2902 that none of the programs was designed for, and accomplish it much more
2903 quickly and easily than if you had to write a special purpose program.
2904 We will see some (classic) examples of this further on in the column.
2905 (An important additional point was that, if necessary, you should take a
2906 detour and first build any software tools you need, if you don't already
2907 have something appropriate in the toolbox.)
2908
2909 @node I/O redirection
2910 @unnumberedsec I/O redirection
2911
2912 Hopefully, you are familiar with the basics of I/O redirection in the
2913 shell, in particular the concepts of ``standard input,'' ``standard output,''
2914 and ``standard error''.  Briefly, ``standard input'' is a data source, where
2915 data comes from.  A program should not need to either know or care if the
2916 data source is a disk file, a keyboard, a magnetic tape, or even a punched
2917 card reader.  Similarly, ``standard output'' is a data sink, where data goes
2918 to.  The program should neither know nor care where this might be.
2919 Programs that only read their standard input, do something to the data,
2920 and then send it on, are called ``filters'', by analogy to filters in a
2921 water pipeline.
2922
2923 With the Unix shell, it's very easy to set up data pipelines:
2924
2925 @example
2926 program_to_create_data | filter1 | .... | filterN > final.pretty.data
2927 @end example
2928
2929 We start out by creating the raw data; each filter applies some successive
2930 transformation to the data, until by the time it comes out of the pipeline,
2931 it is in the desired form.
2932
2933 This is fine and good for standard input and standard output.  Where does the
2934 standard error come into play?  Well, think about @code{filter1} in
2935 the pipeline above.  What happens if it encounters an error in the data it
2936 sees?  If it writes an error message to standard output, it will just
2937 disappear down the pipeline into @code{filter2}'s input, and the
2938 user will probably never see it.  So programs need a place where they can send
2939 error messages so that the user will notice them.  This is standard error,
2940 and it is usually connected to your console or window, even if you have
2941 redirected standard output of your program away from your screen.
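
To make this concrete, here is a small sketch.  The @code{filter1}
function below is a made-up stand-in for a real filter, and the rule
that lines beginning with @samp{bad} are errors is purely an assumption
for the demonstration:

```shell
# Hypothetical filter: passes good lines on, reports bad ones on standard error.
filter1() (
    while IFS= read -r line; do
        case $line in
            bad*) echo "filter1: cannot handle: $line" >&2 ;;
            *)    echo "$line" ;;
        esac
    done
)

# Only standard output travels down the pipe; the diagnostic goes to the screen.
piped=$(printf 'one\nbad data\ntwo\n' | filter1 | tr a-z A-Z)
# Capture standard error alone to see what the user would have been shown.
diagnostic=$(printf 'one\nbad data\ntwo\n' | filter1 2>&1 >/dev/null)
printf '%s\n' "$piped" "$diagnostic"
```

The second filter in the pipeline never sees the complaint about
@samp{bad data}; it receives only the two good lines.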
2942
2943 For filter programs to work together, the format of the data has to be
2944 agreed upon.  The most straightforward and easiest format to use is simply
2945 lines of text.  Unix data files are generally just streams of bytes, with
2946 lines delimited by the @sc{ASCII} @sc{LF} (Line Feed) character,
2947 conventionally called a ``newline'' in the Unix literature. (This is
2948 @code{'\n'} if you're a C programmer.)  This is the format used by all
2949 the traditional filtering programs.  (Many earlier operating systems
2950 had elaborate facilities and special purpose programs for managing
2951 binary data.  Unix has always shied away from such things, under the
2952 philosophy that it's easiest to simply be able to view and edit your
2953 data with a text editor.)
2954
2955 OK, enough introduction. Let's take a look at some of the tools, and then
2956 we'll see how to hook them together in interesting ways.   In the following
2957 discussion, we will only present those command line options that interest
2958 us.  As you should always do, double check your system documentation
2959 for the full story.
2960
2961 @node The who command
2962 @unnumberedsec The @code{who} command
2963
2964 The first program is the @code{who} command.  By itself, it generates a
2965 list of the users who are currently logged in.  Although I'm writing
2966 this on a single-user system, we'll pretend that several people are
2967 logged in:
2968
2969 @example
2970 $ who
2971 arnold   console Jan 22 19:57
2972 miriam   ttyp0   Jan 23 14:19(:0.0)
2973 bill     ttyp1   Jan 21 09:32(:0.0)
2974 arnold   ttyp2   Jan 23 20:48(:0.0)
2975 @end example
2976
2977 Here, the @samp{$} is the usual shell prompt, at which I typed @code{who}.
2978 There are three people logged in, and I am logged in twice.  On traditional
2979 Unix systems, user names are never more than eight characters long.  This
2980 little bit of trivia will be useful later.  The output of @code{who} is nice,
2981 but the data is not all that exciting.
2982
2983 @node The cut command
2984 @unnumberedsec The @code{cut} command
2985
2986 The next program we'll look at is the @code{cut} command.  This program
2987 cuts out columns or fields of input data.  For example, we can tell it
2988 to print just the login name and full name from the @file{/etc/passwd}
2989 file.  The @file{/etc/passwd} file has seven fields, separated by
2990 colons:
2991
2992 @example
2993 arnold:xyzzy:2076:10:Arnold D. Robbins:/home/arnold:/bin/ksh
2994 @end example
2995
2996 To get the first and fifth fields, we would use @code{cut} like this:
2997
2998 @example
2999 $ cut -d: -f1,5 /etc/passwd
3000 root:Operator
3001 @dots{}
3002 arnold:Arnold D. Robbins
3003 miriam:Miriam A. Robbins
3004 @dots{}
3005 @end example
3006
3007 With the @samp{-c} option, @code{cut} will cut out specific characters
3008 (i.e., columns) in the input lines.  This command looks like it might be
3009 useful for data filtering.
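
Here is a quick sketch of both modes, run on the sample password line
shown earlier instead of on a real @file{/etc/passwd}; the variable
names are invented for the example:

```shell
# The sample line from above (illustrative data, not a real account).
entry='arnold:xyzzy:2076:10:Arnold D. Robbins:/home/arnold:/bin/ksh'
# Fields 1 and 5, with colon as the delimiter.
fields=$(printf '%s\n' "$entry" | cut -d: -f1,5)
# Characters (columns) 1 through 6 of the same line.
columns=$(printf '%s\n' "$entry" | cut -c1-6)
printf '%s\n%s\n' "$fields" "$columns"
```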
3010
3011
3012 @node The sort command
3013 @unnumberedsec The @code{sort} command
3014
3015 Next we'll look at the @code{sort} command.  This is one of the most
3016 powerful commands on a Unix-style system; one that you will often find
3017 yourself using when setting up fancy data plumbing. The @code{sort}
3018 command reads and sorts each file named on the command line.  It then
3019 merges the sorted data and writes it to standard output.  It will read
3020 standard input if no files are given on the command line (thus
3021 making it into a filter).  The sort is based on the machine collating
3022 sequence (@sc{ASCII}) or on user-supplied ordering criteria.
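
A short sketch of both ways of running @code{sort} follows.  The file
names and contents are made up, and setting @code{LC_ALL=C} is an
assumption added here so that the plain @sc{ASCII} ordering is used
whatever the local settings are:

```shell
# Two throwaway input files in a temporary directory (names are arbitrary).
dir=$(mktemp -d)
printf 'pear\nApple\n' > "$dir/f1"
printf 'banana\nCherry\n' > "$dir/f2"
# Both files are sorted and merged; in ASCII, upper case sorts before lower case.
merged=$(LC_ALL=C sort "$dir/f1" "$dir/f2")
# With no file arguments, sort acts as a filter on standard input.
filtered=$(printf 'b\nc\na\n' | LC_ALL=C sort)
rm -r "$dir"
printf '%s\n' "$merged" "$filtered"
```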
3023
3024
3025 @node The uniq command
3026 @unnumberedsec The @code{uniq} command
3027
3028 Finally (at least for now), we'll look at the @code{uniq} program.  When
3029 sorting data, you will often end up with duplicate lines, lines that
3030 are identical.  Usually, all you need is one instance of each line.
3031 This is where @code{uniq} comes in. The @code{uniq} program reads its
3032 standard input, which it expects to be sorted.  It only prints out one
3033 copy of each duplicated line.  It does have several options.  Later on,
3034 we'll use the @samp{-c} option, which prints each unique line, preceded
3035 by a count of the number of times that line occurred in the input.
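
To see why the input should be sorted first, consider this small
sketch; the four input lines are invented for the purpose:

```shell
# Unsorted input: the two runs of "a" are not adjacent, so uniq keeps both.
unsorted=$(printf 'a\na\nb\na\n' | uniq)
# Sorting first brings the duplicates together.
sorted=$(printf 'a\na\nb\na\n' | sort | uniq)
# -c prefixes each line with its count; the amount of padding may vary.
counted=$(printf 'a\na\nb\na\n' | sort | uniq -c)
printf '%s\n' "$unsorted" "--" "$sorted" "--" "$counted"
```

Without the @code{sort}, three lines come out (@samp{a}, @samp{b},
@samp{a}); with it, only two.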
3036
3037
3038 @node Putting the tools together
3039 @unnumberedsec Putting the tools together
3040
3041 Now, let's suppose this is a large BBS system with dozens of users
3042 logged in.  The management wants the SysOp to write a program that will
3043 generate a sorted list of logged in users.  Furthermore, even if a user
3044 is logged in multiple times, his or her name should only show up in the
3045 output once.
3046
3047 The SysOp could sit down with the system documentation and write a C
3048 program that did this. It would take perhaps a couple of hundred lines
3049 of code and about two hours to write it, test it, and debug it.
3050 However, knowing the software toolbox, the SysOp can instead start out
3051 by generating just a list of logged on users:
3052
3053 @example
3054 $ who | cut -c1-8
3055 arnold
3056 miriam
3057 bill
3058 arnold
3059 @end example
3060
3061 Next, sort the list:
3062
3063 @example
3064 $ who | cut -c1-8 | sort
3065 arnold
3066 arnold
3067 bill
3068 miriam
3069 @end example
3070
3071 Finally, run the sorted list through @code{uniq}, to weed out duplicates:
3072
3073 @example
3074 $ who | cut -c1-8 | sort | uniq
3075 arnold
3076 bill
3077 miriam
3078 @end example
3079
3080 The @code{sort} command actually has a @samp{-u} option that does what
3081 @code{uniq} does. However, @code{uniq} has other uses for which one
3082 cannot substitute @samp{sort -u}.
3083
3084 The SysOp puts this pipeline into a shell script, and makes it available for
3085 all the users on the system:
3086
3087 @example
3088 # cat > /usr/local/bin/listusers
3089 who | cut -c1-8 | sort | uniq
3090 ^D
3091 # chmod +x /usr/local/bin/listusers
3092 @end example
3093
3094 There are four major points to note here.  First, with just four
3095 programs, on one command line, the SysOp was able to save about two
3096 hours' worth of work.  Furthermore, the shell pipeline is just about as
3097 efficient as the C program would be, and it is much more efficient in
3098 terms of programmer time.  People time is much more expensive than
3099 computer time, and in our modern ``there's never enough time to do
3100 everything'' society, saving two hours of programmer time is no mean
3101 feat.
3102
3103 Second, it is also important to emphasize that with the
3104 @emph{combination} of the tools, it is possible to do a special
3105 purpose job never imagined by the authors of the individual programs.
3106
3107 Third, it is also valuable to build up your pipeline in stages, as we did here.
3108 This allows you to view the data at each stage in the pipeline, which helps
3109 you acquire the confidence that you are indeed using these tools correctly.
3110
3111 Finally, by bundling the pipeline in a shell script, other users can use
3112 your command, without having to remember the fancy plumbing you set up for
3113 them. In terms of how you run them, shell scripts and compiled programs are
3114 indistinguishable.
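
If you want to experiment without a crowd of logged-in users, one
possible approach is sketched below.  The function name
@code{list_users} and the idea of feeding it canned data are
assumptions made for this illustration, not part of the SysOp's script:

```shell
# The pipeline minus who, as a function that reads who-style lines on
# standard input, so it can be tried without any real logins.
list_users() (
    cut -c1-8 | sort | uniq
)

# Canned data copied from the who example earlier in this column.
sample='arnold   console Jan 22 19:57
miriam   ttyp0   Jan 23 14:19(:0.0)
bill     ttyp1   Jan 21 09:32(:0.0)
arnold   ttyp2   Jan 23 20:48(:0.0)'

users=$(printf '%s\n' "$sample" | list_users)
printf '%s\n' "$users"
```

On a live system the equivalent would be @samp{who | list_users}.
Since @code{cut} takes fixed columns, short names keep their trailing
blanks; that is harmless here because every line is padded the same way.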
3115
3116 After the previous warm-up exercise, we'll look at two additional, more
3117 complicated pipelines.  For them, we need to introduce two more tools.
3118
3119 The first is the @code{tr} command, which stands for ``transliterate.''
3120 The @code{tr} command works on a character-by-character basis, changing
3121 characters. Normally it is used for things like mapping upper case to
3122 lower case:
3123
3124 @example
3125 $ echo ThIs ExAmPlE HaS MIXED case! | tr '[A-Z]' '[a-z]'
3126 this example has mixed case!
3127 @end example
3128
3129 There are several options of interest:
3130
3131 @table @samp
3132 @item -c
3133 work on the complement of the listed characters, i.e.,
3134 operations apply to characters not in the given set
3135
3136 @item -d
3137 delete characters in the first set from the output
3138
3139 @item -s
3140 squeeze repeated characters in the output into just one character.
3141 @end table
3142
3143 We will be using all three options in a moment.
3144
3145 The other command we'll look at is @code{comm}.  The @code{comm}
3146 command takes two sorted input files as input data, and prints out the
3147 files' lines in three columns.  The output columns are the data lines
3148 unique to the first file, the data lines unique to the second file, and
3149 the data lines that are common to both.  The @samp{-1}, @samp{-2}, and
3150 @samp{-3} command line options omit the respective columns. (This is
3151 non-intuitive and takes a little getting used to.)  For example:
3152
3153 @example
3154 $ cat f1
3155 11111
3156 22222
3157 33333
3158 44444
3159 $ cat f2
3160 00000
3161 22222
3162 33333
3163 55555
3164 $ comm f1 f2
3165         00000
3166 11111
3167                 22222
3168                 33333
3169 44444
3170         55555
3171 @end example
3172
3173 A single dash given as a file name tells @code{comm} to read standard input
3174 instead of a regular file.
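
Since the column-suppressing options take some getting used to, here is
a sketch that applies them to the same two files.  The temporary
directory and the variable names are assumptions for the example:

```shell
dir=$(mktemp -d)
printf '11111\n22222\n33333\n44444\n' > "$dir/f1"
printf '00000\n22222\n33333\n55555\n' > "$dir/f2"
# Suppress columns 1 and 2: only the lines common to both files remain.
common=$(comm -12 "$dir/f1" "$dir/f2")
# Suppress columns 2 and 3: only the lines unique to f1 remain.
only_f1=$(comm -23 "$dir/f1" "$dir/f2")
# A dash stands for standard input; here f1 arrives by redirection.
only_f2=$(comm -13 - "$dir/f2" < "$dir/f1")
rm -r "$dir"
printf '%s\n' "$common" "--" "$only_f1" "--" "$only_f2"
```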
3175
3176 Now we're ready to build a fancy pipeline.  The first application is a word
3177 frequency counter.  This helps an author determine if he or she is over-using
3178 certain words.
3179
3180 The first step is to change the case of all the letters in our input file
3181 to one case.  ``The'' and ``the'' are the same word when doing counting.
3182
3183 @example
3184 $ tr '[A-Z]' '[a-z]' < whats.gnu | ...
3185 @end example
3186
3187 The next step is to get rid of punctuation.  Quoted words and unquoted words
3188 should be treated identically; it's easiest to just get the punctuation out of
3189 the way.
3190
3191 @example
3192 $ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' | ...
3193 @end example
3194
3195 The second @code{tr} command operates on the complement of the listed
3196 characters, which are all the letters, the digits, the underscore, and
3197 the blank.  The @samp{\012} represents the newline character; it has to
3198 be left alone.  (The ASCII TAB character should also be included for
3199 good measure in a production script.)
3200
3201 At this point, we have data consisting of words separated by blank space.
3202 The words only contain alphanumeric characters (and the underscore).  The
3203 next step is to break the data apart so that we have one word per line. This
3204 makes the counting operation much easier, as we will see shortly.
3205
3206 @example
3207 $ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
3208 > tr -s '[ ]' '\012' | ...
3209 @end example
3210
3211 This command turns blanks into newlines.  The @samp{-s} option squeezes
3212 multiple newline characters in the output into just one.  This helps us
3213 avoid blank lines. (The @samp{>} is the shell's ``secondary prompt.''
3214 This is what the shell prints when it notices you haven't finished
3215 typing in all of a command.)
3216
3217 We now have data consisting of one word per line, no punctuation, all one
3218 case.  We're ready to count each word:
3219
3220 @example
3221 $ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
3222 > tr -s '[ ]' '\012' | sort | uniq -c | ...
3223 @end example
3224
3225 At this point, the data might look something like this:
3226
3227 @example
3228   60 a
3229    2 able
3230    6 about
3231    1 above
3232    2 accomplish
3233    1 acquire
3234    1 actually
3235    2 additional
3236 @end example
3237
3238 The output is sorted by word, not by count!  What we want is the most
3239 frequently used words first.  Fortunately, this is easy to accomplish,
3240 with the help of two more @code{sort} options:
3241
3242 @table @samp
3243 @item -n
3244 do a numeric sort, not an ASCII one
3245
3246 @item -r
3247 reverse the order of the sort
3248 @end table
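
The effect of these two options is easiest to see on a few made-up
count lines.  This is only a sketch, and @code{LC_ALL=C} is an
assumption added so that the first command really uses @sc{ASCII} order:

```shell
counts='10 x
9 y
100 z'
# Plain ASCII comparison looks at characters, so "100" lands before "9".
ascii=$(printf '%s\n' "$counts" | LC_ALL=C sort)
# -n compares the leading numbers by value.
numeric=$(printf '%s\n' "$counts" | sort -n)
# Adding -r reverses the result: the largest count comes first.
descending=$(printf '%s\n' "$counts" | sort -nr)
printf '%s\n' "$ascii" "--" "$numeric" "--" "$descending"
```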
3249
3250 The final pipeline looks like this:
3251
3252 @example
3253 $ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
3254 > tr -s '[ ]' '\012' | sort | uniq -c | sort -nr
3255  156 the
3256   60 a
3257   58 to
3258   51 of
3259   51 and
3260  ...
3261 @end example
3262
3263 Whew!  That's a lot to digest.  Yet, the same principles apply. With six
3264 commands, on two lines (really one long one split for convenience), we've
3265 created a program that does something interesting and useful, in much
3266 less time than we could have written a C program to do the same thing.
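
If you'd like to try the pipeline on something smaller than a whole
column, one option is to wrap it in a shell function.  The name
@code{word_freq} and the sample sentence are invented for this sketch:

```shell
# The six-command pipeline from above, reading its text on standard input.
word_freq() (
    tr '[A-Z]' '[a-z]' | tr -cd '[A-Za-z0-9_ \012]' |
    tr -s '[ ]' '\012' | sort | uniq -c | sort -nr
)

freq=$(printf 'The cat and the hat.  The end!\n' | word_freq)
printf '%s\n' "$freq"
```

The first line of output should carry a count of 3 for @samp{the}; the
other four words each appear once.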
3267
3268 A minor modification to the above pipeline can give us a simple spelling
3269 checker!  To determine if you've spelled a word correctly, all you have to
3270 do is look it up in a dictionary.  If it is not there, then chances are
3271 that your spelling is incorrect.  So, we need a dictionary.  If you
3272 have the Slackware Linux distribution, you have the file
3273 @file{/usr/lib/ispell/ispell.words}, which is a sorted, 38,400 word
3274 dictionary.
3275
3276 Now, how to compare our file with the dictionary?  As before, we generate
3277 a sorted list of words, one per line:
3278
3279 @example
3280 $ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
3281 > tr -s '[ ]' '\012' | sort -u | ...
3282 @end example
3283
3284 Now, all we need is a list of words that are @emph{not} in the
3285 dictionary.  Here is where the @code{comm} command comes in.
3286
3287 @example
3288 $ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
3289 > tr -s '[ ]' '\012' | sort -u |
3290 > comm -23 - /usr/lib/ispell/ispell.words
3291 @end example
3292
3293 The @samp{-2} and @samp{-3} options eliminate lines that are only in the
3294 dictionary (the second file), and lines that are in both files.  Lines
3295 only in the first file (standard input, our stream of words), are
3296 words that are not in the dictionary.  These are likely candidates for
3297 spelling errors.  This pipeline was the first cut at a production
3298 spelling checker on Unix.
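
Here is a sketch of the same idea on a toy scale.  The four-word
dictionary file is a stand-in made up for the example, and
@code{LC_ALL=C} is an assumption added so that @code{sort} and
@code{comm} agree on the ordering:

```shell
dir=$(mktemp -d)
# A tiny stand-in dictionary, already sorted.
printf 'brown\nfox\nquick\nthe\n' > "$dir/words"
# Same steps as above; words found only on standard input are reported.
misspelled=$(printf 'The quick brwon fox.\n' |
    tr '[A-Z]' '[a-z]' | tr -cd '[A-Za-z0-9_ \012]' |
    tr -s '[ ]' '\012' | LC_ALL=C sort -u |
    LC_ALL=C comm -23 - "$dir/words")
rm -r "$dir"
printf '%s\n' "$misspelled"
```

Only @samp{brwon} should be printed, since it is the one word missing
from the dictionary.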
3299
3300 There are some other tools that deserve brief mention.
3301
3302 @table @code
3303 @item grep
3304 search files for text that matches a regular expression
3305
3306 @item egrep
3307 like @code{grep}, but with more powerful regular expressions
3308
3309 @item wc
3310 count lines, words, characters
3311
3312 @item tee
3313 a T-fitting for data pipes, copies data to files and to standard output
3314
3315 @item sed
3316 the stream editor, an advanced tool
3317
3318 @item awk
3319 a data manipulation language, another advanced tool
3320 @end table
3321
3322 The software tools philosophy also espoused the following bit of
3323 advice: ``Let someone else do the hard part.'' This means, take
3324 something that gives you most of what you need, and then massage it the
3325 rest of the way until it's in the form that you want.
3326
3327 To summarize:
3328
3329 @enumerate 1
3330 @item
3331 Each program should do one thing well. No more, no less.
3332
3333 @item
3334 Combining programs with appropriate plumbing leads to results where
3335 the whole is greater than the sum of the parts.  It also leads to novel
3336 uses of programs that the authors might never have imagined.
3337
3338 @item
3339 Programs should never print extraneous header or trailer data, since these
3340 could get sent on down a pipeline. (A point we didn't mention earlier.)
3341
3342 @item
3343 Let someone else do the hard part.
3344
3345 @item
3346 Know your toolbox! Use each program appropriately. If you don't have an
3347 appropriate tool, build one.
3348 @end enumerate
3349
3350 As of this writing, all the programs we've discussed are available via
3351 anonymous @code{ftp} from @code{prep.ai.mit.edu} as
3352 @file{/pub/gnu/textutils-1.9.tar.gz}.@footnote{Version 1.9 was
3353 current when this column was written. Check the nearest GNU archive for
3354 the current version.}
3355
3356 None of what I have presented in this column is new. The Software Tools
3357 philosophy was first introduced in the book @cite{Software Tools},
3358 by Brian Kernighan and P.J. Plauger (Addison-Wesley, ISBN
3359 0-201-03669-X).   This book showed how to write and use software
3360 tools.   It was written in 1976, using a preprocessor for FORTRAN named
3361 @code{ratfor} (RATional FORtran).  At the time, C was not as ubiquitous
3362 as it is now; FORTRAN was.  The last chapter presented a @code{ratfor}
3363 to FORTRAN processor, written in @code{ratfor}. @code{ratfor} looks an
3364 awful lot like C; if you know C, you won't have any problem following
3365 the code.
3366
3367 In 1981, the book was updated and made available as @cite{Software
3368 Tools in Pascal} (Addison-Wesley, ISBN 0-201-10342-7).  Both books
3369 remain in print, and are well worth reading if you're a programmer.
3370 They certainly made a major change in how I view programming.
3371
3372 Initially, the programs in both books were available (on 9-track tape)
3373 from Addison-Wesley.  Unfortunately, this is no longer the case,
3374 although you might be able to find copies floating around the Internet.
3375 For a number of years, there was an active Software Tools Users Group,
3376 whose members had ported the original @code{ratfor} programs to essentially
3377 every computer system with a FORTRAN compiler.  The popularity of the
3378 group waned in the middle '80s as Unix began to spread beyond universities.
3379
3380 With the current proliferation of GNU code and other clones of Unix programs,
3381 these programs now receive little attention; modern C versions are
3382 much more efficient and do more than these programs do.  Nevertheless, as
3383 exposition of good programming style, and evangelism for a still-valuable
3384 philosophy, these books are unparalleled, and I recommend them highly.
3385
3386 Acknowledgment: I would like to express my gratitude to Brian Kernighan
3387 of Bell Labs, the original Software Toolsmith, for reviewing this column.
3388
3389
3390 @node Index
3391 @unnumbered Index
3392
3393 @printindex cp
3394
3395 @contents
3396 @bye
3397
3398 @c Local variables:
3399 @c texinfo-column-for-description: 32
3400 @c End: