man/man1x/awk.1x

   1 .so mnx.mac
   2 .TH AWK 1x
   3 .CD "awk \(en pattern matching language"
   4 .SX "awk \fIrules\fR [\fIfile\fR] ...
   5 .FL "\fR(none)"
   6 .EX "awk rules input" "Process \fIinput\fR according to \fIrules\fR"
   7 .EX "awk rules \(en  >out" "Input from terminal, output to \fIout\fR"
   8 .PP
   9 AWK is a programming language devised by Aho, Weinberger, and Kernighan
  10 at Bell Labs (hence the name).
  11 \fIAwk\fR programs search files for
  12 specific patterns and perform \*(OQactions\*(CQ for every occurrence
  13 of these patterns.  The patterns can be \*(OQregular expressions\*(CQ
  14 as used in the \fIed\fR editor.  The actions are expressed
  15 using a subset of the C language.
  16 .PP
  17 The patterns and actions are usually placed in a \*(OQrules\*(CQ file
  18 whose name must be the first argument in the command line,
  19 preceded by the flag \fB\(enf\fR.  Otherwise, the first argument on the
  20 command line is taken to be a string containing the rules
  21 themselves. All other arguments are taken to be the names of text
  22 files on which the rules are to be applied, with \fB\(en\fR being the
  23 standard input.  To take rules from the standard input, use \fB\(enf \(en\fR.
  24 .PP
  25 The command:
  26 .HS
  27 .Cx "awk  \(enf  rules  prog.\d\s+2*\s0\u"
  28 .HS
  29 would read the pattern and action rules from the file \fIrules\fR
  30 and apply them to each of the files named by the remaining arguments.
  31 .PP
  32 The general format of a rules file is:
  33 .HS
  34 ~~~<pattern> { <action> }
  35 ~~~<pattern> { <action> }
  36 ~~~...
  37 .HS
  38 There may be any number of these <pattern> { <action> }
  39 sequences in the rules file.  \fIAwk\fR reads a line of input from
  40 the current input file and applies every <pattern> { <action> }
  41 in sequence to the line.
  42 .PP
  43 If the <pattern> corresponding to any { <action> } is missing,
  44 the action is applied to every line of input.  The default
  45 { <action> } is to print the matched input line.
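.PP
The matching loop described above can be sketched in Python.
This is only an illustration under simplifying assumptions, not how
\fIawk\fR itself is implemented: patterns are limited to regular
expressions, an action is a function that returns the text to print,
and the name run_rules is hypothetical.
.HS
.nf
```python
import re


def run_rules(rules, lines):
    """Apply every (pattern, action) pair, in order, to each input line."""
    output = []
    for line in lines:
        for pattern, action in rules:
            # A missing pattern (None) matches every line.
            if pattern is None or re.search(pattern, line):
                # A missing action (None) prints the matched line unchanged.
                if action is None:
                    output.append(line)
                else:
                    output.append(action(line))
    return output


rules = [
    ("^a", None),       # pattern only: print lines that start with "a"
    (None, str.upper),  # action only: runs on every line
]
result = run_rules(rules, ["apple", "pear"])
```
.fi
.HS
The first rule has no action, so a matching line is printed as is;
the second has no pattern, so it runs on every line of input.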
  46 .SS "Patterns"
  47 .PP
  48 The <pattern>s may consist of any valid C expression.  If the
  49 <pattern> consists of two expressions separated by a comma, it
  50 is taken to be a range and the <action> is performed on all
  51 lines of input that match the range.  <pattern>s may contain
  52 \*(OQregular expressions\*(CQ delimited by @ symbols.  Regular
  53 expressions can be thought of as a generalized \*(OQwildcard\*(CQ
  54 string matching mechanism, similar to that used by many
  55 operating systems to specify file names.  Regular expressions
  56 may contain any of the following characters:
  57 .HS
  58 .in +0.75i
  59 .ta +0.5i
  60 .ti -0.5i
  61 x       An ordinary character
  62 .ti -0.5i
  63 \\      The backslash quotes any character
  64 .ti -0.5i
  65 ^       A circumflex at the beginning of an expression matches the beginning of a line.
  66 .ti -0.5i
  67 $       A dollar-sign at the end of an expression matches the end of a line.
  68 .ti -0.5i
  69 \&.     A period matches any single character except newline.
  70 .ti -0.5i
  71 *       An expression followed by an asterisk matches zero or more occurrences
  72 of that expression: \*(OQfo*\*(CQ matches \*(OQf\*(CQ, \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
  73 .ti -0.5i
  74 +       An expression followed by a plus sign matches one or more occurrences
  75 of that expression: \*(OQfo+\*(CQ matches \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
  76 .ti -0.5i
  77 []      A string enclosed in square brackets matches any single character in that
  78 string, but no others.  If the first character in the string is a circumflex, the
  79 expression matches any character except newline and the characters in the
  80 string.  For example, \*(OQ[xyz]\*(CQ matches each character of \*(OQxx\*(CQ and \*(OQzyx\*(CQ, while
  81 \*(OQ[^xyz]\*(CQ matches each character of \*(OQabc\*(CQ but not the \*(OQx\*(CQ in \*(OQaxb\*(CQ.  A range of characters may be
  82 specified by two characters separated by \*(OQ-\*(CQ, as in \*(OQ[a-z]\*(CQ.
  83 .in -0.75i
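.PP
The metacharacters listed above can be tried out with a short Python
sketch.  Python's re module is used here only as a stand-in; it happens
to share these metacharacters, but it is an assumption that it agrees
with \fIawk\fR's matcher in every case.  The helper name matches is
hypothetical, and the search is unanchored unless ^ or $ is given.
.HS
.nf
```python
import re


def matches(regex, text):
    """Return True if regex matches anywhere in text (unanchored search)."""
    return re.search(regex, text) is not None


results = [
    matches("fo*", "f"),       # zero occurrences of "o" are allowed
    matches("fo+", "f"),       # at least one "o" is required
    matches("fo+", "fooo"),
    matches("^ab", "abc"),     # anchored at the beginning of the line
    matches("^ab", "cab"),
    matches("bc$", "abc"),     # anchored at the end of the line
    matches("a.c", "abc"),     # period matches any single character
    matches("[xyz]", "zyx"),
    matches("[^xyz]", "xyz"),  # every character is in the excluded set
    matches("[a-c]", "b"),     # range of characters
]
```
.fi
.HS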
  84 .SS "Actions"
  85 .PP
  86 Actions are expressed as a subset of the C language.  All
  87 variables are global and default to int if not formally
  88 declared.
  89 Only char and int, and pointers to and arrays of
  90 char and int, are allowed.  \fIAwk\fR allows only decimal integer
  91 constants to be used\(emno hex (0xnn) or octal (0nn). String
  92 and character constants may contain all of the special C
  93 escapes (\\n, \\r, etc.).
  94 .PP
  95 \fIAwk\fR supports the \*(OQif\*(CQ, \*(OQelse\*(CQ,
  96 \*(OQwhile\*(CQ and \*(OQbreak\*(CQ flow of
  97 control constructs, which behave exactly as in C.
  98 .PP
  99 Also supported are the following unary and binary operators,
 100 listed in order from highest to lowest precedence:
 101 .HS
 102 .ta 0.25i 1.75i 3.0i
 103 .nf
 104 \fB     Operator        Type    Associativity\fR
 105         () []   unary   left to right
 106 .tr ~~
 107         ! ~ ++ \(en\(en \(en * &        unary   right to left
 108 .tr ~
 109         * / %   binary  left to right
 110         + \(en  binary  left to right
 111         << >>   binary  left to right
 112         < <= > >=       binary  left to right
 113         == !=   binary  left to right
 114         &       binary  left to right
 115         ^       binary  left to right
 116         |       binary  left to right
 117         &&      binary  left to right
 118         ||      binary  left to right
 119         =       binary  right to left
 120 .fi
 121 .HS
 122 Comments are introduced by a '#' symbol and are terminated by
 123 the first newline character.  The standard \*(OQ/*\*(CQ and \*(OQ*/\*(CQ
 124 comment delimiters are not supported and will result in a
 125 syntax error.
 126 .SP 0.5
 127 .SS "Fields"
 128 .SP 0.5
 129 .PP
 130 When \fIawk\fR reads a line from the current input file, the
 131 record is automatically separated into \*(OQfields.\*(CQ  A field is
 132 simply a string of consecutive characters delimited by either
 133 the beginning or end of line, or a \*(OQfield separator\*(CQ character.
 134 Initially, the field separators are the space and tab characters.
 135 The special unary operator '$' is used to reference one of the
 136 fields in the current input record (line).  The fields are
 137 numbered sequentially starting at 1.  The expression \*(OQ$0\*(CQ
 138 references the entire input line.
 139 .PP
 140 Similarly, the \*(OQrecord separator\*(CQ is used to determine the end
 141 of an input \*(OQline,\*(CQ initially the newline character.  The field
 142 and record separators may be changed programmatically by one of
 143 the actions and will remain in effect until changed again.
 144 .PP
 145 Multiple (up to 10) field separators are allowed at a time, but
 146 only one record separator.
 147 .PP
 148 Fields behave exactly like strings and can be used in the same
 149 context as a character array.  These \*(OQarrays\*(CQ can be considered
 150 to have been declared as:
 151 .SP 0.15
 152 .HS
 153 ~~~~~char ($n)[ 128 ];
 154 .HS
 155 .SP 0.15
 156 In other words, they are 128 bytes long.  Notice that the
 157 parentheses are necessary because the [] operator binds more
 158 tightly than $; without them, the declaration
 159 would have parsed as:
 160 .HS
 161 .SP 0.15
 162 ~~~~~char $(n[ 128 ]);
 163 .HS
 164 .SP 0.15
 165 which is obviously ridiculous.
 166 .PP
 167 If the contents of one of these field arrays is altered, the
 168 \*(OQ$0\*(CQ field will reflect this change.  For example, this
 169 expression:
 170 .HS
 171 .SP 0.15
 172 ~~~~~*$4 = 'A';
 173 .HS
 174 .SP 0.15
 175 will change the first character of the fourth field to an
 176 uppercase letter 'A'.  Then, when the following input line:
 177 .HS
 178 .SP 0.15
 179 ~~~~~120 PRINT "Name         address        Zip"
 180 .SP 0.15
 181 .HS
 182 is processed, it would be printed as:
 183 .HS
 184 .SP 0.15
 185 ~~~~~120 PRINT "Name         Address        Zip"
 186 .HS
 187 .SP 0.15
 188 Fields may also be modified with the strcpy() function (see
 189 below).  For example, the expression:
 190 .HS
 191 ~~~~~strcpy( $4, "Addr." );
 192 .HS
 193 applied to the same line above would yield:
 194 .HS
 195 ~~~~~120 PRINT "Name         Addr.        Zip"
 196 .HS
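.PP
The way a changed field shows up in the whole record can be sketched
in Python.  This is a rough model only, assuming the initial space and
tab separators and that the text between fields is kept as it was; the
names split_record and set_field are hypothetical, and the 128-byte
field limit is not modelled.
.HS
.nf
```python
import re

SEPARATORS = " " + chr(9)  # space and tab, the initial field separators


def split_record(line):
    """Split a record into alternating field and separator pieces."""
    # Capturing the separators lets the record be rebuilt unchanged.
    return re.split("([" + SEPARATORS + "]+)", line)


def set_field(line, n, value):
    """Return the record with field n (numbered from 1) replaced by value."""
    pieces = split_record(line)
    pieces[2 * (n - 1)] = value  # fields sit at the even indexes
    return "".join(pieces)


line = '120 PRINT "Name         address        Zip"'
fields = split_record(line)[::2]          # the fields alone, $1 onward
capitalized = set_field(line, 4, "Address")
shortened = set_field(line, 4, "Addr.")
```
.fi
.HS
Only the fourth field changes in each result; the spacing around it is
carried over from the original record, as in the examples above.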
 197 .SS "Predefined Variables"
 198 .PP
 199 The following variables and special patterns are predefined:
 200 .HS
 201 .in +1.5i
 202 .ta +1.25i
 203 .ti -1.25i
 204 FS      Field separator (see above).
 205 .ti -1.25i
 206 RS      Record separator (see above).
 207 .ti -1.25i
 208 NF      Number of fields in current input record (line).
 209 .ti -1.25i
 210 NR      Number of records processed thus far.
 211 .ti -1.25i
 212 FILENAME        Name of current input file.
 213 .ti -1.25i
 214 BEGIN   A special <pattern> that matches the beginning of input text.
 215 .ti -1.25i
 216 END     A special <pattern> that matches the end of input text.
 217 .in -1.5i
 218 .HS
 219 \fIAwk\fR also provides some useful built-in functions for string
 220 manipulation and printing:
 221 .HS
 222 .in +1.5i
 223 .ta +1.25i
 224 .ti -1.25i
 225 print(arg)      Simple printing of strings only, terminated by '\\n'.
 226 .ti -1.25i
 227 printf(arg...)  Exactly the printf() function from C.
 228 .ti -1.25i
 229 getline()       Reads the next record and returns 0 on end of file.
 230 .ti -1.25i
 231 nextfile()      Closes the current input file and begins processing the next file.
 232 .ti -1.25i
 233 strlen(s)       Returns the length of its string argument.
 234 .ti -1.25i
 235 strcpy(s,t)     Copies the string \*(OQt\*(CQ to the string \*(OQs\*(CQ.
 236 .ti -1.25i
 237 strcmp(s,t)     Compares the string \*(OQs\*(CQ to \*(OQt\*(CQ and returns 0 if they match.
 238 .ti -1.25i
 239 toupper(c)      Returns its character argument converted to upper-case.
 240 .ti -1.25i
 241 tolower(c)      Returns its character argument converted to lower-case.
 242 .ti -1.25i
 243 match(s,@re@)   Compares the string \*(OQs\*(CQ to the regular expression \*(OQre\*(CQ and
 244 returns the number of matches found (zero if none).
 245 .in -1.5i
 246 .SS "Authors"
 247 .PP
 248 \fIAwk\fR was written by Saeko Hirabayashi and Kouichi Hirabayashi.