12 .CT 1 files prog_other
14 awk \- pattern-directed scanning and processing language
38 for lines that match any of a set of patterns specified literally in
40 or in one or more files
45 there can be an associated action that will be performed
49 Each line is matched against the
50 pattern portion of every pattern-action statement;
51 the associated action is performed for each matched pattern.
54 means the standard input.
59 is treated as an assignment, not a filename,
60 and is executed at the time it would have been opened if it were a filename.
65 is an assignment to be done before
70 options may be present.
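Both forms of command-line assignment fit in a short sketch. It assumes a POSIX sh with some POSIX awk on the PATH; pick_field is a hypothetical name. A -v assignment is already in effect when BEGIN runs. An operand such as tag=A is carried out only when awk reaches it in the argument list, just before the next file operand (here -, the standard input) is opened.

```shell
#!/bin/sh
# Hypothetical helper: print field number $1 of every input line.
# The -v assignment is done before BEGIN runs, so n is already set there.
pick_field() {
    awk -v n="$1" 'BEGIN { print "field", n } { print $n }'
}

echo 'a b c' | pick_field 2
# An operand of the form var=value is processed in argument order,
# just before the file operand that follows it ("-" is standard input).
echo 'x' | awk '{ print tag ":" $0 }' tag=A -
```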
74 option defines the input field separator to be the regular expression
77 An input line is normally made up of fields separated by white space,
78 or by regular expression
80 The fields are denoted
85 refers to the entire line.
88 is null, the input line is split into one field per character.
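The following minimal sketch shows a regular-expression field separator. It assumes a POSIX sh and any POSIX awk, and split_fields is a hypothetical helper. The -F argument is the ERE [ ,]+, so a comma, a run of blanks, or a comma followed by blanks each count as a single separator.

```shell
#!/bin/sh
# Hypothetical helper: split on runs of commas and/or blanks (an ERE given
# to -F), then print the field count, the second field and the last field.
split_fields() {
    awk -F '[ ,]+' '{ print NF, $2, $NF }'
}

split_fields <<EOF
a, b,c
x y
EOF
```

Unlike the default white-space splitting, a non-default FS does not discard a leading separator. A line that begins with a blank therefore gets an empty $1.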
90 A pattern-action statement has the form
92 .IB pattern " { " action " }
97 a missing pattern always matches.
98 Pattern-action statements are separated by newlines or semicolons.
100 An action is a sequence of statements.
101 A statement can be one of the following:
104 .ta \w'\f(CWdelete array[expression]'u
108 if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
109 while(\fI expression \fP)\fI statement\fP
110 for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
111 for(\fI var \fPin\fI array \fP)\fI statement\fP
112 do\fI statement \fPwhile(\fI expression \fP)
115 {\fR [\fP\fI statement ... \fP\fR] \fP}
116 \fIexpression\fP #\fR commonly\fP\fI var = expression\fP
117 print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
118 printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
119 return\fR [ \fP\fIexpression \fP\fR]\fP
120 next #\fR skip remaining patterns on this input line\fP
121 nextfile #\fR skip rest of this file, open next, start at top\fP
122 delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
123 delete\fI array\fP #\fR delete all elements of array\fP
124 exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
130 Statements are terminated by
131 semicolons, newlines or right braces.
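Several of the statements listed above fit in a few lines. The sketch below assumes a POSIX sh and any POSIX awk, and reverse_fields is a hypothetical name. It uses a pattern-action statement with next, a C-style for loop, and printf with the ?: operator to print each line with its fields reversed, skipping blank lines.

```shell
#!/bin/sh
# Hypothetical helper: print the fields of each line in reverse order,
# skipping blank lines.
reverse_fields() {
    awk '
    NF == 0 { next }                  # blank line: skip all remaining patterns
    {
        for (i = NF; i > 0; i--)      # C-style for loop, last field first
            printf "%s%s", $i, (i > 1 ? OFS : ORS)
    }'
}

reverse_fields <<EOF
a b c

x y
EOF
```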
136 String constants are quoted \&\f(CW"\ "\fR,
137 with the usual C escapes recognized within.
138 Expressions take on string or numeric values as appropriate,
139 and are built using the operators
141 (exponentiation), and concatenation (indicated by white space).
144 ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
145 are also available in expressions.
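Because expressions take on string or numeric values as appropriate, the same two operands can compare differently. This sketch assumes a POSIX sh and any POSIX awk, and compare_demo is hypothetical. Two string constants are compared as strings, while adding 0 forces a numeric comparison. The last line shows that concatenation binds more loosely than +.

```shell
#!/bin/sh
# Hypothetical helper: compare "10" and "9" as strings and as numbers.
compare_demo() {
    awk 'BEGIN {
        a = "10"; b = "9"                # both are string constants
        as_strings = (a < b)             # 1: "1" sorts before "9"
        as_numbers = (a + 0 < b + 0)     # 0: 10 is not less than 9
        print as_strings, as_numbers
        print 1 + 2 "x"                  # (1 + 2) then concatenation: 3x
    }'
}

compare_demo
```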
146 Variables may be scalars, array elements
150 Variables are initialized to the null string.
151 Array subscripts may be any string,
152 not necessarily numeric;
153 this allows for a form of associative memory.
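A common use of associative subscripts is counting by key. The sketch below assumes a POSIX sh and any POSIX awk, and count_keys is a hypothetical name. Since for (k in array) visits keys in an unspecified order, the output is piped through sort(1) to make it reproducible.

```shell
#!/bin/sh
# Hypothetical helper: count how often each value of field 1 occurs.
count_keys() {
    awk '
    { count[$1]++ }                  # any string can be a subscript
    END {
        for (k in count)             # iteration order is unspecified,
            print k, count[k]
    }' | LC_ALL=C sort               # so sort for reproducible output
}

count_keys <<EOF
x 1
y 2
x 3
EOF
```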
154 Multiple subscripts such as
156 are permitted; the constituents are concatenated,
157 separated by the value of
162 statement prints its arguments on the standard output
167 is present or on a pipe if
169 is present), separated by the current output field separator,
170 and terminated by the output record separator.
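The output separators and output to a pipe can be sketched as follows. The sketch assumes a POSIX sh and any POSIX awk, and join_fields and sort_desc are hypothetical names. The same command string is used both to open the pipe and to close it. The close() call in END is not strictly required, since awk closes open pipes at exit, but it makes explicit the point at which sort(1) has finished.

```shell
#!/bin/sh
# Hypothetical helper: join the first two fields with "-" and end each
# record with ";" instead of a newline.
join_fields() {
    awk 'BEGIN { OFS = "-"; ORS = ";" } { print $1, $2 }'
}

# Hypothetical helper: send field 1 of every line through sort -r.
sort_desc() {
    awk '{ print $1 | "sort -r" }
         END { close("sort -r") }'
}

echo 'a b' | join_fields; echo
sort_desc <<EOF
b
a
c
EOF
```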
174 may be literal names or parenthesized expressions;
175 identical string values in different statements denote
179 statement formats its expression list according to the format
182 The built-in function
184 closes the file or pipe
186 The built-in function
188 flushes any buffered output for the file or pipe
191 The mathematical functions
200 Other built-in functions:
204 the length of its argument
211 random number on [0,1)
216 and returns the previous seed.
219 truncates to an integer value
221 .BI substr( s , " m" , " n\fB)
226 that begins at position
230 .BI index( s , " t" )
235 occurs, or 0 if it does not.
237 .BI match( s , " r" )
240 where the regular expression
242 occurs, or 0 if it does not.
247 are set to the position and length of the matched string.
249 .BI split( s , " a" , " fs\fB)
259 The separation is done with the regular expression
261 or with the field separator
266 An empty string as field separator splits the string
267 into one array element per character.
269 .BI sub( r , " t" , " s\fB)
272 for the first occurrence of the regular expression
285 except that all occurrences of the regular expression
290 return the number of replacements.
292 .BI sprintf( fmt , " expr" , " ...\fB )
293 the string resulting from formatting
303 and returns its exit status
308 with all upper-case characters translated to their
309 corresponding lower-case equivalents.
314 with all lower-case characters translated to their
315 corresponding upper-case equivalents.
322 to the next input record from the current input file;
327 to the next record from
342 returns the next line of output from
346 returns 1 for a successful input,
347 0 for end of file, and \-1 for an error.
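The following sketch exercises several of the built-in string functions and getline from a command. It assumes a POSIX sh and any POSIX awk. string_funcs and count_command_lines are hypothetical names, and the comments give the expected results. Note that substr and index count positions from 1, and that match sets RSTART and RLENGTH as a side effect.

```shell
#!/bin/sh
# Hypothetical helper exercising several built-in string functions.
string_funcs() {
    awk 'BEGIN {
        s = "GATTACA"
        print substr(s, 2, 3)          # positions start at 1: ATT
        print index(s, "TA")           # first occurrence of "TA": 4
        if (match(s, /T+A/))           # leftmost longest match: TTA
            print RSTART, RLENGTH      # 3 3
        n = split("a:b:c", parts, ":")
        print n, parts[3]              # 3 c
        t = s
        k = gsub(/A/, "a", t)          # edits t in place, returns the count
        print k, t                     # 3 GaTTaCa
        print toupper("acgt")          # ACGT
    }'
}

# Hypothetical helper: read the output of a command line by line.
count_command_lines() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        while ((cmd | getline line) > 0)   # 1 per line, 0 at EOF, -1 on error
            n++
        close(cmd)
        print n, line                  # 2 two
    }'
}

string_funcs
count_command_lines
```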
349 Patterns are arbitrary Boolean combinations
352 of regular expressions and
353 relational expressions.
354 Regular expressions are as in
358 Isolated regular expressions
359 in a pattern apply to the entire line.
360 Regular expressions may also occur in
361 relational expressions, using the operators
366 is a constant regular expression;
367 any string (constant or variable) may be used
368 as a regular expression, except in the position of an isolated regular expression
371 A pattern may consist of two patterns separated by a comma;
372 in this case, the action is performed for all lines
373 from an occurrence of the first pattern
374 through an occurrence of the second.
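A range pattern can be tried on a small input. This sketch assumes a POSIX sh and any POSIX awk, and between_markers is a hypothetical name. Because no action is given, each selected line is printed, and both the start and the stop line are included.

```shell
#!/bin/sh
# Hypothetical helper: print every line from a "start" line through the
# next "stop" line, inclusive; with no action, selected lines are printed.
between_markers() {
    awk '/^start$/, /^stop$/'
}

between_markers <<EOF
a
start
b
stop
c
EOF
```

For the input shown, the output is the three lines start, b and stop.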
376 A relational expression is one of the following:
378 .I expression matchop regular-expression
380 .I expression relop expression
382 .IB expression " in " array-name
384 .BI ( expr , expr,... ") in " array-name
386 where a relop is any of the six relational operators in C,
387 and a matchop is either
393 A conditional is an arithmetic expression,
394 a relational expression,
395 or a Boolean combination
402 may be used to capture control before the first input line is read
407 do not combine with other patterns.
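The difference between NR and FNR, together with BEGIN and END, shows up as soon as more than one file is read. This sketch assumes a POSIX sh and any POSIX awk, and count_per_file is a hypothetical name. The pattern FNR == 1 fires once per non-empty file, and END reports that count together with the total record count NR.

```shell
#!/bin/sh
# Hypothetical helper: report how many non-empty files were read and the
# total number of records; usage: count_per_file FILE...
count_per_file() {
    awk '
    BEGIN { files = 0 }          # runs before the first input line is read
    FNR == 1 { files++ }         # FNR restarts at 1 in every input file
    END { print files, NR }      # runs after the last input line
    ' "$@"
}
```

For example, given a two-line file and a three-line file, it prints 2 5. An empty file is not counted, because it never produces a record with FNR equal to 1.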
409 Variable names with special meanings:
413 conversion format used when converting numbers
418 regular expression used to separate fields; also settable
423 number of fields in the current record
426 ordinal number of the current record
429 ordinal number of the current record in the current file
432 the name of the current input file
435 input record separator (default newline)
438 output field separator (default blank)
441 output record separator (default newline)
444 output format for numbers (default
448 separates multiple subscripts (default 034)
451 argument count, assignable
454 argument array, assignable;
455 non-null members are taken as filenames
458 array of environment variables; subscripts are names.
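The conversion-format variables and ENVIRON can be checked in one BEGIN block. The sketch assumes a POSIX sh and any POSIX awk. special_vars and the environment variable GREETING are hypothetical. OFMT applies when print outputs a non-integer number. CONVFMT applies when such a number is converted to a string, for example by concatenation.

```shell
#!/bin/sh
# Hypothetical helper showing OFMT, CONVFMT and ENVIRON.
special_vars() {
    GREETING=hello awk 'BEGIN {
        OFMT = "%.2f"                  # used when print outputs a non-integer
        CONVFMT = "%.3f"               # used for number-to-string conversion
        x = 3.14159
        print x                        # 3.14
        s = x ""                       # concatenation forces a conversion
        print s                        # 3.142
        print ENVIRON["GREETING"]      # hello
    }'
}

special_vars
```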
461 Functions may be defined (at the position of a pattern-action statement) thus:
464 function foo(a, b, c) { ...; return x }
466 Parameters are passed by value if scalar and by reference if array name;
467 functions may be called recursively.
468 Parameters are local to the function; all other variables are global.
469 Thus local variables may be created by providing excess parameters in
470 the function definition.
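The following sketch illustrates recursion, array parameters passed by reference, and the extra-parameter idiom for local variables. It assumes a POSIX sh and any POSIX awk, and func_demo, fact and fill are hypothetical names. The extra spaces before i in the parameter list are only a common convention for marking locals; awk does not require them.

```shell
#!/bin/sh
# Hypothetical helper showing user-defined functions.
func_demo() {
    awk '
    function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }   # recursive call
    # i is an extra parameter that callers never pass, so it is local to fill
    function fill(arr, k,    i) {
        for (i = 1; i <= k; i++)
            arr[i] = i * i             # arr is passed by reference
    }
    BEGIN {
        print fact(5)                  # 120
        fill(sq, 3)
        print sq[1], sq[2], sq[3]      # 1 4 9
        print (i == "" ? "global i untouched" : "i leaked")
    }'
}

func_demo
```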
476 Print lines longer than 72 characters.
481 Print first two fields in opposite order.
484 BEGIN { FS = ",[ \et]*|[ \et]+" }
489 Same, with input fields separated by comma and/or blanks and tabs.
494 END { print "sum is", s, " average is", s/NR }
499 Add up first column, print sum and average.
504 Print all lines between start/stop pairs.
508 BEGIN { # Simulate echo(1)
509 for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
515 .SH BIOAWK EXTENSIONS
520 option that specifies the input format. The behavior
521 of bioawk may vary depending on the value of
526 .IR header " or " hdr ,
527 bioawk parses named columns: for each column it defines a variable whose name is
528 taken from the first line and whose value is the column index. Special characters
529 in these names are converted to underscores. When
532 .IR sam , " vcf" , " bed " or " gff" ,
533 it sets predefined column names, which can be listed by
540 bioawk first parses the input FASTA or FASTQ file into a TAB-delimited format,
541 with each line consisting of the sequence name, sequence, quality and comment, and
542 then sets the column names. Note that when
545 is in use, the input file may optionally be gzip-compressed.
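Bioawk's own FASTA/FASTQ parser is not reproduced here. As a rough illustration only, the following plain-awk sketch collapses each multi-line FASTA record into a single TAB-separated line of name and sequence. It assumes a POSIX sh and any POSIX awk with no bioawk extensions. fasta_to_tsv is a hypothetical name, and the sketch assumes the name is the first word of the header line. It handles neither FASTQ, nor quality or comment columns, nor gzip-compressed input.

```shell
#!/bin/sh
# Hypothetical helper, not part of bioawk: FASTA records -> name TAB sequence.
fasta_to_tsv() {
    awk '
    BEGIN { OFS = sprintf("%c", 9) }      # OFS = TAB (character code 9)
    /^>/ {
        if (have) print name, seq         # flush the previous record
        name = substr($1, 2)              # first word of the header, minus ">"
        seq = ""; have = 1
        next
    }
    { seq = seq $0 }                      # join wrapped sequence lines
    END { if (have) print name, seq }'
}

fasta_to_tsv <<EOF
>r1 first read
ACG
TT
>r2
GG
EOF
```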
548 Bioawk also adds the following built-in functions:
556 reverse complement a nucleotide string
560 .BI trimq( qual , " beg" , " end" , " param" )
563 in the Sanger scale using Richard Mott's algorithm (used in Phred). The
564 0-based beginning and ending positions are written to
565 .IR beg " and " end ,
566 respectively. The last argument
568 is the single parameter used in the algorithm; it is optional and defaults to 0.05.
571 bit AND operation (& in C)
574 bit OR operation (| in C)
577 bit XOR operation (^ in C)
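Plain awk has no revcomp. The sketch below shows one way such a function can work. It assumes a POSIX sh and any POSIX awk, and revcomp_awk and rc are hypothetical names, not bioawk code. The function walks the string from the end and maps each of A, C, G and T (in either case) to its complement. Every other character, such as N, passes through unchanged. The sketch is not claimed to match bioawk's handling of IUPAC ambiguity codes.

```shell
#!/bin/sh
# Hypothetical helper, not bioawk code: reverse complement in plain awk.
revcomp_awk() {
    awk '
    function rc(s,    i, c, p, out) {
        out = ""
        for (i = length(s); i > 0; i--) {      # walk from the last character
            c = substr(s, i, 1)
            p = index("ACGTacgt", c)           # 0 if c is not A/C/G/T/a/c/g/t
            out = out (p ? substr("TGCAtgca", p, 1) : c)
        }
        return out
    }
    { print rc($0) }'
}

echo GATTACA | revcomp_awk
```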
583 A. V. Aho, B. W. Kernighan, P. J. Weinberger,
585 The AWK Programming Language,
586 Addison-Wesley, 1988. ISBN 0-201-07981-X
588 There are no explicit conversions between numbers and strings.
589 To force an expression to be treated as a number add 0 to it;
590 to force it to be treated as a string concatenate
593 The scope rules for variables in functions are a botch;