usr.bin/tr/tr.1

   1 .\"     $NetBSD: tr.1,v 1.17 2009/08/22 00:23:02 joerg Exp $
   2 .\"
   3 .\" Copyright (c) 1991, 1993
   4 .\"     The Regents of the University of California.  All rights reserved.
   5 .\"
   6 .\" This code is derived from software contributed to Berkeley by
   7 .\" the Institute of Electrical and Electronics Engineers, Inc.
   8 .\"
   9 .\" Redistribution and use in source and binary forms, with or without
  10 .\" modification, are permitted provided that the following conditions
  11 .\" are met:
  12 .\" 1. Redistributions of source code must retain the above copyright
  13 .\"    notice, this list of conditions and the following disclaimer.
  14 .\" 2. Redistributions in binary form must reproduce the above copyright
  15 .\"    notice, this list of conditions and the following disclaimer in the
  16 .\"    documentation and/or other materials provided with the distribution.
  17 .\" 3. Neither the name of the University nor the names of its contributors
  18 .\"    may be used to endorse or promote products derived from this software
  19 .\"    without specific prior written permission.
  20 .\"
  21 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  22 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  23 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  24 .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  25 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  26 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  27 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  28 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  29 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  30 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  31 .\" SUCH DAMAGE.
  32 .\"
  33 .\"     @(#)tr.1        8.1 (Berkeley) 6/6/93
  34 .\"
  35 .Dd March 23, 2004
  36 .Dt TR 1
  37 .Os
  38 .Sh NAME
  39 .Nm tr
  40 .Nd translate characters
  41 .Sh SYNOPSIS
  42 .Nm
  43 .Op Fl cs
  44 .Ar string1 string2
  45 .Nm
  46 .Op Fl c
  47 .Fl d
  48 .Ar string1
  49 .Nm
  50 .Op Fl c
  51 .Fl s
  52 .Ar string1
  53 .Nm
  54 .Op Fl c
  55 .Fl ds
  56 .Ar string1 string2
  57 .Sh DESCRIPTION
  58 The
  59 .Nm
  60 utility copies the standard input to the standard output with substitution
  61 or deletion of selected characters.
  62 .Pp
  63 The following options are available:
  64 .Bl -tag -width Ds
  65 .It Fl c
  66 Complements the set of characters in
  67 .Ar string1 ,
  68 that is,
  69 .Fl c Ar \&ab
  70 includes every character except for
  71 .Sq a
  72 and
  73 .Sq b .
  74 .It Fl d
  75 The
  76 .Fl d
  77 option causes characters to be deleted from the input.
  78 .It Fl s
  79 The
  80 .Fl s
  81 option squeezes multiple occurrences of the characters listed in the last
  82 operand (either
  83 .Ar string1
  84 or
  85 .Ar string2 )
  86 in the input into a single instance of the character.
  87 This occurs after all deletion and translation is completed.
  88 .El
  89 .Pp
  90 In the first synopsis form, the characters in
  91 .Ar string1
  92 are translated into the characters in
  93 .Ar string2
  94 where the first character in
  95 .Ar string1
  96 is translated into the first character in
  97 .Ar string2
  98 and so on.
  99 If
 100 .Ar string1
 101 is longer than
 102 .Ar string2 ,
 103 the last character found in
 104 .Ar string2
 105 is duplicated until
 106 .Ar string1
 107 is exhausted.
 108 .Pp
 109 In the second synopsis form, the characters in
 110 .Ar string1
 111 are deleted from the input.
 112 .Pp
 113 In the third synopsis form, the characters in
 114 .Ar string1
 115 are compressed as described for the
 116 .Fl s
 117 option.
 118 .Pp
 119 In the fourth synopsis form, the characters in
 120 .Ar string1
 121 are deleted from the input, and the characters in
 122 .Ar string2
 123 are compressed as described for the
 124 .Fl s
 125 option.
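.Pp
As an illustration of the four synopsis forms described above, the following sketch runs each one on a short literal input.
The input strings and the shell variable names are made up for this example, and the results in the comments assume an ASCII locale.

```shell
# Form 1: translate 'l' to '0' and 'o' to '1' (both strings have equal length).
form1=$(printf 'hello world\n' | tr 'lo' '01')      # he001 w1r0d

# Form 2: delete every 'l' and 'o'.
form2=$(printf 'hello world\n' | tr -d 'lo')        # he wrd

# Form 3: squeeze runs of 'a' and runs of 'b'; 'c' is left alone.
form3=$(printf 'aaabbbccc\n' | tr -s 'ab')          # abccc

# Form 4: delete 'a', then squeeze runs of 'c'.
form4=$(printf 'aaabbbccc\n' | tr -ds 'a' 'c')      # bbbc

# -c complements string1: delete everything except 'l', 'o' and newline.
compl=$(printf 'hello world\n' | tr -cd 'lo\n')     # llool

printf '%s\n' "$form1" "$form2" "$form3" "$form4" "$compl"
```

Command substitution strips the trailing newline, so each variable holds only the translated text.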
 126 .Pp
 127 The following conventions can be used in
 128 .Ar string1
 129 and
 130 .Ar string2
 131 to specify sets of characters:
 132 .Bl -tag -width [:equiv:]
 133 .It character
 134 Any character not described by one of the following conventions
 135 represents itself.
 136 .It \eoctal
 137 A backslash followed by 1, 2 or 3 octal digits represents a character
 138 with that encoded value.
 139 To follow an octal sequence with a digit as a character, left zero-pad
 140 the octal sequence to the full 3 octal digits.
 141 .It \echaracter
 142 A backslash followed by certain special characters maps to special
 143 values.
 144 .sp
 145 .Bl -column cc
 146 .It \ea \*[Lt]alert character\*[Gt]
 147 .It \eb \*[Lt]backspace\*[Gt]
 148 .It \ef \*[Lt]form-feed\*[Gt]
 149 .It \en \*[Lt]newline\*[Gt]
 150 .It \er \*[Lt]carriage return\*[Gt]
 151 .It \et \*[Lt]tab\*[Gt]
 152 .It \ev \*[Lt]vertical tab\*[Gt]
 153 .El
 154 .sp
 155 A backslash followed by any other character maps to that character.
 156 .It c-c
 157 Represents the range of characters between the range endpoints, inclusively.
 158 .It [:class:]
 159 Represents all characters belonging to the defined character class.
 160 Class names are:
 161 .sp
 162 .Bl -column xdigit
 163 .It alnum       \*[Lt]alphanumeric characters\*[Gt]
 164 .It alpha       \*[Lt]alphabetic characters\*[Gt]
 165 .It blank       \*[Lt]blank characters\*[Gt]
 166 .It cntrl       \*[Lt]control characters\*[Gt]
 167 .It digit       \*[Lt]numeric characters\*[Gt]
 168 .It graph       \*[Lt]graphic characters\*[Gt]
 169 .It lower       \*[Lt]lower-case alphabetic characters\*[Gt]
 170 .It print       \*[Lt]printable characters\*[Gt]
 171 .It punct       \*[Lt]punctuation characters\*[Gt]
 172 .It space       \*[Lt]space characters\*[Gt]
 173 .It upper       \*[Lt]upper-case alphabetic characters\*[Gt]
 174 .It xdigit      \*[Lt]hexadecimal characters\*[Gt]
 175 .El
 176 .Pp
 177 .\" All classes may be used in
 178 .\" .Ar string1 ,
 179 .\" and in
 180 .\" .Ar string2
 181 .\" when both the
 182 .\" .Fl d
 183 .\" and
 184 .\" .Fl s
 185 .\" options are specified.
 186 .\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
 187 .\" .Ar string2
 188 .\" and then only when the corresponding class (``upper'' for ``lower''
 189 .\" and vice-versa) is specified in the same relative position in
 190 .\" .Ar string1 .
 191 .\" .Pp
 192 With the exception of the
 193 .Dq upper
 194 and
 195 .Dq lower
 196 classes, characters in the classes are in unspecified order.
 197 In the
 198 .Dq upper
 199 and
 200 .Dq lower
 201 classes, characters are entered in ascending order.
 202 .Pp
 203 For specific information as to which ASCII characters are included
 204 in these classes, see
 205 .Xr ctype 3
 206 and related manual pages.
 207 .It [=equiv=]
 208 Represents all characters or collating (sorting) elements belonging to
 209 the same equivalence class as
 210 .Ar equiv .
 211 If there is a secondary ordering within the equivalence class, the
 212 characters are ordered in ascending sequence.
 213 Otherwise, they are ordered according to their encoded values.
 214 An example of an equivalence class might be
 215 .Dq \&c
 216 and
 217 .Dq \&ch
 218 in Spanish;
 219 English has no equivalence classes.
 220 .It [#*n]
 221 Represents
 222 .Ar n
 223 repeated occurrences of the character represented by
 224 .Ar # .
 225 This
 226 expression is only valid when it occurs in
 227 .Ar string2 .
 228 If
 229 .Ar n
 230 is omitted or is zero, it is interpreted as large enough to extend the
 231 .Ar string2
 232 sequence to the length of
 233 .Ar string1 .
 234 If
 235 .Ar n
 236 has a leading zero, it is interpreted as an octal value; otherwise,
 237 it is interpreted as a decimal value.
 238 .El
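.Pp
The conventions listed above can be tried out with a few one-line commands.
This is only a sketch: the sample inputs and variable names are hypothetical, and the results in the comments assume an ASCII locale.

```shell
# \octal: \011 is the tab character; turn tabs into spaces.
octal=$(printf 'a\tb\n' | tr '\011' ' ')                # a b

# c-c: map each letter in a-y to its successor in b-z.
range=$(printf 'hal\n' | tr 'a-y' 'b-z')                # ibm

# [:class:]: delete every digit.
class=$(printf 'a1b22c333\n' | tr -d '[:digit:]')       # abc

# [#*n]: string2 becomes two 'x' characters followed by two 'y' characters.
repeat=$(printf 'abcd\n' | tr 'abcd' '[x*2][y*2]')      # xxyy

printf '%s\n' "$octal" "$range" "$class" "$repeat"
```

The strings are single-quoted so that the shell passes backslashes and brackets to
.Nm
unchanged.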
 239 .Sh EXIT STATUS
 240 .Nm
 241 exits 0 on success, and \*[Gt]0 if an error occurs.
 242 .Sh EXAMPLES
 243 The following examples are shown as given to the shell:
 244 .Pp
 245 Create a list of the words in
 246 .Ar file1 ,
 247 one per line, where a word is taken to be a maximal string of letters:
 248 .sp
 249 .D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q \*[Lt] file1"
 250 .sp
 251 Translate the contents of
 252 .Ar file1
 253 to upper-case:
 254 .sp
 255 .D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q \*[Lt] file1"
 256 .sp
 257 Strip out non-printable characters from
 258 .Ar file1 :
 259 .sp
 260 .D1 Li "tr -cd \*q[:print:]\*q \*[Lt] file1"
 261 .Sh COMPATIBILITY
 262 .At V
 263 has historically implemented character ranges using the syntax
 264 .Dq [c-c]
 265 instead of the
 266 .Dq c-c
 267 used by historic
 268 .Bx
 269 implementations and standardized by POSIX.
 270 .At V
 271 shell scripts should work under this implementation as long as
 272 the range is intended to map to another range, i.e., the command
 273 .Pp
 274 .Ic "tr [a-z] [A-Z]"
 275 .Pp
 276 will work as it will map the
 277 .Sq \&[
 278 character in
 279 .Ar string1
 280 to the
 281 .Sq \&[
 282 character in
 283 .Ar string2 .
 284 However, if the shell script is deleting or squeezing characters as in
 285 the command
 286 .Pp
 287 .Ic "tr -d [a-z]"
 288 .Pp
 289 the characters
 290 .Sq \&[
 291 and
 292 .Sq \&]
 293 will be included in the deletion or compression list, which would
 294 not have happened under an historic
 295 .At V
 296 implementation.
 297 Additionally, any scripts that depended on the sequence
 298 .Dq a-z
 299 to represent the three characters
 300 .Sq \&a ,
 301 .Sq \&- ,
 302 and
 303 .Sq \&z
 304 will have to be rewritten as
 305 .Dq a\e-z .
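.Pp
The two differences described above can be observed directly.
The following is a minimal sketch with made-up inputs and variable names; the results in the comments are what a POSIX-style implementation such as this one is expected to produce.

```shell
# '[a-z]' is the set '[', 'a' through 'z', and ']', so the brackets
# in the input are deleted along with the lower-case letters.
brackets=$(printf '[abc]XYZ\n' | tr -d '[a-z]')     # XYZ

# 'a\-z' is the three characters 'a', '-' and 'z', not a range.
literal=$(printf 'a-z and m\n' | tr -d 'a\-z')      # " nd m" (leading space)

printf '%s\n' "$brackets" "$literal"
```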
 306 .Pp
 307 The
 308 .Nm
 309 utility has historically not permitted the manipulation of NUL bytes in
 310 its input and, additionally, stripped NULs from its input stream.
 311 This implementation has removed this behavior as a bug.
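.Pp
As a sketch of NUL handling, the command below translates NUL bytes to newlines.
The input and the variable name are hypothetical; the comment describes the result for an implementation that passes NUL bytes through to translation, as described above.

```shell
# Turn a NUL-separated list into one item per line.
nul=$(printf 'a\0b\0' | tr '\000' '\n')     # "a" and "b" on separate lines

printf '%s\n' "$nul"
```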
 312 .Pp
 313 The
 314 .Nm
 315 utility has historically been extremely forgiving of syntax errors;
 316 for example, the
 317 .Fl c
 318 and
 319 .Fl s
 320 options were ignored unless two strings were specified.
 321 This implementation will not permit illegal syntax.
 322 .Sh STANDARDS
 323 The
 324 .Nm
 325 utility is expected to be
 326 .St -p1003.2
 327 compatible.
 328 It should be noted that the feature wherein the last character of
 329 .Ar string2
 330 is duplicated if
 331 .Ar string2
 332 has fewer characters than
 333 .Ar string1
 334 is permitted by POSIX but is not required.
 335 Shell scripts attempting to be portable to other POSIX systems should use
 336 the
 337 .Dq [#*]
 338 convention instead of relying on this behavior.
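.Pp
A minimal sketch of the portable spelling follows, using a made-up input and variable name.
The trailing
.Dq [y*]
explicitly extends
.Ar string2
to the length of
.Ar string1 ,
so the result does not depend on implicit duplication of the last character.

```shell
# 'a' maps to 'x'; 'b' through 'e' all map to 'y' via the [y*] repetition.
padded=$(printf 'abcde\n' | tr 'abcde' 'x[y*]')     # xyyyy

printf '%s\n' "$padded"
```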
 339 .Sh BUGS
 340 .Nm
 341 was originally designed to work with
 342 .Tn US-ASCII .
 343 Its use with character sets that do not share all the properties of
 344 .Tn US-ASCII ,
 345 e.g., a symmetric set of upper and lower case characters
 346 that can be algorithmically converted one to the other,
 347 may yield unpredictable results.
 348 .Pp
 349 .Nm
 350 should be internationalized.