   1 '\" t
   2 '\" The line above instructs most `man' programs to invoke tbl
   3 '\"
   4 '\" Separate paragraphs; not the same as PP which resets indent level.
   5 .de SP
   6 .if t .sp .5
   7 .if n .sp
   8 ..
   9 '\"
  10 '\" Replacement em-dash for nroff (default is too short).
  11 .ie n .ds m " -
  12 .el .ds m \(em
  13 '\"
  14 '\" Placeholder macro for if longer nroff arrow is needed.
  15 .ds RA \(->
  16 '\"
  17 '\" Decimal point set slightly raised
  18 .if t .ds d \v'-.15m'.\v'+.15m'
  19 .if n .ds d .
  20 '\"
  21 '\" Enclosure macro for examples
  22 .de EX
  23 .SP
  24 .nf
  25 .ft CW
  26 ..
  27 .de EE
  28 .ft R
  29 .SP
  30 .fi
  31 ..
  32 .TH SoX 1 "February 19, 2011" "sox" "Sound eXchange"
  33 .SH NAME
  34 SoX \- Sound eXchange, the Swiss Army knife of audio manipulation
  35 .SH SYNOPSIS
  36 .nf
  37 \fBsox\fR [\fIglobal-options\fR] [\fIformat-options\fR] \fIinfile1\fR
  38         [[\fIformat-options\fR] \fIinfile2\fR] ... [\fIformat-options\fR] \fIoutfile\fR
  39         [\fIeffect\fR [\fIeffect-options\fR]] ...
  40 .SP
  41 \fBplay\fR [\fIglobal-options\fR] [\fIformat-options\fR] \fIinfile1\fR
  42         [[\fIformat-options\fR] \fIinfile2\fR] ... [\fIformat-options\fR]
  43         [\fIeffect\fR [\fIeffect-options\fR]] ...
  44 .SP
  45 \fBrec\fR [\fIglobal-options\fR] [\fIformat-options\fR] \fIoutfile\fR
  46         [\fIeffect\fR [\fIeffect-options\fR]] ...
  47 .fi
  48 .SH DESCRIPTION
  49 .SS Introduction
  50 SoX reads and writes audio files in most popular formats and can
  51 optionally apply effects to them. It can combine multiple input
  52 sources, synthesise audio, and, on many systems, act as a general
  53 purpose audio player or a multi-track audio recorder. It also has
  54 limited ability to split the input into multiple output files.
  55 .SP
  56 All SoX functionality is available using just the \fBsox\fR command.
  57 To simplify playing and recording audio, if SoX is invoked as
  58 \fBplay\fR, the output file is automatically set to be the default sound
  59 device, and if invoked as \fBrec\fR, the default sound device is used as an
  60 input source.
  61 Additionally, the
  62 .BR soxi (1)
  63 command provides a convenient way to just query audio file header information.
  64 .SP
  65 The heart of SoX is a library called libSoX.  Those interested in
  66 extending SoX or using it in other programs should refer to the libSoX
  67 manual page:
  68 .BR libsox (3).
  69 .SP
  70 SoX is a command-line audio processing tool, particularly suited to making
  71 quick, simple edits and to batch processing.
  72 If you need an interactive, graphical audio editor, use
  73 .BR audacity (1).
  74 .TS
  75 center;
  76 c8 c8 c.
  77 *       *       *
  78 .TE
  79 .DT
  80 .SP
  81 The overall SoX processing chain can be summarised as follows:
  82 .TS
  83 center;
  84 l.
  85 Input(s) \*(RA Combiner \*(RA Effects \*(RA Output(s)
  86 .TE
  87 .DT
  88 .SP
   89 Note, however, that on the SoX command line, the positions of the
  90 Output(s) and the Effects are swapped w.r.t. the logical flow just
  91 shown.  Note also that whilst options pertaining to files are placed
  92 before their respective file name, the opposite is true for effects.
  93 To show how this works in practice, here is a selection of examples of
  94 how SoX might be used.  The simple
  95 .EX
  96    sox recital.au recital.wav
  97 .EE
  98 translates an audio file in Sun AU format to a Microsoft WAV file, whilst
  99 .EX
 100    sox recital.au \-b 16 recital.wav channels 1 rate 16k fade 3 norm
 101 .EE
 102 performs the same format translation, but also applies four effects
  103 (down-mix to one channel, sample rate change, fade-in, normalize),
 104 and stores the result at a bit-depth of 16.
 105 .EX
 106    sox \-r 16k \-e signed \-b 8 \-c 1 voice-memo.raw voice-memo.wav
 107 .EE
 108 converts `raw' (a.k.a. `headerless') audio to a self-describing file format,
 109 .EX
 110    sox slow.aiff fixed.aiff speed 1.027
 111 .EE
 112 adjusts audio speed,
 113 .EX
 114    sox short.wav long.wav longer.wav
 115 .EE
 116 concatenates two audio files, and
 117 .EX
 118    sox \-m music.mp3 voice.wav mixed.flac
 119 .EE
 120 mixes together two audio files.
 121 .EX
 122    play \(dqThe Moonbeams/Greatest/*.ogg\(dq bass +3
 123 .EE
 124 plays a collection of audio files whilst applying a bass boosting effect,
 125 .EX
 126    play \-n \-c1 synth sin %\-12 sin %\-9 sin %\-5 sin %\-2 fade h 0.1 1 0.1
 127 .EE
 128 plays a synthesised `A minor seventh' chord with a pipe-organ sound,
 129 .EX
 130    rec \-c 2 radio.aiff trim 0 30:00
 131 .EE
 132 records half an hour of stereo audio, and
 133 .EX
 134    play \-q take1.aiff & rec \-M take1.aiff take1\-dub.aiff
 135 .EE
 136 (with POSIX shell and where supported by hardware)
 137 records a new track in a multi-track recording.  Finally,
 138 .EX
 139 .ne 3
 140    rec \-r 44100 \-b 16 \-s \-p silence 1 0.50 0.1% 1 10:00 0.1% | \\
 141         sox \-p song.ogg silence 1 0.50 0.1% 1 2.0 0.1% : \\
 142         newfile : restart
 143 .EE
  144 records a stream of audio such as LP/cassette and splits it into multiple
 145 audio files at points with 2 seconds of silence.  Also, it does not start
 146 recording until it detects audio is playing and stops after it sees
 147 10 minutes of silence.
 148 .SP
 149 N.B.  The above is just an overview of SoX's capabilities; detailed
 150 explanations of how to use \fIall\fR SoX parameters, file formats, and
 151 effects can be found below in this manual, in
 152 .BR soxformat (7),
 153 and in
 154 .BR soxi (1).
 155 .SS File Format Types
 156 SoX can work with `self-describing' and `raw' audio files.
  157 `Self-describing' formats (e.g. WAV, FLAC, MP3) have a header that
  158 completely describes the signal and encoding attributes of the audio
  159 data that follows. `Raw' or `headerless' formats do not contain this
 160 information, so the audio characteristics of these must be described
 161 on the SoX command line or inferred from those of the input file.
 162 .SP
 163 The following four characteristics are used to describe the format of
 164 audio data such that it can be processed with SoX:
 165 .TP
 166 sample rate
 167 The sample rate in samples per second (`Hertz' or `Hz').
 168 Digital telephony traditionally uses a sample rate of 8000\ Hz (8\ kHz),
 169 though these days, 16 and even 32\ kHz are becoming more common. Audio
 170 Compact Discs use 44100\ Hz (44\*d1\ kHz). Digital Audio Tape and many
 171 computer systems use 48\ kHz. Professional audio systems often use 96
 172 kHz.
 173 .TP
 174 sample size
 175 The number of bits used to store each sample.  Today, 16-bit is
 176 commonly used. 8-bit was popular in the early days of computer
 177 audio. 24-bit is used in the professional audio arena. Other sizes are
 178 also used.
 179 .TP
 180 data encoding
 181 The way in which each audio sample is represented (or `encoded').  Some
 182 encodings have variants with different byte-orderings or bit-orderings.
 183 Some compress the audio data so that the stored audio data takes up less
 184 space (i.e. disk space or transmission bandwidth) than the other format
 185 parameters and the number of samples would imply.  Commonly-used
 186 encoding types include floating-point, \(*m-law, ADPCM, signed-integer
 187 PCM, MP3, and FLAC.
 188 .TP
 189 channels
 190 The number of audio channels contained in the file.  One (`mono') and
 191 two (`stereo') are widely used.  `Surround sound' audio typically
 192 contains six or more channels.
 193 .PP
 194 The term `bit-rate' is a measure of the amount of storage occupied by an
 195 encoded audio signal over a unit of time.  It can depend on all of the
 196 above and is typically denoted as a number of kilo-bits per second
 197 (kbps).  An A-law telephony signal has a bit-rate of 64 kbps. MP3-encoded
 198 stereo music typically has a bit-rate of 128\-196 kbps. FLAC-encoded
 199 stereo music typically has a bit-rate of 550\-760 kbps.
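.SP
For uncompressed audio, the bit-rate follows directly from the first three
characteristics listed above: sample rate multiplied by sample size multiplied
by the number of channels.  The following is a minimal sketch of that
arithmetic in POSIX shell; the helper name bitrate_kbps is hypothetical and
is not part of SoX.

```shell
# bitrate_kbps RATE_HZ BITS CHANNELS
# Prints the uncompressed bit-rate in whole kbps (1 kbps = 1000 bits/s).
bitrate_kbps() {
    echo $(( $1 * $2 * $3 / 1000 ))
}

bitrate_kbps 8000 8 1     # A-law telephony: 64
bitrate_kbps 44100 16 2   # CD audio: 1411 (1411.2 before truncation)
```

Compressed encodings such as MP3 and FLAC store fewer bits than this product
suggests, which is why their typical bit-rates quoted above are lower.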
 200 .SP
 201 Most self-describing formats also allow textual `comments' to be
 202 embedded in the file that can be used to describe the audio in some way,
 203 e.g. for music, the title, the author, etc.
 204 .SP
 205 One important use of audio file comments is to convey `Replay Gain'
 206 information.  SoX supports applying Replay Gain information, but not
 207 generating it.  Note that by default, SoX copies input file comments
 208 to output files that support comments, so output files may contain
 209 Replay Gain information if some was present in the input file.  In this
 210 case, if anything other than a simple format conversion was performed
 211 then the output file Replay Gain information is likely to be incorrect
 212 and so should be recalculated using a tool that supports this (not SoX).
 213 .SP
 214 The
 215 .BR soxi (1)
 216 command can be used to display information from audio file headers.
 217 .SS Determining & Setting The File Format
 218 There are several mechanisms available for SoX to use to determine or set the
 219 format characteristics of an audio file.  Depending on the circumstances,
 220 individual characteristics may be determined or set using different mechanisms.
 221 .SP
 222 To determine the format of an input file, SoX will use, in order of
 223 precedence and as given or available:
 224 .IP 1. 4
 225 Command-line format options.
 226 .IP 2. 4
 227 The contents of the file header.
 228 .IP 3. 4
 229 The filename extension.
 230 .PP
 231 To set the output file format, SoX will use, in order of
 232 precedence and as given or available:
 233 .IP 1. 4
 234 Command-line format options.
 235 .IP 2. 4
 236 The filename extension.
 237 .IP 3. 4
 238 The input file format characteristics, or the closest
 239 that is supported by the output file type.
 240 .PP
 241 For all files, SoX will exit with an error
 242 if the file type cannot be determined. Command-line format options may
 243 need to be added or changed to resolve the problem.
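.SP
As an illustration of the first rule in each list, a headerless file with an
uninformative extension can be described entirely with command-line format
options.  The sketch below only prints the command so that it can be inspected
before being run; the helper name raw_to_wav_cmd and the file names are
hypothetical, and 16-bit signed-integer samples are assumed.

```shell
# raw_to_wav_cmd RATE CHANNELS INFILE
# Prints a sox command that describes a headerless input with format options
# (-t, -r, -e, -b, -c) placed before the input file name; the output format
# is then taken from the .wav extension.
raw_to_wav_cmd() {
    echo "sox -t raw -r $1 -e signed -b 16 -c $2 $3 ${3%.*}.wav"
}

raw_to_wav_cmd 8k 1 capture.dat
```

The printed line is
`sox -t raw -r 8k -e signed -b 16 -c 1 capture.dat capture.wav'.
File names containing spaces would need quoting before the command is run.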
 244 .SS Playing & Recording Audio
 245 The
 246 .B play
 247 and
 248 .B rec
 249 commands are provided so that basic playing and
 250 recording is as simple as
 251 .EX
 252    play existing-file.wav
 253 .EE
 254 and
 255 .EX
 256    rec new-file.wav
 257 .EE
 258 These two commands are functionally equivalent to
 259 .EX
 260    sox existing-file.wav \-d
 261 .EE
 262 and
 263 .EX
 264    sox \-d new-file.wav
 265 .EE
 266 Of course, further options and effects (as described below) can be
 267 added to the commands in either form.
 268 .TS
 269 center;
 270 c8 c8 c.
 271 *       *       *
 272 .TE
 273 .DT
 274 .SP
 275 Some systems provide more than one type of (SoX-compatible) audio
 276 driver, e.g. ALSA & OSS, or SUNAU & AO.
 277 Systems can also have more than one audio device (a.k.a. `sound card').
  278 If more than one audio driver has been
  279 built into SoX, and the default selected by SoX when recording or playing
 280 is not the one that is wanted, then the
 281 .B AUDIODRIVER
 282 environment variable can be used to override the default.  For example
 283 (on many systems):
 284 .EX
 285    set AUDIODRIVER=oss
 286    play ...
 287 .EE
 288 The
 289 .B AUDIODEV
 290 environment variable can be used to override the default audio device,
 291 e.g.
 292 .EX
 293    set AUDIODEV=/dev/dsp2
 294    play ...
 295    sox ... \-t oss
 296 .EE
 297 or
 298 .EX
 299    set AUDIODEV=hw:soundwave,1,2
 300    play ...
 301    sox ... \-t alsa
 302 .EE
 303 Note that the way of setting environment variables varies from system
 304 to system\*mfor some specific examples, see `SOX_OPTS' below.
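.SP
In a POSIX shell, one possible approach is to set the variables for a single
command only by writing the assignments in front of it.  The sketch below
demonstrates the mechanism with a stand-in command, so that it can be tried
without an audio device; the driver and device values are the example values
used above and may not exist on a given system.

```shell
# Variables written before a command are placed in that command's
# environment only. `sh -c ...` stands in for `play` here so that the
# effect is visible without an audio device.
AUDIODRIVER=oss AUDIODEV=/dev/dsp2 \
    sh -c 'echo "driver=$AUDIODRIVER device=$AUDIODEV"'
```

Replacing the stand-in with a play command would apply the same settings to
that one invocation and leave the rest of the shell session unchanged.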
 305 .SP
 306 When playing a file with a sample rate that is not supported by the
 307 audio output device, SoX will automatically invoke the \fBrate\fR effect
 308 to perform the necessary sample rate conversion.  For
 309 compatibility with old hardware, the
 310 default \fBrate\fR quality level is set to `low'. This
 311 can be changed by explicitly specifying the \fBrate\fR
 312 effect with a different quality level, e.g.
 313 .EX
 314    play ... rate \-m
 315 .EE
 316 or by using the
 317 .B \-\-play\-rate\-arg
 318 option (see below).
 319 .TS
 320 center;
 321 c8 c8 c.
 322 *       *       *
 323 .TE
 324 .DT
 325 .SP
 326 On some systems, SoX allows audio playback volume to be adjusted whilst
 327 using
 328 .BR play .
 329 Where supported, this is achieved by tapping the `v' & `V' keys during
 330 playback.
 331 .SP
 332 To help with setting a suitable recording level, SoX includes a peak-level
 333 meter which can be invoked (before making the actual recording) as follows:
 334 .EX
 335    rec \-n
 336 .EE
 337 The recording level should be adjusted (using the system-provided mixer
 338 program, not SoX) so that the meter is \fIat most occasionally\fR full
 339 scale, and never `in the red' (an exclamation mark is shown).
 340 See also \fB\-S\fR below.
 341 .SS Accuracy
 342 Many file formats that compress audio discard some of the audio signal
 343 information whilst doing so. Converting to such a format and then converting
 344 back again will not produce an exact copy of the original audio.  This
 345 is the case for many formats used in telephony (e.g. A-law, GSM) where
 346 low signal bandwidth is more important than high audio fidelity, and for
 347 many formats used in portable music players (e.g. MP3, Vorbis) where
 348 adequate fidelity can be retained even with the large compression ratios
 349 that are needed to make portable players practical.
 350 .SP
 351 Formats that discard audio signal information are called `lossy'.
 352 Formats that do not are called `lossless'.  The term `quality' is used as a
 353 measure of how closely the original audio signal can be reproduced when
 354 using a lossy format.
 355 .SP
 356 Audio file conversion with SoX is lossless when it can be, i.e. when not
 357 using lossy compression, when not reducing the sampling rate or number
 358 of channels, and when the number of bits used in the destination format
 359 is not less than in the source format.  E.g.  converting from an 8-bit
 360 PCM format to a 16-bit PCM format is lossless but converting from an
 361 8-bit PCM format to (8-bit) A-law isn't.
 362 .SP
 363 .B N.B.
 364 SoX converts all audio files to an internal uncompressed
 365 format before performing any audio processing. This means that
 366 manipulating a file that is stored in a lossy format can cause further
 367 losses in audio fidelity.  E.g. with
 368 .EX
 369    sox long.mp3 short.mp3 trim 10
 370 .EE
 371 SoX first decompresses the input MP3 file, then applies the
 372 .B trim
 373 effect, and finally creates the output MP3 file by re-compressing the
 374 audio\*mwith a possible reduction in fidelity above that which
 375 occurred when the input file was created.
 376 Hence, if what is ultimately desired is lossily compressed audio, it is
 377 highly recommended to perform all audio processing using lossless file
 378 formats and then convert to the lossy format only at the final stage.
 379 .SP
 380 .B N.B.
 381 Applying multiple effects with a single SoX invocation will,
 382 in general, produce more accurate results than those produced using
 383 multiple SoX invocations.
 384 .SS Dithering
 385 Dithering is a technique used to maximise the dynamic range of audio
 386 stored at a particular bit-depth. Any distortion introduced by
 387 quantisation is decorrelated by adding a small amount of white noise
 388 to the signal.  In most cases, SoX can determine whether the selected
 389 processing requires dither and will add it during output formatting if
 390 appropriate.
 391 .SP
 392 Specifically, by default, SoX automatically adds TPDF dither
 393 when the output bit-depth is less than 24 and any
 394 of the following are true:
 395 .IP \(bu 4
 396 bit-depth reduction has been specified explicitly using a command-line
 397 option
 398 .IP \(bu 4
 399 the output file format supports only bit-depths lower than that of the
 400 input file format
 401 .IP \(bu 4
 402 an effect has increased effective bit-depth within the internal
 403 processing chain
 404 .PP
 405 For example, adjusting volume with
 406 .B vol 0.25
 407 requires two additional bits in which to losslessly store its results
 408 (since 0\*d25 decimal equals 0\*d01 binary).  So if the input file
 409 bit-depth is 16, then SoX's internal representation will utilise 18
 410 bits after processing this volume change.  In order to store the
 411 output at the same depth as the input, dithering is used to remove the
 412 additional bits.
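.SP
The bit count in this example can be checked with a short calculation: a
factor of one quarter is 2 to the power of minus 2, so two extra bits are
needed.  The sketch below assumes the factor is an exact power of two, and the
helper name extra_bits is hypothetical.

```shell
# extra_bits FACTOR: bits needed below the original least significant bit to
# hold the result of multiplying by FACTOR exactly (FACTOR is assumed to be
# a power of two such as 0.5 or 0.25).
extra_bits() {
    awk -v f="$1" 'BEGIN { printf "%d\n", log(1 / f) / log(2) + 0.5 }'
}

extra=$(extra_bits 0.25)
echo "vol 0.25 on 16-bit input: $((16 + extra)) bits internally"
```

This prints `vol 0.25 on 16-bit input: 18 bits internally', matching the
figure given above.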
 413 .SP
 414 Use the
 415 .B \-V
 416 option to see what processing SoX has automatically added. The
 417 .B \-D
 418 option may be given to override automatic dithering.  To invoke
 419 dithering manually (e.g. to select a noise-shaping curve), see the
 420 .B dither
 421 effect.
 422 .SS Clipping
 423 Clipping is distortion that occurs when an audio signal level (or
 424 `volume') exceeds the range of the chosen representation.  In most
 425 cases, clipping is undesirable and so should be corrected by adjusting
 426 the level prior to the point (in the processing chain) at which it
 427 occurs.
 428 .SP
 429 In SoX, clipping could occur, as you might expect, when using the
 430 .B vol
 431 or
 432 .B gain
 433 effects to increase the audio volume. Clipping could also occur with many
 434 other effects, when converting one format to another, and even when
 435 simply playing the audio.
 436 .SP
 437 Playing an audio file often involves resampling, and processing by
 438 analogue components can introduce a small DC offset and/or
 439 amplification, all of which can produce distortion if the audio signal
 440 level was initially too close to the clipping point.
 441 .SP
 442 For these reasons, it is usual to make sure that an audio
 443 file's signal level has some `headroom', i.e. it does not exceed a particular
 444 level below the maximum possible level for the given representation.
 445 Some standards bodies recommend as much as 9dB headroom, but in most cases,
 446 3dB (\(~~ 70% linear) is enough.  Note that this wisdom
 447 seems to have been lost in modern music production; in fact, many CDs,
 448 MP3s, etc.  are now mastered at levels \fIabove\fR 0dBFS i.e. the
 449 audio is clipped as delivered.
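.SP
The relationship between headroom in dB and linear amplitude can be checked
with the usual conversion, ratio = 10^(dB/20).  The following is a minimal
sketch using awk; the helper name db_to_linear is hypothetical.

```shell
# db_to_linear DB: convert a level in dB to a linear amplitude ratio,
# using ratio = 10^(DB/20).
db_to_linear() {
    awk -v db="$1" 'BEGIN { printf "%.3f\n", exp(db / 20 * log(10)) }'
}

db_to_linear -3   # 0.708, i.e. roughly 70% of full scale
db_to_linear -9   # 0.355
```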
 450 .SP
 451 SoX's
 452 .B stat
 453 and
 454 .B stats
 455 effects can assist in determining the signal level in an audio file. The
 456 .B gain
 457 or
 458 .B vol
 459 effect can be used to prevent clipping, e.g.
 460 .EX
 461    sox dull.wav bright.wav gain \-6 treble +6
 462 .EE
 463 guarantees that the treble boost will not clip.
 464 .SP
 465 If clipping occurs at any point during processing,
 466 SoX will display a warning message to that effect.
 467 .SP
 468 See also
 469 .B \-G
 470 and the
 471 .B gain
 472 and
 473 .B norm
 474 effects.
 475 .SS Input File Combining
 476 SoX's input combiner can be configured (see OPTIONS below) to
 477 combine multiple files using any of the
 478 following methods: `concatenate', `sequence', `mix', `mix-power',
 479 `merge', or `multiply'.
 480 The default method is `sequence' for
 481 .BR play ,
 482 and `concatenate' for
 483 .B rec
 484 and
 485 .BR sox .
 486 .SP
 487 For all methods other than `sequence', multiple input files must have
 488 the same sampling rate. If necessary, separate SoX invocations can be
 489 used to make sampling rate adjustments prior to combining.
 490 .SP
 491 If the `concatenate' combining method is selected (usually, this will be
 492 by default) then the input files must also have the same number of
 493 channels.  The audio from each input will be concatenated in the order
 494 given to form the output file.
 495 .SP
 496 The `sequence' combining method is selected automatically for
 497 .BR play .
 498 It is similar to `concatenate' in that the audio from each input file is
 499 sent serially to the output file. However, here the output file may be
 500 closed and reopened at the corresponding transition between input
 501 files. This may be just what is needed when sending different types of
 502 audio to an output device, but is not generally useful when the output is a
 503 normal file.
 504 .SP
 505 If either the `mix' or `mix-power' combining method is selected then two or
 506 more input files must be given and will be mixed together to form the
 507 output file.  The number of channels in each input file need not be the
 508 same, but SoX will issue a warning if they are not and some
 509 channels in the output file will not contain audio from every input
 510 file.  A mixed audio file cannot be un-mixed without reference to the
 511 original input files.
 512 .SP
 513 If the `merge' combining method is selected then two or
 514 more input files must be given and will be merged together to form the
 515 output file.  The number of channels in each input file need not be the
 516 same.  A merged audio file comprises all of the channels from all of the
 517 input files. Un-merging is possible using multiple
 518 invocations of SoX with the
 519 .B remix
 520 effect.
 521 For example, two mono files could be merged to form one stereo file. The
 522 first and second mono files would become the left and right channels of
 523 the stereo file.
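.SP
The merge and un-merge steps just described might look as follows.  This is a
sketch that only prints the commands; the helper name unmerge_cmds and all
file names are hypothetical.

```shell
# Merge two mono files into one stereo file (-M selects `merge').
echo "sox -M left.wav right.wav stereo.wav"

# unmerge_cmds NAME...: print one sox invocation per channel of stereo.wav,
# selecting channel N with `remix N'.
unmerge_cmds() {
    n=1
    for name in "$@"; do
        echo "sox stereo.wav $name.wav remix $n"
        n=$((n + 1))
    done
}

unmerge_cmds left right
```

Each printed un-merge command extracts one channel, so a file with more
channels needs correspondingly more invocations.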
 524 .SP
 525 The `multiply' combining method multiplies the sample values of
 526 corresponding channels (treated as numbers in the interval \-1 to +1).
 527 If the number of channels in the input files is not the same, the
 528 missing channels are considered to contain all zero.
 529 .SP
 530 When combining input files, SoX applies any specified effects
 531 (including, for example, the
 532 .B vol
 533 volume adjustment effect) after the audio has been combined. However, it
 534 is often useful to be able to set the volume of (i.e. `balance') the
 535 inputs individually, before combining takes place.
 536 .SP
 537 For all combining methods, input
 538 file volume adjustments can be made manually using the
 539 .B \-v
 540 option (below) which can be given for one or more input files. If it is
 541 given for only some of the input files then the others receive no volume
 542 adjustment.  In some circumstances, automatic volume
 543 adjustments may be applied (see below).
 544 .SP
 545 The \fB\-V\fR option (below) can be used to show the input file volume
 546 adjustments that have been selected (either manually or automatically).
 547 .SP
  548 There are some special considerations that need to be made when mixing
 549 input files:
 550 .SP
 551 Unlike the other methods, `mix' combining has the
 552 potential to cause clipping in the combiner if no balancing is
 553 performed.  In this case, if manual volume adjustments are not given,
 554 SoX will try to ensure that clipping does not occur by automatically
 555 adjusting the
 556 volume (amplitude) of each input signal by a factor of \(S1/\s-2n\s+2,
 557 where n is the number of input files.  If this results in audio that is
 558 too quiet or otherwise unbalanced then the input file volumes can be
 559 set manually as described above. Using the
 560 .B norm
 561 effect on the mix is another alternative.
 562 .SP
 563 If mixed audio seems loud enough at some points but
 564 too quiet in others then dynamic range compression should be applied to
 565 correct this\*msee the
 566 .B compand
 567 effect.
 568 .SP
 569 With the `mix-power' combine method, the
 570 mixed volume is approximately equal to that of one of the input signals.
 571 This is achieved by balancing using a factor of
 572 \(S1/\s-2\(srn\s+2 instead of \(S1/\s-2n\s+2.
 573 Note that this balancing factor does not guarantee that clipping will not occur,
 574 but the number of clips will usually be low and the resultant
 575 distortion is generally imperceptible.
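.SP
To make the two balancing factors concrete, the sketch below evaluates 1/n and
1/sqrt(n) for a given number of input files.  The helper name mix_factors is
hypothetical.

```shell
# mix_factors N: print the per-input amplitude factor used for balancing
# N inputs: 1/N for `mix' and 1/sqrt(N) for `mix-power'.
mix_factors() {
    awk -v n="$1" 'BEGIN { printf "mix=%.3f mix-power=%.3f\n", 1 / n, 1 / sqrt(n) }'
}

mix_factors 2   # mix=0.500 mix-power=0.707
mix_factors 4   # mix=0.250 mix-power=0.500
```

With four inputs, `mix' attenuates each one to a quarter of its amplitude,
whereas `mix-power' only halves it, which is why the latter can still clip.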
 576 .SS Output Files
 577 SoX's default behaviour is to take one or more input files and
 578 write them to a single output file.
 579
 580 This behaviour can be changed by specifying the pseudo-effect `newfile'
 581 within the effects list.  SoX will then enter multiple output mode.
 582
 583 In multiple output mode, a new file is created when the effects
 584 prior to the `newfile' indicate they are done.
 585 The effects chain listed after `newfile'
 586 is then started up and its output is saved to the new file.
 587
 588 In multiple output mode, a unique number will automatically be appended
 589 to the end of all filenames.  If the filename has an extension
 590 then the number is inserted before the extension.  This behaviour can
 591 be customized by placing a %n anywhere in the filename where the
 592 number should be substituted.  An optional number can be placed after
 593 the % to indicate a minimum fixed width for the number.
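
As a rough illustration of the substitution, the sketch below emulates in
shell the names that would be expected for a `%1n' marker.  It assumes that
numbering starts at 1 and does not model the default (unmarked) case; the
helper name numbered_name is hypothetical.

```shell
# numbered_name PATTERN COUNT: substitute COUNT for a `%1n' marker.
numbered_name() {
    echo "$1" | sed "s/%1n/$2/"
}

numbered_name 'ringtone%1n.wav' 1   # ringtone1.wav
numbered_name 'ringtone%1n.wav' 2   # ringtone2.wav
```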
 594
 595 Multiple output mode is not very useful unless an effect that will
 596 stop the effects chain early is
 597 specified before the `newfile'. If end of file is
 598 reached before the effects chain stops itself then no new file
 599 will be created as it would be empty.
 600
 601 The following is an example of splitting the first 60 seconds of an input
 602 file into two 30 second files and ignoring the rest.
 603 .EX
  604    sox song.wav ringtone%1n.wav trim 0 30 : newfile : trim 0 30
     .EE
 605 .SS Stopping SoX
 606 Usually SoX will complete its processing and exit automatically once
 607 it has read all available audio data from the input files.
 608 .SP
 609 If desired, it can be terminated earlier by sending an
 610 interrupt signal to the process (usually by pressing the
 611 keyboard interrupt key which is normally Ctrl-C).  This is a natural requirement
 612 in some circumstances, e.g. when using SoX to make a recording.  Note
 613 that when using SoX to play multiple files, Ctrl-C behaves slightly
 614 differently: pressing it once causes SoX to skip to the next file;
 615 pressing it twice in quick succession causes SoX to exit.
 616 .SP
 617 Another option to stop processing early is to use an effect that
 618 has a time period or sample count to determine the stopping
 619 point. The trim effect is an example of this.  Once all
 620 effects chains have stopped then SoX will also stop.
 621 .SH FILENAMES
 622 Filenames can be simple file names, absolute or relative path names,
 623 or URLs (input files only).  Note that URL support requires that
 624 .BR wget (1)
 625 is available.
 626 .SP
 627 Note:
 628 Giving SoX an input or output filename that is the same as a SoX
 629 effect-name will not work since SoX will treat it as an effect
 630 specification.  The only work-around to this is to avoid such
 631 filenames. This is generally not difficult since most audio
 632 filenames have a filename `extension', whilst effect-names do not.
 633 .SS Special Filenames
 634 The following special filenames may be used in certain circumstances
 635 in place of a normal filename on the command line:
 636 .TP
 637 \fB\-\fR
 638 SoX can be used in simple pipeline operations by using the special
 639 filename `\-' which,
 640 if used as an input filename, will cause
  641 SoX to read audio data from `standard input' (stdin),
 642 and which,
 643 if used as the output filename, will cause
  644 SoX to send audio data to `standard output' (stdout).
 645 Note that when using this option for the output file, and sometimes
 646 when using it for an input file, the file-type (see
 647 .B \-t
 648 below) must also be given.
 649 .TP
 650 \fB\(dq\^|\^\fIprogram \fR[\fIoptions\fR] ...\fB\(dq\fR
  651 This can be used in place of an input filename to specify that
  652 the given program's standard output (stdout) be used as an input file.
 653 Unlike
 654 .B \-
 655 (above), this can be used for several inputs to one SoX command.  For
 656 example, if `genw' generates mono WAV formatted signals to its
 657 standard output, then the following command makes a stereo file
 658 from two generated signals:
 659 .EX
 660    sox \-M "|genw \-\-imd \-" "|genw \-\-thd \-" out.wav
 661 .EE
 662 For headerless (raw) audio,
 663 .B \-t
 664 (and perhaps other format options) will need to be given, preceding the input
 665 command.
 666 .TP
 667 \fB\(dq\fIwildcard-filename\fB\(dq\fR
 668 Specifies that filename `globbing' (wild-card matching) should be performed
 669 by SoX instead of by the shell.  This allows a single set of file options to be
 670 applied to a group of files.  For example, if the current directory contains
 671 three `vox' files, file1.vox, file2.vox, and file3.vox, then
 672 .EX
 673    play \-\-rate 6k *.vox
 674 .EE
 675 will be expanded by the `shell' (in most environments) to
 676 .EX
 677    play \-\-rate 6k file1.vox file2.vox file3.vox
 678 .EE
 679 which will treat only the first vox file as having a sample rate of 6k.
 680 With
 681 .EX
 682    play \-\-rate 6k "*.vox"
 683 .EE
 684 the given sample rate option will be applied to all three vox files.
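.SP
The difference between the quoted and unquoted forms comes from the shell
rather than from SoX, and it can be observed without SoX at all.  The sketch
below creates three empty files in a temporary directory and counts the
arguments that a command would receive in each case; the helper name
count_args is hypothetical.

```shell
# count_args prints how many arguments it received.
count_args() { echo "$#"; }

dir=$(mktemp -d)
touch "$dir/file1.vox" "$dir/file2.vox" "$dir/file3.vox"
cd "$dir" || exit 1

unquoted=$(count_args *.vox)     # the shell expands the pattern: 3 arguments
quoted=$(count_args "*.vox")     # the pattern is passed through intact: 1 argument
echo "unquoted=$unquoted quoted=$quoted"

cd - >/dev/null && rm -r "$dir"
```

In the quoted case the command receives the pattern itself as a single
argument, which is what allows SoX to do the matching.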
 685 .TP
 686 \fB\-p\fR, \fB\-\-sox\-pipe\fR
 687 This can be used in place of an output filename to specify that
  688 the SoX command should be used as an input pipe to another SoX command.
 689 For example, the command:
 690 .EX
 691    play "|sox \-n \-p synth 2" "|sox \-n \-p synth 2 tremolo 10" stat
 692 .EE
 693 plays two `files' in succession, each with different effects.
 694 .SP
 695 .B \-p
 696 is in fact an alias for `\fB\-t sox \-\fR'.
 697 .TP
 698 \fB\-d\fR, \fB\-\-default\-device\fR
 699 This can be used in place of an input or output filename to specify that
 700 the default audio device (if one has been built into SoX) is to be used.
 701 This is akin to invoking
 702 .B rec
 703 or
 704 .B play
 705 (as described above).
 706 .TP
 707 \fB\-n\fR, \fB\-\-null\fR
 708 This can be used in place of an input or output filename to specify that
 709 a `null file' is to be used.  Note that here, `null file' refers to a
 710 SoX-specific mechanism and is not related to any operating-system
 711 mechanism with a similar name.
 712 .SP
 713 Using a null file to input audio is equivalent to
 714 using a normal audio file that contains an infinite amount
 715 of silence, and as such is not generally useful unless used
 716 with an effect that specifies a finite time length
 717 (such as \fBtrim\fR or \fBsynth\fR).
 718 .SP
 719 Using a null file to output audio amounts to discarding the audio
 720 and is useful mainly with effects that produce information about the
 721 audio instead of affecting it (such as \fBnoiseprof\fR or \fBstat\fR).
 722 .SP
 723 The sampling rate associated with a null file
 724 is by default 48\ kHz, but, as with a normal
 725 file, this can be overridden if desired using command-line format
 726 options (see below).
 727 .SS Supported File & Audio Device Types
 728 See
 729 .BR soxformat (7)
 730 for a list and description of the supported file formats and audio device
 731 drivers.
 732 .SH OPTIONS
 733 .SS Global Options
 734 These options can be specified on the command line at any point
 735 before the first effect name.
 736 .SP
 737 The
 738 .B SOX_OPTS
 739 environment variable can be used to provide alternative default values for
 740 SoX's global options.
 741 For example:
 742 .EX
 743    SOX_OPTS="\-\-buffer 20000 \-\-play\-rate\-arg \-hs \-\-temp /mnt/temp"
 744 .EE
 745 Note that setting SOX_OPTS can potentially create unwanted changes in
 746 the behaviour of scripts or other programs that invoke SoX.  SOX_OPTS
 747 might best be used for things (such as in the given example) that reflect the
 748 environment in which SoX is being run.  Enabling options such as
 749 .B \-\-no\-clobber
 750 as default might be handled better using a shell alias
 751 since a shell alias will not affect operation in scripts etc.
 752 .SP
 753 One way to ensure that a script cannot be affected by SOX_OPTS is to
 754 clear SOX_OPTS at the start of the script, but this of course loses
 755 the benefit of SOX_OPTS carrying some system-wide default options.  An
 756 alternative approach is to explicitly invoke SoX with default
 757 option values, e.g.
 758 .EX
 759    SOX_OPTS="\-V \-\-no-clobber"
 760    ...
 761    sox \-V2 \-\-clobber $input $output ...
 762 .EE
 763 Note that the way to set environment variables varies from system
 764 to system. Here are some examples:
 765 .SP
 766 Unix bash:
 767 .EX
 768    export SOX_OPTS="\-V \-\-no-clobber"
 769 .EE
 770 Unix csh:
 771 .EX
 772    setenv SOX_OPTS "\-V \-\-no-clobber"
 773 .EE
 774 MS-DOS/MS-Windows:
 775 .EX
 776    set SOX_OPTS=\-V \-\-no-clobber
 777 .EE
 778 MS-Windows GUI: via Control Panel : System : Advanced : Environment
 779 Variables
 780 .SP
 781 Mac OS X GUI: Refer to Apple's Technical Q&A QA1067 document.
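The "clear SOX_OPTS at the start of the script" approach described above can be sketched for a Python script that launches SoX as a child process. The helper name `clean_sox_env` is hypothetical, and the snippet only builds the environment; it does not run SoX itself.

```python
import os


def clean_sox_env(base_env=None):
    """Return a copy of the environment with SOX_OPTS removed.

    base_env defaults to the current process environment; the
    original mapping is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.pop("SOX_OPTS", None)  # no error if the variable is not set
    return env


# Illustration with a made-up environment; the result could be passed
# as env= to subprocess.run() when invoking sox.
example = clean_sox_env({"PATH": "/usr/bin", "SOX_OPTS": "-V --no-clobber"})
print(example)
```

As noted above, this trades away any system-wide defaults carried in SOX_OPTS, so passing explicit option values on the command line may be preferable.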
 782 .TP
 783 \fB\-\-buffer\fR \fBBYTES\fR, \fB\-\-input\-buffer\fR \fBBYTES\fR
 784 Set the size in bytes of the buffers used for processing audio (default 8192).
 785 .B \-\-buffer
 786 applies to input, effects, and output processing;
 787 .B \-\-input\-buffer
 788 applies only to input processing (for which it overrides
 789 .B \-\-buffer
 790 if both are given).
 791 .SP
 792 Be aware that large values for
 793 .B \-\-buffer
 794 will cause SoX to become slow to respond to requests to terminate or to skip
 795 the current input file.
 796 .TP
 797 \fB\-\-clobber\fR
 798 Don't prompt before overwriting an existing file with the same name as that
 799 given for the output file.  This is the default behaviour.
 800 .TP
 801 \fB\-\-combine concatenate\fR\^|\^\fBmerge\fR\^|\^\fBmix\fR\^|\^\fBmix\-power\fR\^|\^\fBmultiply\fR\^|\^\fBsequence\fR
 802 Select the input file combining method;
 803 for some of these, short options are available:
 804 .B \-m
 805 selects `mix',
 806 .B \-M
 807 selects `merge', and
 808 .B \-T
 809 selects `multiply'.
 810 .SP
 811 See \fBInput File Combining\fR above for a description of the different
 812 combining methods.
 813 .TP
 814 \fB\-D\fR, \fB\-\-no\-dither\fR
 815 Disable automatic dither\*msee `Dithering' above.  An example of why this
 816 might occasionally be useful is if a file has been converted from 16 to
 817 24 bit with the intention of doing some processing on it, but in fact
 818 no processing is needed after all and the original 16 bit file has
 819 been lost, then, strictly speaking, no dither is needed if converting the
 820 file back to 16 bit.  See also the
 821 .B stats
 822 effect for how to determine the actual bit depth of the audio within a
 823 file.
 824 .TP
 825 \fB\-\-effects\-file \fIFILENAME\fR
 826 Use FILENAME to obtain all effects and their arguments.
 827 The file is parsed as if the values were specified on the
 828 command line.  A new line can be used in place of the special \fB:\fR
 829 marker to separate effect chains.  For convenience, such markers at the
 830 end of the file are normally ignored; if you want to specify an empty
 831 last effects chain, use an explicit \fB:\fR by itself on the last line
 832 of the file.  This option causes any effects specified on the command
 833 line to be discarded.
 834 .TP
 835 \fB\-G\fR, \fB\-\-guard\fR
 836 Automatically invoke the
 837 .B gain
 838 effect to guard against clipping. E.g.
 839 .EX
 840    sox \-G infile \-b 16 outfile rate 44100 dither \-s
 841 .EE
 842 is shorthand for
 843 .EX
 844    sox infile \-b 16 outfile gain \-h rate 44100 gain \-rh dither \-s
 845 .EE
 846 See also
 847 .BR \-V,
 848 .BR \-\-norm,
 849 and the
 850 .B gain
 851 effect.
 852 .TP
 853 \fB\-h\fR, \fB\-\-help\fR
 854 Show version number and usage information.
 855 .TP
 856 \fB\-\-help\-effect \fINAME\fR
 857 Show usage information on the specified effect.  The name
 858 \fBall\fR can be used to show usage on all effects.
 859 .TP
 860 \fB\-\-help\-format \fINAME\fR
 861 Show information about the specified file format.  The name
 862 \fBall\fR can be used to show information on all formats.
 863 .TP
 864 \fB\-\-i\fR, \fB\-\-info\fR
 865 Only if given as the first parameter to
 866 .BR sox ,
 867 behave as
 868 .BR soxi (1).
 869 .TP
 870 \fB\-m\fR\^|\^\fB\-M\fR
 871 Equivalent to \fB\-\-combine mix\fR and \fB\-\-combine merge\fR, respectively.
 872 .TP
 873 .B \-\-magic
 874 If SoX has been built with the optional `libmagic' library then this
 875 option can be given to enable its use in helping to detect audio file types.
 876 .TP
 877 \fB\-\-multi\-threaded\fR | \fB\-\-single\-threaded\fR
 878 By default, SoX is `single threaded'.
 879 If the \fB\-\-multi\-threaded\fR option is given however then SoX
 880 will process audio channels for most multi-channel
 881 effects in parallel on hyper-threading/multi-core architectures. This
 882 may reduce processing time, though sometimes it may be necessary to use
 883 this option in conjunction with a larger buffer size than is the default
 884 to gain any benefit from multi-threaded processing
 885 (e.g. 131072; see \fB\-\-buffer\fR above).
 886 .TP
 887 \fB\-\-no\-clobber\fR
 888 Prompt before overwriting an existing file with the same name as that
 889 given for the output file.
 890 .SP
 891 .B N.B.
 892 Unintentionally overwriting a file is easier than you might think, for
 893 example, if you accidentally enter
 894 .EX
 895    sox file1 file2 effect1 effect2 ...
 896 .EE
 897 when what you really meant was
 898 .EX
 899    play file1 file2 effect1 effect2 ...
 900 .EE
 901 then, without this option, file2 will be overwritten.  Hence, using
 902 this option is recommended. SOX_OPTS (above), a `shell'
 903 alias, script, or batch file may be an appropriate way of permanently
 904 enabling it.
 905 .TP
 906 \fB\-\-norm\fR[\fB=\fIdB-level\fR]
 907 Automatically invoke the
 908 .B gain
 909 effect to guard against clipping and to normalise the audio. E.g.
 910 .EX
 911    sox \-\-norm infile \-b 16 outfile rate 44100 dither \-s
 912 .EE
 913 is shorthand for
 914 .EX
 915    sox infile \-b 16 outfile gain \-h rate 44100 gain \-nh dither \-s
 916 .EE
 917 Optionally, the audio can be normalised to a given level (usually)
 918 below 0 dBFS:
 919 .EX
 920    sox \-\-norm=\-3 infile outfile
 921 .EE
 922 .SP
 923 See also
 924 .BR \-V,
 925 .BR \-G,
 926 and the
 927 .B gain
 928 effect.
 929 .TP
 930 \fB\-\-play\-rate\-arg ARG\fR
 931 Selects a quality option to be used when the `rate' effect is automatically
 932 invoked whilst playing audio.  This option is typically set via the
 933 .B SOX_OPTS
 934 environment variable (see above).
 935 .TP
 936 \fB\-\-plot gnuplot\fR\^|\^\fBoctave\fR\^|\^\fBoff\fR
 937 If not set to
 938 .B off
 939 (the default if
 940 .B \-\-plot
 941 is not given), run in a mode that can be used, in conjunction with the
 942 gnuplot program or the GNU Octave program, to assist with the selection
 943 and configuration of many of the transfer-function based effects.
 944 For the first given effect that supports the selected plotting program,
 945 SoX will output commands to plot the effect's transfer function, and
 946 then exit without actually processing any audio.  E.g.
 947 .EX
 948    sox \-\-plot octave input-file \-n highpass 1320 > highpass.plt
 949    octave highpass.plt
 950 .EE
 951 .TP
 952 \fB\-q\fR, \fB\-\-no\-show\-progress\fR
 953 Run in quiet mode when SoX wouldn't otherwise do so.
 954 This is the opposite of the \fB\-S\fR option.
 955 .TP
 956 \fB\-R\fR
 957 Run in `repeatable' mode.  When this option is given, where
 958 applicable, SoX will embed a fixed time-stamp in the output file (e.g.
 959 \fBAIFF\fR) and will `seed' pseudo random number generators (e.g.
 960 \fBdither\fR) with a fixed number, thus ensuring that successive SoX
 961 invocations with the same inputs and the same parameters yield the
 962 same output.
 963 .TP
 964 \fB\-\-replay\-gain track\fR\^|\^\fBalbum\fR\^|\^\fBoff\fR
 965 Select whether or not to apply replay-gain adjustment to input files.
 966 The default is
 967 .B off
 968 for
 969 .B sox
 970 and
 971 .BR rec ,
 972 .B album
 973 for
 974 .B play
 975 where (at least) the first two input files are tagged with the same Artist and
 976 Album names, and
 977 .B track
 978 for
 979 .B play
 980 otherwise.
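The default selection rule described above can be restated as a small decision function. This is only a paraphrase of the text, not SoX's own code; the function name and the boolean parameter are hypothetical.

```python
def default_replay_gain(invoked_as, first_two_same_album=False):
    """Return the documented default --replay-gain mode.

    invoked_as: "sox", "rec" or "play".
    first_two_same_album: True when (at least) the first two input
    files carry the same Artist and Album tags.
    """
    if invoked_as in ("sox", "rec"):
        return "off"
    if invoked_as == "play":
        return "album" if first_two_same_album else "track"
    raise ValueError("unknown invocation name: %r" % (invoked_as,))


print(default_replay_gain("play", first_two_same_album=True))
```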
 981 .TP
 982 \fB\-S\fR, \fB\-\-show\-progress\fR
 983 Display input file format/header information, and processing progress as
 984 input file(s) percentage complete, elapsed time, and remaining time (if
 985 known; shown in brackets), and the number of samples written to the
 986 output file.  Also shown is a peak-level meter, and an indication if
 987 clipping has occurred.  The peak-level meter shows up to two channels
 988 and is calibrated for digital audio as follows (right channel shown):
 989 .ne 8
 990 .TS
 991 center;
 992 cI lI cI lI
 993 c l c l.
 994 dB FSD  Display dB FSD  Display
 995 \-25    \-      \-11    ====
 996 \-23    T{
 997 =
 998 T}      \-9     ====\-
 999 \-21    =\-     \-7     =====
1000 \-19    ==      \-5     =====\-
1001 \-17    ==\-    \-3     ======
1002 \-15    ===     \-1     =====!
1003 \-13    ===\-
1004 .TE
1005 .DT
1006 .SP
1007 A three-second peak-held value of headroom in dBs will be shown to the right
1008 of the meter if this is below 6dB.
1009 .SP
1010 This option is enabled by default when using
1011 SoX to play or record audio.
1012 .TP
1013 \fB\-T\fR
1014 Equivalent to \fB\-\-combine multiply\fR.
1015 .TP
1016 \fB\-\-temp\fI DIRECTORY\fR
1017 Specify that any temporary files should be created in the given
1018 .IR DIRECTORY .
1019 This can be useful if there are permission or free-space problems with the
1020 default location. In this case, using `\fB\-\-temp .\fR' (to use the
1021 current directory) is often a good solution.
1022 .TP
1023 \fB\-\-version\fR
1024 Show SoX's version number and exit.
1025 .IP \fB\-V\fR[\fIlevel\fR]
1026 Set verbosity. This is particularly useful for seeing how any automatic
1027 effects have been invoked by SoX.
1028 .SP
1029 SoX displays messages on the console (stderr) according to the following
1030 verbosity levels:
1031 .IP
1032 .RS
1033 .IP 0
1034 No messages are shown at all; use the exit status to determine
1035 if an error has occurred.
1036 .IP 1
1037 Only error messages are shown.  These are generated if
1038 SoX cannot complete the requested commands.
1039 .IP 2
1040 Warning messages are also shown.  These are generated if
1041 SoX can complete the requested commands,
1042 but not exactly according to the requested command parameters,
1043 or if clipping occurs.
1044 .IP 3
1045 Descriptions of
1046 SoX's processing phases are also shown.
1047 Useful for seeing exactly how
1048 SoX is processing your audio.
1049 .IP "4 and above"
1050 Messages to help with debugging
1051 SoX are also shown.
1052 .RE
1053 .IP
1054 By default, the verbosity level is set to 2 (shows errors and
1055 warnings). Each occurrence of the \fB\-V\fR option increases the
1056 verbosity level by 1.  Alternatively, the verbosity level can be set
1057 to an absolute number by specifying it immediately after the
1058 .BR \-V ,
1059 e.g.
1060 .B \-V0
1061 sets it to 0.
1062 .IP
1063 .SS Input File Options
1064 These options apply only to input files and may precede only input
1065 filenames on the command line.
1066 .TP
1067 \fB\-\-ignore\-length\fR
1068 Override an (incorrect) audio length given in an audio file's header. If
1069 this option is given then SoX will keep reading audio until it reaches
1070 the end of the input file.
1071 .TP
1072 \fB\-v\fR, \fB\-\-volume\fR \fIFACTOR\fR
1073 Intended for use when combining multiple input files, this option
1074 adjusts the volume of the file that follows it on the command line by a
1075 factor of \fIFACTOR\fR. This allows it to be `balanced' w.r.t. the other
1076 input files.  This is a linear (amplitude) adjustment, so a number less
1077 than 1 decreases the volume and a number greater than 1 increases it.  If a
1078 negative number is given then in addition to the volume adjustment,
1079 the audio signal will be inverted.
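Because the factor is a linear amplitude multiplier, it can be related to a level change in decibels with the usual 20*log10 rule. The sketch below shows that arithmetic; the function names are hypothetical and are not part of SoX.

```python
import math


def volume_factor_to_db(factor):
    """Convert a linear amplitude factor to a level change in dB."""
    if factor == 0:
        raise ValueError("a factor of 0 silences the audio (no finite dB value)")
    # The sign only affects polarity, so the magnitude is used here.
    return 20.0 * math.log10(abs(factor))


def inverts_signal(factor):
    """True if the factor also inverts the audio signal."""
    return factor < 0


for f in (0.5, 1.0, 2.0, -1.0):
    print(f, round(volume_factor_to_db(f), 2), inverts_signal(f))
```

A factor of 0.5 therefore corresponds to a reduction of roughly 6 dB, and a factor of 2 to an increase of roughly 6 dB.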
1080 .SP
1081 See also the
1082 .BR norm ,
1083 .BR vol ,
1084 and
1085 .B gain
1086 effects, and see \fBInput File Balancing\fR above.
1087 .SS Input & Output File Format Options
1088 These options apply to the input or output file whose name they
1089 immediately precede on the command line and are used mainly when
1090 working with headerless file formats or when specifying a format
1091 for the output file that is different to that of the input file.
1092 .TP
1093 \fB\-b\fR \fIBITS\fR, \fB\-\-bits\fR \fIBITS\fR
1094 The number of bits (a.k.a. bit-depth or sometimes word-length) in each
1095 encoded sample.  Not applicable to complex encodings such as MP3 or GSM.
1096 Not necessary with encodings that have a fixed number of bits, e.g.
1097 A/\(*m-law, ADPCM.
1098 .SP
1099 For an input file, the most common use for this option is to inform
1100 SoX of the number of bits per sample in a `raw' (`headerless') audio
1101 file.  For example
1102 .EX
1103    sox \-r 16k \-e signed \-b 8 input.raw output.wav
1104 .EE
1105 converts a particular `raw' file to a self-describing `WAV' file.
1106 .SP
1107 For an output file, this option can be used (perhaps along with
1108 .BR \-e )
1109 to set the output encoding size.  By default (i.e. if this option is
1110 not given), the output encoding size will (providing it is supported
1111 by the output file type) be set to the input encoding size.  For
1112 example
1113 .EX
1114    sox input.cdda \-b 24 output.wav
1115 .EE
1116 converts raw CD digital audio (16-bit, signed-integer) to a
1117 24-bit (signed-integer) `WAV' file.
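To illustrate why SoX has to be told the sample size of a headerless file, the following sketch computes how long the same block of bytes would play under different assumed formats. The function name and the byte count are made up for the example; only whole-byte sample sizes are handled.

```python
def raw_duration_seconds(num_bytes, rate_hz, bits, channels=1):
    """Duration of headerless PCM data under an assumed format."""
    bytes_per_frame = (bits // 8) * channels
    if bytes_per_frame <= 0 or rate_hz <= 0:
        raise ValueError("bits, channels and rate must be positive")
    frames = num_bytes / bytes_per_frame
    return frames / rate_hz


# The same 16000 bytes read as 8-bit and as 16-bit mono at 16 kHz.
print(raw_duration_seconds(16000, 16000, 8))
print(raw_duration_seconds(16000, 16000, 16))
```

Nothing in the raw data distinguishes the two readings, which is why the format has to be supplied on the command line.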
1118 .TP
1119 \fB\-1\fR\^/\fB\-2\fR\^/\fB\-3\fR\^/\fB\-4\fR\^/\fB\-8\fR
1120 The number of bytes in each encoded sample.  Deprecated aliases for
1121 \fB\-b 8\fR, \fB\-b 16\fR, \fB\-b 24\fR, \fB\-b 32\fR, \fB\-b 64\fR
1122 respectively.
1123 .TP
1124 \fB\-c\fR \fICHANNELS\fR, \fB\-\-channels\fR \fICHANNELS\fR
1125 The number of audio channels in the audio file. This can be any number
1126 greater than zero.
1127 .SP
1128 For an input file, the most common use for this option is to inform
1129 SoX of the number of channels in a `raw' (`headerless') audio file.
1130 Occasionally, it may be useful to use this option with a `headered'
1131 file, in order to override the (presumably incorrect) value in the
1132 header\*mnote that this is only supported with certain file types.
1133 Examples:
1134 .EX
1135    sox \-r 48k \-e float \-b 32 \-c 2 input.raw output.wav
1136 .EE
1137 converts a particular `raw' file to a self-describing `WAV' file.
1138 .EX
1139    play \-c 1 music.wav
1140 .EE
1141 interprets the file data as belonging to a single channel regardless
1142 of what is indicated in the file header.  Note that if the file does
1143 in fact have two channels, this will result in the file playing at
1144 half speed.
1145 .SP
1146 For an output file, this option provides a shorthand for specifying
1147 that the
1148 .B channels
1149 effect should be invoked in order to change (if necessary) the number
1150 of channels in the audio signal to the number given.  For
1151 example, the following two commands are equivalent:
1152 .EX
1153 .ne 2
1154    sox input.wav \-c 1 output.wav bass \-b 24
1155    sox input.wav      output.wav bass \-b 24 channels 1
1156 .EE
1157 though the second form is more flexible as it allows the effects to
1158 be ordered arbitrarily.
1159 .TP
1160 \fB\-e \fIENCODING\fR, \fB\-\-encoding\fR \fIENCODING\fR
1161 The audio encoding type.  Sometimes needed with file-types that
1162 support more than one encoding type. For example, with raw, WAV, or
1163 AU (but not, for example, with MP3 or FLAC).
1164 The available encoding types are as follows:
1165 .RS
1166 .IP \fBsigned-integer\fR
1167 PCM data stored as signed (`two's complement') integers.  Commonly used
1168 with a 16 or 24-bit encoding size.
1169 A value of 0 represents minimum signal power.
1170 .IP \fBunsigned-integer\fR
1171 PCM data stored as unsigned integers.  Commonly used
1172 with an 8-bit encoding size.  A value of 0 represents maximum signal
1173 power.
1174 .IP \fBfloating-point\fR
1175 PCM data stored as IEEE 754 single precision (32-bit) or double
1176 precision (64-bit) floating-point (`real') numbers.
1177 A value of 0 represents minimum signal power.
1178 .IP \fBa-law\fR
1179 International telephony standard for logarithmic encoding to 8 bits per
1180 sample.  It has a precision equivalent to roughly 13-bit PCM and is
1181 sometimes encoded with reversed bit-ordering (see the
1182 .B \-X
1183 option).
1184 .IP \fBu-law,\ mu-law\fR
1185 North American telephony standard for logarithmic encoding to 8 bits per
1186 sample.  A.k.a. \(*m-law.  It has a precision equivalent to roughly
1187 14-bit PCM and is
1188 sometimes encoded with reversed bit-ordering (see the
1189 .B \-X
1190 option).
1191 .IP \fBoki-adpcm\fR
1192 OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit ADPCM;
1193 it has a precision equivalent to roughly 12-bit PCM.
1194 ADPCM is a form of audio compression that has a good
1195 compromise between audio quality and encoding/decoding speed.
1196 .IP \fBima-adpcm\fR
1197 IMA (a.k.a. DVI) 4-bit ADPCM;
1198 it has a precision equivalent to roughly 13-bit PCM.
1199 .IP \fBms-adpcm\fR
1200 Microsoft 4-bit ADPCM; it has a precision equivalent to roughly 14-bit
1201 PCM.
1202 .IP \fBgsm-full-rate\fR
1203 GSM is currently used for the vast majority of the world's digital
1204 wireless telephone calls.  It utilises several audio
1205 formats with different bit-rates and associated speech quality.
1206 SoX has support for GSM's original 13kbps `Full Rate' audio format.
1207 It is usually CPU-intensive to work with GSM audio.
1208 .RE
1209 .TP
1210 \
1211 Encoding names can be abbreviated where this would not be ambiguous;
1212 e.g. `unsigned-integer' can be given as `un', but not `u' (ambiguous
1213 with `u-law').
1214 .SP
1215 For an input file, the most common use for this option is to inform
1216 SoX of the encoding of a `raw' (`headerless') audio
1217 file (see the examples in
1218 .B \-b
1219 and
1220 .B \-c
1221 above).
1222 .SP
1223 For an output file, this option can be used (perhaps along with
1224 .BR \-b )
1225 to set the output encoding type.  For example
1226 .EX
1227    sox input.cdda \-e float output1.wav
1228
1229    sox input.cdda \-b 64 \-e float output2.wav
1230 .EE
1231 convert raw CD digital audio (16-bit, signed-integer) to
1232 floating-point `WAV' files (single & double precision respectively).
1233 .SP
1234 By default (i.e. if this option is not given), the output encoding
1235 type will (providing it is supported by the output file type) be set
1236 to the input encoding type.
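The difference between the signed-integer and unsigned-integer encodings described above can be shown for 8-bit samples, where the unsigned form is the signed value offset by 128. This is a minimal sketch with hypothetical function names, not SoX's conversion code.

```python
def unsigned8_to_signed8(u):
    """Map an 8-bit unsigned sample (0..255) to signed (-128..127)."""
    if not 0 <= u <= 255:
        raise ValueError("expected a value in 0..255")
    return u - 128


def signed8_to_unsigned8(s):
    """Map an 8-bit signed sample (-128..127) to unsigned (0..255)."""
    if not -128 <= s <= 127:
        raise ValueError("expected a value in -128..127")
    return s + 128


# Unsigned 128 is the zero level; unsigned 0 is the largest negative excursion.
print(unsigned8_to_signed8(128), unsigned8_to_signed8(0))
```

This matches the statements above: a stored 0 means minimum signal power when signed, but maximum signal power when unsigned.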
1237 .TP
1238 \fB\-s\fR\^/\fB\-u\fR\^/\fB\-f\fR\^/\fB\-A\fR\^/\fB\-U\fR\^/\fB\-o\fR\^/\fB\-i\fR\^/\fB\-a\fR\^/\fB\-g\fR
1239 Deprecated aliases for specifying the encoding types
1240 \fBsigned-integer\fR, \fBunsigned-integer\fR, \fBfloating-point\fR, \fBa-law\fR, \fBmu-law\fR, \fBoki-adpcm\fR, \fBima-adpcm\fR, \fBms-adpcm\fR, \fBgsm-full-rate\fR
1241 respectively (see
1242 .B \-e
1243 above).
1244 .TP
1245 \fB\-\-no\-glob\fR
1246 Specifies that filename `globbing' (wild-card matching) should not be
1247 performed by SoX on the following filename.  For example, if the current
1248 directory contains the two files `five-seconds.wav' and `five*.wav', then
1249 .EX
1250    play \-\-no\-glob "five*.wav"
1251 .EE
1252 can be used to play just the single file `five*.wav'.
1253 .TP
1254 \fB\-r\fR, \fB\-\-rate\fR \fIRATE\fR[\fBk\fR]
1255 Gives the sample rate in Hz (or kHz if appended with `k') of the file.
1256 .SP
1257 For an input file, the most common use for this option is to inform
1258 SoX of the sample rate of a `raw' (`headerless') audio file (see the
1259 examples in
1260 .B \-b
1261 and
1262 .B \-c
1263 above).
1264 Occasionally it may be useful to use this option with a `headered'
1265 file, in order to override the (presumably incorrect) value in the
1266 header\*mnote that this is only supported with certain file types.
1267 For example, if audio was recorded with a sample-rate of say 48k from
1268 a source that played back a little, say 1\*d5%, too slowly, then
1269 .EX
1270    sox \-r 48720 input.wav output.wav
1271 .EE
1272 effectively corrects the speed by changing only the file header (but see
1273 also the
1274 .B speed
1275 effect for the more usual solution to this problem).
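The value 48720 in the example above follows from scaling the nominal rate by the speed error. A short sketch of that arithmetic, with a hypothetical function name:

```python
def corrected_rate(nominal_rate_hz, percent_too_slow):
    """Sample rate to declare so that slow-running audio plays at true speed."""
    return round(nominal_rate_hz * (1 + percent_too_slow / 100))


# 48 kHz recording of a source that ran 1.5% too slowly.
print(corrected_rate(48000, 1.5))
```

Declaring the higher rate makes every sample play slightly sooner, which raises speed and pitch together.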
1276 .SP
1277 For an output file, this option provides a shorthand for specifying
1278 that the
1279 .B rate
1280 effect should be invoked in order to change (if necessary) the sample
1281 rate of the audio signal to the given value.  For example, the
1282 following two commands are equivalent:
1283 .EX
1284 .ne 2
1285    sox input.wav \-r 48k output.wav bass \-b 24
1286    sox input.wav        output.wav bass \-b 24 rate 48k
1287 .EE
1288 though the second form is more flexible as it allows
1289 .B rate
1290 options to be given, and allows the effects to be ordered arbitrarily.
1291 .TP
1292 \fB\-t\fR, \fB\-\-type\fR \fIFILE-TYPE\fR
1293 Gives the type of the audio file.  For both input and output files,
1294 this option is commonly used to inform SoX of the type of a `headerless'
1295 audio file (e.g. raw, mp3) where the actual/desired type cannot be
1296 determined from a given filename extension.  For example:
1297 .EX
1298    another-command | sox \-t mp3 \- output.wav
1299
1300    sox input.wav \-t raw output.bin
1301 .EE
1302 It can also be used to override the type implied by an input filename
1303 extension, but if overriding with a type that has a header, SoX will
1304 exit with an appropriate error message if such a header is not
1305 actually present.
1306 .SP
1307 See
1308 .BR soxformat (7)
1309 for a list of supported file types.
1310 .PP
1311 \fB\-L\fR, \fB\-\-endian little\fR
1312 .br
1313 \fB\-B\fR, \fB\-\-endian big\fR
1314 .br
1315 \fB\-x\fR, \fB\-\-endian swap\fR
1316 .if t .sp -.5
1317 .if n .sp -1
1318 .TP
1319 \
1320 These options specify whether the byte-order of the audio data is,
1321 respectively, `little endian', `big endian', or the opposite to that of
1322 the system on which SoX is being used.  Endianness applies only to data
1323 encoded as floating-point, or as signed or unsigned integers of 16 or
1324 more bits.  It is often necessary to specify one of these options for
1325 headerless files, and sometimes necessary for (otherwise)
1326 self-describing files.  A given endian-setting option may be ignored
1327 for an input file whose header contains a specific endianness
1328 identifier, or for an output file that is actually an audio device.
1329 .SP
1330 .B N.B.
1331 Unlike other format characteristics, the endianness (byte, nibble, &
1332 bit ordering) of the input file is not automatically used for the output
1333 file; so, for example, when the following is run on a little-endian system:
1334 .EX
1335    sox \-B audio.s16 trimmed.s16 trim 2
1336 .EE
1337 trimmed.s16 will be created as little-endian;
1338 .EX
1339    sox \-B audio.s16 \-B trimmed.s16 trim 2
1340 .EE
1341 must be used to preserve big-endianness in the output file.
1342 .SP
1343 The
1344 .B \-V
1345 option can be used to check the selected orderings.
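As an illustration of what byte order means for 16-bit samples, the sketch below packs one sample value both ways with Python's standard `struct` module and swaps the bytes of each sample. The helper name is hypothetical; this shows the data layout only, not how SoX implements these options.

```python
import struct

sample = 0x1234  # one 16-bit signed sample value

little = struct.pack("<h", sample)  # little-endian: low byte first
big = struct.pack(">h", sample)     # big-endian: high byte first


def swap_16bit_byte_order(data):
    """Swap the two bytes of every 16-bit sample in a bytes object."""
    if len(data) % 2:
        raise ValueError("expected a whole number of 16-bit samples")
    out = bytearray(data)
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)


print(little.hex(), big.hex(), swap_16bit_byte_order(little).hex())
```

Reading big-endian data as little-endian (or the reverse) gives valid but wrong sample values, which typically sounds like loud noise.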
1346 .TP
1347 \fB\-N\fR, \fB\-\-reverse\-nibbles\fR
1348 Specifies that the nibble ordering (i.e. the 2 halves of a byte) of the samples should be reversed;
1349 sometimes useful with ADPCM-based formats.
1350 .SP
1351 .B N.B.
1352 See also N.B. in section on
1353 .B \-x
1354 above.
1355 .TP
1356 \fB\-X\fR, \fB\-\-reverse\-bits\fR
1357 Specifies that the bit ordering of the samples should be reversed;
1358 sometimes useful with a few (mostly headerless) formats.
1359 .SP
1360 .B N.B.
1361 See also N.B. in section on
1362 .B \-x
1363 above.
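The nibble and bit reversals described for the two options above can be pictured on a single byte. The following sketch uses hypothetical helper names and only illustrates the reordering; it is not taken from SoX.

```python
def swap_nibbles(byte):
    """Exchange the high and low 4-bit halves of one byte."""
    return ((byte & 0x0F) << 4) | ((byte & 0xF0) >> 4)


def reverse_bits(byte):
    """Reverse the order of the 8 bits in one byte."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)  # move the lowest bit across
        byte >>= 1
    return result


print(hex(swap_nibbles(0x1F)), format(reverse_bits(0b11010000), "08b"))
```

Both operations are their own inverse, so applying either one twice restores the original byte.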
1364 .SS Output File Format Options
1365 These options apply only to the output file and may precede only the output
1366 filename on the command line.
1367 .TP
1368 \fB\-\-add\-comment \fITEXT\fR
1369 Append a comment in the output file header (where applicable).
1370 .TP
1371 \fB\-\-comment \fITEXT\fR
1372 Specify the comment text to store in the output file header (where
1373 applicable).
1374 .SP
1375 SoX will provide a default comment if this option (or
1376 .BR \-\-comment\-file )
1377 is not given. To specify that no comment should be stored in the output file,
1378 use
1379 .B "\-\-comment \(dq\(dq" .
1380 .TP
1381 \fB\-\-comment\-file \fIFILENAME\fR
1382 Specify a file containing the comment text to store in the output
1383 file header (where applicable).
1384 .TP
1385 \fB\-C\fR, \fB\-\-compression\fR \fIFACTOR\fR
1386 The compression factor for variably compressing output file formats.  If
1387 this option is not given then a default compression factor will apply.
1388 The compression factor is interpreted differently for different
1389 compressing file formats.  See the description of the file formats that
1390 use this option in
1391 .BR soxformat (7)
1392 for more information.
1393 .SH EFFECTS
1394 In addition to converting, playing and recording audio files, SoX can
1395 be used to invoke a number of audio `effects'.  Multiple effects may
1396 be applied by specifying them one after another at the end of the SoX
1397 command line, forming an `effects chain'.
1398 Note that applying multiple effects in real-time (i.e. when playing audio)
1399 is likely to require a high performance computer. Stopping other applications
1400 may alleviate performance issues should they occur.
1401 .SP
1402 Some of the SoX effects are primarily intended to be applied to a single
1403 instrument or `voice'.  To facilitate this, the \fBremix\fR effect and
1404 the global SoX option \fB\-M\fR can be used to isolate then recombine
1405 tracks from a multi-track recording.
1406 .SS Multiple Effect Chains
1407 A single effects chain is made up of one or more effects.  Audio from
1408 the input runs through the chain until either the end of the input file
1409 is reached or an effect in the chain requests to terminate the chain.
1410 .SP
1411 SoX supports running multiple effects chains over the input audio.
1412 In this case, when one chain indicates it is done processing audio,
1413 the audio data is then sent through the next effects chain.  This
1414 continues until either no more effects chains exist or the input has
1415 reached the end of the file.
1416 .SP
1417 An effects chain is terminated by placing a
1418 .B :
1419 (colon) after an effect.  Any following effects are a part of a new effects chain.
1420 .SP
1421 It is important to place the effect that will stop the chain
1422 as the first effect in the chain.  This is because any samples
1423 that are buffered by effects to the left of the terminating effect
1424 will be discarded.  The number of samples discarded is related to the
1425 .B \-\-buffer
1426 option and it should be kept small, relative to the sample rate, if
1427 the terminating effect cannot be first.  Further information on
1428 stopping effects can be found in the
1429 .B Stopping SoX
1430 section.
1431 .SP
1432 There are a few pseudo-effects that aid using multiple effects chains.
1433 These include
1434 .B newfile
1435 which will start writing to a new output file before moving to the
1436 next effects chain and
1437 .B restart
1438 which will move back to the first effects chain.  Pseudo-effects
1439 must be specified as the first effect in a chain and as the only
1440 effect in a chain (they must have a
1441 .B :
1442 before and after they are specified).
1443 .SP
1444 The following is an example of multiple effects chains.  It will split the
1445 input file into multiple files of 30 seconds in length.  Each output filename
1446 will have a unique number in its name as documented in the
1447 .B Output Files
1448 section.
1449 .EX
1450    sox infile.wav output.wav trim 0 30 : newfile : restart
1451 .EE
1452 .SS Common Notation And Parameters
1453 In the descriptions that follow,
1454 brackets [ ] are used to denote parameters that are optional, braces
1455 { } to denote those that are both optional and repeatable,
1456 and angle brackets < > to denote those that are repeatable but not
1457 optional.
1458 Where applicable, default values for optional parameters are shown in parentheses ( ).
1459 .SP
1460 The following parameters are used with, and have the same meaning for,
1461 several effects:
1462 .TP
1463 \fIcenter\fR[\fBk\fR]
1464 See
1465 .IR frequency .
1466 .TP
1467 \fIfrequency\fR[\fBk\fR]
1468 A frequency in Hz, or, if appended with `k', kHz.
1469 .TP
1470 \fIgain\fR
1471 A power gain in dB.
1472 Zero gives no gain; less than zero gives an attenuation.
1473 .TP
1474 \fIwidth\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]
1475 Used to specify the band-width of a filter.  A number of different
1476 methods to specify the width are available (though not all for every effect).
1477 One of the characters shown may be appended to select the desired method
1478 as follows:
1479 .ne 5
1480 .TS
1481 center;
1482 cI cI lI
1483 cB c l.
1484 \       Method  Notes
1485 h       Hz      \
1486 k       kHz     \
1487 o       Octaves \
1488 q       Q-factor        See [2]
1489 .TE
1490 .DT
1491 .SP
1492 For each effect that uses this parameter, the default method (i.e. if no
1493 character is appended) is the one that is listed first in the first line of
1494 the effect's description.
1495 .PP
1496 To see if SoX has support for an optional effect, enter
1497 .B sox \-h
1498 and look for its name under the list: `EFFECTS'.
1499 .SS Supported Effects
1500 Note: a categorised list of the effects can be found in the
1501 accompanying `README' file.
1502 .TP
1503 \fBallpass\fR \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]
1504 Apply a two-pole all-pass filter with central frequency (in Hz)
1505 \fIfrequency\fR, and filter-width \fIwidth\fR.
1506 An all-pass filter changes the
1507 audio's frequency to phase relationship without changing its frequency
1508 to amplitude relationship.  The filter is described in detail in [1].
1509 .SP
1510 This effect supports the \fB\-\-plot\fR global option.
1511 .TP
1512 \fBband\fR [\fB\-n\fR] \fIcenter\fR[\fBk\fR]\fR [\fIwidth\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]]
1513 Apply a band-pass filter.
1514 The frequency response drops logarithmically
1515 around the
1516 .I center
1517 frequency.
1518 The
1519 .I width
1520 parameter gives the slope of the drop.
1521 The frequencies at
1522 .I center
1523 +
1524 .I width
1525 and
1526 .I center
1527 \-
1528 .I width
1529 will be half of their original amplitudes.
1530 .B band
1531 defaults to a mode oriented to pitched audio,
1532 i.e. voice, singing, or instrumental music.
1533 The \fB\-n\fR (for noise) option uses the alternate mode
1534 for un-pitched audio (e.g. percussion).
1535 .B Warning:
1536 \fB\-n\fR introduces a power-gain of about 11dB in the filter, so beware
1537 of output clipping.
1538 .B band
1539 introduces noise in the shape of the filter,
1540 i.e. peaking at the
1541 .I center
1542 frequency and settling around it.
1543 .SP
1544 This effect supports the \fB\-\-plot\fR global option.
1545 .SP
1546 See also \fBsinc\fR for a bandpass filter with steeper shoulders.
1547 .TP
1548 \fBbandpass\fR\^|\^\fBbandreject\fR [\fB\-c\fR] \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]
1549 Apply a two-pole Butterworth band-pass or band-reject filter with
1550 central frequency \fIfrequency\fR, and (3dB-point) band-width
1551 \fIwidth\fR.  The
1552 .B \-c
1553 option applies only to
1554 .B bandpass
1555 and selects a constant skirt gain (peak gain = Q) instead of the
1556 default: constant 0dB peak gain.
1557 The filters roll off at 6dB per octave (20dB per decade)
1558 and are described in detail in [1].
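The two roll-off figures quoted above are the same slope in different units: an octave is a factor of 2 in frequency and a decade a factor of 10, so the conversion factor is log2(10). A short sketch of the arithmetic, with a hypothetical function name:

```python
import math


def db_per_octave_to_db_per_decade(db_per_octave):
    """Convert a filter slope from dB/octave to dB/decade."""
    return db_per_octave * math.log2(10)  # about 3.32 octaves per decade


# The nominal "6 dB" is a rounding of 20*log10(2), about 6.02 dB.
print(round(db_per_octave_to_db_per_decade(6.0), 2))
print(round(db_per_octave_to_db_per_decade(20 * math.log10(2)), 2))
```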
1559 .SP
1560 These effects support the \fB\-\-plot\fR global option.
1561 .SP
1562 See also \fBsinc\fR for a bandpass filter with steeper shoulders.
1563 .TP
1564 \fBbandreject \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]
1565 Apply a band-reject filter.
1566 See the description of the \fBbandpass\fR effect for details.
1567 .TP
1568 \fBbass\fR\^|\^\fBtreble \fIgain\fR [\fIfrequency\fR[\fBk\fR]\fR [\fIwidth\fR[\fBs\fR\^|\^\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]]]
1569 Boost or cut the bass (lower) or treble (upper) frequencies of the audio
1570 using a two-pole shelving filter with a response similar to that
1571 of a standard hi-fi's tone-controls.  This is also
1572 known as shelving equalisation (EQ).
1573 .SP
1574 \fIgain\fR gives the gain at 0\ Hz (for \fBbass\fR), or whichever is
1575 the lower of \(ap22\ kHz and the Nyquist frequency (for \fBtreble\fR).  Its
1576 useful range is about \-20 (for a large cut) to +20 (for a large
1577 boost).
1578 Beware of
1579 .B Clipping
1580 when using a positive \fIgain\fR.
1581 .SP
1582 If desired, the filter can be fine-tuned using the following
1583 optional parameters:
1584 .SP
1585 \fIfrequency\fR sets the filter's central frequency and so can be
1586 used to extend or reduce the frequency range to be boosted or
1587 cut.  The default value is 100\ Hz (for \fBbass\fR) or 3\ kHz (for
1588 \fBtreble\fR).
1589 .SP
1590 \fIwidth\fR
1591 determines how
1592 steep the filter's shelf transition is.  In addition to the common
1593 width specification methods described above,
1594 `slope' (the default, or if appended with `\fBs\fR') may be used.
1595 The useful range of `slope' is
1596 about 0\*d3, for a gentle slope, to 1 (the maximum), for a steep slope; the
1597 default value is 0\*d5.
1598 .SP
1599 The filters are described in detail in [1].
1600 .SP
1601 These effects support the \fB\-\-plot\fR global option.
1602 .SP
1603 See also \fBequalizer\fR for a peaking equalisation effect.
1604 .TP
1605 \fBbend\fR [\fB\-f \fIframe-rate\fR(25)] [\fB\-o \fIover-sample\fR(16)] { \fIdelay\fB,\fIcents\fB,\fIduration\fR }
1606 Changes pitch by specified amounts at specified times.
1607 Each given triple: \fIdelay\fB,\fIcents\fB,\fIduration\fR specifies one bend.
1608 .I delay
1609 is the amount of time after the start of the audio stream, or the end of the previous bend, at which to start bending the pitch;
1610 .I cents
1611 is the number of cents (100 cents = 1 semitone) by which to bend the pitch, and
1612 .I duration
1613 the length of time over which the pitch will be bent.
1614 .SP
1615 The pitch-bending algorithm utilises the Discrete Fourier Transform (DFT)
1616 at a particular frame rate and over-sampling rate.
1617 The
1618 .B \-f
1619 and
1620 .B \-o
1621 parameters may be used to adjust these parameters and thus control the
1622 smoothness of the changes in pitch.
1623 .SP
1624 For example, an initial tone is generated, then bent three times, yielding
1625 four different notes in total:
1626 .EX
1627 .ne 2
1628    play \-n synth 2.5 sin 667 gain 1 \\
1629         bend .35,180,.25  .15,740,.53  0,\-520,.3
1630 .EE
1631 Note that the clipping that is produced in this example is deliberate;
1632 to remove it, use
1633 .B gain\ \-5
1634 in place of
1635 .BR gain\ 1 .
1636 .SP
1637 See also \fBpitch\fR.
1638 .TP
1639 \fBbiquad \fIb0 b1 b2 a0 a1 a2\fR
1640 Apply a biquad IIR filter with the given coefficients, where b* and a* are
1641 the numerator and denominator coefficients respectively.
1642 .SP
1643 See http://en.wikipedia.org/wiki/Digital_biquad_filter (where a0 = 1).
1644 .SP
1645 This effect supports the \fB\-\-plot\fR global option.
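.SP
As a minimal sketch (the file names are placeholders), the coefficients
\fB1 0 0 1 0 0\fR set b0\ =\ a0\ =\ 1 and all other coefficients to zero, so
each output sample equals the corresponding input sample and the audio
should pass through unchanged:
.EX
   sox infile outfile biquad 1 0 0 1 0 0
.EE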
1646 .TP
1647 \fBchannels \fICHANNELS\fR
1648 Invoke a simple algorithm to change the number of channels in
1649 the audio signal to the given number
1650 .IR CHANNELS :
1651 mixing if decreasing the number of channels or duplicating if
1652 increasing the number of channels.
1653 .SP
1654 The
1655 .B channels
1656 effect is invoked automatically if SoX's \fB\-c\fR option specifies a
1657 number of channels that is different to that of the input file(s).
1658 Alternatively, if this effect is given explicitly, then SoX's
1659 .B \-c
1660 option need not be given.  For example, the following two commands are
1661 equivalent:
1662 .EX
1663 .ne 2
1664    sox input.wav \-c 1 output.wav bass \-b 24
1665    sox input.wav      output.wav bass \-b 24 channels 1
1666 .EE
1667 though the second form is more flexible as it allows the effects to
1668 be ordered arbitrarily.
1669 .SP
1670 See also
1671 .B remix
1672 for an effect that allows channels to be mixed/selected arbitrarily.
1673 .TP
1674 \fBchorus \fIgain-in gain-out\fR <\fIdelay decay speed depth \fB\-s\fR\^|\^\fB\-t\fR>
1675 Add a chorus effect to the audio.  This can make a single vocal sound
1676 like a chorus, but can also be applied to instrumentation.
1677 .SP
1678 Chorus resembles an echo effect with a short delay, but
1679 whereas with echo the delay is constant, with chorus, it
1680 is varied using sinusoidal or triangular modulation.  The modulation
1681 depth defines the range over which the modulated delay varies before or after the
1682 nominal delay.  Hence the delayed sound will sound slower or faster; that is, the delayed
1683 sound is tuned around the original one, as in a chorus where some vocals are
1684 slightly off key.
1685 See [3] for more discussion of the chorus effect.
1686 .SP
1687 Each four-tuple parameter
1688 delay/decay/speed/depth gives the delay in milliseconds
1689 and the decay (relative to gain-in) with a modulation
1690 speed in Hz using depth in milliseconds.
1691 The modulation is either sinusoidal (\fB\-s\fR) or triangular
1692 (\fB\-t\fR).  Gain-out is the volume of the output.
1693 .SP
1694 A typical delay is around 40ms to 60ms; the modulation speed is best
1695 near 0\*d25Hz and the modulation depth around 2ms.
1696 For example, a single delay:
1697 .EX
1698    play guitar1.wav chorus 0.7 0.9 55 0.4 0.25 2 \-t
1699 .EE
1700 Two delays of the original samples:
1701 .EX
1702 .ne 2
1703    play guitar1.wav chorus 0.6 0.9 50 0.4 0.25 2 \-t \\
1704          60 0.32 0.4 1.3 \-s
1705 .EE
1706 A fuller sounding chorus (with three additional delays):
1707 .EX
1708 .ne 2
1709    play guitar1.wav chorus 0.5 0.9 50 0.4 0.25 2 \-t \\
1710          60 0.32 0.4 2.3 \-t 40 0.3 0.3 1.3 \-s
1711 .EE
1712 .TP
1713 \fBcompand \fIattack1\fB,\fIdecay1\fR{\fB,\fIattack2\fB,\fIdecay2\fR}
1714 [\fIsoft-knee-dB\fB:\fR]\fIin-dB1\fR[\fB,\fIout-dB1\fR]{\fB,\fIin-dB2\fB,\fIout-dB2\fR}
1715 .br
1716 [\fIgain\fR [\fIinitial-volume-dB\fR [\fIdelay\fR]]]
1717 .SP
1718 Compand (compress or expand) the dynamic range of the audio.
1719 .SP
1720 The
1721 .I attack
1722 and
1723 .I decay
1724 parameters (in seconds) determine the time over which the
1725 instantaneous level of the input signal is averaged to determine its
1726 volume; attacks refer to increases in volume and decays refer to
1727 decreases.
1728 For most situations, the attack time (response to the music getting
1729 louder) should be shorter than the decay time because the human ear is more
1730 sensitive to sudden loud music than sudden soft music.
1731 Where more than one pair of attack/decay parameters are
1732 specified, each input channel is companded separately and the number of
1733 pairs must agree with the number of input channels.
1734 Typical values are
1735 .B 0\*d3,0\*d8
1736 seconds.
1737 .SP
1738 The second parameter is a list of points on the compander's transfer
1739 function specified in dB relative to the maximum possible signal
1740 amplitude.  The input values must be in a strictly increasing order but
1741 the transfer function does not have to be monotonically rising.  If
1742 omitted, the value of
1743 .I out-dB1
1744 defaults to the same value as
1745 .IR in-dB1 ;
1746 levels below
1747 .I in-dB1
1748 are not companded (but may have gain applied to them).
1749 The point \fB0,0\fR is assumed but may be overridden (by
1750 \fB0,\fIout-dBn\fR).
1751 If the list is preceded by a
1752 .I soft-knee-dB
1753 value, then the points where adjacent line segments on the
1754 transfer function meet will be rounded by the amount given.
1755 Typical values for the transfer function are
1756 .BR 6:\-70,\-60,\-20 .
1757 .SP
1758 The third (optional) parameter is an additional gain in dB to be applied
1759 at all points on the transfer function and allows easy adjustment
1760 of the overall gain.
1761 .SP
1762 The fourth (optional) parameter is an initial level to be assumed for
1763 each channel when companding starts.  This permits the user to supply a
1764 nominal level initially, so that, for example, a very large gain is not
1765 applied to initial signal levels before the companding action has begun
1766 to operate: it is quite probable that in such an event, the output would
1767 be severely clipped while the compander gain properly adjusts itself.
1768 A typical value (for audio which is initially quiet) is
1769 .B \-90
1770 dB.
1771 .SP
1772 The fifth (optional) parameter is a delay in seconds.  The input signal
1773 is analysed immediately to control the compander, but it is delayed
1774 before being fed to the volume adjuster.  Specifying a delay
1775 approximately equal to the attack/decay times allows the compander to
1776 effectively operate in a `predictive' rather than a reactive mode.
1777 A typical value is
1778 .B 0\*d2
1779 seconds.
1780 .TS
1781 center;
1782 c8 c8 c.
1783 *       *       *
1784 .TE
1785 .DT
1786 .SP
1787 The following example might be used to make a piece of music with both
1788 quiet and loud passages suitable for listening to in a noisy environment
1789 such as a moving vehicle:
1790 .EX
1791    sox asz.wav asz-car.wav compand 0.3,1 6:\-70,\-60,\-20 \-5 \-90 0.2
1792 .EE
1793 The transfer function (`6:\-70,...') says that very soft sounds (below
1794 \-70dB) will remain unchanged.  This will stop the compander from
1795 boosting the volume on `silent' passages such as between movements.
1796 However, sounds in the range \-60dB to 0dB (maximum
1797 volume) will be boosted so that the 60dB dynamic range of the
1798 original music will be compressed 3-to-1 into a 20dB range, which is
1799 wide enough to enjoy the music but narrow enough to get around the
1800 road noise.  The `6:' selects 6dB soft-knee companding.
1801 The \-5 (dB) output gain is needed to avoid clipping (the number is
1802 inexact, and was derived by experimentation).
1803 The \-90 (dB) for the initial volume will work fine for a clip that starts
1804 with near silence, and the delay of 0\*d2 (seconds) has the effect of causing
1805 the compander to react a bit more quickly to sudden volume changes.
1806 .SP
1807 In the next example, compand is being used as a noise-gate for when the
1808 noise is at a lower level than the signal:
1809 .EX
1810    play infile compand .1,.2 \-inf,\-50.1,\-inf,\-50,\-50 0 \-90 .1
1811 .EE
1812 Here is another noise-gate, this time for when the
1813 noise is at a higher level than the signal (making it, in some ways,
1814 similar to squelch):
1815 .EX
1816    play infile compand .1,.1 \-45.1,\-45,\-inf,0,\-inf 45 \-90 .1
1817 .EE
1818 This effect supports the \fB\-\-plot\fR global option (for the transfer function).
1819 .SP
1820 See also
1821 .B mcompand
1822 for a multiple-band companding effect.
1823 .TP
1824 \fBcontrast \fR[\fIenhancement-amount\fR(75)]
1825 Comparable with compression, this effect modifies an audio signal to
1826 make it sound louder.
1827 .I enhancement-amount
1828 controls the amount of the enhancement and is a number in the range 0\-100.
1829 Note that
1830 .I enhancement-amount
1831 = 0 still gives a significant contrast enhancement.
1832 .SP
1833 See also the
1834 .B compand
1835 and
1836 .B mcompand
1837 effects.
1838 .TP
1839 \fBdcshift \fIshift\fR [\fIlimitergain\fR]
1840 Apply a DC shift to the audio.  This can be useful to remove a DC
1841 offset (caused perhaps by a hardware problem in the recording chain)
1842 from the audio.  The effect of a DC offset is reduced headroom and
1843 hence volume.
1844 The
1845 .B stat
1846 or
1847 .B stats
1848 effect can be used to determine if a signal has a DC offset.
1849 .SP
1850 The given \fIshift\fR value is a floating point number in the range
1851 of \(+-2 that indicates the amount to shift the audio (which is in the
1852 range of \(+-1).
1853 .SP
1854 An optional
1855 .I limitergain
1856 can be specified as well.  It should have a value much less than 1
1857 (e.g. 0\*d05 or 0\*d02) and is used only on peaks to prevent clipping.
1858 .TS
1859 center;
1860 c8 c8 c.
1861 *       *       *
1862 .TE
1863 .DT
1864 .SP
1865 An alternative approach to removing a DC offset (albeit with a short delay)
1866 is to use the
1867 .B highpass
1868 filter effect at a frequency of say 10Hz, as illustrated in the following
1869 example:
1870 .EX
1871    sox \-n dc.wav synth 5 sin %0 50
1872    sox dc.wav fixed.wav highpass 10
1873 .EE
1874 .TP
1875 \fBdeemph\fR
1876 Apply Compact Disc (IEC 60908) de-emphasis (a treble attenuation shelving
1877 filter).
1878 .SP
1879 Pre-emphasis was applied in the mastering of some CDs issued in the early
1880 1980s.  These included many classical music albums, as well as now
1881 sought-after issues of albums by The Beatles, Pink Floyd and others.
1882 Pre-emphasis should be removed at playback time by a de-emphasis
1883 filter in the playback device.  However, not all modern CD players have
1884 this filter, and very few PC CD drives have it; playing pre-emphasised
1885 audio without the correct de-emphasis filter results in audio that sounds harsh
1886 and is far from what its creators intended.
1887 .SP
1888 With the
1889 .B deemph
1890 effect, it is possible to apply the necessary de-emphasis to audio that
1891 has been extracted from a pre-emphasised CD, and then either burn the
1892 de-emphasised audio to a new CD (which will then play correctly on any
1893 CD player), or simply play the correctly de-emphasised audio files on the
1894 PC.  For example:
1895 .EX
1896    sox track1.wav track1\-deemph.wav deemph
1897 .EE
1898 and then burn track1-deemph.wav to CD, or
1899 .EX
1900    play track1\-deemph.wav
1901 .EE
1902 or simply
1903 .EX
1904    play track1.wav deemph
1905 .EE
1906 The de-emphasis filter is implemented as a biquad; its maximum deviation
1907 from the ideal response is only 0\*d06dB (up to 20kHz).
1908 .SP
1909 This effect supports the \fB\-\-plot\fR global option.
1910 .SP
1911 See also the \fBbass\fR and \fBtreble\fR shelving equalisation effects.
1912 .TP
1913 \fBdelay\fR {\fIlength\fR}
1914 Delay one or more audio channels.
1915 .I length
1916 can specify a time or, if appended with an `s', a number of samples.
1917 Do not specify both time and samples delays in the same command.
1918 For example,
1919 .B delay 1\*d5 0 0\*d5
1920 delays the first channel by 1\*d5 seconds, the third channel by 0\*d5
1921 seconds, and leaves the second channel (and any other channels that may be
1922 present) un-delayed.
1923 The following (one long) command plays a chime sound:
1924 .EX
1925 .ne 3
1926    play \-n synth \-j 3 sin %3 sin %\-2 sin %\-5 sin %\-9 \\
1927         sin %\-14 sin %\-21 fade h .01 2 1.5 delay \\
1928         1.3 1 .76 .54 .27 remix \- fade h 0 2.7 2.5 norm \-1
1929 .EE
1930 and this plays a guitar chord:
1931 .EX
1932 .ne 2
1933    play \-n synth pl G2 pl B2 pl D3 pl G3 pl D4 pl G4 \\
1934         delay 0 .05 .1 .15 .2 .25 remix \- fade 0 4 .1 norm \-1
1935 .EE
1936 .TP
1937 \fBdither\fR [\fB\-S\fR\^|\^\fB\-s\fR\^|\^\fB\-f \fIfilter\fR] [\fB\-a\fR] [\fB\-p \fIprecision\fR]
1938 Apply dithering to the audio.
1939 Dithering deliberately adds a small amount of noise to the signal in
1940 order to mask audible quantization effects that can occur if the output
1941 sample size is less than 24 bits.  With no options, this effect will
1942 add triangular (TPDF) white noise.  Noise-shaping (only for certain
1943 sample rates) can be selected with
1944 .BR \-s .
1945 With the
1946 .B \-f
1947 option, it is possible to select a particular noise-shaping filter from
1948 the following list: lipshitz, f-weighted, modified-e-weighted,
1949 improved-e-weighted, gesemann, shibata, low-shibata, high-shibata.  Note
1950 that most filter types are available only with 44100Hz sample rate.  The
1951 filter types are distinguished by the following properties: audibility
1952 of noise, level of (inaudible, but in some circumstances, otherwise
1953 problematic) shaped high frequency noise, and processing speed.
1954 .br
1955 See http://sox.sourceforge.net/SoX/NoiseShaping for graphs of the different
1956 noise-shaping curves.
1957 .SP
1958 The
1959 .B \-S
1960 option selects a slightly `sloped' TPDF, biased towards higher
1961 frequencies.  It can be used at any sampling rate but below \(~~22k,
1962 plain TPDF is probably better, and above \(~~37k, noise-shaped
1963 is probably better.
1964 .SP
1965 The
1966 .B \-a
1967 option enables a mode where dithering (and noise-shaping if applicable)
1968 are automatically enabled only when needed.  The most likely use for
1969 this is when applying fade in or out to an already dithered file, so
1970 that the redithering applies only to the faded portions.  However, auto
1971 dithering is not fool-proof, so the fades should be carefully checked
1972 for any noise modulation; if this occurs, then either re-dither the whole
1973 file, or use
1974 .BR trim ,
1975 .BR fade ,
1976 and concatenate.
1977 .SP
1978 The
1979 .B \-p
1980 option allows overriding the target precision.
1981 .SP
1982 If the SoX global option
1983 .B \-R
1984 is not given, then the pseudo-random number generator used to
1985 generate the white noise will be `reseeded', i.e. the generated noise
1986 will be different between invocations.
1987 .SP
1988 This effect should not be followed by any other effect that
1989 affects the audio.
1990 .SP
1991 See also the `Dithering' section above.
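.SP
As a sketch (the file names are placeholders, and the sample rate is
assumed to be one for which noise-shaping is available), the following
reduces the audio to 16 bits with noise-shaped dither given as the last
effect in the chain:
.EX
   sox infile \-b 16 outfile dither \-s
.EE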
1992 .TP
1993 \fBdownsample\fR [\fIfactor\fR(2)]
1994 Downsample the signal by an integer factor: only the first out of
1995 each \fIfactor\fR samples is retained, the others are discarded.
1996 .SP
1997 No decimation filter is applied.  If the input is not a properly
1998 bandlimited baseband signal, aliasing will occur.  This may be
1999 desirable, e.g., for frequency translation.
2000 .SP
2001 For a general resampling effect with anti-aliasing, see \fBrate\fR.  See
2002 also \fBupsample\fR.
2003 .TP
2004 \fBearwax\fR
2005 Makes audio easier to listen to on headphones.
2006 Adds `cues' to 44\*d1kHz stereo (i.e. audio CD format) audio so that
2007 when listened to on headphones the stereo image is
2008 moved from inside
2009 your head (standard for headphones) to outside and in front of the
2010 listener (standard for speakers).
2011 .TP
2012 \fBecho \fIgain-in gain-out\fR <\fIdelay decay\fR>
2013 Add echoing to the audio.
2014 Echoes are reflected sound and can occur naturally amongst mountains
2015 (and sometimes large buildings) when talking or shouting; digital echo
2016 effects emulate this behaviour and are often used to help fill
2017 out the sound of a single instrument or vocal.  The time difference
2018 between the original signal and the reflection is the `delay' (time),
2019 and the loudness of the reflected signal is the `decay'.  Multiple echoes
2020 can have different delays and decays.
2021 .SP
2022 Each given
2023 .I "delay decay"
2024 pair gives the delay in milliseconds
2025 and the decay (relative to gain-in) of that echo.
2026 Gain-out is the volume of the output.
2027 For example, the following
2028 makes it sound as if there are twice as many instruments as are
2029 actually playing:
2030 .EX
2031    play lead.aiff echo 0.8 0.88 60 0.4
2032 .EE
2033 If the delay is very short, then it sounds like a (metallic) robot playing
2034 music:
2035 .EX
2036    play lead.aiff echo 0.8 0.88 6 0.4
2037 .EE
2038 A longer delay will sound like an open air concert in the mountains:
2039 .EX
2040    play lead.aiff echo 0.8 0.9 1000 0.3
2041 .EE
2042 One mountain more, and:
2043 .EX
2044    play lead.aiff echo 0.8 0.9 1000 0.3 1800 0.25
2045 .EE
2046 .TP
2047 \fBechos \fIgain-in gain-out\fR <\fIdelay decay\fR>
2048 Add a sequence of echoes to the audio.
2049 Each
2050 .I "delay decay"
2051 pair gives the delay in milliseconds
2052 and the decay (relative to gain-in) of that echo.
2053 Gain-out is the volume of the output.
2054 .SP
2055 Like the echo effect, echos stands for `ECHO in Sequel'; that is, the first echos
2056 takes the input, the second the input and the first echos, the third the input
2057 and the first and the second echos, and so on.
2058 Care should be taken using many echos; a single echos
2059 has the same effect as a single echo.
2060 .SP
2061 The sample will be bounced twice in symmetric echos:
2062 .EX
2063    play lead.aiff echos 0.8 0.7 700 0.25 700 0.3
2064 .EE
2065 The sample will be bounced twice in asymmetric echos:
2066 .EX
2067    play lead.aiff echos 0.8 0.7 700 0.25 900 0.3
2068 .EE
2069 The sample will sound as if played in a garage:
2070 .EX
2071    play lead.aiff echos 0.8 0.7 40 0.25 63 0.3
2072 .EE
2073 .TP
2074 \fBequalizer \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBq\fR\^|\^\fBo\fR\^|\^\fBh\fR\^|\^\fBk\fR] \fIgain\fR
2075 Apply a two-pole peaking equalisation (EQ) filter.
2076 With this filter, the signal-level at and around a selected frequency
2077 can be increased or decreased, whilst (unlike band-pass and band-reject
2078 filters) that at all other frequencies is unchanged.
2079 .SP
2080 \fIfrequency\fR gives the filter's central frequency in Hz,
2081 \fIwidth\fR, the band-width,
2082 and \fIgain\fR the required gain
2083 or attenuation in dB.
2084 Beware of
2085 .B Clipping
2086 when using a positive \fIgain\fR.
2087 .SP
2088 In order to produce complex equalisation curves, this effect
2089 can be given several times, each with a different central frequency.
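As a sketch (the file names are placeholders), the following cuts 6dB in
a one-octave band around 250\ Hz and boosts 3dB around 4\ kHz with a
Q-factor of 2:
.EX
   sox infile outfile equalizer 250 1o \-6 equalizer 4k 2q 3
.EE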
2090 .SP
2091 The filter is described in detail in [1].
2092 .SP
2093 This effect supports the \fB\-\-plot\fR global option.
2094 .SP
2095 See also \fBbass\fR and \fBtreble\fR for shelving equalisation effects.
2096 .TP
2097 \fBfade\fR [\fItype\fR] \fIfade-in-length\fR [\fIstop-time\fR [\fIfade-out-length\fR]]
2098 Apply a fade effect to the beginning, end, or both of the audio.
2099 .SP
2100 An optional \fItype\fR can be specified to select the shape of the fade
2101 curve:
2102 \fBq\fR for quarter of a sine wave, \fBh\fR for half a sine
2103 wave, \fBt\fR for linear (`triangular') slope, \fBl\fR for logarithmic,
2104 and \fBp\fR for inverted parabola.  The default is logarithmic.
2105 .SP
2106 A fade-in starts from the first sample and ramps the signal level from 0 to full volume over \fIfade-in-length\fR seconds.  Specify 0 seconds if no fade-in is wanted.
2107 .SP
2108 For fade-outs, the audio will be truncated at
2109 .I stop-time
2110 and
2111 the signal level will be ramped from full volume down to 0 starting at
2112 \fIfade-out-length\fR seconds before the \fIstop-time\fR.  If
2113 .I fade-out-length
2114 is not specified, it defaults to the same value as
2115 \fIfade-in-length\fR.
2116 No fade-out is performed if
2117 .I stop-time
2118 is not specified.
2119 If the file length can be determined from the input file header and length-changing effects are not in effect, then \fB0\fR may be specified for
2120 .I stop-time
2121 to indicate the usual case of a fade-out that ends at the end of the input
2122 audio stream.
2123 .SP
2124 All times can be specified in either periods of time or sample counts.
2125 To specify time periods, use the hh:mm:ss.frac format.  To specify
2126 using sample counts, specify the number of samples and append the letter `s'
2127 to the sample count (for example `8000s').
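.SP
As a sketch (the file names are placeholders, and the input is assumed
to be at least 30 seconds long), the following applies a linear 2-second
fade-in, truncates the audio at 30 seconds, and fades out over the
5 seconds before that point:
.EX
   sox infile outfile fade t 2 30 5
.EE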
2128 .SP
2129 See also the
2130 .B splice
2131 effect.
2132 .TP
2133 \fBfir\fR [\fIcoefs-file\fR\^|\^\fIcoefs\fR]
2134 Use SoX's FFT convolution engine with given FIR filter
2135 coefficients.
2136 If a single argument is given then this is treated as the name of a file
2137 containing the filter coefficients (white-space separated; may contain
2138 `#' comments).  If the given filename is `\-', or if no argument is
2139 given, then the coefficients are read from the `standard input' (stdin);
2140 otherwise, coefficients may be given on the command line.
2141 Examples:
2142 .EX
2143    sox infile outfile fir 0.0195 \-0.082 0.234 0.891 \-0.145 0.043
2144 .EE
2145 .EX
2146    sox infile outfile fir coefs.txt
2147 .EE
2148 with coefs.txt containing
2149 .EX
2150    # HP filter
2151    # freq=10000
2152      1.2311233052619888e\-01
2153     \-4.4777096106211783e\-01
2154      5.1031563346705155e\-01
2155     \-6.6502926320995331e\-02
2156    ...
2157 .EE
2158 .SP
2159 This effect supports the \fB\-\-plot\fR global option.
2160 .TP
2161 \fBflanger\fR [\fIdelay depth regen width speed shape phase interp\fR]
2162 Apply a flanging effect to the audio.
2163 See [3] for a detailed description of flanging.
2164 .SP
2165 All parameters are optional (right to left).
2166 .ne 15
2167 .TS
2168 center;
2169 cI cI cI lI
2170 cI c c l.
2171 \       Range   Default Description
2172 delay   0 \- 30 0       Base delay in milliseconds.
2173 depth   0 \- 10 2       Added swept delay in milliseconds.
2174 regen   \-95 \- 95      0       T{
2175 .na
2176 Percentage regeneration (delayed signal feedback).
2177 T}
2178 width   0 \- 100        71      T{
2179 .na
2180 Percentage of delayed signal mixed with original.
2181 T}
2182 speed   0\*d1 \- 10     0\*d5   Sweeps per second (Hz).
2183 shape   \       sin     Swept wave shape: \fBsine\fR\^|\^\fBtriangle\fR.
2184 phase   0 \- 100        25      T{
2185 .na
2186 Swept wave percentage phase-shift for multi-channel (e.g. stereo) flange;
2187 0 = 100 = same phase on each channel.
2188 T}
2189 interp  \       lin     T{
2190 .na
2191 Digital delay-line interpolation: \fBlinear\fR\^|\^\fBquadratic\fR.
2192 T}
2193 .TE
2194 .DT
2195 .TP
2196 \fBgain \fR[\fB\-e\fR\^|\^\fB\-B\fR\^|\^\fB\-b\fR\^|\^\fB\-r\fR] [\fB\-n\fR] [\fB\-l\fR\^|\^\fB\-h\fR] [\fIgain-dB\fR]
2197 Apply amplification or attenuation to the audio signal, or, in some
2198 cases, to some of its channels.
2199 Note that use of any of
2200 .BR \-e ,
2201 .BR \-B ,
2202 .BR \-b ,
2203 .BR \-r ,
2204 or
2205 .B \-n
2206 requires temporary file space to store the audio to be processed, so may
2207 be unsuitable for use with `streamed' audio.
2208 .SP
2209 Without other options,
2210 .I gain-dB
2211 is used to adjust the signal power level by the given number of dB:
2212 positive amplifies (beware of Clipping), negative attenuates.  With
2213 other options, the
2214 .I gain-dB
2215 amplification or attenuation is (logically) applied after the processing due to those options.
2216 .SP
2217 Given the
2218 .B \-e
2219 option, the levels of the audio channels of a multi-channel file are `equalised', i.e.
2220 gain is applied to all channels other than that with the highest peak
2221 level, such that all channels attain the same peak level
2222 (but, without also giving
2223 .BR \-n ,
2224 the audio is not `normalised').
2225 .SP
2226 The
2227 .B \-B
2228 (balance) option is similar to
2229 .BR \-e ,
2230 but with
2231 .BR \-B ,
2232 the RMS level is used instead of the peak level.
2233 .B \-B
2234 might be used to correct stereo imbalance caused by an imperfect record
2235 turntable cartridge.   Note
2236 that unlike
2237 .BR \-e ,
2238 .B \-B
2239 might cause some clipping.
2240 .SP
2241 .B \-b
2242 is similar to
2243 .B \-B
2244 but has clipping protection, i.e.  if necessary to prevent clipping
2245 whilst balancing, attenuation is applied to all channels.
2246 Note, however, that in conjunction with
2247 .BR \-n ,
2248 .B \-B
2249 and
2250 .B \-b
2251 are synonymous.
2252 .SP
2253 The
2254 .B \-r
2255 option is used in conjunction with a prior invocation of
2256 .B gain
2257 with the
2258 .B \-h
2259 option\*msee below for details.
2260 .SP
2261 The
2262 .B \-n
2263 option normalises the audio to 0dB FSD; it is often used in conjunction with a negative
2264 .I gain-dB
2265 to the effect that the audio is normalised to a given level below 0dB.
2266 For example,
2267 .EX
2268    sox infile outfile gain \-n
2269 .EE
2270 normalises to 0dB, and
2271 .EX
2272    sox infile outfile gain \-n \-3
2273 .EE
2274 normalises to \-3dB.
2275 .SP
2276 The
2277 .B \-l
2278 option invokes a simple limiter, e.g.
2279 .EX
2280    sox infile outfile gain \-l 6
2281 .EE
2282 will apply 6dB of gain but never clip.  Note that limiting more than a
2283 few dBs more than occasionally (in a piece of audio) is not recommended
2284 as it can cause audible distortion.
2285 See the
2286 .B compand
2287 effect for a more capable limiter.
2288 .SP
2289 The
2290 .B \-h
2291 option is used to apply gain to provide head-room for subsequent
2292 processing.  For example, with
2293 .EX
2294    sox infile outfile gain \-h bass +6
2295 .EE
2296 6dB of attenuation will be applied prior to the bass boosting effect
2297 thus ensuring that it will not clip.  Of course, with bass, it is
2298 obvious how much headroom will be needed, but with other effects (e.g.
2299 rate, dither) it is not always as clear.  Another advantage of using
2300 \fBgain \-h\fR rather than an explicit attenuation, is that if the
2301 headroom is not used by subsequent effects, it can be reclaimed with
2302 \fBgain \-r\fR, for example:
2303 .EX
2304    sox infile outfile gain \-h bass +6 rate 44100 gain \-r
2305 .EE
2306 The above effects chain guarantees never to clip nor amplify;
2307 it attenuates if necessary to prevent clipping, but by only as
2308 much as is needed to do so.
2309 .SP
2310 Output formatting (dithering and bit-depth reduction) also requires
2311 headroom (which cannot be `reclaimed'), e.g.
2312 .EX
2313    sox infile outfile gain \-h bass +6 rate 44100 gain \-rh dither
2314 .EE
2315 Here, the second
2316 .B gain
2317 invocation reclaims as much of the headroom as it can from the
2318 preceding effects, but retains as much headroom as is needed for
2319 subsequent processing.
2320 The SoX global option
2321 .B \-G
2322 can be given to automatically invoke \fBgain \-h\fR and \fBgain \-r\fR.
2323 .SP
2324 See also the
2325 .B norm
2326 and
2327 .B vol
2328 effects.
2329 .TP
2330 \fBhighpass\fR\^|\^\fBlowpass\fR [\fB\-1\fR|\fB\-2\fR] \fIfrequency\fR[\fBk\fR]\fR [\fIwidth\fR[\fBq\fR\^|\^\fBo\fR\^|\^\fBh\fR\^|\^\fBk\fR]]
2331 Apply a high-pass or low-pass filter with 3dB point \fIfrequency\fR.
2332 The filter can be either single-pole (with
2333 .BR \-1 ),
2334 or double-pole (the default, or with
2335 .BR \-2 ).
2336 .I width
2337 applies only to double-pole filters;
2338 the default is Q = 0\*d707 and gives a Butterworth response.  The filters
2339 roll off at 6dB per pole per octave (20dB per pole per decade).  The
2340 double-pole filters are described in detail in [1].
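.SP
For instance (a sketch with placeholder file names), a single-pole
high-pass filter at 80\ Hz followed by the default double-pole low-pass
filter at 4\ kHz could be written as:
.EX
   sox infile outfile highpass \-1 80 lowpass 4k
.EE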
2341 .SP
2342 These effects support the \fB\-\-plot\fR global option.
2343 .SP
2344 See also \fBsinc\fR for filters with a steeper roll-off.
2345 .TP
2346 \fBhilbert\fR [\fB\-n \fItaps\fR]
2347 Apply an odd-tap Hilbert transform filter, phase-shifting the signal
2348 by 90 degrees.
2349 .SP
2350 This is used in many matrix coding schemes and for analytic signal
2351 generation.  The process is often written as a multiplication by \fIi\fR
2352 (or \fIj\fR), the imaginary unit.
2353 .SP
2354 An odd-tap Hilbert transform filter has a bandpass characteristic,
2355 attenuating the lowest and highest frequencies.  Its bandwidth can be
2356 controlled by the number of filter taps, which can be specified with
2357 \fB\-n\fR.  By default, the number of taps is chosen for a cutoff
2358 frequency of about 75 Hz.
2359 .SP
2360 This effect supports the \fB\-\-plot\fR global option.
2361 .TP
2362 \fBladspa\fR \fBmodule\fR [\fBplugin\fR] [\fBargument\fR...]
2363 Apply a LADSPA [5] (Linux Audio Developer's Simple Plugin API) plugin.
2364 Despite the name, LADSPA is not Linux-specific, and a wide range of
2365 effects is available as LADSPA plugins, such as cmt [6] (the Computer
2366 Music Toolkit) and Steve Harris's plugin collection [7]. The first
2367 argument is the plugin module, the second the name of the plugin (a
2368 module can contain more than one plugin) and any other arguments are
2369 for the control ports of the plugin. Missing arguments are supplied by
2370 default values if possible. Only plugins with at most one audio input
2371 and one audio output port can be used.  If set, the environment variable
2372 LADSPA_PATH is used as the search path for plugins.
2373 .TP
2374 \fBloudness\fR [\fIgain\fR [\fIreference\fR]]
2375 Loudness control\*msimilar to the
2376 .B gain
2377 effect, but provides equalisation for the human auditory system.  See
2378 http://en.wikipedia.org/wiki/Loudness for a detailed description of
2379 loudness.  The gain is adjusted by the given
2380 .I gain
2381 parameter (usually negative) and the signal equalised according to ISO
2382 226 w.r.t. a reference level of 65dB, though an alternative
2383 .I reference
2384 level may be given if the original audio has been equalised for some
2385 other optimal level.
2386 A default gain of \-10dB is used if a
2387 .I gain
2388 value is not given.
2389 .SP
2390 See also the
2391 .B gain
2392 effect.
2393 .TP
2394 \fBlowpass\fR [\fB\-1\fR|\fB\-2\fR] \fIfrequency\fR[\fBk\fR] [\fIwidth\fR[\fBq\fR\^|\^\fBo\fR\^|\^\fBh\fR\^|\^\fBk\fR]]
2395 Apply a low-pass filter.
2396 See the description of the \fBhighpass\fR effect for details.
2397 .TP
2398 \fBmcompand\fR \(dq\fIattack1\fB,\fIdecay1\fR{\fB,\fIattack2\fB,\fIdecay2\fR}
2399 [\fIsoft-knee-dB\fB:\fR]\fIin-dB1\fR[\fB,\fIout-dB1\fR]{\fB,\fIin-dB2\fB,\fIout-dB2\fR}
2400 .br
2401 [\fIgain\fR [\fIinitial-volume-dB\fR [\fIdelay\fR]]]\(dq {\fIcrossover-freq\fR[\fBk\fR] \(dqattack1,...\(dq}
2402 .SP
2403 The multi-band compander is similar to the single-band compander but the
2404 audio is first divided into bands using Linkwitz-Riley cross-over filters
2405 and a separately specifiable compander run on each band.  See the
2406 \fBcompand\fR effect for the definition of its parameters.  Compand
2407 parameters are specified between double quotes and the crossover
2408 frequency for that band is given by \fIcrossover-freq\fR; these can be
2409 repeated to create multiple bands.
2410 .SP
2411 For example, the following (one long) command shows how multi-band
2412 companding is typically used in FM radio:
2413 .EX
2414 .ne 8
2415    play track1.wav gain \-3 sinc 8000\- 29 100 mcompand \\
2416         \(dq0.005,0.1 \-47,\-40,\-34,\-34,\-17,\-33\(dq 100 \\
2417         \(dq0.003,0.05 \-47,\-40,\-34,\-34,\-17,\-33\(dq 400 \\
2418         \(dq0.000625,0.0125 \-47,\-40,\-34,\-34,\-15,\-33\(dq 1600 \\
2419         \(dq0.0001,0.025 \-47,\-40,\-34,\-34,\-31,\-31,\-0,\-30\(dq 6400 \\
2420         \(dq0,0.025 \-38,\-31,\-28,\-28,\-0,\-25\(dq \\
2421         gain 15 highpass 22 highpass 22 sinc \-n 255 \-b 16 \-17500 \\
2422         gain 9 lowpass \-1 17801
2423 .EE
2424 The audio file is played with a simulated FM radio sound (or broadcast
2425 signal condition if the lowpass filter at the end is skipped).
2426 Note that the pipeline is set up with US-style 75us pre-emphasis.
2427 .SP
2428 See also
2429 .B compand
2430 for a single-band companding effect.
2431 .TP
2432 \fBnoiseprof\fR [\fIprofile-file\fR]
2433 Calculate a profile of the audio for use in noise reduction.  See the
2434 description of the \fBnoisered\fR effect for details.
2435 .TP
2436 \fBnoisered\fR [\fIprofile-file\fR [\fIamount\fR]]
2437 Reduce noise in the audio signal by profiling and filtering.  This
2438 effect is moderately effective at removing consistent background noise
2439 such as hiss or hum.  To use it, first run SoX with the \fBnoiseprof\fR
2440 effect on a section of audio that ideally would contain silence but in
2441 fact contains noise\*msuch sections are typically found at the beginning
2442 or the end of a recording.  \fBnoiseprof\fR will write out a noise
2443 profile to \fIprofile-file\fR, or to stdout if no \fIprofile-file\fR is given or
2444 if `\-' is given.  E.g.
2445 .EX
2446    sox speech.wav \-n trim 0 1.5 noiseprof speech.noise-profile
2447 .EE
2448 To actually remove the noise, run SoX again, this time with the \fBnoisered\fR
2449 effect;
2450 .B noisered
2451 will reduce noise according to a noise profile (which was generated by
2452 .BR noiseprof ),
2453 from
2454 .IR profile-file ,
2455 or from stdin if no \fIprofile-file\fR is given or if `\-' is given.  E.g.
2456 .EX
2457    sox speech.wav cleaned.wav noisered speech.noise-profile 0.3
2458 .EE
2459 How much noise should be removed is specified by
2460 .IR amount \*ma
2461 number between 0 and 1 with a default of 0\*d5.  Higher numbers will
2462 remove more noise but present a greater likelihood of removing wanted
2463 components of the audio signal.  Before replacing an original recording
2464 with a noise-reduced version, experiment with different
2465 .I amount
2466 values to find the optimal one for your audio; use headphones to check
2467 that you are happy with the results, paying particular attention to quieter
2468 sections of the audio.
2469 .SP
2470 On most systems, the two stages\*mprofiling and reduction\*mcan be combined
2471 using a pipe, e.g.
2472 .EX
2473    sox noisy.wav \-n trim 0 1 noiseprof | play noisy.wav noisered
2474 .EE
2475 .TP
2476 \fBnorm\fR [\fIdB-level\fR]
2477 Normalise the audio.
2478 .B norm
2479 is just an alias for \fBgain \-n\fR; see the
2480 .B gain
2481 effect for details.
2482 .TP
2483 \fBoops\fR
2484 Out Of Phase Stereo effect.
2485 Mixes stereo to twin-mono where each mono channel contains the
2486 difference between the left and right stereo channels.
2487 This is sometimes known as the `karaoke' effect as it often has the effect
2488 of removing most or all of the vocals from a recording.
2489 It is equivalent to \fBremix 1,2i 1,2i\fR.
2490 .TP
2491 \fBoverdrive\fR [\fIgain\fR(20) [\fIcolour\fR(20)]]
2492 Non-linear distortion.
2493 The \fIcolour\fR parameter controls the amount of even harmonic content
2494 in the over-driven output.
2495 .TP
2496 \fBpad\fR { \fIlength\fR[\fB@\fIposition\fR] }
2497 Pad the audio with silence, at the beginning, the end, or any
2498 specified points through the audio.
2499 Both
2500 .I length
2501 and
2502 .I position
2503 can specify a time or, if appended with an `s', a number of samples.
2504 .I length
2505 is the amount of silence to insert and
2506 .I position
2507 the position in the input audio stream at which to insert it.
2508 Any number of lengths and positions may be specified, provided that
2509 a specified position is not less than the previous one.
2510 .I position
2511 is optional for the first and last lengths specified and,
2512 if omitted, corresponds to the beginning and the end of the audio respectively.
2513 For example,
2514 .B pad 1\*d5 1\*d5
2515 adds 1\*d5 seconds of silence padding at each end of the audio, whilst
2516 .B pad 4000s@3:00
2517 inserts 4000 samples of silence 3 minutes into the audio.
2518 If silence is wanted only at the end of the audio, specify either the end
2519 position or specify a zero-length pad at the start.
2520 .SP
2521 See also
2522 .B delay
2523 for an effect that can add silence at the beginning of
2524 the audio on a channel-by-channel basis.
2525 .TP
2526 \fBphaser \fIgain-in gain-out delay decay speed\fR [\fB\-s\fR\^|\^\fB\-t\fR]
2527 Add a phasing effect to the audio.
2528 See [3] for a detailed description of phasing.
2529 .SP
2530 delay/decay/speed gives the delay in milliseconds
2531 and the decay (relative to gain-in) with a modulation
2532 speed in Hz.
2533 The modulation is either sinusoidal (\fB\-s\fR) \*mpreferable for multiple
2534 instruments, or triangular
2535 (\fB\-t\fR) \*mgives single instruments a sharper phasing effect.
2536 The decay should be less than 0\*d5 to avoid
2537 feedback, and usually no less than 0\*d1.  Gain-out is the volume of the output.
2538 .SP
2539 For example:
2540 .EX
2541    play snare.flac phaser 0.8 0.74 3 0.4 0.5 \-t
2542 .EE
2543 Gentler:
2544 .EX
2545    play snare.flac phaser 0.9 0.85 4 0.23 1.3 \-s
2546 .EE
2547 A popular sound:
2548 .EX
2549    play snare.flac phaser 0.89 0.85 1 0.24 2 \-t
2550 .EE
2551 More severe:
2552 .EX
2553    play snare.flac phaser 0.6 0.66 3 0.6 2 \-t
2554 .EE
2555 .TP
2556 \fBpitch \fR[\fB\-q\fR] \fIshift\fR [\fIsegment\fR [\fIsearch\fR [\fIoverlap\fR]]]
2557 Change the audio pitch (but not tempo).
2558 .SP
2559 .I shift
2560 gives the pitch shift as positive or negative `cents' (i.e. 100ths of a
2561 semitone).  See the
2562 .B tempo
2563 effect for a description of the other parameters.
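.SP
As a rough guide to choosing
.IR shift ,
the following sketch converts cents to a frequency ratio.  It assumes
equal temperament (1200 cents per octave), so that the ratio is 2 raised
to the power cents/1200; the helper name \fBcents_to_ratio\fR is
hypothetical and is not part of SoX.
.EX
```shell
# Hypothetical helper (not part of SoX): frequency ratio for a shift in cents.
# Assumes equal temperament: ratio = 2^(cents/1200).
cents_to_ratio() {
    awk -v c="$1" 'BEGIN { print sprintf("%.4f", exp(log(2) * c / 1200)) }'
}
```
.EE
For example, \fBcents_to_ratio 1200\fR reports the ratio for a shift of
one octave up, and a negative argument reports the ratio for a downward shift.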
2564 .SP
2565 See also the \fBbend\fR, \fBspeed\fR,
2566 and
2567 .B tempo
2568 effects.
2569 .TP
2570 \fBrate\fR [\fB\-q\fR\^|\^\fB\-l\fR\^|\^\fB\-m\fR\^|\^\fB\-h\fR\^|\^\fB\-v\fR] [override-options] \fIRATE\fR[\fBk\fR]
2571 Change the audio sampling rate (i.e. resample the audio) to any given
2572 .I RATE
2573 (even non-integer if this is supported by the output file format)
2574 using a quality level defined as follows:
2575 .ne 10
2576 .TS
2577 center;
2578 cI cI2w9 cI2w6 cIw6 lIw17
2579 cB c c c l.
2580 \       Quality T{
2581 .na
2582 Band-width
2583 T}      Rej dB  T{
2584 .na
2585 Typical Use
2586 T}
2587 \-q     T{
2588 .na
2589 quick
2590 T}      n/a     T{
2591 .na
2592 \(~=30 @ \ Fs/4
2593 T}      T{
2594 .na
2595 playback on ancient hardware
2596 T}
2597 \-l     low     80%     100     T{
2598 .na
2599 playback on old hardware
2600 T}
2601 \-m     medium  95%     100     T{
2602 .na
2603 audio playback
2604 T}
2605 \-h     high    95%     125     T{
2606 .na
2607 16-bit mastering (use with dither)
2608 T}
2609 \-v     T{
2610 .na
2611 very high
2612 T}      95%     175     24-bit mastering
2613 .TE
2614 .DT
2615 .SP
2616 where
2617 .I Band-width
2618 is the percentage of the audio frequency band that is preserved and
2619 .I Rej dB
2620 is the level of noise rejection.  Increasing levels of resampling
2621 quality come at the expense of increasing amounts of time to process the
2622 audio.  If no quality option is given, the quality level used is `high'
2623 (but see `Playing & Recording Audio' above regarding playback).
2624 .SP
2625 The `quick' algorithm uses cubic interpolation; all others use
2626 band-limited interpolation.  By default, all algorithms have
2627 a `linear' phase response; for `medium', `high' and
2628 `very high', the phase response is configurable (see below).
2629 .SP
2630 The
2631 .B rate
2632 effect is invoked automatically if SoX's \fB\-r\fR option specifies a
2633 rate that is different to that of the input file(s).  Alternatively, if
2634 this effect is given explicitly, then SoX's
2635 .B \-r
2636 option need not be given.  For example, the following two commands are
2637 equivalent:
2638 .EX
2639 .ne 2
2640    sox input.wav \-r 48k output.wav bass \-b 24
2641    sox input.wav        output.wav bass \-b 24 rate 48k
2642 .EE
2643 though the second command is more flexible as it allows
2644 .B rate
2645 options to be given, and allows the effects to be ordered arbitrarily.
2646 .TS
2647 center;
2648 c8 c8 c.
2649 *       *       *
2650 .TE
2651 .DT
2652 .SP
2653 Warning: technically detailed discussion follows.
2654 .SP
2655 The simple quality selection described above provides settings that
2656 satisfy the needs of the vast majority of resampling tasks.
2657 Occasionally, however, it may be desirable to fine-tune the resampler's
2658 filter response; this can be achieved using
2659 .IR override\ options ,
2660 as detailed in the following table:
2661 .ne 6
2662 .TS
2663 center;
2664 lB lw52.
2665 \-M/\-I/\-L     Phase response = minimum/intermediate/linear
2666 \-s     Steep filter (band-width = 99%)
2667 \-a     Allow aliasing/imaging above the pass-band
2668 \-b\ 74\-99\*d7 Any band-width %
2669 \-p\ 0\-100     T{
2670 .na
2671 Any phase response (0 = minimum, 25 = intermediate, 50 = linear, 100 = maximum)
2672 T}
2673 .TE
2674 .DT
2675 .SP
2676 N.B.  Override options cannot be used with the `quick' or `low'
2677 quality algorithms.
2678 .SP
2679 All resamplers use filters that can sometimes create `echo' (a.k.a.
2680 `ringing') artefacts with transient signals such as those that occur
2681 with `finger snaps' or other highly percussive sounds.  Such artefacts are
2682 much more noticeable to the human ear if they occur before the transient
2683 (`pre-echo') than if they occur after it (`post-echo').  Note that the
2684 frequency of any such artefacts is related to the smaller of the
2685 original and new sampling rates but that if this is at least 44\*d1kHz,
2686 then the artefacts will lie outside the range of human hearing.
2687 .SP
2688 A phase response setting may be used to control the distribution of any
2689 transient echo between
2690 `pre' and `post': with minimum phase, there is no pre-echo but the
2691 longest post-echo; with linear phase, pre and post echo are in equal
2692 amounts (in signal terms, but not audibility terms); the intermediate
2693 phase setting attempts to find the best compromise by selecting a small
2694 length (and level) of pre-echo and a medium-length post-echo.
2695 .SP
2696 Minimum, intermediate, or linear phase response is selected using the
2697 .BR \-M ,
2698 .BR \-I ,
2699 or
2700 .B \-L
2701 option; a custom phase response can be created with the
2702 .B \-p
2703 option.  Note that phase responses between `linear' and `maximum'
2704 (greater than 50) are rarely useful.
2705 .SP
2706 A resampler's band-width setting determines how much of the frequency
2707 content of the original signal (w.r.t. the original sample rate when
2708 up-sampling, or the new sample rate when down-sampling) is preserved
2709 during conversion.  The term `pass-band' is used to refer to all frequencies
2710 up to the band-width point (e.g. for 44\*d1kHz sampling rate, and a
2711 resampling band-width of 95%, the pass-band represents frequencies from
2712 0Hz (D.C.) to circa 21kHz).  Increasing the resampler's band-width
2713 results in a slower conversion and can increase transient echo
2714 artefacts (and vice versa).
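.SP
The pass-band arithmetic described above can be sketched as follows.  This
is an illustration only: the helper name \fBpassband_hz\fR is hypothetical
and not part of SoX, and it assumes that the band-width percentage applies
to half of the smaller of the two sampling rates (the original rate when
up-sampling, the new rate when down-sampling).
.EX
```shell
# Hypothetical helper (not part of SoX): upper edge of the pass-band in Hz.
# Arguments: input rate (Hz), output rate (Hz), band-width (%).
# Assumption: the percentage applies to half of the smaller sampling rate.
passband_hz() {
    awk -v a="$1" -v b="$2" -v pct="$3" 'BEGIN {
        r = (a < b) ? a : b
        print sprintf("%.1f", r / 2 * pct / 100)
    }'
}
```
.EE
For example, \fBpassband_hz 96000 44100 95\fR corresponds to the
`circa 21kHz' figure quoted above for a 44\*d1kHz rate and 95% band-width.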
2715 .SP
2716 The
2717 .B \-s
2718 `steep filter' option changes resampling band-width from the default 95%
2719 (based on the 3dB point), to 99%.  The
2720 .B \-b
2721 option allows the band-width to be set to any value in the range
2722 74\-99\*d7 %, but note that band-width values greater than 99% are not
2723 recommended for normal use as they can cause excessive transient echo.
2724 .SP
2725 If the
2726 .B \-a
2727 option is given, then aliasing/imaging above the pass-band is allowed.  For
2728 example, with 44\*d1kHz sampling rate, and a
2729 resampling band-width of 95%, this means that frequency content above
2730 21kHz can be distorted; however, since this is above the pass-band (i.e.
2731 above the highest frequency of interest/audibility), this may not be a
2732 problem.  The benefits of allowing aliasing/imaging are reduced processing time,
2733 and reduced (by almost half) transient echo artefacts.
2734 Note that if this option is given, then
2735 the minimum band-width allowable with
2736 .B \-b
2737 increases to 85%.
2738 .SP
2739 Examples:
2740 .EX
2741    sox input.wav \-b 16 output.wav rate \-s \-a 44100 dither \-s
2742 .EE
2743 default (high) quality resampling; overrides: steep filter, allow
2744 aliasing; to 44\*d1kHz sample rate; noise-shaped dither to 16-bit WAV
2745 file.
2746 .EX
2747    sox input.wav \-b 24 output.aiff rate \-v \-I \-b 90 48k
2748 .EE
2749 very high quality resampling; overrides: intermediate phase, band-width 90%;
2750 to 48k sample rate; store output to 24-bit AIFF file.
2751 .TS
2752 center;
2753 c8 c8 c.
2754 *       *       *
2755 .TE
2756 .DT
2757 .SP
2758 The
2759 .B pitch
2760 and
2761 .B speed
2762 effects use the
2763 .B rate
2764 effect at their core.
2765 .TP
2766 \fBremix\fR [\fB\-a\fR\^|\^\fB\-m\fR\^|\^\fB\-p\fR] <\fIout-spec\fR>
2767 \fIout-spec\fR  = \fIin-spec\fR{\fB,\fIin-spec\fR} | \fB0\fR
2768 .br
2769 \fIin-spec\fR   = [\fIin-chan\fR]\^[\fB\-\fR[\fIin-chan2\fR]]\^[\fIvol-spec\fR]
2770 .br
2771 \fIvol-spec\fR  = \fBp\fR\^|\^\fBi\fR\^|\^\fBv\^\fR[\fIvolume\fR]
2772 .br
2773 .SP
2774 Select and mix input audio channels into output audio channels.  Each output
2775 channel is specified, in turn, by a given \fIout-spec\fR: a list of
2776 contributing input channels and volume specifications.
2777 .SP
2778 Note that this effect operates on the audio
2779 .I channels
2780 within the SoX effects processing chain; it should not be confused with the
2781 .B \-m
2782 global option (where multiple
2783 .I files
2784 are mix-combined before entering the effects chain).
2785 .SP
2786 An
2787 .I out-spec
2788 contains comma-separated input channel-numbers and hyphen-delimited
2789 channel-number ranges; alternatively,
2790 .B 0
2791 may be given to create a silent output channel.  For example,
2792 .EX
2793    sox input.wav output.wav remix 6 7 8 0
2794 .EE
2795 creates an output file with four channels, where channels 1, 2, and 3 are
2796 copies of channels 6, 7, and 8 in the input file, and channel 4 is silent.
2797 Whereas
2798 .EX
2799    sox input.wav output.wav remix 1\-3,7 3
2800 .EE
2801 creates a (somewhat bizarre) stereo output file where the left channel
2802 is a mix-down of input channels 1, 2, 3, and 7, and the right channel is
2803 a copy of input channel 3.
2804 .SP
2805 Where a range of channels is specified, the channel numbers to the left and
2806 right of the hyphen are optional and default to 1 and to the number of input
2807 channels respectively. Thus
2808 .EX
2809    sox input.wav output.wav remix \-
2810 .EE
2811 performs a mix-down of all input channels to mono.
2812 .SP
2813 By default, where an output channel is mixed from multiple (n) input
2814 channels, each input channel will be scaled by a factor of \(S1/\s-2n\s+2.
2815 Custom mixing volumes can be set by following a given input channel or range
2816 of input channels with a \fIvol-spec\fR (volume specification).
2817 This is one of the letters \fBp\fR, \fBi\fR, or \fBv\fR,
2818 followed by a volume number, the meaning of which depends on the given
2819 letter and is defined as follows:
2820 .TS
2821 center;
2822 lI lI lI
2823 c l l.
2824 Letter  Volume number   Notes
2825 p       power adjust in dB      0 = no change
2826 i       power adjust in dB      T{
2827 .na
2828 As `p', but invert the audio
2829 T}
2830 v       voltage multiplier      T{
2831 .na
2832 1 = no change, 0\*d5 \(~= 6dB attenuation, 2 \(~= 6dB gain, \-1 = invert
2833 T}
2834 .TE
2835 .DT
2836 .SP
2837 If an
2838 .I out-spec
2839 includes at least one
2840 .I vol-spec
2841 then, by default, \(S1/\s-2n\s+2 scaling is not applied to any other channels in the
2842 same out-spec (though it may be in other out-specs).
2843 The \fB\-a\fR (automatic)
2844 option, however, can be given to retain the automatic scaling in this
2845 case.  For example,
2846 .EX
2847    sox input.wav output.wav remix 1,2 3,4v0.8
2848 .EE
2849 results in channel level multipliers of 0\*d5,0\*d5 1,0\*d8, whereas
2850 .EX
2851    sox input.wav output.wav remix \-a 1,2 3,4v0.8
2852 .EE
2853 results in channel level multipliers of 0\*d5,0\*d5 0\*d5,0\*d8.
2854 .SP
2855 The \fB\-m\fR (manual) option disables all automatic volume adjustments, so
2856 .EX
2857    sox input.wav output.wav remix \-m 1,2 3,4v0.8
2858 .EE
2859 results in channel level multipliers of 1,1 1,0\*d8.
2860 .SP
2861 The volume number is optional and omitting it corresponds to no volume
2862 change; however, the only case in which this is useful is in conjunction
2863 with
2864 .BR i .
2865 For example, if
2866 .I input.wav
2867 is stereo, then
2868 .EX
2869    sox input.wav output.wav remix 1,2i
2870 .EE
2871 is a mono equivalent of the
2872 .B oops
2873 effect.
2874 .SP
2875 If the \fB\-p\fR option is given, then any automatic \(S1/\s-2n\s+2 scaling
2876 is replaced by \(S1/\s-2\(srn\s+2 (`power') scaling; this gives a louder mix
2877 but one that might occasionally clip.
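.SP
To compare the two automatic scaling rules, the following sketch prints
the per-channel multiplier for a mix of \fIn\fR input channels under the
default rule and under \fB\-p\fR.  The helper name \fBremix_scale\fR is
hypothetical and not part of SoX; it only reproduces the arithmetic
described above.
.EX
```shell
# Hypothetical helper (not part of SoX): automatic per-channel multipliers
# when n input channels are mixed into one output channel.
# Prints the default 1/n factor, then the -p (power) 1/sqrt(n) factor.
remix_scale() {
    awk -v n="$1" 'BEGIN { print sprintf("%.3f %.3f", 1 / n, 1 / sqrt(n)) }'
}
```
.EE
For four channels the power factor is twice the default factor, which is
why a \fB\-p\fR mix is louder and more likely to clip.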
2878 .TS
2879 center;
2880 c8 c8 c.
2881 *       *       *
2882 .TE
2883 .DT
2884 .SP
2885 One use of the
2886 .B remix
2887 effect is to split an audio file into a set of files, each containing
2888 one of the constituent channels (in order to perform subsequent
2889 processing on individual audio channels).  Where more than a few
2890 channels are involved, a script such as the following (Bourne shell
2891 script) is useful:
2892 .EX
2893 #!/bin/sh
2894 chans=\`soxi \-c "$1"\`
2895 while [ $chans \-ge 1 ]; do
2896    chans0=\`printf %02i $chans\`   # 2 digits hence up to 99 chans
2897    out=\`echo "$1"|sed "s/\\(.*\\)\\.\\(.*\\)/\\1\-$chans0.\\2/"\`
2898    sox "$1" "$out" remix $chans
2899    chans=\`expr $chans \- 1\`
2900 done
2901 .EE
2902 If a file
2903 .I input.wav
2904 containing six audio channels were given, the script would produce six
2905 output files:
2906 .IR input-01.wav ,
2907 \fIinput-02.wav\fR, ...,
2908 .IR input-06.wav .
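.SP
The file naming used by the script can also be sketched with shell
parameter expansion instead of \fBsed\fR.  The helper name \fBchan_file\fR
is hypothetical, and the sketch assumes that the input file name contains
at least one dot.
.EX
```shell
# Hypothetical helper (not part of SoX): name of the per-channel output file.
# Arguments: input file name (assumed to contain a dot), channel number.
chan_file() {
    base=${1%.*}      # everything before the last dot
    ext=${1##*.}      # everything after the last dot
    echo "$base-$(printf %02d "$2").$ext"
}
```
.EE
With this helper, the body of the loop could be written as
\fBsox "$1" "$(chan_file "$1" $chans)" remix $chans\fR.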
2909 .SP
2910 See also the \fBswap\fR effect.
2911 .TP
2912 \fBrepeat\fR [\fIcount\fR (1)]
2913 Repeat the entire audio \fIcount\fR times, or once if \fIcount\fR is not given.
2914 Requires temporary file space to store the audio to be repeated.
2915 Note that repeating once yields two copies: the original audio and the
2916 repeated audio.
2917 .TP
2918 \fBreverb\fR [\fB\-w\fR|\fB\-\-wet-only\fR] [\fIreverberance\fR (50%) [\fIHF-damping\fR (50%)
2919 [\fIroom-scale\fR (100%) [\fIstereo-depth\fR (100%)
2920 .br
2921 [\fIpre-delay\fR (0ms) [\fIwet-gain\fR (0dB)]]]]]]
2922 .SP
2923 Add reverberation to the audio using the `freeverb' algorithm.  A
2924 reverberation effect is sometimes desirable for concert halls that are too
2925 small or contain so many people that the hall's natural reverberance is
2926 diminished.  Applying a small amount of stereo reverb to a (dry) mono signal
2927 will usually make it sound more natural.  See [3] for a detailed description
2928 of reverberation.
2929 .SP
2930 Note that this effect
2931 increases both the volume and the length of the audio, so to prevent clipping
2932 in these domains, a typical invocation might be:
2933 .EX
2934    play dry.wav gain \-3 pad 0 3 reverb
2935 .EE
2936 The
2937 .B \-w
2938 option can be given to select only the `wet' signal, thus allowing it to be
2939 processed further, independently of the `dry' signal.  E.g.
2940 .EX
2941    play \-m voice.wav "|sox voice.wav \-p reverse reverb \-w reverse"
2942 .EE
2943 for a reverse reverb effect.
2944 .TP
2945 \fBreverse\fR
2946 Reverse the audio completely.
2947 Requires temporary file space to store the audio to be reversed.
2948 .TP
2949 \fBriaa\fR
2950 Apply RIAA vinyl playback equalisation.
2951 The sampling rate must be one of: 44\*d1, 48, 88\*d2, 96 kHz.
2952 .SP
2953 This effect supports the \fB\-\-plot\fR global option.
2954 .TP
2955 \fBsilence \fR[\fB\-l\fR] \fIabove-periods\fR [\fIduration threshold\fR[\fBd\fR\^|\^\fB%\fR]
2956 [\fIbelow-periods duration threshold\fR[\fBd\fR\^|\^\fB%\fR]]]
2957 .SP
2958 Removes silence from the beginning, middle, or end of the audio.
2959 `Silence' is determined by a specified threshold.
2960 .SP
2961 The \fIabove-periods\fR value is used to indicate if audio should be
2962 trimmed at the beginning of the audio. A value of zero indicates no
2963 silence should be trimmed from the beginning. When specifying an
2964 non-zero \fIabove-periods\fR, it trims audio up until it finds
2965 non-silence. Normally, when trimming silence from beginning of audio
2966 the \fIabove-periods\fR will be 1 but it can be increased to higher
2967 values to trim all audio up to a specific count of non-silence
2968 periods. For example, if you had an audio file with two songs that
2969 each contained 2 seconds of silence before the song, you could specify
2970 an \fIabove-periods\fR of 2 to strip out both silence periods and the
2971 first song.
2972 .SP
2973 When \fIabove-periods\fR is non-zero, you must also specify a
2974 \fIduration\fR and \fIthreshold\fR. \fIDuration\fR indicates the
2975 amount of time that non-silence must be detected before it stops
2976 trimming audio. By increasing the duration, bursts of noise can be
2977 treated as silence and trimmed off.
2978 .SP
2979 \fIThreshold\fR is used to indicate what sample value you should treat as
2980 silence.  For digital audio, a value of 0 may be fine but for audio
2981 recorded from analog, you may wish to increase the value to account
2982 for background noise.
2983 .SP
2984 When optionally trimming silence from the end of the audio, you specify
2985 a \fIbelow-periods\fR count.  In this case, \fIbelow-periods\fR means
2986 to remove all audio after silence is detected.
2987 Normally, this will be a value of 1, but it can
2988 be increased to skip over periods of silence that are wanted.  For example,
2989 if you have a song with 2 seconds of silence in the middle and 2 seconds
2990 at the end, you could set \fIbelow-periods\fR to a value of 2 to skip over the
2991 silence in the middle of the audio.
2992 .SP
2993 For \fIbelow-periods\fR, \fIduration\fR specifies a period of silence
2994 that must exist before audio is not copied any more.  By specifying
2995 a higher duration, silence that is wanted can be left in the audio.
2996 For example, if you have a song with an expected 1 second of silence
2997 in the middle and 2 seconds of silence at the end, a duration of 2
2998 seconds could be used to skip over the middle silence.
2999 .SP
3000 Unfortunately, you must know the length of the silence at the
3001 end of your audio file to trim off silence reliably.  A workaround is
3002 to use the \fBsilence\fR effect in combination with the \fBreverse\fR effect.
3003 By first reversing the audio, you can use the \fIabove-periods\fR
3004 to reliably trim all audio from what looks like the front of the file.
3005 Then reverse the file again to get back to normal.
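.SP
The reverse workaround might be sketched as a single effects chain, as
below.  The function name \fBtrim_both_ends\fR is hypothetical, and the
0.1 second duration and 1% threshold are illustrative values chosen for
this sketch, not defaults; they will usually need adjusting to suit the
recording.
.EX
```shell
# Sketch only: trim leading and trailing silence using the reverse workaround.
# Arguments: input file, output file.
# The 0.1 second duration and 1% threshold are illustrative, not defaults.
trim_both_ends() {
    sox "$1" "$2" silence 1 0.1 1% reverse silence 1 0.1 1% reverse
}
```
.EE
The first \fBsilence\fR trims the start of the audio; after the first
\fBreverse\fR, the second \fBsilence\fR trims what was originally the end,
and the final \fBreverse\fR restores the normal direction.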
3006 .SP
3007 To remove silence from the middle of a file, specify a
3008 \fIbelow-periods\fR that is negative.  This value is then
3009 treated as a positive value and is also used to indicate the
3010 effect should restart processing as specified by the
3011 \fIabove-periods\fR, making it suitable for removing periods of
3012 silence in the middle of the audio.
3013 .SP
3014 The option
3015 .B \-l
3016 indicates that \fIbelow-periods\fR \fIduration\fR length of audio
3017 should be left intact at the beginning of each period of silence.
3018 This is useful, for example, if you want to remove long pauses between words
3019 but do not want to remove the pauses completely.
3020 .SP
3021 The \fIperiod\fR counts are in units of samples. \fIDuration\fR counts
3022 may be in the format of hh:mm:ss.frac, or the exact count of samples.
3023 \fIThreshold\fR numbers may be suffixed with
3024 .B d
3025 to indicate the value is in decibels, or
3026 .B %
3027 to indicate a percentage of maximum value of the sample value
3028 (\fB0%\fR specifies pure digital silence).
3029 .SP
3030 The following example shows how this effect can be used to start a recording
3031 that does not contain the delay at the start which usually occurs between
3032 `pressing the record button' and the start of the performance:
3033 .EX
3034    rec \fIparameters filename other-effects\fR silence 1 5 2%
3035 .EE
3036 .na
3037 .TP
3038 \fBsinc\fR [\fB\-a\fI att\fR\^|\^\fB\-b\fI beta\fR] [\fB\-p\fI phase\fR\^|\^\fB\-M\fR\^|\^\fB\-I\fR\^|\^\fB\-L\fR] \:[\fB\-t\fI tbw\fR\^|\^\fB\-n\fI taps\fR] [\fIfreqHP\fR]\:[\fB\-\fIfreqLP\fR [\fB\-t\fI tbw\fR\^|\^\fB\-n\fI taps\fR]]
3039 .ad
3040 Apply a sinc kaiser-windowed low-pass, high-pass, band-pass, or band-reject filter
3041 to the signal.
3042 The \fIfreqHP\fR and \fIfreqLP\fR parameters give the frequencies of the
3043 6dB points of a high-pass and low-pass filter that may be invoked
3044 individually, or together.  If both are
3045 given, then \fIfreqHP\fR less than \fIfreqLP\fR creates a band-pass filter,
3046 \fIfreqHP\fR greater than \fIfreqLP\fR creates a band-reject filter.
3047 For example, the invocations
3048 .EX
3049    sinc 3k
3050    sinc \-4k
3051    sinc 3k\-4k
3052    sinc 4k\-3k
3053 .EE
3054 create a high-pass, low-pass, band-pass, and band-reject filter
3055 respectively.
3056 .SP
3057 The default stop-band attenuation of 120dB can be overridden with
3058 \fB\-a\fR; alternatively, the kaiser-window `beta' parameter can be
3059 given directly with \fB\-b\fR.
3060 .SP
3061 The default transition band-width of 5% of the total band can be
3062 overridden with \fB\-t\fR (and \fItbw\fR in Hertz); alternatively, the
3063 number of filter taps can be given directly with \fB\-n\fR.
3064 .SP
3065 If both \fIfreqHP\fR and \fIfreqLP\fR are given, then a \fB\-t\fR or
3066 \fB\-n\fR option given to the left of the frequencies applies to both
3067 frequencies; one of these options given to the right of the frequencies
3068 applies only to \fIfreqLP\fR.
3069 .SP
3070 The
3071 .BR \-p ,
3072 .BR \-M ,
3073 .BR \-I ,
3074 and
3075 .B \-L
3076 options control the filter's phase response; see the \fBrate\fR effect
3077 for details.
3078 .SP
3079 This effect supports the \fB\-\-plot\fR global option.
3080 .TP
3081 \fBspectrogram \fR[\fIoptions\fR]
3082 Create a spectrogram of the audio; the audio is passed unmodified
3083 through the SoX processing chain.  This effect is optional\*mtype
3084 \fBsox \-\-help\fR and check the list of supported effects to see if
3085 it has been included.
3086 .SP
3087 The spectrogram is rendered in a Portable Network Graphic (PNG) file,
3088 and shows time in the X-axis, frequency in the Y-axis, and audio
3089 signal magnitude in the Z-axis.  Z-axis values are represented by the
3090 colour (or optionally the intensity) of the pixels in the X-Y plane.
3091 If the audio signal contains multiple channels then these are shown
3092 from top to bottom starting from channel 1 (which is the left channel
3093 for stereo audio).
3094 .SP
3095 For example, if `my.wav' is a stereo file, then with
3096 .EX
3097    sox my.wav \-n spectrogram
3098 .EE
3099 a spectrogram of the entire file will be created in the file
3100 `spectrogram.png'.  More often though, analysis of a smaller portion
3101 of the audio is required; e.g. with
3102 .EX
3103    sox my.wav \-n remix 2 trim 20 30 spectrogram
3104 .EE
3105 the spectrogram shows information only from the second (right)
3106 channel, and of thirty seconds of audio starting from twenty seconds
3107 in.  To analyse a small portion of the frequency domain, the
3108 .B rate
3109 effect may be used, e.g.
3110 .EX
3111    sox my.wav \-n rate 6k spectrogram
3112 .EE
3113 allows detailed analysis of frequencies up to 3kHz (half the sampling
3114 rate) i.e. where the human auditory system is most sensitive.
3115 With
3116 .EX
3117    sox my.wav \-n trim 0 10 spectrogram \-x 600 \-y 200 \-z 100
3118 .EE
3119 the given options control the size of the spectrogram's X, Y & Z axes
3120 (in this case, the spectrogram area of the produced image will be 600
3121 by 200 pixels in size and the Z-axis range will be 100 dB).  Note that
3122 the produced image includes axes legends etc. and so will be a little
3123 larger than the specified spectrogram size.  In this example:
3124 .EX
3125    sox \-n \-n synth 6 tri 10k:14k spectrogram \-z 100 \-w kaiser
3126 .EE
3127 an analysis `window' with high dynamic range is selected to best
3128 display the spectrogram of a swept triangular wave.  For a similar
3129 example, append the following to the `chime' command in the
3130 description of the
3131 .B delay
3132 effect (above):
3133 .EX
3134    rate 2k spectrogram \-X 200 \-Z \-10 \-w kaiser
3135 .EE
3136 Options are also available to control the appearance (colour-set,
3137 brightness, contrast, etc.) and filename of the spectrogram; e.g. with
3138 .EX
3139    sox my.wav \-n spectrogram \-m \-l \-o print.png
3140 .EE
3141 a spectrogram is created suitable for printing on a `black and white'
3142 printer.
3143 .SP
3144 .I Options:
3145 .RS
3146 .IP \fB\-x\ \fInum\fR
3147 Change the (maximum) width (X-axis) of the spectrogram from its default
3148 value of 800 pixels to a given number between 100 and 200000.
3149 See also \fB\-X\fR and \fB\-d\fR.
3150 .IP \fB\-X\ \fInum\fR
3151 X-axis pixels/second; the default is auto-calculated to fit the given
3152 or known audio duration to the X-axis size, or 100 otherwise.  If
3153 given in conjunction with \fB\-d\fR, this option affects the width of
3154 the spectrogram; otherwise, it affects the duration of the
3155 spectrogram.
3156 .I num
3157 can be from 1 (low time resolution) to 5000 (high time resolution)
3158 and need not be an integer.  SoX
3159 may make a slight adjustment to the given number for processing
3160 quantisation reasons; if so, SoX will report the actual number used
3161 (viewable when the SoX global option
3162 .B \-V
3163 is in effect).
3164 See also \fB\-x\fR and \fB\-d\fR.
3165 .IP \fB\-y\ \fInum\fR
3166 Sets the Y-axis size in pixels (per channel); this is the number of
3167 frequency `bins' used in the Fourier analysis that produces the
3168 spectrogram.  N.B. it can be slow to produce the spectrogram if this
3169 number is not one more than a power of two (e.g. 129).  By default the
3170 Y-axis size is chosen automatically (depending on the number of
3171 channels).  See
3172 .B \-Y
for an alternative way of setting spectrogram height.
3174 .IP \fB\-Y\ \fInum\fR
3175 Sets the target total height of the spectrogram(s).  The default value
3176 is 550 pixels.  Using this option (and by default), SoX will choose a
3177 height for individual spectrogram channels that is one more than a
3178 power of two, so the actual total height may fall short of the given
3179 number.  However, there is also a minimum height per channel so if
3180 there are many channels, the number may be exceeded.
3181 See
3182 .B \-y
for an alternative way of setting spectrogram height.
3184 .IP \fB\-z\ \fInum\fR
3185 Z-axis (colour) range in dB, default 120.  This sets the dynamic-range
3186 of the spectrogram to be \-\fInum\fR\ dBFS to 0\ dBFS.
3187 .I Num
3188 may range from 20 to 180.  Decreasing dynamic-range effectively
3189 increases the `contrast' of the spectrogram display, and vice versa.
3190 .IP \fB\-Z\ \fInum\fR
3191 Sets the upper limit of the Z-axis in dBFS.
3192 A negative
3193 .I num
3194 effectively increases the `brightness' of the spectrogram display,
3195 and vice versa.
3196 .IP \fB\-q\ \fInum\fR
3197 Sets the Z-axis quantisation, i.e. the number of different colours (or
3198 intensities) in which to render Z-axis
3199 values.  A small number (e.g. 4) will give a `poster'-like effect making
3200 it easier to discern magnitude bands of similar level.  Small numbers
3201 also usually
3202 result in small PNG files.  The number given specifies the number of
3203 colours to use inside the Z-axis range; two colours are reserved to
3204 represent out-of-range values.
3205 .IP \fB\-w\ \fIname\fR
3206 Window: Hann (default), Hamming, Bartlett, Rectangular or Kaiser.  The
3207 spectrogram is produced using the Discrete Fourier Transform (DFT)
3208 algorithm.  A significant parameter to this algorithm is the choice of
3209 `window function'.  By default, SoX uses the Hann window which has good
3210 all-round frequency-resolution and dynamic-range properties.  For better
3211 frequency resolution (but lower dynamic-range), select a Hamming window;
3212 for higher dynamic-range (but poorer frequency-resolution), select a
3213 Kaiser window.  Bartlett and Rectangular windows are also available.
3214 .IP \fB\-W\ \fInum\fR
3215 Window adjustment parameter.  This can be used to make small
3216 adjustments to the Kaiser window shape.  A positive number (up to
3217 ten) increases its dynamic range, a negative number decreases it.
3218 .IP \fB\-s\fR
3219 Allow slack overlapping of DFT windows.
3220 This can, in some cases, increase image sharpness and give greater adherence
3221 to the
3222 .B \-x
3223 value, but at the expense of a little spectral loss.
3224 .IP \fB\-m\fR
3225 Creates a monochrome spectrogram (the default is colour).
3226 .IP \fB\-h\fR
3227 Selects a high-colour palette\*mless visually pleasing than the default
3228 colour palette, but it may make it easier to differentiate different levels.
3229 If this option is used in conjunction with
3230 .BR \-m ,
3231 the result will be a hybrid monochrome/colour palette.
3232 .IP \fB\-p\ \fInum\fR
3233 Permute the colours in a colour or hybrid palette.
3234 The
3235 .I num
3236 parameter, from 1 (the default) to 6, selects the permutation.
3237 .IP \fB\-l\fR
3238 Creates a `printer friendly' spectrogram with a light background (the
3239 default has a dark background).
3240 .IP \fB\-a\fR
3241 Suppress the display of the axis lines.  This is sometimes useful in
3242 helping to discern artefacts at the spectrogram edges.
3243 .IP \fB\-r\fR
3244 Raw spectrogram: suppress the display of axes and legends.
3245 .IP \fB\-A\fR
3246 Selects an alternative, fixed colour-set.  This is provided only for
3247 compatibility with spectrograms produced by another package.  It should
3248 not normally be used as it has some problems, not least, a lack of
3249 differentiation at the bottom end which results in masking of low-level
3250 artefacts.
3251 .IP \fB\-t\ \fItext\fR
3252 Set the image title\*mtext to display above the spectrogram.
3253 .IP \fB\-c\ \fItext\fR
3254 Set (or clear) the image comment\*mtext to display below and to the
3255 left of the spectrogram.
3256 .IP \fB\-o\ \fItext\fR
3257 Name of the spectrogram output PNG file, default `spectrogram.png'.
3258 .RE
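.SP
As a further sketch combining several of the options above (the file
names are placeholders),
.EX
   sox my.wav \-n rate 8k spectrogram \-y 257 \-z 90 \-q 16 \-o bands.png
.EE
requests 257 frequency bins per channel (one more than a power of two,
so the analysis should stay fast), a 90\ dB range rendered in 16 colour
bands, and a non-default output file name.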
3259 .TP
3260 \
3261 .I Advanced Options:
3262 .br
3263 In order to process a smaller section of audio without affecting other
3264 effects or the output signal (unlike when the
3265 .B trim
3266 effect is used), the following options may be used.
3267 .RS
3268 .IP \fB\-d\ \fIduration\fR
3269 This option sets the X-axis resolution such that audio with the given
3270 .I duration
3271 ([[HH:]MM:]SS) fits the selected (or default) X-axis width.  For
3272 example,
3273 .EX
   sox input.mp3 \-n spectrogram \-d 1:00 stats
3275 .EE
creates a spectrogram showing the first minute of the audio, whilst
the
3279 .B stats
3280 effect is applied to the entire audio signal.
3281 .SP
3282 See also
3283 .B \-X
3284 for an alternative way of setting the X-axis resolution.
3285 .IP \fB\-S\ \fItime\fR
3286 Start the spectrogram at the given point in the audio stream.  For
3287 example
3288 .EX
3289    sox input.aiff output.wav spectrogram \-S 1:00
3290 .EE
3291 creates a spectrogram showing all but the first minute of the audio
3292 (the output file however, receives the entire audio stream).
3293 .RE
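.SP
The two options can be combined to examine an excerpt; as a sketch
(file names are placeholders),
.EX
   sox input.wav output.wav spectrogram \-S 1:00 \-d 30 \-o excerpt.png
.EE
should plot only the 30 seconds of audio starting one minute into the
stream, while output.wav still receives the entire stream.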
3294 .TP
3295 \
3296 For the ability to perform off-line processing of spectral data, see the
3297 .B stat
3298 effect.
3299 .TP
3300 \fBspeed \fIfactor\fR[\fBc\fR]
3301 Adjust the audio speed (pitch and tempo together).  \fIfactor\fR
3302 is either the ratio of the new speed to the old speed: greater
3303 than 1 speeds up, less than 1 slows down, or, if appended with the
3304 letter
3305 `c', the number of cents (i.e. 100ths of a semitone) by
3306 which the pitch (and tempo) should be adjusted: greater than 0
3307 increases, less than 0 decreases.
3308 .SP
3309 Technically, the speed effect only changes the sample rate information,
3310 leaving the samples themselves untouched.  The \fBrate\fR effect is invoked
3311 automatically to resample to the output sample rate, using its default
3312 quality/speed.  For higher quality or higher speed
3313 resampling, in addition to the \fBspeed\fR effect, specify
3314 the \fBrate\fR effect with the desired quality option.
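.SP
As a sketch of both forms of \fIfactor\fR (file names are placeholders),
the following commands raise the speed by half, raise the pitch by one
semitone (100 cents), and halve the speed with very high quality
resampling:
.EX
   sox in.wav fast.wav speed 1.5
   sox in.wav sharp.wav speed 100c
   sox in.wav slow.wav speed 0.5 rate \-v
.EE
Since there are 1200 cents to the octave, a value in cents corresponds
to a ratio of 2 raised to the power cents/1200; this can be checked
with \fBawk\fR(1):
.EX
   awk 'BEGIN { printf "%.4f\\n", exp(log(2) * 100 / 1200) }'
.EE
which prints 1.0595, i.e. \fBspeed 100c\fR is approximately equivalent
to \fBspeed 1.0595\fR.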
3315 .SP
3316 See also the \fBbend\fR, \fBpitch\fR,
3317 and
3318 .B tempo
3319 effects.
3320 .TP
3321 \fBsplice \fR [\fB\-h\fR\^|\^\fB\-t\fR\^|\^\fB\-q\fR] { \fIposition\fR[\fB,\fIexcess\fR[\fB,\fIleeway\fR]] }
3322 Splice together audio sections.  This effect provides two things over
3323 simple audio concatenation: a (usually short) cross-fade is applied at
3324 the join, and a wave similarity comparison is made to help determine the
3325 best place at which to make the join.
3326 .SP
3327 One of the options
3328 .BR \-h ,
3329 .BR \-t ,
3330 or
3331 .B \-q
3332 may be given to select the fade envelope as half-cosine wave (the default),
3333 triangular (a.k.a. linear), or quarter-cosine wave respectively.
3334 .TS
3335 center;
3336 cI lI lI lI
3337 cB l l l.
3338 Type    Audio   Fade level      Transitions
3339 t       correlated      constant gain   abrupt
3340 h       correlated      constant gain   smooth
3341 q       uncorrelated    constant power  smooth
3342 .TE
3343 .DT
3344 .SP
3345 To perform a splice, first use the
3346 .B trim
3347 effect to select the audio sections to be joined together.  As when
3348 performing a tape splice, the end of the section to be spliced onto
3349 should be trimmed with a small
3350 .I excess
3351 (default 0\*d005 seconds) of audio after the ideal joining point.  The
3352 beginning of the audio section to splice on should be trimmed with the
3353 same
3354 .IR excess
3355 (before the ideal joining point), plus an additional
3356 .I leeway
3357 (default 0\*d005 seconds).  SoX should then be invoked with the two
3358 audio sections as input files and the
3359 .B splice
effect given with the position at which to perform the splice\*mthis is
the length of the first audio section (including the excess).
3362 .SP
3363 The following diagram uses the tape analogy to illustrate the splice
3364 operation.  The effect simulates the diagonal cuts and joins the two pieces:
3365 .EX
3366
3367       length1   excess
3368     -----------><--->
3369     _________   :   :  _________________
3370              \\  :   : :\\     `
3371               \\ :   : : \\     `
3372                \\:   : :  \\     `
3373                 *   : :   * - - *
3374                  \\  : :   :\\     `
3375                   \\ : :   : \\     `
3376     _______________\\: :   :  \\_____`____
3377                       :   :   :     :
3378                       <--->   <----->
3379                       excess  leeway
3380
3381 .EE
3382 where * indicates the joining points.
3383 .SP
3384 For example, a long song begins with two verses which start (as
3385 determined e.g. by using the
3386 .B play
3387 command with the
3388 .B trim
3389 (\fIstart\fR) effect) at times 0:30\*d125 and 1:03\*d432.
3390 The following commands cut out the first verse:
3391 .EX
3392    sox too-long.wav part1.wav trim 0 30.130
3393 .EE
3394 (5 ms excess, after the first verse starts)
3395 .EX
3396    sox too-long.wav part2.wav trim 1:03.422
3397 .EE
3398 (5 ms excess plus 5 ms leeway, before the second verse starts)
3399 .EX
3400    sox part1.wav part2.wav just-right.wav splice 30.130
3401 .EE
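The trim points used above can be derived from the verse start times;
a sketch using \fBbc\fR(1), with illustrative variable names:
.EX
   v1=30.125; v2=63.432          # verse start times in seconds
   excess=0.005; leeway=0.005    # the splice defaults
   echo "$v1 + $excess" | bc               # end of part1:   30.130
   echo "$v2 \- $excess \- $leeway" | bc     # start of part2: 63.422
.EE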
3402 For another example, the SoX command
3403 .EX
3404    play "|sox \-n \-p synth 1 sin %1" "|sox \-n \-p synth 1 sin %3"
3405 .EE
3406 generates and plays two notes, but there is a nasty click at the
3407 transition; the click can be removed by splicing instead of
3408 concatenating the audio, i.e. by appending \fBsplice 1\fR to the
3409 command. (Clicks at the beginning and end of the audio can be removed by
3410 \fIpreceding\fR the splice effect with \fBfade q .01 2 .01\fR).
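.SP
Putting these together, the complete command would be along the lines
of:
.EX
   play "|sox \-n \-p synth 1 sin %1" "|sox \-n \-p synth 1 sin %3" \\
      fade q .01 2 .01 splice 1
.EE
Here the two 1-second notes give a total length of 2 seconds, which is
the stop time given to \fBfade\fR.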
3411 .SP
3412 Provided your arithmetic is good enough, multiple splices can be
3413 performed with a single
3414 .B splice
3415 invocation.  For example:
3416 .EX
3417 #!/bin/sh
3418 # Audio Copy and Paste Over
3419 # acpo infile copy-start copy-stop paste-over-start outfile
3420 # All times measured in samples.
3421 rate=\`soxi \-r "$1"\`
3422 e=\`expr $rate '*' 5 / 1000\`  # Using default excess
3423 l=$e                         # and leeway.
3424 sox "$1" piece.wav trim \`expr $2 \- $e \- $l\`s \\
3425    \`expr $3 \- $2 + $e + $l + $e\`s
3426 sox "$1" part1.wav trim 0 \`expr $4 + $e\`s
3427 sox "$1" part2.wav trim \`expr $4 + $3 \- $2 \- $e \- $l\`s
3428 sox part1.wav piece.wav part2.wav "$5" splice \\
3429    \`expr $4 + $e\`s \\
3430    \`expr $4 + $e + $3 \- $2 + $e + $l + $e\`s
3431 .EE
3432 In the above Bourne shell script,
3433 two splices are used to `copy and paste' audio.
3434 .TS
3435 center;
3436 c8 c8 c.
3437 *       *       *
3438 .TE
3439 .DT
3440 .SP
3441 It is also possible to use this effect to perform general cross-fades,
3442 e.g. to join two songs.  In this case,
3443 .I excess
would typically be a number of seconds, the
3445 .B \-q
3446 option would typically be given (to select an `equal power' cross-fade), and
3447 .I leeway
3448 should be zero (which is the default if
3449 .B \-q
3450 is given).  For example, if f1.wav and f2.wav are audio files
3451 to be cross-faded, then
3452 .EX
3453    sox f1.wav f2.wav out.wav splice \-q $(soxi \-D f1.wav),3
3454 .EE
3455 cross-fades the files where the point of equal loudness is 3 seconds
3456 before the end of f1.wav, i.e. the total length of the cross-fade is
3457 2 \(mu 3 = 6 seconds (Note: the $(...) notation is POSIX shell).
3458 .TP
3459 \fBstat\fR [\fB\-s \fIscale\fR] [\fB\-rms\fR] [\fB\-freq\fR] [\fB\-v\fR] [\fB\-d\fR]
3460 Display time and frequency domain statistical information about the audio.
3461 Audio is passed unmodified through the SoX processing chain.
3462 .SP
3463 The information is output to the `standard error' (stderr) stream and is
3464 calculated, where
3465 .I n
3466 is the duration of the audio in samples,
3467 .I c
3468 is the number of audio channels,
3469 .I r
3470 is the audio sample rate, and
3471 .I x\s-2\dk\u\s0
3472 represents the PCM value (in the range \-1 to +1 by default) of each successive
3473 sample in the audio,
3474 as follows:
3475 .TS
3476 center;
3477 lI l l.
3478 Samples read    \fIn\fR\^\(mu\^\fIc\fR  \
3479 Length (seconds)        \fIn\fR\^\(di\^\fIr\fR
3480 Scaled by       \       See \-s below.
3481 Maximum amplitude       max(\fIx\s-2\dk\u\s0\fR)        T{
3482 The maximum sample value in the audio; usually this will be a positive number.
3483 T}
3484 Minimum amplitude       min(\fIx\s-2\dk\u\s0\fR)        T{
3485 The minimum sample value in the audio; usually this will be a negative number.
3486 T}
3487 Midline amplitude       \(12\^min(\fIx\s-2\dk\u\s0\fR)\^+\^\(12\^max(\fIx\s-2\dk\u\s0\fR)
3488 Mean norm       \(S1/\s-2n\s+2\^\(*S\^\^\(br\^\fIx\s-2\dk\u\s0\fR\^\(br\^       T{
3489 The average of the absolute value of each sample in the audio.
3490 T}
3491 Mean amplitude  \(S1/\s-2n\s+2\^\(*S\^\fIx\s-2\dk\u\s0\fR       T{
3492 The average of each sample in the audio.  If this figure is non-zero, then it indicates the
3493 presence of a D.C. offset (which could be removed using the
3494 .B dcshift
3495 effect).
3496 T}
3497 RMS amplitude   \(sr(\(S1/\s-2n\s+2\^\(*S\^\fIx\s-2\dk\u\s0\fR\(S2)     T{
3498 The level of a D.C. signal that would have the same power
3499 as the audio's average power.
3500 T}
3501 Maximum delta   max(\^\(br\^\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR\^\(br\^)
3502 Minimum delta   min(\^\(br\^\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR\^\(br\^)
3503 Mean delta      \(S1/\s-2n\-1\s+2\^\(*S\^\^\(br\^\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR\^\(br\^
3504 RMS delta       \(sr(\(S1/\s-2n\-1\s+2\^\(*S\^(\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR)\(S2)
3505 Rough frequency \       In Hz.
3506 Volume Adjustment       \       T{
3507 The parameter to the
3508 .B vol
3509 effect which would make the audio as loud as possible without clipping.
3510 Note: See the discussion on
3511 .B Clipping
3512 above for reasons why it is rarely a good idea actually to do this.
3513 T}
3514 .TE
3515 .DT
3516 .SP
3517 Note that the delta measurements are not applicable for multi-channel audio.
3518 .SP
3519 The
3520 .B \-s
3521 option can be used to scale the input data by a given factor.
3522 The default value of
3523 .I scale
3524 is 2147483647 (i.e. the maximum value of a 32-bit signed integer).
3525 Internal effects
3526 always work with signed long PCM data and so the value should relate to this
3527 fact.
3528 .SP
3529 The
3530 .B \-rms
3531 option will convert all output average values to `root mean square'
3532 format.
3533 .SP
3534 The
3535 .B \-v
3536 option displays only the `Volume Adjustment' value.
3537 .SP
3538 The
3539 .B \-freq
3540 option calculates the input's power spectrum (4096 point DFT) instead of the
3541 statistics listed above.  This should only be used with a single channel
3542 audio file.
3543 .SP
3544 The
3545 .B \-d
3546 option
3547 displays a hex dump of the 32-bit signed PCM data
3548 audio in SoX's internal buffer.
3549 This is mainly used to help track down endian problems that
3550 sometimes occur in cross-platform versions of SoX.
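.SP
As a sketch of typical usage (file names are placeholders), the
following commands print the full statistics, print only the volume
adjustment, and save the power spectrum of a single-channel file:
.EX
   sox my.wav \-n stat
   sox my.wav \-n stat \-v
   sox mono.wav \-n stat \-freq 2> spectrum.txt
.EE
The information is written to stderr, hence the `2>' redirection in the
last command.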
3551 .SP
3552 See also the
3553 .B stats
3554 effect.
3555 .TP
3556 \fBstats\fR [\fB\-b \fIbits\fR\^|\^\fB\-x \fIbits\fR\^|\^\fB\-s \fIscale\fR] [\fB\-w \fIwindow-time\fR]
3557 Display time domain statistical information about the audio channels;
3558 audio is passed unmodified through the SoX processing chain.
3559 Statistics are calculated and displayed for each audio channel and,
3560 where applicable, an overall figure is also given.
3561 .SP
3562 For example, for a typical well-mastered stereo music file:
3563 .TS
3564 center;
3565 l.
3566 .ft CW
3567              Overall     Left      Right
3568 DC offset   0.000803 \-0.000391  0.000803
3569 Min level  \-0.750977 \-0.750977 \-0.653412
3570 Max level   0.708801  0.708801  0.653534
3571 Pk lev dB      \-2.49     \-2.49     \-3.69
3572 RMS lev dB    \-19.41    \-19.13    \-19.71
3573 RMS Pk dB     \-13.82    \-13.82    \-14.38
3574 RMS Tr dB     \-85.25    \-85.25    \-82.66
3575 Crest factor       \-      6.79      6.32
3576 Flat factor     0.00      0.00      0.00
3577 Pk count           2         2         2
3578 Bit-depth      16/16     16/16     16/16
3579 Num samples    7.72M
3580 Length s     174.973
3581 Scale max   1.000000
3582 Window s       0.050
3583 .ft R
3584 .TE
3585 .DT
3586 .SP
3587 .IR DC\ offset ,
3588 .IR Min\ level ,
3589 and
3590 .I Max\ level
3591 are shown, by default, in the range \(+-1.
3592 If the
3593 .B \-b
(bits) option is given, then these three measurements will be scaled to a signed integer
3595 with the given number of bits; for example, for 16 bits, the scale would be \-32768 to +32767.
3596 The
3597 .B \-x
3598 option behaves the same way as
3599 .B \-b
3600 except that the signed integer values are displayed in hexadecimal.
3601 The
3602 .B \-s
3603 option scales the three measurements by a given floating-point number.
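.SP
For instance (the file name is a placeholder), the following commands
report the three measurements as 16-bit signed integers, first in
decimal and then in hexadecimal:
.EX
   sox my.wav \-n stats \-b 16
   sox my.wav \-n stats \-x 16
.EE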
3604 .SP
3605 .I Pk\ lev\ dB
3606 and
3607 .I RMS\ lev\ dB
3608 are standard peak and RMS level measured in dBFS.
3609 .I RMS\ Pk\ dB
3610 and
3611 .I RMS\ Tr\ dB
3612 are peak and trough values for RMS level measured over a short window (default 50ms).
3613 .SP
3614 .I Crest\ factor
3615 is the standard ratio of peak to RMS level (note: not in dB).
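.SP
The figures in the example table above can be cross-checked: a level in
dB is 20 times the base-10 logarithm of the corresponding linear value,
so the left channel's peak magnitude (0.750977) and the difference
between its peak and RMS levels give, to within rounding,
.EX
   awk 'BEGIN { printf "%.2f\\n", 20 * log(0.750977) / log(10) }'
   awk 'BEGIN { printf "%.2f\\n", exp(log(10) * (19.13 \- 2.49) / 20) }'
.EE
i.e. \-2.49 (\fIPk\ lev\ dB\fR) and 6.79 (\fICrest\ factor\fR).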
3616 .SP
3617 .I Flat\ factor
3618 is a measure of the flatness (i.e. consecutive samples with the same value) of the signal at
3619 its peak levels (i.e. either
3620 .IR Min\ level ,
3621 or
3622 .IR Max\ level ).
3623 .I Pk\ count
3624 is the number of occasions (not the number of samples) that the signal attained either
3625 .IR Min\ level ,
3626 or
3627 .IR Max\ level .
3628 .SP
3629 The right-hand
3630 .I Bit-depth
3631 figure is the standard definition of bit-depth i.e. bits less
3632 significant than the given number are fixed at zero.  The left-hand
3633 figure is the number of most significant bits that are fixed at zero (or
3634 one for negative numbers) subtracted from the right-hand figure (the
3635 number subtracted is directly related to
3636 .IR Pk\ lev\ dB ).
3637 .SP
3638 For multi-channel audio, an overall figure for each of the above
3639 measurements is given and derived from the channel figures as follows:
3640 .IR DC\ offset :
3641 maximum magnitude;
3642 .IR Max\ level ,
3643 .IR Pk\ lev\ dB ,
3644 .IR RMS\ Pk\ dB ,
3645 .IR Bit-depth :
3646 maximum;
3647 .IR Min\ level ,
3648 .IR RMS\ Tr\ dB :
3649 minimum;
3650 .IR RMS\ lev\ dB ,
3651 .IR Flat\ factor ,
3652 .IR Pk\ count :
3653 average;
3654 .IR Crest\ factor :
3655 not applicable.
3656 .SP
3657 .I Length\ s
3658 is the duration in seconds of the audio, and
3659 .I Num\ samples
3660 is equal to the sample-rate multiplied by
.IR Length\ s .
.I Scale\ max
3663 is the scaling applied to the first three measurements;
3664 specifically, it is the maximum value that could apply to
3665 .IR Max\ level .
3666 .I Window\ s
3667 is the length of the window used for the peak and trough RMS measurements.
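.SP
As a sketch, assuming (as the `Window\ s' figure suggests) that
.I window-time
is given in seconds, the following measures the RMS peak and trough
over a 100\ ms window; the file name is a placeholder:
.EX
   sox my.wav \-n stats \-w 0.1
.EE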
3668 .SP
3669 See also the
3670 .B stat
3671 effect.
3672 .TP
3673 \fBswap\fR
3674 Swap stereo channels.
3675 See also
3676 .B remix
3677 for an effect that allows arbitrary channel selection and ordering
3678 (and mixing).
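.SP
For example (file names are placeholders), the following two commands
should give the same result for a stereo file:
.EX
   sox stereo.wav swapped.wav swap
   sox stereo.wav swapped.wav remix 2 1
.EE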
3679 .TP
3680 \fBstretch \fIfactor\fR [\fIwindow fade shift fading\fR]
3681 Change the audio duration (but not its pitch).
3682 This effect is broadly equivalent to the
3683 .B tempo
3684 effect with (\fIfactor\fR inverted and)
3685 .I search
3686 set to zero, so in general, its results are comparatively poor;
3687 it is retained as it can sometimes out-perform
3688 .B tempo
3689 for small
3690 .IR factor s.
3691 .SP
.I factor
is the stretch ratio: greater than 1 lengthens the audio, less than 1
shortens it.
.I window
is the window size in milliseconds; the default is 20.
.I fade
selects the fade type and may be given as `lin'.
.I shift
is a ratio in the range 0 to 1; its default depends on
.IR factor :
1 when shortening, 0\*d8 when lengthening.
.I fading
is a ratio in the range 0 to 0\*d5; its default depends on
.I factor
and \fIshift\fR.
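.SP
As a sketch (file names are placeholders), both of the following
lengthen the audio by a quarter; note the inverted factor
(1/1.25 = 0.8) given to
.BR tempo :
.EX
   sox in.wav longer.wav stretch 1.25
   sox in.wav longer.wav tempo 0.8
.EE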
3705 .SP
3706 See also the
3707 .B tempo
3708 effect.
3709 .na
3710 .TP
3711 \fBsynth\fR [\fB\-j \fIKEY\fR] [\fB\-n\fR] [\fIlen\fR [\fIoff\fR [\fIph\fR [\fIp1\fR [\fIp2\fR [\fIp3\fR]]]]]] {[\fItype\fR] [\fIcombine\fR] \:[[\fB%\fR]\fIfreq\fR[\fBk\fR][\fB:\fR\^|\^\fB+\fR\^|\^\fB/\fR\^|\^\fB\-\fR[\fB%\fR]\fIfreq2\fR[\fBk\fR]]] [\fIoff\fR [\fIph\fR [\fIp1\fR [\fIp2\fR [\fIp3\fR]]]]]}
3712 .ad
3713 This effect can be used to generate fixed or swept frequency audio tones
3714 with various wave shapes, or to generate wide-band noise of various
3715 `colours'.
3716 Multiple synth effects can be cascaded to produce more complex
3717 waveforms; at each stage it is possible to choose whether the generated
3718 waveform will be mixed with, or modulated onto
3719 the output from the previous stage.
3720 Audio for each channel in a multi-channel audio file can be synthesised
3721 independently.
3722 .SP
3723 Though this effect is used to generate audio, an input file must still
3724 be given, the characteristics of which will be used to set the
3725 synthesised audio length, the number of channels, and the sampling rate;
3726 however, since the input file's audio is not normally needed, a `null
3727 file' (with the special name \fB\-n\fR) is often given instead (and the
3728 length specified as a parameter to \fBsynth\fR or by another given
effect that has an associated length).
3730 .SP
3731 For example, the following produces a 3 second, 48kHz,
3732 audio file containing a sine-wave swept from 300 to 3300\ Hz:
3733 .EX
3734    sox \-n output.wav synth 3 sine 300\-3300
3735 .EE
3736 and this produces an 8\ kHz version:
3737 .EX
3738    sox \-r 8000 \-n output.wav synth 3 sine 300\-3300
3739 .EE
3740 Multiple channels can be synthesised by specifying the set of
3741 parameters shown between braces multiple times;
3742 the following puts the swept tone in the left channel and adds `brown'
3743 noise in the right:
3744 .EX
3745    sox \-n output.wav synth 3 sine 300\-3300 brownnoise
3746 .EE
3747 The following example shows how two synth effects can be cascaded
3748 to create a more complex waveform:
3749 .EX
3750 .ne 2
3751    play \-n synth 0.5 sine 200\-500 synth 0.5 sine fmod 700\-100
3752 .EE
3753 Frequencies can also be given in `scientific' note notation, or, by
3754 prefixing a `%' character, as a number of semitones relative to
3755 `middle A' (440\ Hz).  For example, the following could be used to
3756 help tune a guitar's low `E' string:
3757 .EX
3758    play \-n synth 4 pluck %\-29
3759 .EE
3760 or with a (Bourne shell) loop, the whole guitar:
3761 .EX
3762 .ne 2
3763    for n in E2 A2 D3 G3 B3 E4; do
3764         play \-n synth 4 pluck $n repeat 2; done
3765 .EE
3766 See the
3767 .B delay
3768 effect (above) and the reference to `SoX scripting examples' (below)
3769 for more
3770 .B synth
3771 examples.
3772 .SP
3773 .B N.B.
3774 This effect generates audio at maximum volume (0dBFS), which means that there
3775 is a high chance of clipping when using the audio subsequently, so
3776 in many cases, you will want to follow this effect with the \fBgain\fR
3777 effect to prevent this from happening. (See also
3778 .B Clipping
3779 above.)
3780 Note that, by default, the
3781 .B synth
3782 effect incorporates the functionality of \fBgain \-h\fR (see the
3783 .B gain
3784 effect for details);
3785 .BR synth 's
3786 .B \-n
3787 option may be given to disable this behaviour.
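.SP
For example, to leave 3\ dB of headroom in a generated tone (the file
name is a placeholder):
.EX
   sox \-n tone.wav synth 3 sine 440 gain \-3
.EE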
3788 .SP
3789 A detailed description of each
3790 .B synth
3791 parameter follows:
3792 .SP
3793 \fIlen\fR is the length of audio to synthesise expressed as a time
3794 or as a number of samples;
3795 0=inputlength, default=0.
3796 .SP
3797 The format for specifying lengths in time is hh:mm:ss.frac.  The format
3798 for specifying sample counts is the number of samples with the letter
3799 `s' appended to it.
3800 .SP
3801 \fItype\fR is one of sine, square, triangle, sawtooth, trapezium, exp,
[white]noise, tpdfnoise, pinknoise, brownnoise, pluck; default=sine.
3803 .SP
3804 \fIcombine\fR is one of create, mix, amod (amplitude modulation), fmod
3805 (frequency modulation); default=create.
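.SP
As a sketch of the other \fIcombine\fR types, the first command below
mixes a second tone with the first, and the second applies a slow
amplitude modulation to it:
.EX
   play \-n synth 3 sine 440 synth 3 sine mix 660
   play \-n synth 3 sine 440 synth 3 sine amod 4
.EE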
3806 .SP
3807 \fIfreq\fR/\fIfreq2\fR are the frequencies at the beginning/end of
3808 synthesis in Hz or, if preceded with `%', semitones relative to A
3809 (440\ Hz); alternatively, `scientific' note notation (e.g. E2) may
3810 be used.  The default frequency is 440Hz.  By default, the tuning used
3811 with the note notations is `equal temperament'; the
3812 .B \-j
3813 .I KEY
3814 option selects `just intonation', where
3815 .I KEY
3816 is an integer number of semitones relative to A (so for example, \-9
3817 or 3 selects the key of C), or a note in scientific notation.
3818 .SP
3819 If
3820 .I freq2
3821 is given, then
3822 .I len
3823 must also have been given and the generated tone will be swept between
3824 the given frequencies.  The two given frequencies must be separated by
3825 one of the characters `:', `+', `/', or `\-'.  This character is used to
3826 specify the sweep function as follows:
3827 .RS
3828 .IP \fB:\fR
3829 Linear: the tone will change by a fixed number of hertz per second.
3830 .IP \fB+\fR
3831 Square: a second-order function is used to change the tone.
3832 .IP \fB/\fR
3833 Exponential: the tone will change by a fixed number of semitones per second.
3834 .IP \fB\-\fR
3835 Exponential: as `/', but initial phase always zero, and stepped (less
3836 smooth) frequency changes.
3837 .RE
3838 .TP
3839 \
3840 Not used for noise.
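.SP
To compare the sweep functions by ear, the same sweep can be played
with each separator in turn:
.EX
   play \-n synth 3 sine 300:3300
   play \-n synth 3 sine 300+3300
   play \-n synth 3 sine 300/3300
   play \-n synth 3 sine 300\-3300
.EE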
3841 .SP
3842 \fIoff\fR is the bias (DC-offset) of the signal in percent; default=0.
3843 .SP
3844 \fIph\fR is the phase shift in percentage of 1 cycle; default=0.  Not
3845 used for noise.
3846 .SP
\fIp1\fR is the percentage of each cycle that is `on' (square) or
`rising' (triangle, exp, trapezium); default=50 (square, triangle, exp)
or 10 (trapezium).  For pluck, \fIp1\fR is the sustain; default=40.
.SP
\fIp2\fR is, for trapezium, the percentage through each cycle at which
`falling' begins; default=50.  For exp, it is the amplitude in multiples
of 2\ dB; default=50.  For pluck, it is tone-1; default=20.
.SP
\fIp3\fR is, for trapezium, the percentage through each cycle at which
`falling' ends; default=60.  For pluck, it is tone-2; default=90.
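.SP
As these parameters are positional, \fIoff\fR and \fIph\fR must be given
in order to reach \fIp1\fR; for example, a square wave with a 25% duty
cycle:
.EX
   play \-n synth 2 square 220 0 0 25
.EE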
3857 .TP
3858 \fBtempo \fR[\fB\-q\fR] [\fB\-m\fR\^|\^\fB\-s\fR\^|\^\fB\-l\fR] \fIfactor\fR [\fIsegment\fR [\fIsearch\fR [\fIoverlap\fR]]]
3859 Change the audio playback speed but not its pitch. This effect uses the
3860 WSOLA algorithm. The audio is chopped up into segments which are then
3861 shifted in the time domain and overlapped (cross-faded) at points where
3862 their waveforms are most similar as determined by measurement of `least
3863 squares'.
3864 .SP
3865 By default, linear searches are used to find the best overlapping
3866 points. If the optional
3867 .B \-q
3868 parameter is given, tree searches are used instead. This makes the effect
3869 work more quickly, but the result may not sound as good. However, if you
3870 must improve the processing speed, this generally reduces the sound quality
3871 less than reducing the search or overlap values.
3872 .SP
3873 The
3874 .B \-m
3875 option is used to optimize default values of segment, search and
3876 overlap for music processing.
3877 .SP
3878 The
3879 .B \-s
3880 option is used to optimize default values of segment, search and
3881 overlap for speech processing.
3882 .SP
3883 The
3884 .B \-l
3885 option is used to optimize default values of segment, search and
3886 overlap for `linear' processing that tends to cause more
3887 noticeable distortion but may be useful when factor is close to 1.
3888 .SP
3889 If \-m, \-s, or \-l is specified, the default value of segment will be
3890 calculated based on factor, while default search and overlap values are
3891 based on segment. Any values you provide still override these default
3892 values.
3893 .SP
3894 .I factor
3895 gives the ratio of new tempo to the old tempo, so e.g. 1.1 speeds up the
3896 tempo by 10%, and 0.9 slows it down by 10%.
3897 .SP
3898 The optional
3899 .I segment
3900 parameter selects the algorithm's segment size in milliseconds.  If no other
3901 flags are specified, the default value is 82 and is typically suited to
3902 making small changes to the tempo of music. For larger changes (e.g. a factor
3903 of 2), 41\ ms may give a better result.  The \-m, \-s, and \-l flags will cause
3904 the segment default to be automatically adjusted based on factor.
3905 For example using \-s (for speech) with a tempo of 1.25 will calculate a
3906 default segment value of 32.
3907 .SP
3908 The optional
3909 .I search
3910 parameter gives the audio length in milliseconds over which
3911 the algorithm will search for overlapping points.  If no other
3912 flags are specified, the default value is 14.68.  Larger values use
3913 more processing time and may or may not produce better results.
3914 A practical maximum is half the value of segment. Search
3915 can be reduced to cut processing time at the risk of degrading output
3916 quality. The \-m, \-s, and \-l flags will cause
3917 the search default to be automatically adjusted based on segment.
3918 .SP
3919 The optional
3920 .I overlap
3921 parameter gives the segment overlap length in milliseconds.
3922 Default value is 12, but \-m, \-s, or \-l flags automatically
3923 adjust overlap based on segment size. Increasing overlap increases
3924 processing time and may increase quality. A practical maximum for overlap
3925 is the value of search, with overlap typically being (at least) a little
smaller than search.
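.SP
As a sketch (file names are placeholders), the following commands speed
up music by 10% with the default parameters, slow speech down to 80%
of its tempo using the speech-optimised defaults, and double the tempo
with an explicit 41\ ms segment:
.EX
   sox song.wav faster.wav tempo 1.1
   sox talk.wav slower.wav tempo \-s 0.8
   sox song.wav double.wav tempo 2 41
.EE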
3927 .SP
3928 See also
3929 .B speed
3930 for an effect that changes tempo and pitch together,
3931 .B pitch
3932 and \fBbend\fR for effects that change pitch only, and
3933 .B stretch
3934 for an effect that changes tempo using a different algorithm.
3935 .TP
3936 \fBtreble \fIgain\fR [\fIfrequency\fR[\fBk\fR]\fR [\fIwidth\fR[\fBs\fR\^|\^\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]]]
3937 Apply a treble tone-control effect.
3938 See the description of the \fBbass\fR effect for details.
3939 .TP
3940 \fBtremolo \fIspeed\fR [\fIdepth\fR]
3941 Apply a tremolo (low frequency amplitude modulation) effect to the audio.
3942 The tremolo frequency in Hz is given by
3943 .IR speed ,
3944 and the depth as a percentage by
3945 .I depth
3946 (default 40).
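.SP
For example (file names are placeholders), a 6\ Hz tremolo with a depth
of 60%:
.EX
   sox in.wav out.wav tremolo 6 60
.EE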
3947 .TP
3948 \fBtrim\fR {[\fB=\fR\^|\^\fB\-\fR]\fIposition\fR}
3949 Cuts portions out of the audio.  Any number of \fIposition\fRs may be
3950 given; audio is not sent to the output until the first \fIposition\fR
3951 is reached.  The effect then alternates between copying and discarding
3952 audio at each \fIposition\fR.
3953 .SP
3954 If a \fIposition\fR is preceded by an equals or minus sign, it is
3955 interpreted relative to the beginning or the end of the audio,
3956 respectively.  (The audio length must be known for end-relative
3957 locations to work.)  Otherwise, it is considered an offset from the
3958 last \fIposition\fR, or from the start of audio for the first
3959 parameter.  Using a value of 0 for the first \fIposition\fR
3960 parameter allows copying from the beginning of the audio.
3961 .SP
3962 All parameters can be specified using either an amount of time or an
3963 exact count of samples.  The format for specifying lengths in time is
3964 hh:mm:ss.frac.  With a value of 1:30\*d5 for the first parameter, copying will not
3965 start until 1 minute and 30\(12 seconds into the audio.  The format
3966 for specifying sample counts is the number of samples with the letter `s'
3967 appended to it.  A value of 8000s for the first parameter will wait until
3968 8000 samples are read before starting to process audio.
3969 .SP
3970 For example,
3971 .EX
3972    sox infile outfile trim 0 10
3973 .EE
3974 will copy the first ten seconds, while
3975 .EX
3976    play infile trim 12:34 =15:00 -2:00
3977 .EE
3978 will play from 12 minutes 34 seconds into the audio up to 15 minutes into
3979 the audio (i.e. 2 minutes and 26 seconds long), then resume playing two
3980 minutes before the end of audio.
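.SP
Sample counts work the same way.  As a sketch (file names are
placeholders),
.EX
   sox infile outfile trim 8000s 16000s
.EE
should skip the first 8000 samples and then copy the next 16000, since
the second \fIposition\fR is taken as an offset from the first.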
3981 .TP
3982 \fBupsample\fR [\fIfactor\fR]
3983 Upsample the signal by an integer factor: \fIfactor\fR\-1 zero-value
3984 samples are inserted between each pair of input samples.  As a result, the
3985 original spectrum is replicated into the new frequency space (aliasing) and
3986 attenuated.  This attenuation can be compensated for by adding
3987 \fBvol \fIfactor\fR after any further processing.  The upsample effect is
3988 typically used in combination with filtering effects.
3989 .SP
3990 For a general resampling effect with anti-aliasing, see \fBrate\fR.  See
3991 also \fBdownsample\fR.
3992 .TP
3993 \fBvad \fR[\fIoptions\fR]
3994 Voice Activity Detector.  Attempts to trim silence and quiet
3995 background sounds from the ends of (fairly high-resolution,
3996 i.e. 16-bit, 44\-48kHz) recordings of speech.  The algorithm currently
3997 uses a simple cepstral power measurement to detect voice, so may be
3998 fooled by other things, especially music.  The effect can trim only
3999 from the front of the audio, so in order to trim from the back, the
4000 .B reverse
4001 effect must also be used.  E.g.
4002 .EX
4003    play speech.wav norm vad
4004 .EE
4005 to trim from the front,
4006 .EX
4007    play speech.wav norm reverse vad reverse
4008 .EE
4009 to trim from the back, and
4010 .EX
4011    play speech.wav norm vad reverse vad reverse
4012 .EE
4013 to trim from both ends.  The use of the
4014 .B norm
4015 effect is recommended, but remember that neither
4016 .B reverse
4017 nor
4018 .B norm
4019 is suitable for use with streamed audio.
4020 .SP
4021 .I Options:
4022 .br
4023 Default values are shown in parentheses.
4024 .RS
4025 .IP \fB\-t\ \fInum\fR\ (7)
4026 The measurement level used to trigger activity detection.  This might
4027 need to be changed depending on the noise level, signal level and
4028 other characteristics of the input audio.
4029 .IP \fB\-T\ \fInum\fR\ (0.25)
4030 The time constant (in seconds) used to help ignore short bursts of
4031 sound.
4032 .IP \fB\-s\ \fInum\fR\ (1)
4033 The amount of audio (in seconds) to search for quieter/shorter bursts
4034 of audio to include prior to the detected trigger point.
4035 .IP \fB\-g\ \fInum\fR\ (0.25)
4036 Allowed gap (in seconds) between quieter/shorter bursts of audio to
4037 include prior to the detected trigger point.
4038 .IP \fB\-p\ \fInum\fR\ (0)
4039 The amount of audio (in seconds) to preserve before the trigger point
4040 and any found quieter/shorter bursts.
4041 .RE
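.SP
As a hedged illustration (the values are arbitrary and would need
tuning for real material), lowering the trigger level and preserving a
little audio before the trigger point might look like:
.EX
   play speech.wav norm vad \-t 5 \-p 0.2
.EE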
4042 .TP
4043 \
4044 .I Advanced Options:
4045 .br
4046 These allow fine tuning of the algorithm's internal parameters.
4047 .RS
4048 .IP \fB\-b\ \fInum\fR
4049 The algorithm (internally) uses adaptive noise estimation/reduction in
4050 order to detect the start of the wanted audio.  This option sets the
4051 time for the initial noise estimate.
4052 .IP \fB\-N\ \fInum\fR
4053 Time constant used by the adaptive noise estimator for when the noise
4054 level is increasing.
4055 .IP \fB\-n\ \fInum\fR
4056 Time constant used by the adaptive noise estimator for when the noise
4057 level is decreasing.
4058 .IP \fB\-r\ \fInum\fR
4059 Amount of noise reduction to use in the detection algorithm (e.g. 0,
4060 0.5, ...).
4061 .IP \fB\-f\ \fInum\fR
4062 Frequency of the algorithm's processing/measurements.
4063 .IP \fB\-m\ \fInum\fR
4064 Measurement duration; by default, twice the measurement period; i.e.
4065 with overlap.
4066 .IP \fB\-M\ \fInum\fR
4067 Time constant used to smooth spectral measurements.
4068 .IP \fB\-h\ \fInum\fR
4069 `Brick-wall' frequency of high-pass filter applied at the input to the
4070 detector algorithm.
4071 .IP \fB\-l\ \fInum\fR
4072 `Brick-wall' frequency of low-pass filter applied at the input to the
4073 detector algorithm.
4074 .IP \fB\-H\ \fInum\fR
4075 `Brick-wall' frequency of high-pass lifter used in the detector
4076 algorithm.
4077 .IP \fB\-L\ \fInum\fR
4078 `Brick-wall' frequency of low-pass lifter used in the detector
4079 algorithm.
4080 .RE
4081 .TP
4082 \
4083 See also the
4084 .B silence
4085 effect.
4086 .TP
4087 \fBvol \fIgain\fR [\fItype\fR [\fIlimitergain\fR]]
4088 Apply an amplification or an attenuation to the audio signal.
4089 Unlike the
4090 .B \-v
4091 option (which is used for balancing multiple input files as they enter the
4092 SoX effects processing chain),
4093 .B vol
4094 is an effect like any other so can be applied anywhere, and several times
4095 if necessary, during the processing chain.
4096 .SP
4097 The amount to change the volume is given by
4098 .I gain
4099 which is interpreted, according to the given \fItype\fR, as follows: if
4100 .I type
4101 is \fBamplitude\fR (or is omitted), then
4102 .I gain
4103 is an amplitude (i.e. voltage or linear) ratio,
4104 if \fBpower\fR, then a power (i.e. wattage or voltage-squared) ratio,
4105 and if \fBdB\fR, then a power change in dB.
4106 .SP
4107 When
4108 .I type
4109 is \fBamplitude\fR or \fBpower\fR, a
4110 .I gain
4111 of 1 leaves the volume unchanged,
4112 less than 1 decreases it,
4113 and greater than 1 increases it;
4114 a negative
4115 .I gain
4116 inverts the audio signal in addition to adjusting its volume.
4117 .SP
4118 When
4119 .I type
4120 is \fBdB\fR, a
4121 .I gain
4122 of 0 leaves the volume unchanged,
4123 less than 0 decreases it,
4124 and greater than 0 increases it.
4125 .SP
4126 See [4]
4127 for a detailed discussion on electrical (and hence audio signal)
4128 voltage and power ratios.
4129 .SP
4130 Beware of
4131 .B Clipping
4132 when increasing the volume.
4133 .SP
4134 The
4135 .I gain
4136 and the
4137 .I type
4138 parameters can be concatenated if desired, e.g.
4139 .BR "vol 10dB" .
4140 .SP
4141 An optional \fIlimitergain\fR value can be specified; it should be
4142 much less
4143 than 1 (e.g. 0\*d05 or 0\*d02) and is applied only to peaks, to prevent clipping.
4144 If this parameter is omitted, no limiter is used.  In verbose
4145 mode, this effect will display the percentage of the audio that needed to be
4146 limited.
4147 .SP
4148 See also
4149 .B gain
4150 for a volume-changing effect with different capabilities, and
4151 .B compand
4152 for a dynamic-range compression/expansion/limiting effect.
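.SP
As a worked illustration (file names are placeholders), halving the
amplitude quarters the power, which is a change of about \-6 dB, so
these three commands should produce approximately the same result:
.EX
   sox infile outfile vol 0.5
   sox infile outfile vol 0.25 power
   sox infile outfile vol \-6dB
.EE
Because \fIlimitergain\fR follows \fItype\fR, the type must be given
explicitly when a limiter is wanted, e.g.
.EX
   sox infile outfile vol 2 amplitude 0.05
.EE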
4153 .SS Deprecated Effects
4154 The following effects have been renamed or have their functionality
4155 included in another effect; they continue to work in this version of
4156 SoX but may be removed in future.
4157 .TP
4158 \fBmixer\fR [ \fB\-l\fR\^|\^\fB\-r\fR\^|\^\fB\-f\fR\^|\^\fB\-b\fR\^|\^\fB\-1\fR\^|\^\fB\-2\fR\^|\^\fB\-3\fR\^|\^\fB\-4\fR\^|\^\fIn\fR{\fB,\fIn\fR} ]
4159 Reduce the number of audio channels by mixing or selecting channels,
4160 or increase the number of channels by duplicating channels.
4161 Note: this effect operates on the audio
4162 .I channels
4163 within the SoX effects processing chain; it should not be confused with the
4164 .B \-m
4165 global option (where multiple
4166 .I files
4167 are mix-combined before entering the effects chain).
4168 .SP
4169 When reducing the number of channels it is possible to
4170 use the \fB\-l\fR, \fB\-r\fR, \fB\-f\fR, \fB\-b\fR,
4171 \fB\-1\fR, \fB\-2\fR, \fB\-3\fR, or \fB\-4\fR options to select only
4172 the left, right, front, or back channel(s), or a specific channel,
4173 for the output instead of averaging the channels.
4174 The \fB\-l\fR and \fB\-r\fR options will do averaging
4175 in quad-channel files, so select the exact channel to prevent this.
4176 .SP
4177 The
4178 .B mixer
4179 effect can also be invoked with up to 16
4180 numbers, separated by commas, which specify the proportion (0 = 0% and 1 = 100%)
4181 of each input channel that is to be mixed into each output channel.
4182 In two-channel mode, 4 numbers are given: l \*(RA l, l \*(RA r, r \*(RA l, and r \*(RA r,
4183 respectively.
4184 In four-channel mode, the first 4 numbers give the proportions for the
4185 left-front output channel, as follows: lf \*(RA lf, rf \*(RA lf, lb \*(RA lf, and
4186 rb \*(RA lf.
4187 The next 4 give the right-front output in the same order, then
4188 left-back and right-back.
4189 .SP
4190 It is also possible to use the 16 numbers to expand or reduce the
4191 channel count; just specify 0 for unused channels.
4192 .SP
4193 Finally, certain reduced combinations of numbers can be specified
4194 for certain input/output channel combinations.
4195 .ne 7
4196 .TS
4197 center;
4198 cI cI cI lI
4199 c c c l .
4200 In Ch   Out Ch  Num     Mappings
4201 2       1       2       l \*(RA l, r \*(RA l
4202 2       2       1       adjust balance
4203 4       1       4       lf \*(RA l, rf \*(RA l, lb \*(RA l, rb \*(RA l
4204 4       2       2       lf \*(RA l&rf \*(RA r, lb \*(RA l&rb \*(RA r
4205 4       4       1       adjust balance
4206 4       4       2       front balance, back balance
4207 .TE
4208 .DT
4209 .SP
4210 This effect has been superseded by the
4211 .B remix
4212 effect that handles any number of channels.
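.SP
As a sketch only, assuming a two-channel input and an output
explicitly set to one channel with \fB\-c 1\fR (file names are
placeholders), the following should mix both channels equally into a
single output channel, corresponding to the first row of the table
above:
.EX
   sox stereo.wav \-c 1 mono.wav mixer 0.5,0.5
.EE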
4213 .SH DIAGNOSTICS
4214 Exit status is 0 for no error, 1 if there is a problem with the
4215 command-line parameters, or 2 if an error occurs during file processing.
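.SP
A shell script can branch on these values; as a minimal sketch (file
names and messages are placeholders):
.EX
   sox infile outfile trim 0 10
   case $? in
     0) echo "ok" ;;
     1) echo "bad command-line parameters" ;;
     2) echo "error during file processing" ;;
   esac
.EE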
4216 .SH BUGS
4217 Please report any bugs found in this version of SoX to the mailing list
4218 (sox-users@lists.sourceforge.net).
4219 .SH SEE ALSO
4220 .BR soxi (1),
4221 .BR soxformat (7),
4222 .BR libsox (3)
4223 .br
4224 .BR audacity (1),
4225 .BR gnuplot (1),
4226 .BR octave (1),
4227 .BR wget (1)
4228 .br
4229 The SoX web site at http://sox.sourceforge.net
4230 .br
4231 SoX scripting examples at http://sox.sourceforge.net/Docs/Scripts
4232 .SS References
4233 .TP
4234 [1]
4235 R. Bristow-Johnson,
4236 .IR "Cookbook formulae for audio EQ biquad filter coefficients" ,
4237 http://musicdsp.org/files/Audio-EQ-Cookbook.txt
4238 .TP
4239 [2]
4240 Wikipedia,
4241 .IR "Q-factor" ,
4242 http://en.wikipedia.org/wiki/Q_factor
4243 .TP
4244 [3]
4245 Scott Lehman,
4246 .IR "Effects Explained" ,
4247 http://harmony-central.com/Effects/effects-explained.html
4248 .TP
4249 [4]
4250 Wikipedia,
4251 .IR "Decibel" ,
4252 http://en.wikipedia.org/wiki/Decibel
4253 .TP
4254 [5]
4255 Richard Furse,
4256 .IR "Linux Audio Developer's Simple Plugin API" ,
4257 http://www.ladspa.org
4258 .TP
4259 [6]
4260 Richard Furse,
4261 .IR "Computer Music Toolkit" ,
4262 http://www.ladspa.org/cmt
4263 .TP
4264 [7]
4265 Steve Harris,
4266 .IR "LADSPA plugins" ,
4267 http://plugin.org.uk
4268 .SH LICENSE
4269 Copyright 1998\-2011 Chris Bagwell and SoX Contributors.
4270 .br
4271 Copyright 1991 Lance Norskog and Sundry Contributors.
4272 .SP
4273 This program is free software; you can redistribute it and/or modify
4274 it under the terms of the GNU General Public License as published by
4275 the Free Software Foundation; either version 2, or (at your option)
4276 any later version.
4277 .SP
4278 This program is distributed in the hope that it will be useful,
4279 but WITHOUT ANY WARRANTY; without even the implied warranty of
4280 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
4281 GNU General Public License for more details.
4282 .SH AUTHORS
4283 Chris Bagwell (cbagwell@users.sourceforge.net).
4284 Other authors and contributors are listed in the ChangeLog file that
4285 is distributed with the source code.