2 '\" The line above instructs most `man' programs to invoke tbl
4 '\" Separate paragraphs; not the same as PP which resets indent level.
10 '\" Replacement em-dash for nroff (default is too short).
14 '\" Placeholder macro for if longer nroff arrow is needed.
17 '\" Decimal point set slightly raised
18 .if t .ds d \v'-.15m'.\v'+.15m'
21 '\" Enclosure macro for examples
32 .TH SoX 1 "February 19, 2011" "sox" "Sound eXchange"
34 SoX \- Sound eXchange, the Swiss Army knife of audio manipulation
37 \fBsox\fR [\fIglobal-options\fR] [\fIformat-options\fR] \fIinfile1\fR
38 [[\fIformat-options\fR] \fIinfile2\fR] ... [\fIformat-options\fR] \fIoutfile\fR
39 [\fIeffect\fR [\fIeffect-options\fR]] ...
41 \fBplay\fR [\fIglobal-options\fR] [\fIformat-options\fR] \fIinfile1\fR
42 [[\fIformat-options\fR] \fIinfile2\fR] ... [\fIformat-options\fR]
43 [\fIeffect\fR [\fIeffect-options\fR]] ...
45 \fBrec\fR [\fIglobal-options\fR] [\fIformat-options\fR] \fIoutfile\fR
46 [\fIeffect\fR [\fIeffect-options\fR]] ...
50 SoX reads and writes audio files in most popular formats and can
51 optionally apply effects to them. It can combine multiple input
52 sources, synthesise audio, and, on many systems, act as a
53 general-purpose audio player or a multi-track audio recorder. It also has
54 limited ability to split the input into multiple output files.
56 All SoX functionality is available using just the \fBsox\fR command.
57 To simplify playing and recording audio, if SoX is invoked as
58 \fBplay\fR, the output file is automatically set to be the default sound
59 device, and if invoked as \fBrec\fR, the default sound device is used as an
63 command provides a convenient way to just query audio file header information.
65 The heart of SoX is a library called libSoX. Those interested in
66 extending SoX or using it in other programs should refer to the libSoX
70 SoX is a command-line audio processing tool, particularly suited to making
71 quick, simple edits and to batch processing.
72 If you need an interactive, graphical audio editor, use
81 The overall SoX processing chain can be summarised as follows:
85 Input(s) \*(RA Combiner \*(RA Effects \*(RA Output(s)
89 Note, however, that on the SoX command line, the positions of the
90 Output(s) and the Effects are swapped with respect to the logical flow just
91 shown. Note also that whilst options pertaining to files are placed
92 before their respective file name, the opposite is true for effects.
93 To show how this works in practice, here is a selection of examples of
94 how SoX might be used. The simple
96 sox recital.au recital.wav
98 translates an audio file in Sun AU format to a Microsoft WAV file, whilst
100 sox recital.au \-b 16 recital.wav channels 1 rate 16k fade 3 norm
102 performs the same format translation, but also applies four effects
103 (down-mix to one channel, sample rate change, fade-in, normalize),
104 and stores the result at a bit-depth of 16.
106 sox \-r 16k \-e signed \-b 8 \-c 1 voice-memo.raw voice-memo.wav
108 converts `raw' (a.k.a. `headerless') audio to a self-describing file format,
110 sox slow.aiff fixed.aiff speed 1.027
114 sox short.wav long.wav longer.wav
116 concatenates two audio files, and
118 sox \-m music.mp3 voice.wav mixed.flac
120 mixes together two audio files.
122 play \(dqThe Moonbeams/Greatest/*.ogg\(dq bass +3
124 plays a collection of audio files whilst applying a bass boosting effect,
126 play \-n \-c1 synth sin %\-12 sin %\-9 sin %\-5 sin %\-2 fade h 0.1 1 0.1
128 plays a synthesised `A minor seventh' chord with a pipe-organ sound,
130 rec \-c 2 radio.aiff trim 0 30:00
132 records half an hour of stereo audio, and
134 play \-q take1.aiff & rec \-M take1.aiff take1\-dub.aiff
136 (with POSIX shell and where supported by hardware)
137 records a new track in a multi-track recording. Finally,
140 rec \-r 44100 \-b 16 \-s \-p silence 1 0.50 0.1% 1 10:00 0.1% | \\
141 sox \-p song.ogg silence 1 0.50 0.1% 1 2.0 0.1% : \\
144 records a stream of audio such as LP/cassette and splits it into multiple
145 audio files at points with 2 seconds of silence. Also, it does not start
146 recording until it detects audio is playing and stops after it sees
147 10 minutes of silence.
149 N.B. The above is just an overview of SoX's capabilities; detailed
150 explanations of how to use \fIall\fR SoX parameters, file formats, and
151 effects can be found below in this manual, in
155 .SS File Format Types
156 SoX can work with `self-describing' and `raw' audio files.
157 `Self-describing' formats (e.g. WAV, FLAC, MP3) have a header that
158 completely describes the signal and encoding attributes of the audio
159 data that follows. `raw' or `headerless' formats do not contain this
160 information, so the audio characteristics of these must be described
161 on the SoX command line or inferred from those of the input file.
163 The following four characteristics are used to describe the format of
164 audio data such that it can be processed with SoX:
167 The sample rate in samples per second (`Hertz' or `Hz').
168 Digital telephony traditionally uses a sample rate of 8000\ Hz (8\ kHz),
169 though these days, 16 and even 32\ kHz are becoming more common. Audio
170 Compact Discs use 44100\ Hz (44\*d1\ kHz). Digital Audio Tape and many
171 computer systems use 48\ kHz. Professional audio systems often use 96
175 The number of bits used to store each sample. Today, 16-bit is
176 commonly used. 8-bit was popular in the early days of computer
177 audio. 24-bit is used in the professional audio arena. Other sizes are
181 The way in which each audio sample is represented (or `encoded'). Some
182 encodings have variants with different byte-orderings or bit-orderings.
183 Some compress the audio data so that the stored audio data takes up less
184 space (i.e. disk space or transmission bandwidth) than the other format
185 parameters and the number of samples would imply. Commonly-used
186 encoding types include floating-point, \(*m-law, ADPCM, signed-integer
190 The number of audio channels contained in the file. One (`mono') and
191 two (`stereo') are widely used. `Surround sound' audio typically
192 contains six or more channels.
194 The term `bit-rate' is a measure of the amount of storage occupied by an
195 encoded audio signal over a unit of time. It can depend on all of the
196 above and is typically denoted as a number of kilobits per second
197 (kbps). An A-law telephony signal has a bit-rate of 64 kbps. MP3-encoded
198 stereo music typically has a bit-rate of 128\-196 kbps. FLAC-encoded
199 stereo music typically has a bit-rate of 550\-760 kbps.
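For uncompressed, fixed-width encodings such as the A-law telephony signal mentioned above, the bit-rate follows directly from the other characteristics: sample rate times bits per sample times number of channels. The minimal Python sketch below (the function name is hypothetical, not part of SoX, and `kilo' is taken as 1000) reproduces the 64 kbps telephony figure and shows the roughly 1411 kbps of uncompressed CD audio, which puts the FLAC figures above in context. Compressed encodings such as MP3 and FLAC do not follow this formula.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit-rate, in kilobits per second, of audio stored at a fixed width per sample."""
    # bits per second = samples per second per channel * bits per sample * channels
    return sample_rate_hz * bits_per_sample * channels / 1000


telephony = pcm_bit_rate_kbps(8000, 8, 1)   # A-law: 8 kHz, 8 bits, mono
cd_audio = pcm_bit_rate_kbps(44100, 16, 2)  # Audio CD: 44.1 kHz, 16 bits, stereo
print(telephony, cd_audio)                  # 64.0 1411.2
```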
201 Most self-describing formats also allow textual `comments' to be
202 embedded in the file that can be used to describe the audio in some way,
203 e.g. for music, the title, the author, etc.
205 One important use of audio file comments is to convey `Replay Gain'
206 information. SoX supports applying Replay Gain information, but not
207 generating it. Note that by default, SoX copies input file comments
208 to output files that support comments, so output files may contain
209 Replay Gain information if some was present in the input file. In this
210 case, if anything other than a simple format conversion was performed
211 then the output file Replay Gain information is likely to be incorrect
212 and so should be recalculated using a tool that supports this (not SoX).
216 command can be used to display information from audio file headers.
217 .SS Determining & Setting The File Format
218 SoX can use several mechanisms to determine or set the
219 format characteristics of an audio file. Depending on the circumstances,
220 individual characteristics may be determined or set using different mechanisms.
222 To determine the format of an input file, SoX will use, in order of
223 precedence and as given or available:
225 Command-line format options.
227 The contents of the file header.
229 The filename extension.
231 To set the output file format, SoX will use, in order of
232 precedence and as given or available:
234 Command-line format options.
236 The filename extension.
238 The input file format characteristics, or the closest
239 that is supported by the output file type.
241 For all files, SoX will exit with an error
242 if the file type cannot be determined. Command-line format options may
243 need to be added or changed to resolve the problem.
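Because each characteristic may be resolved separately, the precedence rules can be pictured as a first-match lookup per characteristic. The Python sketch below is illustrative only. The dictionary keys, the hypothetical `resolve' helper, and the choice to report an error only when the file type is unknown are assumptions. It also ignores the `closest supported' adjustment that SoX makes for output file types.

```python
FIELDS = ("type", "rate", "bits", "encoding", "channels")


def resolve(*sources):
    """For each characteristic, use the value from the first source that supplies it."""
    resolved = {}
    for field in FIELDS:
        for source in sources:
            if source.get(field) is not None:
                resolved[field] = source[field]
                break
    if "type" not in resolved:
        raise ValueError("file type cannot be determined")
    return resolved


# Input file: command-line options, then the file header, then the filename extension.
cli_options = {"rate": 16000}
header = {"type": "wav", "rate": 44100, "bits": 16, "encoding": "signed", "channels": 2}
extension = {"type": "wav"}
input_format = resolve(cli_options, header, extension)

# Output file: command-line options, then the filename extension, then the input format.
output_format = resolve({"bits": 24}, {"type": "aiff"}, input_format)
print(input_format)
print(output_format)
```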
244 .SS Playing & Recording Audio
249 commands are provided so that basic playing and
250 recording is as simple as
252 play existing-file.wav
258 These two commands are functionally equivalent to
260 sox existing-file.wav \-d
266 Of course, further options and effects (as described below) can be
267 added to the commands in either form.
275 Some systems provide more than one type of (SoX-compatible) audio
276 driver, e.g. ALSA & OSS, or SUNAU & AO.
277 Systems can also have more than one audio device (a.k.a. `sound card').
278 If more than one audio driver has been
279 built into SoX, and the default selected by SoX when recording or playing
280 is not the one that is wanted, then the
282 environment variable can be used to override the default. For example
290 environment variable can be used to override the default audio device,
293 set AUDIODEV=/dev/dsp2
299 set AUDIODEV=hw:soundwave,1,2
303 Note that the way of setting environment variables varies from system
304 to system\*mfor some specific examples, see `SOX_OPTS' below.
306 When playing a file with a sample rate that is not supported by the
307 audio output device, SoX will automatically invoke the \fBrate\fR effect
308 to perform the necessary sample rate conversion. For
309 compatibility with old hardware, the
310 default \fBrate\fR quality level is set to `low'. This
311 can be changed by explicitly specifying the \fBrate\fR
312 effect with a different quality level, e.g.
317 .B \-\-play\-rate\-arg
326 On some systems, SoX allows audio playback volume to be adjusted whilst
329 Where supported, this is achieved by tapping the `v' & `V' keys during
332 To help with setting a suitable recording level, SoX includes a peak-level
333 meter which can be invoked (before making the actual recording) as follows:
337 The recording level should be adjusted (using the system-provided mixer
338 program, not SoX) so that the meter is \fIat most occasionally\fR full
339 scale, and never `in the red' (an exclamation mark is shown).
340 See also \fB\-S\fR below.
342 Many file formats that compress audio discard some of the audio signal
343 information whilst doing so. Converting to such a format and then converting
344 back again will not produce an exact copy of the original audio. This
345 is the case for many formats used in telephony (e.g. A-law, GSM) where
346 low signal bandwidth is more important than high audio fidelity, and for
347 many formats used in portable music players (e.g. MP3, Vorbis) where
348 adequate fidelity can be retained even with the large compression ratios
349 that are needed to make portable players practical.
351 Formats that discard audio signal information are called `lossy'.
352 Formats that do not are called `lossless'. The term `quality' is used as a
353 measure of how closely the original audio signal can be reproduced when
354 using a lossy format.
356 Audio file conversion with SoX is lossless when it can be, i.e. when not
357 using lossy compression, when not reducing the sampling rate or number
358 of channels, and when the number of bits used in the destination format
359 is not less than in the source format. E.g. converting from an 8-bit
360 PCM format to a 16-bit PCM format is lossless but converting from an
361 8-bit PCM format to (8-bit) A-law isn't.
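The asymmetry can be illustrated with plain integer PCM. The Python sketch below uses hypothetical helper names. It ignores dither, which a real conversion may apply. Widening signed 8-bit samples to 16 bits by a left shift and then narrowing them again returns every original value. Narrowing 16-bit samples to 8 bits, however, maps distinct values onto the same result, so the original cannot be recovered.

```python
def widen_8_to_16(samples):
    # Place each signed 8-bit value (-128..127) in the top byte of a 16-bit sample.
    return [s << 8 for s in samples]


def narrow_16_to_8(samples):
    # Keep only the top byte; the arithmetic shift simply discards the low 8 bits.
    return [s >> 8 for s in samples]


all_8bit = list(range(-128, 128))
assert narrow_16_to_8(widen_8_to_16(all_8bit)) == all_8bit  # lossless round trip

# 1000 and 1001 differ only in their low byte, so they collapse to the same value.
print(widen_8_to_16(narrow_16_to_8([1000, 1001, -1])))  # [768, 768, -256]
```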
364 SoX converts all audio files to an internal uncompressed
365 format before performing any audio processing. This means that
366 manipulating a file that is stored in a lossy format can cause further
367 losses in audio fidelity. E.g. with
369 sox long.mp3 short.mp3 trim 10
371 SoX first decompresses the input MP3 file, then applies the
373 effect, and finally creates the output MP3 file by re-compressing the
374 audio\*mwith a possible reduction in fidelity above that which
375 occurred when the input file was created.
376 Hence, if what is ultimately desired is lossily compressed audio, it is
377 highly recommended to perform all audio processing using lossless file
378 formats and then convert to the lossy format only at the final stage.
381 Applying multiple effects with a single SoX invocation will,
382 in general, produce more accurate results than those produced using
383 multiple SoX invocations.
385 Dithering is a technique used to maximise the dynamic range of audio
386 stored at a particular bit-depth. Any distortion introduced by
387 quantisation is decorrelated by adding a small amount of white noise
388 to the signal. In most cases, SoX can determine whether the selected
389 processing requires dither and will add it during output formatting if
392 Specifically, by default, SoX automatically adds TPDF dither
393 when the output bit-depth is less than 24 and any
394 of the following are true:
396 bit-depth reduction has been specified explicitly using a command-line
399 the output file format supports only bit-depths lower than that of the
402 an effect has increased effective bit-depth within the internal
405 For example, adjusting volume with
407 requires two additional bits in which to losslessly store its results
408 (since 0\*d25 decimal equals 0\*d01 binary). So if the input file
409 bit-depth is 16, then SoX's internal representation will utilise 18
410 bits after processing this volume change. In order to store the
411 output at the same depth as the input, dithering is used to remove the
416 option to see what processing SoX has automatically added. The
418 option may be given to override automatic dithering. To invoke
419 dithering manually (e.g. to select a noise-shaping curve), see the
423 Clipping is distortion that occurs when an audio signal level (or
424 `volume') exceeds the range of the chosen representation. In most
425 cases, clipping is undesirable and so should be corrected by adjusting
426 the level prior to the point (in the processing chain) at which it
429 In SoX, clipping could occur, as you might expect, when using the
433 effects to increase the audio volume. Clipping could also occur with many
434 other effects, when converting one format to another, and even when
435 simply playing the audio.
437 Playing an audio file often involves resampling, and processing by
438 analogue components can introduce a small DC offset and/or
439 amplification, all of which can produce distortion if the audio signal
440 level was initially too close to the clipping point.
442 For these reasons, it is usual to make sure that an audio
443 file's signal level has some `headroom', i.e. it does not exceed a particular
444 level below the maximum possible level for the given representation.
445 Some standards bodies recommend as much as 9dB headroom, but in most cases,
446 3dB (\(~~ 70% linear) is enough. Note that this wisdom
447 seems to have been lost in modern music production; in fact, many CDs,
448 MP3s, etc. are now mastered at levels \fIabove\fR 0dBFS, i.e. the
449 audio is clipped as delivered.
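Headroom is expressed in decibels relative to full scale (dBFS). The amplitude ratio for a level of L dB is 10^(L/20). The short Python sketch below uses hypothetical helper names. It checks the figures quoted above: 3 dB of headroom leaves a peak at about 71% of full scale, and 9 dB at about 35%. It also computes how much gain would move a given peak to a chosen headroom level. It illustrates the arithmetic only and is not how any particular SoX effect is implemented.

```python
import math


def db_to_amplitude(db):
    """Linear amplitude ratio for a level in dB (20 dB per factor of ten)."""
    return 10 ** (db / 20)


def gain_for_headroom(peak, headroom_db):
    """dB of gain that moves a linear peak (full scale = 1.0) to -headroom_db dBFS."""
    return -headroom_db - 20 * math.log10(peak)


print(round(db_to_amplitude(-3), 3))  # 0.708
print(round(db_to_amplitude(-9), 3))  # 0.355
# A peak at 0.9 of full scale needs about -2.08 dB to leave 3 dB of headroom.
print(round(gain_for_headroom(0.9, 3), 2))
```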
455 effects can assist in determining the signal level in an audio file. The
459 effect can be used to prevent clipping, e.g.
461 sox dull.wav bright.wav gain \-6 treble +6
463 guarantees that the treble boost will not clip.
465 If clipping occurs at any point during processing,
466 SoX will display a warning message to that effect.
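The sketch below shows what `clipping' means numerically. It also shows why reducing the level first, as in the gain/treble example above, avoids it. For simplicity, samples are modelled as floating-point numbers in the range \-1 to +1, which is an assumption. Any value that a gain change pushes outside that range is clamped and counted. The function name is hypothetical, and this is not SoX's own clipping detector.

```python
def apply_gain(samples, gain_db):
    """Scale samples by gain_db; clamp to [-1, 1] and count the samples that clipped."""
    factor = 10 ** (gain_db / 20)
    out, clipped = [], 0
    for s in samples:
        y = s * factor
        if abs(y) > 1.0:
            y = max(-1.0, min(1.0, y))  # clamp to full scale
            clipped += 1
        out.append(y)
    return out, clipped


signal = [0.2, -0.5, 0.6, -0.9]
_, direct = apply_gain(signal, 6)          # +6 dB straight away
quieter, first = apply_gain(signal, -6)    # make headroom first ...
_, second = apply_gain(quieter, 6)         # ... then boost by the same amount
print(direct, first + second)              # 2 0
```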
475 .SS Input File Combining
476 SoX's input combiner can be configured (see OPTIONS below) to
477 combine multiple files using any of the
478 following methods: `concatenate', `sequence', `mix', `mix-power',
479 `merge', or `multiply'.
480 The default method is `sequence' for
482 and `concatenate' for
487 For all methods other than `sequence', multiple input files must have
488 the same sampling rate. If necessary, separate SoX invocations can be
489 used to make sampling rate adjustments prior to combining.
491 If the `concatenate' combining method is selected (usually, this will be
492 by default) then the input files must also have the same number of
493 channels. The audio from each input will be concatenated in the order
494 given to form the output file.
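The `concatenate' rules can be sketched as follows. This Python model is illustrative and is an assumption, not SoX's internal representation. Each input is a tuple of sample rate, channel count, and a list of frames. Inputs that disagree on either property are rejected; otherwise the frames are appended in the order given.

```python
def concatenate(inputs):
    """inputs: list of (rate, channels, frames); each frame is a tuple of channel values."""
    rate, channels, _ = inputs[0]
    output = []
    for r, c, frames in inputs:
        if r != rate:
            raise ValueError("inputs must share the same sample rate")
        if c != channels:
            raise ValueError("inputs must share the same number of channels")
        output.extend(frames)  # audio from each input follows the previous one
    return rate, channels, output


short = (8000, 1, [(0.1,), (0.2,)])
long_ = (8000, 1, [(0.3,), (0.4,), (0.5,)])
print(concatenate([short, long_]))
```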
496 The `sequence' combining method is selected automatically for
498 It is similar to `concatenate' in that the audio from each input file is
499 sent serially to the output file. However, here the output file may be
500 closed and reopened at the corresponding transition between input
501 files. This may be just what is needed when sending different types of
502 audio to an output device, but is not generally useful when the output is a
505 If either the `mix' or `mix-power' combining method is selected then two or
506 more input files must be given and will be mixed together to form the
507 output file. The number of channels in each input file need not be the
508 same, but SoX will issue a warning if they are not, and some
509 channels in the output file will not contain audio from every input
510 file. A mixed audio file cannot be un-mixed without reference to the
511 original input files.
513 If the `merge' combining method is selected then two or
514 more input files must be given and will be merged together to form the
515 output file. The number of channels in each input file need not be the
516 same. A merged audio file comprises all of the channels from all of the
517 input files. Un-merging is possible using multiple
518 invocations of SoX with the
521 For example, two mono files could be merged to form one stereo file. The
522 first and second mono files would become the left and right channels of
525 The `multiply' combining method multiplies the sample values of
526 corresponding channels (treated as numbers in the interval \-1 to +1).
527 If the number of channels in the input files is not the same, the
528 missing channels are considered to contain all zeros.
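Both `merge' and `multiply' work on corresponding samples across inputs. The Python sketch below models each input as a list of frames, where each frame is a tuple of channel values in the range \-1 to +1. Requiring equal-length inputs is a simplifying assumption, and the function names are hypothetical. `merge' places all channels side by side, so two mono inputs become a stereo pair. `multiply' takes channel-by-channel products and treats a channel that an input lacks as silence.

```python
def _check_lengths(inputs):
    if len({len(frames) for frames in inputs}) != 1:
        raise ValueError("this sketch assumes equal-length inputs")


def merge(*inputs):
    """Each output frame holds every channel of every input, in input order."""
    _check_lengths(inputs)
    return [tuple(v for frame in frames for v in frame) for frames in zip(*inputs)]


def multiply(*inputs):
    """Multiply corresponding channels; a channel missing from an input counts as zero."""
    _check_lengths(inputs)
    output = []
    for frames in zip(*inputs):
        width = max(len(frame) for frame in frames)
        product = []
        for ch in range(width):
            p = 1.0
            for frame in frames:
                p *= frame[ch] if ch < len(frame) else 0.0
            product.append(p)
        output.append(tuple(product))
    return output


left = [(0.5,), (-0.25,)]
right = [(0.1,), (0.2,)]
stereo = merge(left, right)  # [(0.5, 0.1), (-0.25, 0.2)]
print(stereo)
print(multiply(stereo, left))  # second channel is zero: left has only one channel
```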
530 When combining input files, SoX applies any specified effects
531 (including, for example, the
533 volume adjustment effect) after the audio has been combined. However, it
534 is often useful to be able to set the volume of each input individually
535 (i.e. to `balance' the inputs) before combining takes place.
537 For all combining methods, input
538 file volume adjustments can be made manually using the
540 option (below) which can be given for one or more input files. If it is
541 given for only some of the input files then the others receive no volume
542 adjustment. In some circumstances, automatic volume
543 adjustments may be applied (see below).
545 The \fB\-V\fR option (below) can be used to show the input file volume
546 adjustments that have been selected (either manually or automatically).
548 There are some special considerations that need to be made when mixing
551 Unlike the other methods, `mix' combining has the
552 potential to cause clipping in the combiner if no balancing is
553 performed. In this case, if manual volume adjustments are not given,
554 SoX will try to ensure that clipping does not occur by automatically
556 volume (amplitude) of each input signal by a factor of \(S1/\s-2n\s+2,
557 where n is the number of input files. If this results in audio that is
558 too quiet or otherwise unbalanced then the input file volumes can be
559 set manually as described above. Using the
561 effect on the mix is another alternative.
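The automatic balancing for `mix' can be sketched in a few lines of Python. This model handles only mono inputs of equal length and uses hypothetical names; SoX itself accepts differing channel counts. When every input stays within \-1 to +1 and is scaled by 1/n, the sum of n inputs also stays within that range. This is why clipping cannot occur in the combiner. Manual per-input volumes replace the 1/n factor.

```python
def mix(inputs, volumes=None):
    """Sum equal-length mono inputs (floats in [-1, 1]) sample by sample."""
    n = len(inputs)
    if n < 2:
        raise ValueError("mixing needs two or more inputs")
    if volumes is None:
        volumes = [1.0 / n] * n  # automatic balancing: scale each input by 1/n
    return [sum(v * x for v, x in zip(volumes, samples)) for samples in zip(*inputs)]


a = [1.0, -1.0, 0.5]
b = [1.0, -1.0, -0.5]
print(mix([a, b]))                      # [1.0, -1.0, 0.0]
print(mix([a, b], volumes=[1.0, 1.0]))  # [2.0, -2.0, 0.0]: would clip
```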
563 If mixed audio seems loud enough at some points but
564 too quiet in others then dynamic range compression should be applied to
565 correct this\*msee the
569 With the `mix-power' combine method, the
570 mixed volume is approximately equal to that of one of the input signals.
571 This is achieved by balancing using a factor of
572 \(S1/\s-2\(srn\s+2 instead of \(S1/\s-2n\s+2.
573 Note that this balancing factor does not guarantee that clipping will not occur,
574 but the number of clips will usually be low and the resultant
575 distortion is generally imperceptible.
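The 1/\(srn factor rests on the fact that the powers of uncorrelated signals add. The RMS level of a sum of n such signals is therefore about \(srn times that of one of them. The Python sketch below demonstrates this with seeded random noise standing in for uncorrelated inputs. This is an assumption: correlated material, such as identical tracks, behaves differently, which is one reason the factor cannot guarantee freedom from clipping. Scaling by 1/\(srn keeps the mix at roughly the level of a single input. Scaling by 1/n roughly halves it for four inputs.

```python
import math
import random


def rms(x):
    return math.sqrt(sum(s * s for s in x) / len(x))


def mix_scaled(inputs, factor):
    # Sum corresponding samples, then apply a single balancing factor.
    return [factor * sum(frame) for frame in zip(*inputs)]


random.seed(1)
n, length = 4, 20000
inputs = [[random.uniform(-0.25, 0.25) for _ in range(length)] for _ in range(n)]

single = rms(inputs[0])
power_ratio = rms(mix_scaled(inputs, 1 / math.sqrt(n))) / single      # close to 1.0
amplitude_ratio = rms(mix_scaled(inputs, 1 / n)) / single             # close to 0.5
print(f"mix-power: {power_ratio:.2f}  mix: {amplitude_ratio:.2f}")
```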
577 SoX's default behaviour is to take one or more input files and
578 write them to a single output file.
580 This behaviour can be changed by specifying the pseudo-effect `newfile'
581 within the effects list. SoX will then enter multiple output mode.
583 In multiple output mode, a new file is created when the effects
584 prior to the `newfile' indicate they are done.
585 The effects chain listed after `newfile'
586 is then started up and its output is saved to the new file.
588 In multiple output mode, a unique number will automatically be appended
589 to the end of all filenames. If the filename has an extension
590 then the number is inserted before the extension. This behaviour can
591 be customized by placing a %n anywhere in the filename where the
592 number should be substituted. An optional number can be placed after
593 the % to indicate a minimum fixed width for the number.
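The naming rules can be sketched as a small Python function. The function is hypothetical. Three details in it are assumptions not stated above. First, numbering starts at 1 (the caller supplies the number here). Second, a width given after the % is padded with leading zeros. Third, no padding is applied when the number is appended automatically. Check the file names SoX actually produces before relying on any of these.

```python
import os
import re


def numbered_filename(name, number):
    """Substitute an output-file number as described for multiple output mode."""
    match = re.search(r"%([0-9]*)n", name)
    if match:
        width = int(match.group(1) or 0)  # optional minimum width (zero padding assumed)
        return name[:match.start()] + str(number).zfill(width) + name[match.end():]
    root, ext = os.path.splitext(name)  # insert the number before any extension
    return f"{root}{number}{ext}"


print([numbered_filename("ringtone%1n.wav", k) for k in (1, 2)])
# ['ringtone1.wav', 'ringtone2.wav']
```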
595 Multiple output mode is not very useful unless an effect that will
596 stop the effects chain early is
597 specified before the `newfile'. If end of file is
598 reached before the effects chain stops itself then no new file
599 will be created as it would be empty.
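The stopping behaviour can be modelled as follows. In this illustrative Python sketch, each effects chain is reduced to a trim-like limit on the number of samples it passes before it stops itself. Each chain resumes where the previous one stopped. Once the input is exhausted, no further (empty) output is created. The function name and the use of sample counts in place of times are assumptions.

```python
def split_by_chains(samples, chain_lengths):
    """Return the audio each output file would receive, one list per created file."""
    outputs, position = [], 0
    for length in chain_lengths:
        chunk = samples[position:position + length]
        if not chunk:
            break  # input ended before this chain started: no empty file is created
        outputs.append(chunk)
        position += len(chunk)
    return outputs


# 100 samples split by two 30-sample chains: the remaining 40 samples are ignored.
print([len(part) for part in split_by_chains(list(range(100)), [30, 30])])  # [30, 30]
# 40 samples and three chains: the second file is short and no third file appears.
print([len(part) for part in split_by_chains(list(range(40)), [30, 30, 30])])  # [30, 10]
```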
601 The following is an example of splitting the first 60 seconds of an input
602 file into two 30-second files and ignoring the rest.
604 sox song.wav ringtone%1n.wav trim 0 30 : newfile : trim 0 30
606 Usually SoX will complete its processing and exit automatically once
607 it has read all available audio data from the input files.
609 If desired, it can be terminated earlier by sending an
610 interrupt signal to the process (usually by pressing the
611 keyboard interrupt key which is normally Ctrl-C). This is a natural requirement
612 in some circumstances, e.g. when using SoX to make a recording. Note
613 that when using SoX to play multiple files, Ctrl-C behaves slightly
614 differently: pressing it once causes SoX to skip to the next file;
615 pressing it twice in quick succession causes SoX to exit.
617 Another option to stop processing early is to use an effect that
618 has a time period or sample count to determine the stopping
619 point. The trim effect is an example of this. Once all
620 effects chains have stopped then SoX will also stop.
622 Filenames can be simple file names, absolute or relative path names,
623 or URLs (input files only). Note that URL support requires that
628 Giving SoX an input or output filename that is the same as a SoX
629 effect-name will not work since SoX will treat it as an effect
630 specification. The only work-around to this is to avoid such
631 filenames. This is generally not difficult since most audio
632 filenames have a filename `extension', whilst effect-names do not.
633 .SS Special Filenames
634 The following special filenames may be used in certain circumstances
635 in place of a normal filename on the command line:
638 SoX can be used in simple pipeline operations by using the special
640 if used as an input filename, will cause
641 SoX to read audio data from `standard input' (stdin),
643 if used as the output filename, will cause
644 SoX to send audio data to `standard output' (stdout).
645 Note that when using this option for the output file, and sometimes
646 when using it for an input file, the file-type (see
648 below) must also be given.
650 \fB\(dq\^|\^\fIprogram \fR[\fIoptions\fR] ...\fB\(dq\fR
651 This can be used in place of an input filename to specify that
652 the given program's standard output (stdout) be used as an input file.
655 (above), this can be used for several inputs to one SoX command. For
656 example, if `genw' generates mono WAV formatted signals to its
657 standard output, then the following command makes a stereo file
658 from two generated signals:
660 sox \-M "|genw \-\-imd \-" "|genw \-\-thd \-" out.wav
662 For headerless (raw) audio,
664 (and perhaps other format options) will need to be given, preceding the input
667 \fB\(dq\fIwildcard-filename\fB\(dq\fR
668 Specifies that filename `globbing' (wild-card matching) should be performed
669 by SoX instead of by the shell. This allows a single set of file options to be
670 applied to a group of files. For example, if the current directory contains
671 three `vox' files, file1.vox, file2.vox, and file3.vox, then
673 play \-\-rate 6k *.vox
675 will be expanded by the `shell' (in most environments) to
677 play \-\-rate 6k file1.vox file2.vox file3.vox
679 which will treat only the first vox file as having a sample rate of 6k.
682 play \-\-rate 6k "*.vox"
684 the given sample rate option will be applied to all three vox files.
686 \fB\-p\fR, \fB\-\-sox\-pipe\fR
687 This can be used in place of an output filename to specify that
688 the SoX command should be used as an input pipe to another SoX command.
689 For example, the command:
691 play "|sox \-n \-p synth 2" "|sox \-n \-p synth 2 tremolo 10" stat
693 plays two `files' in succession, each with different effects.
696 is in fact an alias for `\fB\-t sox \-\fR'.
698 \fB\-d\fR, \fB\-\-default\-device\fR
699 This can be used in place of an input or output filename to specify that
700 the default audio device (if one has been built into SoX) is to be used.
701 This is akin to invoking
705 (as described above).
707 \fB\-n\fR, \fB\-\-null\fR
708 This can be used in place of an input or output filename to specify that
709 a `null file' is to be used. Note that here, `null file' refers to a
710 SoX-specific mechanism and is not related to any operating-system
711 mechanism with a similar name.
713 Using a null file to input audio is equivalent to
714 using a normal audio file that contains an infinite amount
715 of silence, and as such is not generally useful unless used
716 with an effect that specifies a finite time length
717 (such as \fBtrim\fR or \fBsynth\fR).
719 Using a null file to output audio amounts to discarding the audio
720 and is useful mainly with effects that produce information about the
721 audio instead of affecting it (such as \fBnoiseprof\fR or \fBstat\fR).
723 The sampling rate associated with a null file
724 is by default 48\ kHz, but, as with a normal
725 file, this can be overridden if desired using command-line format
727 .SS Supported File & Audio Device Types
730 for a list and description of the supported file formats and audio device
734 These options can be specified on the command line at any point
735 before the first effect name.
739 environment variable can be used to provide alternative default values for
740 SoX's global options.
743 SOX_OPTS="\-\-buffer 20000 \-\-play\-rate\-arg \-hs \-\-temp /mnt/temp"
745 Note that setting SOX_OPTS can potentially create unwanted changes in
746 the behaviour of scripts or other programs that invoke SoX. SOX_OPTS
747 might best be used for things (such as in the given example) that reflect the
748 environment in which SoX is being run. Enabling options such as
750 as default might be handled better using a shell alias
751 since a shell alias will not affect operation in scripts etc.
753 One way to ensure that a script cannot be affected by SOX_OPTS is to
754 clear SOX_OPTS at the start of the script, but this of course loses
755 the benefit of SOX_OPTS carrying some system-wide default options. An
756 alternative approach is to explicitly invoke SoX with default
759 SOX_OPTS="\-V \-\-no-clobber"
761 sox \-V2 \-\-clobber $input $output ...
763 Note that the way to set environment variables varies from system
764 to system. Here are some examples:
768 export SOX_OPTS="\-V \-\-no-clobber"
772 setenv SOX_OPTS "\-V \-\-no-clobber"
776 set SOX_OPTS=\-V \-\-no-clobber
778 MS-Windows GUI: via Control Panel : System : Advanced : Environment
781 Mac OS X GUI: Refer to Apple's Technical Q&A QA1067 document.
783 \fB\-\-buffer\fR \fBBYTES\fR, \fB\-\-input\-buffer\fR \fBBYTES\fR
784 Set the size in bytes of the buffers used for processing audio (default 8192).
786 applies to input, effects, and output processing;
788 applies only to input processing (for which it overrides
792 Be aware that large values for
794 will cause SoX to become slow to respond to requests to terminate or to skip
795 the current input file.
798 Don't prompt before overwriting an existing file with the same name as that
799 given for the output file. This is the default behaviour.
801 \fB\-\-combine concatenate\fR\^|\^\fBmerge\fR\^|\^\fBmix\fR\^|\^\fBmix\-power\fR\^|\^\fBmultiply\fR\^|\^\fBsequence\fR
802 Select the input file combining method;
803 for some of these, short options are available:
811 See \fBInput File Combining\fR above for a description of the different
814 \fB\-D\fR, \fB\-\-no\-dither\fR
815 Disable automatic dither\*msee `Dithering' above. This might occasionally
816 be useful if, for example, a file has been converted from 16 to
817 24 bit with the intention of processing it, but it turns out that
818 no processing is needed after all and the original 16-bit file has
819 been lost; strictly speaking, no dither is then needed when converting the
820 file back to 16 bit. See also the
822 effect for how to determine the actual bit depth of the audio within a
825 \fB\-\-effects\-file \fIFILENAME\fR
826 Use FILENAME to obtain all effects and their arguments.
827 The file is parsed as if the values were specified on the
828 command line. A new line can be used in place of the special \fB:\fR
829 marker to separate effect chains. For convenience, such markers at the
830 end of the file are normally ignored; if you want to specify an empty
831 last effects chain, use an explicit \fB:\fR by itself on the last line
832 of the file. This option causes any effects specified on the command
833 line to be discarded.
835 \fB\-G\fR, \fB\-\-guard\fR
836 Automatically invoke the
838 effect to guard against clipping. E.g.
840 sox \-G infile \-b 16 outfile rate 44100 dither \-s
844 sox infile \-b 16 outfile gain \-h rate 44100 gain \-rh dither \-s
853 \fB\-h\fR, \fB\-\-help\fR
854 Show version number and usage information.
856 \fB\-\-help\-effect \fINAME\fR
857 Show usage information on the specified effect. The name
858 \fBall\fR can be used to show usage on all effects.
860 \fB\-\-help\-format \fINAME\fR
861 Show information about the specified file format. The name
862 \fBall\fR can be used to show information on all formats.
864 \fB\-\-i\fR, \fB\-\-info\fR
865 Only if given as the first parameter to
870 \fB\-m\fR\^|\^\fB\-M\fR
871 Equivalent to \fB\-\-combine mix\fR and \fB\-\-combine merge\fR, respectively.
874 If SoX has been built with the optional `libmagic' library then this
875 option can be given to enable its use in helping to detect audio file types.
877 \fB\-\-multi\-threaded\fR | \fB\-\-single\-threaded\fR
878 By default, SoX is `single threaded'.
879 If the \fB\-\-multi\-threaded\fR option is given however then SoX
880 will process audio channels for most multi-channel
881 effects in parallel on hyper-threading/multi-core architectures. This
882 may reduce processing time, though sometimes it may be necessary to use
883 this option in conjunction with a larger buffer size than is the default
884 to gain any benefit from multi-threaded processing
885 (e.g. 131072; see \fB\-\-buffer\fR above).
887 \fB\-\-no\-clobber\fR
888 Prompt before overwriting an existing file with the same name as that
889 given for the output file.
892 Unintentionally overwriting a file is easier than you might think, for
893 example, if you accidentally enter
895 sox file1 file2 effect1 effect2 ...
897 when what you really meant was
899 play file1 file2 effect1 effect2 ...
901 then, without this option, file2 will be overwritten. Hence, using
902 this option is recommended. SOX_OPTS (above), a `shell'
903 alias, script, or batch file may be an appropriate way of permanently
906 \fB\-\-norm\fR[\fB=\fIdB-level\fR]
907 Automatically invoke the
909 effect to guard against clipping and to normalise the audio. E.g.
911 sox \-\-norm infile \-b 16 outfile rate 44100 dither \-s
915 sox infile \-b 16 outfile gain \-h rate 44100 gain \-nh dither \-s
917 Optionally, the audio can be normalised to a given level (usually)
920 sox \-\-norm=\-3 infile outfile
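Conceptually, normalising is a two-pass operation: find the peak level, then apply the single gain that moves that peak to the requested level. The following Python sketch shows only that arithmetic, assuming floating-point samples in the range \-1 to +1; the function name normalise is hypothetical and this is not SoX's implementation.

```python
import math

def normalise(samples, target_db=0.0):
    """Scale samples so that the largest absolute value sits at target_db dBFS."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: there is nothing to scale
    gain_db = target_db - 20 * math.log10(peak)  # dB needed to reach the target
    gain = 10 ** (gain_db / 20)                  # the same gain as a linear factor
    return [s * gain for s in samples]

out = normalise([0.25, -0.5, 0.1], target_db=-3)
print(max(abs(s) for s in out))  # about 0.708, i.e. -3 dBFS
```

Because one gain is applied to the whole signal, the relative levels within the audio are unchanged.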
930 \fB\-\-play\-rate\-arg ARG\fR
931 Selects a quality option to be used when the `rate' effect is automatically
932 invoked whilst playing audio. This option is typically set via the
934 environment variable (see above).
936 \fB\-\-plot gnuplot\fR\^|\^\fBoctave\fR\^|\^\fBoff\fR
941 is not given), run in a mode that can be used, in conjunction with the
942 gnuplot program or the GNU Octave program, to assist with the selection
943 and configuration of many of the transfer-function based effects.
944 For the first given effect that supports the selected plotting program,
945 SoX will output commands to plot the effect's transfer function, and
946 then exit without actually processing any audio. E.g.
948 sox \-\-plot octave input-file \-n highpass 1320 > highpass.plt
952 \fB\-q\fR, \fB\-\-no\-show\-progress\fR
953 Run in quiet mode when SoX wouldn't otherwise do so.
954 This is the opposite of the \fB\-S\fR option.
957 Run in `repeatable' mode. When this option is given, where
958 applicable, SoX will embed a fixed time-stamp in the output file (e.g.
959 \fBAIFF\fR) and will `seed' pseudo random number generators (e.g.
960 \fBdither\fR) with a fixed number, thus ensuring that successive SoX
961 invocations with the same inputs and the same parameters yield the
964 \fB\-\-replay\-gain track\fR\^|\^\fBalbum\fR\^|\^\fBoff\fR
965 Select whether or not to apply replay-gain adjustment to input files.
975 where (at least) the first two input files are tagged with the same Artist and
982 \fB\-S\fR, \fB\-\-show\-progress\fR
983 Display input file format/header information, and processing progress as
984 input file(s) percentage complete, elapsed time, and remaining time (if
985 known; shown in brackets), and the number of samples written to the
986 output file. Also shown is a peak-level meter, and an indication if
987 clipping has occurred. The peak-level meter shows up to two channels
988 and is calibrated for digital audio as follows (right channel shown):
994 dB FSD Display dB FSD Display
1001 \-17 ==\- \-3 ======
1007 A three-second peak-held value of headroom in dBs will be shown to the right
1008 of the meter if this is below 6dB.
1010 This option is enabled by default when using
1011 SoX to play or record audio.
1014 Equivalent to \fB\-\-combine multiply\fR.
1016 \fB\-\-temp\fI DIRECTORY\fR
1017 Specify that any temporary files should be created in the given
1019 This can be useful if there are permission or free-space problems with the
1020 default location. In this case, using `\fB\-\-temp .\fR' (to use the
1021 current directory) is often a good solution.
1024 Show SoX's version number and exit.
1025 .IP \fB\-V\fR[\fIlevel\fR]
1026 Set verbosity. This is particularly useful for seeing how any automatic
1027 effects have been invoked by SoX.
1029 SoX displays messages on the console (stderr) according to the following
1034 No messages are shown at all; use the exit status to determine
1035 if an error has occurred.
1037 Only error messages are shown. These are generated if
1038 SoX cannot complete the requested commands.
1040 Warning messages are also shown. These are generated if
1041 SoX can complete the requested commands,
1042 but not exactly according to the requested command parameters,
1043 or if clipping occurs.
1046 SoX's processing phases are also shown.
1047 Useful for seeing exactly how
1048 SoX is processing your audio.
1050 Messages to help with debugging
1054 By default, the verbosity level is set to 2 (shows errors and
1055 warnings). Each occurrence of the \fB\-V\fR option increases the
1056 verbosity level by 1. Alternatively, the verbosity level can be set
1057 to an absolute number by specifying it immediately after the
1063 .SS Input File Options
1064 These options apply only to input files and may precede only input
1065 filenames on the command line.
1067 \fB\-\-ignore\-length\fR
1068 Override an (incorrect) audio length given in an audio file's header. If
1069 this option is given then SoX will keep reading audio until it reaches
1070 the end of the input file.
1072 \fB\-v\fR, \fB\-\-volume\fR \fIFACTOR\fR
1073 Intended for use when combining multiple input files, this option
1074 adjusts the volume of the file that follows it on the command line by a
1075 factor of \fIFACTOR\fR. This allows it to be `balanced' w.r.t. the other
1076 input files. This is a linear (amplitude) adjustment, so a number less
1077 than 1 decreases the volume and a number greater than 1 increases it. If a
1078 negative number is given then in addition to the volume adjustment,
1079 the audio signal will be inverted.
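A minimal Python sketch of this adjustment, assuming floating-point samples in the range \-1 to +1; it is illustrative only and the function names are hypothetical. It also shows the dB equivalent of a factor: halving the amplitude is a cut of roughly 6 dB.

```python
import math

def apply_volume(samples, factor):
    """Linear (amplitude) scaling of one input's samples.

    A factor below 1 attenuates, above 1 amplifies, and a negative
    factor additionally inverts the polarity of the signal.
    """
    return [s * factor for s in samples]

def factor_to_db(factor):
    """The gain in dB that corresponds to a linear factor."""
    return 20 * math.log10(abs(factor))

print(apply_volume([0.5, -0.25], 0.5))  # [0.25, -0.125]
print(apply_volume([0.5, -0.25], -1))   # [-0.5, 0.25]
print(round(factor_to_db(0.5), 2))      # -6.02
```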
1086 effects, and see \fBInput File Balancing\fR above.
1087 .SS Input & Output File Format Options
1088 These options apply to the input or output file whose name they
1089 immediately precede on the command line and are used mainly when
1090 working with headerless file formats or when specifying a format
1091 for the output file that is different to that of the input file.
1093 \fB\-b\fR \fIBITS\fR, \fB\-\-bits\fR \fIBITS\fR
1094 The number of bits (a.k.a. bit-depth or sometimes word-length) in each
1095 encoded sample. Not applicable to complex encodings such as MP3 or GSM.
1096 Not necessary with encodings that have a fixed number of bits, e.g.
1099 For an input file, the most common use for this option is to inform
1100 SoX of the number of bits per sample in a `raw' (`headerless') audio
1103 sox \-r 16k \-e signed \-b 8 input.raw output.wav
1105 converts a particular `raw' file to a self-describing `WAV' file.
1107 For an output file, this option can be used (perhaps along with
1109 to set the output encoding size. By default (i.e. if this option is
1110 not given), the output encoding size will (providing it is supported
1111 by the output file type) be set to the input encoding size. For
1114 sox input.cdda \-b 24 output.wav
1116 converts raw CD digital audio (16-bit, signed-integer) to a
1117 24-bit (signed-integer) `WAV' file.
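To see why widening the encoding size adds no detail, here is a Python sketch of how a 16-bit signed-integer value maps to 24 bits, assuming the usual left-shift scaling (an assumption for illustration, not a description of SoX's internal code path).

```python
def widen(sample, from_bits, to_bits):
    """Re-express a signed-integer sample at a larger bit depth.

    Shifting left keeps full scale at full scale; the new low-order
    bits are simply zero, so no extra detail is created.
    """
    if to_bits < from_bits:
        raise ValueError("reducing bit depth needs rounding or dither")
    return sample << (to_bits - from_bits)

print(widen(32767, 16, 24))   # 8388352
print(widen(-32768, 16, 24))  # -8388608
```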
1119 \fB\-1\fR\^/\fB\-2\fR\^/\fB\-3\fR\^/\fB\-4\fR\^/\fB\-8\fR
1120 The number of bytes in each encoded sample. Deprecated aliases for
1121 \fB\-b 8\fR, \fB\-b 16\fR, \fB\-b 24\fR, \fB\-b 32\fR, \fB\-b 64\fR
1124 \fB\-c\fR \fICHANNELS\fR, \fB\-\-channels\fR \fICHANNELS\fR
1125 The number of audio channels in the audio file. This can be any number
1128 For an input file, the most common use for this option is to inform
1129 SoX of the number of channels in a `raw' (`headerless') audio file.
1130 Occasionally, it may be useful to use this option with a `headered'
1131 file, in order to override the (presumably incorrect) value in the
1132 header\*mnote that this is only supported with certain file types.
1135 sox \-r 48k \-e float \-b 32 \-c 2 input.raw output.wav
1137 converts a particular `raw' file to a self-describing `WAV' file.
1139 play \-c 1 music.wav
1141 interprets the file data as belonging to a single channel regardless
1142 of what is indicated in the file header. Note that if the file does
1143 in fact have two channels, this will result in the file playing at
1146 For an output file, this option provides a shorthand for specifying
1149 effect should be invoked in order to change (if necessary) the number
1150 of channels in the audio signal to the number given. For
1151 example, the following two commands are equivalent:
1154 sox input.wav \-c 1 output.wav bass \-b 24
1155 sox input.wav output.wav bass \-b 24 channels 1
1157 though the second form is more flexible as it allows the effects to
1158 be ordered arbitrarily.
1160 \fB\-e \fIENCODING\fR, \fB\-\-encoding\fR \fIENCODING\fR
1161 The audio encoding type. Sometimes needed with file-types that
1162 support more than one encoding type. For example, with raw, WAV, or
1163 AU (but not, for example, with MP3 or FLAC).
1164 The available encoding types are as follows:
1166 .IP \fBsigned-integer\fR
1167 PCM data stored as signed (`two's complement') integers. Commonly used
1168 with a 16- or 24-bit encoding size.
1169 A value of 0 represents minimum signal power.
1170 .IP \fBunsigned-integer\fR
1171 PCM data stored as unsigned integers. Commonly used
1172 with an 8-bit encoding size. A value of 0 represents maximum signal
1174 .IP \fBfloating-point\fR
1175 PCM data stored as IEEE 754 single precision (32-bit) or double
1176 precision (64-bit) floating-point (`real') numbers.
1177 A value of 0 represents minimum signal power.
1179 International telephony standard for logarithmic encoding to 8 bits per
1180 sample. It has a precision equivalent to roughly 13-bit PCM and is
1181 sometimes encoded with reversed bit-ordering (see the
1184 .IP \fBu-law,\ mu-law\fR
1185 North American telephony standard for logarithmic encoding to 8 bits per
1186 sample. A.k.a. \(*m-law. It has a precision equivalent to roughly
1188 sometimes encoded with reversed bit-ordering (see the
1192 OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit ADPCM;
1193 it has a precision equivalent to roughly 12-bit PCM.
1194 ADPCM is a form of audio compression that offers a good
1195 compromise between audio quality and encoding/decoding speed.
1197 IMA (a.k.a. DVI) 4-bit ADPCM;
1198 it has a precision equivalent to roughly 13-bit PCM.
1200 Microsoft 4-bit ADPCM; it has a precision equivalent to roughly 14-bit
1202 .IP \fBgsm-full-rate\fR
1203 GSM is currently used for the vast majority of the world's digital
1204 wireless telephone calls. It utilises several audio
1205 formats with different bit-rates and associated speech quality.
1206 SoX has support for GSM's original 13kbps `Full Rate' audio format.
1207 It is usually CPU-intensive to work with GSM audio.
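The `logarithmic encoding' used by the telephony encodings above can be illustrated with the continuous mu-law compression curve (mu = 255). This Python sketch is only an approximation: the real G.711 codecs approximate the curve with straight-line segments and emit 8-bit codes, which are not modelled here.

```python
import math

MU = 255  # mu parameter of the North American mu-law curve

def mu_law_compress(x):
    """Continuous mu-law curve: map a sample in [-1, 1] to [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

# Quiet samples are spread over a large part of the output range, so
# an 8-bit code keeps more resolution for them than 8-bit linear PCM.
for x in (0.001, 0.01, 0.1, 1.0):
    print(x, round(mu_law_compress(x), 3))
```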
1211 Encoding names can be abbreviated where this would not be ambiguous;
1212 e.g. `unsigned-integer' can be given as `un', but not `u' (ambiguous
1215 For an input file, the most common use for this option is to inform
1216 SoX of the encoding of a `raw' (`headerless') audio
1217 file (see the examples in
1223 For an output file, this option can be used (perhaps along with
1225 to set the output encoding type. For example,
1227 sox input.cdda \-e float output1.wav
1229 sox input.cdda \-b 64 \-e float output2.wav
1231 convert raw CD digital audio (16-bit, signed-integer) to
1232 floating-point `WAV' files (single & double precision respectively).
1234 By default (i.e. if this option is not given), the output encoding
1235 type will (providing it is supported by the output file type) be set
1236 to the input encoding type.
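As an illustration of what converting 16-bit signed-integer audio to floating-point involves, this Python sketch scales a sample by 1/32768 (a common convention, assumed here rather than taken from SoX) and packs the result as single and double precision IEEE 754 values; the function name is hypothetical.

```python
import struct

def s16_to_float(value):
    """Map a 16-bit signed-integer sample onto the range [-1, 1)."""
    return value / 32768.0

sample = s16_to_float(-16384)
single = struct.pack("<f", sample)  # 4 bytes, like a 32-bit float encoding
double = struct.pack("<d", sample)  # 8 bytes, like -e float with -b 64
print(sample, len(single), len(double))  # -0.5 4 8
```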
1238 \fB\-s\fR\^/\fB\-u\fR\^/\fB\-f\fR\^/\fB\-A\fR\^/\fB\-U\fR\^/\fB\-o\fR\^/\fB\-i\fR\^/\fB\-a\fR\^/\fB\-g\fR
1239 Deprecated aliases for specifying the encoding types
1240 \fBsigned-integer\fR, \fBunsigned-integer\fR, \fBfloating-point\fR, \fBa-law\fR, \fBmu-law\fR, \fBoki-adpcm\fR, \fBima-adpcm\fR, \fBms-adpcm\fR, \fBgsm-full-rate\fR
1246 Specifies that filename `globbing' (wild-card matching) should not be
1247 performed by SoX on the following filename. For example, if the current
1248 directory contains the two files `five-seconds.wav' and `five*.wav', then
1250 play \-\-no\-glob "five*.wav"
1252 can be used to play just the single file `five*.wav'.
1254 \fB\-r\fR, \fB\-\-rate\fR \fIRATE\fR[\fBk\fR]
1255 Gives the sample rate in Hz (or kHz if appended with `k') of the file.
1257 For an input file, the most common use for this option is to inform
1258 SoX of the sample rate of a `raw' (`headerless') audio file (see the
1264 Occasionally it may be useful to use this option with a `headered'
1265 file, in order to override the (presumably incorrect) value in the
1266 header\*mnote that this is only supported with certain file types.
1267 For example, if audio was recorded with a sample-rate of say 48k from
1268 a source that played back a little, say 1\*d5%, too slowly, then
1270 sox \-r 48720 input.wav output.wav
1272 effectively corrects the speed by changing only the file header (but see
1275 effect for the more usual solution to this problem).
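The value 48720 in the example above comes from scaling the nominal rate by the playback error. The arithmetic, sketched in Python with illustrative variable names:

```python
recorded_rate = 48000   # rate stored in the header
slow_fraction = 0.015   # the source played back 1.5% too slowly

# Declaring a proportionally higher rate makes players consume the
# samples faster, cancelling the slowdown without touching the audio.
corrected_rate = round(recorded_rate * (1 + slow_fraction))
print(corrected_rate)  # 48720
```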
1277 For an output file, this option provides a shorthand for specifying
1280 effect should be invoked in order to change (if necessary) the sample
1281 rate of the audio signal to the given value. For example, the
1282 following two commands are equivalent:
1285 sox input.wav \-r 48k output.wav bass \-b 24
1286 sox input.wav output.wav bass \-b 24 rate 48k
1288 though the second form is more flexible as it allows
1290 options to be given, and allows the effects to be ordered arbitrarily.
1292 \fB\-t\fR, \fB\-\-type\fR \fIFILE-TYPE\fR
1293 Gives the type of the audio file. For both input and output files,
1294 this option is commonly used to inform SoX of the type of a `headerless'
1295 audio file (e.g. raw, mp3) where the actual/desired type cannot be
1296 determined from a given filename extension. For example:
1298 another-command | sox \-t mp3 \- output.wav
1300 sox input.wav \-t raw output.bin
1302 It can also be used to override the type implied by an input filename
1303 extension, but if overriding with a type that has a header, SoX will
1304 exit with an appropriate error message if such a header is not
1309 for a list of supported file types.
1311 \fB\-L\fR, \fB\-\-endian little\fR
1313 \fB\-B\fR, \fB\-\-endian big\fR
1315 \fB\-x\fR, \fB\-\-endian swap\fR
1320 These options specify whether the byte-order of the audio data is,
1321 respectively, `little endian', `big endian', or the opposite to that of
1322 the system on which SoX is being used. Endianness applies only to data
1323 encoded as floating-point, or as signed or unsigned integers of 16 or
1324 more bits. It is often necessary to specify one of these options for
1325 headerless files, and sometimes necessary for (otherwise)
1326 self-describing files. A given endian-setting option may be ignored
1327 for an input file whose header contains a specific endianness
1328 identifier, or for an output file that is actually an audio device.
1331 Unlike other format characteristics, the endianness (byte, nibble, &
1332 bit ordering) of the input file is not automatically used for the output
1333 file; so, for example, when the following is run on a little-endian system:
1335 sox \-B audio.s16 trimmed.s16 trim 2
1337 trimmed.s16 will be created as little-endian;
1339 sox \-B audio.s16 \-B trimmed.s16 trim 2
1341 must be used to preserve big-endianness in the output file.
1345 option can be used to check the selected orderings.
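The byte orderings can be illustrated with Python's struct module, packing one 16-bit signed-integer sample both ways. This only shows the byte layout that these options describe; it is not how SoX itself is implemented.

```python
import struct
import sys

value = 0x1234  # one 16-bit signed-integer sample

little = struct.pack("<h", value)  # byte order selected by -L
big = struct.pack(">h", value)     # byte order selected by -B
print(little.hex(), big.hex())     # 3412 1234

# -x means "the opposite of this machine's byte order".
swapped = big if sys.byteorder == "little" else little
```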
1347 \fB\-N\fR, \fB\-\-reverse\-nibbles\fR
1348 Specifies that the nibble ordering (i.e. the 2 halves of a byte) of the samples should be reversed;
1349 sometimes useful with ADPCM-based formats.
1352 See also N.B. in section on
1356 \fB\-X\fR, \fB\-\-reverse\-bits\fR
1357 Specifies that the bit ordering of the samples should be reversed;
1358 sometimes useful with a few (mostly headerless) formats.
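Both reversals can be sketched in Python on a single byte of encoded data; applying them byte by byte is an assumption made here for illustration, and the function names are hypothetical.

```python
def reverse_nibbles(byte):
    """Swap the two 4-bit halves of a byte (cf. -N)."""
    return ((byte & 0x0F) << 4) | (byte >> 4)

def reverse_bits(byte):
    """Reverse the order of the 8 bits in a byte (cf. -X)."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)  # move the lowest bit across
        byte >>= 1
    return result

print(hex(reverse_nibbles(0x1E)))  # 0xe1
print(bin(reverse_bits(0b00000001)))  # 0b10000000
```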
1361 See also N.B. in section on
1364 .SS Output File Format Options
1365 These options apply only to the output file and may precede only the output
1366 filename on the command line.
1368 \fB\-\-add\-comment \fITEXT\fR
1369 Append a comment to the output file header (where applicable).
1371 \fB\-\-comment \fITEXT\fR
1372 Specify the comment text to store in the output file header (where
1375 SoX will provide a default comment if this option (or
1376 .BR \-\-comment\-file )
1377 is not given. To specify that no comment should be stored in the output file,
1379 .B "\-\-comment \(dq\(dq" .
1381 \fB\-\-comment\-file \fIFILENAME\fR
1382 Specify a file containing the comment text to store in the output
1383 file header (where applicable).
1385 \fB\-C\fR, \fB\-\-compression\fR \fIFACTOR\fR
1386 The compression factor for variably compressing output file formats. If
1387 this option is not given then a default compression factor will apply.
1388 The compression factor is interpreted differently for different
1389 compressing file formats. See the description of the file formats that
1392 for more information.
1394 In addition to converting, playing and recording audio files, SoX can
1395 be used to invoke a number of audio `effects'. Multiple effects may
1396 be applied by specifying them one after another at the end of the SoX
1397 command line, forming an `effects chain'.
1398 Note that applying multiple effects in real-time (i.e. when playing audio)
1399 is likely to require a high-performance computer. Stopping other applications
1400 may alleviate performance issues should they occur.
1402 Some of the SoX effects are primarily intended to be applied to a single
1403 instrument or `voice'. To facilitate this, the \fBremix\fR effect and
1404 the global SoX option \fB\-M\fR can be used to isolate then recombine
1405 tracks from a multi-track recording.
1406 .SS Multiple Effect Chains
1407 A single effects chain is made up of one or more effects. Audio from
1408 the input runs through the chain until either the end of the input file
1409 is reached or an effect in the chain requests to terminate the chain.
1411 SoX supports running multiple effects chains over the input audio.
1412 In this case, when one chain indicates it is done processing audio,
1413 the audio data is then sent through the next effects chain. This
1414 continues until either no more effects chains exist or the input has
1415 reached the end of the file.
1417 An effects chain is terminated by placing a
1419 (colon) after an effect. Any following effects are a part of a new effects chain.
1421 It is important to place the effect that will stop the chain
1422 as the first effect in the chain. This is because any samples
1423 that are buffered by effects to the left of the terminating effect
1424 will be discarded. The number of samples discarded is related to the
1426 option and it should be kept small, relative to the sample rate, if
1427 the terminating effect cannot be first. Further information on
1428 stopping effects can be found in the
1432 There are a few pseudo-effects that aid using multiple effects chains.
1435 which will start writing to a new output file before moving to the
1436 next effects chain and
1438 which will move back to the first effects chain. Pseudo-effects
1439 must be specified as the first effect in a chain and as the only
1440 effect in a chain (they must have a
1442 before and after they are specified).
1444 The following is an example of multiple effects chains. It will split the
1445 input file into multiple files of 30 seconds in length. Each output filename
1446 will have a unique number in its name as documented in the
1450 sox infile.wav output.wav trim 0 30 : newfile : restart
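The effect of this chain layout can be modelled conceptually in Python: each pass keeps the next 30 seconds, a new output is started, and processing restarts until the input runs out. This is a simplified model of the behaviour described above, not SoX's code, and the function name is hypothetical.

```python
def split_into_chunks(samples, rate, seconds=30):
    """Conceptual model of the trim/newfile/restart chain above.

    Each chunk corresponds to one output file; the final chunk may be
    shorter than the others.
    """
    length = int(rate * seconds)
    return [samples[i:i + length] for i in range(0, len(samples), length)]

chunks = split_into_chunks(list(range(75)), rate=1, seconds=30)
print([len(c) for c in chunks])  # [30, 30, 15]
```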
1452 .SS Common Notation And Parameters
1453 In the descriptions that follow,
1454 brackets [ ] are used to denote parameters that are optional, braces
1455 { } to denote those that are both optional and repeatable,
1456 and angle brackets < > to denote those that are repeatable but not
1458 Where applicable, default values for optional parameters are shown in parentheses ( ).
1460 The following parameters are used with, and have the same meaning for,
1463 \fIcenter\fR[\fBk\fR]
1467 \fIfrequency\fR[\fBk\fR]
1468 A frequency in Hz, or, if appended with `k', kHz.
1472 Zero gives no gain; less than zero gives an attenuation.
1474 \fIwidth\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]
1475 Used to specify the band-width of a filter. A number of different
1476 methods to specify the width are available (though not all for every effect).
1477 One of the characters shown may be appended to select the desired method
1492 For each effect that uses this parameter, the default method (i.e. if no
1493 character is appended) is the one that is listed first in the first line of
1494 the effect's description.
1496 To see if SoX has support for an optional effect, enter
1498 and look for its name under the list: `EFFECTS'.
1499 .SS Supported Effects
1500 Note: a categorised list of the effects can be found in the
1501 accompanying `README' file.
1503 \fBallpass\fR \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]
1504 Apply a two-pole all-pass filter with central frequency (in Hz)
1505 \fIfrequency\fR, and filter-width \fIwidth\fR.
1506 An all-pass filter changes the audio's
1507 frequency-to-phase relationship without changing its
1508 frequency-to-amplitude relationship. The filter is described in detail in [1].
1510 This effect supports the \fB\-\-plot\fR global option.
1512 \fBband\fR [\fB\-n\fR] \fIcenter\fR[\fBk\fR]\fR [\fIwidth\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]]
1513 Apply a band-pass filter.
1514 The frequency response drops logarithmically
1520 parameter gives the slope of the drop.
1529 will be half of their original amplitudes.
1531 defaults to a mode oriented to pitched audio,
1532 i.e. voice, singing, or instrumental music.
1533 The \fB\-n\fR (for noise) option uses the alternate mode
1534 for un-pitched audio (e.g. percussion).
1536 \fB\-n\fR introduces a power-gain of about 11dB in the filter, so beware
1539 introduces noise in the shape of the filter,
1542 frequency and settling around it.
1544 This effect supports the \fB\-\-plot\fR global option.
1546 See also \fBsinc\fR for a bandpass filter with steeper shoulders.
1548 \fBbandpass\fR\^|\^\fBbandreject\fR [\fB\-c\fR] \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]
1549 Apply a two-pole Butterworth band-pass or band-reject filter with
1550 central frequency \fIfrequency\fR, and (3dB-point) band-width
1553 option applies only to
1555 and selects a constant skirt gain (peak gain = Q) instead of the
1556 default: constant 0dB peak gain.
1557 The filters roll off at 6dB per octave (20dB per decade)
1558 and are described in detail in [1].
1560 These effects support the \fB\-\-plot\fR global option.
1562 See also \fBsinc\fR for a bandpass filter with steeper shoulders.
1564 \fBbandreject \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]
1565 Apply a band-reject filter.
1566 See the description of the \fBbandpass\fR effect for details.
1568 \fBbass\fR\^|\^\fBtreble \fIgain\fR [\fIfrequency\fR[\fBk\fR]\fR [\fIwidth\fR[\fBs\fR\^|\^\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]]]
1569 Boost or cut the bass (lower) or treble (upper) frequencies of the audio
1570 using a two-pole shelving filter with a response similar to that
1571 of a standard hi-fi's tone-controls. This is also
1572 known as shelving equalisation (EQ).
1574 \fIgain\fR gives the gain at 0\ Hz (for \fBbass\fR), or whichever is
1575 the lower of \(ap22\ kHz and the Nyquist frequency (for \fBtreble\fR). Its
1576 useful range is about \-20 (for a large cut) to +20 (for a large
1580 when using a positive \fIgain\fR.
1582 If desired, the filter can be fine-tuned using the following
1583 optional parameters:
1585 \fIfrequency\fR sets the filter's central frequency and so can be
1586 used to extend or reduce the frequency range to be boosted or
1587 cut. The default value is 100\ Hz (for \fBbass\fR) or 3\ kHz (for
1592 steep is the filter's shelf transition. In addition to the common
1593 width specification methods described above,
1594 `slope' (the default, or if appended with `\fBs\fR') may be used.
1595 The useful range of `slope' is
1596 about 0\*d3, for a gentle slope, to 1 (the maximum), for a steep slope; the
1597 default value is 0\*d5.
1599 The filters are described in detail in [1].
1601 These effects support the \fB\-\-plot\fR global option.
1603 See also \fBequalizer\fR for a peaking equalisation effect.
1605 \fBbend\fR [\fB\-f \fIframe-rate\fR(25)] [\fB\-o \fIover-sample\fR(16)] { \fIdelay\fB,\fIcents\fB,\fIduration\fR }
1606 Changes pitch by specified amounts at specified times.
1607 Each given triple: \fIdelay\fB,\fIcents\fB,\fIduration\fR specifies one bend.
1609 is the amount of time after the start of the audio stream, or the end of the previous bend, at which to start bending the pitch;
1611 is the number of cents (100 cents = 1 semitone) by which to bend the pitch, and
1613 the length of time over which the pitch will be bent.
1615 The pitch-bending algorithm utilises the Discrete Fourier Transform (DFT)
1616 at a particular frame rate and over-sampling rate.
1621 parameters may be used to adjust these parameters and thus control the
1622 smoothness of the changes in pitch.
1624 For example, an initial tone is generated, then bent three times, yielding
1625 four different notes in total:
1628 play \-n synth 2.5 sin 667 gain 1 \\
1629 bend .35,180,.25 .15,740,.53 0,\-520,.3
1631 Note that the clipping that is produced in this example is deliberate;
1637 See also \fBpitch\fR.
1639 \fBbiquad \fIb0 b1 b2 a0 a1 a2\fR
1640 Apply a biquad IIR filter with the given coefficients, where b* and a* are
1641 the numerator and denominator coefficients respectively.
1643 See http://en.wikipedia.org/wiki/Digital_biquad_filter (where a0 = 1).
1645 This effect supports the \fB\-\-plot\fR global option.
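The difference equation behind a biquad can be sketched in Python as a Direct Form I filter. This is a textbook formulation for illustration, not SoX's source; it divides every coefficient by a0 so that a0 = 1, as in the reference given above.

```python
def biquad(samples, b0, b1, b2, a0, a1, a2):
    """Direct Form I biquad:
    a0*y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    """
    # Normalise so that a0 = 1.
    b0, b1, b2, a1, a2 = b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

# A two-tap moving average (b0 = b1 = 0.5, no feedback):
print(biquad([1.0, 1.0, 0.0, 0.0], 0.5, 0.5, 0.0, 1.0, 0.0, 0.0))
# [0.5, 1.0, 0.5, 0.0]
```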
1647 \fBchannels \fICHANNELS\fR
1648 Invoke a simple algorithm to change the number of channels in
1649 the audio signal to the given number
1651 mixing if decreasing the number of channels or duplicating if
1652 increasing the number of channels.
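A minimal Python sketch of the idea for one frame (one sample per channel), assuming that mixing down to one channel means averaging; the exact weighting SoX applies is not specified here, and the function name is hypothetical.

```python
def change_channels(frame, out_channels):
    """Simple channel-count change for one sample frame.

    Reducing to one channel averages the inputs; increasing from one
    channel duplicates it. Other combinations are not handled here.
    """
    if out_channels == len(frame):
        return list(frame)
    if out_channels == 1:
        return [sum(frame) / len(frame)]
    if len(frame) == 1:
        return frame * out_channels
    raise NotImplementedError("only mono <-> N-channel is sketched")

print(change_channels([0.5], 2))  # [0.5, 0.5]
```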
1656 effect is invoked automatically if SoX's \fB\-c\fR option specifies a
1657 number of channels that is different to that of the input file(s).
1658 Alternatively, if this effect is given explicitly, then SoX's
1660 option need not be given. For example, the following two commands are
1664 sox input.wav \-c 1 output.wav bass \-b 24
1665 sox input.wav output.wav bass \-b 24 channels 1
1667 though the second form is more flexible as it allows the effects to
1668 be ordered arbitrarily.
1672 for an effect that allows channels to be mixed/selected arbitrarily.
1674 \fBchorus \fIgain-in gain-out\fR <\fIdelay decay speed depth \fB\-s\fR\^|\^\fB\-t\fR>
1675 Add a chorus effect to the audio. This can make a single vocal sound
1676 like a chorus, but can also be applied to instrumentation.
1678 Chorus resembles an echo effect with a short delay, but
1679 whereas with echo the delay is constant, with chorus, it
1680 is varied using sinusoidal or triangular modulation. The modulation
1681 depth defines the range over which the modulated delay is played before or after the
1682 delay. Hence the delayed sound will sound slower or faster; that is, the delayed
1683 sound is tuned around the original one, like in a chorus where some vocals are
1685 See [3] for more discussion of the chorus effect.
1687 Each four-tuple parameter
1688 delay/decay/speed/depth gives the delay in milliseconds
1689 and the decay (relative to gain-in) with a modulation
1690 speed in Hz using depth in milliseconds.
1691 The modulation is either sinusoidal (\fB\-s\fR) or triangular
1692 (\fB\-t\fR). Gain-out is the volume of the output.
1694 A typical delay is around 40ms to 60ms; the modulation speed is best
1695 near 0\*d25Hz and the modulation depth around 2ms.
1696 For example, a single delay:
1698 play guitar1.wav chorus 0.7 0.9 55 0.4 0.25 2 \-t
1700 Two delays of the original samples:
1703 play guitar1.wav chorus 0.6 0.9 50 0.4 0.25 2 \-t \\
1706 A fuller sounding chorus (with three additional delays):
1709 play guitar1.wav chorus 0.5 0.9 50 0.4 0.25 2 \-t \\
1710 60 0.32 0.4 2.3 \-t 40 0.3 0.3 1.3 \-s
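A much-simplified Python sketch of one chorus voice, to show the modulated-delay idea. It assumes the delay swings by up to depth either side of delay and uses nearest-sample lookup; it is not SoX's implementation, which among other things supports several voices and triangular modulation.

```python
import math

def chorus(samples, rate, gain_in, gain_out, delay_ms, decay, speed_hz, depth_ms):
    """Simplified single-voice chorus with sinusoidal delay modulation.

    The delay swings up to depth_ms either side of delay_ms, speed_hz
    times per second; the delayed copy is taken from the gain_in-scaled
    input and weighted by decay. Nearest-sample lookup, no interpolation.
    """
    scaled = [x * gain_in for x in samples]
    out = []
    for n, x in enumerate(scaled):
        d_ms = delay_ms + depth_ms * math.sin(2 * math.pi * speed_hz * n / rate)
        d = int(round(d_ms * rate / 1000))  # delay in whole samples
        delayed = scaled[n - d] if 0 < d <= n else 0.0
        out.append((x + decay * delayed) * gain_out)
    return out

# Roughly the single-delay example above, at 44.1 kHz:
# chorus(samples, 44100, 0.7, 0.9, 55, 0.4, 0.25, 2)
```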
1713 \fBcompand \fIattack1\fB,\fIdecay1\fR{\fB,\fIattack2\fB,\fIdecay2\fR}
1714 [\fIsoft-knee-dB\fB:\fR]\fIin-dB1\fR[\fB,\fIout-dB1\fR]{\fB,\fIin-dB2\fB,\fIout-dB2\fR}
1716 [\fIgain\fR [\fIinitial-volume-dB\fR [\fIdelay\fR]]]
1718 Compand (compress or expand) the dynamic range of the audio.
1724 parameters (in seconds) determine the time over which the
1725 instantaneous level of the input signal is averaged to determine its
1726 volume; attacks refer to increases in volume and decays refer to
1728 For most situations, the attack time (response to the music getting
1729 louder) should be shorter than the decay time because the human ear is more
1730 sensitive to sudden loud music than sudden soft music.
1731 Where more than one pair of attack/decay parameters are
1732 specified, each input channel is companded separately and the number of
1733 pairs must agree with the number of input channels.
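One common way to realise this kind of averaging (assumed here for illustration, not taken from SoX's source) is a one-pole follower whose coefficient depends on whether the level is rising or falling. The function names are hypothetical.

```python
import math

def _coefficient(rate, seconds):
    # A time of zero (or less) means "follow the level instantly".
    if seconds <= 0:
        return 1.0
    return 1 - math.exp(-1 / (rate * seconds))

def follow_level(levels, rate, attack, decay):
    """One-pole level follower with separate attack and decay times.

    levels holds instantaneous absolute levels, rate is in samples per
    second, and attack/decay are in seconds. The smoothed value moves a
    fraction of the way towards each new level, using the attack
    coefficient when rising and the decay coefficient when falling.
    """
    rise = _coefficient(rate, attack)
    fall = _coefficient(rate, decay)
    volume = 0.0
    out = []
    for level in levels:
        coeff = rise if level > volume else fall
        volume += (level - volume) * coeff
        out.append(volume)
    return out
```

With this formulation, a time of t seconds makes the follower cover about 63% of a step change in t seconds.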
1738 The second parameter is a list of points on the compander's transfer
1739 function specified in dB relative to the maximum possible signal
1740 amplitude. The input values must be in a strictly increasing order but
1741 the transfer function does not have to be monotonically rising. If
1742 omitted, the value of
1744 defaults to the same value as
1748 are not companded (but may have gain applied to them).
1749 The point \fB0,0\fR is assumed but may be overridden (by
1750 \fB0,\fIout-dBn\fR).
1751 If the list is preceded by a
1753 value, then the points where adjacent line segments on the
1754 transfer function meet will be rounded by the amount given.
1755 Typical values for the transfer function are
1756 .BR 6:\-70,\-60,\-20 .
1758 The third (optional) parameter is an additional gain in dB to be applied
1759 at all points on the transfer function and allows easy adjustment
1760 of the overall gain.
1762 The fourth (optional) parameter is an initial level to be assumed for
1763 each channel when companding starts. This permits the user to supply a
1764 nominal level initially, so that, for example, a very large gain is not
1765 applied to initial signal levels before the companding action has begun
1766 to operate: it is quite probable that in such an event, the output would
1767 be severely clipped while the compander gain properly adjusts itself.
1768 A typical value (for audio which is initially quiet) is
1772 The fifth (optional) parameter is a delay in seconds. The input signal
1773 is analysed immediately to control the compander, but it is delayed
1774 before being fed to the volume adjuster. Specifying a delay
1775 approximately equal to the attack/decay times allows the compander to
1776 effectively operate in a `predictive' rather than a reactive mode.
1787 The following example might be used to make a piece of music with both
1788 quiet and loud passages suitable for listening to in a noisy environment
1789 such as a moving vehicle:
1791 sox asz.wav asz-car.wav compand 0.3,1 6:\-70,\-60,\-20 \-5 \-90 0.2
1793 The transfer function (`6:\-70,...') says that very soft sounds (below
1794 \-70dB) will remain unchanged. This will stop the compander from
1795 boosting the volume on `silent' passages such as between movements.
1796 However, sounds in the range \-60dB to 0dB (maximum
1797 volume) will be boosted so that the 60dB dynamic range of the
1798 original music will be compressed 3-to-1 into a 20dB range, which is
1799 wide enough to enjoy the music but narrow enough to get around the
1800 road noise. The `6:' selects 6dB soft-knee companding.
1801 The \-5 (dB) output gain is needed to avoid clipping (the number is
1802 inexact, and was derived by experimentation).
1803 The \-90 (dB) for the initial volume will work fine for a clip that starts
1804 with near silence, and the delay of 0\*d2 (seconds) has the effect of causing
1805 the compander to react a bit more quickly to sudden volume changes.
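The shape of this transfer function can be checked with a small Python sketch that interpolates linearly between the given points and then adds the \-5 dB gain. It deliberately ignores the 6 dB soft knee, simply passes levels below the first point through unchanged, and the function name is hypothetical.

```python
def transfer(in_db, points):
    """Piecewise-linear compander transfer function (no soft knee).

    points: (in_dB, out_dB) pairs with strictly increasing in_dB.
    Levels at or below the first point are returned unchanged.
    """
    if in_db <= points[0][0]:
        return in_db
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if in_db <= x1:
            return y0 + (y1 - y0) * (in_db - x0) / (x1 - x0)
    return points[-1][1]

# The example's -70,-60,-20 with the assumed 0,0 point.
pts = [(-70, -70), (-60, -20), (0, 0)]
gain = -5
for level in (-80, -60, -30, 0):
    print(level, transfer(level, pts) + gain)
```

Between \-60 dB and 0 dB each 3 dB of input change yields 1 dB of output change, which is the 3-to-1 compression described above.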
1807 In the next example, compand is being used as a noise-gate for when the
1808 noise is at a lower level than the signal:
1810 play infile compand .1,.2 \-inf,\-50.1,\-inf,\-50,\-50 0 \-90 .1
1812 Here is another noise-gate, this time for when the
1813 noise is at a higher level than the signal (making it, in some ways,
1814 similar to squelch):
1816 play infile compand .1,.1 \-45.1,\-45,\-inf,0,\-inf 45 \-90 .1
1818 This effect supports the \fB\-\-plot\fR global option (for the transfer function).
1822 for a multiple-band companding effect.
1824 \fBcontrast \fR[\fIenhancement-amount\fR(75)]
1825 Comparable with compression, this effect modifies an audio signal to
1826 make it sound louder.
1827 .I enhancement-amount
1828 controls the amount of the enhancement and is a number in the range 0\-100.
1830 .I enhancement-amount
1831 = 0 still gives a significant contrast enhancement.
1839 \fBdcshift \fIshift\fR [\fIlimitergain\fR]
1840 Apply a DC shift to the audio. This can be useful to remove a DC
1841 offset (caused perhaps by a hardware problem in the recording chain)
1842 from the audio. The effect of a DC offset is reduced headroom and
1848 effect can be used to determine if a signal has a DC offset.
The given \fIshift\fR value is a floating point number in the range
of \(+-2 that indicates the amount to shift the audio (which is in the
1856 can be specified as well. It should have a value much less than 1
1857 (e.g. 0\*d05 or 0\*d02) and is used only on peaks to prevent clipping.
1865 An alternative approach to removing a DC offset (albeit with a short delay)
1868 filter effect at a frequency of say 10Hz, as illustrated in the following
1871 sox \-n dc.wav synth 5 sin %0 50
1872 sox dc.wav fixed.wav highpass 10
1876 Apply Compact Disc (IEC 60908) de-emphasis (a treble attenuation shelving
1879 Pre-emphasis was applied in the mastering of some CDs issued in the early
1880 1980s. These included many classical music albums, as well as now
1881 sought-after issues of albums by The Beatles, Pink Floyd and others.
1882 Pre-emphasis should be removed at playback time by a de-emphasis
1883 filter in the playback device. However, not all modern CD players have
1884 this filter, and very few PC CD drives have it; playing pre-emphasised
1885 audio without the correct de-emphasis filter results in audio that sounds harsh
1886 and is far from what its creators intended.
1890 effect, it is possible to apply the necessary de-emphasis to audio that
1891 has been extracted from a pre-emphasised CD, and then either burn the
1892 de-emphasised audio to a new CD (which will then play correctly on any
1893 CD player), or simply play the correctly de-emphasised audio files on the
1896 sox track1.wav track1\-deemph.wav deemph
1898 and then burn track1-deemph.wav to CD, or
1900 play track1\-deemph.wav
1904 play track1.wav deemph
1906 The de-emphasis filter is implemented as a biquad; its maximum deviation
1907 from the ideal response is only 0\*d06dB (up to 20kHz).
1909 This effect supports the \fB\-\-plot\fR global option.
1911 See also the \fBbass\fR and \fBtreble\fR shelving equalisation effects.
1913 \fBdelay\fR {\fIlength\fR}
1914 Delay one or more audio channels.
1916 can specify a time or, if appended with an `s', a number of samples.
Do not mix time and sample delays in the same command.
1919 .B delay 1\*d5 0 0\*d5
1920 delays the first channel by 1\*d5 seconds, the third channel by 0\*d5
1921 seconds, and leaves the second channel (and any other channels that may be
1922 present) un-delayed.
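As a rough illustration, the per-channel behaviour described above amounts
to prefixing each channel with silence. The Python sketch below uses a
hypothetical `delay_channels' helper that converts delays in seconds to
sample counts at an assumed sample rate; it makes no attempt to equalise
the resulting channel lengths.

```python
# Hypothetical sketch of per-channel delay: each channel is prefixed with
# silence whose length is its delay converted to samples at the given rate.
def delay_channels(channels, delays_s, rate):
    out = []
    for i, chan in enumerate(channels):
        # Channels beyond the listed delays are left un-delayed.
        d = delays_s[i] if i < len(delays_s) else 0.0
        n = int(round(d * rate))
        out.append([0.0] * n + list(chan))
    return out

# Equivalent of "delay 1.5 0 0.5" on a 4-channel signal at 8 samples/s.
chans = [[1.0, 1.0] for _ in range(4)]
delayed = delay_channels(chans, [1.5, 0, 0.5], rate=8)
print([len(c) for c in delayed])
```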
1923 The following (one long) command plays a chime sound:
1926 play \-n synth \-j 3 sin %3 sin %\-2 sin %\-5 sin %\-9 \\
1927 sin %\-14 sin %\-21 fade h .01 2 1.5 delay \\
1928 1.3 1 .76 .54 .27 remix \- fade h 0 2.7 2.5 norm \-1
1930 and this plays a guitar chord:
1933 play \-n synth pl G2 pl B2 pl D3 pl G3 pl D4 pl G4 \\
1934 delay 0 .05 .1 .15 .2 .25 remix \- fade 0 4 .1 norm \-1
1937 \fBdither\fR [\fB\-S\fR\^|\^\fB\-s\fR\^|\^\fB\-f \fIfilter\fR] [\fB\-a\fR] [\fB\-p \fIprecision\fR]
1938 Apply dithering to the audio.
1939 Dithering deliberately adds a small amount of noise to the signal in
1940 order to mask audible quantization effects that can occur if the output
1941 sample size is less than 24 bits. With no options, this effect will
1942 add triangular (TPDF) white noise. Noise-shaping (only for certain
1943 sample rates) can be selected with
1947 option, it is possible to select a particular noise-shaping filter from
1948 the following list: lipshitz, f-weighted, modified-e-weighted,
1949 improved-e-weighted, gesemann, shibata, low-shibata, high-shibata. Note
that most filter types are available only with a 44100Hz sample rate. The
1951 filter types are distinguished by the following properties: audibility
1952 of noise, level of (inaudible, but in some circumstances, otherwise
1953 problematic) shaped high frequency noise, and processing speed.
1955 See http://sox.sourceforge.net/SoX/NoiseShaping for graphs of the different
1956 noise-shaping curves.
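The plain TPDF dither that is added when no options are given can be
sketched in Python as follows. The sketch assumes floating-point input in
the range \(+-1 and 16-bit output, and forms each noise value as the sum
of two independent uniform values of \(+-0\*d5 LSB. Noise-shaping is not
shown, and SoX's actual generator and scaling may differ.

```python
import random

# Sketch of plain TPDF dither (no noise-shaping) when reducing float
# samples in [-1, 1) to 16-bit integers. The noise is the sum of two
# independent uniform values in [-0.5, 0.5) LSB, giving a triangular
# PDF spanning +/-1 LSB. The seed parameter is hypothetical and is
# fixed here only for repeatable output.
def tpdf_dither_to_16bit(samples, seed=0):
    rng = random.Random(seed)
    out = []
    for x in samples:
        noise = (rng.random() - 0.5) + (rng.random() - 0.5)
        q = round(x * 32768.0 + noise)
        out.append(max(-32768, min(32767, q)))  # clip to the 16-bit range
    return out

print(tpdf_dither_to_16bit([0.0, 0.25, -0.5, 0.99999]))
```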
1960 option selects a slightly `sloped' TPDF, biased towards higher
1961 frequencies. It can be used at any sampling rate but below \(~~22k,
plain TPDF is probably better, and above \(~~37k, noise-shaped
1967 option enables a mode where dithering (and noise-shaping if applicable)
is automatically enabled only when needed. The most likely use for
1969 this is when applying fade in or out to an already dithered file, so
1970 that the redithering applies only to the faded portions. However, auto
1971 dithering is not fool-proof, so the fades should be carefully checked
1972 for any noise modulation; if this occurs, then either re-dither the whole
1980 option allows overriding the target precision.
1982 If the SoX global option
1984 option is not given, then the pseudo-random number generator used to
1985 generate the white noise will be `reseeded', i.e. the generated noise
1986 will be different between invocations.
1988 This effect should not be followed by any other effect that
1991 See also the `Dithering' section above.
1993 \fBdownsample\fR [\fIfactor\fR(2)]
Downsample the signal by an integer factor: only the first of every
\fIfactor\fR samples is retained; the others are discarded.
1997 No decimation filter is applied. If the input is not a properly
1998 bandlimited baseband signal, aliasing will occur. This may be
1999 desirable, e.g., for frequency translation.
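A minimal Python sketch of the sample-dropping described above, together
with a small aliasing example; the signal frequencies and sample rates are
arbitrary choices for illustration.

```python
import math

# Keep only the first of every `factor` samples; no anti-aliasing
# (decimation) filter is applied, matching the description above.
def downsample(samples, factor=2):
    return samples[::factor]

# Aliasing example: a 3kHz cosine sampled at 8kHz, downsampled by 2.
# The new rate is 4kHz (Nyquist 2kHz), so 3kHz folds down to 4 - 3 = 1kHz.
rate, freq = 8000, 3000
x = [math.cos(2 * math.pi * freq * n / rate) for n in range(16)]
y = downsample(x, 2)
alias = [math.cos(2 * math.pi * 1000 * n / (rate // 2)) for n in range(8)]
print(max(abs(a - b) for a, b in zip(y, alias)))  # very close to 0
```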
2001 For a general resampling effect with anti-aliasing, see \fBrate\fR. See
2002 also \fBupsample\fR.
2005 Makes audio easier to listen to on headphones.
2006 Adds `cues' to 44\*d1kHz stereo (i.e. audio CD format) audio so that
2007 when listened to on headphones the stereo image is
2009 your head (standard for headphones) to outside and in front of the
2010 listener (standard for speakers).
2012 \fBecho \fIgain-in gain-out\fR <\fIdelay decay\fR>
2013 Add echoing to the audio.
2014 Echoes are reflected sound and can occur naturally amongst mountains
2015 (and sometimes large buildings) when talking or shouting; digital echo
2016 effects emulate this behaviour and are often used to help fill
2017 out the sound of a single instrument or vocal. The time difference
2018 between the original signal and the reflection is the `delay' (time),
2019 and the loudness of the reflected signal is the `decay'. Multiple echoes
2020 can have different delays and decays.
2024 pair gives the delay in milliseconds
2025 and the decay (relative to gain-in) of that echo.
2026 Gain-out is the volume of the output.
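The following Python sketch shows one simplified reading of these
parameters: a feed-forward model in which each delayed copy is the input
scaled by its decay, and the sum is then scaled by gain-out. It is an
assumption for illustration, not SoX's implementation; in particular, it
does not extend the output to let the echoes ring out.

```python
# Simplified feed-forward echo model (an assumption, not SoX's source):
#   out[n] = gain_out * (gain_in * x[n] + sum_k decay_k * x[n - d_k])
# with each delay d_k given in milliseconds and converted to samples.
def echo(x, rate, gain_in, gain_out, pairs):
    taps = [(int(rate * ms / 1000), decay) for ms, decay in pairs]
    out = []
    for n in range(len(x)):
        acc = gain_in * x[n]
        for d, decay in taps:
            if n - d >= 0:
                acc += decay * x[n - d]
        out.append(gain_out * acc)
    return out

# "echo 0.8 0.88 60 0.4" applied to a single click at a 1kHz sample rate.
click = [1.0] + [0.0] * 99
y = echo(click, 1000, 0.8, 0.88, [(60, 0.4)])
print(y[0], y[60])
```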
2028 This will make it sound as if there are twice as many instruments as are
2031 play lead.aiff echo 0.8 0.88 60 0.4
If the delay is very short, then it sounds like a (metallic) robot playing
2036 play lead.aiff echo 0.8 0.88 6 0.4
2038 A longer delay will sound like an open air concert in the mountains:
2040 play lead.aiff echo 0.8 0.9 1000 0.3
2042 One mountain more, and:
2044 play lead.aiff echo 0.8 0.9 1000 0.3 1800 0.25
2047 \fBechos \fIgain-in gain-out\fR <\fIdelay decay\fR>
2048 Add a sequence of echoes to the audio.
2051 pair gives the delay in milliseconds
2052 and the decay (relative to gain-in) of that echo.
2053 Gain-out is the volume of the output.
This is like the \fBecho\fR effect, except that the echoes are chained:
`echos' stands for `ECHO in Sequel', that is, the first echo takes the
input, the second takes the input and the first echo, the third takes the
input and the first and second echoes, and so on.
Care should be taken when using many echoes; \fBechos\fR with a single
delay/decay pair has the same effect as \fBecho\fR with a single pair.
The sample will be bounced twice in symmetric echoes:
2063 play lead.aiff echos 0.8 0.7 700 0.25 700 0.3
The sample will be bounced twice in asymmetric echoes:
2067 play lead.aiff echos 0.8 0.7 700 0.25 900 0.3
2069 The sample will sound as if played in a garage:
2071 play lead.aiff echos 0.8 0.7 40 0.25 63 0.3
2074 \fBequalizer \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBq\fR\^|\^\fBo\fR\^|\^\fBh\fR\^|\^\fBk\fR] \fIgain\fR
2075 Apply a two-pole peaking equalisation (EQ) filter.
2076 With this filter, the signal-level at and around a selected frequency
2077 can be increased or decreased, whilst (unlike band-pass and band-reject
2078 filters) that at all other frequencies is unchanged.
2080 \fIfrequency\fR gives the filter's central frequency in Hz,
2081 \fIwidth\fR, the band-width,
2082 and \fIgain\fR the required gain
2083 or attenuation in dB.
2086 when using a positive \fIgain\fR.
2088 In order to produce complex equalisation curves, this effect
2089 can be given several times, each with a different central frequency.
2091 The filter is described in detail in [1].
2093 This effect supports the \fB\-\-plot\fR global option.
2095 See also \fBbass\fR and \fBtreble\fR for shelving equalisation effects.
2097 \fBfade\fR [\fItype\fR] \fIfade-in-length\fR [\fIstop-time\fR [\fIfade-out-length\fR]]
2098 Apply a fade effect to the beginning, end, or both of the audio.
2100 An optional \fItype\fR can be specified to select the shape of the fade
2102 \fBq\fR for quarter of a sine wave, \fBh\fR for half a sine
2103 wave, \fBt\fR for linear (`triangular') slope, \fBl\fR for logarithmic,
2104 and \fBp\fR for inverted parabola. The default is logarithmic.
2106 A fade-in starts from the first sample and ramps the signal level from 0 to full volume over \fIfade-in-length\fR seconds. Specify 0 seconds if no fade-in is wanted.
2108 For fade-outs, the audio will be truncated at
2111 the signal level will be ramped from full volume down to 0 starting at
2112 \fIfade-out-length\fR seconds before the \fIstop-time\fR. If
2114 is not specified, it defaults to the same value as
2115 \fIfade-in-length\fR.
2116 No fade-out is performed if
If the file length can be determined from the input file header and no length-changing effects are in use, then \fB0\fR may be specified for
2121 to indicate the usual case of a fade-out that ends at the end of the input
All times can be specified either as periods of time or as sample counts.
To specify a time period, use the hh:mm:ss.frac format. To specify
a sample count, give the number of samples and append the letter `s'
(for example `8000s').
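The fade shapes listed above can be illustrated with the textbook gain
curves sketched below in Python. These formulas are assumptions and SoX's
exact curves may differ; the logarithmic shape is omitted. `fade_in' is a
hypothetical helper that works on sample counts.

```python
import math

# Textbook gain curves for the fade shapes named above, as a function of
# progress p in [0, 1] through a fade-in (use curve(1 - p) for a fade-out).
FADE_SHAPES = {
    "t": lambda p: p,                                # linear (triangular)
    "q": lambda p: math.sin(p * math.pi / 2),        # quarter of a sine wave
    "h": lambda p: (1 - math.cos(p * math.pi)) / 2,  # half a sine wave
    "p": lambda p: 1 - (1 - p) ** 2,                 # inverted parabola
}

# Apply a fade-in of `length` samples; later samples are left unchanged.
def fade_in(samples, length, shape="t"):
    curve = FADE_SHAPES[shape]
    return [s * curve(n / length) if n < length else s
            for n, s in enumerate(samples)]

print(fade_in([1.0] * 6, 4, "h"))
```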
2133 \fBfir\fR [\fIcoefs-file\fR\^|\^\fIcoefs\fR]
2134 Use SoX's FFT convolution engine with given FIR filter
2136 If a single argument is given then this is treated as the name of a file
2137 containing the filter coefficients (white-space separated; may contain
2138 `#' comments). If the given filename is `\-', or if no argument is
2139 given, then the coefficients are read from the `standard input' (stdin);
2140 otherwise, coefficients may be given on the command line.
2143 sox infile outfile fir 0.0195 \-0.082 0.234 0.891 \-0.145 0.043
2146 sox infile outfile fir coefs.txt
2148 with coefs.txt containing
2152 1.2311233052619888e\-01
2153 \-4.4777096106211783e\-01
2154 5.1031563346705155e\-01
2155 \-6.6502926320995331e\-02
2159 This effect supports the \fB\-\-plot\fR global option.
2161 \fBflanger\fR [\fIdelay depth regen width speed shape phase interp\fR]
2162 Apply a flanging effect to the audio.
2163 See [3] for a detailed description of flanging.
2165 All parameters are optional (right to left).
2171 \ Range Default Description
2172 delay 0 \- 30 0 Base delay in milliseconds.
2173 depth 0 \- 10 2 Added swept delay in milliseconds.
2174 regen \-95 \- 95 0 T{
2176 Percentage regeneration (delayed signal feedback).
2178 width 0 \- 100 71 T{
2180 Percentage of delayed signal mixed with original.
2182 speed 0\*d1 \- 10 0\*d5 Sweeps per second (Hz).
2183 shape \ sin Swept wave shape: \fBsine\fR\^|\^\fBtriangle\fR.
2184 phase 0 \- 100 25 T{
2186 Swept wave percentage phase-shift for multi-channel (e.g. stereo) flange;
2187 0 = 100 = same phase on each channel.
2191 Digital delay-line interpolation: \fBlinear\fR\^|\^\fBquadratic\fR.
2196 \fBgain \fR[\fB\-e\fR\^|\^\fB\-B\fR\^|\^\fB\-b\fR\^|\^\fB\-r\fR] [\fB\-n\fR] [\fB\-l\fR\^|\^\fB\-h\fR] [\fIgain-dB\fR]
2197 Apply amplification or attenuation to the audio signal, or, in some
2198 cases, to some of its channels.
2199 Note that use of any of
2206 requires temporary file space to store the audio to be processed, so may
2207 be unsuitable for use with `streamed' audio.
2209 Without other options,
2211 is used to adjust the signal power level by the given number of dB:
2212 positive amplifies (beware of Clipping), negative attenuates. With
2215 amplification or attenuation is (logically) applied after the processing due to those options.
2219 option, the levels of the audio channels of a multi-channel file are `equalised', i.e.
2220 gain is applied to all channels other than that with the highest peak
2221 level, such that all channels attain the same peak level
2222 (but, without also giving
2224 the audio is not `normalised').
2228 (balance) option is similar to
2232 the RMS level is used instead of the peak level.
2234 might be used to correct stereo imbalance caused by an imperfect record
2235 turntable cartridge. Note
2239 might cause some clipping.
2244 but has clipping protection, i.e. if necessary to prevent clipping
2245 whilst balancing, attenuation is applied to all channels.
2246 Note, however, that in conjunction with
2255 option is used in conjunction with a prior invocation of
2259 option\*msee below for details.
2263 option normalises the audio to 0dB FSD; it is often used in conjunction with a negative
2265 to the effect that the audio is normalised to a given level below 0dB.
2268 sox infile outfile gain \-n
2270 normalises to 0dB, and
2272 sox infile outfile gain \-n \-3
2274 normalises to \-3dB.
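The normalisation arithmetic can be sketched in Python as follows. Unlike
\fBgain \-n\fR, which needs temporary file space to find the peak, this
hypothetical `normalise' helper simply holds all samples in memory;
0dB is taken to mean a full-scale amplitude of 1.

```python
# Sketch of peak normalisation as described for "gain -n [gain-dB]":
# scale so that the highest absolute sample reaches the target level,
# where 0dB corresponds to full scale (1.0).
def normalise(samples, level_db=0.0):
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence cannot be normalised
    target = 10 ** (level_db / 20)  # dB to linear amplitude
    return [s * target / peak for s in samples]

y = normalise([0.1, -0.25, 0.2], -3)
print(max(abs(s) for s in y))  # about 0.708, i.e. -3dB FSD
```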
2278 option invokes a simple limiter, e.g.
2280 sox infile outfile gain \-l 6
will apply 6dB of gain but never clip. Note that limiting by more than a
few dB, more than occasionally in a piece of audio, is not recommended
as it can cause audible distortion.
2287 effect for a more capable limiter.
2291 option is used to apply gain to provide head-room for subsequent
2292 processing. For example, with
2294 sox infile outfile gain \-h bass +6
2296 6dB of attenuation will be applied prior to the bass boosting effect
2297 thus ensuring that it will not clip. Of course, with bass, it is
2298 obvious how much headroom will be needed, but with other effects (e.g.
2299 rate, dither) it is not always as clear. Another advantage of using
\fBgain \-h\fR rather than an explicit attenuation is that, if the
2301 headroom is not used by subsequent effects, it can be reclaimed with
2302 \fBgain \-r\fR, for example:
2304 sox infile outfile gain \-h bass +6 rate 44100 gain \-r
The above effects chain guarantees never to clip or amplify;
2307 it attenuates if necessary to prevent clipping, but by only as
2308 much as is needed to do so.
2310 Output formatting (dithering and bit-depth reduction) also requires
2311 headroom (which cannot be `reclaimed'), e.g.
2313 sox infile outfile gain \-h bass +6 rate 44100 gain \-rh dither
2317 invocation, reclaims as much of the headroom as it can from the
2318 preceding effects, but retains as much headroom as is needed for
2319 subsequent processing.
2320 The SoX global option
2322 can be given to automatically invoke \fBgain \-h\fR and \fBgain \-r\fR.
\fBhighpass\fR\^|\^\fBlowpass\fR [\fB\-1\fR|\fB\-2\fR] \fIfrequency\fR[\fBk\fR] [\fIwidth\fR[\fBq\fR\^|\^\fBo\fR\^|\^\fBh\fR\^|\^\fBk\fR]]
2331 Apply a high-pass or low-pass filter with 3dB point \fIfrequency\fR.
2332 The filter can be either single-pole (with
2334 or double-pole (the default, or with
2337 applies only to double-pole filters;
2338 the default is Q = 0\*d707 and gives a Butterworth response. The filters
2339 roll off at 6dB per pole per octave (20dB per pole per decade). The
2340 double-pole filters are described in detail in [1].
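For illustration, the double-pole high-pass and low-pass coefficients can
be computed in Python with the widely used `Audio EQ Cookbook' biquad
formulas. This sketch assumes those formulas; it may not match SoX's code
exactly. With the default Q = 1/\(sr2 (about 0\*d707), the response at
\fIfrequency\fR is 3dB down.

```python
import cmath
import math

# Double-pole (biquad) high-pass/low-pass coefficients, assuming the
# "Audio EQ Cookbook" formulas. Returns (b, a) normalised so a[0] == 1.
def biquad(kind, freq, rate, q=1 / math.sqrt(2)):
    w0 = 2 * math.pi * freq / rate
    cos_w0, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    if kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    elif kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    else:
        raise ValueError(kind)
    a = [1 + alpha, -2 * cos_w0, 1 - alpha]
    return [c / a[0] for c in b], [c / a[0] for c in a]

# Magnitude response |H| at a given frequency.
def magnitude(b, a, freq, rate):
    z1 = cmath.exp(-1j * 2 * math.pi * freq / rate)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return abs(num / den)

b, a = biquad("highpass", 1000, 44100)
print(round(20 * math.log10(magnitude(b, a, 1000, 44100)), 2))  # -3.01
```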
2342 These effects support the \fB\-\-plot\fR global option.
2344 See also \fBsinc\fR for filters with a steeper roll-off.
2346 \fBhilbert\fR [\fB\-n \fItaps\fR]
2347 Apply an odd-tap Hilbert transform filter, phase-shifting the signal
2350 This is used in many matrix coding schemes and for analytic signal
2351 generation. The process is often written as a multiplication by \fIi\fR
2352 (or \fIj\fR), the imaginary unit.
2354 An odd-tap Hilbert transform filter has a bandpass characteristic,
2355 attenuating the lowest and highest frequencies. Its bandwidth can be
2356 controlled by the number of filter taps, which can be specified with
2357 \fB\-n\fR. By default, the number of taps is chosen for a cutoff
2358 frequency of about 75 Hz.
2360 This effect supports the \fB\-\-plot\fR global option.
\fBladspa\fR \fImodule\fR [\fIplugin\fR] [\fIargument\fR...]
2363 Apply a LADSPA [5] (Linux Audio Developer's Simple Plugin API) plugin.
2364 Despite the name, LADSPA is not Linux-specific, and a wide range of
2365 effects is available as LADSPA plugins, such as cmt [6] (the Computer
2366 Music Toolkit) and Steve Harris's plugin collection [7]. The first
2367 argument is the plugin module, the second the name of the plugin (a
2368 module can contain more than one plugin) and any other arguments are
2369 for the control ports of the plugin. Missing arguments are supplied by
2370 default values if possible. Only plugins with at most one audio input
and one audio output port can be used. If set, the environment variable
LADSPA_PATH is used as the search path for plugins.
2374 \fBloudness\fR [\fIgain\fR [\fIreference\fR]]
2375 Loudness control\*msimilar to the
2377 effect, but provides equalisation for the human auditory system. See
2378 http://en.wikipedia.org/wiki/Loudness for a detailed description of
2379 loudness. The gain is adjusted by the given
2381 parameter (usually negative) and the signal equalised according to ISO
2382 226 w.r.t. a reference level of 65dB, though an alternative
2384 level may be given if the original audio has been equalised for some
2385 other optimal level.
2386 A default gain of \-10dB is used if a
\fBlowpass\fR [\fB\-1\fR|\fB\-2\fR] \fIfrequency\fR[\fBk\fR] [\fIwidth\fR[\fBq\fR\^|\^\fBo\fR\^|\^\fBh\fR\^|\^\fBk\fR]]
2395 Apply a low-pass filter.
2396 See the description of the \fBhighpass\fR effect for details.
2398 \fBmcompand\fR \(dq\fIattack1\fB,\fIdecay1\fR{\fB,\fIattack2\fB,\fIdecay2\fR}
2399 [\fIsoft-knee-dB\fB:\fR]\fIin-dB1\fR[\fB,\fIout-dB1\fR]{\fB,\fIin-dB2\fB,\fIout-dB2\fR}
2401 [\fIgain\fR [\fIinitial-volume-dB\fR [\fIdelay\fR]]]\(dq {\fIcrossover-freq\fR[\fBk\fR] \(dqattack1,...\(dq}
2403 The multi-band compander is similar to the single-band compander but the
2404 audio is first divided into bands using Linkwitz-Riley cross-over filters
2405 and a separately specifiable compander run on each band. See the
2406 \fBcompand\fR effect for the definition of its parameters. Compand
2407 parameters are specified between double quotes and the crossover
2408 frequency for that band is given by \fIcrossover-freq\fR; these can be
2409 repeated to create multiple bands.
2411 For example, the following (one long) command shows how multi-band
2412 companding is typically used in FM radio:
2415 play track1.wav gain \-3 sinc 8000\- 29 100 mcompand \\
2416 \(dq0.005,0.1 \-47,\-40,\-34,\-34,\-17,\-33\(dq 100 \\
2417 \(dq0.003,0.05 \-47,\-40,\-34,\-34,\-17,\-33\(dq 400 \\
2418 \(dq0.000625,0.0125 \-47,\-40,\-34,\-34,\-15,\-33\(dq 1600 \\
2419 \(dq0.0001,0.025 \-47,\-40,\-34,\-34,\-31,\-31,\-0,\-30\(dq 6400 \\
2420 \(dq0,0.025 \-38,\-31,\-28,\-28,\-0,\-25\(dq \\
2421 gain 15 highpass 22 highpass 22 sinc \-n 255 \-b 16 \-17500 \\
2422 gain 9 lowpass \-1 17801
2424 The audio file is played with a simulated FM radio sound (or broadcast
2425 signal condition if the lowpass filter at the end is skipped).
2426 Note that the pipeline is set up with US-style 75us pre-emphasis.
2430 for a single-band companding effect.
2432 \fBnoiseprof\fR [\fIprofile-file\fR]
2433 Calculate a profile of the audio for use in noise reduction. See the
2434 description of the \fBnoisered\fR effect for details.
2436 \fBnoisered\fR [\fIprofile-file\fR [\fIamount\fR]]
2437 Reduce noise in the audio signal by profiling and filtering. This
2438 effect is moderately effective at removing consistent background noise
2439 such as hiss or hum. To use it, first run SoX with the \fBnoiseprof\fR
2440 effect on a section of audio that ideally would contain silence but in
2441 fact contains noise\*msuch sections are typically found at the beginning
2442 or the end of a recording. \fBnoiseprof\fR will write out a noise
2443 profile to \fIprofile-file\fR, or to stdout if no \fIprofile-file\fR or
2444 if `\-' is given. E.g.
2446 sox speech.wav \-n trim 0 1.5 noiseprof speech.noise-profile
2448 To actually remove the noise, run SoX again, this time with the \fBnoisered\fR
2451 will reduce noise according to a noise profile (which was generated by
2455 or from stdin if no \fIprofile-file\fR or if `\-' is given. E.g.
2457 sox speech.wav cleaned.wav noisered speech.noise-profile 0.3
2459 How much noise should be removed is specified by
2461 number between 0 and 1 with a default of 0\*d5. Higher numbers will
2462 remove more noise but present a greater likelihood of removing wanted
2463 components of the audio signal. Before replacing an original recording
2464 with a noise-reduced version, experiment with different
2466 values to find the optimal one for your audio; use headphones to check
2467 that you are happy with the results, paying particular attention to quieter
2468 sections of the audio.
2470 On most systems, the two stages\*mprofiling and reduction\*mcan be combined
2473 sox noisy.wav \-n trim 0 1 noiseprof | play noisy.wav noisered
2476 \fBnorm\fR [\fIdB-level\fR]
2477 Normalise the audio.
2479 is just an alias for \fBgain \-n\fR; see the
2484 Out Of Phase Stereo effect.
2485 Mixes stereo to twin-mono where each mono channel contains the
2486 difference between the left and right stereo channels.
2487 This is sometimes known as the `karaoke' effect as it often has the effect
2488 of removing most or all of the vocals from a recording.
2489 It is equivalent to \fBremix 1,2i 1,2i\fR.
2491 \fBoverdrive\fR [\fIgain\fR(20) [\fIcolour\fR(20)]]
Non-linear distortion.
2493 The \fIcolour\fR parameter controls the amount of even harmonic content
2494 in the over-driven output.
2496 \fBpad\fR { \fIlength\fR[\fB@\fIposition\fR] }
2497 Pad the audio with silence, at the beginning, the end, or any
2498 specified points through the audio.
2503 can specify a time or, if appended with an `s', a number of samples.
2505 is the amount of silence to insert and
2507 the position in the input audio stream at which to insert it.
2508 Any number of lengths and positions may be specified, provided that
a specified position is not less than the previous one.
is optional for the first and last lengths specified and,
if omitted, corresponds to the beginning and the end of the audio respectively.
2515 adds 1\*d5 seconds of silence padding at each end of the audio, whilst
2517 inserts 4000 samples of silence 3 minutes into the audio.
2518 If silence is wanted only at the end of the audio, specify either the end
2519 position or specify a zero-length pad at the start.
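The insertion rules above can be sketched in Python with a hypothetical
`pad' helper. For simplicity, lengths and positions are given here in
samples only, and a missing position is written as None.

```python
# Hypothetical sketch of "pad": insert runs of silence at positions given
# in input samples. A position of None means the start for the first spec
# and the end for the last spec, mirroring the defaults described above.
def pad(samples, specs):
    out, cursor = [], 0
    for i, (length, pos) in enumerate(specs):
        if pos is None:
            if i == 0:
                pos = 0
            elif i == len(specs) - 1:
                pos = len(samples)
            else:
                raise ValueError("only the first and last positions may be omitted")
        if pos < cursor:
            raise ValueError("positions must not decrease")
        out.extend(samples[cursor:pos])  # copy audio up to the insert point
        out.extend([0.0] * length)       # then insert the silence
        cursor = pos
    out.extend(samples[cursor:])         # copy whatever audio remains
    return out

print(pad([1.0, 2.0, 3.0], [(2, None), (1, None)]))
```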
2523 for an effect that can add silence at the beginning of
2524 the audio on a channel-by-channel basis.
2526 \fBphaser \fIgain-in gain-out delay decay speed\fR [\fB\-s\fR\^|\^\fB\-t\fR]
2527 Add a phasing effect to the audio.
2528 See [3] for a detailed description of phasing.
2530 delay/decay/speed gives the delay in milliseconds
2531 and the decay (relative to gain-in) with a modulation
The modulation is either sinusoidal (\fB\-s\fR), which is preferable for multiple
instruments, or triangular
(\fB\-t\fR), which gives single instruments a sharper phasing effect.
2536 The decay should be less than 0\*d5 to avoid
2537 feedback, and usually no less than 0\*d1. Gain-out is the volume of the output.
2541 play snare.flac phaser 0.8 0.74 3 0.4 0.5 \-t
2545 play snare.flac phaser 0.9 0.85 4 0.23 1.3 \-s
2549 play snare.flac phaser 0.89 0.85 1 0.24 2 \-t
2553 play snare.flac phaser 0.6 0.66 3 0.6 2 \-t
2556 \fBpitch \fR[\fB\-q\fR] \fIshift\fR [\fIsegment\fR [\fIsearch\fR [\fIoverlap\fR]]]
2557 Change the audio pitch (but not tempo).
2560 gives the pitch shift as positive or negative `cents' (i.e. 100ths of a
2563 effect for a description of the other parameters.
2565 See also the \fBbend\fR, \fBspeed\fR,
2570 \fBrate\fR [\fB\-q\fR\^|\^\fB\-l\fR\^|\^\fB\-m\fR\^|\^\fB\-h\fR\^|\^\fB\-v\fR] [override-options] \fIRATE\fR[\fBk\fR]
2571 Change the audio sampling rate (i.e. resample the audio) to any given
2573 (even non-integer if this is supported by the output file format)
2574 using a quality level defined as follows:
2578 cI cI2w9 cI2w6 cIw6 lIw17
2595 playback on ancient hardware
2599 playback on old hardware
2601 \-m medium 95% 100 T{
2607 16-bit mastering (use with dither)
2612 T} 95% 175 24-bit mastering
2618 is the percentage of the audio frequency band that is preserved and
2620 is the level of noise rejection. Increasing levels of resampling
2621 quality come at the expense of increasing amounts of time to process the
2622 audio. If no quality option is given, the quality level used is `high'
2623 (but see `Playing & Recording Audio' above regarding playback).
2625 The `quick' algorithm uses cubic interpolation; all others use
2626 band-limited interpolation. By default, all algorithms have
2627 a `linear' phase response; for `medium', `high' and
2628 `very high', the phase response is configurable (see below).
2632 effect is invoked automatically if SoX's \fB\-r\fR option specifies a
2633 rate that is different to that of the input file(s). Alternatively, if
2634 this effect is given explicitly, then SoX's
2636 option need not be given. For example, the following two commands are
2640 sox input.wav \-r 48k output.wav bass \-b 24
2641 sox input.wav output.wav bass \-b 24 rate 48k
2643 though the second command is more flexible as it allows
2645 options to be given, and allows the effects to be ordered arbitrarily.
2653 Warning: technically detailed discussion follows.
2655 The simple quality selection described above provides settings that
2656 satisfy the needs of the vast majority of resampling tasks.
2657 Occasionally, however, it may be desirable to fine-tune the resampler's
2658 filter response; this can be achieved using
2659 .IR override\ options ,
2660 as detailed in the following table:
2665 \-M/\-I/\-L Phase response = minimum/intermediate/linear
2666 \-s Steep filter (band-width = 99%)
2667 \-a Allow aliasing/imaging above the pass-band
2668 \-b\ 74\-99\*d7 Any band-width %
2671 Any phase response (0 = minimum, 25 = intermediate, 50 = linear, 100 = maximum)
2676 N.B. Override options cannot be used with the `quick' or `low'
2679 All resamplers use filters that can sometimes create `echo' (a.k.a.
2680 `ringing') artefacts with transient signals such as those that occur
2681 with `finger snaps' or other highly percussive sounds. Such artefacts are
2682 much more noticeable to the human ear if they occur before the transient
2683 (`pre-echo') than if they occur after it (`post-echo'). Note that
the frequency of any such artefacts is related to the smaller of the
2685 original and new sampling rates but that if this is at least 44\*d1kHz,
2686 then the artefacts will lie outside the range of human hearing.
2688 A phase response setting may be used to control the distribution of any
2689 transient echo between
2690 `pre' and `post': with minimum phase, there is no pre-echo but the
2691 longest post-echo; with linear phase, pre and post echo are in equal
2692 amounts (in signal terms, but not audibility terms); the intermediate
2693 phase setting attempts to find the best compromise by selecting a small
length (and level) of pre-echo and a medium-length post-echo.
2696 Minimum, intermediate, or linear phase response is selected using the
2701 option; a custom phase response can be created with the
2703 option. Note that phase responses between `linear' and `maximum'
2704 (greater than 50) are rarely useful.
2706 A resampler's band-width setting determines how much of the frequency
2707 content of the original signal (w.r.t. the original sample rate when
2708 up-sampling, or the new sample rate when down-sampling) is preserved
2709 during conversion. The term `pass-band' is used to refer to all frequencies
2710 up to the band-width point (e.g. for 44\*d1kHz sampling rate, and a
2711 resampling band-width of 95%, the pass-band represents frequencies from
2712 0Hz (D.C.) to circa 21kHz). Increasing the resampler's band-width
2713 results in a slower conversion and can increase transient echo
2714 artefacts (and vice versa).
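The pass-band arithmetic in the example above can be sketched in Python.
This hypothetical helper treats band-width simply as a percentage of half
the lower of the two sample rates and ignores the precise definition of
the band-width point.

```python
# Pass-band edge in Hz: the band-width percentage applies to half of the
# lower of the two sample rates (the original rate when up-sampling, the
# new rate when down-sampling), as described above.
def passband_edge_hz(in_rate, out_rate, bandwidth_pct=95.0):
    return min(in_rate, out_rate) / 2 * bandwidth_pct / 100

print(passband_edge_hz(48000, 44100))        # 20947.5, i.e. circa 21kHz
print(passband_edge_hz(44100, 96000, 99.0))  # 21829.5 with a 99% band-width
```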
2718 `steep filter' option changes resampling band-width from the default 95%
2719 (based on the 3dB point), to 99%. The
2721 option allows the band-width to be set to any value in the range
2722 74\-99\*d7 %, but note that band-width values greater than 99% are not
2723 recommended for normal use as they can cause excessive transient echo.
2727 option is given, then aliasing/imaging above the pass-band is allowed. For
2728 example, with 44\*d1kHz sampling rate, and a
2729 resampling band-width of 95%, this means that frequency content above
2730 21kHz can be distorted; however, since this is above the pass-band (i.e.
2731 above the highest frequency of interest/audibility), this may not be a
2732 problem. The benefits of allowing aliasing/imaging are reduced processing time,
2733 and reduced (by almost half) transient echo artefacts.
2734 Note that if this option is given, then
2735 the minimum band-width allowable with
2741 sox input.wav \-b 16 output.wav rate \-s \-a 44100 dither \-s
2743 default (high) quality resampling; overrides: steep filter, allow
2744 aliasing; to 44\*d1kHz sample rate; noise-shaped dither to 16-bit WAV
2747 sox input.wav \-b 24 output.aiff rate \-v \-I \-b 90 48k
2749 very high quality resampling; overrides: intermediate phase, band-width 90%;
2750 to 48k sample rate; store output to 24-bit AIFF file.
2764 effect at their core.
2766 \fBremix\fR [\fB\-a\fR\^|\^\fB\-m\fR\^|\^\fB\-p\fR] <\fIout-spec\fR>
2767 \fIout-spec\fR = \fIin-spec\fR{\fB,\fIin-spec\fR} | \fB0\fR
2769 \fIin-spec\fR = [\fIin-chan\fR]\^[\fB\-\fR[\fIin-chan2\fR]]\^[\fIvol-spec\fR]
2771 \fIvol-spec\fR = \fBp\fR\^|\^\fBi\fR\^|\^\fBv\^\fR[\fIvolume\fR]
2774 Select and mix input audio channels into output audio channels. Each output
2775 channel is specified, in turn, by a given \fIout-spec\fR: a list of
2776 contributing input channels and volume specifications.
2778 Note that this effect operates on the audio
2780 within the SoX effects processing chain; it should not be confused with the
2782 global option (where multiple
2784 are mix-combined before entering the effects chain).
2788 contains comma-separated input channel-numbers and hyphen-delimited
2789 channel-number ranges; alternatively,
2791 may be given to create a silent output channel. For example,
2793 sox input.wav output.wav remix 6 7 8 0
2795 creates an output file with four channels, where channels 1, 2, and 3 are
2796 copies of channels 6, 7, and 8 in the input file, and channel 4 is silent.
2799 sox input.wav output.wav remix 1\-3,7 3
2801 creates a (somewhat bizarre) stereo output file where the left channel
2802 is a mix-down of input channels 1, 2, 3, and 7, and the right channel is
2803 a copy of input channel 3.
2805 Where a range of channels is specified, the channel numbers to the left and
2806 right of the hyphen are optional and default to 1 and to the number of input
2807 channels respectively. Thus
2809 sox input.wav output.wav remix \-
2811 performs a mix-down of all input channels to mono.
2813 By default, where an output channel is mixed from multiple (n) input
2814 channels, each input channel will be scaled by a factor of \(S1/\s-2n\s+2.
2815 Custom mixing volumes can be set by following a given input channel or range
2816 of input channels with a \fIvol-spec\fR (volume specification).
2817 This is one of the letters \fBp\fR, \fBi\fR, or \fBv\fR,
2818 followed by a volume number, the meaning of which depends on the given
2819 letter and is defined as follows:
2824 Letter Volume number Notes
2825 p power adjust in dB 0 = no change
2826 i power adjust in dB T{
2828 As `p', but invert the audio
2830 v voltage multiplier T{
2832 1 = no change, 0\*d5 \(~= 6dB attenuation, 2 \(~= 6dB gain, \-1 = invert
2839 includes at least one
2841 then, by default, \(S1/\s-2n\s+2 scaling is not applied to any other channels in the
2842 same out-spec (though it may be applied in other out-specs).
2844 option, however, can be given to retain the automatic scaling in this
2847 sox input.wav output.wav remix 1,2 3,4v0.8
2849 results in channel level multipliers of 0\*d5,0\*d5 1,0\*d8, whereas
2851 sox input.wav output.wav remix \-a 1,2 3,4v0.8
2853 results in channel level multipliers of 0\*d5,0\*d5 0\*d5,0\*d8.
2855 The \-m (manual) option disables all automatic volume adjustments, so
2857 sox input.wav output.wav remix \-m 1,2 3,4v0.8
2859 results in channel level multipliers of 1,1 1,0\*d8.
2861 The volume number is optional and omitting it corresponds to no volume
2862 change; however, the only case in which this is useful is in conjunction
2869 sox input.wav output.wav remix 1,2i
2871 is a mono equivalent of the
2875 If the \fB\-p\fR option is given, then any automatic \(S1/\s-2n\s+2 scaling
2876 is replaced by \(S1/\s-2\(srn\s+2 (`power') scaling; this gives a louder mix
2877 but one that might occasionally clip.
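The scaling rules above can be modelled compactly. The following Python sketch is illustrative and is not SoX's implementation. The names out_channel_gains and vol_multiplier are hypothetical, and the sketch evaluates one out-spec at a time. It also assumes that with \fB\-p\fR an explicit vol-spec suppresses automatic scaling in the same out-spec, as it does by default. It returns one multiplier per input channel.

```python
import math
import re

# in-spec = [in-chan][-[in-chan2]][vol-spec], vol-spec = p|i|v[volume]
# (character classes are used instead of backslash escapes)
IN_SPEC = re.compile(
    r"^([0-9]*)(?:(-)([0-9]*))?"
    r"(?:([piv])(-?(?:[0-9]+(?:[.][0-9]*)?|[.][0-9]+))?)?$"
)


def vol_multiplier(letter, number):
    """Linear multiplier for a vol-spec; a missing number means no change."""
    if letter == "v":                      # voltage multiplier
        return 1.0 if number is None else float(number)
    db = 0.0 if number is None else float(number)
    gain = 10 ** (db / 20)                 # 'p' and 'i': power adjust in dB
    return -gain if letter == "i" else gain  # 'i' also inverts


def out_channel_gains(out_spec, n_inputs, mode=None):
    """Return one multiplier per input channel for a single out-spec.

    mode is None (default scaling), "a", "m" or "p", mirroring -a/-m/-p.
    """
    gains = [0.0] * n_inputs
    if out_spec == "0":                    # silent output channel
        return gains
    chans, explicit = [], {}
    for item in out_spec.split(","):
        m = IN_SPEC.match(item)
        if m is None:
            raise ValueError(f"bad in-spec: {item!r}")
        first, dash, last, letter, number = m.groups()
        if not first and not dash:
            raise ValueError(f"missing channel in: {item!r}")
        lo = int(first) if first else 1                    # range defaults
        hi = (int(last) if last else n_inputs) if dash else lo
        if not 1 <= lo <= hi <= n_inputs:
            raise ValueError(f"channel out of range: {item!r}")
        for c in range(lo, hi + 1):
            chans.append(c)
            if letter:
                explicit[c] = vol_multiplier(letter, number)
    n = len(chans)
    auto = 1 / math.sqrt(n) if mode == "p" else 1 / n
    for c in chans:
        if c in explicit:
            g = explicit[c]
        elif mode == "m" or (explicit and mode != "a"):
            g = 1.0                        # automatic scaling suppressed
        else:
            g = auto
        gains[c - 1] += g
    return gains
```

For example, out_channel_gains("3,4v0.8", 4, "a") returns [0.0, 0.0, 0.5, 0.8], which matches the second output channel of the \fB\-a\fR example above. Similarly, out_channel_gains("1,2i", 2) returns [1.0, \-1.0].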
2887 effect is to split an audio file into a set of files, each containing
2888 one of the constituent channels (in order to perform subsequent
2889 processing on individual audio channels). Where more than a few
2890 channels are involved, a script such as the following (Bourne shell
2894 chans=\`soxi \-c "$1"\`
2895 while [ $chans \-ge 1 ]; do
2896 chans0=\`printf %02i $chans\` # 2 digits hence up to 99 chans
2897 out=\`echo "$1"|sed "s/\\(.*\\)\\.\\(.*\\)/\\1\-$chans0.\\2/"\`
2898 sox "$1" "$out" remix $chans
2899 chans=\`expr $chans \- 1\`
2904 containing six audio channels were given, the script would produce six
2907 \fIinput-02.wav\fR, ...,
2910 See also the \fBswap\fR effect.
2912 \fBrepeat\fR [\fIcount\fR (1)]
2913 Repeat the entire audio \fIcount\fR times, or once if \fIcount\fR is not given.
2914 Requires temporary file space to store the audio to be repeated.
2915 Note that repeating once yields two copies: the original audio and the
2918 \fBreverb\fR [\fB\-w\fR|\fB\-\-wet-only\fR] [\fIreverberance\fR (50%) [\fIHF-damping\fR (50%)
2919 [\fIroom-scale\fR (100%) [\fIstereo-depth\fR (100%)
2921 [\fIpre-delay\fR (0ms) [\fIwet-gain\fR (0dB)]]]]]]
2923 Add reverberation to the audio using the `freeverb' algorithm. A
2924 reverberation effect is sometimes desirable for concert halls that are too
2925 small or contain so many people that the hall's natural reverberance is
2926 diminished. Applying a small amount of stereo reverb to a (dry) mono signal
2927 will usually make it sound more natural. See [3] for a detailed description
2930 Note that this effect
2931 increases both the volume and the length of the audio, so to prevent clipping
2932 in these domains, a typical invocation might be:
2934 play dry.wav gain \-3 pad 0 3 reverb
2938 option can be given to select only the `wet' signal, thus allowing it to be
2939 processed further, independently of the `dry' signal. E.g.
2941 play \-m voice.wav "|sox voice.wav \-p reverse reverb \-w reverse"
2943 for a reverse reverb effect.
2946 Reverse the audio completely.
2947 Requires temporary file space to store the audio to be reversed.
2950 Apply RIAA vinyl playback equalisation.
2951 The sampling rate must be one of: 44\*d1, 48, 88\*d2, 96 kHz.
2953 This effect supports the \fB\-\-plot\fR global option.
2955 \fBsilence \fR[\fB\-l\fR] \fIabove-periods\fR [\fIduration threshold\fR[\fBd\fR\^|\^\fB%\fR]
2956 [\fIbelow-periods duration threshold\fR[\fBd\fR\^|\^\fB%\fR]]
2958 Removes silence from the beginning, middle, or end of the audio.
2959 `Silence' is determined by a specified threshold.
2961 The \fIabove-periods\fR value indicates whether audio should be
2962 trimmed at the beginning of the audio. A value of zero indicates that no
2963 silence should be trimmed from the beginning. When a
2964 non-zero \fIabove-periods\fR is specified, audio is trimmed until
2965 non-silence is found. Normally, when trimming silence from the beginning of the audio,
2966 \fIabove-periods\fR will be 1, but it can be increased to higher
2967 values to trim all audio up to a specific count of non-silence
2968 periods. For example, if you had an audio file with two songs that
2969 each contained 2 seconds of silence before the song, you could specify
2970 an \fIabove-periods\fR of 2 to strip out both silence periods and the
2973 When \fIabove-periods\fR is non-zero, you must also specify a
2974 \fIduration\fR and \fIthreshold\fR. \fIDuration\fR indicates the
2975 amount of time that non-silence must be detected before it stops
2976 trimming audio. By increasing the duration, bursts of noise can be
2977 treated as silence and trimmed off.
2979 \fIThreshold\fR is used to indicate what sample value should be treated as
2980 silence. For digital audio, a value of 0 may be fine but for audio
2981 recorded from analog, you may wish to increase the value to account
2982 for background noise.
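The following Python sketch models the simplest case, \fBsilence 1\fR \fIduration threshold\fR, to show how \fIduration\fR and \fIthreshold\fR interact. It is a simplification and not SoX's algorithm. It compares individual sample magnitudes with the threshold, whereas SoX's detector may measure level differently (for example, over a window). It also takes \fIduration\fR as a sample count. The function names are hypothetical.

```python
def threshold_to_fraction(spec):
    """'2%' -> 0.02 and '-40d' -> 0.01, as fractions of full scale."""
    if spec.endswith("%"):
        return float(spec[:-1]) / 100
    if spec.endswith("d"):
        return 10 ** (float(spec[:-1]) / 20)
    raise ValueError("this sketch only handles '%' and 'd' thresholds")


def trim_leading_silence(samples, duration, threshold):
    """Simplified `silence 1 duration threshold` on samples in [-1, 1].

    Audio is dropped until `duration` consecutive samples have a magnitude
    above `threshold`; everything from the start of that run is kept.
    """
    run_start, run_len = 0, 0
    for i, x in enumerate(samples):
        if abs(x) > threshold:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= duration:
                return samples[run_start:]
        else:
            run_len = 0                    # a short burst does not count
    return []                              # no qualifying non-silence found
```

Consider the sample list [0.0, 0.01, 0.5, 0.0, 0.3, 0.4, 0.5, 0.1] with a duration of 2 and a threshold of 2%. The isolated 0.5 is treated as a burst of noise and trimmed, so the output starts at 0.3.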
2984 When optionally trimming silence from the end of the audio, you specify
2985 a \fIbelow-periods\fR count. In this case, \fIbelow-periods\fR means
2986 to remove all audio after silence is detected.
2987 Normally, this will be a value of 1, but it can
2988 be increased to skip over periods of silence that are wanted. For example,
2989 if you have a song with 2 seconds of silence in the middle and 2 seconds
2990 at the end, you could set \fIbelow-periods\fR to a value of 2 to skip over the
2991 silence in the middle of the audio.
2993 For \fIbelow-periods\fR, \fIduration\fR specifies a period of silence
2994 that must exist before audio is not copied any more. By specifying
2995 a higher duration, silence that is wanted can be left in the audio.
2996 For example, if you have a song with an expected 1 second of silence
2997 in the middle and 2 seconds of silence at the end, a duration of 2
2998 seconds could be used to skip over the middle silence.
3000 Unfortunately, you must know the length of the silence at the
3001 end of your audio file to trim off silence reliably. A workaround is
3002 to use the \fBsilence\fR effect in combination with the \fBreverse\fR effect.
3003 By first reversing the audio, you can use the \fIabove-periods\fR
3004 to reliably trim all audio from what looks like the front of the file.
3005 Then reverse the file again to get back to normal.
3007 To remove silence from the middle of a file, specify a
3008 \fIbelow-periods\fR that is negative. This value is then
3009 treated as a positive value and is also used to indicate the
3010 effect should restart processing as specified by the
3011 \fIabove-periods\fR, making it suitable for removing periods of
3012 silence in the middle of the audio.
3016 indicates that \fIbelow-periods\fR \fIduration\fR length of audio
3017 should be left intact at the beginning of each period of silence.
3018 This is useful, for example, if you want to remove long pauses between words
3019 but do not want to remove the pauses completely.
3021 The \fIperiod\fR counts are in units of samples. \fIDuration\fR counts
3022 may be in the format of hh:mm:ss.frac, or the exact count of samples.
3023 \fIThreshold\fR numbers may be suffixed with
3025 to indicate the value is in decibels, or
3027 to indicate a percentage of the maximum sample value
3028 (\fB0%\fR specifies pure digital silence).
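To illustrate these duration formats, the Python sketch below converts a duration to a sample count at a given sample rate. It assumes that a sample count is written with a trailing `s', as in the \fBsplice\fR script elsewhere on this page. Whether the \fBsilence\fR effect requires that suffix is not stated here. The sketch also assumes that times use the [[hh:]mm:]ss.frac form, and to_samples is a hypothetical name.

```python
def to_samples(spec, rate):
    """Convert '[[hh:]mm:]ss[.frac]' or 'Ns' (N samples) to a sample count."""
    if spec.endswith("s"):                 # assumed sample-count suffix
        return int(spec[:-1])
    parts = spec.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':' fields in {spec!r}")
    seconds = 0.0
    for part in parts:                     # hh, mm, ss.frac from left to right
        seconds = seconds * 60 + float(part)
    return round(seconds * rate)
```

For example, to_samples("1:03.422", 1000) gives 63422 and to_samples("480s", 48000) gives 480.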
3030 The following example shows how this effect can be used to start a recording
3031 that does not contain the delay at the start which usually occurs between
3032 `pressing the record button' and the start of the performance:
3034 rec \fIparameters filename other-effects\fR silence 1 5 2%
3038 \fBsinc\fR [\fB\-a\fI att\fR\^|\^\fB\-b\fI beta\fR] [\fB\-p\fI phase\fR\^|\^\fB\-M\fR\^|\^\fB\-I\fR\^|\^\fB\-L\fR] \:[\fB\-t\fI tbw\fR\^|\^\fB\-n\fI taps\fR] [\fIfreqHP\fR]\:[\fB\-\fIfreqLP\fR [\fB\-t\fR tbw\^|\^\fB\-n\fR taps]]
3040 Apply a sinc kaiser-windowed low-pass, high-pass, band-pass, or band-reject filter
3042 The \fIfreqHP\fR and \fIfreqLP\fR parameters give the frequencies of the
3043 6dB points of a high-pass and low-pass filter that may be invoked
3044 individually, or together. If both are
3045 given, then \fIfreqHP\fR less than \fIfreqLP\fR creates a band-pass filter,
3046 and \fIfreqHP\fR greater than \fIfreqLP\fR creates a band-reject filter.
3047 For example, the invocations
3054 create a high-pass, low-pass, band-pass, and band-reject filter
3057 The default stop-band attenuation of 120dB can be overridden with
3058 \fB\-a\fR; alternatively, the kaiser-window `beta' parameter can be
3059 given directly with \fB\-b\fR.
3061 The default transition band-width of 5% of the total band can be
3062 overridden with \fB\-t\fR (and \fItbw\fR in Hertz); alternatively, the
3063 number of filter taps can be given directly with \fB\-n\fR.
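The Python sketch below shows what a Kaiser-windowed sinc design involves. It builds low-pass taps from the textbook Kaiser formulas for `beta' and filter length. The defaults are 120dB attenuation and a transition band of 5% of the 0 to Nyquist band. SoX's own formulas for deriving \fB\-b\fR and \fB\-n\fR from \fB\-a\fR and \fB\-t\fR may differ, and all function names are hypothetical. The high-pass is formed here by spectral inversion (a unit impulse minus the low-pass), which is one common technique.

```python
import math


def bessel_i0(x):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total = term = 1.0
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= (x / (2 * k)) ** 2
        total += term
    return total


def kaiser_beta(atten_db):
    """Textbook (Kaiser) estimate of beta for a stop-band attenuation in dB."""
    if atten_db > 50:
        return 0.1102 * (atten_db - 8.7)
    if atten_db >= 21:
        return 0.5842 * (atten_db - 21) ** 0.4 + 0.07886 * (atten_db - 21)
    return 0.0


def kaiser_lowpass(fc, rate, atten_db=120.0, tbw=None):
    """Windowed-sinc low-pass taps; fc is the 6 dB (half-amplitude) point in Hz."""
    if tbw is None:
        tbw = 0.05 * rate / 2              # assumed: 5% of the 0..rate/2 band
    delta_w = 2 * math.pi * tbw / rate     # transition width in rad/sample
    order = math.ceil((atten_db - 8) / (2.285 * delta_w))
    order += order % 2                     # even order gives a centre tap
    beta = kaiser_beta(atten_db)
    wc = 2 * math.pi * fc / rate
    half = order / 2
    taps = []
    for n in range(order + 1):
        t = n - half
        ideal = wc / math.pi if t == 0 else math.sin(wc * t) / (math.pi * t)
        r = t / half
        window = bessel_i0(beta * math.sqrt(max(0.0, 1 - r * r))) / bessel_i0(beta)
        taps.append(ideal * window)
    return taps


def kaiser_highpass(fc, rate, **kwargs):
    """High-pass by spectral inversion: unit impulse minus the low-pass."""
    lp = kaiser_lowpass(fc, rate, **kwargs)
    mid = len(lp) // 2
    return [(1.0 if i == mid else 0.0) - h for i, h in enumerate(lp)]
```

Because the ideal response is windowed, the kaiser_lowpass taps sum to very nearly 1 (unity gain at DC), and the high-pass taps sum to very nearly 0.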
3065 If both \fIfreqHP\fR and \fIfreqLP\fR are given, then a \fB\-t\fR or
3066 \fB\-n\fR option given to the left of the frequencies applies to both
3067 frequencies; one of these options given to the right of the frequencies
3068 applies only to \fIfreqLP\fR.
3076 options control the filter's phase response; see the \fBrate\fR effect
3079 This effect supports the \fB\-\-plot\fR global option.
3081 \fBspectrogram \fR[\fIoptions\fR]
3082 Create a spectrogram of the audio; the audio is passed unmodified
3083 through the SoX processing chain. This effect is optional\*mtype
3084 \fBsox \-\-help\fR and check the list of supported effects to see if
3085 it has been included.
3087 The spectrogram is rendered in a Portable Network Graphics (PNG) file,
3088 and shows time in the X-axis, frequency in the Y-axis, and audio
3089 signal magnitude in the Z-axis. Z-axis values are represented by the
3090 colour (or optionally the intensity) of the pixels in the X-Y plane.
3091 If the audio signal contains multiple channels then these are shown
3092 from top to bottom starting from channel 1 (which is the left channel
3095 For example, if `my.wav' is a stereo file, then with
3097 sox my.wav \-n spectrogram
3099 a spectrogram of the entire file will be created in the file
3100 `spectrogram.png'. More often though, analysis of a smaller portion
3101 of the audio is required; e.g. with
3103 sox my.wav \-n remix 2 trim 20 30 spectrogram
3105 the spectrogram shows information only from the second (right)
3106 channel, and of thirty seconds of audio starting from twenty seconds
3107 in. To analyse a small portion of the frequency domain, the
3109 effect may be used, e.g.
3111 sox my.wav \-n rate 6k spectrogram
3113 allows detailed analysis of frequencies up to 3kHz (half the sampling
3114 rate) i.e. where the human auditory system is most sensitive.
3117 sox my.wav \-n trim 0 10 spectrogram \-x 600 \-y 200 \-z 100
3119 the given options control the size of the spectrogram's X, Y & Z axes
3120 (in this case, the spectrogram area of the produced image will be 600
3121 by 200 pixels in size and the Z-axis range will be 100 dB). Note that
3122 the produced image includes axes legends etc. and so will be a little
3123 larger than the specified spectrogram size. In this example:
3125 sox \-n \-n synth 6 tri 10k:14k spectrogram \-z 100 \-w kaiser
3127 an analysis `window' with high dynamic range is selected to best
3128 display the spectrogram of a swept triangular wave. For a similar
3129 example, append the following to the `chime' command in the
3134 rate 2k spectrogram \-X 200 \-Z \-10 \-w kaiser
3136 Options are also available to control the appearance (colour-set,
3137 brightness, contrast, etc.) and filename of the spectrogram; e.g. with
3139 sox my.wav \-n spectrogram \-m \-l \-o print.png
3141 a spectrogram is created suitable for printing on a `black and white'
3146 .IP \fB\-x\ \fInum\fR
3147 Change the (maximum) width (X-axis) of the spectrogram from its default
3148 value of 800 pixels to a given number between 100 and 200000.
3149 See also \fB\-X\fR and \fB\-d\fR.
3150 .IP \fB\-X\ \fInum\fR
3151 X-axis pixels/second; the default is auto-calculated to fit the given
3152 or known audio duration to the X-axis size, or 100 otherwise. If
3153 given in conjunction with \fB\-d\fR, this option affects the width of
3154 the spectrogram; otherwise, it affects the duration of the
3157 can be from 1 (low time resolution) to 5000 (high time resolution)
3158 and need not be an integer. SoX
3159 may make a slight adjustment to the given number for processing
3160 quantisation reasons; if so, SoX will report the actual number used
3161 (viewable when the SoX global option
3164 See also \fB\-x\fR and \fB\-d\fR.
3165 .IP \fB\-y\ \fInum\fR
3166 Sets the Y-axis size in pixels (per channel); this is the number of
3167 frequency `bins' used in the Fourier analysis that produces the
3168 spectrogram. N.B. it can be slow to produce the spectrogram if this
3169 number is not one more than a power of two (e.g. 129). By default the
3170 Y-axis size is chosen automatically (depending on the number of
3173 for an alternative way of setting spectrogram height.
3174 .IP \fB\-Y\ \fInum\fR
3175 Sets the target total height of the spectrogram(s). The default value
3176 is 550 pixels. Using this option (and by default), SoX will choose a
3177 height for individual spectrogram channels that is one more than a
3178 power of two, so the actual total height may fall short of the given
3179 number. However, there is also a minimum height per channel so if
3180 there are many channels, the number may be exceeded.
3183 for an alternative way of setting spectrogram height.
3184 .IP \fB\-z\ \fInum\fR
3185 Z-axis (colour) range in dB, default 120. This sets the dynamic-range
3186 of the spectrogram to be \-\fInum\fR\ dBFS to 0\ dBFS.
3188 may range from 20 to 180. Decreasing dynamic-range effectively
3189 increases the `contrast' of the spectrogram display, and vice versa.
3190 .IP \fB\-Z\ \fInum\fR
3191 Sets the upper limit of the Z-axis in dBFS.
3194 effectively increases the `brightness' of the spectrogram display,
3196 .IP \fB\-q\ \fInum\fR
3197 Sets the Z-axis quantisation, i.e. the number of different colours (or
3198 intensities) in which to render Z-axis
3199 values. A small number (e.g. 4) will give a `poster'-like effect making
3200 it easier to discern magnitude bands of similar level. Small numbers
3202 result in small PNG files. The number given specifies the number of
3203 colours to use inside the Z-axis range; two colours are reserved to
3204 represent out-of-range values.
3205 .IP \fB\-w\ \fIname\fR
3206 Window: Hann (default), Hamming, Bartlett, Rectangular or Kaiser. The
3207 spectrogram is produced using the Discrete Fourier Transform (DFT)
3208 algorithm. A significant parameter to this algorithm is the choice of
3209 `window function'. By default, SoX uses the Hann window which has good
3210 all-round frequency-resolution and dynamic-range properties. For better
3211 frequency resolution (but lower dynamic-range), select a Hamming window;
3212 for higher dynamic-range (but poorer frequency-resolution), select a
3213 Kaiser window. Bartlett and Rectangular windows are also available.
3214 .IP \fB\-W\ \fInum\fR
3215 Window adjustment parameter. This can be used to make small
3216 adjustments to the Kaiser window shape. A positive number (up to
3217 ten) increases its dynamic range, a negative number decreases it.
3219 Allow slack overlapping of DFT windows.
3220 This can, in some cases, increase image sharpness and give greater adherence
3223 value, but at the expense of a little spectral loss.
3225 Creates a monochrome spectrogram (the default is colour).
3227 Selects a high-colour palette\*mless visually pleasing than the default
3228 colour palette, but it may make it easier to differentiate different levels.
3229 If this option is used in conjunction with
3231 the result will be a hybrid monochrome/colour palette.
3232 .IP \fB\-p\ \fInum\fR
3233 Permute the colours in a colour or hybrid palette.
3236 parameter, from 1 (the default) to 6, selects the permutation.
3238 Creates a `printer friendly' spectrogram with a light background (the
3239 default has a dark background).
3241 Suppress the display of the axis lines. This is sometimes useful in
3242 helping to discern artefacts at the spectrogram edges.
3244 Raw spectrogram: suppress the display of axes and legends.
3246 Selects an alternative, fixed colour-set. This is provided only for
3247 compatibility with spectrograms produced by another package. It should
3248 not normally be used as it has some problems, not least, a lack of
3249 differentiation at the bottom end which results in masking of low-level
3251 .IP \fB\-t\ \fItext\fR
3252 Set the image title\*mtext to display above the spectrogram.
3253 .IP \fB\-c\ \fItext\fR
3254 Set (or clear) the image comment\*mtext to display below and to the
3255 left of the spectrogram.
3256 .IP \fB\-o\ \fItext\fR
3257 Name of the spectrogram output PNG file, default `spectrogram.png'.
3261 .I Advanced Options:
3263 In order to process a smaller section of audio without affecting other
3264 effects or the output signal (unlike when the
3266 effect is used), the following options may be used.
3268 .IP \fB\-d\ \fIduration\fR
3269 This option sets the X-axis resolution such that audio with the given
3271 ([[HH:]MM:]SS) fits the selected (or default) X-axis width. For
3274 sox input.mp3 output.wav \-n spectrogram \-d 1:00 stats
3276 creates a spectrogram showing the first minute of the audio, whilst
3280 effect is applied to the entire audio signal.
3284 for an alternative way of setting the X-axis resolution.
3285 .IP \fB\-S\ \fItime\fR
3286 Start the spectrogram at the given point in the audio stream. For
3289 sox input.aiff output.wav spectrogram \-S 1:00
3291 creates a spectrogram showing all but the first minute of the audio
3292 (the output file however, receives the entire audio stream).
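Several of the size options above reduce to simple arithmetic. The following Python sketch is an approximation under stated assumptions, and its function names are hypothetical. It ignores the space taken by axes and legends, and it does not model the per-channel minimum height mentioned under \fB\-Y\fR. It relates \fB\-x\fR and \fB\-X\fR to pixels per second, \fB\-Y\fR to per-channel height, and \fB\-y\fR to DFT length and the frequency step between rows.

```python
def pixels_per_second(x_pixels=800, duration=None):
    """-X default: fit a known duration into the -x width, else 100 px/s."""
    return x_pixels / duration if duration else 100.0


def channel_height(target_total=550, channels=1):
    """Largest 2**k + 1 height per channel such that all channels fit.

    The per-channel minimum mentioned under -Y is not modelled here.
    """
    height = 2                             # 2**0 + 1
    while channels * (2 * (height - 1) + 1) <= target_total:
        height = 2 * (height - 1) + 1      # next 2**k + 1
    return height


def dft_size(bins):
    """A real-input N-point DFT has N/2 + 1 distinct bins, so N = 2 * (bins - 1)."""
    return 2 * (bins - 1)


def bin_spacing_hz(rate, bins):
    """Frequency step between rows: rate / N (bins span 0 to rate / 2)."""
    return rate / dft_size(bins)
```

For instance, with \fBrate 6k\fR and \fB\-y 129\fR, a 256-point DFT gives rows 23\*d4375Hz apart, spanning 0 to 3kHz.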
3296 For the ability to perform off-line processing of spectral data, see the
3300 \fBspeed \fIfactor\fR[\fBc\fR]
3301 Adjust the audio speed (pitch and tempo together). \fIfactor\fR
3302 is either the ratio of the new speed to the old speed: greater
3303 than 1 speeds up, less than 1 slows down, or, if appended with the
3305 `c', the number of cents (i.e. 100ths of a semitone) by
3306 which the pitch (and tempo) should be adjusted: greater than 0
3307 increases, less than 0 decreases.
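The arithmetic described here can be sketched in Python (function names hypothetical). A cents value c corresponds to a speed ratio of 2^(c/1200), since an octave is 1200 cents. The unchanged samples are then reinterpreted at the original rate multiplied by that ratio before \fBrate\fR resamples them, so the duration is divided by the ratio.

```python
def speed_ratio(factor):
    """'1.5' -> 1.5; '100c' -> 2 ** (100 / 1200), i.e. one semitone up."""
    if factor.endswith("c"):
        return 2 ** (float(factor[:-1]) / 1200)
    return float(factor)


def relabelled_rate(rate, factor):
    """Rate at which the unchanged samples are reinterpreted before resampling."""
    return rate * speed_ratio(factor)


def new_duration(seconds, factor):
    """Duration after the speed change: faster playback means shorter audio."""
    return seconds / speed_ratio(factor)
```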
3309 Technically, the speed effect only changes the sample rate information,
3310 leaving the samples themselves untouched. The \fBrate\fR effect is invoked
3311 automatically to resample to the output sample rate, using its default
3312 quality/speed. For higher quality or higher speed
3313 resampling, in addition to the \fBspeed\fR effect, specify
3314 the \fBrate\fR effect with the desired quality option.
3316 See also the \fBbend\fR, \fBpitch\fR,
3321 \fBsplice \fR [\fB\-h\fR\^|\^\fB\-t\fR\^|\^\fB\-q\fR] { \fIposition\fR[\fB,\fIexcess\fR[\fB,\fIleeway\fR]] }
3322 Splice together audio sections. This effect provides two things over
3323 simple audio concatenation: a (usually short) cross-fade is applied at
3324 the join, and a wave similarity comparison is made to help determine the
3325 best place at which to make the join.
3332 may be given to select the fade envelope as half-cosine wave (the default),
3333 triangular (a.k.a. linear), or quarter-cosine wave respectively.
3338 Type Audio Fade level Transitions
3339 t correlated constant gain abrupt
3340 h correlated constant gain smooth
3341 q uncorrelated constant power smooth
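The fade-level column of this table can be checked numerically. The Python sketch below uses textbook envelope shapes that match the names above, and fade_pair is a hypothetical name. SoX's exact curves may be computed differently. Correlated audio adds by amplitude, so the fade gains should sum to 1. Uncorrelated audio adds by power, so the squared gains should sum to 1 instead.

```python
import math


def fade_pair(kind, t):
    """(fade-out gain, fade-in gain) at position t in [0, 1] across the join."""
    if kind == "t":                        # triangular (linear)
        return 1 - t, t
    if kind == "h":                        # half-cosine
        c = math.cos(math.pi * t)
        return (1 + c) / 2, (1 - c) / 2
    if kind == "q":                        # quarter-cosine
        return math.cos(math.pi * t / 2), math.sin(math.pi * t / 2)
    raise ValueError(f"unknown fade kind: {kind!r}")


def check_envelopes(steps=10):
    """Gain sums for t and h, power sums for q, at evenly spaced points."""
    for i in range(steps + 1):
        t = i / steps
        for kind in "th":
            out_g, in_g = fade_pair(kind, t)
            assert math.isclose(out_g + in_g, 1.0)          # constant gain
        out_g, in_g = fade_pair("q", t)
        assert math.isclose(out_g ** 2 + in_g ** 2, 1.0)    # constant power
    return True
```

At the midpoint, the quarter-cosine fade gives each side a gain of about 0\*d707 (\-3dB), so the summed power of uncorrelated material stays level.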
3345 To perform a splice, first use the
3347 effect to select the audio sections to be joined together. As when
3348 performing a tape splice, the end of the section to be spliced onto
3349 should be trimmed with a small
3351 (default 0\*d005 seconds) of audio after the ideal joining point. The
3352 beginning of the audio section to splice on should be trimmed with the
3355 (before the ideal joining point), plus an additional
3357 (default 0\*d005 seconds). SoX should then be invoked with the two
3358 audio sections as input files and the
3360 effect given with the position at which to perform the splice\*mthis is
3361 the length of the first audio section (including the excess).
3363 The following diagram uses the tape analogy to illustrate the splice
3364 operation. The effect simulates the diagonal cuts and joins the two pieces:
3369 _________ : : _________________
3376 _______________\\: : : \\_____`____
3382 where * indicates the joining points.
3384 For example, a long song begins with two verses which start (as
3385 determined e.g. by using the
3389 (\fIstart\fR) effect) at times 0:30\*d125 and 1:03\*d432.
3390 The following commands cut out the first verse:
3392 sox too-long.wav part1.wav trim 0 30.130
3394 (5 ms excess, after the first verse starts)
3396 sox too-long.wav part2.wav trim 1:03.422
3398 (5 ms excess plus 5 ms leeway, before the second verse starts)
3400 sox part1.wav part2.wav just-right.wav splice 30.130
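The trim and splice times in this example follow from the default 0\*d005-second excess and leeway. The Python sketch below (hypothetical names) reproduces them from the two ideal joining points.

```python
def splice_cut_points(join1, join2, excess=0.005, leeway=0.005):
    """Trim and splice times (seconds) for removing the audio between two joins.

    join1: ideal joining point at the end of the first section.
    join2: ideal joining point at the start of the second section.
    Returns (end of part 1, start of part 2, splice position).
    """
    part1_end = join1 + excess             # keep `excess` after the join
    part2_start = join2 - excess - leeway  # keep excess + leeway before it
    return part1_end, part2_start, part1_end


def mmss(seconds):
    """Format seconds as m:ss.fff, as accepted by trim."""
    return f"{int(seconds // 60)}:{seconds % 60:06.3f}"
```

Calling splice_cut_points(30.125, 63.432) and formatting the results with mmss gives 0:30.130 and 1:03.422, matching the commands above.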
3402 For another example, the SoX command
3404 play "|sox \-n \-p synth 1 sin %1" "|sox \-n \-p synth 1 sin %3"
3406 generates and plays two notes, but there is a nasty click at the
3407 transition; the click can be removed by splicing instead of
3408 concatenating the audio, i.e. by appending \fBsplice 1\fR to the
3409 command. (Clicks at the beginning and end of the audio can be removed by
3410 \fIpreceding\fR the splice effect with \fBfade q .01 2 .01\fR).
3412 Provided your arithmetic is good enough, multiple splices can be
3413 performed with a single
3415 invocation. For example:
3418 # Audio Copy and Paste Over
3419 # acpo infile copy-start copy-stop paste-over-start outfile
3420 # All times measured in samples.
3421 rate=\`soxi \-r "$1"\`
3422 e=\`expr $rate '*' 5 / 1000\`; l=$e # Using default excess and leeway
3424 sox "$1" piece.wav trim \`expr $2 \- $e \- $l\`s \\
3425 \`expr $3 \- $2 + $e + $l + $e\`s
3426 sox "$1" part1.wav trim 0 \`expr $4 + $e\`s
3427 sox "$1" part2.wav trim \`expr $4 + $3 \- $2 \- $e \- $l\`s
3428 sox part1.wav piece.wav part2.wav "$5" splice \\
3429 \`expr $4 + $e\`s \\
3430 \`expr $4 + $e + $3 \- $2 + $e + $l + $e\`s
3432 In the above Bourne shell script,
3433 two splices are used to `copy and paste' audio.
3441 It is also possible to use this effect to perform general cross-fades,
3442 e.g. to join two songs. In this case,
3444 would typically be a number of seconds, the
3446 option would typically be given (to select an `equal power' cross-fade), and
3448 should be zero (which is the default if
3450 is given). For example, if f1.wav and f2.wav are audio files
3451 to be cross-faded, then
3453 sox f1.wav f2.wav out.wav splice \-q $(soxi \-D f1.wav),3
3455 cross-fades the files where the point of equal loudness is 3 seconds
3456 before the end of f1.wav, i.e. the total length of the cross-fade is
3457 2 \(mu 3 = 6 seconds (Note: the $(...) notation is POSIX shell).
3459 \fBstat\fR [\fB\-s \fIscale\fR] [\fB\-rms\fR] [\fB\-freq\fR] [\fB\-v\fR] [\fB\-d\fR]
3460 Display time and frequency domain statistical information about the audio.
3461 Audio is passed unmodified through the SoX processing chain.
3463 The information is output to the `standard error' (stderr) stream and is
3466 is the duration of the audio in samples,
3468 is the number of audio channels,
3470 is the audio sample rate, and
3472 represents the PCM value (in the range \-1 to +1 by default) of each successive
3473 sample in the audio,
3478 Samples read \fIn\fR\^\(mu\^\fIc\fR \
3479 Length (seconds) \fIn\fR\^\(di\^\fIr\fR
3480 Scaled by \ See \-s below.
3481 Maximum amplitude max(\fIx\s-2\dk\u\s0\fR) T{
3482 The maximum sample value in the audio; usually this will be a positive number.
3484 Minimum amplitude min(\fIx\s-2\dk\u\s0\fR) T{
3485 The minimum sample value in the audio; usually this will be a negative number.
3487 Midline amplitude \(12\^min(\fIx\s-2\dk\u\s0\fR)\^+\^\(12\^max(\fIx\s-2\dk\u\s0\fR)
3488 Mean norm \(S1/\s-2n\s+2\^\(*S\^\^\(br\^\fIx\s-2\dk\u\s0\fR\^\(br\^ T{
3489 The average of the absolute value of each sample in the audio.
3491 Mean amplitude \(S1/\s-2n\s+2\^\(*S\^\fIx\s-2\dk\u\s0\fR T{
3492 The average of each sample in the audio. If this figure is non-zero, then it indicates the
3493 presence of a D.C. offset (which could be removed using the
3497 RMS amplitude \(sr(\(S1/\s-2n\s+2\^\(*S\^\fIx\s-2\dk\u\s0\fR\(S2) T{
3498 The level of a D.C. signal that would have the same power
3499 as the audio's average power.
3501 Maximum delta max(\^\(br\^\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR\^\(br\^)
3502 Minimum delta min(\^\(br\^\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR\^\(br\^)
3503 Mean delta \(S1/\s-2n\-1\s+2\^\(*S\^\^\(br\^\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR\^\(br\^
3504 RMS delta \(sr(\(S1/\s-2n\-1\s+2\^\(*S\^(\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR)\(S2)
3505 Rough frequency \ In Hz.
3506 Volume Adjustment \ T{
3507 The parameter to the
3509 effect which would make the audio as loud as possible without clipping.
3510 Note: See the discussion on
3512 above for reasons why it is rarely a good idea actually to do this.
3517 Note that the delta measurements are not applicable for multi-channel audio.
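For a single channel, the formulas in the table above translate directly into code. The Python sketch below is an illustration with a hypothetical name. It omits `Rough frequency', because this page does not give its method. The `Volume adjustment' entry assumes the factor that brings the larger of the two peaks to full scale.

```python
import math


def stat_summary(x, rate):
    """Single-channel versions of the stat measurements tabulated above.

    x holds PCM values in [-1, 1]; rate is the sample rate in Hz.
    """
    n = len(x)
    hi, lo = max(x), min(x)
    deltas = [abs(b - a) for a, b in zip(x, x[1:])]
    return {
        "Length (seconds)": n / rate,
        "Maximum amplitude": hi,
        "Minimum amplitude": lo,
        "Midline amplitude": (lo + hi) / 2,
        "Mean norm": sum(abs(v) for v in x) / n,
        "Mean amplitude": sum(x) / n,
        "RMS amplitude": math.sqrt(sum(v * v for v in x) / n),
        "Maximum delta": max(deltas),
        "Minimum delta": min(deltas),
        "Mean delta": sum(deltas) / (n - 1),
        "RMS delta": math.sqrt(sum(d * d for d in deltas) / (n - 1)),
        # Assumed: the vol factor that brings the larger peak to full scale.
        "Volume adjustment": 1 / max(abs(lo), abs(hi)),
    }
```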
3521 option can be used to scale the input data by a given factor.
3522 The default value of
3524 is 2147483647 (i.e. the maximum value of a 32-bit signed integer).
3526 always work with signed long PCM data and so the value should relate to this
3531 option will convert all output average values to `root mean square'
3536 option displays only the `Volume Adjustment' value.
3540 option calculates the input's power spectrum (4096 point DFT) instead of the
3541 statistics listed above. This should only be used with a single channel
3547 displays a hex dump of the 32-bit signed PCM audio data
3548 in SoX's internal buffer.
3549 This is mainly used to help track down endian problems that
3550 sometimes occur in cross-platform versions of SoX.
3556 \fBstats\fR [\fB\-b \fIbits\fR\^|\^\fB\-x \fIbits\fR\^|\^\fB\-s \fIscale\fR] [\fB\-w \fIwindow-time\fR]
3557 Display time domain statistical information about the audio channels;
3558 audio is passed unmodified through the SoX processing chain.
3559 Statistics are calculated and displayed for each audio channel and,
3560 where applicable, an overall figure is also given.
3562 For example, for a typical well-mastered stereo music file:
3568 DC offset 0.000803 \-0.000391 0.000803
3569 Min level \-0.750977 \-0.750977 \-0.653412
3570 Max level 0.708801 0.708801 0.653534
3571 Pk lev dB \-2.49 \-2.49 \-3.69
3572 RMS lev dB \-19.41 \-19.13 \-19.71
3573 RMS Pk dB \-13.82 \-13.82 \-14.38
3574 RMS Tr dB \-85.25 \-85.25 \-82.66
3575 Crest factor \- 6.79 6.32
3576 Flat factor 0.00 0.00 0.00
3578 Bit-depth 16/16 16/16 16/16
3591 are shown, by default, in the range \(+-1.
3594 (bits) options is given, then these three measurements will be scaled to a signed integer
3595 with the given number of bits; for example, for 16 bits, the scale would be \-32768 to +32767.
3598 option behaves the same way as
3600 except that the signed integer values are displayed in hexadecimal.
3603 option scales the three measurements by a given floating-point number.
3608 are standard peak and RMS level measured in dBFS.
3612 are peak and trough values for RMS level measured over a short window (default 50ms).
3615 is the standard ratio of peak to RMS level (note: not in dB).
3618 is a measure of the flatness (i.e. consecutive samples with the same value) of the signal at
3619 its peak levels (i.e. either
3624 is the number of occasions (not the number of samples) that the signal attained either
3631 figure is the standard definition of bit-depth i.e. bits less
3632 significant than the given number are fixed at zero. The left-hand
3633 figure is the number of most significant bits that are fixed at zero (or
3634 one for negative numbers) subtracted from the right-hand figure (the
3635 number subtracted is directly related to
3638 For multi-channel audio, an overall figure for each of the above
3639 measurements is given and derived from the channel figures as follows:
3658 is the duration in seconds of the audio, and
3660 is equal to the sample-rate multiplied by
3663 is the scaling applied to the first three measurements;
3664 specifically, it is the maximum value that could apply to
3667 is the length of the window used for the peak and trough RMS measurements.
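The level figures above can be approximated for one channel as follows. This Python sketch is not the \fBstats\fR implementation, and its function names are hypothetical. The 50ms window used for RMS Pk and RMS Tr is assumed to slide one sample at a time. With the example above, a peak of 0\*d750977 gives 20 log10(0\*d750977) \(~= \-2\*d49dB, which is the `Pk lev dB' figure shown.

```python
import math


def db(value):
    """Amplitude ratio to dB; silence maps to -inf."""
    return 20 * math.log10(value) if value > 0 else float("-inf")


def stats_channel(x, rate, window_s=0.05):
    """Per-channel level figures in the style of the stats effect (sketch).

    x holds PCM values in [-1, 1]. RMS Pk/Tr use a window sliding one
    sample at a time; that windowing scheme is an assumption.
    """
    n = len(x)
    peak = max(abs(min(x)), abs(max(x)))
    rms = math.sqrt(sum(v * v for v in x) / n)
    w = min(n, max(1, round(window_s * rate)))
    prefix = [0.0]                         # running sums of squares
    for v in x:
        prefix.append(prefix[-1] + v * v)
    window_rms = [math.sqrt(max(0.0, prefix[i + w] - prefix[i]) / w)
                  for i in range(n - w + 1)]
    return {
        "DC offset": sum(x) / n,
        "Pk lev dB": db(peak),
        "RMS lev dB": db(rms),
        "RMS Pk dB": db(max(window_rms)),
        "RMS Tr dB": db(min(window_rms)),
        "Crest factor": peak / rms if rms > 0 else float("inf"),
    }
```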
3674 Swap stereo channels.
3677 for an effect that allows arbitrary channel selection and ordering
3680 \fBstretch \fIfactor\fR [\fIwindow fade shift fading\fR]
3681 Change the audio duration (but not its pitch).
3682 This effect is broadly equivalent to the
3684 effect with (\fIfactor\fR inverted and)
3686 set to zero, so in general, its results are comparatively poor;
3687 it is retained as it can sometimes out-perform
3693 of stretching: >1 lengthen, <1 shorten duration.
3695 size is in ms. Default is 20ms. The
3697 option, can be `lin'.
3699 ratio, in [0 1]. Default depends on stretch factor. 1
3700 to shorten, 0\*d8 to lengthen. The
3702 ratio, in [0 0\*d5]. The amount of a fade's default depends on
3711 \fBsynth\fR [\fB\-j \fIKEY\fR] [\fB\-n\fR] [\fIlen\fR [\fIoff\fR [\fIph\fR [\fIp1\fR [\fIp2\fR [\fIp3\fR]]]]]] {[\fItype\fR] [\fIcombine\fR] \:[[\fB%\fR]\fIfreq\fR[\fBk\fR][\fB:\fR\^|\^\fB+\fR\^|\^\fB/\fR\^|\^\fB\-\fR[\fB%\fR]\fIfreq2\fR[\fBk\fR]]] [\fIoff\fR [\fIph\fR [\fIp1\fR [\fIp2\fR [\fIp3\fR]]]]]}
3713 This effect can be used to generate fixed or swept frequency audio tones
3714 with various wave shapes, or to generate wide-band noise of various
3716 Multiple synth effects can be cascaded to produce more complex
3717 waveforms; at each stage it is possible to choose whether the generated
3718 waveform will be mixed with, or modulated onto
3719 the output from the previous stage.
3720 Audio for each channel in a multi-channel audio file can be synthesised
3723 Though this effect is used to generate audio, an input file must still
3724 be given, the characteristics of which will be used to set the
3725 synthesised audio length, the number of channels, and the sampling rate;
3726 however, since the input file's audio is not normally needed, a `null
3727 file' (with the special name \fB\-n\fR) is often given instead (and the
3728 length specified as a parameter to \fBsynth\fR or by another given
3729 effect that has an associated length).
3731 For example, the following produces a 3 second, 48kHz,
3732 audio file containing a sine-wave swept from 300 to 3300\ Hz:
3734 sox \-n output.wav synth 3 sine 300\-3300
3736 and this produces an 8\ kHz version:
3738 sox \-r 8000 \-n output.wav synth 3 sine 300\-3300
3740 Multiple channels can be synthesised by specifying the set of
3741 parameters shown between braces multiple times;
3742 the following puts the swept tone in the left channel and adds `brown'
3745 sox \-n output.wav synth 3 sine 300\-3300 brownnoise
3747 The following example shows how two synth effects can be cascaded
3748 to create a more complex waveform:
3751 play \-n synth 0.5 sine 200\-500 synth 0.5 sine fmod 700\-100
3753 Frequencies can also be given in `scientific' note notation, or, by
3754 prefixing a `%' character, as a number of semitones relative to
3755 `middle A' (440\ Hz). For example, the following could be used to
3756 help tune a guitar's low `E' string:
3758 play \-n synth 4 pluck %\-29
3760 or with a (Bourne shell) loop, the whole guitar:
3763 for n in E2 A2 D3 G3 B3 E4; do
3764 play \-n synth 4 pluck $n repeat 2; done
3768 effect (above) and the reference to `SoX scripting examples' (below)
3774 This effect generates audio at maximum volume (0\ dBFS), which means that there
3775 is a high chance of clipping when the audio is processed further, so
3776 in many cases you will want to follow this effect with the \fBgain\fR
3777 effect to prevent this from happening. (See also
3780 Note that, by default, the
3782 effect incorporates the functionality of \fBgain \-h\fR (see the
3784 effect for details);
3787 option may be given to disable this behaviour.
3789 A detailed description of each
3793 \fIlen\fR is the length of audio to synthesise expressed as a time
3794 or as a number of samples;
3795 0=inputlength, default=0.
3797 The format for specifying lengths in time is hh:mm:ss.frac. The format
3798 for specifying sample counts is the number of samples with the letter
3801 \fItype\fR is one of sine, square, triangle, sawtooth, trapezium, exp,
3802 [white]noise, tpdfnoise, pinknoise, brownnoise, pluck; default=sine.
3804 \fIcombine\fR is one of create, mix, amod (amplitude modulation), fmod
3805 (frequency modulation); default=create.
3807 \fIfreq\fR/\fIfreq2\fR are the frequencies at the beginning/end of
3808 synthesis in Hz or, if preceded with `%', semitones relative to A
3809 (440\ Hz); alternatively, `scientific' note notation (e.g. E2) may
3810 be used. The default frequency is 440\ Hz. By default, the tuning used
3811 with the note notations is `equal temperament'; the
3814 option selects `just intonation', where
3816 is an integer number of semitones relative to A (so for example, \-9
3817 or 3 selects the key of C), or a note in scientific notation.
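The frequency notations above can be illustrated with a short Python sketch. It assumes equal temperament with A4 = 440\ Hz; the helper names are hypothetical, and the sharp/flat spelling it accepts (e.g. Eb3) is an assumption of the sketch, not documented SoX syntax. Just intonation (\fB\-j\fR) is not modelled.

```python
import re

# Equal-temperament sketch of synth's frequency notations, assuming
# A4 = 440 Hz. Helper names are hypothetical; SoX's own parser may differ.

A4_HZ = 440.0

# Semitone offset of each note letter from A within the same octave number
# (scientific pitch notation starts each octave at C).
LETTER_OFFSETS = {"C": -9, "D": -7, "E": -5, "F": -4, "G": -2, "A": 0, "B": 2}

# Letter, optional accidental (assumed syntax), octave number.
NOTE_RE = re.compile("([A-G])([#b]?)(-?[0-9]+)")


def semitones_to_hz(semitones: float) -> float:
    """Frequency that lies `semitones` equal-tempered steps from A4."""
    return A4_HZ * 2.0 ** (semitones / 12.0)


def note_to_semitones(note: str) -> int:
    """Semitones from A4 for a note such as "E2" or "C4"."""
    m = NOTE_RE.fullmatch(note)
    if m is None:
        raise ValueError(f"not a note: {note!r}")
    letter, accidental, octave = m.groups()
    shift = {"#": 1, "b": -1, "": 0}[accidental]
    return LETTER_OFFSETS[letter] + shift + 12 * (int(octave) - 4)


def parse_freq(spec: str) -> float:
    """Interpret "%N" (semitones from A), a note name, or Hz with optional k."""
    if spec.startswith("%"):
        return semitones_to_hz(float(spec[1:]))
    if spec[:1] in LETTER_OFFSETS:
        return semitones_to_hz(note_to_semitones(spec))
    if spec.endswith("k"):
        return float(spec[:-1]) * 1000.0
    return float(spec)
```

Both parse_freq("%\-29") and parse_freq("E2") give about 82.41\ Hz, which is why %\-29 suits a guitar's low `E' string.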
3823 must also have been given and the generated tone will be swept between
3824 the given frequencies. The two given frequencies must be separated by
3825 one of the characters `:', `+', `/', or `\-'. This character is used to
3826 specify the sweep function as follows:
3829 Linear: the tone will change by a fixed number of hertz per second.
3831 Square: a second-order function is used to change the tone.
3833 Exponential: the tone will change by a fixed number of semitones per second.
3835 Exponential: as `/', but initial phase always zero, and stepped (less
3836 smooth) frequency changes.
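The linear and exponential laws can be compared with a small Python sketch (hypothetical function names) that gives the instantaneous frequency at time \fIt\fR of a sweep lasting \fIduration\fR seconds. The `+' (square) law is omitted because its exact second-order formula is not given here, and the stepping of the `\-' form is not modelled.

```python
import math

# Instantaneous frequency of a swept tone for the `:' (linear) and
# `/' (exponential) laws. Function names are hypothetical.


def linear_sweep(f1: float, f2: float, t: float, duration: float) -> float:
    """Changes by a fixed number of Hz per second."""
    return f1 + (f2 - f1) * t / duration


def exponential_sweep(f1: float, f2: float, t: float, duration: float) -> float:
    """Changes by a fixed number of semitones per second."""
    return f1 * (f2 / f1) ** (t / duration)


def semitones_per_second(f1: float, f2: float, duration: float) -> float:
    """The constant rate of an exponential sweep, in semitones per second."""
    return 12.0 * math.log2(f2 / f1) / duration
```

For the 3-second, 300 to 3300\ Hz sweeps in the examples above (which use the exponential `\-' form), a linear law would pass 1800\ Hz at 1.5\ s, whereas the exponential law passes about 995\ Hz (the geometric mean), rising at a steady 13.8 semitones per second.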
3842 \fIoff\fR is the bias (DC-offset) of the signal in percent; default=0.
3844 \fIph\fR is the phase shift in percentage of 1 cycle; default=0. Not
3847 \fIp1\fR is the percentage of each cycle that is `on' (square), or
3848 `rising' (triangle, exp, trapezium); default=50 (square, triangle, exp),
3849 default=10 (trapezium), or sustain (pluck); default=40.
3851 \fIp2\fR (trapezium): the percentage through each cycle at which `falling'
3852 begins; default=50. exp: the amplitude in multiples of 2dB; default=50,
3853 or tone-1 (pluck); default=20.
3855 \fIp3\fR (trapezium): the percentage through each cycle at which `falling'
3856 ends; default=60, or tone-2 (pluck); default=90.
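This sketch shows how \fIp1\fR, \fIp2\fR and \fIp3\fR shape the square and trapezium waves, using the defaults given above. It assumes that phase is a fraction of one cycle (0 up to, but not including, 1) and that the output runs from \-1 (low) to +1 (high); SoX's internal scaling, \fIoff\fR and \fIph\fR are not modelled, and the function names are hypothetical.

```python
# Square and trapezium shapes as parameterised by p1, p2 and p3 (percent).
# Assumed: phase is a fraction of a cycle; output ranges from -1 to +1.


def square(phase: float, p1: float = 50.0) -> float:
    """High for the first p1 percent of each cycle, low for the rest."""
    return 1.0 if phase % 1.0 < p1 / 100.0 else -1.0


def trapezium(phase: float, p1: float = 10.0, p2: float = 50.0,
              p3: float = 60.0) -> float:
    """Rise until p1, stay high until p2, fall until p3, then stay low."""
    x = phase % 1.0
    rise_end, fall_start, fall_end = p1 / 100.0, p2 / 100.0, p3 / 100.0
    if x < rise_end:
        return -1.0 + 2.0 * x / rise_end
    if x < fall_start:
        return 1.0
    if x < fall_end:
        return 1.0 - 2.0 * (x - fall_start) / (fall_end - fall_start)
    return -1.0
```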
3858 \fBtempo \fR[\fB\-q\fR] [\fB\-m\fR\^|\^\fB\-s\fR\^|\^\fB\-l\fR] \fIfactor\fR [\fIsegment\fR [\fIsearch\fR [\fIoverlap\fR]]]
3859 Change the audio playback speed but not its pitch. This effect uses the
3860 WSOLA algorithm. The audio is chopped up into segments which are then
3861 shifted in the time domain and overlapped (cross-faded) at points where
3862 their waveforms are most similar as determined by measurement of `least
3865 By default, linear searches are used to find the best overlapping
3866 points. If the optional
3868 parameter is given, tree searches are used instead. This makes the effect
3869 work more quickly, but the result may not sound as good. However, if you
3870 must improve the processing speed, this generally reduces the sound quality
3871 less than reducing the search or overlap values.
3875 option is used to optimize default values of segment, search and
3876 overlap for music processing.
3880 option is used to optimize default values of segment, search and
3881 overlap for speech processing.
3885 option is used to optimize default values of segment, search and
3886 overlap for `linear' processing that tends to cause more
3887 noticeable distortion but may be useful when factor is close to 1.
3889 If \-m, \-s, or \-l is specified, the default value of segment will be
3890 calculated based on factor, while default search and overlap values are
3891 based on segment. Any values you provide still override these default
3895 gives the ratio of the new tempo to the old tempo, so e.g. 1.1 speeds up the
3896 tempo by 10%, and 0.9 slows it down by 10%.
3900 parameter selects the algorithm's segment size in milliseconds. If no other
3901 flags are specified, the default value is 82 and is typically suited to
3902 making small changes to the tempo of music. For larger changes (e.g. a factor
3903 of 2), 41\ ms may give a better result. The \-m, \-s, and \-l flags will cause
3904 the segment default to be automatically adjusted based on factor.
3905 For example, using \-s (for speech) with a tempo of 1.25 will calculate a
3906 default segment value of 32.
3910 parameter gives the audio length in milliseconds over which
3911 the algorithm will search for overlapping points. If no other
3912 flags are specified, the default value is 14.68. Larger values use
3913 more processing time and may or may not produce better results.
3914 A practical maximum is half the value of segment. Search
3915 can be reduced to cut processing time at the risk of degrading output
3916 quality. The \-m, \-s, and \-l flags will cause
3917 the search default to be automatically adjusted based on segment.
3921 parameter gives the segment overlap length in milliseconds.
3922 The default value is 12, but the \-m, \-s, or \-l flags automatically
3923 adjust overlap based on segment size. Increasing overlap increases
3924 processing time and may increase quality. A practical maximum for overlap
3925 is the value of search, with overlap typically being (at least) a little
3926 smaller than search.
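The following much-simplified WSOLA sketch is intended only to show how \fIsegment\fR, \fIsearch\fR and \fIoverlap\fR interact; it is not SoX's implementation. Assumptions: mono audio as a list of floats, parameters in milliseconds with the defaults quoted above, an exhaustive least-squares search over the search window (the linear search described above), and a linear cross-fade. The function name is hypothetical.

```python
import math

# Simplified WSOLA time stretch (hypothetical sketch, not SoX's code).


def wsola(x, rate, factor, segment_ms=82.0, search_ms=14.68, overlap_ms=12.0):
    segment = round(segment_ms * rate / 1000)
    search = round(search_ms * rate / 1000)
    overlap = round(overlap_ms * rate / 1000)
    if not 0 < overlap < segment:
        raise ValueError("overlap must be shorter than segment")
    hop_out = segment - overlap  # output grows by this much per segment
    out = list(x[:segment])
    n = 1
    while True:
        # Nominal input position: input is consumed `factor` times faster.
        nominal = round(n * hop_out * factor)
        if nominal + search + segment > len(x):
            break
        tail = out[-overlap:]  # end of the output so far, to be matched
        best, best_err = 0, math.inf
        for off in range(search + 1):  # least-squares search
            start = nominal + off
            err = sum((x[start + i] - tail[i]) ** 2 for i in range(overlap))
            if err < best_err:
                best, best_err = off, err
        seg = x[nominal + best : nominal + best + segment]
        # Cross-fade the overlap region, then append the rest of the segment.
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)
            out[len(out) - overlap + i] = tail[i] * (1 - w) + seg[i] * w
        out.extend(seg[overlap:])
        n += 1
    return out
```

With a factor of 2 the output is roughly half as long as the input, and with 0.5 roughly twice as long. The nested search loop is where the cost grows with \fIsearch\fR and \fIoverlap\fR, which is consistent with reducing them to save processing time.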
3930 for an effect that changes tempo and pitch together,
3932 and \fBbend\fR for effects that change pitch only, and
3934 for an effect that changes tempo using a different algorithm.
3936 \fBtreble \fIgain\fR [\fIfrequency\fR[\fBk\fR]\fR [\fIwidth\fR[\fBs\fR\^|\^\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\fBq\fR]]]
3937 Apply a treble tone-control effect.
3938 See the description of the \fBbass\fR effect for details.
3940 \fBtremolo \fIspeed\fR [\fIdepth\fR]
3941 Apply a tremolo (low frequency amplitude modulation) effect to the audio.
3942 The tremolo frequency in Hz is given by
3944 and the depth as a percentage by
3948 \fBtrim\fR {[\fB=\fR\^|\^\fB\-\fR]\fIposition\fR}
3949 Cuts portions out of the audio. Any number of \fIposition\fRs may be
3950 given; audio is not sent to the output until the first \fIposition\fR
3951 is reached. The effect then alternates between copying and discarding
3952 audio at each \fIposition\fR.
3954 If a \fIposition\fR is preceded by an equals or minus sign, it is
3955 interpreted relative to the beginning or the end of the audio,
3956 respectively. (The audio length must be known for end-relative
3957 locations to work.) Otherwise, it is considered an offset from the
3958 last \fIposition\fR, or from the start of audio for the first
3959 parameter. Using a value of 0 for the first \fIposition\fR
3960 parameter allows copying from the beginning of the audio.
3962 All parameters can be specified using either an amount of time or an
3963 exact count of samples. The format for specifying lengths in time is
3964 hh:mm:ss.frac. A value of 1:30\*d5 for the first parameter will not
3965 start until 1 minute and 30\(12 seconds into the audio. The format
3966 for specifying sample counts is the number of samples with the letter `s'
3967 appended to it. A value of 8000s for the first parameter will wait until
3968 8000 samples are read before starting to process audio.
3972 sox infile outfile trim 0 10
3974 will copy the first ten seconds, while
3976 play infile trim 12:34 =15:00 \-2:00
3978 will play from 12 minutes 34 seconds into the audio up to 15 minutes into
3979 the audio (i.e. 2 minutes and 26 seconds long), then resume playing two
3980 minutes before the end of audio.
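To make the position rules concrete, here is a Python sketch (hypothetical function names, audio length assumed known) that resolves \fBtrim\fR positions into the sample ranges that are copied. Rejecting positions that move backwards is an assumption of the sketch; how SoX reports that case is not described above.

```python
# Resolve trim positions into copied [start, end) sample ranges.
# Hypothetical sketch; the audio length must be known.


def to_samples(spec: str, rate: float) -> int:
    """Convert "1:30.5" (hh:mm:ss.frac) or "8000s" (samples) to samples."""
    if spec.endswith("s"):
        return int(spec[:-1])
    seconds = 0.0
    for part in spec.split(":"):
        seconds = seconds * 60 + float(part)
    return round(seconds * rate)


def trim_ranges(positions, rate, length):
    points, last = [], 0
    for pos in positions:
        if pos.startswith("="):    # relative to the start of the audio
            p = to_samples(pos[1:], rate)
        elif pos.startswith("-"):  # relative to the end of the audio
            p = length - to_samples(pos[1:], rate)
        else:                      # relative to the previous position
            p = last + to_samples(pos, rate)
        if p < last:               # assumed: positions may not go backwards
            raise ValueError(f"position {pos} is before the previous one")
        points.append(p)
        last = p
    if len(points) % 2:            # odd count: copy through to the end
        points.append(length)
    # Copying starts at even-numbered points and stops at odd-numbered ones.
    return list(zip(points[0::2], points[1::2]))
```

For a 20-minute input, trim_ranges(["12:34", "=15:00", "\-2:00"], rate, length) yields a first range of 146 seconds (2 minutes 26 seconds) followed by the final two minutes, matching the \fBplay\fR example above.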
3982 \fBupsample\fR [\fIfactor\fR]
3983 Upsample the signal by an integer factor: \fIfactor\fR\-1 zero-value
3984 samples are inserted between each pair of input samples. As a result, the
3985 original spectrum is replicated into the new frequency space (aliasing) and
3986 attenuated. This attenuation can be compensated for by adding
3987 \fBvol \fIfactor\fR after any further processing. The upsample effect is
3988 typically used in combination with filtering effects.
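A Python sketch of the zero-insertion step (hypothetical function name). The description above speaks of zeros between each pair of input samples; this sketch assumes zeros also follow the last sample, so the output is exactly \fIfactor\fR times as long as the input.

```python
# Zero-insertion upsampling: each input sample is followed by factor-1 zeros.


def upsample(samples, factor: int):
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for s in samples:
        out.append(s)
        out.extend([0] * (factor - 1))
    return out
```

The attenuation is easy to see in the mean level: for upsample([1.0] * 4, 2) it is 0.5, i.e. reduced by the factor, and scaling by 2 afterwards, as \fBvol 2\fR would, restores it.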
3990 For a general resampling effect with anti-aliasing, see \fBrate\fR. See
3991 also \fBdownsample\fR.
3993 \fBvad \fR[\fIoptions\fR]
3994 Voice Activity Detector. Attempts to trim silence and quiet
3995 background sounds from the ends of (fairly high resolution
3996 i.e. 16-bit, 44\-48kHz) recordings of speech. The algorithm currently
3997 uses a simple cepstral power measurement to detect voice, so may be
3998 fooled by other things, especially music. The effect can trim only
3999 from the front of the audio, so in order to trim from the back, the
4001 effect must also be used. E.g.
4003 play speech.wav norm vad
4005 to trim from the front,
4007 play speech.wav norm reverse vad reverse
4009 to trim from the back, and
4011 play speech.wav norm vad reverse vad reverse
4013 to trim from both ends. The use of the
4015 effect is recommended, but remember that neither
4019 is suitable for use with streamed audio.
4023 Default values are shown in parentheses.
4025 .IP \fB\-t\ \fInum\fR\ (7)
4026 The measurement level used to trigger activity detection. This might
4027 need to be changed depending on the noise level, signal level and
4028 other characteristics of the input audio.
4029 .IP \fB\-T\ \fInum\fR\ (0.25)
4030 The time constant (in seconds) used to help ignore short bursts of
4032 .IP \fB\-s\ \fInum\fR\ (1)
4033 The amount of audio (in seconds) to search for quieter/shorter bursts
4034 of audio to include prior to the detected trigger point.
4035 .IP \fB\-g\ \fInum\fR\ (0.25)
4036 Allowed gap (in seconds) between quieter/shorter bursts of audio to
4037 include prior to the detected trigger point.
4038 .IP \fB\-p\ \fInum\fR\ (0)
4039 The amount of audio (in seconds) to preserve before the trigger point
4040 and any found quieter/shorter bursts.
4044 .I Advanced Options:
4046 These allow fine tuning of the algorithm's internal parameters.
4048 .IP \fB\-b\ \fInum\fR
4049 The algorithm (internally) uses adaptive noise estimation/reduction in
4050 order to detect the start of the wanted audio. This option sets the
4051 time for the initial noise estimate.
4052 .IP \fB\-N\ \fInum\fR
4053 Time constant used by the adaptive noise estimator for when the noise
4054 level is increasing.
4055 .IP \fB\-n\ \fInum\fR
4056 Time constant used by the adaptive noise estimator for when the noise
4057 level is decreasing.
4058 .IP \fB\-r\ \fInum\fR
4059 Amount of noise reduction to use in the detection algorithm (e.g. 0,
4061 .IP \fB\-f\ \fInum\fR
4062 Frequency of the algorithm's processing/measurements.
4063 .IP \fB\-m\ \fInum\fR
4064 Measurement duration; by default, twice the measurement period; i.e.
4066 .IP \fB\-M\ \fInum\fR
4067 Time constant used to smooth spectral measurements.
4068 .IP \fB\-h\ \fInum\fR
4069 `Brick-wall' frequency of high-pass filter applied at the input to the
4071 .IP \fB\-l\ \fInum\fR
4072 `Brick-wall' frequency of low-pass filter applied at the input to the
4074 .IP \fB\-H\ \fInum\fR
4075 `Brick-wall' frequency of high-pass lifter used in the detector
4077 .IP \fB\-L\ \fInum\fR
4078 `Brick-wall' frequency of low-pass lifter used in the detector
4087 \fBvol \fIgain\fR [\fItype\fR [\fIlimitergain\fR]]
4088 Apply an amplification or an attenuation to the audio signal.
4091 option (which is used for balancing multiple input files as they enter the
4092 SoX effects processing chain),
4094 is an effect like any other so can be applied anywhere, and several times
4095 if necessary, during the processing chain.
4097 The amount to change the volume is given by
4099 which is interpreted, according to the given \fItype\fR, as follows: if
4101 is \fBamplitude\fR (or is omitted), then
4103 is an amplitude (i.e. voltage or linear) ratio,
4104 if \fBpower\fR, then a power (i.e. wattage or voltage-squared) ratio,
4105 and if \fBdB\fR, then a power change in dB.
4109 is \fBamplitude\fR or \fBpower\fR, a
4111 of 1 leaves the volume unchanged,
4112 less than 1 decreases it,
4113 and greater than 1 increases it;
4116 inverts the audio signal in addition to adjusting its volume.
4122 of 0 leaves the volume unchanged,
4123 less than 0 decreases it,
4124 and greater than 0 increases it.
4127 for a detailed discussion on electrical (and hence audio signal)
4128 voltage and power ratios.
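The three interpretations of \fIgain\fR can be summarised as the linear multiplier applied to each sample. In this Python sketch (hypothetical function name), amplitude ratios are used directly, power ratios are square-rooted, and dB values are converted with 10^(gain/20); treating a negative power ratio as an inversion is an assumption, not documented behaviour.

```python
import math

# Linear sample multiplier for vol's gain and type (hypothetical sketch).


def vol_multiplier(gain: float, kind: str = "amplitude") -> float:
    if kind == "amplitude":
        return gain  # negative values also invert the signal
    if kind == "power":
        # Assumed: the sign of a power ratio carries over as an inversion.
        return math.copysign(math.sqrt(abs(gain)), gain)
    if kind == "dB":
        return 10.0 ** (gain / 20.0)  # power change in dB -> amplitude ratio
    raise ValueError(f"unknown type: {kind}")
```

So \fBvol 4 power\fR and \fBvol 6.0206 dB\fR both (approximately) double the amplitude, as \fBvol 2\fR does.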
4132 when increasing the volume.
4138 parameters can be concatenated if desired, e.g.
4141 An optional \fIlimitergain\fR value can be specified and should be a
4143 than 1 (e.g. 0\*d05 or 0\*d02) and is used only on peaks to prevent clipping.
4144 If this parameter is omitted, no limiter is used. In verbose
4145 mode, this effect will display the percentage of the audio that needed to be
4150 for a volume-changing effect with different capabilities, and
4152 for a dynamic-range compression/expansion/limiting effect.
4153 .SS Deprecated Effects
4154 The following effects have been renamed or have their functionality
4155 included in another effect; they continue to work in this version of
4156 SoX but may be removed in future.
4158 \fBmixer\fR [ \fB\-l\fR\^|\^\fB\-r\fR\^|\^\fB\-f\fR\^|\^\fB\-b\fR\^|\^\fB\-1\fR\^|\^\fB\-2\fR\^|\^\fB\-3\fR\^|\^\fB\-4\fR\^|\^\fIn\fR{\fB,\fIn\fR} ]
4159 Reduce the number of audio channels by mixing or selecting channels,
4160 or increase the number of channels by duplicating channels.
4161 Note: this effect operates on the audio
4163 within the SoX effects processing chain; it should not be confused with the
4165 global option (where multiple
4167 are mix-combined before entering the effects chain).
4169 When reducing the number of channels it is possible to
4170 use the \fB\-l\fR, \fB\-r\fR, \fB\-f\fR, \fB\-b\fR,
4171 \fB\-1\fR, \fB\-2\fR, \fB\-3\fR, \fB\-4\fR options to select only
4172 the left, right, front, or back channel(s), or a specific channel,
4173 for the output instead of averaging the channels.
4174 In quad-channel files, the \fB\-l\fR and \fB\-r\fR options still perform
4175 averaging, so select the exact channel to prevent this.
4179 effect can also be invoked with up to 16
4180 numbers, separated by commas, which specify the proportion (0 = 0% and 1 = 100%)
4181 of each input channel that is to be mixed into each output channel.
4182 In two-channel mode, 4 numbers are given: l \*(RA l, l \*(RA r, r \*(RA l, and r \*(RA r,
4184 In four-channel mode, the first 4 numbers give the proportions for the
4185 left-front output channel, as follows: lf \*(RA lf, rf \*(RA lf, lb \*(RA lf, and
4187 The next 4 give the right-front output in the same order, then
4188 left-back and right-back.
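The number orderings above differ between modes: in two-channel mode the numbers are grouped by input channel, in four-channel mode by output channel. This Python sketch (hypothetical function name) applies either form to one frame of samples, assuming the quad channel order lf, rf, lb, rb.

```python
# Apply mixer's 4-number (stereo) or 16-number (quad) form to one frame.
# Stereo order:  l->l, l->r, r->l, r->r         (grouped by input channel).
# Quad order:    lf->lf, rf->lf, lb->lf, rb->lf,
#                then right-front, left-back, right-back (grouped by output).


def mix_frame(frame, coeffs):
    n = len(frame)
    if n not in (2, 4) or len(coeffs) != n * n:
        raise ValueError("expected 4 numbers for stereo or 16 for quad")
    out = []
    for o in range(n):
        if n == 2:  # coeffs[input * 2 + output]
            out.append(sum(frame[i] * coeffs[i * 2 + o] for i in range(2)))
        else:       # coeffs[output * 4 + input]
            out.append(sum(frame[i] * coeffs[o * 4 + i] for i in range(4)))
    return out
```

For example, mix_frame([l, r], [0, 1, 1, 0]) swaps the two channels.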
4190 It is also possible to use the 16 numbers to expand or reduce the
4191 channel count; just specify 0 for unused channels.
4193 Finally, certain reduced combinations of numbers can be specified
4194 for certain input/output channel combinations.
4200 In Ch Out Ch Num Mappings
4201 2 1 2 l \*(RA l, r \*(RA l
4202 2 2 1 adjust balance
4203 4 1 4 lf \*(RA l, rf \*(RA l, lb \*(RA l, rb \*(RA l
4204 4 2 2 lf \*(RA l&rf \*(RA r, lb \*(RA l&rb \*(RA r
4205 4 4 1 adjust balance
4206 4 4 2 front balance, back balance
4210 This effect has been superseded by the
4212 effect that handles any number of channels.
4214 Exit status is 0 for no error, 1 if there is a problem with the
4215 command-line parameters, or 2 if an error occurs during file processing.
4217 Please report any bugs found in this version of SoX to the mailing list
4218 (sox-users@lists.sourceforge.net).
4229 The SoX web site at http://sox.sourceforge.net
4231 SoX scripting examples at http://sox.sourceforge.net/Docs/Scripts
4236 .IR "Cookbook formulae for audio EQ biquad filter coefficients" ,
4237 http://musicdsp.org/files/Audio-EQ-Cookbook.txt
4242 http://en.wikipedia.org/wiki/Q_factor
4246 .IR "Effects Explained" ,
4247 http://harmony-central.com/Effects/effects-explained.html
4252 http://en.wikipedia.org/wiki/Decibel
4256 .IR "Linux Audio Developer's Simple Plugin API" ,
4257 http://www.ladspa.org
4261 .IR "Computer Music Toolkit" ,
4262 http://www.ladspa.org/cmt
4266 .IR "LADSPA plugins" ,
4267 http://plugin.org.uk
4269 Copyright 1998\-2011 Chris Bagwell and SoX Contributors.
4271 Copyright 1991 Lance Norskog and Sundry Contributors.
4273 This program is free software; you can redistribute it and/or modify
4274 it under the terms of the GNU General Public License as published by
4275 the Free Software Foundation; either version 2, or (at your option)
4278 This program is distributed in the hope that it will be useful,
4279 but WITHOUT ANY WARRANTY; without even the implied warranty of
4280 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
4281 GNU General Public License for more details.
4283 Chris Bagwell (cbagwell@users.sourceforge.net).
4284 Other authors and contributors are listed in the ChangeLog file that
4285 is distributed with the source code.