1 <!DOCTYPE html PUBLIC
"-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
4 <meta http-equiv=
"Content-Type" content=
"text/html; charset=UTF-8">
5 <meta http-equiv=
"Content-Style-Type" content=
"text/css">
7 <meta name=
"Generator" content=
"Cocoa HTML Writer">
8 <meta name=
"CocoaVersion" content=
"949.43">
9 <style type=
"text/css">
10 p
.p1
{margin: 0.0px 0.0px 0.0px 0.0px; font: 18.0px Helvetica
}
11 p
.p2
{margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica
; min-height: 14.0px}
12 p
.p3
{margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica
}
13 p
.p4
{margin: 0.0px 0.0px 0.0px 0.0px; font: 9.0px Monaco
}
14 p
.p5
{margin: 0.0px 0.0px 0.0px 0.0px; font: 9.0px Monaco
; color: #ad140d}
15 p
.p6
{margin: 0.0px 0.0px 0.0px 0.0px; font: 9.0px Monaco
; color: #606060}
16 p
.p7
{margin: 0.0px 0.0px 0.0px 0.0px; font: 9.0px Monaco
; min-height: 12.0px}
17 p
.p8
{margin: 0.0px 0.0px 0.0px 0.0px; font: 9.0px Monaco
; color: #ad140d; min-height: 12.0px}
18 p
.p9
{margin: 0.0px 0.0px 0.0px 33.0px; font: 12.0px Arial
}
19 span
.s1
{color: #001bb9}
20 span
.s2
{color: #ad140d}
21 span
.s3
{color: #000000}
22 span
.s4
{color: #2c7014}
23 span
.s5
{text-decoration: underline
; color: #2c7014}
24 span
.Apple-tab-span
{white-space:pre
}
25 ul
.ul1
{list-style-type: disc
}
26 ul
.ul2
{list-style-type: circle
}
30 <p class=
"p1"><b>Onsets
<span class=
"Apple-tab-span"> </span><span class=
"Apple-tab-span"> </span>Onset detector
</b></p>
31 <p class=
"p2"><br></p>
32 <p class=
"p3"><span class=
"Apple-tab-span"> </span><b>Onsets.kr(chain, threshold, odftype)
</b></p>
33 <p class=
"p2"><br></p>
34 <p class=
"p3">An onset detector for musical audio signals - detects the beginning of notes/drumbeats/etc. Outputs a control-rate trigger signal which is
1 when an onset is detected, and
0 otherwise.
</p>
35 <p class=
"p2"><br></p>
36 <p class=
"p3"><b>chain
</b> - an
<a href=
"SC://FFT"><span class=
"s1">FFT
</span></a> chain
</p>
37 <p class=
"p3"><b>threshold
</b> - the detection threshold, typically between
0 and
1, although in rare cases you may find values outside this range useful
</p>
38 <p class=
"p3"><b>odftype
</b> - the function used to analyse the signal (options described below; OK to leave this at its default value)
</p>
39 <p class=
"p2"><br></p>
40 <p class=
"p3">For the FFT chain, you should typically use a frame size of
512 or
1024 (at
44.1 kHz sampling rate) and
50% hop size (which is the default setting in SC). For different sampling rates choose an FFT size to cover a similar time-span (around
10 to
20 ms).
</p>
41 <p class=
"p2"><br></p>
42 <p class=
"p3">The onset detection should work well for a general range of monophonic and polyphonic audio signals. The onset detection is purely based on signal analysis and does not make use of any
"top-down" inferences such as tempo.
</p>
43 <p class=
"p2"><br></p>
44 <p class=
"p2"><br></p>
45 <p class=
"p3"><b>Example
</b></p>
46 <p class=
"p2"><br></p>
48 <p class=
"p4">s.boot.doWhenBooted {
</p>
49 <p class=
"p5"><span class=
"Apple-tab-span"> </span>// Prepare the buffers
</p>
50 <p class=
"p4"><span class=
"s2"><span class=
"Apple-tab-span"> </span></span>b =
<span class=
"s1">Buffer
</span>.alloc(s,
512);
</p>
51 <p class=
"p5"><span class=
"s3"><span class=
"Apple-tab-span"> </span></span>// Feel free to load a more interesting clip!
</p>
52 <p class=
"p5"><span class=
"Apple-tab-span"> </span>// a11wlk01 is not an ideal example of musical onsets.
</p>
53 <p class=
"p6"><span class=
"s2"><span class=
"Apple-tab-span"> </span></span><span class=
"s3">d =
</span><span class=
"s1">Buffer
</span><span class=
"s3">.read(s,
</span>"sounds/a11wlk01.wav"<span class=
"s3">);
</span></p>
56 <p class=
"p7"><br></p>
57 <p class=
"p5">////////////////////////////////////////////////////////////////////////////////////////////////
</p>
58 <p class=
"p5">// Move the mouse to vary the threshold
</p>
60 <p class=
"p4">x = {
</p>
61 <p class=
"p4"><span class=
"Apple-tab-span"> </span><span class=
"s1">var
</span> sig, chain, onsets, pips;
</p>
62 <p class=
"p7"><span class=
"Apple-tab-span"> </span></p>
63 <p class=
"p5"><span class=
"s3"><span class=
"Apple-tab-span"> </span></span>// A simple generative signal
</p>
64 <p class=
"p4"><span class=
"Apple-tab-span"> </span>sig =
<span class=
"s1">LPF
</span>.ar(
<span class=
"s1">Pulse
</span>.ar(
<span class=
"s1">TIRand
</span>.kr(
63,
75,
<span class=
"s1">Impulse
</span>.kr(
2)).midicps),
<span class=
"s1">LFNoise2
</span>.kr(
0.5).exprange(
100,
10000)) *
<span class=
"s1">Saw
</span>.ar(
2).range(
0,
1);
</p>
65 <p class=
"p5"><span class=
"s3"><span class=
"Apple-tab-span"> </span></span>// or, uncomment this line if you want to play the buffer in
</p>
66 <p class=
"p5"><span class=
"s3"><span class=
"Apple-tab-span"> </span></span>//sig = PlayBuf.ar(
1, d, BufRateScale.kr(d), loop:
1);
</p>
67 <p class=
"p7"><span class=
"Apple-tab-span"> </span></p>
68 <p class=
"p4"><span class=
"Apple-tab-span"> </span>chain =
<span class=
"s1">FFT
</span>(b, sig);
</p>
69 <p class=
"p7"><span class=
"Apple-tab-span"> </span></p>
70 <p class=
"p4"><span class=
"Apple-tab-span"> </span>onsets =
<span class=
"s1">Onsets
</span>.kr(chain,
<span class=
"s1">MouseX
</span>.kr(
0,
1),
<span class=
"s4">\rcomplex
</span>);
</p>
71 <p class=
"p7"><span class=
"Apple-tab-span"> </span></p>
72 <p class=
"p5"><span class=
"s3"><span class=
"Apple-tab-span"> </span></span>// You'll hear percussive
"ticks" whenever an onset is detected
</p>
73 <p class=
"p4"><span class=
"Apple-tab-span"> </span>pips =
<span class=
"s1">WhiteNoise
</span>.ar(
<span class=
"s1">EnvGen
</span>.kr(
<span class=
"s1">Env
</span>.perc(
0.001,
0.1,
0.2), onsets));
</p>
74 <p class=
"p4"><span class=
"Apple-tab-span"> </span><span class=
"s1">Out
</span>.ar(
0,
<span class=
"s1">Pan2
</span>.ar(sig, -
0.75,
0.2) +
<span class=
"s1">Pan2
</span>.ar(pips,
0.75,
1));
</p>
75 <p class=
"p4">}.play;
</p>
77 <p class=
"p5"><span class=
"s3">x.free;
</span>// Free the synth
</p>
78 <p class=
"p7"><br></p>
79 <p class=
"p7"><br></p>
80 <p class=
"p7"><br></p>
81 <p class=
"p5">////////////////////////////////////////////////////////////////////////////////////////////////
</p>
82 <p class=
"p5">// Or we could expand this multichannel, run a series of different thresholds at the same time,
<span class=
"Apple-converted-space"> </span></p>
83 <p class=
"p5">// to sonify the effect of the threshold value.
</p>
84 <p class=
"p5">// A little hard to listen to at first: try and identify a pitch at which the best sort of
<span class=
"Apple-converted-space"> </span></p>
85 <p class=
"p5">// detection is happening.
</p>
86 <p class=
"p5">// You'll hear
"bobbling" at low pitches where the threshold is definitely too low.
</p>
87 <p class=
"p8"><br></p>
89 <p class=
"p4">var threshes = (
0.1,
0.2 ..
1);
</p>
90 <p class=
"p4">x = {
</p>
91 <p class=
"p4"><span class=
"Apple-tab-span"> </span><span class=
"s1">var
</span> sig, chain, onsets, pips;
</p>
92 <p class=
"p7"><span class=
"Apple-tab-span"> </span></p>
93 <p class=
"p5"><span class=
"s3"><span class=
"Apple-tab-span"> </span></span>// A simple generative signal
</p>
94 <p class=
"p4"><span class=
"Apple-tab-span"> </span>sig =
<span class=
"s1">LPF
</span>.ar(
<span class=
"s1">Pulse
</span>.ar(
<span class=
"s1">TIRand
</span>.kr(
63,
75,
<span class=
"s1">Impulse
</span>.kr(
2)).midicps),
<span class=
"s1">LFNoise2
</span>.kr(
0.5).exprange(
100,
10000)) *
<span class=
"s1">Saw
</span>.ar(
2).range(
0,
1);
</p>
95 <p class=
"p5"><span class=
"s3"><span class=
"Apple-tab-span"> </span></span>// or, uncomment this line if you want to play the buffer in
</p>
96 <p class=
"p5"><span class=
"s3"><span class=
"Apple-tab-span"> </span></span>//sig = PlayBuf.ar(
1, d, BufRateScale.kr(d), loop:
1);
</p>
97 <p class=
"p7"><span class=
"Apple-tab-span"> </span></p>
98 <p class=
"p4"><span class=
"Apple-tab-span"> </span>chain =
<span class=
"s1">FFT
</span>(b, sig);
</p>
99 <p class=
"p7"><span class=
"Apple-tab-span"> </span></p>
100 <p class=
"p4"><span class=
"Apple-tab-span"> </span>onsets =
<span class=
"s1">Onsets
</span>.kr(chain, threshes,
<span class=
"s4">\rcomplex
</span>);
</p>
101 <p class=
"p7"><span class=
"Apple-tab-span"> </span></p>
102 <p class=
"p5"><span class=
"s3"><span class=
"Apple-tab-span"> </span></span>// Generate pips at a variety of pitches
</p>
103 <p class=
"p4"><span class=
"Apple-tab-span"> </span>pips =
<span class=
"s1">SinOsc
</span>.ar((threshes).linexp(
0,
1,
440,
3520),
0,
<span class=
"s1">EnvGen
</span>.kr(
<span class=
"s1">Env
</span>.perc(
0.001,
0.1,
0.5), onsets)).mean;
</p>
104 <p class=
"p4"><span class=
"Apple-tab-span"> </span><span class=
"s1">Out
</span>.ar(
0,
<span class=
"s1">Pan2
</span>.ar(sig, -
0.75,
0.2) +
<span class=
"s1">Pan2
</span>.ar(pips,
0.75,
1));
</p>
105 <p class=
"p4">}.play;
</p>
107 <p class=
"p5"><span class=
"s3">x.free;
</span>// Free the synth
</p>
108 <p class=
"p7"><br></p>
109 <p class=
"p7"><br></p>
110 <p class=
"p5"><span class=
"s3">[b,d].do(
</span><span class=
"s1">_
</span><span class=
"s3">.free);
</span>// Free the buffers
</p>
111 <p class=
"p2"><br></p>
112 <p class=
"p2"><br></p>
113 <p class=
"p3">The
<b>type
</b> argument chooses which
<i>onset detection function
</i> is used. In many cases the default will be fine. The following choices are available:
</p>
114 <p class=
"p2"><br></p>
116 <li style=
"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica"><span class=
"s4">\power
</span><span class=
"Apple-converted-space"> </span>- generally OK, good for percussive input, and also very efficient
</li>
117 <li style=
"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica"><span class=
"s4">\magsum
</span><span class=
"Apple-converted-space"> </span>- generally OK, good for percussive input, and also very efficient
</li>
118 <li style=
"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica"><span class=
"s4">\complex
</span><span class=
"Apple-converted-space"> </span>- performs generally very well, but more CPU-intensive
</li>
119 <li style=
"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica"><span class=
"s4">\rcomplex
</span> - performs generally very well, and slightly more efficient than
<span class=
"s4">\complex
</span></li>
120 <li style=
"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica"><span class=
"s4">\phase
</span> <span class=
"Apple-converted-space"> </span>- generally good, especially for tonal input, medium efficiency
</li>
121 <li style=
"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica"><span class=
"s4">\wphase
</span> <span class=
"Apple-converted-space"> </span>- generally very good, especially for tonal input, medium efficiency
</li>
122 <li style=
"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica"><span class=
"s4">\mkl
</span><span class=
"Apple-converted-space"> </span>- generally very good, medium efficiency, pretty different from the other methods
</li>
124 <p class=
"p2"><br></p>
125 <p class=
"p3">Which of these should you choose? The differences aren't large, so I'd recommend you stick with the default
<span class=
"s4">\rcomplex
</span> unless you find specific problems with it. Then maybe try
<span class=
"s4">\wphase
</span>. The
<span class=
"s4">\mkl
</span> type is a bit different from the others so maybe try that too. They all have slightly different characteristics, and in tests perform at a similar quality level.
</p>
126 <p class=
"p2"><br></p>
127 <p class=
"p3">For more details of all the processes involved, the different
<i>onset detection functions
</i>, and their evaluation, see
</p>
128 <p class=
"p2"><br></p>
129 <p class=
"p9">D. Stowell and M. D. Plumbley.
<a href=
"http://www.elec.qmul.ac.uk/digitalmusic/papers/2007/StowellPlumbley07-icmc.pdf"><span class=
"s5">Adaptive whitening for improved real-time audio onset detection
</span></a>.
<i>Proceedings of the International Computer Music Conference (ICMC’
07)
</i>, Copenhagen, Denmark, August
2007.
</p>
130 <p class=
"p2"><br></p>
131 <p class=
"p2"><br></p>
132 <p class=
"p3"><b>Advanced features
</b></p>
133 <p class=
"p2"><br></p>
134 <p class=
"p3">Further options are available, which you are welcome to explore if you want. They are numbers that modulate the behaviour of the onset detector:
</p>
135 <p class=
"p2"><br></p>
137 <li style=
"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica"><b>relaxtime
</b> and
<b>floor
</b> are parameters to the whitening process used, a kind of normalisation of the FFT signal. (Note: in
<span class=
"s4">\mkl
</span> mode these are not used.)
</li>
139 <li style=
"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica"><b>relaxtime
</b> specifies the time (in seconds) for the normalisation to
"forget" about a recent onset. If you find too much re-triggering (e.g. as a note dies away unevenly) then you might wish to increase this value.
</li>
140 <li style=
"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica"><b>floor
</b> is a lower limit, connected to the idea of how quiet the sound is expected to get without becoming indistinguishable from noise. For some cleanly-recorded classical music with wide dynamic variations, I found it helpful to go down as far as
0.000001.
</li>
142 <li style=
"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica"><b>mingap
</b> specifies a minimum gap (in FFT frames) between onset detections, a brute-force way to prevent too many doubled detections.
</li>
143 <li style=
"margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica"><b>medianspan
</b> specifies the size (in FFT frames) of the median window used for smoothing the detection function before triggering.
</li>