3 Documentation written by Tom Zanussi
8 Histogram triggers are special event triggers that can be used to
9 aggregate trace event data into histograms. For information on
10 trace events and event triggers, see Documentation/trace/events.txt.
13 2. Histogram Trigger Command
14 ============================
16 A histogram trigger command is an event trigger command that
17 aggregates event hits into a hash table keyed on one or more trace
18 event format fields (or stacktrace) and a set of running totals
19 derived from one or more trace event format fields and/or event
22 The format of a hist trigger is as follows:
24 hist:keys=<field1[,field2,...]>[:values=<field1[,field2,...]>]
25 [:sort=<field1[,field2,...]>][:size=#entries][:pause][:continue]
26 [:clear][:name=histname1] [if <filter>]
28 When a matching event is hit, an entry is added to a hash table
29 using the key(s) and value(s) named. Keys and values correspond to
30 fields in the event's format description. Values must correspond to
31 numeric fields - on an event hit, the value(s) will be added to a
32 sum kept for that field. The special string 'hitcount' can be used
33 in place of an explicit value field - this is simply a count of
34 event hits. If 'values' isn't specified, an implicit 'hitcount'
35 value will be automatically created and used as the only value.
36 Keys can be any field, or the special string 'stacktrace', which
37 will use the event's kernel stacktrace as the key. The keywords
38 'keys' or 'key' can be used to specify keys, and the keywords
39 'values', 'vals', or 'val' can be used to specify values. Compound
40 keys consisting of up to two fields can be specified by the 'keys'
41 keyword. Hashing a compound key produces a unique entry in the
42 table for each unique combination of component keys, and can be
43 useful for providing more fine-grained summaries of event data.
44 Additionally, sort keys consisting of up to two fields can be
45 specified by the 'sort' keyword. If more than one field is
46 specified, the result will be a 'sort within a sort': the first key
47 is taken to be the primary sort key and the second the secondary
48 key. If a hist trigger is given a name using the 'name' parameter,
49 its histogram data will be shared with other triggers of the same
50 name, and trigger hits will update this common data. Only triggers
51 with 'compatible' fields can be combined in this way; triggers are
52 'compatible' if the fields named in the trigger share the same
53 number and type of fields and those fields also have the same names.
54 Note that any two events always share the compatible 'hitcount' and
55 'stacktrace' fields and can therefore be combined using those
56 fields, however pointless that may be.
58 'hist' triggers add a 'hist' file to each event's subdirectory.
59 Reading the 'hist' file for the event will dump the hash table in
60 its entirety to stdout. If there are multiple hist triggers
61 attached to an event, there will be a table for each trigger in the
62 output. The table displayed for a named trigger will be the same as
63 any other instance having the same name. Each printed hash table
64 entry is a simple list of the keys and values comprising the entry;
65 keys are printed first and are delineated by curly braces, and are
66 followed by the set of value fields for the entry. By default,
67 numeric fields are displayed as base-10 integers. This can be
68 modified by appending any of the following modifiers to the field
71 .hex display a number as a hex value
72 .sym display an address as a symbol
73 .sym-offset display an address as a symbol and offset
74 .syscall display a syscall id as a system call name
75 .execname display a common_pid as a program name
76 .log2 display log2 value rather than raw number
77 .usecs display a common_timestamp in microseconds
79 Note that in general the semantics of a given field aren't
80 interpreted when applying a modifier to it, but there are some
81 restrictions to be aware of in this regard:
83 - only the 'hex' modifier can be used for values (because values
84 are essentially sums, and the other modifiers don't make sense
86 - the 'execname' modifier can only be used on a 'common_pid'. The
87 reason for this is that the execname is simply the 'comm' value
88 saved for the 'current' process when an event was triggered,
89 which is the same as the common_pid value saved by the event
90 tracing code. Trying to apply that comm value to other pid
91 values wouldn't be correct, and typically events that care save
92 pid-specific comm fields in the event itself.
94 A typical usage scenario would be the following to enable a hist
95 trigger, read its current contents, and then turn it off:
97 # echo 'hist:keys=skbaddr.hex:vals=len' > \
98 /sys/kernel/debug/tracing/events/net/netif_rx/trigger
100 # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist
102 # echo '!hist:keys=skbaddr.hex:vals=len' > \
103 /sys/kernel/debug/tracing/events/net/netif_rx/trigger
105 The trigger file itself can be read to show the details of the
106 currently attached hist trigger. This information is also displayed
107 at the top of the 'hist' file when read.
109 By default, the size of the hash table is 2048 entries. The 'size'
110 parameter can be used to specify more or fewer than that. The units
111 are in terms of hashtable entries - if a run uses more entries than
112 specified, the results will show the number of 'drops', the number
113 of hits that were ignored. The size should be a power of 2 between
114 128 and 131072 (any non- power-of-2 number specified will be rounded
117 The 'sort' parameter can be used to specify a value field to sort
118 on. The default if unspecified is 'hitcount' and the default sort
119 order is 'ascending'. To sort in the opposite direction, append
120 .descending' to the sort key.
122 The 'pause' parameter can be used to pause an existing hist trigger
123 or to start a hist trigger but not log any events until told to do
124 so. 'continue' or 'cont' can be used to start or restart a paused
127 The 'clear' parameter will clear the contents of a running hist
128 trigger and leave its current paused/active state.
130 Note that the 'pause', 'cont', and 'clear' parameters should be
131 applied using 'append' shell operator ('>>') if applied to an
132 existing trigger, rather than via the '>' operator, which will cause
133 the trigger to be removed through truncation.
135 - enable_hist/disable_hist
137 The enable_hist and disable_hist triggers can be used to have one
138 event conditionally start and stop another event's already-attached
139 hist trigger. Any number of enable_hist and disable_hist triggers
140 can be attached to a given event, allowing that event to kick off
141 and stop aggregations on a host of other events.
143 The format is very similar to the enable/disable_event triggers:
145 enable_hist:<system>:<event>[:count]
146 disable_hist:<system>:<event>[:count]
148 Instead of enabling or disabling the tracing of the target event
149 into the trace buffer as the enable/disable_event triggers do, the
150 enable/disable_hist triggers enable or disable the aggregation of
151 the target event into a hash table.
153 A typical usage scenario for the enable_hist/disable_hist triggers
154 would be to first set up a paused hist trigger on some event,
155 followed by an enable_hist/disable_hist pair that turns the hist
156 aggregation on and off when conditions of interest are hit:
158 # echo 'hist:keys=skbaddr.hex:vals=len:pause' > \
159 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
161 # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
162 /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
164 # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
165 /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
167 The above sets up an initially paused hist trigger which is unpaused
168 and starts aggregating events when a given program is executed, and
169 which stops aggregating when the process exits and the hist trigger
172 The examples below provide a more concrete illustration of the
173 concepts and typical usage patterns discussed above.
175 'special' event fields
176 ------------------------
178 There are a number of 'special event fields' available for use as
179 keys or values in a hist trigger. These look like and behave as if
180 they were actual event fields, but aren't really part of the event's
181 field definition or format file. They are however available for any
182 event, and can be used anywhere an actual event field could be.
185 common_timestamp u64 - timestamp (from ring buffer) associated
186 with the event, in nanoseconds. May be
187 modified by .usecs to have timestamps
188 interpreted as microseconds.
189 cpu int - the cpu on which the event occurred.
191 Extended error information
192 --------------------------
194 For some error conditions encountered when invoking a hist trigger
195 command, extended error information is available via the
196 corresponding event's 'hist' file. Reading the hist file after an
197 error will display more detailed information about what went wrong,
198 if information is available. This extended error information will
199 be available until the next hist trigger command for that event.
201 If available for a given error condition, the extended error
202 information and usage takes the following form:
204 # echo xxx > /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
205 echo: write error: Invalid argument
207 # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist
208 ERROR: Couldn't yyy: zzz
211 6.2 'hist' trigger examples
212 ---------------------------
214 The first set of examples creates aggregations using the kmalloc
215 event. The fields that can be used for the hist trigger are listed
216 in the kmalloc event's format file:
218 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/format
222 field:unsigned short common_type; offset:0; size:2; signed:0;
223 field:unsigned char common_flags; offset:2; size:1; signed:0;
224 field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
225 field:int common_pid; offset:4; size:4; signed:1;
227 field:unsigned long call_site; offset:8; size:8; signed:0;
228 field:const void * ptr; offset:16; size:8; signed:0;
229 field:size_t bytes_req; offset:24; size:8; signed:0;
230 field:size_t bytes_alloc; offset:32; size:8; signed:0;
231 field:gfp_t gfp_flags; offset:40; size:4; signed:0;
233 We'll start by creating a hist trigger that generates a simple table
234 that lists the total number of bytes requested for each function in
235 the kernel that made one or more calls to kmalloc:
237 # echo 'hist:key=call_site:val=bytes_req' > \
238 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
240 This tells the tracing system to create a 'hist' trigger using the
241 call_site field of the kmalloc event as the key for the table, which
242 just means that each unique call_site address will have an entry
243 created for it in the table. The 'val=bytes_req' parameter tells
244 the hist trigger that for each unique entry (call_site) in the
245 table, it should keep a running total of the number of bytes
246 requested by that call_site.
248 We'll let it run for awhile and then dump the contents of the 'hist'
249 file in the kmalloc event's subdirectory (for readability, a number
250 of entries have been omitted):
252 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
253 # trigger info: hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active]
255 { call_site: 18446744072106379007 } hitcount: 1 bytes_req: 176
256 { call_site: 18446744071579557049 } hitcount: 1 bytes_req: 1024
257 { call_site: 18446744071580608289 } hitcount: 1 bytes_req: 16384
258 { call_site: 18446744071581827654 } hitcount: 1 bytes_req: 24
259 { call_site: 18446744071580700980 } hitcount: 1 bytes_req: 8
260 { call_site: 18446744071579359876 } hitcount: 1 bytes_req: 152
261 { call_site: 18446744071580795365 } hitcount: 3 bytes_req: 144
262 { call_site: 18446744071581303129 } hitcount: 3 bytes_req: 144
263 { call_site: 18446744071580713234 } hitcount: 4 bytes_req: 2560
264 { call_site: 18446744071580933750 } hitcount: 4 bytes_req: 736
268 { call_site: 18446744072106047046 } hitcount: 69 bytes_req: 5576
269 { call_site: 18446744071582116407 } hitcount: 73 bytes_req: 2336
270 { call_site: 18446744072106054684 } hitcount: 136 bytes_req: 140504
271 { call_site: 18446744072106224230 } hitcount: 136 bytes_req: 19584
272 { call_site: 18446744072106078074 } hitcount: 153 bytes_req: 2448
273 { call_site: 18446744072106062406 } hitcount: 153 bytes_req: 36720
274 { call_site: 18446744071582507929 } hitcount: 153 bytes_req: 37088
275 { call_site: 18446744072102520590 } hitcount: 273 bytes_req: 10920
276 { call_site: 18446744071582143559 } hitcount: 358 bytes_req: 716
277 { call_site: 18446744072106465852 } hitcount: 417 bytes_req: 56712
278 { call_site: 18446744072102523378 } hitcount: 485 bytes_req: 27160
279 { call_site: 18446744072099568646 } hitcount: 1676 bytes_req: 33520
286 The output displays a line for each entry, beginning with the key
287 specified in the trigger, followed by the value(s) also specified in
288 the trigger. At the beginning of the output is a line that displays
289 the trigger info, which can also be displayed by reading the
292 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
293 hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active]
295 At the end of the output are a few lines that display the overall
296 totals for the run. The 'Hits' field shows the total number of
297 times the event trigger was hit, the 'Entries' field shows the total
298 number of used entries in the hash table, and the 'Dropped' field
299 shows the number of hits that were dropped because the number of
300 used entries for the run exceeded the maximum number of entries
301 allowed for the table (normally 0, but if not a hint that you may
302 want to increase the size of the table using the 'size' parameter).
304 Notice in the above output that there's an extra field, 'hitcount',
305 which wasn't specified in the trigger. Also notice that in the
306 trigger info output, there's a parameter, 'sort=hitcount', which
307 wasn't specified in the trigger either. The reason for that is that
308 every trigger implicitly keeps a count of the total number of hits
309 attributed to a given entry, called the 'hitcount'. That hitcount
310 information is explicitly displayed in the output, and in the
311 absence of a user-specified sort parameter, is used as the default
314 The value 'hitcount' can be used in place of an explicit value in
315 the 'values' parameter if you don't really need to have any
316 particular field summed and are mainly interested in hit
319 To turn the hist trigger off, simply call up the trigger in the
320 command history and re-execute it with a '!' prepended:
322 # echo '!hist:key=call_site:val=bytes_req' > \
323 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
325 Finally, notice that the call_site as displayed in the output above
326 isn't really very useful. It's an address, but normally addresses
327 are displayed in hex. To have a numeric field displayed as a hex
328 value, simply append '.hex' to the field name in the trigger:
330 # echo 'hist:key=call_site.hex:val=bytes_req' > \
331 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
333 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
334 # trigger info: hist:keys=call_site.hex:vals=bytes_req:sort=hitcount:size=2048 [active]
336 { call_site: ffffffffa026b291 } hitcount: 1 bytes_req: 433
337 { call_site: ffffffffa07186ff } hitcount: 1 bytes_req: 176
338 { call_site: ffffffff811ae721 } hitcount: 1 bytes_req: 16384
339 { call_site: ffffffff811c5134 } hitcount: 1 bytes_req: 8
340 { call_site: ffffffffa04a9ebb } hitcount: 1 bytes_req: 511
341 { call_site: ffffffff8122e0a6 } hitcount: 1 bytes_req: 12
342 { call_site: ffffffff8107da84 } hitcount: 1 bytes_req: 152
343 { call_site: ffffffff812d8246 } hitcount: 1 bytes_req: 24
344 { call_site: ffffffff811dc1e5 } hitcount: 3 bytes_req: 144
345 { call_site: ffffffffa02515e8 } hitcount: 3 bytes_req: 648
346 { call_site: ffffffff81258159 } hitcount: 3 bytes_req: 144
347 { call_site: ffffffff811c80f4 } hitcount: 4 bytes_req: 544
351 { call_site: ffffffffa06c7646 } hitcount: 106 bytes_req: 8024
352 { call_site: ffffffffa06cb246 } hitcount: 132 bytes_req: 31680
353 { call_site: ffffffffa06cef7a } hitcount: 132 bytes_req: 2112
354 { call_site: ffffffff8137e399 } hitcount: 132 bytes_req: 23232
355 { call_site: ffffffffa06c941c } hitcount: 185 bytes_req: 171360
356 { call_site: ffffffffa06f2a66 } hitcount: 185 bytes_req: 26640
357 { call_site: ffffffffa036a70e } hitcount: 265 bytes_req: 10600
358 { call_site: ffffffff81325447 } hitcount: 292 bytes_req: 584
359 { call_site: ffffffffa072da3c } hitcount: 446 bytes_req: 60656
360 { call_site: ffffffffa036b1f2 } hitcount: 526 bytes_req: 29456
361 { call_site: ffffffffa0099c06 } hitcount: 1780 bytes_req: 35600
368 Even that's only marginally more useful - while hex values do look
369 more like addresses, what users are typically more interested in
370 when looking at text addresses are the corresponding symbols
371 instead. To have an address displayed as symbolic value instead,
372 simply append '.sym' or '.sym-offset' to the field name in the
375 # echo 'hist:key=call_site.sym:val=bytes_req' > \
376 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
378 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
379 # trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=hitcount:size=2048 [active]
381 { call_site: [ffffffff810adcb9] syslog_print_all } hitcount: 1 bytes_req: 1024
382 { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8
383 { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7
384 { call_site: [ffffffff8154acbe] usb_alloc_urb } hitcount: 1 bytes_req: 192
385 { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7
386 { call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40
387 { call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128
388 { call_site: [ffffffff811febd5] fsnotify_alloc_group } hitcount: 2 bytes_req: 528
389 { call_site: [ffffffff81440f58] __tty_buffer_request_room } hitcount: 2 bytes_req: 2624
390 { call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 2 bytes_req: 96
391 { call_site: [ffffffffa05e19af] ieee80211_start_tx_ba_session [mac80211] } hitcount: 2 bytes_req: 464
392 { call_site: [ffffffff81672406] tcp_get_metrics } hitcount: 2 bytes_req: 304
393 { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128
394 { call_site: [ffffffff81089b05] sched_create_group } hitcount: 2 bytes_req: 1424
398 { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1185 bytes_req: 123240
399 { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 1185 bytes_req: 104280
400 { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 1402 bytes_req: 190672
401 { call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 1518 bytes_req: 146208
402 { call_site: [ffffffffa029070e] drm_vma_node_allow [drm] } hitcount: 1746 bytes_req: 69840
403 { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 2021 bytes_req: 792312
404 { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 2592 bytes_req: 145152
405 { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2629 bytes_req: 378576
406 { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2629 bytes_req: 3783248
407 { call_site: [ffffffff81325607] apparmor_file_alloc_security } hitcount: 5192 bytes_req: 10384
408 { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 5529 bytes_req: 110584
409 { call_site: [ffffffff8131ebf7] aa_alloc_task_context } hitcount: 21943 bytes_req: 702176
410 { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 55759 bytes_req: 5074265
417 Because the default sort key above is 'hitcount', the above shows a
418 the list of call_sites by increasing hitcount, so that at the bottom
419 we see the functions that made the most kmalloc calls during the
420 run. If instead we we wanted to see the top kmalloc callers in
421 terms of the number of bytes requested rather than the number of
422 calls, and we wanted the top caller to appear at the top, we can use
423 the 'sort' parameter, along with the 'descending' modifier:
425 # echo 'hist:key=call_site.sym:val=bytes_req:sort=bytes_req.descending' > \
426 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
428 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
429 # trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=bytes_req.descending:size=2048 [active]
431 { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2186 bytes_req: 3397464
432 { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1790 bytes_req: 712176
433 { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 8132 bytes_req: 513135
434 { call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 106 bytes_req: 440128
435 { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2186 bytes_req: 314784
436 { call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 2174 bytes_req: 208992
437 { call_site: [ffffffff811ae8e1] __kmalloc } hitcount: 8 bytes_req: 131072
438 { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 859 bytes_req: 116824
439 { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 1834 bytes_req: 102704
440 { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 972 bytes_req: 101088
441 { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 972 bytes_req: 85536
442 { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 3333 bytes_req: 66664
443 { call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 209 bytes_req: 61632
447 { call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128
448 { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128
449 { call_site: [ffffffff812d8406] copy_semundo } hitcount: 2 bytes_req: 48
450 { call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 1 bytes_req: 48
451 { call_site: [ffffffffa027121a] drm_getmagic [drm] } hitcount: 1 bytes_req: 48
452 { call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40
453 { call_site: [ffffffff811c52f4] bprm_change_interp } hitcount: 2 bytes_req: 16
454 { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8
455 { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7
456 { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7
463 To display the offset and size information in addition to the symbol
464 name, just use 'sym-offset' instead:
466 # echo 'hist:key=call_site.sym-offset:val=bytes_req:sort=bytes_req.descending' > \
467 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
469 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
470 # trigger info: hist:keys=call_site.sym-offset:vals=bytes_req:sort=bytes_req.descending:size=2048 [active]
472 { call_site: [ffffffffa046041c] i915_gem_execbuffer2+0x6c/0x2c0 [i915] } hitcount: 4569 bytes_req: 3163720
473 { call_site: [ffffffffa0489a66] intel_ring_begin+0xc6/0x1f0 [i915] } hitcount: 4569 bytes_req: 657936
474 { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23+0x694/0x1020 [i915] } hitcount: 1519 bytes_req: 472936
475 { call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23+0x516/0x1020 [i915] } hitcount: 3050 bytes_req: 211832
476 { call_site: [ffffffff811e2a1b] seq_buf_alloc+0x1b/0x50 } hitcount: 34 bytes_req: 148384
477 { call_site: [ffffffffa04a580c] intel_crtc_page_flip+0xbc/0x870 [i915] } hitcount: 1385 bytes_req: 144040
478 { call_site: [ffffffff811ae8e1] __kmalloc+0x191/0x1b0 } hitcount: 8 bytes_req: 131072
479 { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl+0x282/0x360 [drm] } hitcount: 1385 bytes_req: 121880
480 { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc+0x32/0x100 [drm] } hitcount: 1848 bytes_req: 103488
481 { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state+0x2c/0xa0 [i915] } hitcount: 461 bytes_req: 62696
482 { call_site: [ffffffffa029070e] drm_vma_node_allow+0x2e/0xd0 [drm] } hitcount: 1541 bytes_req: 61640
483 { call_site: [ffffffff815f8d7b] sk_prot_alloc+0xcb/0x1b0 } hitcount: 57 bytes_req: 57456
487 { call_site: [ffffffff8109524a] alloc_fair_sched_group+0x5a/0x1a0 } hitcount: 2 bytes_req: 128
488 { call_site: [ffffffffa027b921] drm_vm_open_locked+0x31/0xa0 [drm] } hitcount: 3 bytes_req: 96
489 { call_site: [ffffffff8122e266] proc_self_follow_link+0x76/0xb0 } hitcount: 8 bytes_req: 96
490 { call_site: [ffffffff81213e80] load_elf_binary+0x240/0x1650 } hitcount: 3 bytes_req: 84
491 { call_site: [ffffffff8154bc62] usb_control_msg+0x42/0x110 } hitcount: 1 bytes_req: 8
492 { call_site: [ffffffffa00bf6fe] hidraw_send_report+0x7e/0x1a0 [hid] } hitcount: 1 bytes_req: 7
493 { call_site: [ffffffffa00bf1ca] hidraw_report_event+0x8a/0x120 [hid] } hitcount: 1 bytes_req: 7
500 We can also add multiple fields to the 'values' parameter. For
501 example, we might want to see the total number of bytes allocated
502 alongside bytes requested, and display the result sorted by bytes
503 allocated in a descending order:
505 # echo 'hist:keys=call_site.sym:values=bytes_req,bytes_alloc:sort=bytes_alloc.descending' > \
506 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
508 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
509 # trigger info: hist:keys=call_site.sym:vals=bytes_req,bytes_alloc:sort=bytes_alloc.descending:size=2048 [active]
511 { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 7403 bytes_req: 4084360 bytes_alloc: 5958016
512 { call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 541 bytes_req: 2213968 bytes_alloc: 2228224
513 { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 7404 bytes_req: 1066176 bytes_alloc: 1421568
514 { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1565 bytes_req: 557368 bytes_alloc: 1037760
515 { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 9557 bytes_req: 595778 bytes_alloc: 695744
516 { call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 5839 bytes_req: 430680 bytes_alloc: 470400
517 { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 2388 bytes_req: 324768 bytes_alloc: 458496
518 { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 3911 bytes_req: 219016 bytes_alloc: 250304
519 { call_site: [ffffffff815f8d7b] sk_prot_alloc } hitcount: 235 bytes_req: 236880 bytes_alloc: 240640
520 { call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 557 bytes_req: 169024 bytes_alloc: 221760
521 { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 9378 bytes_req: 187548 bytes_alloc: 206312
522 { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1519 bytes_req: 157976 bytes_alloc: 194432
526 { call_site: [ffffffff8109bd3b] sched_autogroup_create_attach } hitcount: 2 bytes_req: 144 bytes_alloc: 192
527 { call_site: [ffffffff81097ee8] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
528 { call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
529 { call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
530 { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
531 { call_site: [ffffffff81213e80] load_elf_binary } hitcount: 3 bytes_req: 84 bytes_alloc: 96
532 { call_site: [ffffffff81079a2e] kthread_create_on_node } hitcount: 1 bytes_req: 56 bytes_alloc: 64
533 { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8
534 { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8 bytes_alloc: 8
535 { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8
542 Finally, to finish off our kmalloc example, instead of simply having
543 the hist trigger display symbolic call_sites, we can have the hist
544 trigger additionally display the complete set of kernel stack traces
545 that led to each call_site. To do that, we simply use the special
546 value 'stacktrace' for the key parameter:
548 # echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \
549 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
551 The above trigger will use the kernel stack trace in effect when an
552 event is triggered as the key for the hash table. This allows the
553 enumeration of every kernel callpath that led up to a particular
554 event, along with a running total of any of the event fields for
555 that event. Here we tally bytes requested and bytes allocated for
556 every callpath in the system that led up to a kmalloc (in this case
557 every callpath to a kmalloc for a kernel compile):
559 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
560 # trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
563 __kmalloc_track_caller+0x10b/0x1a0
565 hidraw_report_event+0x8a/0x120 [hid]
566 hid_report_raw_event+0x3ea/0x440 [hid]
567 hid_input_report+0x112/0x190 [hid]
568 hid_irq_in+0xc2/0x260 [usbhid]
569 __usb_hcd_giveback_urb+0x72/0x120
570 usb_giveback_urb_bh+0x9e/0xe0
571 tasklet_hi_action+0xf8/0x100
572 __do_softirq+0x114/0x2c0
575 ret_from_intr+0x0/0x30
576 cpuidle_enter+0x17/0x20
577 cpu_startup_entry+0x315/0x3e0
579 } hitcount: 3 bytes_req: 21 bytes_alloc: 24
581 __kmalloc_track_caller+0x10b/0x1a0
583 hidraw_report_event+0x8a/0x120 [hid]
584 hid_report_raw_event+0x3ea/0x440 [hid]
585 hid_input_report+0x112/0x190 [hid]
586 hid_irq_in+0xc2/0x260 [usbhid]
587 __usb_hcd_giveback_urb+0x72/0x120
588 usb_giveback_urb_bh+0x9e/0xe0
589 tasklet_hi_action+0xf8/0x100
590 __do_softirq+0x114/0x2c0
593 ret_from_intr+0x0/0x30
594 } hitcount: 3 bytes_req: 21 bytes_alloc: 24
596 kmem_cache_alloc_trace+0xeb/0x150
597 aa_alloc_task_context+0x27/0x40
598 apparmor_cred_prepare+0x1f/0x50
599 security_prepare_creds+0x16/0x20
600 prepare_creds+0xdf/0x1a0
601 SyS_capset+0xb5/0x200
602 system_call_fastpath+0x12/0x6a
603 } hitcount: 1 bytes_req: 32 bytes_alloc: 32
608 __kmalloc+0x11b/0x1b0
609 i915_gem_execbuffer2+0x6c/0x2c0 [i915]
610 drm_ioctl+0x349/0x670 [drm]
611 do_vfs_ioctl+0x2f0/0x4f0
613 system_call_fastpath+0x12/0x6a
614 } hitcount: 17726 bytes_req: 13944120 bytes_alloc: 19593808
616 __kmalloc+0x11b/0x1b0
617 load_elf_phdrs+0x76/0xa0
618 load_elf_binary+0x102/0x1650
619 search_binary_handler+0x97/0x1d0
620 do_execveat_common.isra.34+0x551/0x6e0
622 return_from_execve+0x0/0x23
623 } hitcount: 33348 bytes_req: 17152128 bytes_alloc: 20226048
625 kmem_cache_alloc_trace+0xeb/0x150
626 apparmor_file_alloc_security+0x27/0x40
627 security_file_alloc+0x16/0x20
628 get_empty_filp+0x93/0x1c0
629 path_openat+0x31/0x5f0
630 do_filp_open+0x3a/0x90
631 do_sys_open+0x128/0x220
633 system_call_fastpath+0x12/0x6a
634 } hitcount: 4766422 bytes_req: 9532844 bytes_alloc: 38131376
636 __kmalloc+0x11b/0x1b0
637 seq_buf_alloc+0x1b/0x50
639 proc_reg_read+0x3d/0x80
643 system_call_fastpath+0x12/0x6a
644 } hitcount: 19133 bytes_req: 78368768 bytes_alloc: 78368768
651 If you key a hist trigger on common_pid, in order for example to
652 gather and display sorted totals for each process, you can use the
653 special .execname modifier to display the executable names for the
654 processes in the table rather than raw pids. The example below
655 keeps a per-process sum of total bytes read:
657 # echo 'hist:key=common_pid.execname:val=count:sort=count.descending' > \
658 /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
660 # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/hist
661 # trigger info: hist:keys=common_pid.execname:vals=count:sort=count.descending:size=2048 [active]
663 { common_pid: gnome-terminal [ 3196] } hitcount: 280 count: 1093512
664 { common_pid: Xorg [ 1309] } hitcount: 525 count: 256640
665 { common_pid: compiz [ 2889] } hitcount: 59 count: 254400
666 { common_pid: bash [ 8710] } hitcount: 3 count: 66369
667 { common_pid: dbus-daemon-lau [ 8703] } hitcount: 49 count: 47739
668 { common_pid: irqbalance [ 1252] } hitcount: 27 count: 27648
669 { common_pid: 01ifupdown [ 8705] } hitcount: 3 count: 17216
670 { common_pid: dbus-daemon [ 772] } hitcount: 10 count: 12396
671 { common_pid: Socket Thread [ 8342] } hitcount: 11 count: 11264
672 { common_pid: nm-dhcp-client. [ 8701] } hitcount: 6 count: 7424
673 { common_pid: gmain [ 1315] } hitcount: 18 count: 6336
677 { common_pid: postgres [ 1892] } hitcount: 2 count: 32
678 { common_pid: postgres [ 1891] } hitcount: 2 count: 32
679 { common_pid: gmain [ 8704] } hitcount: 2 count: 32
680 { common_pid: upstart-dbus-br [ 2740] } hitcount: 21 count: 21
681 { common_pid: nm-dispatcher.a [ 8696] } hitcount: 1 count: 16
682 { common_pid: indicator-datet [ 2904] } hitcount: 1 count: 16
683 { common_pid: gdbus [ 2998] } hitcount: 1 count: 16
684 { common_pid: rtkit-daemon [ 2052] } hitcount: 1 count: 8
685 { common_pid: init [ 1] } hitcount: 2 count: 2
692 Similarly, if you key a hist trigger on syscall id, for example to
693 gather and display a list of systemwide syscall hits, you can use
694 the special .syscall modifier to display the syscall names rather
695 than raw ids. The example below keeps a running total of syscall
696 counts for the system during the run:
698 # echo 'hist:key=id.syscall:val=hitcount' > \
699 /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
701 # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
702 # trigger info: hist:keys=id.syscall:vals=hitcount:sort=hitcount:size=2048 [active]
704 { id: sys_fsync [ 74] } hitcount: 1
705 { id: sys_newuname [ 63] } hitcount: 1
706 { id: sys_prctl [157] } hitcount: 1
707 { id: sys_statfs [137] } hitcount: 1
708 { id: sys_symlink [ 88] } hitcount: 1
709 { id: sys_sendmmsg [307] } hitcount: 1
710 { id: sys_semctl [ 66] } hitcount: 1
711 { id: sys_readlink [ 89] } hitcount: 3
712 { id: sys_bind [ 49] } hitcount: 3
713 { id: sys_getsockname [ 51] } hitcount: 3
714 { id: sys_unlink [ 87] } hitcount: 3
715 { id: sys_rename [ 82] } hitcount: 4
716 { id: unknown_syscall [ 58] } hitcount: 4
717 { id: sys_connect [ 42] } hitcount: 4
718 { id: sys_getpid [ 39] } hitcount: 4
722 { id: sys_rt_sigprocmask [ 14] } hitcount: 952
723 { id: sys_futex [202] } hitcount: 1534
724 { id: sys_write [ 1] } hitcount: 2689
725 { id: sys_setitimer [ 38] } hitcount: 2797
726 { id: sys_read [ 0] } hitcount: 3202
727 { id: sys_select [ 23] } hitcount: 3773
728 { id: sys_writev [ 20] } hitcount: 4531
729 { id: sys_poll [ 7] } hitcount: 8314
730 { id: sys_recvmsg [ 47] } hitcount: 13738
731 { id: sys_ioctl [ 16] } hitcount: 21843
738 The syscall counts above provide a rough overall picture of system
739 call activity on the system; we can see for example that the most
740 popular system call on this system was the 'sys_ioctl' system call.
742 We can use 'compound' keys to refine that number and provide some
743 further insight as to which processes exactly contribute to the
746 The command below keeps a hitcount for every unique combination of
747 system call id and pid - the end result is essentially a table
748 that keeps a per-pid sum of system call hits. The results are
749 sorted using the system call id as the primary key, and the
750 hitcount sum as the secondary key:
752 # echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount' > \
753 /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
755 # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
756 # trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 [active]
758 { id: sys_read [ 0], common_pid: rtkit-daemon [ 1877] } hitcount: 1
759 { id: sys_read [ 0], common_pid: gdbus [ 2976] } hitcount: 1
760 { id: sys_read [ 0], common_pid: console-kit-dae [ 3400] } hitcount: 1
761 { id: sys_read [ 0], common_pid: postgres [ 1865] } hitcount: 1
762 { id: sys_read [ 0], common_pid: deja-dup-monito [ 3543] } hitcount: 2
763 { id: sys_read [ 0], common_pid: NetworkManager [ 890] } hitcount: 2
764 { id: sys_read [ 0], common_pid: evolution-calen [ 3048] } hitcount: 2
765 { id: sys_read [ 0], common_pid: postgres [ 1864] } hitcount: 2
766 { id: sys_read [ 0], common_pid: nm-applet [ 3022] } hitcount: 2
767 { id: sys_read [ 0], common_pid: whoopsie [ 1212] } hitcount: 2
771 { id: sys_ioctl [ 16], common_pid: bash [ 8479] } hitcount: 1
772 { id: sys_ioctl [ 16], common_pid: bash [ 3472] } hitcount: 12
773 { id: sys_ioctl [ 16], common_pid: gnome-terminal [ 3199] } hitcount: 16
774 { id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 1808
775 { id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 5580
779 { id: sys_waitid [247], common_pid: upstart-dbus-br [ 2690] } hitcount: 3
780 { id: sys_waitid [247], common_pid: upstart-dbus-br [ 2688] } hitcount: 16
781 { id: sys_inotify_add_watch [254], common_pid: gmain [ 975] } hitcount: 2
782 { id: sys_inotify_add_watch [254], common_pid: gmain [ 3204] } hitcount: 4
783 { id: sys_inotify_add_watch [254], common_pid: gmain [ 2888] } hitcount: 4
784 { id: sys_inotify_add_watch [254], common_pid: gmain [ 3003] } hitcount: 4
785 { id: sys_inotify_add_watch [254], common_pid: gmain [ 2873] } hitcount: 4
786 { id: sys_inotify_add_watch [254], common_pid: gmain [ 3196] } hitcount: 6
787 { id: sys_openat [257], common_pid: java [ 2623] } hitcount: 2
788 { id: sys_eventfd2 [290], common_pid: ibus-ui-gtk3 [ 2760] } hitcount: 4
789 { id: sys_eventfd2 [290], common_pid: compiz [ 2994] } hitcount: 6
796 The above list does give us a breakdown of the ioctl syscall by
797 pid, but it also gives us quite a bit more than that, which we
798 don't really care about at the moment. Since we know the syscall
799 id for sys_ioctl (16, displayed next to the sys_ioctl name), we
800 can use that to filter out all the other syscalls:
802 # echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount if id == 16' > \
803 /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
805 # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
806 # trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 if id == 16 [active]
808 { id: sys_ioctl [ 16], common_pid: gmain [ 2769] } hitcount: 1
809 { id: sys_ioctl [ 16], common_pid: evolution-addre [ 8571] } hitcount: 1
810 { id: sys_ioctl [ 16], common_pid: gmain [ 3003] } hitcount: 1
811 { id: sys_ioctl [ 16], common_pid: gmain [ 2781] } hitcount: 1
812 { id: sys_ioctl [ 16], common_pid: gmain [ 2829] } hitcount: 1
813 { id: sys_ioctl [ 16], common_pid: bash [ 8726] } hitcount: 1
814 { id: sys_ioctl [ 16], common_pid: bash [ 8508] } hitcount: 1
815 { id: sys_ioctl [ 16], common_pid: gmain [ 2970] } hitcount: 1
816 { id: sys_ioctl [ 16], common_pid: gmain [ 2768] } hitcount: 1
820 { id: sys_ioctl [ 16], common_pid: pool [ 8559] } hitcount: 45
821 { id: sys_ioctl [ 16], common_pid: pool [ 8555] } hitcount: 48
822 { id: sys_ioctl [ 16], common_pid: pool [ 8551] } hitcount: 48
823 { id: sys_ioctl [ 16], common_pid: avahi-daemon [ 896] } hitcount: 66
824 { id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 26674
825 { id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 73443
832 The above output shows that 'compiz' and 'Xorg' are far and away
833 the heaviest ioctl callers (which might lead to questions about
834 whether they really need to be making all those calls and to
835 possible avenues for further investigation.)
837 The compound key examples used a key and a sum value (hitcount) to
838 sort the output, but we can just as easily use two keys instead.
839 Here's an example where we use a compound key composed of the the
840 common_pid and size event fields. Sorting with pid as the primary
841 key and 'size' as the secondary key allows us to display an
842 ordered summary of the recvfrom sizes, with counts, received by
845 # echo 'hist:key=common_pid.execname,size:val=hitcount:sort=common_pid,size' > \
846 /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/trigger
848 # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/hist
849 # trigger info: hist:keys=common_pid.execname,size:vals=hitcount:sort=common_pid.execname,size:size=2048 [active]
851 { common_pid: smbd [ 784], size: 4 } hitcount: 1
852 { common_pid: dnsmasq [ 1412], size: 4096 } hitcount: 672
853 { common_pid: postgres [ 1796], size: 1000 } hitcount: 6
854 { common_pid: postgres [ 1867], size: 1000 } hitcount: 10
855 { common_pid: bamfdaemon [ 2787], size: 28 } hitcount: 2
856 { common_pid: bamfdaemon [ 2787], size: 14360 } hitcount: 1
857 { common_pid: compiz [ 2994], size: 8 } hitcount: 1
858 { common_pid: compiz [ 2994], size: 20 } hitcount: 11
859 { common_pid: gnome-terminal [ 3199], size: 4 } hitcount: 2
860 { common_pid: firefox [ 8817], size: 4 } hitcount: 1
861 { common_pid: firefox [ 8817], size: 8 } hitcount: 5
862 { common_pid: firefox [ 8817], size: 588 } hitcount: 2
863 { common_pid: firefox [ 8817], size: 628 } hitcount: 1
864 { common_pid: firefox [ 8817], size: 6944 } hitcount: 1
865 { common_pid: firefox [ 8817], size: 408880 } hitcount: 2
866 { common_pid: firefox [ 8822], size: 8 } hitcount: 2
867 { common_pid: firefox [ 8822], size: 160 } hitcount: 2
868 { common_pid: firefox [ 8822], size: 320 } hitcount: 2
869 { common_pid: firefox [ 8822], size: 352 } hitcount: 1
873 { common_pid: pool [ 8923], size: 1960 } hitcount: 10
874 { common_pid: pool [ 8923], size: 2048 } hitcount: 10
875 { common_pid: pool [ 8924], size: 1960 } hitcount: 10
876 { common_pid: pool [ 8924], size: 2048 } hitcount: 10
877 { common_pid: pool [ 8928], size: 1964 } hitcount: 4
878 { common_pid: pool [ 8928], size: 1965 } hitcount: 2
879 { common_pid: pool [ 8928], size: 2048 } hitcount: 6
880 { common_pid: pool [ 8929], size: 1982 } hitcount: 1
881 { common_pid: pool [ 8929], size: 2048 } hitcount: 1
888 The above example also illustrates the fact that although a compound
889 key is treated as a single entity for hashing purposes, the sub-keys
890 it's composed of can be accessed independently.
892 The next example uses a string field as the hash key and
893 demonstrates how you can manually pause and continue a hist trigger.
894 In this example, we'll aggregate fork counts and don't expect a
895 large number of entries in the hash table, so we'll drop it to a
896 much smaller number, say 256:
898 # echo 'hist:key=child_comm:val=hitcount:size=256' > \
899 /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
901 # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
902 # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active]
904 { child_comm: dconf worker } hitcount: 1
905 { child_comm: ibus-daemon } hitcount: 1
906 { child_comm: whoopsie } hitcount: 1
907 { child_comm: smbd } hitcount: 1
908 { child_comm: gdbus } hitcount: 1
909 { child_comm: kthreadd } hitcount: 1
910 { child_comm: dconf worker } hitcount: 1
911 { child_comm: evolution-alarm } hitcount: 2
912 { child_comm: Socket Thread } hitcount: 2
913 { child_comm: postgres } hitcount: 2
914 { child_comm: bash } hitcount: 3
915 { child_comm: compiz } hitcount: 3
916 { child_comm: evolution-sourc } hitcount: 4
917 { child_comm: dhclient } hitcount: 4
918 { child_comm: pool } hitcount: 5
919 { child_comm: nm-dispatcher.a } hitcount: 8
920 { child_comm: firefox } hitcount: 8
921 { child_comm: dbus-daemon } hitcount: 8
922 { child_comm: glib-pacrunner } hitcount: 10
923 { child_comm: evolution } hitcount: 23
930 If we want to pause the hist trigger, we can simply append :pause to
931 the command that started the trigger. Notice that the trigger info
932 displays as [paused]:
934 # echo 'hist:key=child_comm:val=hitcount:size=256:pause' >> \
935 /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
937 # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
938 # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [paused]
940 { child_comm: dconf worker } hitcount: 1
941 { child_comm: kthreadd } hitcount: 1
942 { child_comm: dconf worker } hitcount: 1
943 { child_comm: gdbus } hitcount: 1
944 { child_comm: ibus-daemon } hitcount: 1
945 { child_comm: Socket Thread } hitcount: 2
946 { child_comm: evolution-alarm } hitcount: 2
947 { child_comm: smbd } hitcount: 2
948 { child_comm: bash } hitcount: 3
949 { child_comm: whoopsie } hitcount: 3
950 { child_comm: compiz } hitcount: 3
951 { child_comm: evolution-sourc } hitcount: 4
952 { child_comm: pool } hitcount: 5
953 { child_comm: postgres } hitcount: 6
954 { child_comm: firefox } hitcount: 8
955 { child_comm: dhclient } hitcount: 10
956 { child_comm: emacs } hitcount: 12
957 { child_comm: dbus-daemon } hitcount: 20
958 { child_comm: nm-dispatcher.a } hitcount: 20
959 { child_comm: evolution } hitcount: 35
960 { child_comm: glib-pacrunner } hitcount: 59
967 To manually continue having the trigger aggregate events, append
968 :cont instead. Notice that the trigger info displays as [active]
969 again, and the data has changed:
971 # echo 'hist:key=child_comm:val=hitcount:size=256:cont' >> \
972 /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
974 # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
975 # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active]
977 { child_comm: dconf worker } hitcount: 1
978 { child_comm: dconf worker } hitcount: 1
979 { child_comm: kthreadd } hitcount: 1
980 { child_comm: gdbus } hitcount: 1
981 { child_comm: ibus-daemon } hitcount: 1
982 { child_comm: Socket Thread } hitcount: 2
983 { child_comm: evolution-alarm } hitcount: 2
984 { child_comm: smbd } hitcount: 2
985 { child_comm: whoopsie } hitcount: 3
986 { child_comm: compiz } hitcount: 3
987 { child_comm: evolution-sourc } hitcount: 4
988 { child_comm: bash } hitcount: 5
989 { child_comm: pool } hitcount: 5
990 { child_comm: postgres } hitcount: 6
991 { child_comm: firefox } hitcount: 8
992 { child_comm: dhclient } hitcount: 11
993 { child_comm: emacs } hitcount: 12
994 { child_comm: dbus-daemon } hitcount: 22
995 { child_comm: nm-dispatcher.a } hitcount: 22
996 { child_comm: evolution } hitcount: 35
997 { child_comm: glib-pacrunner } hitcount: 59
1004 The previous example showed how to start and stop a hist trigger by
1005 appending 'pause' and 'continue' to the hist trigger command. A
1006 hist trigger can also be started in a paused state by initially
1007 starting the trigger with ':pause' appended. This allows you to
1008 start the trigger only when you're ready to start collecting data
1009 and not before. For example, you could start the trigger in a
1010 paused state, then unpause it and do something you want to measure,
1011 then pause the trigger again when done.
1013 Of course, doing this manually can be difficult and error-prone, but
1014 it is possible to automatically start and stop a hist trigger based
1015 on some condition, via the enable_hist and disable_hist triggers.
1017 For example, suppose we wanted to take a look at the relative
1018 weights in terms of skb length for each callpath that leads to a
1019 netif_receieve_skb event when downloading a decent-sized file using
1022 First we set up an initially paused stacktrace trigger on the
1023 netif_receive_skb event:
1025 # echo 'hist:key=stacktrace:vals=len:pause' > \
1026 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1028 Next, we set up an 'enable_hist' trigger on the sched_process_exec
1029 event, with an 'if filename==/usr/bin/wget' filter. The effect of
1030 this new trigger is that it will 'unpause' the hist trigger we just
1031 set up on netif_receive_skb if and only if it sees a
1032 sched_process_exec event with a filename of '/usr/bin/wget'. When
1033 that happens, all netif_receive_skb events are aggregated into a
1034 hash table keyed on stacktrace:
1036 # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
1037 /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
1039 The aggregation continues until the netif_receive_skb is paused
1040 again, which is what the following disable_hist event does by
1041 creating a similar setup on the sched_process_exit event, using the
1042 filter 'comm==wget':
1044 # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
1045 /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
1047 Whenever a process exits and the comm field of the disable_hist
1048 trigger filter matches 'comm==wget', the netif_receive_skb hist
1049 trigger is disabled.
1051 The overall effect is that netif_receive_skb events are aggregated
1052 into the hash table for only the duration of the wget. Executing a
1053 wget command and then listing the 'hist' file will display the
1054 output generated by the wget command:
1056 $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
1058 # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
1059 # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
1062 __netif_receive_skb_core+0x46d/0x990
1063 __netif_receive_skb+0x18/0x60
1064 netif_receive_skb_internal+0x23/0x90
1065 napi_gro_receive+0xc8/0x100
1066 ieee80211_deliver_skb+0xd6/0x270 [mac80211]
1067 ieee80211_rx_handlers+0xccf/0x22f0 [mac80211]
1068 ieee80211_prepare_and_rx_handle+0x4e7/0xc40 [mac80211]
1069 ieee80211_rx+0x31d/0x900 [mac80211]
1070 iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm]
1071 iwl_rx_dispatch+0x8e/0xf0 [iwldvm]
1072 iwl_pcie_irq_handler+0xe3c/0x12f0 [iwlwifi]
1073 irq_thread_fn+0x20/0x50
1074 irq_thread+0x11f/0x150
1076 ret_from_fork+0x42/0x70
1077 } hitcount: 85 len: 28884
1079 __netif_receive_skb_core+0x46d/0x990
1080 __netif_receive_skb+0x18/0x60
1081 netif_receive_skb_internal+0x23/0x90
1082 napi_gro_complete+0xa4/0xe0
1083 dev_gro_receive+0x23a/0x360
1084 napi_gro_receive+0x30/0x100
1085 ieee80211_deliver_skb+0xd6/0x270 [mac80211]
1086 ieee80211_rx_handlers+0xccf/0x22f0 [mac80211]
1087 ieee80211_prepare_and_rx_handle+0x4e7/0xc40 [mac80211]
1088 ieee80211_rx+0x31d/0x900 [mac80211]
1089 iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm]
1090 iwl_rx_dispatch+0x8e/0xf0 [iwldvm]
1091 iwl_pcie_irq_handler+0xe3c/0x12f0 [iwlwifi]
1092 irq_thread_fn+0x20/0x50
1093 irq_thread+0x11f/0x150
1095 } hitcount: 98 len: 664329
1097 __netif_receive_skb_core+0x46d/0x990
1098 __netif_receive_skb+0x18/0x60
1099 process_backlog+0xa8/0x150
1100 net_rx_action+0x15d/0x340
1101 __do_softirq+0x114/0x2c0
1102 do_softirq_own_stack+0x1c/0x30
1103 do_softirq+0x65/0x70
1104 __local_bh_enable_ip+0xb5/0xc0
1105 ip_finish_output+0x1f4/0x840
1107 ip_local_out_sk+0x31/0x40
1108 ip_send_skb+0x1a/0x50
1109 udp_send_skb+0x173/0x2a0
1110 udp_sendmsg+0x2bf/0x9f0
1111 inet_sendmsg+0x64/0xa0
1112 sock_sendmsg+0x3d/0x50
1113 } hitcount: 115 len: 13030
1115 __netif_receive_skb_core+0x46d/0x990
1116 __netif_receive_skb+0x18/0x60
1117 netif_receive_skb_internal+0x23/0x90
1118 napi_gro_complete+0xa4/0xe0
1119 napi_gro_flush+0x6d/0x90
1120 iwl_pcie_irq_handler+0x92a/0x12f0 [iwlwifi]
1121 irq_thread_fn+0x20/0x50
1122 irq_thread+0x11f/0x150
1124 ret_from_fork+0x42/0x70
1125 } hitcount: 934 len: 5512212
1132 The above shows all the netif_receive_skb callpaths and their total
1133 lengths for the duration of the wget command.
1135 The 'clear' hist trigger param can be used to clear the hash table.
1136 Suppose we wanted to try another run of the previous example but
1137 this time also wanted to see the complete list of events that went
1138 into the histogram. In order to avoid having to set everything up
1139 again, we can just clear the histogram first:
1141 # echo 'hist:key=stacktrace:vals=len:clear' >> \
1142 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1144 Just to verify that it is in fact cleared, here's what we now see in
1147 # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
1148 # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
1155 Since we want to see the detailed list of every netif_receive_skb
1156 event occurring during the new run, which are in fact the same
1157 events being aggregated into the hash table, we add some additional
1158 'enable_event' events to the triggering sched_process_exec and
1159 sched_process_exit events as such:
1161 # echo 'enable_event:net:netif_receive_skb if filename==/usr/bin/wget' > \
1162 /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
1164 # echo 'disable_event:net:netif_receive_skb if comm==wget' > \
1165 /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
1167 If you read the trigger files for the sched_process_exec and
1168 sched_process_exit triggers, you should see two triggers for each:
1169 one enabling/disabling the hist aggregation and the other
1170 enabling/disabling the logging of events:
1172 # cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
1173 enable_event:net:netif_receive_skb:unlimited if filename==/usr/bin/wget
1174 enable_hist:net:netif_receive_skb:unlimited if filename==/usr/bin/wget
1176 # cat /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
1177 enable_event:net:netif_receive_skb:unlimited if comm==wget
1178 disable_hist:net:netif_receive_skb:unlimited if comm==wget
1180 In other words, whenever either of the sched_process_exec or
1181 sched_process_exit events is hit and matches 'wget', it enables or
1182 disables both the histogram and the event log, and what you end up
1183 with is a hash table and set of events just covering the specified
1184 duration. Run the wget command again:
1186 $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
1188 Displaying the 'hist' file should show something similar to what you
1189 saw in the last run, but this time you should also see the
1190 individual events in the trace file:
1192 # cat /sys/kernel/debug/tracing/trace
1196 # entries-in-buffer/entries-written: 183/1426 #P:4
1199 # / _----=> need-resched
1200 # | / _---=> hardirq/softirq
1201 # || / _--=> preempt-depth
1203 # TASK-PID CPU# |||| TIMESTAMP FUNCTION
1205 wget-15108 [000] ..s1 31769.606929: netif_receive_skb: dev=lo skbaddr=ffff88009c353100 len=60
1206 wget-15108 [000] ..s1 31769.606999: netif_receive_skb: dev=lo skbaddr=ffff88009c353200 len=60
1207 dnsmasq-1382 [000] ..s1 31769.677652: netif_receive_skb: dev=lo skbaddr=ffff88009c352b00 len=130
1208 dnsmasq-1382 [000] ..s1 31769.685917: netif_receive_skb: dev=lo skbaddr=ffff88009c352200 len=138
1209 ##### CPU 2 buffer started ####
1210 irq/29-iwlwifi-559 [002] ..s. 31772.031529: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433d00 len=2948
1211 irq/29-iwlwifi-559 [002] ..s. 31772.031572: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432200 len=1500
1212 irq/29-iwlwifi-559 [002] ..s. 31772.032196: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433100 len=2948
1213 irq/29-iwlwifi-559 [002] ..s. 31772.032761: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433000 len=2948
1214 irq/29-iwlwifi-559 [002] ..s. 31772.033220: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432e00 len=1500
1219 The following example demonstrates how multiple hist triggers can be
1220 attached to a given event. This capability can be useful for
1221 creating a set of different summaries derived from the same set of
1222 events, or for comparing the effects of different filters, among
1225 # echo 'hist:keys=skbaddr.hex:vals=len if len < 0' >> \
1226 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1227 # echo 'hist:keys=skbaddr.hex:vals=len if len > 4096' >> \
1228 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1229 # echo 'hist:keys=skbaddr.hex:vals=len if len == 256' >> \
1230 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1231 # echo 'hist:keys=skbaddr.hex:vals=len' >> \
1232 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1233 # echo 'hist:keys=len:vals=common_preempt_count' >> \
1234 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1236 The above set of commands create four triggers differing only in
1237 their filters, along with a completely different though fairly
1238 nonsensical trigger. Note that in order to append multiple hist
1239 triggers to the same file, you should use the '>>' operator to
1240 append them ('>' will also add the new hist trigger, but will remove
1241 any existing hist triggers beforehand).
1243 Displaying the contents of the 'hist' file for the event shows the
1244 contents of all five histograms:
1246 # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
1250 # trigger info: hist:keys=len:vals=hitcount,common_preempt_count:sort=hitcount:size=2048 [active]
1253 { len: 176 } hitcount: 1 common_preempt_count: 0
1254 { len: 223 } hitcount: 1 common_preempt_count: 0
1255 { len: 4854 } hitcount: 1 common_preempt_count: 0
1256 { len: 395 } hitcount: 1 common_preempt_count: 0
1257 { len: 177 } hitcount: 1 common_preempt_count: 0
1258 { len: 446 } hitcount: 1 common_preempt_count: 0
1259 { len: 1601 } hitcount: 1 common_preempt_count: 0
1263 { len: 1280 } hitcount: 66 common_preempt_count: 0
1264 { len: 116 } hitcount: 81 common_preempt_count: 40
1265 { len: 708 } hitcount: 112 common_preempt_count: 0
1266 { len: 46 } hitcount: 221 common_preempt_count: 0
1267 { len: 1264 } hitcount: 458 common_preempt_count: 0
1277 # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active]
1280 { skbaddr: ffff8800baee5e00 } hitcount: 1 len: 130
1281 { skbaddr: ffff88005f3d5600 } hitcount: 1 len: 1280
1282 { skbaddr: ffff88005f3d4900 } hitcount: 1 len: 1280
1283 { skbaddr: ffff88009fed6300 } hitcount: 1 len: 115
1284 { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 115
1285 { skbaddr: ffff88008cdb1900 } hitcount: 1 len: 46
1286 { skbaddr: ffff880064b5ef00 } hitcount: 1 len: 118
1287 { skbaddr: ffff880044e3c700 } hitcount: 1 len: 60
1288 { skbaddr: ffff880100065900 } hitcount: 1 len: 46
1289 { skbaddr: ffff8800d46bd500 } hitcount: 1 len: 116
1290 { skbaddr: ffff88005f3d5f00 } hitcount: 1 len: 1280
1291 { skbaddr: ffff880100064700 } hitcount: 1 len: 365
1292 { skbaddr: ffff8800badb6f00 } hitcount: 1 len: 60
1296 { skbaddr: ffff88009fe0be00 } hitcount: 27 len: 24677
1297 { skbaddr: ffff88009fe0a400 } hitcount: 27 len: 23052
1298 { skbaddr: ffff88009fe0b700 } hitcount: 31 len: 25589
1299 { skbaddr: ffff88009fe0b600 } hitcount: 32 len: 27326
1300 { skbaddr: ffff88006a462800 } hitcount: 68 len: 71678
1301 { skbaddr: ffff88006a463700 } hitcount: 70 len: 72678
1302 { skbaddr: ffff88006a462b00 } hitcount: 71 len: 77589
1303 { skbaddr: ffff88006a463600 } hitcount: 73 len: 71307
1304 { skbaddr: ffff88006a462200 } hitcount: 81 len: 81032
1314 # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len == 256 [active]
1326 # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len > 4096 [active]
1329 { skbaddr: ffff88009fd2c300 } hitcount: 1 len: 7212
1330 { skbaddr: ffff8800d2bcce00 } hitcount: 1 len: 7212
1331 { skbaddr: ffff8800d2bcd700 } hitcount: 1 len: 7212
1332 { skbaddr: ffff8800d2bcda00 } hitcount: 1 len: 21492
1333 { skbaddr: ffff8800ae2e2d00 } hitcount: 1 len: 7212
1334 { skbaddr: ffff8800d2bcdb00 } hitcount: 1 len: 7212
1335 { skbaddr: ffff88006a4df500 } hitcount: 1 len: 4854
1336 { skbaddr: ffff88008ce47b00 } hitcount: 1 len: 18636
1337 { skbaddr: ffff8800ae2e2200 } hitcount: 1 len: 12924
1338 { skbaddr: ffff88005f3e1000 } hitcount: 1 len: 4356
1339 { skbaddr: ffff8800d2bcdc00 } hitcount: 2 len: 24420
1340 { skbaddr: ffff8800d2bcc200 } hitcount: 2 len: 12996
1350 # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len < 0 [active]
1359 Named triggers can be used to have triggers share a common set of
1360 histogram data. This capability is mostly useful for combining the
1361 output of events generated by tracepoints contained inside inline
1362 functions, but names can be used in a hist trigger on any event.
1363 For example, these two triggers when hit will update the same 'len'
1364 field in the shared 'foo' histogram data:
1366 # echo 'hist:name=foo:keys=skbaddr.hex:vals=len' > \
1367 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1368 # echo 'hist:name=foo:keys=skbaddr.hex:vals=len' > \
1369 /sys/kernel/debug/tracing/events/net/netif_rx/trigger
1371 You can see that they're updating common histogram data by reading
1372 each event's hist files at the same time:
1374 # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist;
1375 cat /sys/kernel/debug/tracing/events/net/netif_rx/hist
1379 # trigger info: hist:name=foo:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active]
1382 { skbaddr: ffff88000ad53500 } hitcount: 1 len: 46
1383 { skbaddr: ffff8800af5a1500 } hitcount: 1 len: 76
1384 { skbaddr: ffff8800d62a1900 } hitcount: 1 len: 46
1385 { skbaddr: ffff8800d2bccb00 } hitcount: 1 len: 468
1386 { skbaddr: ffff8800d3c69900 } hitcount: 1 len: 46
1387 { skbaddr: ffff88009ff09100 } hitcount: 1 len: 52
1388 { skbaddr: ffff88010f13ab00 } hitcount: 1 len: 168
1389 { skbaddr: ffff88006a54f400 } hitcount: 1 len: 46
1390 { skbaddr: ffff8800d2bcc500 } hitcount: 1 len: 260
1391 { skbaddr: ffff880064505000 } hitcount: 1 len: 46
1392 { skbaddr: ffff8800baf24e00 } hitcount: 1 len: 32
1393 { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 46
1394 { skbaddr: ffff8800d3edff00 } hitcount: 1 len: 44
1395 { skbaddr: ffff88009fe0b400 } hitcount: 1 len: 168
1396 { skbaddr: ffff8800a1c55a00 } hitcount: 1 len: 40
1397 { skbaddr: ffff8800d2bcd100 } hitcount: 1 len: 40
1398 { skbaddr: ffff880064505f00 } hitcount: 1 len: 174
1399 { skbaddr: ffff8800a8bff200 } hitcount: 1 len: 160
1400 { skbaddr: ffff880044e3cc00 } hitcount: 1 len: 76
1401 { skbaddr: ffff8800a8bfe700 } hitcount: 1 len: 46
1402 { skbaddr: ffff8800d2bcdc00 } hitcount: 1 len: 32
1403 { skbaddr: ffff8800a1f64800 } hitcount: 1 len: 46
1404 { skbaddr: ffff8800d2bcde00 } hitcount: 1 len: 988
1405 { skbaddr: ffff88006a5dea00 } hitcount: 1 len: 46
1406 { skbaddr: ffff88002e37a200 } hitcount: 1 len: 44
1407 { skbaddr: ffff8800a1f32c00 } hitcount: 2 len: 676
1408 { skbaddr: ffff88000ad52600 } hitcount: 2 len: 107
1409 { skbaddr: ffff8800a1f91e00 } hitcount: 2 len: 92
1410 { skbaddr: ffff8800af5a0200 } hitcount: 2 len: 142
1411 { skbaddr: ffff8800d2bcc600 } hitcount: 2 len: 220
1412 { skbaddr: ffff8800ba36f500 } hitcount: 2 len: 92
1413 { skbaddr: ffff8800d021f800 } hitcount: 2 len: 92
1414 { skbaddr: ffff8800a1f33600 } hitcount: 2 len: 675
1415 { skbaddr: ffff8800a8bfff00 } hitcount: 3 len: 138
1416 { skbaddr: ffff8800d62a1300 } hitcount: 3 len: 138
1417 { skbaddr: ffff88002e37a100 } hitcount: 4 len: 184
1418 { skbaddr: ffff880064504400 } hitcount: 4 len: 184
1419 { skbaddr: ffff8800a8bfec00 } hitcount: 4 len: 184
1420 { skbaddr: ffff88000ad53700 } hitcount: 5 len: 230
1421 { skbaddr: ffff8800d2bcdb00 } hitcount: 5 len: 196
1422 { skbaddr: ffff8800a1f90000 } hitcount: 6 len: 276
1423 { skbaddr: ffff88006a54f900 } hitcount: 6 len: 276
1431 # trigger info: hist:name=foo:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active]
1434 { skbaddr: ffff88000ad53500 } hitcount: 1 len: 46
1435 { skbaddr: ffff8800af5a1500 } hitcount: 1 len: 76
1436 { skbaddr: ffff8800d62a1900 } hitcount: 1 len: 46
1437 { skbaddr: ffff8800d2bccb00 } hitcount: 1 len: 468
1438 { skbaddr: ffff8800d3c69900 } hitcount: 1 len: 46
1439 { skbaddr: ffff88009ff09100 } hitcount: 1 len: 52
1440 { skbaddr: ffff88010f13ab00 } hitcount: 1 len: 168
1441 { skbaddr: ffff88006a54f400 } hitcount: 1 len: 46
1442 { skbaddr: ffff8800d2bcc500 } hitcount: 1 len: 260
1443 { skbaddr: ffff880064505000 } hitcount: 1 len: 46
1444 { skbaddr: ffff8800baf24e00 } hitcount: 1 len: 32
1445 { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 46
1446 { skbaddr: ffff8800d3edff00 } hitcount: 1 len: 44
1447 { skbaddr: ffff88009fe0b400 } hitcount: 1 len: 168
1448 { skbaddr: ffff8800a1c55a00 } hitcount: 1 len: 40
1449 { skbaddr: ffff8800d2bcd100 } hitcount: 1 len: 40
1450 { skbaddr: ffff880064505f00 } hitcount: 1 len: 174
1451 { skbaddr: ffff8800a8bff200 } hitcount: 1 len: 160
1452 { skbaddr: ffff880044e3cc00 } hitcount: 1 len: 76
1453 { skbaddr: ffff8800a8bfe700 } hitcount: 1 len: 46
1454 { skbaddr: ffff8800d2bcdc00 } hitcount: 1 len: 32
1455 { skbaddr: ffff8800a1f64800 } hitcount: 1 len: 46
1456 { skbaddr: ffff8800d2bcde00 } hitcount: 1 len: 988
1457 { skbaddr: ffff88006a5dea00 } hitcount: 1 len: 46
1458 { skbaddr: ffff88002e37a200 } hitcount: 1 len: 44
1459 { skbaddr: ffff8800a1f32c00 } hitcount: 2 len: 676
1460 { skbaddr: ffff88000ad52600 } hitcount: 2 len: 107
1461 { skbaddr: ffff8800a1f91e00 } hitcount: 2 len: 92
1462 { skbaddr: ffff8800af5a0200 } hitcount: 2 len: 142
1463 { skbaddr: ffff8800d2bcc600 } hitcount: 2 len: 220
1464 { skbaddr: ffff8800ba36f500 } hitcount: 2 len: 92
1465 { skbaddr: ffff8800d021f800 } hitcount: 2 len: 92
1466 { skbaddr: ffff8800a1f33600 } hitcount: 2 len: 675
1467 { skbaddr: ffff8800a8bfff00 } hitcount: 3 len: 138
1468 { skbaddr: ffff8800d62a1300 } hitcount: 3 len: 138
1469 { skbaddr: ffff88002e37a100 } hitcount: 4 len: 184
1470 { skbaddr: ffff880064504400 } hitcount: 4 len: 184
1471 { skbaddr: ffff8800a8bfec00 } hitcount: 4 len: 184
1472 { skbaddr: ffff88000ad53700 } hitcount: 5 len: 230
1473 { skbaddr: ffff8800d2bcdb00 } hitcount: 5 len: 196
1474 { skbaddr: ffff8800a1f90000 } hitcount: 6 len: 276
1475 { skbaddr: ffff88006a54f900 } hitcount: 6 len: 276
1482 And here's an example that shows how to combine histogram data from
1483 any two events even if they don't share any 'compatible' fields
1484 other than 'hitcount' and 'stacktrace'. These commands create a
1485 couple of triggers named 'bar' using those fields:
1487 # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \
1488 /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
1489 # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \
1490 /sys/kernel/debug/tracing/events/net/netif_rx/trigger
1492 And displaying the output of either shows some interesting if
1493 somewhat confusing output:
1495 # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
1496 # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist
1500 # trigger info: hist:name=bar:keys=stacktrace:vals=hitcount:sort=hitcount:size=2048 [active]
1504 _do_fork+0x18e/0x330
1505 kernel_thread+0x29/0x30
1506 kthreadd+0x154/0x1b0
1507 ret_from_fork+0x3f/0x70
1510 netif_rx_internal+0xb2/0xd0
1511 netif_rx_ni+0x20/0x70
1512 dev_loopback_xmit+0xaa/0xd0
1513 ip_mc_output+0x126/0x240
1514 ip_local_out_sk+0x31/0x40
1515 igmp_send_report+0x1e9/0x230
1516 igmp_timer_expire+0xe9/0x120
1517 call_timer_fn+0x39/0xf0
1518 run_timer_softirq+0x1e1/0x290
1519 __do_softirq+0xfd/0x290
1521 smp_apic_timer_interrupt+0x4a/0x60
1522 apic_timer_interrupt+0x6d/0x80
1523 cpuidle_enter+0x17/0x20
1524 call_cpuidle+0x3b/0x60
1525 cpu_startup_entry+0x22d/0x310
1528 netif_rx_internal+0xb2/0xd0
1529 netif_rx_ni+0x20/0x70
1530 dev_loopback_xmit+0xaa/0xd0
1531 ip_mc_output+0x17f/0x240
1532 ip_local_out_sk+0x31/0x40
1533 ip_send_skb+0x1a/0x50
1534 udp_send_skb+0x13e/0x270
1535 udp_sendmsg+0x2bf/0x980
1536 inet_sendmsg+0x67/0xa0
1537 sock_sendmsg+0x38/0x50
1538 SYSC_sendto+0xef/0x170
1540 entry_SYSCALL_64_fastpath+0x12/0x6a
1543 netif_rx_internal+0xb2/0xd0
1545 loopback_xmit+0x6c/0xb0
1546 dev_hard_start_xmit+0x219/0x3a0
1547 __dev_queue_xmit+0x415/0x4f0
1548 dev_queue_xmit_sk+0x13/0x20
1549 ip_finish_output2+0x237/0x340
1550 ip_finish_output+0x113/0x1d0
1552 ip_local_out_sk+0x31/0x40
1553 ip_send_skb+0x1a/0x50
1554 udp_send_skb+0x16d/0x270
1555 udp_sendmsg+0x2bf/0x980
1556 inet_sendmsg+0x67/0xa0
1557 sock_sendmsg+0x38/0x50
1558 ___sys_sendmsg+0x14e/0x270
1561 netif_rx_internal+0xb2/0xd0
1563 loopback_xmit+0x6c/0xb0
1564 dev_hard_start_xmit+0x219/0x3a0
1565 __dev_queue_xmit+0x415/0x4f0
1566 dev_queue_xmit_sk+0x13/0x20
1567 ip_finish_output2+0x237/0x340
1568 ip_finish_output+0x113/0x1d0
1570 ip_local_out_sk+0x31/0x40
1571 ip_send_skb+0x1a/0x50
1572 udp_send_skb+0x16d/0x270
1573 udp_sendmsg+0x2bf/0x980
1574 inet_sendmsg+0x67/0xa0
1575 sock_sendmsg+0x38/0x50
1576 ___sys_sendmsg+0x269/0x270
1579 netif_rx_internal+0xb2/0xd0
1581 loopback_xmit+0x6c/0xb0
1582 dev_hard_start_xmit+0x219/0x3a0
1583 __dev_queue_xmit+0x415/0x4f0
1584 dev_queue_xmit_sk+0x13/0x20
1585 ip_finish_output2+0x237/0x340
1586 ip_finish_output+0x113/0x1d0
1588 ip_local_out_sk+0x31/0x40
1589 ip_send_skb+0x1a/0x50
1590 udp_send_skb+0x16d/0x270
1591 udp_sendmsg+0x2bf/0x980
1592 inet_sendmsg+0x67/0xa0
1593 sock_sendmsg+0x38/0x50
1594 SYSC_sendto+0xef/0x170
1597 _do_fork+0x18e/0x330
1599 entry_SYSCALL_64_fastpath+0x12/0x6a
1608 2.2 Inter-event hist triggers
1609 -----------------------------
1611 Inter-event hist triggers are hist triggers that combine values from
1612 one or more other events and create a histogram using that data. Data
1613 from an inter-event histogram can in turn become the source for
1614 further combined histograms, thus providing a chain of related
1615 histograms, which is important for some applications.
1617 The most important example of an inter-event quantity that can be used
1618 in this manner is latency, which is simply a difference in timestamps
1619 between two events. Although latency is the most important
1620 inter-event quantity, note that because the support is completely
1621 general across the trace event subsystem, any event field can be used
1622 in an inter-event quantity.
1624 An example of a histogram that combines data from other histograms
1625 into a useful chain would be a 'wakeupswitch latency' histogram that
1626 combines a 'wakeup latency' histogram and a 'switch latency'
1629 Normally, a hist trigger specification consists of a (possibly
1630 compound) key along with one or more numeric values, which are
1631 continually updated sums associated with that key. A histogram
1632 specification in this case consists of individual key and value
1633 specifications that refer to trace event fields associated with a
1636 The inter-event hist trigger extension allows fields from multiple
1637 events to be referenced and combined into a multi-event histogram
1638 specification. In support of this overall goal, a few enabling
1639 features have been added to the hist trigger support:
1641 - In order to compute an inter-event quantity, a value from one
1642 event needs to saved and then referenced from another event. This
1643 requires the introduction of support for histogram 'variables'.
1645 - The computation of inter-event quantities and their combination
1646 require some minimal amount of support for applying simple
1647 expressions to variables (+ and -).
1649 - A histogram consisting of inter-event quantities isn't logically a
1650 histogram on either event (so having the 'hist' file for either
1651 event host the histogram output doesn't really make sense). To
1652 address the idea that the histogram is associated with a
1653 combination of events, support is added allowing the creation of
1654 'synthetic' events that are events derived from other events.
1655 These synthetic events are full-fledged events just like any other
1656 and can be used as such, as for instance to create the
1657 'combination' histograms mentioned previously.
1659 - A set of 'actions' can be associated with histogram entries -
1660 these can be used to generate the previously mentioned synthetic
1661 events, but can also be used for other purposes, such as for
1662 example saving context when a 'max' latency has been hit.
1664 - Trace events don't have a 'timestamp' associated with them, but
1665 there is an implicit timestamp saved along with an event in the
1666 underlying ftrace ring buffer. This timestamp is now exposed as a
1667 a synthetic field named 'common_timestamp' which can be used in
1668 histograms as if it were any other event field; it isn't an actual
1669 field in the trace format but rather is a synthesized value that
1670 nonetheless can be used as if it were an actual field. By default
1671 it is in units of nanoseconds; appending '.usecs' to a
1672 common_timestamp field changes the units to microseconds.
1674 A note on inter-event timestamps: If common_timestamp is used in a
1675 histogram, the trace buffer is automatically switched over to using
1676 absolute timestamps and the "global" trace clock, in order to avoid
1677 bogus timestamp differences with other clocks that aren't coherent
1678 across CPUs. This can be overridden by specifying one of the other
1679 trace clocks instead, using the "clock=XXX" hist trigger attribute,
1680 where XXX is any of the clocks listed in the tracing/trace_clock
1683 These features are described in more detail in the following sections.
1685 2.2.1 Histogram Variables
1686 -------------------------
1688 Variables are simply named locations used for saving and retrieving
1689 values between matching events. A 'matching' event is defined as an
1690 event that has a matching key - if a variable is saved for a histogram
1691 entry corresponding to that key, any subsequent event with a matching
1692 key can access that variable.
1694 A variable's value is normally available to any subsequent event until
1695 it is set to something else by a subsequent event. The one exception
1696 to that rule is that any variable used in an expression is essentially
1697 'read-once' - once it's used by an expression in a subsequent event,
1698 it's reset to its 'unset' state, which means it can't be used again
1699 unless it's set again. This ensures not only that an event doesn't
1700 use an uninitialized variable in a calculation, but that that variable
1701 is used only once and not for any unrelated subsequent match.
1703 The basic syntax for saving a variable is to simply prefix a unique
1704 variable name not corresponding to any keyword along with an '=' sign
1707 Either keys or values can be saved and retrieved in this way. This
1708 creates a variable named 'ts0' for a histogram entry with the key
1711 # echo 'hist:keys=next_pid:vals=$ts0:ts0=common_timestamp ... >> \
1714 The ts0 variable can be accessed by any subsequent event having the
1715 same pid as 'next_pid'.
1717 Variable references are formed by prepending the variable name with
1718 the '$' sign. Thus for example, the ts0 variable above would be
1719 referenced as '$ts0' in expressions.
1721 Because 'vals=' is used, the common_timestamp variable value above
1722 will also be summed as a normal histogram value would (though for a
1723 timestamp it makes little sense).
1725 The below shows that a key value can also be saved in the same way:
1727 # echo 'hist:timer_pid=common_pid:key=timer_pid ...' >> event/trigger
1729 If a variable isn't a key variable or prefixed with 'vals=', the
1730 associated event field will be saved in a variable but won't be summed
1733 # echo 'hist:keys=next_pid:ts1=common_timestamp ... >> event/trigger
1735 Multiple variables can be assigned at the same time. The below would
1736 result in both ts0 and b being created as variables, with both
1737 common_timestamp and field1 additionally being summed as values:
1739 # echo 'hist:keys=pid:vals=$ts0,$b:ts0=common_timestamp,b=field1 ... >> \
1742 Note that variable assignments can appear either preceding or
1743 following their use. The command below behaves identically to the
1746 # echo 'hist:keys=pid:ts0=common_timestamp,b=field1:vals=$ts0,$b ... >> \
1749 Any number of variables not bound to a 'vals=' prefix can also be
1750 assigned by simply separating them with colons. Below is the same
1751 thing but without the values being summed in the histogram:
1753 # echo 'hist:keys=pid:ts0=common_timestamp:b=field1 ... >> event/trigger
1755 Variables set as above can be referenced and used in expressions on
1758 For example, here's how a latency can be calculated:
1760 # echo 'hist:keys=pid,prio:ts0=common_timestamp ... >> event1/trigger
1761 # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ... >> event2/trigger
1763 In the first line above, the event's timetamp is saved into the
1764 variable ts0. In the next line, ts0 is subtracted from the second
1765 event's timestamp to produce the latency, which is then assigned into
1766 yet another variable, 'wakeup_lat'. The hist trigger below in turn
1767 makes use of the wakeup_lat variable to compute a combined latency
1768 using the same key and variable from yet another event:
1770 # echo 'hist:key=pid:wakeupswitch_lat=$wakeup_lat+$switchtime_lat ... >> event3/trigger
1772 2.2.2 Synthetic Events
1773 ----------------------
1775 Synthetic events are user-defined events generated from hist trigger
1776 variables or fields associated with one or more other events. Their
1777 purpose is to provide a mechanism for displaying data spanning
1778 multiple events consistent with the existing and already familiar
1779 usage for normal events.
1781 To define a synthetic event, the user writes a simple specification
1782 consisting of the name of the new event along with one or more
1783 variables and their types, which can be any valid field type,
1784 separated by semicolons, to the tracing/synthetic_events file.
1786 For instance, the following creates a new event named 'wakeup_latency'
1787 with 3 fields: lat, pid, and prio. Each of those fields is simply a
1788 variable reference to a variable on another event:
1790 # echo 'wakeup_latency \
1794 /sys/kernel/debug/tracing/synthetic_events
1796 Reading the tracing/synthetic_events file lists all the currently
1797 defined synthetic events, in this case the event defined above:
1799 # cat /sys/kernel/debug/tracing/synthetic_events
1800 wakeup_latency u64 lat; pid_t pid; int prio
1802 An existing synthetic event definition can be removed by prepending
1803 the command that defined it with a '!':
1805 # echo '!wakeup_latency u64 lat pid_t pid int prio' >> \
1806 /sys/kernel/debug/tracing/synthetic_events
1808 At this point, there isn't yet an actual 'wakeup_latency' event
1809 instantiated in the event subsytem - for this to happen, a 'hist
1810 trigger action' needs to be instantiated and bound to actual fields
1811 and variables defined on other events (see Section 6.3.3 below).
1813 Once that is done, an event instance is created, and a histogram can
1814 be defined using it:
1816 # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \
1817 /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger
1819 The new event is created under the tracing/events/synthetic/ directory
1820 and looks and behaves just like any other event:
1822 # ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency
1823 enable filter format hist id trigger
1825 Like any other event, once a histogram is enabled for the event, the
1826 output can be displayed by reading the event's 'hist' file.
1828 2.2.3 Hist trigger 'actions'
1829 ----------------------------
1831 A hist trigger 'action' is a function that's executed whenever a
1832 histogram entry is added or updated.
1834 The default 'action' if no special function is explicity specified is
1835 as it always has been, to simply update the set of values associated
1836 with an entry. Some applications, however, may want to perform
1837 additional actions at that point, such as generate another event, or
1838 compare and save a maximum.
1840 The following additional actions are available. To specify an action
1841 for a given event, simply specify the action between colons in the
1842 hist trigger specification.
1844 - onmatch(matching.event).<synthetic_event_name>(param list)
1846 The 'onmatch(matching.event).<synthetic_event_name>(params)' hist
1847 trigger action is invoked whenever an event matches and the
1848 histogram entry would be added or updated. It causes the named
1849 synthetic event to be generated with the values given in the
1850 'param list'. The result is the generation of a synthetic event
1851 that consists of the values contained in those variables at the
1852 time the invoking event was hit.
1854 The 'param list' consists of one or more parameters which may be
1855 either variables or fields defined on either the 'matching.event'
1856 or the target event. The variables or fields specified in the
1857 param list may be either fully-qualified or unqualified. If a
1858 variable is specified as unqualified, it must be unique between
1859 the two events. A field name used as a param can be unqualified
1860 if it refers to the target event, but must be fully qualified if
1861 it refers to the matching event. A fully-qualified name is of the
1862 form 'system.event_name.$var_name' or 'system.event_name.field'.
1864 The 'matching.event' specification is simply the fully qualified
1865 event name of the event that matches the target event for the
1866 onmatch() functionality, in the form 'system.event_name'.
1868 Finally, the number and type of variables/fields in the 'param
1869 list' must match the number and types of the fields in the
1870 synthetic event being generated.
1872 As an example the below defines a simple synthetic event and uses
1873 a variable defined on the sched_wakeup_new event as a parameter
1874 when invoking the synthetic event. Here we define the synthetic
1877 # echo 'wakeup_new_test pid_t pid' >> \
1878 /sys/kernel/debug/tracing/synthetic_events
1880 # cat /sys/kernel/debug/tracing/synthetic_events
1881 wakeup_new_test pid_t pid
1883 The following hist trigger both defines the missing testpid
1884 variable and specifies an onmatch() action that generates a
1885 wakeup_new_test synthetic event whenever a sched_wakeup_new event
1886 occurs, which because of the 'if comm == "cyclictest"' filter only
1887 happens when the executable is cyclictest:
1889 # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\
1890 wakeup_new_test($testpid) if comm=="cyclictest"' >> \
1891 /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger
1893 Creating and displaying a histogram based on those events is now
1894 just a matter of using the fields and new synthetic event in the
1895 tracing/events/synthetic directory, as usual:
1897 # echo 'hist:keys=pid:sort=pid' >> \
1898 /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger
1900 Running 'cyclictest' should cause wakeup_new events to generate
1901 wakeup_new_test synthetic events which should result in histogram
1902 output in the wakeup_new_test event's hist file:
1904 # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/hist
1906 A more typical usage would be to use two events to calculate a
1907 latency. The following example uses a set of hist triggers to
1908 produce a 'wakeup_latency' histogram:
1910 First, we define a 'wakeup_latency' synthetic event:
1912 # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
1913 /sys/kernel/debug/tracing/synthetic_events
1915 Next, we specify that whenever we see a sched_waking event for a
1916 cyclictest thread, save the timestamp in a 'ts0' variable:
1918 # echo 'hist:keys=$saved_pid:saved_pid=pid:ts0=common_timestamp.usecs \
1919 if comm=="cyclictest"' >> \
1920 /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
1922 Then, when the corresponding thread is actually scheduled onto the
1923 CPU by a sched_switch event, calculate the latency and use that
1924 along with another variable and an event field to generate a
1925 wakeup_latency synthetic event:
1927 # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:\
1928 onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,\
1929 $saved_pid,next_prio) if next_comm=="cyclictest"' >> \
1930 /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
1932 We also need to create a histogram on the wakeup_latency synthetic
1933 event in order to aggregate the generated synthetic event data:
1935 # echo 'hist:keys=pid,prio,lat:sort=pid,lat' >> \
1936 /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger
1938 Finally, once we've run cyclictest to actually generate some
1939 events, we can see the output by looking at the wakeup_latency
1940 synthetic event's hist file:
1942 # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist
1944 - onmax(var).save(field,.. .)
1946 The 'onmax(var).save(field,...)' hist trigger action is invoked
1947 whenever the value of 'var' associated with a histogram entry
1948 exceeds the current maximum contained in that variable.
1950 The end result is that the trace event fields specified as the
1951 onmax.save() params will be saved if 'var' exceeds the current
1952 maximum for that hist trigger entry. This allows context from the
1953 event that exhibited the new maximum to be saved for later
1954 reference. When the histogram is displayed, additional fields
1955 displaying the saved values will be printed.
1957 As an example the below defines a couple of hist triggers, one for
1958 sched_waking and another for sched_switch, keyed on pid. Whenever
1959 a sched_waking occurs, the timestamp is saved in the entry
1960 corresponding to the current pid, and when the scheduler switches
1961 back to that pid, the timestamp difference is calculated. If the
1962 resulting latency, stored in wakeup_lat, exceeds the current
1963 maximum latency, the values specified in the save() fields are
1966 # echo 'hist:keys=pid:ts0=common_timestamp.usecs \
1967 if comm=="cyclictest"' >> \
1968 /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
1970 # echo 'hist:keys=next_pid:\
1971 wakeup_lat=common_timestamp.usecs-$ts0:\
1972 onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \
1973 if next_comm=="cyclictest"' >> \
1974 /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
1976 When the histogram is displayed, the max value and the saved
1977 values corresponding to the max are displayed following the rest
1980 # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
1981 { next_pid: 2255 } hitcount: 239
1982 common_timestamp-ts0: 0
1984 next_comm: cyclictest
1985 prev_pid: 0 prev_prio: 120 prev_comm: swapper/1
1987 { next_pid: 2256 } hitcount: 2355
1988 common_timestamp-ts0: 0
1989 max: 49 next_comm: cyclictest
1990 prev_pid: 0 prev_prio: 120 prev_comm: swapper/0