The timerlat tracer helps preemptive-kernel developers find sources of
wakeup latency for real-time threads. Like cyclictest, the tracer sets a
periodic timer that wakes up a thread. The thread then computes a
*wakeup latency* value as the difference between the *current time* and
the *absolute time* at which the timer was set to expire. The main goal
of timerlat is to trace in a way that helps kernel developers.
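
The latency computation described above can be sketched in a few lines of
user-space C. This is only an illustration of the arithmetic, not the
tracer's in-kernel code; the function names are hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
int64_t timespec_to_ns(struct timespec ts)
{
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Wakeup latency: how long after the programmed absolute expiration
 * time the observer actually ran. Both values must come from the
 * same clock.
 */
int64_t wakeup_latency_ns(struct timespec now, struct timespec expiry)
{
	return timespec_to_ns(now) - timespec_to_ns(expiry);
}
```

A periodic measurement loop would advance the expiry by one period per
cycle and call wakeup_latency_ns() with the time read right after waking
up.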

Write the ASCII text "timerlat" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).

For example::

  [root@f32 ~]# cd /sys/kernel/tracing/
  [root@f32 tracing]# echo timerlat > current_tracer

It is possible to follow the trace by reading the trace file::

  [root@f32 tracing]# cat trace
  #                             / _----=> need-resched
  #                            | / _---=> hardirq/softirq
  #                            || / _--=> preempt-depth
  #         TASK-PID     CPU#  ||||   TIMESTAMP   ID     CONTEXT                LATENCY
            <idle>-0     [000] d.h1    54.029328: #1     context    irq timer_latency       932 ns
           <...>-867     [000] ....    54.029339: #1     context thread timer_latency     11700 ns
            <idle>-0     [001] dNh1    54.029346: #1     context    irq timer_latency      2833 ns
           <...>-868     [001] ....    54.029353: #1     context thread timer_latency      9820 ns
            <idle>-0     [000] d.h1    54.030328: #2     context    irq timer_latency       769 ns
           <...>-867     [000] ....    54.030330: #2     context thread timer_latency      3070 ns
            <idle>-0     [001] d.h1    54.030344: #2     context    irq timer_latency       935 ns
           <...>-868     [001] ....    54.030347: #2     context thread timer_latency      4351 ns

The tracer creates a per-cpu kernel thread with real-time priority that
prints two lines at every activation. The first is the *timer latency*
observed in the *hardirq* context before the activation of the thread.
The second is the *timer latency* observed by the thread. The ACTIVATION
ID field relates the *irq* execution to its respective *thread*
execution.

The *irq*/*thread* split is important to clarify which context the
unexpectedly high value comes from. The *irq* context can be delayed by
hardware-related actions, such as SMIs, NMIs, and IRQs, or by a thread
masking interrupts. Once the timer fires, the delay can also be
influenced by blocking caused by threads, for example by postponing the
scheduler execution via preempt_disable(), by the scheduler execution
itself, or by masking interrupts. Threads can also be delayed by
interference from other threads and IRQs.
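
To make the *irq*/*thread* pairing concrete, the sketch below parses
timerlat lines in the format shown in the trace output and computes how
much of the thread latency accumulated after the IRQ handler ran. The
struct and function names are hypothetical, and the parser assumes the
"#ID context <irq|thread> timer_latency <N> ns" layout seen above.

```c
#include <stdio.h>
#include <string.h>

/* One timerlat sample as printed in the trace file. */
struct timerlat_sample {
	unsigned long id;		/* activation ID (the "#N" field) */
	char context[8];		/* "irq" or "thread" */
	unsigned long long latency_ns;	/* timer_latency value */
};

/*
 * Parse one trace line. Returns 0 on success, -1 if the line is not a
 * timerlat sample (for instance a header line or an osnoise: event).
 */
int parse_timerlat_line(const char *line, struct timerlat_sample *s)
{
	const char *p = strchr(line, '#');

	if (!p)
		return -1;
	if (sscanf(p, "#%lu context %7s timer_latency %llu ns",
		   &s->id, s->context, &s->latency_ns) != 3)
		return -1;
	return 0;
}

/*
 * Portion of the thread latency accumulated after the IRQ handler ran,
 * for an irq/thread pair with the same activation ID. Returns -1 if the
 * two samples do not form such a pair.
 */
long long latency_after_irq_ns(const struct timerlat_sample *irq,
			       const struct timerlat_sample *thread)
{
	if (irq->id != thread->id || strcmp(irq->context, "irq") ||
	    strcmp(thread->context, "thread"))
		return -1;
	return (long long)thread->latency_ns - (long long)irq->latency_ns;
}
```

For activation #1 on CPU 0 in the trace above, the irq sample reports
932 ns and the thread sample 11700 ns, so 10768 ns of the delay built up
after the timer IRQ handler had already run.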

The timerlat tracer is built on top of the osnoise tracer, so its
configuration is also done in the osnoise/ config directory. The
timerlat configs are:

 - cpus: CPUs on which a timerlat thread will execute.
 - timerlat_period_us: the period of the timerlat thread.
 - stop_tracing_us: stop the system tracing if a timer latency at the
   *irq* context higher than the configured value happens. Writing 0
   disables this option.
 - stop_tracing_total_us: stop the system tracing if a timer latency at
   the *thread* context higher than the configured value happens.
   Writing 0 disables this option.
 - print_stack: save the stack of the IRQ occurrence. The stack is
   printed after the *thread context* event, or at the IRQ handler if
   *stop_tracing_us* is hit.
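
These options are plain files, so a program can set them the same way a
shell redirect would. The helper below is a minimal sketch with a
hypothetical name (write_option); it assumes the caller passes the
osnoise/ directory of a mounted tracefs and has permission to write
there.

```c
#include <stdio.h>

/*
 * Write an integer value to <dir>/<name>, the way
 * "echo <value> > <dir>/<name>" would.
 * Returns 0 on success, -1 on error.
 */
int write_option(const char *dir, const char *name, long value)
{
	char path[256];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || n >= (int)sizeof(path))
		return -1;	/* path would be truncated */

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%ld\n", value) > 0;
	/* fclose() flushes the buffer, so a failed write shows up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

For instance, write_option("/sys/kernel/tracing/osnoise",
"stop_tracing_total_us", 500) would request that tracing stop once a
thread-context latency above 500 us is seen, and writing 0 to the same
file would disable the threshold again.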

Timerlat and osnoise
----------------------------

The timerlat tracer can also take advantage of the osnoise: trace
events. For example::

  [root@f32 ~]# cd /sys/kernel/tracing/
  [root@f32 tracing]# echo timerlat > current_tracer
  [root@f32 tracing]# echo 1 > events/osnoise/enable
  [root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us
  [root@f32 tracing]# tail -10 trace
       cc1-87882   [005] d..h...   548.771078: #402268 context    irq timer_latency     13585 ns
       cc1-87882   [005] dNLh1..   548.771082: irq_noise: local_timer:236 start 548.771077442 duration 7597 ns
       cc1-87882   [005] dNLh2..   548.771099: irq_noise: qxl:21 start 548.771085017 duration 7139 ns
       cc1-87882   [005] d...3..   548.771102: thread_noise: cc1:87882 start 548.771078243 duration 9909 ns
  timerlat/5-1035  [005] .......   548.771104: #402268 context thread timer_latency     39960 ns

In this case, the timer latency does not have a single root cause but
several. First, the timer IRQ was delayed for 13 us, which may point to
a long IRQ-disabled section (see the IRQ stacktrace section). Then the
timer interrupt that wakes up the timerlat thread took 7597 ns, and the
qxl:21 device IRQ took 7139 ns. Finally, the cc1 thread noise took
9909 ns before the context switch. Such evidence helps the developer
choose other tracing methods to debug and optimize the system.

It is worth mentioning that the *duration* values reported by the
osnoise: events are *net* values. For example, the thread_noise does not
include the overhead caused by the IRQ execution (which accounted for
7597 + 7139 = 14736 ns). But the values reported by the timerlat tracer
(timer_latency) are *gross* values.
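
Because the osnoise: durations are net, they can be added up without
counting any interval twice. The helper below is a small sketch of that
bookkeeping; the function name is hypothetical and the durations are
assumed to have been extracted from the trace already.

```c
/* Sum of the *net* durations reported by osnoise: events, in ns. */
long long sum_net_noise_ns(const long long *durations, int n)
{
	long long total = 0;

	for (int i = 0; i < n; i++)
		total += durations[i];
	return total;
}
```

With the values from the trace above, the two irq_noise events add up to
7597 + 7139 = 14736 ns, and adding the 9909 ns of thread_noise gives
24645 ns of net noise. The thread timer_latency of 39960 ns is larger
because it is measured from the timer's programmed expiration and so
also covers the time before the timer IRQ ran.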

The art below illustrates a CPU timeline, with the timerlat tracer's
view at the top and the osnoise: events at the bottom. Each "-" in the
timelines represents roughly 1 us, and time moves ==>::

  External      timer irq                   thread
    clock        latency                   latency
    event       13585 ns                   39960 ns
      |-------------+-------------------------|
  =========================================================================
  [another thread...^       v..^       v.......][timerlat/ thread]  <-- CPU timeline
  =========================================================================
                            |          |       + thread_noise: 9909 ns
                            |          +-> irq_noise: 7139 ns
                            +-> irq_noise: 7597 ns

IRQ stacktrace
---------------------------

The osnoise/print_stack option is helpful in cases where thread noise is
the major factor in the timer latency because preemption or IRQs were
disabled. For example::

  [root@f32 tracing]# echo 500 > osnoise/stop_tracing_total_us
  [root@f32 tracing]# echo 500 > osnoise/print_stack
  [root@f32 tracing]# echo timerlat > current_tracer
  [root@f32 tracing]# tail -21 per_cpu/cpu7/trace
    insmod-1026    [007] dN.h1..   200.201948: irq_noise: local_timer:236 start 200.201939376 duration 7872 ns
    insmod-1026    [007] d..h1..   200.202587: #29800 context    irq timer_latency      1616 ns
    insmod-1026    [007] dN.h2..   200.202598: irq_noise: local_timer:236 start 200.202586162 duration 11855 ns
    insmod-1026    [007] dN.h3..   200.202947: irq_noise: local_timer:236 start 200.202939174 duration 7318 ns
    insmod-1026    [007] d...3..   200.203444: thread_noise: insmod:1026 start 200.202586933 duration 838681 ns
  timerlat/7-1001  [007] .......   200.203445: #29800 context thread timer_latency    859978 ns
  timerlat/7-1001  [007] ....1..   200.203446: <stack trace>
  => __hrtimer_run_queues
  => __sysvec_apic_timer_interrupt
  => asm_call_irq_on_stack
  => sysvec_apic_timer_interrupt
  => asm_sysvec_apic_timer_interrupt
  => dummy_load_1ms_pd_init
  => __do_sys_finit_module
  => entry_SYSCALL_64_after_hwframe

In this case, the thread made the largest contribution to the *timer
latency*, and the stack trace, saved during the timerlat IRQ handler,
points to a function named dummy_load_1ms_pd_init, which had the
following code (on purpose)::

  static int __init dummy_load_1ms_pd_init(void)

User-space interface
---------------------------

Timerlat allows user-space threads to use the timerlat infrastructure to
measure scheduling latency. This interface is accessible via a per-CPU
file descriptor inside $tracing_dir/osnoise/per_cpu/cpu$ID/timerlat_fd.

This interface is accessible under the following conditions:

 - The timerlat tracer is enabled
 - The osnoise workload option is set to NO_OSNOISE_WORKLOAD
 - The user-space thread is affined to a single processor
 - The thread opens the file associated with its single processor
 - Only one thread can access the file at a time

The open() syscall will fail if any of these conditions is not met.
After opening the file descriptor, the user-space thread can read from
it.

The read() system call runs timerlat code that arms the timer in the
future and waits for it, as the regular kernel thread does.

When the timer IRQ fires, the timerlat IRQ handler executes, reports the
IRQ latency, and wakes up the thread waiting in the read(). The thread
is then scheduled and reports the thread latency via the tracer, as the
kernel thread does.

The difference from the in-kernel timerlat is that, instead of re-arming
the timer, timerlat returns from the read() system call. At this point,
the user can run any code.

If the application reads the timerlat file descriptor again, the tracer
reports the return-from-user-space latency, which is the total latency.
If this is the end of the work, it can be interpreted as the response
time for the request.

After reporting the total latency, timerlat restarts the cycle: it arms
a timer and goes to sleep until the following activation.

If at any time one of the conditions is broken, e.g., the thread
migrates while in user space, or the timerlat tracer is disabled, the
SIGKILL signal is sent to the user-space thread.

Here is a basic example of user-space code for timerlat::

	long cpu = 0;	/* place in CPU 0 */
	if (sched_setaffinity(gettid(), sizeof(set), &set) == -1)
	snprintf(buffer, sizeof(buffer),
		"/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd",
	timerlat_fd = open(buffer, O_RDONLY);
	if (timerlat_fd < 0) {
		printf("error opening %s: %s\n", buffer, strerror(errno));
		retval = read(timerlat_fd, buffer, 1024);
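
The listing above shows only part of the program. A self-contained
sketch along the same lines might look like the code below. The function
names (timerlat_fd_path, timerlat_user_loop), the error handling, and
the bounded number of cycles are assumptions added for illustration. It
also assumes tracefs is mounted at /sys/kernel/tracing and that the
conditions listed earlier hold when it runs.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build the per-CPU timerlat_fd path; -1 if it does not fit in buf. */
int timerlat_fd_path(char *buf, size_t len, long cpu)
{
	int n = snprintf(buf, len,
			 "/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd",
			 cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Pin the calling thread to one CPU, open that CPU's timerlat_fd and
 * run `cycles` activations. Returns 0 on success, -1 on error.
 */
int timerlat_user_loop(long cpu, int cycles)
{
	char buffer[1024];
	cpu_set_t set;
	int fd, ret = 0;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* A pid of 0 means the calling thread. */
	if (sched_setaffinity(0, sizeof(set), &set) == -1)
		return -1;

	if (timerlat_fd_path(buffer, sizeof(buffer), cpu) < 0)
		return -1;

	fd = open(buffer, O_RDONLY);
	if (fd < 0) {
		fprintf(stderr, "error opening %s: %s\n", buffer,
			strerror(errno));
		return -1;
	}

	for (int i = 0; i < cycles; i++) {
		/* Sleeps until the next timer activation. */
		if (read(fd, buffer, sizeof(buffer)) < 0) {
			ret = -1;
			break;
		}
		/* User workload goes here; the next read() ends the cycle. */
	}

	close(fd);
	return ret;
}
```

A main() that calls timerlat_user_loop(0, 100) would run 100 activations
on CPU 0, with the latencies showing up in the trace file rather than in
the buffer filled by read().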