<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>

<book id="utrace">
<bookinfo>
<title>The utrace User Debugging Infrastructure</title>
</bookinfo>

<toc></toc>

<chapter id="concepts"><title>utrace concepts</title>

<sect1 id="intro"><title>Introduction</title>

<para>
<application>utrace</application> is infrastructure code for tracing
and controlling user threads. This is the foundation for writing
tracing engines, which can be loadable kernel modules.
</para>

<para>
The basic actors in <application>utrace</application> are the thread
and the tracing engine. A tracing engine is some body of code that
calls into the <filename>&lt;linux/utrace.h&gt;</filename>
interfaces, represented by a <structname>struct
utrace_engine_ops</structname>. (Usually it's a kernel module,
though the legacy <function>ptrace</function> support is a tracing
engine that is not in a kernel module.) The interface operates on
individual threads (<structname>struct task_struct</structname>).
If an engine wants to treat several threads as a group, that is up
to its higher-level code.
</para>

<para>
Tracing begins by attaching an engine to a thread, using
<function>utrace_attach_task</function> or
<function>utrace_attach_pid</function>. On success, either call returns
a pointer to a <structname>struct utrace_engine</structname>, which is
the handle used in all other calls.
</para>

</sect1>

<sect1 id="callbacks"><title>Events and Callbacks</title>

<para>
An attached engine does nothing by default. An engine makes something
happen by requesting callbacks via <function>utrace_set_events</function>
and poking the thread with <function>utrace_control</function>.
The synchronization issues related to these two calls
are discussed further below in <xref linkend="teardown"/>.
</para>

<para>
Events are specified using the macro
<constant>UTRACE_EVENT(<replaceable>type</replaceable>)</constant>.
Each event type is associated with a callback in <structname>struct
utrace_engine_ops</structname>. A tracing engine can leave unused
callbacks <constant>NULL</constant>. The only callbacks required
are those used by the event flags it sets.
</para>

<para>
Many engines can be attached to each thread. When a thread has an
event, each engine gets a callback if it has set the event flag for
that event type. Engines are called in the order they attached.
Engines that attach after the event has occurred do not get callbacks
for that event. This includes any new engines just attached by an
existing engine's callback function. Once the sequence of callbacks
for that one event has completed, such new engines are then eligible in
the next sequence that starts when there is another event.
</para>
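
<para>
The dispatch rules above can be sketched as a small userspace model in C.
This is not utrace code and does not use its API:
<function>MODEL_EVENT</function>, <structname>struct model_thread</structname>,
<function>model_attach</function> and <function>model_report</function> are
hypothetical names chosen for illustration, and the event numbers are invented.
The model only demonstrates three rules from the text: engines are called in
attach order, an engine is called only if it set the flag for the event, and an
engine attached by a callback is not called until the next event.
</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event numbers for this model only; the real ones are
 * defined in <linux/utrace.h> and may differ. */
enum model_event { MODEL_QUIESCE, MODEL_CLONE, MODEL_DEATH, MODEL_REAP };

/* Same idea as UTRACE_EVENT(type): one flag bit per event type. */
#define MODEL_EVENT(type) (1UL << MODEL_##type)

#define MODEL_MAX_ENGINES 8

struct model_engine {
	int id;
	unsigned long events;	/* event flags this engine asked for */
	int spawn_id;		/* nonzero: its callback attaches engine spawn_id */
};

struct model_thread {
	struct model_engine engines[MODEL_MAX_ENGINES];	/* in attach order */
	size_t nr_engines;
	int called[MODEL_MAX_ENGINES];	/* engine ids called for the last event */
	size_t nr_called;
};

static int model_attach(struct model_thread *t, int id, unsigned long events)
{
	if (t->nr_engines == MODEL_MAX_ENGINES)
		return -1;
	t->engines[t->nr_engines].id = id;
	t->engines[t->nr_engines].events = events;
	t->engines[t->nr_engines].spawn_id = 0;
	t->nr_engines++;
	return 0;
}

/* Deliver one event to the engines of one thread. */
static void model_report(struct model_thread *t, enum model_event ev)
{
	/* Only engines attached before this report started are eligible. */
	size_t eligible = t->nr_engines;

	t->nr_called = 0;
	for (size_t i = 0; i < eligible; i++) {
		struct model_engine *e = &t->engines[i];

		if (!(e->events & (1UL << ev)))
			continue;	/* flag not set: no callback is made */
		assert(t->nr_called < MODEL_MAX_ENGINES);
		t->called[t->nr_called++] = e->id;
		if (e->spawn_id) {
			/* A callback attaching a new engine: the new engine is
			 * skipped now and only eligible for the next event. */
			model_attach(t, e->spawn_id, e->events);
			e->spawn_id = 0;
		}
	}
}
```
]]></programlisting>

<para>
For example, if the first engine's callback attaches a new engine during a
<constant>CLONE</constant> report (modeled by its <varname>spawn_id</varname>
field), the new engine is skipped for that report and called for the next
<constant>CLONE</constant> report.
</para>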

<para>
Event reporting callbacks have details particular to the event type,
but are all called in similar environments and have the same
constraints. Callbacks are made from safe points, where no locks
are held, no special resources are pinned (usually), and the
user-mode state of the thread is accessible. So, callback code has
a pretty free hand. But to be a good citizen, callback code should
never block for long periods. It is fine to block in
<function>kmalloc</function> and the like, but never wait for I/O or
for user mode to do something. If you need the thread to wait, use
<constant>UTRACE_STOP</constant> and return from the callback
quickly. When your I/O or other work finishes, you can use
<function>utrace_control</function> to resume the thread.
</para>

</sect1>

<sect1 id="safely"><title>Stopping Safely</title>

<sect2 id="well-behaved"><title>Writing well-behaved callbacks</title>

<para>
Well-behaved callbacks are important to maintain two essential
properties of the interface. The first of these is that unrelated
tracing engines should not interfere with each other. If your engine's
event callback does not return quickly, then another engine won't get
the event notification in a timely manner. The second important
property is that tracing should be as noninvasive as possible to the
normal operation of the system overall and of the traced thread in
particular. That is, attached tracing engines should not perturb a
thread's behavior, except to the extent that changing its user-visible
state is explicitly what you want to do. (Obviously some perturbation
is unavoidable, primarily timing changes, ranging from small delays due
to the overhead of tracing, to arbitrary pauses in user code execution
when a user stops a thread with a debugger for examination.)
</para>

<para>
Even when you explicitly want the perturbation of making the traced
thread block, just blocking directly in your callback has more unwanted
effects. For example, the <constant>CLONE</constant> event callbacks are
called when the new child thread has been created but not yet started
running; the child can never be scheduled until the
<constant>CLONE</constant> tracing callbacks return. (This allows
engines tracing the parent to attach to the child.) If a
<constant>CLONE</constant> event callback blocks the parent thread, it
also prevents the child thread from running (even to process a
<constant>SIGKILL</constant>). If what you want is to make both the
parent and child block, then use <function>utrace_attach_task</function>
on the child and then use <constant>UTRACE_STOP</constant> on both
threads.
</para>

<para>
A more crucial problem with blocking in callbacks is that it can prevent
<constant>SIGKILL</constant> from working. A thread that is blocking
due to <constant>UTRACE_STOP</constant> will still wake up and die
immediately when sent a <constant>SIGKILL</constant>, as all threads
should. Relying on the <application>utrace</application>
infrastructure rather than on private synchronization calls in event
callbacks is an important way to help keep tracing robustly
noninvasive.
</para>

</sect2>

<sect2 id="UTRACE_STOP"><title>Using <constant>UTRACE_STOP</constant></title>

<para>
To control another thread and access its state, an engine must first
stop it with <constant>UTRACE_STOP</constant>. This means that the
thread is stopped and won't start running again while we access it.
When a thread is not already stopped, <function>utrace_control</function>
returns <constant>-EINPROGRESS</constant>, and the engine must wait for
the event callback that the thread makes when it is ready to stop. The
thread may be running on another CPU or may be blocked. When it is ready
to be examined, it will make callbacks to engines that set the
<constant>UTRACE_EVENT(QUIESCE)</constant> event bit. To wake up an
interruptible wait, use <constant>UTRACE_INTERRUPT</constant>.
</para>

<para>
As long as some engine has used <constant>UTRACE_STOP</constant> and
not called <function>utrace_control</function> to resume the thread,
the thread will remain stopped. <constant>SIGKILL</constant>
will wake it up, but it will not run user code. When the stop is
cleared with <function>utrace_control</function> or a callback
return value, the thread starts running again.
(See also <xref linkend="teardown"/>.)
</para>
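
<para>
The stop rule above amounts to an OR over engines, with
<constant>SIGKILL</constant> overriding all of them. A minimal userspace
model in C, assuming one boolean stop request per engine, is sketched here;
<structname>struct model_task</structname> and the
<function>model_*</function> functions are hypothetical and say nothing about
how the kernel actually implements stopping.
</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_ENGINES 4

/* Hypothetical model of one traced thread; not a kernel structure. */
struct model_task {
	bool stop_held[MODEL_MAX_ENGINES];	/* engine has used UTRACE_STOP */
	bool killed;				/* SIGKILL has arrived */
};

/* An engine asks for a stop (callback return value or utrace_control). */
static void model_stop(struct model_task *t, int engine)
{
	assert(engine >= 0 && engine < MODEL_MAX_ENGINES);
	t->stop_held[engine] = true;
}

/* An engine resumes the thread: it clears only its own stop request. */
static void model_resume(struct model_task *t, int engine)
{
	assert(engine >= 0 && engine < MODEL_MAX_ENGINES);
	t->stop_held[engine] = false;
}

/* The thread stays stopped while any engine still holds a stop. */
static bool model_stopped(const struct model_task *t)
{
	for (int i = 0; i < MODEL_MAX_ENGINES; i++)
		if (t->stop_held[i])
			return true;
	return false;
}

/* SIGKILL wakes the thread even while some engine holds a stop... */
static bool model_wakes(const struct model_task *t)
{
	return t->killed || !model_stopped(t);
}

/* ...but a killed thread never runs user code again. */
static bool model_runs_user_code(const struct model_task *t)
{
	return !t->killed && !model_stopped(t);
}
```
]]></programlisting>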

</sect2>

</sect1>

<sect1 id="teardown"><title>Tear-down Races</title>

<sect2 id="SIGKILL"><title>Primacy of <constant>SIGKILL</constant></title>
<para>
Ordinarily, synchronization issues for tracing engines are kept fairly
straightforward by using <constant>UTRACE_STOP</constant>. You ask a
thread to stop, and then once it makes the
<function>report_quiesce</function> callback it cannot do anything else
that would result in another callback, until you let it with a
<function>utrace_control</function> call. This simple arrangement
avoids complex and error-prone code in each one of a tracing engine's
event callbacks to keep them serialized with the engine's other
operations done on that thread from another thread of control.
However, giving tracing engines complete power to keep a traced thread
stuck in place runs afoul of a more important kind of simplicity that
the kernel overall guarantees: nothing can prevent or delay
<constant>SIGKILL</constant> from making a thread die and release its
resources. To preserve this important property,
<constant>SIGKILL</constant> can, as a special case, break
<constant>UTRACE_STOP</constant> in a way that nothing else normally can.
This includes both explicit <constant>SIGKILL</constant> signals and the
implicit <constant>SIGKILL</constant> sent to each other thread in the
same thread group by a thread doing an exec, or processing a fatal
signal, or making an <function>exit_group</function> system call. A
tracing engine can prevent a thread from beginning the exit or exec or
dying by signal (other than <constant>SIGKILL</constant>) if it is
attached to that thread, but once the operation begins, no tracing
engine can prevent or delay all other threads in the same thread group
from dying.
</para>
</sect2>

<sect2 id="reap"><title>Final callbacks</title>
<para>
The <function>report_reap</function> callback is always the final event
in the life cycle of a traced thread. Tracing engines can use this as
the trigger to clean up their own data structures. The
<function>report_death</function> callback is always the penultimate
event a tracing engine might see; it's seen unless the thread was
already in the midst of dying when the engine attached. Many tracing
engines will have no interest in when a parent reaps a dead process,
and nothing they want to do with a zombie thread once it dies; for
them, the <function>report_death</function> callback is the natural
place to clean up data structures and detach. To facilitate writing
such engines robustly, given the asynchrony of
<constant>SIGKILL</constant>, and without error-prone manual
implementation of synchronization schemes, the
<application>utrace</application> infrastructure provides some special
guarantees about the <function>report_death</function> and
<function>report_reap</function> callbacks. It still takes some care
to be sure your tracing engine is robust to tear-down races, but these
rules make it reasonably straightforward and concise to handle a lot of
corner cases correctly.
</para>
</sect2>

<sect2 id="refcount"><title>Engine and task pointers</title>
<para>
The first sort of guarantee concerns the core data structures
themselves. <structname>struct utrace_engine</structname> is
a reference-counted data structure. While you hold a reference, an
engine pointer will always stay valid so that you can safely pass it to
any <application>utrace</application> call. Each call to
<function>utrace_attach_task</function> or
<function>utrace_attach_pid</function> returns an engine pointer with a
reference belonging to the caller. You own that reference until you
drop it using <function>utrace_engine_put</function>. There is an
implicit reference on the engine while it is attached. So if you drop
your only reference, and then use
<function>utrace_attach_task</function> without
<constant>UTRACE_ATTACH_CREATE</constant> to look up that same engine,
you will get the same pointer with a new reference to replace the one
you dropped, just like calling <function>utrace_engine_get</function>.
When an engine has been detached, either explicitly with
<constant>UTRACE_DETACH</constant> or implicitly after
<function>report_reap</function>, then any references you hold are all
that keep the old engine pointer alive.
</para>
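
<para>
The reference rules above can be illustrated with a userspace model in C.
It is a sketch, not the kernel's implementation:
<structname>struct model_engine</structname> and the
<function>model_*</function> functions are hypothetical, and a
<varname>freed</varname> flag stands in for releasing memory. It shows why
dropping your only reference is harmless while the engine is still attached,
and why after a detach only the references you hold keep the pointer alive.
</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the reference rules for struct utrace_engine;
 * the real structure is opaque to tracing engines. */
struct model_engine {
	int refs;	/* caller references plus the implicit attach reference */
	bool attached;
	bool freed;	/* stands in for the kernel freeing the engine */
};

static void model_put(struct model_engine *e)
{
	assert(e->refs > 0 && !e->freed);
	if (--e->refs == 0)
		e->freed = true;
}

/* Like attaching with UTRACE_ATTACH_CREATE: one implicit reference for
 * being attached plus one reference returned to the caller. */
static struct model_engine *model_attach_create(struct model_engine *e)
{
	e->refs = 2;
	e->attached = true;
	e->freed = false;
	return e;
}

/* Like looking up the same engine without UTRACE_ATTACH_CREATE: the same
 * pointer comes back with a new reference, as utrace_engine_get would
 * give. This model simply returns NULL once the engine is detached. */
static struct model_engine *model_lookup(struct model_engine *e)
{
	if (!e->attached)
		return NULL;
	e->refs++;
	return e;
}

/* Detaching (UTRACE_DETACH, or implicitly after report_reap) drops only
 * the implicit reference; references still held keep the pointer valid. */
static void model_detach(struct model_engine *e)
{
	assert(e->attached);
	e->attached = false;
	model_put(e);
}
```
]]></programlisting>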

<para>
There is nothing a kernel module can do to keep a <structname>struct
task_struct</structname> alive outside of
<function>rcu_read_lock</function>. When the task dies and is reaped
by its parent (or itself), that structure can be freed so that any
dangling pointers you have stored become invalid.
<application>utrace</application> will not prevent this, but it can
help you detect it safely. By definition, a task that has been reaped
has had all its engines detached. All
<application>utrace</application> calls can be safely called on a
detached engine if the caller holds a reference on that engine pointer,
even if the task pointer passed in the call is invalid. All calls
return <constant>-ESRCH</constant> for a detached engine, which tells
you that the task pointer you passed could be invalid now. Since
<function>utrace_control</function> and
<function>utrace_set_events</function> do not block, you can call those
inside an <function>rcu_read_lock</function> section; if they do not
return <constant>-ESRCH</constant>, you can be sure that the task
pointer remains valid until <function>rcu_read_unlock</function>. The
infrastructure never holds task references of its own. Though neither
<function>rcu_read_lock</function> nor any other lock is held while
making a callback, it's always guaranteed that the <structname>struct
task_struct</structname> and the <structname>struct
utrace_engine</structname> passed as arguments remain valid
until the callback function returns.
</para>

<para>
The usual way for a kernel module to hold a task pointer safely is to
hold a <structname>struct pid</structname> instead, since kernel modules
can drop that reference with <function>put_pid</function>. When using
that, the calls <function>utrace_attach_pid</function>,
<function>utrace_control_pid</function>,
<function>utrace_set_events_pid</function>, and
<function>utrace_barrier_pid</function> are available.
</para>
</sect2>

<sect2 id="reap-after-death">
<title>
Serialization of <constant>DEATH</constant> and <constant>REAP</constant>
</title>
<para>
The second guarantee is the serialization of
<constant>DEATH</constant> and <constant>REAP</constant> event
callbacks for a given thread. The actual reaping by the parent
(the <function>release_task</function> call) can occur while the
thread is still doing the final steps of dying, including
the <function>report_death</function> callback. If a tracing engine
has requested both <constant>DEATH</constant> and
<constant>REAP</constant> event reports, it's guaranteed that the
<function>report_reap</function> callback will not be made until
after the <function>report_death</function> callback has returned.
If the <function>report_death</function> callback itself detaches
from the thread, then the <function>report_reap</function> callback
will never be made. Thus it is safe for a
<function>report_death</function> callback to clean up data
structures and detach.
</para>
</sect2>

<sect2 id="interlock"><title>Interlock with final callbacks</title>
<para>
The final sort of guarantee is that a tracing engine will know for sure
whether or not the <function>report_death</function> and/or
<function>report_reap</function> callbacks will be made for a certain
thread. These tear-down races are disambiguated by the error return
values of <function>utrace_set_events</function> and
<function>utrace_control</function>. Normally
<function>utrace_control</function> called with
<constant>UTRACE_DETACH</constant> returns zero, and this means that no
more callbacks will be made. If the thread is in the midst of dying,
it returns <constant>-EALREADY</constant> to indicate that the
<function>report_death</function> callback may already be in progress;
when you get this error, you know that any cleanup your
<function>report_death</function> callback does is about to happen or
has just happened. Note that if the <function>report_death</function>
callback does not detach, the engine remains attached until the thread
gets reaped. If the thread is in the midst of being reaped,
<function>utrace_control</function> returns <constant>-ESRCH</constant>
to indicate that the <function>report_reap</function> callback may
already be in progress; this means the engine is implicitly detached
when the callback completes. This makes it possible for a tracing
engine that has decided asynchronously to detach from a thread to
safely clean up its data structures, knowing that no
<function>report_death</function> or <function>report_reap</function>
callback will try to do the same. <function>utrace_control</function>
with <constant>UTRACE_DETACH</constant> also returns
<constant>-ESRCH</constant> when the <structname>struct
utrace_engine</structname> has already been detached, but is
still a valid pointer because of its reference count. A tracing engine
can use this to safely synchronize its own multiple independent threads
of control with each other and with its event callbacks that detach.
</para>
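
<para>
The return values of <function>utrace_control</function> with
<constant>UTRACE_DETACH</constant> form a small decision table. The
following C sketch encodes it for an engine whose own
<function>report_death</function> and <function>report_reap</function>
callbacks clean up its data; the enumeration and function names are
hypothetical, and the <constant>-EINPROGRESS</constant> case is the one
described under <xref linkend="barrier"/>.
</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classification of the value returned by
 * utrace_control(task, engine, UTRACE_DETACH); these names are
 * illustrative and not part of <linux/utrace.h>. */
enum detach_outcome {
	DETACH_DONE,			/* 0: no more callbacks will be made */
	DETACH_CALLBACK_RUNNING,	/* -EINPROGRESS: a callback may still run */
	DETACH_DEATH_CLEANS,		/* -EALREADY: report_death may be running */
	DETACH_ALREADY_GONE,		/* -ESRCH: being reaped or already detached */
	DETACH_UNEXPECTED,
};

static enum detach_outcome classify_detach(int ret)
{
	switch (ret) {
	case 0:
		return DETACH_DONE;
	case -EINPROGRESS:
		return DETACH_CALLBACK_RUNNING;
	case -EALREADY:
		return DETACH_DEATH_CLEANS;
	case -ESRCH:
		return DETACH_ALREADY_GONE;
	default:
		return DETACH_UNEXPECTED;
	}
}

/* In an engine whose report_death/report_reap callbacks do the cleanup:
 * only DETACH_DONE lets the detaching caller free the data right away.
 * After DETACH_CALLBACK_RUNNING it must wait with utrace_barrier first;
 * in the remaining cases a callback (or an earlier detach) owns cleanup. */
static bool caller_may_free_now(enum detach_outcome o)
{
	return o == DETACH_DONE;
}
```
]]></programlisting>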

<para>
In the same vein, <function>utrace_set_events</function> normally
returns zero; if the target thread was stopped before the call, then
after a successful call, no callbacks will be made for events that are
not requested in the new flags. It fails with
<constant>-EALREADY</constant> if
you try to clear <constant>UTRACE_EVENT(DEATH)</constant> when the
<function>report_death</function> callback may already have begun, if
you try to clear <constant>UTRACE_EVENT(REAP)</constant> when the
<function>report_reap</function> callback may already have begun, or if
you try to newly set <constant>UTRACE_EVENT(DEATH)</constant> or
<constant>UTRACE_EVENT(QUIESCE)</constant> when the target is already
dead or dying. Like <function>utrace_control</function>, it returns
<constant>-ESRCH</constant> when the engine has already been detached
(including the forcible detach on reaping). This lets the tracing engine
know for sure which event callbacks it will or won't see after
<function>utrace_set_events</function> has returned. By checking for
errors, it can know whether to clean up its data structures immediately
or to let its callbacks do the work.
</para>
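
<para>
The <function>utrace_set_events</function> error rules above can be
restated as code. The following userspace C model is a sketch under stated
assumptions: the flag values and <structname>struct model_target</structname>
are hypothetical, the target is assumed to be stopped (so the
<constant>-EINPROGRESS</constant> case is left out), and the model leaves the
flags unchanged whenever it returns an error.
</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values for this model; the real event bits come
 * from UTRACE_EVENT() in <linux/utrace.h>. */
#define EV_QUIESCE	(1UL << 0)
#define EV_DEATH	(1UL << 1)
#define EV_REAP		(1UL << 2)

/* Model of the state consulted by set_events; the target is assumed to
 * be stopped, so the -EINPROGRESS case is not modeled. */
struct model_target {
	unsigned long events;	/* flags currently requested */
	bool detached;		/* explicit detach or forcible detach at reap */
	bool dead_or_dying;
	bool death_begun;	/* report_death may already have begun */
	bool reap_begun;	/* report_reap may already have begun */
};

static int model_set_events(struct model_target *t, unsigned long events)
{
	unsigned long cleared = t->events & ~events;
	unsigned long added = events & ~t->events;

	if (t->detached)
		return -ESRCH;
	if ((cleared & EV_DEATH) && t->death_begun)
		return -EALREADY;	/* too late to avoid report_death */
	if ((cleared & EV_REAP) && t->reap_begun)
		return -EALREADY;	/* too late to avoid report_reap */
	if ((added & (EV_DEATH | EV_QUIESCE)) && t->dead_or_dying)
		return -EALREADY;	/* cannot newly ask for these when dying */
	t->events = events;	/* model choice: flags change only on success */
	return 0;
}
```
]]></programlisting>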
</sect2>

<sect2 id="barrier"><title>Using <function>utrace_barrier</function></title>
<para>
When a thread is safely stopped, calling
<function>utrace_control</function> with <constant>UTRACE_DETACH</constant>
or calling <function>utrace_set_events</function> to disable some events
ensures synchronously that your engine won't get any more of the callbacks
that have been disabled (none at all when detaching). But these can also
be used while the thread is not stopped, when it might be simultaneously
making a callback to your engine. For this situation, these calls return
<constant>-EINPROGRESS</constant> when it's possible a callback is in
progress. If you are not prepared to have your old callbacks still run,
then you can synchronize to be sure all the old callbacks are finished,
using <function>utrace_barrier</function>. This is necessary if the
kernel module containing your callback code is going to be unloaded.
</para>
<para>
After using <constant>UTRACE_DETACH</constant> once, further calls to
<function>utrace_control</function> with the same engine pointer will
return <constant>-ESRCH</constant>. In contrast, after getting
<constant>-EINPROGRESS</constant> from
<function>utrace_set_events</function>, you can call
<function>utrace_set_events</function> again later and if it returns zero
then know the old callbacks have finished.
</para>
<para>
Unlike all other calls, <function>utrace_barrier</function> (and
<function>utrace_barrier_pid</function>) will accept any engine pointer you
hold a reference on, even if <constant>UTRACE_DETACH</constant> has already
been used. After any <function>utrace_control</function> or
<function>utrace_set_events</function> call (these do not block), you can
call <function>utrace_barrier</function> to block until callbacks have
finished. This returns <constant>-ESRCH</constant> only if the engine is
completely detached (finished all callbacks). Otherwise it waits
until the thread is definitely not in the midst of a callback to this
engine and then returns zero, but can return
<constant>-ERESTARTSYS</constant> if its wait is interrupted.
</para>
</sect2>

</sect1>

</chapter>

<chapter id="core"><title>utrace core API</title>

<para>
The utrace API is declared in <filename>&lt;linux/utrace.h&gt;</filename>.
</para>

!Iinclude/linux/utrace.h
!Ekernel/utrace.c

</chapter>

<chapter id="machine"><title>Machine State</title>

<para>
The <function>task_current_syscall</function> function can be used on any
valid <structname>struct task_struct</structname> at any time, and does
not even require that <function>utrace_attach_task</function> was used at all.
</para>

<para>
The other ways to access the registers and other machine-dependent state of
a task can only be used on a task that is at a known safe point. The safe
points are all the places where <function>utrace_set_events</function> can
request callbacks (except for the <constant>DEATH</constant> and
<constant>REAP</constant> events). So at any event callback, it is safe to
examine <varname>current</varname>.
</para>

<para>
One task can examine another only after a callback in the target task
returns <constant>UTRACE_STOP</constant>, so that the task will not
return to user mode after the safe point. This guarantees that the task
will not resume until the same engine uses
<function>utrace_control</function>, unless the task dies suddenly. To
examine safely, one must use a pair of calls to
<function>utrace_prepare_examine</function> and
<function>utrace_finish_examine</function> surrounding the calls to
<structname>struct user_regset</structname> functions or direct examination
of task data structures. <function>utrace_prepare_examine</function> returns
an error if the task is neither properly stopped nor dead. After a
successful examination, the paired <function>utrace_finish_examine</function>
call returns an error if the task ever woke up during the examination. If
so, any data gathered may be scrambled and should be discarded. Such a
wake-up means there was either a spurious wake-up (which should not happen)
or a sudden death.
</para>
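
<para>
The pairing of <function>utrace_prepare_examine</function> and
<function>utrace_finish_examine</function> behaves like a sequence check
around the examination. The userspace C model here sketches that idea,
assuming a per-task wake-up counter; the <function>model_*</function> names
are hypothetical, the <constant>-EAGAIN</constant> error codes are the
model's own choice, and the kernel's actual mechanism may differ.
</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model; the real calls are utrace_prepare_examine and
 * utrace_finish_examine, whose error codes may differ from -EAGAIN. */
enum model_run_state { MODEL_RUNNING, MODEL_STOPPED, MODEL_DEAD };

struct model_task {
	enum model_run_state state;
	unsigned long wakeups;	/* incremented every time the task wakes */
};

struct model_exam {
	unsigned long wakeups_at_start;
};

/* Refuse unless the task is stopped at a safe point or already dead. */
static int model_prepare_examine(const struct model_task *t,
				 struct model_exam *x)
{
	if (t->state != MODEL_STOPPED && t->state != MODEL_DEAD)
		return -EAGAIN;
	x->wakeups_at_start = t->wakeups;
	return 0;
}

/* Fail if the task woke up at any point since prepare: data read in
 * between may be inconsistent and should be discarded. */
static int model_finish_examine(const struct model_task *t,
				const struct model_exam *x)
{
	return t->wakeups == x->wakeups_at_start ? 0 : -EAGAIN;
}

/* A wake-up, e.g. SIGKILL (sudden death) or a spurious wake-up. */
static void model_wake(struct model_task *t)
{
	assert(t->state != MODEL_DEAD);
	t->wakeups++;
	t->state = MODEL_RUNNING;
}
```
]]></programlisting>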

<sect1 id="regset"><title><structname>struct user_regset</structname></title>

<para>
The <structname>struct user_regset</structname> API
is declared in <filename>&lt;linux/regset.h&gt;</filename>.
</para>

!Finclude/linux/regset.h

</sect1>

<sect1 id="task_current_syscall">
<title>System Call Information</title>

<para>
The <function>task_current_syscall</function> function is declared in
<filename>&lt;linux/ptrace.h&gt;</filename>.
</para>

!Elib/syscall.c

</sect1>

<sect1 id="syscall"><title>System Call Tracing</title>

<para>
The arch API for system call information is declared in
<filename>&lt;asm/syscall.h&gt;</filename>.
Each of these calls can be used either only at system call entry tracing,
or only at system call exit and the subsequent safe points
before returning to user mode.
Being at system call entry tracing means either during a
<structfield>report_syscall_entry</structfield> callback,
or any time after that callback has returned <constant>UTRACE_STOP</constant>.
</para>

!Finclude/asm-generic/syscall.h

</sect1>

</chapter>

<chapter id="internals"><title>Kernel Internals</title>

<para>
This chapter covers the interface to the tracing infrastructure
from the core of the kernel and the architecture-specific code.
This is for maintainers of the kernel and arch code, and not relevant
to using the tracing facilities described in preceding chapters.
</para>

<sect1 id="tracehook"><title>Core Calls In</title>

<para>
These calls are declared in <filename>&lt;linux/tracehook.h&gt;</filename>.
The core kernel calls these functions at various important places.
</para>

!Finclude/linux/tracehook.h

</sect1>

<sect1 id="arch"><title>Architecture Calls Out</title>

<para>
An arch that has done all the things described in this section sets
<constant>CONFIG_HAVE_ARCH_TRACEHOOK</constant>.
This is required to enable the <application>utrace</application> code.
</para>

<sect2 id="arch-ptrace"><title><filename>&lt;asm/ptrace.h&gt;</filename></title>

<para>
An arch defines the following in <filename>&lt;asm/ptrace.h&gt;</filename>
if it supports hardware single-step or block-step features.
</para>

!Finclude/linux/ptrace.h arch_has_single_step arch_has_block_step
!Finclude/linux/ptrace.h user_enable_single_step user_enable_block_step
!Finclude/linux/ptrace.h user_disable_single_step

</sect2>

<sect2 id="arch-syscall">
<title><filename>&lt;asm/syscall.h&gt;</filename></title>

<para>
An arch provides <filename>&lt;asm/syscall.h&gt;</filename> that
defines these as inlines, or declares them as exported functions.
These interfaces are described in <xref linkend="syscall"/>.
</para>

</sect2>

<sect2 id="arch-tracehook">
<title><filename>&lt;linux/tracehook.h&gt;</filename></title>

<para>
An arch must define <constant>TIF_NOTIFY_RESUME</constant>
and <constant>TIF_SYSCALL_TRACE</constant>
in its <filename>&lt;asm/thread_info.h&gt;</filename>.
The arch code must call the following functions, all declared
in <filename>&lt;linux/tracehook.h&gt;</filename> and
described in <xref linkend="tracehook"/>:

<itemizedlist>
<listitem>
<para><function>tracehook_notify_resume</function></para>
</listitem>
<listitem>
<para><function>tracehook_report_syscall_entry</function></para>
</listitem>
<listitem>
<para><function>tracehook_report_syscall_exit</function></para>
</listitem>
<listitem>
<para><function>tracehook_signal_handler</function></para>
</listitem>
</itemizedlist>

</para>

</sect2>

</sect1>

</chapter>

</book>