1 <?xml version="1.0"?> <!-- -*- sgml -*- -->
2 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
3 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
4 [ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
6 <chapter id="cl-manual" xreflabel="Callgrind Manual">
7 <title>Callgrind: a call-graph generating cache and branch prediction profiler</title>
10 <para>To use this tool, you must specify
11 <option>--tool=callgrind</option> on the
12 Valgrind command line.</para>
14 <sect1 id="cl-manual.use" xreflabel="Overview">
15 <title>Overview</title>
17 <para>Callgrind is a profiling tool that records the call history among
18 functions in a program's run as a call-graph.
19 By default, the collected data consists of
20 the number of instructions executed, their relationship
21 to source lines, the caller/callee relationship between functions,
22 and the numbers of such calls.
23 Optionally, cache simulation and/or branch prediction (similar to Cachegrind)
24 can produce further information about the runtime behavior of an application.
27 <para>The profile data is written out to a file at program
28 termination. For presentation of the data, and interactive control
29 of the profiling, two command line tools are provided:</para>
32 <term><command>callgrind_annotate</command></term>
34 <para>This command reads in the profile data, and prints a
sorted list of functions, optionally with source annotation.</para>
37 <para>For graphical visualization of the data, try
38 <ulink url="&cl-gui-url;">KCachegrind</ulink>, which is a KDE/Qt based
39 GUI that makes it easy to navigate the large amount of data that
40 Callgrind produces.</para>
46 <term><command>callgrind_control</command></term>
48 <para>This command enables you to interactively observe and control
49 the status of a program currently running under Callgrind's control,
50 without stopping the program. You can get statistics information as
51 well as the current stack trace, and you can request zeroing of counters
52 or dumping of profile data.</para>
57 <sect2 id="cl-manual.functionality" xreflabel="Functionality">
58 <title>Functionality</title>
60 <para>Cachegrind collects flat profile data: event counts (data reads,
61 cache misses, etc.) are attributed directly to the function they
62 occurred in. This cost attribution mechanism is
63 called <emphasis>self</emphasis> or <emphasis>exclusive</emphasis>
66 <para>Callgrind extends this functionality by propagating costs
67 across function call boundaries. If function <function>foo</function> calls
68 <function>bar</function>, the costs from <function>bar</function> are added into
69 <function>foo</function>'s costs. When applied to the program as a whole,
this builds up a picture of so-called <emphasis>inclusive</emphasis>
71 costs, that is, where the cost of each function includes the costs of
72 all functions it called, directly or indirectly.</para>
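<para>As a rough illustration of the difference between self and inclusive
cost, the following Python sketch replays a made-up sequence of function
entries, exits and event counts. The trace format and the helper
<computeroutput>attribute_costs</computeroutput> are hypothetical and only
model the idea; they do not reflect how Callgrind is implemented.</para>

```python
from collections import defaultdict


def attribute_costs(trace):
    """Return (self_cost, inclusive_cost) dictionaries for a call trace."""
    self_cost = defaultdict(int)
    incl_cost = defaultdict(int)
    stack = []  # functions currently active, innermost last
    for kind, value in trace:
        if kind == "enter":
            stack.append(value)
        elif kind == "leave":
            stack.pop()
        else:  # "cost": events happen in the function on top of the stack
            self_cost[stack[-1]] += value
            # Every distinct function on the stack includes these events.
            for fn in set(stack):
                incl_cost[fn] += value
    return dict(self_cost), dict(incl_cost)


# Hypothetical run: main calls foo (which calls bar), then calls bar directly.
trace = [
    ("enter", "main"), ("cost", 10),
    ("enter", "foo"), ("cost", 20),
    ("enter", "bar"), ("cost", 30), ("leave", "bar"),
    ("leave", "foo"),
    ("enter", "bar"), ("cost", 5), ("leave", "bar"),
    ("leave", "main"),
]
self_cost, incl_cost = attribute_costs(trace)
print(self_cost)
print(incl_cost)
```

<para>In this made-up run, <function>bar</function> has a self cost of 35
events, while <function>foo</function> has a self cost of 20 but an inclusive
cost of 50, because the 30 events spent in its call to
<function>bar</function> are added in.</para>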
74 <para>As an example, the inclusive cost of
75 <function>main</function> should be almost 100 percent
76 of the total program cost. Because of costs arising before
77 <function>main</function> is run, such as
78 initialization of the run time linker and construction of global C++
79 objects, the inclusive cost of <function>main</function>
80 is not exactly 100 percent of the total program cost.</para>
82 <para>Together with the call graph, this allows you to find the
83 specific call chains starting from
84 <function>main</function> in which the majority of the
85 program's costs occur. Caller/callee cost attribution is also useful
86 for profiling functions called from multiple call sites, and where
87 optimization opportunities depend on changing code in the callers, in
88 particular by reducing the call count.</para>
90 <para>Callgrind's cache simulation is based on that of Cachegrind.
91 Read the documentation for <xref linkend="&vg-cg-manual-id;"/> first. The material
92 below describes the features supported in addition to Cachegrind's
95 <para>Callgrind's ability to detect function calls and returns depends
96 on the instruction set of the platform it is run on. It works best on
97 x86 and amd64, and unfortunately currently does not work so well on
98 PowerPC, ARM, Thumb or MIPS code. This is because there are no explicit
99 call or return instructions in these instruction sets, so Callgrind
100 has to rely on heuristics to detect calls and returns.</para>
104 <sect2 id="cl-manual.basics" xreflabel="Basic Usage">
105 <title>Basic Usage</title>
107 <para>As with Cachegrind, you probably want to compile with debugging info
108 (the <option>-g</option> option) and with optimization turned on.</para>
110 <para>To start a profile run for a program, execute:
111 <screen>valgrind --tool=callgrind [callgrind options] your-program [program options]</screen>
114 <para>While the simulation is running, you can observe execution with:
115 <screen>callgrind_control -b</screen>
116 This will print out the current backtrace. To annotate the backtrace with
118 <screen>callgrind_control -e -b</screen>
121 <para>After program termination, a profile data file named
<computeroutput>callgrind.out.&lt;pid&gt;</computeroutput>
123 is generated, where <emphasis>pid</emphasis> is the process ID
124 of the program being profiled.
125 The data file contains information about the calls made in the
126 program among the functions executed, together with
127 <command>Instruction Read</command> (Ir) event counts.</para>
129 <para>To generate a function-by-function summary from the profile
<screen>callgrind_annotate [options] callgrind.out.&lt;pid&gt;</screen>
This summary is similar to the output you get from a Cachegrind
run with cg_annotate: the list of functions is ordered by the
exclusive cost of the functions, and these exclusive costs are
what is shown.
The following two options are important for the additional
features of Callgrind:</para>
141 <para><option>--inclusive=yes</option>: Instead of using
142 exclusive cost of functions as sorting order, use and show
143 inclusive cost.</para>
147 <para><option>--tree=both</option>: Interleave into the
148 top level list of functions, information on the callers and the callees
of each function. In these lines, which represent executed
150 calls, the cost gives the number of events spent in the call.
151 Indented, above each function, there is the list of callers,
152 and below, the list of callees. The sum of events in calls to
153 a given function (caller lines), as well as the sum of events in
154 calls from the function (callee lines) together with the self
155 cost, gives the total inclusive cost of the function.</para>
159 <para>By default, you will also get annotated source code
160 for all relevant functions for which the source can be found. In
161 addition to source annotation as produced by
162 <computeroutput>cg_annotate</computeroutput>, you will see the
163 annotated call sites with call counts. For all other options,
164 consult the (Cachegrind) documentation for
165 <computeroutput>cg_annotate</computeroutput>.
<para>For a better call graph browsing experience, it is highly recommended
to use <ulink url="&cl-gui-url;">KCachegrind</ulink>.
If your code
has a significant fraction of its cost in <emphasis>cycles</emphasis> (sets
172 of functions calling each other in a recursive manner), you have to
173 use KCachegrind, as <computeroutput>callgrind_annotate</computeroutput>
174 currently does not do any cycle detection, which is important to get correct
175 results in this case.</para>
177 <para>If you are additionally interested in measuring the
178 cache behavior of your program, use Callgrind with the option
179 <option><link linkend="clopt.cache-sim">--cache-sim=yes</link></option>.
180 For branch prediction simulation, use
181 <option><link linkend="clopt.branch-sim">--branch-sim=yes</link></option>.
Expect a further slowdown by a factor of approximately 2.</para>
184 <para>If the program section you want to profile is somewhere in the
185 middle of the run, it is beneficial to
186 <emphasis>fast forward</emphasis> to this section without any
187 profiling, and then enable profiling. This is achieved by using
188 the command line option
189 <option><link linkend="opt.instr-atstart">--instr-atstart=no</link></option>
190 and running, in a shell:
191 <computeroutput>callgrind_control -i on</computeroutput> just before the
192 interesting code section is executed. To exactly specify
193 the code position where profiling should start, use the client request
194 <computeroutput><link linkend="cr.start-instr">CALLGRIND_START_INSTRUMENTATION</link></computeroutput>.</para>
196 <para>If you want to be able to see assembly code level annotation, specify
197 <option><link linkend="opt.dump-instr">--dump-instr=yes</link></option>.
198 This will produce profile data at instruction granularity.
199 Note that the resulting profile data
200 can only be viewed with KCachegrind. For assembly annotation, it also is
201 interesting to see more details of the control flow inside of functions,
202 i.e. (conditional) jumps. This will be collected by further specifying
203 <option><link linkend="opt.collect-jumps">--collect-jumps=yes</link></option>.
210 <sect1 id="cl-manual.usage" xreflabel="Advanced Usage">
211 <title>Advanced Usage</title>
213 <sect2 id="cl-manual.dumps"
214 xreflabel="Multiple dumps from one program run">
215 <title>Multiple profiling dumps from one program run</title>
217 <para>Sometimes you are not interested in characteristics of a full
218 program run, but only of a small part of it, for example execution of one
219 algorithm. If there are multiple algorithms, or one algorithm
220 running with different input data, it may even be useful to get different
221 profile information for different parts of a single program run.</para>
223 <para>Profile data files have names of the form
225 callgrind.out.<emphasis>pid</emphasis>.<emphasis>part</emphasis>-<emphasis>threadID</emphasis>
228 <para>where <emphasis>pid</emphasis> is the PID of the running
229 program, <emphasis>part</emphasis> is a number incremented on each
230 dump (".part" is skipped for the dump at program termination), and
231 <emphasis>threadID</emphasis> is a thread identification
232 ("-threadID" is only used if you request dumps of individual
234 <option><link linkend="opt.separate-threads">--separate-threads=yes</link></option>).
237 <para>There are different ways to generate multiple profile dumps
238 while a program is running under Callgrind's supervision. Nevertheless,
239 all methods trigger the same action, which is "dump all profile
240 information since the last dump or program start, and zero cost
241 counters afterwards". To allow for zeroing cost counters without
242 dumping, there is a second action "zero all cost counters now".
243 The different methods are:</para>
247 <para><command>Dump on program termination.</command>
248 This method is the standard way and doesn't need any special
249 action on your part.</para>
253 <para><command>Spontaneous, interactive dumping.</command> Use
254 <screen>callgrind_control -d [hint [PID/Name]]</screen> to
255 request the dumping of profile information of the supervised
256 application with PID or Name. <emphasis>hint</emphasis> is an
257 arbitrary string you can optionally specify to later be able to
258 distinguish profile dumps. The control program will not terminate
259 before the dump is completely written. Note that the application
260 must be actively running for detection of the dump command. So,
261 for a GUI application, resize the window, or for a server, send a
263 <para>If you are using <ulink url="&cl-gui-url;">KCachegrind</ulink>
264 for browsing of profile information, you can use the toolbar
265 button <command>Force dump</command>. This will request a dump
266 and trigger a reload after the dump is written.</para>
270 <para><command>Periodic dumping after execution of a specified
271 number of basic blocks</command>. For this, use the command line
272 option <option><link linkend="opt.dump-every-bb">--dump-every-bb=count</link></option>.
277 <para><command>Dumping at enter/leave of specified functions.</command>
279 option <option><link linkend="opt.dump-before">--dump-before=function</link></option>
280 and <option><link linkend="opt.dump-after">--dump-after=function</link></option>.
281 To zero cost counters before entering a function, use
282 <option><link linkend="opt.zero-before">--zero-before=function</link></option>.</para>
283 <para>You can specify these options multiple times for different
284 functions. Function specifications support wildcards: e.g. use
285 <option><link linkend="opt.dump-before">--dump-before='foo*'</link></option> to
286 generate dumps before entering any function starting with
287 <emphasis>foo</emphasis>.</para>
291 <para><command>Program controlled dumping.</command>
293 <computeroutput><link linkend="cr.dump-stats">CALLGRIND_DUMP_STATS</link>;</computeroutput>
294 at the position in your code where you want a profile dump to
296 <computeroutput><link linkend="cr.zero-stats">CALLGRIND_ZERO_STATS</link>;</computeroutput> to only
297 zero profile counters.
298 See <xref linkend="cl-manual.clientrequests"/> for more information on
299 Callgrind specific client requests.</para>
303 <para>If you are running a multi-threaded application and specify the
305 <option><link linkend="opt.separate-threads">--separate-threads=yes</link></option>,
306 every thread will be profiled on its own and will create its own
307 profile dump. Thus, the last two methods will only generate one dump
308 of the currently running thread. With the other methods, you will get
309 multiple dumps (one for each thread) on a dump request.</para>
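<para>The two actions described in this section ("dump and then zero" and
"zero only") and the naming of the resulting files can be modeled with a
small Python sketch. The class <computeroutput>ProfileCounters</computeroutput>
is hypothetical. The sketch assumes that part numbers start at 1, which the
text does not state, and it ignores the per-thread suffix.</para>

```python
class ProfileCounters:
    """Toy model of the "dump" and "zero" actions (not Callgrind's real code)."""

    def __init__(self, pid):
        self.pid = pid
        self.part = 0      # incremented on each intermediate dump
        self.events = 0    # events counted since the last dump or zero
        self.dumps = []    # (file name, event count) pairs, in dump order

    def count(self, n):
        self.events += n

    def zero(self):
        # "zero all cost counters now"
        self.events = 0

    def dump(self, at_termination=False):
        # "dump all profile information since the last dump or program
        # start, and zero cost counters afterwards"
        name = f"callgrind.out.{self.pid}"
        if not at_termination:
            self.part += 1  # assumption: numbering starts at 1
            name += f".{self.part}"
        self.dumps.append((name, self.events))
        self.zero()


counters = ProfileCounters(pid=1234)  # made-up process ID
counters.count(100)
counters.dump()                       # intermediate dump
counters.count(40)
counters.zero()                       # discard these 40 events
counters.count(7)
counters.dump(at_termination=True)    # final dump, no ".part" in the name
print(counters.dumps)
```

<para>Because every dump zeroes the counters, each file only contains the
events counted since the previous dump or zero request.</para>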
315 <sect2 id="cl-manual.limits"
316 xreflabel="Limiting range of event collection">
317 <title>Limiting the range of collected events</title>
319 <para>By default, whenever events are happening (such as an
320 instruction execution or cache hit/miss), Callgrind is aggregating
321 them into event counters. However, you may be interested only in
322 what is happening within a given function or starting from a given
323 program phase. To this end, you can disable event aggregation for
324 uninteresting program parts. While attribution of events to
325 functions as well as producing separate output per program phase
326 can be done by other means (see previous section), there are two
benefits to disabling aggregation. First, this is very
fine-grained (e.g. just for a loop within a function). Second,
disabling event aggregation for complete program phases allows you to
switch off time-consuming cache simulation and allows Callgrind to
progress at much higher speed, with a slowdown of around a factor of 2
332 (identical to <computeroutput>valgrind
333 --tool=none</computeroutput>).
336 <para>There are two aspects which influence whether Callgrind is
337 aggregating events at some point in time of program execution.
338 First, there is the <emphasis>collection state</emphasis>. If this
339 is off, no aggregation will be done. By changing the collection
340 state, you can control event aggregation at a very fine
341 granularity. However, there is not much difference in regard to
342 execution speed of Callgrind. By default, collection is switched
343 on, but can be disabled by different means (see below). Second,
344 there is the <emphasis>instrumentation mode</emphasis> in which
345 Callgrind is running. This mode either can be on or off. If
346 instrumentation is off, no observation of actions in the program
347 will be done and thus, no actions will be forwarded to the
348 simulator which could trigger events. In the end, no events will
349 be aggregated. The huge benefit is the much higher speed with
350 instrumentation switched off. However, this only should be used
351 with care and in a coarse fashion: every mode change resets the
simulator state (i.e. whether a memory block is cached or not) and
flushes Valgrind's internal cache of instrumented code blocks,
resulting in a latency penalty at switching time. Also, cache
simulator results directly after switching on instrumentation will
be skewed due to identified cache misses which would not happen in
reality (if you care about this warm-up effect, you should make
sure to temporarily have collection state switched off directly
359 after turning instrumentation mode on). However, switching
360 instrumentation state is very useful to skip larger program phases
361 such as an initialization phase. By default, instrumentation is
362 switched on, but as with the collection state, can be changed by
366 <para>Callgrind can start with instrumentation mode switched off by
368 <option><link linkend="opt.instr-atstart">--instr-atstart=no</link></option>.
369 Afterwards, instrumentation can be controlled in two ways: first,
370 interactively with: <screen>callgrind_control -i on</screen> (and
371 switching off again by specifying "off" instead of "on"). Second,
372 instrumentation state can be programmatically changed with the
373 macros <computeroutput><link linkend="cr.start-instr">CALLGRIND_START_INSTRUMENTATION</link>;</computeroutput>
374 and <computeroutput><link linkend="cr.stop-instr">CALLGRIND_STOP_INSTRUMENTATION</link>;</computeroutput>.
<para>Similarly, the collection state at program start can be
switched off with
<option><link linkend="opt.collect-atstart">--collect-atstart=no</link></option>.
380 During execution, it can be controlled programmatically with the
381 macro <computeroutput>CALLGRIND_TOGGLE_COLLECT;</computeroutput>.
382 Further, you can limit event collection to a specific function by
383 using <option><link linkend="opt.toggle-collect">--toggle-collect=function</link></option>.
384 This will toggle the collection state on entering and leaving the
385 specified function. When this option is in effect, the default
386 collection state at program start is "off". Only events happening
387 while running inside of the given function will be
388 collected. Recursive calls of the given function do not trigger
389 any action. This option can be given multiple times to specify
390 different functions of interest.</para>
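<para>The toggling behaviour described above can be sketched in Python as
follows. The trace format and the function
<computeroutput>collect_with_toggle</computeroutput> are hypothetical. The
sketch only models a single toggle function, with the collection state
starting as "off" and recursive calls not triggering any action.</para>

```python
def collect_with_toggle(trace, toggle_fn):
    """Sum the events that would be collected for one toggle function."""
    collecting = False  # collection state starts "off" in this mode
    depth = 0           # how many times toggle_fn is active on the stack
    collected = 0
    for kind, value in trace:
        if kind == "enter" and value == toggle_fn:
            if depth == 0:          # only the outermost entry toggles
                collecting = not collecting
            depth += 1
        elif kind == "leave" and value == toggle_fn:
            depth -= 1
            if depth == 0:          # only the outermost exit toggles
                collecting = not collecting
        elif kind == "cost" and collecting:
            collected += value
    return collected


# Hypothetical run: "work" calls itself once and then calls "helper".
trace = [
    ("enter", "main"), ("cost", 10),
    ("enter", "work"), ("cost", 20),
    ("enter", "work"), ("cost", 30), ("leave", "work"),
    ("enter", "helper"), ("cost", 5), ("leave", "helper"),
    ("leave", "work"), ("cost", 7),
    ("leave", "main"),
]
collected = collect_with_toggle(trace, "work")
print(collected)
```

<para>Only the events that occur while <computeroutput>work</computeroutput>
is active, including those in its callee, are counted. The events in
<function>main</function> before and after the call are ignored.</para>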
393 <sect2 id="cl-manual.busevents" xreflabel="Counting global bus events">
394 <title>Counting global bus events</title>
<para>For access to shared data among threads in
multithreaded code, synchronization is required to avoid race conditions.
398 Synchronization primitives are usually implemented via atomic instructions.
399 However, excessive use of such instructions can lead to performance
402 <para>To enable analysis of this problem, Callgrind optionally can count
403 the number of atomic instructions executed. More precisely, for x86/x86_64,
404 these are instructions using a lock prefix. For architectures supporting
405 LL/SC, these are the number of SC instructions executed. For both, the term
406 "global bus events" is used.</para>
408 <para>The short name of the event type used for global bus events is "Ge".
409 To count global bus events, use
410 <option><link linkend="clopt.collect-bus">--collect-bus=yes</link></option>.
414 <sect2 id="cl-manual.cycles" xreflabel="Avoiding cycles">
415 <title>Avoiding cycles</title>
417 <para>Informally speaking, a cycle is a group of functions which
418 call each other in a recursive way.</para>
420 <para>Formally speaking, a cycle is a nonempty set S of functions,
421 such that for every pair of functions F and G in S, it is possible
422 to call from F to G (possibly via intermediate functions) and also
423 from G to F. Furthermore, S must be maximal -- that is, be the
424 largest set of functions satisfying this property. For example, if
425 a third function H is called from inside S and calls back into S,
426 then H is also part of the cycle and should be included in S.</para>
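<para>The formal definition can be turned into a small Python sketch. The
call graph and the helper names below are made up for illustration, and the
reachability-based approach is simply the most direct reading of the
definition. It is not necessarily the algorithm KCachegrind uses.</para>

```python
def reachable(graph, start):
    """Functions that can be reached from start via one or more calls."""
    seen = set()
    todo = list(graph.get(start, ()))
    while todo:
        fn = todo.pop()
        if fn not in seen:
            seen.add(fn)
            todo.extend(graph.get(fn, ()))
    return seen


def cycles(graph):
    """Return the set of cycles, each as a frozenset of function names."""
    nodes = set(graph) | {callee for cs in graph.values() for callee in cs}
    reach = {fn: reachable(graph, fn) for fn in nodes}
    found = set()
    for f in nodes:
        if f in reach[f]:  # f can call itself, directly or indirectly
            # Maximal set: everything f can reach that can also reach f.
            found.add(frozenset(g for g in reach[f] if f in reach[g]))
    return found


# Hypothetical call graph: F and G call each other, and H is called from
# G and calls back into G, so H belongs to the same cycle.
graph = {
    "main": ["F"],
    "F": ["G", "leaf"],
    "G": ["F", "H"],
    "H": ["G"],
}
print(cycles(graph))
```

<para>For this graph the sketch reports a single cycle containing F, G and H.
Neither <function>main</function> nor the leaf function is part of it.</para>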
428 <para>Recursion is quite usual in programs, and therefore, cycles
429 sometimes appear in the call graph output of Callgrind. However,
430 the title of this chapter should raise two questions: What is bad
431 about cycles which makes you want to avoid them? And: How can
432 cycles be avoided without changing program code?</para>
<para>Cycles are not bad in themselves, but tend to make performance
435 analysis of your code harder. This is because inclusive costs
436 for calls inside of a cycle are meaningless. The definition of
437 inclusive cost, i.e. self cost of a function plus inclusive cost
438 of its callees, needs a topological order among functions. For
439 cycles, this does not hold true: callees of a function in a cycle include
440 the function itself. Therefore, KCachegrind does cycle detection
441 and skips visualization of any inclusive cost for calls inside
of cycles. Further, all functions in a cycle are collapsed into an artificial
function with a name such as <computeroutput>Cycle 1</computeroutput>.</para>
445 <para>Now, when a program exposes really big cycles (as is
446 true for some GUI code, or in general code using event or callback based
programming style), you lose the ability to pinpoint
448 the bottlenecks by following call chains from
449 <function>main</function>, guided via
450 inclusive cost. In addition, KCachegrind loses its ability to show
451 interesting parts of the call graph, as it uses inclusive costs to
452 cut off uninteresting areas.</para>
<para>Despite the meaninglessness of inclusive costs in cycles, the big
drawback for visualization motivates the possibility to temporarily
switch off cycle detection in KCachegrind, which can lead to
misleading visualization. However, cycles often appear because of an
unlucky superposition of independent call chains in a way that
the profile result will see a cycle. Neglecting uninteresting
calls with very small measured inclusive cost would break these
cycles. In such cases, incorrect handling of cycles by not detecting
them still gives meaningful profiling visualization.</para>
464 <para>It has to be noted that currently, <command>callgrind_annotate</command>
465 does not do any cycle detection at all. For program executions with function
recursion, it can, for example, print nonsensical inclusive costs far above 100%.</para>
468 <para>After describing why cycles are bad for profiling, it is worth
469 talking about cycle avoidance. The key insight here is that symbols in
470 the profile data do not have to exactly match the symbols found in the
471 program. Instead, the symbol name could encode additional information
472 from the current execution context such as recursion level of the
473 current function, or even some part of the call chain leading to the
474 function. While encoding of additional information into symbols is
475 quite capable of avoiding cycles, it has to be used carefully to not cause
symbol explosion. The latter imposes large memory requirements on Callgrind,
with possible out-of-memory conditions, and produces big profile data files.</para>
479 <para>A further possibility to avoid cycles in Callgrind's profile data
480 output is to simply leave out given functions in the call graph. Of course, this
481 also skips any call information from and to an ignored function, and thus can
482 break a cycle. Candidates for this typically are dispatcher functions in event
483 driven code. The option to ignore calls to a function is
484 <option><link linkend="opt.fn-skip">--fn-skip=function</link></option>.
485 Aside from possibly breaking cycles, this is used in Callgrind to skip
486 trampoline functions in the PLT sections
487 for calls to functions in shared libraries. You can see the difference
489 <option><link linkend="opt.skip-plt">--skip-plt=no</link></option>.
490 If a call is ignored, its cost events will be propagated to the
491 enclosing function.</para>
493 <para>If you have a recursive function, you can distinguish the first
494 10 recursion levels by specifying
495 <option><link linkend="opt.separate-recs-num">--separate-recs10=function</link></option>.
496 Or for all functions with
497 <option><link linkend="opt.separate-recs">--separate-recs=10</link></option>,
499 give you much bigger profile data files. In the profile data, you will see
500 the recursion levels of "func" as the different functions with names
501 "func", "func'2", "func'3" and so on.</para>
503 <para>If you have call chains "A > B > C" and "A > C > B"
in your program, you usually get a "false" cycle "B &lt;&gt; C". Use
505 <option><link linkend="opt.separate-callers-num">--separate-callers2=B</link></option>
506 <option><link linkend="opt.separate-callers-num">--separate-callers2=C</link></option>,
507 and functions "B" and "C" will be treated as different functions
508 depending on the direct caller. Using the apostrophe for appending
509 this "context" to the function name, you get "A > B'A > C'B"
510 and "A > C'A > B'C", and there will be no cycle. Use
511 <option><link linkend="opt.separate-callers">--separate-callers=2</link></option> to get a 2-caller
512 dependency for all functions. Note that doing this will increase
513 the size of profile data files.</para>
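<para>The effect of appending the direct caller to a function name can be
checked with a short Python sketch. The helpers are hypothetical and only
model the renaming shown above. How the numeric argument of the option maps
to the amount of context is deliberately not modeled.</para>

```python
def call_edges(chains, with_caller_context):
    """Collect caller/callee edges, optionally naming functions by caller."""
    edges = set()
    for chain in chains:
        names = []
        for i, fn in enumerate(chain):
            if with_caller_context and i > 0:
                names.append(fn + "'" + chain[i - 1])  # e.g. B called from A
            else:
                names.append(fn)
        edges.update(zip(names, names[1:]))
    return edges


def has_cycle(edges):
    """Return True if some function can reach itself through the edges."""
    graph = {}
    for caller, callee in edges:
        graph.setdefault(caller, set()).add(callee)

    def reaches(start, target):
        seen, todo = set(), [start]
        while todo:
            fn = todo.pop()
            for nxt in graph.get(fn, ()):
                if nxt == target:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return False

    return any(reaches(fn, fn) for fn in graph)


chains = [["A", "B", "C"], ["A", "C", "B"]]
plain = call_edges(chains, with_caller_context=False)
separated = call_edges(chains, with_caller_context=True)
print(has_cycle(plain), has_cycle(separated))
```

<para>With plain names the two chains merge into a graph in which B and C
call each other. With the caller appended, the graph contains four distinct
functions below A and no cycle.</para>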
517 <sect2 id="cl-manual.forkingprograms" xreflabel="Forking Programs">
518 <title>Forking Programs</title>
520 <para>If your program forks, the child will inherit all the profiling
521 data that has been gathered for the parent. To start with empty profile
522 counter values in the child, the client request
523 <computeroutput><link linkend="cr.zero-stats">CALLGRIND_ZERO_STATS</link>;</computeroutput>
524 can be inserted into code to be executed by the child, directly
526 <computeroutput>fork</computeroutput>.</para>
528 <para>However, you will have to make sure that the output file format string
(controlled by <option>--callgrind-out-file</option>) contains
530 <option>%p</option> (which is true by default). Otherwise, the
531 outputs from the parent and child will overwrite each other or will be
532 intermingled, which almost certainly is not what you want.</para>
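<para>A minimal Python sketch shows why the process ID matters here. The
function <computeroutput>output_name</computeroutput>, the process IDs and
the file name <computeroutput>profile.out</computeroutput> are made up, and
only the <option>%p</option> specifier is modeled.</para>

```python
def output_name(pattern, pid):
    """Expand %p in an output file pattern (only %p is modeled here)."""
    return pattern.replace("%p", str(pid))


parent_pid, child_pid = 4000, 4001  # made-up process IDs

# A pattern containing %p yields one file name per process.
with_pid = {output_name("callgrind.out.%p", pid)
            for pid in (parent_pid, child_pid)}

# A pattern without %p yields the same name for parent and child.
without_pid = {output_name("profile.out", pid)
               for pid in (parent_pid, child_pid)}

print(sorted(with_pid))
print(sorted(without_pid))
```

<para>The second set contains a single name, which is the situation in which
parent and child would write to the same file.</para>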
534 <para>You will be able to control the new child independently from
535 the parent via callgrind_control.</para>
542 <sect1 id="cl-manual.options" xreflabel="Callgrind Command-line Options">
543 <title>Callgrind Command-line Options</title>
546 In the following, options are grouped into classes.
549 Some options allow the specification of a function/symbol name, such as
550 <option><link linkend="opt.dump-before">--dump-before=function</link></option>, or
551 <option><link linkend="opt.fn-skip">--fn-skip=function</link></option>.
552 All these options can be specified multiple times for different functions.
In addition, the function specifications are actually patterns: they support
the wildcards '*' (zero or more arbitrary characters) and '?'
(exactly one arbitrary character), similar to file name globbing in the
shell. This feature is especially important for C++, as without wildcards
the function would have to be specified in full, including its
parameter signature.</para>
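<para>To get a feeling for how such patterns select functions, the following
Python sketch uses the standard <computeroutput>fnmatch</computeroutput>
module as a stand-in. This is only an approximation: the function names are
made up, and Python additionally gives square brackets a special meaning,
which the description above does not mention, so the sketch sticks to '*'
and '?'.</para>

```python
from fnmatch import fnmatchcase

# Hypothetical function names as they might appear in a profile.
functions = ["foo", "foobar(int)", "bar", "MyClass::draw(int, int)"]


def matching(pattern):
    """Return the functions whose full name matches the pattern."""
    return [fn for fn in functions if fnmatchcase(fn, pattern)]


print(matching("foo*"))            # any function starting with "foo"
print(matching("?ar"))             # exactly one character, then "ar"
print(matching("MyClass::draw*"))  # avoids spelling out the signature
```

<para>The last pattern illustrates the C++ case: the wildcard stands in for
the parameter signature.</para>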
560 <sect2 id="cl-manual.options.creation"
561 xreflabel="Dump creation options">
562 <title>Dump creation options</title>
565 These options influence the name and format of the profile data files.
568 <!-- start of xi:include in the manpage -->
569 <variablelist id="cl.opts.list.creation">
571 <varlistentry id="opt.callgrind-out-file" xreflabel="--callgrind-out-file">
573 <option><![CDATA[--callgrind-out-file=<file> ]]></option>
576 <para>Write the profile data to
577 <computeroutput>file</computeroutput> rather than to the default
<computeroutput>callgrind.out.&lt;pid&gt;</computeroutput>. The
580 <option>%p</option> and <option>%q</option> format specifiers
581 can be used to embed the process ID and/or the contents of an
582 environment variable in the name, as is the case for the core
584 <option><link linkend="opt.log-file">--log-file</link></option>.
585 When multiple dumps are made, the file name
586 is modified further; see below.</para>
590 <varlistentry id="opt.dump-line" xreflabel="--dump-line">
592 <option><![CDATA[--dump-line=<no|yes> [default: yes] ]]></option>
595 <para>This specifies that event counting should be performed at
596 source line granularity. This allows source annotation for sources
597 which are compiled with debug information
598 (<option>-g</option>).</para>
602 <varlistentry id="opt.dump-instr" xreflabel="--dump-instr">
604 <option><![CDATA[--dump-instr=<no|yes> [default: no] ]]></option>
607 <para>This specifies that event counting should be performed at
608 per-instruction granularity.
609 This allows for assembly code
610 annotation. Currently the results can only be
611 displayed by KCachegrind.</para>
615 <varlistentry id="opt.compress-strings" xreflabel="--compress-strings">
617 <option><![CDATA[--compress-strings=<no|yes> [default: yes] ]]></option>
620 <para>This option influences the output format of the profile data.
621 It specifies whether strings (file and function names) should be
622 identified by numbers. This shrinks the file,
623 but makes it more difficult
624 for humans to read (which is not recommended in any case).</para>
628 <varlistentry id="opt.compress-pos" xreflabel="--compress-pos">
630 <option><![CDATA[--compress-pos=<no|yes> [default: yes] ]]></option>
633 <para>This option influences the output format of the profile data.
634 It specifies whether numerical positions are always specified as absolute
635 values or are allowed to be relative to previous numbers.
636 This shrinks the file size.</para>
640 <varlistentry id="opt.combine-dumps" xreflabel="--combine-dumps">
642 <option><![CDATA[--combine-dumps=<no|yes> [default: no] ]]></option>
<para>When enabled and multiple profile data parts are to be
generated, these parts are appended to the same output file.
647 Not recommended.</para>
654 <sect2 id="cl-manual.options.activity"
655 xreflabel="Activity options">
656 <title>Activity options</title>
659 These options specify when actions relating to event counts are to
660 be executed. For interactive control use callgrind_control.
663 <!-- start of xi:include in the manpage -->
664 <variablelist id="cl.opts.list.activity">
666 <varlistentry id="opt.dump-every-bb" xreflabel="--dump-every-bb">
668 <option><![CDATA[--dump-every-bb=<count> [default: 0, never] ]]></option>
671 <para>Dump profile data every <option>count</option> basic blocks.
672 Whether a dump is needed is only checked when Valgrind's internal
scheduler is run. Therefore, the minimum useful setting is about 100000.
674 The count is a 64-bit value to make long dump periods possible.
679 <varlistentry id="opt.dump-before" xreflabel="--dump-before">
681 <option><![CDATA[--dump-before=<function> ]]></option>
684 <para>Dump when entering <option>function</option>.</para>
688 <varlistentry id="opt.zero-before" xreflabel="--zero-before">
690 <option><![CDATA[--zero-before=<function> ]]></option>
693 <para>Zero all costs when entering <option>function</option>.</para>
697 <varlistentry id="opt.dump-after" xreflabel="--dump-after">
699 <option><![CDATA[--dump-after=<function> ]]></option>
702 <para>Dump when leaving <option>function</option>.</para>
707 <!-- end of xi:include in the manpage -->
710 <sect2 id="cl-manual.options.collection"
711 xreflabel="Data collection options">
712 <title>Data collection options</title>
715 These options specify when events are to be aggregated into event counts.
716 Also see <xref linkend="cl-manual.limits"/>.</para>
718 <!-- start of xi:include in the manpage -->
719 <variablelist id="cl.opts.list.collection">
721 <varlistentry id="opt.instr-atstart" xreflabel="--instr-atstart">
723 <option><![CDATA[--instr-atstart=<yes|no> [default: yes] ]]></option>
726 <para>Specify whether Callgrind should start simulation and
727 profiling from the beginning of the program.
728 When set to <computeroutput>no</computeroutput>,
729 Callgrind cannot collect
730 any information, including calls, but its slowdown is at
731 most around a factor of 4, which is the minimum Valgrind
732 overhead. Instrumentation can be interactively enabled via
733 <computeroutput>callgrind_control -i on</computeroutput>.</para>
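<para>As an illustration, the following Python sketch assembles the two
command lines involved in delaying instrumentation: one that starts the
program with <option>--instr-atstart=no</option>, and one that later enables
instrumentation through callgrind_control. The helper names and the program
path <computeroutput>./myprog</computeroutput> are hypothetical; only the
options themselves are taken from this manual.</para>

```python
import shlex


def callgrind_argv(program, instr_atstart=True):
    """Build a valgrind command line that runs `program` under Callgrind."""
    argv = ["valgrind", "--tool=callgrind"]
    if not instr_atstart:
        # Start without instrumentation: nothing is collected until enabled.
        argv.append("--instr-atstart=no")
    argv.append(program)
    return argv


def enable_instrumentation_argv():
    """Command that switches instrumentation on, as quoted in the text."""
    return ["callgrind_control", "-i", "on"]


if __name__ == "__main__":
    # "./myprog" is a placeholder for the program being profiled.
    print(shlex.join(callgrind_argv("./myprog", instr_atstart=False)))
    print(shlex.join(enable_instrumentation_argv()))
```

<para>The first command would be started in one terminal, and the second
issued from another once the program has reached the part of interest.</para>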
734 <para>Note that the resulting call graph will most probably not
735 contain <function>main</function>, but will contain all the
736 functions executed after instrumentation was enabled.
737 Instrumentation can also be enabled and disabled programmatically. See the
738 Callgrind include file
739 <computeroutput>callgrind.h</computeroutput> for the macros
740 to use in your source code.</para>
741 <para>For cache simulation, results will be less accurate when
742 instrumentation is switched on later in the program run, as the simulator
743 starts with an empty cache at that moment. To compensate, switch on
744 event collection only after instrumentation has run for a while.</para>
748 <varlistentry id="opt.collect-atstart" xreflabel="--collect-atstart">
750 <option><![CDATA[--collect-atstart=<yes|no> [default: yes] ]]></option>
753 <para>Specify whether event collection is enabled at beginning
754 of the profile run.</para>
755 <para>To only look at parts of your program, you have two
756 possibilities:</para>
759 <para>Zero event counters before entering the program part you
760 want to profile, and dump the event counters to a file after
761 leaving that program part.</para>
764 <para>Switch on/off collection state as needed to only see
765 event counters happening while inside of the program part you
766 want to profile.</para>
769 <para>The second option is preferable if the program part you want to
770 profile is called many times. The first option would create a large number
771 of dumps, which is not practical in that case.</para>
772 <para>Collection state can be
773 toggled at entry and exit of a given function with the
774 option <option><link linkend="opt.toggle-collect">--toggle-collect</link></option>. If you
775 use this option, collection
776 state should be disabled at the beginning. Note that specifying
777 <option>--toggle-collect</option> implicitly sets
779 <option>--collect-atstart=no</option>.</para>
780 <para>Collection state can also be toggled by inserting the client request
782 <!-- commented out because it causes broken links in the man page
783 <xref linkend="cr.toggle-collect"/>;
785 CALLGRIND_TOGGLE_COLLECT
787 at the needed code positions.</para>
791 <varlistentry id="opt.toggle-collect" xreflabel="--toggle-collect">
793 <option><![CDATA[--toggle-collect=<function> ]]></option>
796 <para>Toggle collection on entry/exit of <option>function</option>.</para>
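<para>The two ways of restricting a profile to one program part, described
under <option>--collect-atstart</option>, map onto options roughly as in the
following Python sketch. The helper and the function name
<computeroutput>hot_function</computeroutput> are hypothetical and only serve
to show which options belong together.</para>

```python
def partial_profile_options(function, called_many_times):
    """Return Callgrind options that restrict the profile to `function`."""
    if called_many_times:
        # Second possibility: collect only while inside the function.
        # Per the text, --toggle-collect starts with collection disabled.
        return ["--toggle-collect=" + function]
    # First possibility: zero the counters on entry, dump them on exit.
    return ["--zero-before=" + function, "--dump-after=" + function]


if __name__ == "__main__":
    # "hot_function" is a placeholder for a function in your program.
    print(partial_profile_options("hot_function", called_many_times=True))
    print(partial_profile_options("hot_function", called_many_times=False))
```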
800 <varlistentry id="opt.collect-jumps" xreflabel="--collect-jumps">
802 <option><![CDATA[--collect-jumps=<no|yes> [default: no] ]]></option>
805 <para>This specifies whether information for (conditional) jumps
806 should be collected. callgrind_annotate is currently not
807 able to show this data; you have to use KCachegrind to get jump
808 arrows in the annotated code.</para>
812 <varlistentry id="opt.collect-systime" xreflabel="--collect-systime">
814 <option><![CDATA[--collect-systime=<no|yes|msec|usec|nsec> [default: no] ]]></option>
817 <para>This specifies whether information for system call times
818 should be collected.</para>
819 <para>With the value <computeroutput>no</computeroutput>, no system call
820 information is recorded.</para>
821 <para>With any other value, Callgrind records the number of system calls
822 made (sysCount event) and the elapsed time (sysTime event) spent in system calls.
824 The <computeroutput>--collect-systime</computeroutput> value gives
825 the unit used for sysTime: milliseconds, microseconds, or
826 nanoseconds. With the value <computeroutput>nsec</computeroutput>,
827 Callgrind also records the CPU time spent during system calls
829 <para>The value <computeroutput>yes</computeroutput> is a synonym
830 of <computeroutput>msec</computeroutput>.
831 The value <computeroutput>nsec</computeroutput> is not supported
836 <varlistentry id="clopt.collect-bus" xreflabel="--collect-bus">
838 <option><![CDATA[--collect-bus=<no|yes> [default: no] ]]></option>
841 <para>This specifies whether the number of global bus events executed
842 should be collected. The event type "Ge" is used for these events.</para>
847 <!-- end of xi:include in the manpage -->
850 <sect2 id="cl-manual.options.separation"
851 xreflabel="Cost entity separation options">
852 <title>Cost entity separation options</title>
855 These options specify how event counts should be attributed to execution contexts.
857 For example, they specify whether the recursion level or the
858 call chain leading to a function should be taken into account,
859 and whether the thread ID should be considered.
860 Also see <xref linkend="cl-manual.cycles"/>.</para>
862 <!-- start of xi:include in the manpage -->
863 <variablelist id="cmd-options.separation">
865 <varlistentry id="opt.separate-threads" xreflabel="--separate-threads">
867 <option><![CDATA[--separate-threads=<no|yes> [default: no] ]]></option>
870 <para>This option specifies whether profile data should be generated
871 separately for every thread. If yes, the file names get "-threadID" appended.</para>
876 <varlistentry id="opt.separate-callers" xreflabel="--separate-callers">
878 <option><![CDATA[--separate-callers=<callers> [default: 0] ]]></option>
881 <para>Separate contexts by at most <option>callers</option> functions in the
882 call chain. See <xref linkend="cl-manual.cycles"/>.</para>
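<para>Conceptually, this means that the same function can be accounted under
several contexts, one per distinct chain of nearest callers. The Python
sketch below is a simplified model of that idea, not Callgrind's actual
implementation; the function <computeroutput>context_key</computeroutput> and
the example function names are hypothetical.</para>

```python
def context_key(call_chain, callers):
    """Model a context as the current function plus up to `callers` callers.

    `call_chain` lists functions from the outermost caller to the
    currently executing function.
    """
    depth = min(callers, len(call_chain) - 1)
    return tuple(call_chain[len(call_chain) - 1 - depth:])


if __name__ == "__main__":
    # Hypothetical call chains reaching the same function "work".
    via_a = ["main", "a", "work"]
    via_b = ["main", "b", "work"]
    print(context_key(via_a, 0), context_key(via_b, 0))  # one shared context
    print(context_key(via_a, 1), context_key(via_b, 1))  # two separate contexts
```

<para>In this model, a setting of 0 merges all calls of a function into one
context, while larger values split the costs by caller.</para>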
886 <varlistentry id="opt.separate-callers-num" xreflabel="--separate-callers2">
888 <option><![CDATA[--separate-callers<number>=<function> ]]></option>
891 <para>Separate <option>number</option> callers for <option>function</option>.
892 See <xref linkend="cl-manual.cycles"/>.</para>
896 <varlistentry id="opt.separate-recs" xreflabel="--separate-recs">
898 <option><![CDATA[--separate-recs=<level> [default: 2] ]]></option>
901 <para>Separate function recursions by at most <option>level</option> levels.
902 See <xref linkend="cl-manual.cycles"/>.</para>
906 <varlistentry id="opt.separate-recs-num" xreflabel="--separate-recs10">
908 <option><![CDATA[--separate-recs<number>=<function> ]]></option>
911 <para>Separate <option>number</option> recursions for <option>function</option>.
912 See <xref linkend="cl-manual.cycles"/>.</para>
916 <varlistentry id="opt.skip-plt" xreflabel="--skip-plt">
918 <option><![CDATA[--skip-plt=<no|yes> [default: yes] ]]></option>
921 <para>Ignore calls to/from PLT sections.</para>
925 <varlistentry id="opt.skip-direct-rec" xreflabel="--skip-direct-rec">
927 <option><![CDATA[--skip-direct-rec=<no|yes> [default: yes] ]]></option>
930 <para>Ignore direct recursions.</para>
934 <varlistentry id="opt.fn-skip" xreflabel="--fn-skip">
936 <option><![CDATA[--fn-skip=<function> ]]></option>
939 <para>Ignore calls to/from a given function. For example, if you have a
940 call chain A > B > C, and you specify function B to be
941 ignored, you will only see A > C.</para>
942 <para>This is very convenient for skipping functions that handle callback
943 behaviour. For example, with the signal/slot mechanism in the
944 Qt graphics library, you only want
945 to see the function emitting a signal call the slots connected
946 to that signal. First, determine the real call chain to see which
947 functions need to be skipped, then use this option.</para>
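<para>The effect on the reported call chain can be pictured with a small
Python model. It only mirrors the A, B, C example above; the helper name is
hypothetical and the sketch says nothing about how Callgrind attributes the
cost of the skipped function.</para>

```python
def visible_chain(call_chain, skipped):
    """Return the call chain as reported when `skipped` functions are ignored."""
    return [function for function in call_chain if function not in skipped]


if __name__ == "__main__":
    # The chain A > B > C with B skipped is shown as A > C.
    print(" > ".join(visible_chain(["A", "B", "C"], {"B"})))
```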
952 commenting out as it is only enabled with CLG_EXPERIMENTAL. (Nb: I had to
953 insert a space between the double dash to avoid XML comment problems.)
955 <varlistentry id="opt.fn-group">
957 <option><![CDATA[- -fn-group<number>=<function> ]]></option>
960 <para>Put a function into a separate group. This influences the
961 context name for cycle avoidance. All functions inside such a
962 group are treated as being the same for context name building, which
963 resembles the call chain leading to a context. By specifying function
964 groups with this option, you can shorten the context name, as functions
965 in the same group will not appear in sequence in the name. </para>
971 <!-- end of xi:include in the manpage -->
975 <sect2 id="cl-manual.options.simulation"
976 xreflabel="Simulation options">
977 <title>Simulation options</title>
979 <!-- start of xi:include in the manpage -->
980 <variablelist id="cl.opts.list.simulation">
982 <varlistentry id="clopt.cache-sim" xreflabel="--cache-sim">
984 <option><![CDATA[--cache-sim=<yes|no> [default: no] ]]></option>
987 <para>Specify if you want to do full cache simulation. By default,
988 only instruction read accesses will be counted ("Ir").
989 With cache simulation, further event counters are enabled:
990 Cache misses on instruction reads ("I1mr"/"ILmr"),
991 data read accesses ("Dr") and related cache misses ("D1mr"/"DLmr"),
992 data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw").
993 For more information, see <xref linkend="&vg-cg-manual-id;"/>.
998 <varlistentry id="clopt.branch-sim" xreflabel="--branch-sim">
1000 <option><![CDATA[--branch-sim=<yes|no> [default: no] ]]></option>
1003 <para>Specify if you want to do branch prediction simulation.
1004 Further event counters are enabled: Number of executed conditional
1005 branches and related predictor misses ("Bc"/"Bcm"), executed indirect
1006 jumps and related misses of the jump address predictor ("Bi"/"Bim").
1012 <!-- end of xi:include in the manpage -->
1016 <sect2 id="cl-manual.options.cachesimulation"
1017 xreflabel="Cache simulation options">
1018 <title>Cache simulation options</title>
1020 <!-- start of xi:include in the manpage -->
1021 <variablelist id="cl.opts.list.cachesimulation">
1023 <varlistentry id="opt.simulate-wb" xreflabel="--simulate-wb">
1025 <option><![CDATA[--simulate-wb=<yes|no> [default: no] ]]></option>
1028 <para>Specify whether write-back behavior should be simulated, which makes
1029 it possible to distinguish LL cache misses with and without write-backs.
1030 The cache model of Cachegrind/Callgrind does not specify write-through
1031 vs. write-back behavior, and this is also not relevant for the number
1032 of generated miss counts. However, with explicit write-back simulation
1033 it can be decided whether a miss triggers not only the loading of a new
1034 cache line, but also a preceding write-back of a dirty cache line.
1035 The new dirty miss events are ILdmr, DLdmr, and DLdmw,
1036 for misses because of instruction read, data read, and data write,
1037 respectively. As they produce two memory transactions, they should
1038 count double in a time estimation compared to a normal miss.</para>
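<para>The doubled weight of dirty misses can be written down as a short
Python sketch. It assumes that the two inputs are disjoint counts (misses
without and with a write-back) and that
<computeroutput>miss_penalty</computeroutput> is a cost per memory transaction
chosen by the user; neither the helper nor these parameters are part of
Callgrind.</para>

```python
def estimated_miss_time(normal_misses, dirty_misses, miss_penalty):
    """Estimate time spent on LL misses, counting dirty misses twice.

    A dirty miss causes two memory transactions (write-back plus load),
    so it is weighted with twice the penalty of a normal miss.
    """
    return miss_penalty * (normal_misses + 2 * dirty_misses)


if __name__ == "__main__":
    # Hypothetical counts: 100 normal misses, 20 dirty misses, penalty 10.
    print(estimated_miss_time(100, 20, 10))
```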
1043 <varlistentry id="opt.simulate-hwpref" xreflabel="--simulate-hwpref">
1045 <option><![CDATA[--simulate-hwpref=<yes|no> [default: no] ]]></option>
1048 <para>Specify whether simulation of a hardware prefetcher should be
1049 added, one that is able to detect stream accesses in the second-level cache
1050 by comparing accesses separately for each page.
1051 As the simulation cannot decide about any timing issues of prefetching,
1052 it is assumed that any hardware prefetch triggered succeeds before a
1053 real access is done. Thus, this gives a best-case scenario by covering
1054 all possible stream accesses.</para>
1058 <varlistentry id="opt.cacheuse" xreflabel="--cacheuse">
1060 <option><![CDATA[--cacheuse=<yes|no> [default: no] ]]></option>
1063 <para>Specify whether cache line use should be collected. For every
1064 cache line, from loading to it being evicted, the number of accesses
1065 as well as the number of actually used bytes is determined. This
1066 behavior is related to the code which triggered loading of the cache
1067 line. In contrast to miss counters, which show the position where
1068 the symptoms of bad cache behavior (i.e. latencies) happen, the
1069 use counters try to pinpoint the cause (i.e. the code with the
1070 bad access behavior). The new counters are defined such
1071 that worse behavior results in higher cost.
1072 AcCost1 and AcCost2 are counters showing bad temporal locality
1073 for L1 and LL caches, respectively. This is done by summing up
1074 reciprocal values of the numbers of accesses of each cache line,
1075 multiplied by 1000 (as only integer costs are allowed). E.g. for
1076 a given source line with 5 read accesses, a value of 5000 AcCost
1077 means that for every access, a new cache line was loaded and directly
1078 evicted afterwards without further accesses. Similarly, SpLoss1/2
1079 show bad spatial locality for L1 and LL caches, respectively. They
1080 give the <emphasis>spatial loss</emphasis> count of bytes which
1081 were loaded into cache but never accessed. This pinpoints code
1082 accessing data in a way such that cache space is wasted, which hints
1083 at bad layout of data structures in memory. Assuming a cache line
1084 size of 64 bytes and 100 L1 misses for a given source line, the
1085 loading of 6400 bytes into L1 was triggered. If SpLoss1 shows a
1086 value of 3200 for this line, this means that half of the loaded data was
1087 never used, or that with a better data layout, only half of the cache
1088 space would have been needed.
1089 Please note that for cache line use counters, it currently is
1090 not possible to provide meaningful inclusive costs. Therefore,
1091 inclusive cost of these counters should be ignored.</para>
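<para>The two worked examples above can be reproduced with the following
Python sketch. It is a reading of the description, not Callgrind's code: the
helper names are hypothetical, and the use of integer division for the
reciprocal is an assumption based on the remark that only integer costs are
allowed.</para>

```python
def ac_cost(accesses_per_line):
    """Sum 1000/accesses over all cache lines loaded by a piece of code.

    `accesses_per_line` holds, for each loaded cache line, how often it
    was accessed before being evicted. Integer division is an assumption.
    """
    return sum(1000 // accesses for accesses in accesses_per_line)


def spatial_loss_fraction(misses, line_size, sp_loss):
    """Fraction of the bytes loaded into cache that were never accessed."""
    loaded_bytes = misses * line_size
    return sp_loss / loaded_bytes


if __name__ == "__main__":
    # Five accesses, each loading a line that is evicted right afterwards.
    print(ac_cost([1, 1, 1, 1, 1]))
    # 100 misses with 64-byte lines load 6400 bytes; 3200 of them are unused.
    print(spatial_loss_fraction(100, 64, 3200))
```

<para>For comparison, five accesses that all hit the same cache line would
give a much lower value in this model, reflecting good temporal
locality.</para>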
1096 <varlistentry id="cl.opt.I1" xreflabel="--I1">
1098 <option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
1101 <para>Specify the size, associativity and line size of the level 1
1102 instruction cache. </para>
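<para>The three comma-separated values determine the cache geometry. The
Python sketch below parses such a specification and derives the number of
sets using the usual set-associative relation. The helper names and the
example value <computeroutput>32768,8,64</computeroutput> are hypothetical,
and the sketch assumes plain byte counts.</para>

```python
def parse_cache_spec(spec):
    """Split "size,associativity,line size" into three integers."""
    size, associativity, line_size = (int(part) for part in spec.split(","))
    return size, associativity, line_size


def number_of_sets(size, associativity, line_size):
    """Number of sets in a set-associative cache of the given geometry."""
    return size // (associativity * line_size)


if __name__ == "__main__":
    # Hypothetical 32 KB, 8-way cache with 64-byte lines.
    geometry = parse_cache_spec("32768,8,64")
    print(geometry, number_of_sets(*geometry))
```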
1106 <varlistentry id="cl.opt.D1" xreflabel="--D1">
1108 <option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
1111 <para>Specify the size, associativity and line size of the level 1 data cache.</para>
1116 <varlistentry id="cl.opt.LL" xreflabel="--LL">
1118 <option><![CDATA[--LL=<size>,<associativity>,<line size> ]]></option>
1121 <para>Specify the size, associativity and line size of the last-level cache.</para>
1126 <!-- end of xi:include in the manpage -->
1132 <sect1 id="cl-manual.monitor-commands" xreflabel="Callgrind Monitor Commands">
1133 <title>Callgrind Monitor Commands</title>
1134 <para>The Callgrind tool provides monitor commands handled by the Valgrind
1135 gdbserver (see <xref linkend="manual-core-adv.gdbserver-commandhandling"/>).
1136 Valgrind Python code provides GDB front end commands that make
1137 the callgrind monitor commands easier to use (see
1138 <xref linkend="manual-core-adv.gdbserver-gdbmonitorfrontend"/>). To launch a
1139 callgrind monitor command via its GDB front end command, instead of prefixing
1140 the command with "monitor", you must use the GDB <varname>callgrind</varname>
1141 command (or the shorter alias <varname>cg</varname>). Using the callgrind GDB
1142 front end command provides more flexible usage, such as auto-completion of the
1143 command by GDB. In GDB, you can use <varname>help callgrind</varname> to get
1144 help about the callgrind front end monitor commands and you can
1145 use <varname>apropos callgrind</varname> to get all the commands mentioning the
1146 word "callgrind" in their name or on-line help.</para>
1151 <para><varname>dump [&lt;dump_hint&gt;]</varname> requests to dump the
1152 profile data. </para>
1156 <para><varname>zero</varname> requests to zero the profile data counters.</para>
1161 <para><varname>instrumentation [on|off]</varname> requests to set
1162 (if parameter on/off is given) or get the current instrumentation state.
1167 <para><varname>status</varname> requests to print out some status information.</para>
1174 <sect1 id="cl-manual.clientrequests" xreflabel="Client request reference">
1175 <title>Callgrind specific client requests</title>
1177 <para>Callgrind provides the following specific client requests in
1178 <filename>callgrind.h</filename>. See that file for the exact details of
1179 their arguments.</para>
1181 <variablelist id="cl.clientrequests.list">
1183 <varlistentry id="cr.dump-stats" xreflabel="CALLGRIND_DUMP_STATS">
1185 <computeroutput>CALLGRIND_DUMP_STATS</computeroutput>
1188 <para>Force generation of a profile dump at the specified position
1189 in the code, for the current thread only. Written counters will be reset to zero.</para>
1194 <varlistentry id="cr.dump-stats-at" xreflabel="CALLGRIND_DUMP_STATS_AT">
1196 <computeroutput>CALLGRIND_DUMP_STATS_AT(string)</computeroutput>
1199 <para>Same as <computeroutput>CALLGRIND_DUMP_STATS</computeroutput>,
1200 but lets you specify a string to distinguish profile dumps.</para>
1205 <varlistentry id="cr.zero-stats" xreflabel="CALLGRIND_ZERO_STATS">
1207 <computeroutput>CALLGRIND_ZERO_STATS</computeroutput>
1210 <para>Reset the profile counters for the current thread to zero.</para>
1214 <varlistentry id="cr.toggle-collect" xreflabel="CALLGRIND_TOGGLE_COLLECT">
1216 <computeroutput>CALLGRIND_TOGGLE_COLLECT</computeroutput>
1219 <para>Toggle the collection state. This makes it possible to ignore events
1220 with regard to profile counters. See also options
1221 <option><link linkend="opt.collect-atstart">--collect-atstart</link></option> and
1223 <option><link linkend="opt.toggle-collect">--toggle-collect</link></option>.</para>
1227 <varlistentry id="cr.start-instr" xreflabel="CALLGRIND_START_INSTRUMENTATION">
1229 <computeroutput>CALLGRIND_START_INSTRUMENTATION</computeroutput>
1232 <para>Start full Callgrind instrumentation if not already enabled.
1233 When cache simulation is done, this will flush the simulated cache
1234 and lead to an artificial cache warmup phase afterwards with
1235 cache misses which would not have happened in reality. See also option
1237 <option><link linkend="opt.instr-atstart">--instr-atstart</link></option>.</para>
1242 <varlistentry id="cr.stop-instr" xreflabel="CALLGRIND_STOP_INSTRUMENTATION">
1244 <computeroutput>CALLGRIND_STOP_INSTRUMENTATION</computeroutput>
1247 <para>Stop full Callgrind instrumentation if not already disabled.
1248 This flushes Valgrind's translation cache, and does no additional
1249 instrumentation afterwards: it effectively will run at the same
1250 speed as Nulgrind, i.e. at minimal slowdown. Use this to
1251 speed up the Callgrind run for uninteresting code parts. Use
1252 <computeroutput><link linkend="cr.start-instr">CALLGRIND_START_INSTRUMENTATION</link></computeroutput>
1253 to enable instrumentation again. See also option
1254 <option><link linkend="opt.instr-atstart">--instr-atstart</link></option>.</para>
1265 <sect1 id="cl-manual.callgrind_annotate-options" xreflabel="callgrind_annotate Command-line Options">
1266 <title>callgrind_annotate Command-line Options</title>
1268 <!-- start of xi:include in the manpage -->
1269 <variablelist id="callgrind_annotate.opts.list">
1272 <term><option>-h --help</option></term>
1274 <para>Show summary of options.</para>
1279 <term><option>--version</option></term>
1281 <para>Show version of callgrind_annotate.</para>
1287 <option>--show=A,B,C [default: all]</option>
1290 <para>Only show figures for events A,B,C.</para>
1296 <option><![CDATA[--threshold=<0--100> [default: 99%] ]]></option>
1299 <para>Percentage of counts (of primary sort event) we are
1300 interested in.</para>
1301 <para>callgrind_annotate stops printing functions when the sum
1302 of the cost percentages of the printed functions is greater than or equal
1303 to the given threshold percentage.</para>
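<para>The stopping rule can be modelled with a few lines of Python. This is a
sketch of the rule as described here, not the code of callgrind_annotate; the
helper name and the example costs are hypothetical.</para>

```python
def functions_to_print(cost_by_function, threshold):
    """Return the functions printed before the threshold percentage is reached.

    Functions are taken in order of decreasing cost, and printing stops
    once the printed functions cover at least `threshold` percent of the
    total cost.
    """
    total = sum(cost_by_function.values())
    ordered = sorted(cost_by_function.items(),
                     key=lambda item: item[1], reverse=True)
    printed = []
    running = 0
    for name, cost in ordered:
        printed.append(name)
        running += cost
        if running * 100 >= threshold * total:
            break
    return printed


if __name__ == "__main__":
    # Hypothetical costs of the primary sort event per function.
    costs = {"a": 50, "b": 30, "c": 15, "d": 5}
    print(functions_to_print(costs, 90))
```

<para>With these example costs, the cumulative percentages are 50, 80 and 95,
so a threshold of 90 stops the listing after the third function.</para>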
1309 <option>--sort=A,B,C</option>
1312 <para>Sort columns by events A,B,C [event column order].</para>
1313 <para>Optionally, each event is followed by a : and a threshold,
1314 to specify different thresholds depending on the event.</para>
1315 <para>callgrind_annotate stops printing functions when the sum
1316 of the cost percentages of the printed functions, for all the events,
1317 is greater than or equal to the given event threshold percentages.</para>
1318 <para>When one or more thresholds are given via this option,
1319 the value of <option>--threshold</option> is ignored.</para>
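<para>The shape of such a <option>--sort</option> value can be illustrated by
a small Python parser. It is only a sketch of the syntax described above
(event, optional colon, optional threshold); the helper name is hypothetical
and callgrind_annotate may accept or reject inputs differently.</para>

```python
def parse_sort_spec(spec):
    """Parse "A:t1,B:t2,..." into (event, threshold) pairs.

    The threshold is None for events given without ":threshold".
    """
    result = []
    for item in spec.split(","):
        event, separator, threshold = item.partition(":")
        result.append((event, float(threshold) if separator else None))
    return result


if __name__ == "__main__":
    # "Ir" and "Dr" are event names mentioned under --cache-sim.
    print(parse_sort_spec("Ir:90,Dr:50"))
    print(parse_sort_spec("Ir,Dr"))
```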
1325 <option><![CDATA[--show-percs=<no|yes> [default: no] ]]></option>
1328 <para>When enabled, a percentage is printed next to all event counts.
1329 This helps gauge the relative importance of each function and line.
1336 <option><![CDATA[--auto=<yes|no> [default: yes] ]]></option>
1339 <para>Annotate all source files containing functions that helped
1340 reach the event count threshold.</para>
1346 <option>--context=N [default: 8] </option>
1349 <para>Print N lines of context before and after annotated lines.</para>
1356 <option><![CDATA[--inclusive=<yes|no> [default: no] ]]></option>
1359 <para>Add subroutine costs to function calls.</para>
1365 <option><![CDATA[--tree=<none|caller|calling|both> [default: none] ]]></option>
1368 <para>For each function, print its callers, the functions it calls, or both.</para>
1375 <option><![CDATA[-I, --include=<dir> ]]></option>
1378 <para>Add <option>dir</option> to the list of directories to search
1379 for source files.</para>
1384 <!-- end of xi:include in the manpage -->
1392 <sect1 id="cl-manual.callgrind_control-options" xreflabel="callgrind_control Command-line Options">
1393 <title>callgrind_control Command-line Options</title>
1395 <para>By default, callgrind_control acts on all programs run by the
1396 current user under Callgrind. It is possible to limit the actions to
1397 specified Callgrind runs by providing a list of pids or program names as
1398 argument. The default action is to give some brief information about the
1399 applications being run under Callgrind.</para>
1401 <!-- start of xi:include in the manpage -->
1402 <variablelist id="callgrind_control.opts.list">
1405 <term><option>-h --help</option></term>
1407 <para>Show a short description, usage, and summary of options.</para>
1412 <term><option>--version</option></term>
1414 <para>Show version of callgrind_control.</para>
1419 <term><option>-l --long</option></term>
1421 <para>Also show the working directory, in addition to the brief
1422 information given by default.</para>
1428 <term><option>-s --stat</option></term>
1430 <para>Show statistics information about active Callgrind runs.</para>
1435 <term><option>-b --back</option></term>
1437 <para>Show stack/back traces of each thread in active Callgrind runs. For
1438 each active function in the stack trace, also the number of invocations
1439 since program start (or last dump) is shown. This option can be
1440 combined with -e to show inclusive cost of active functions.</para>
1445 <term><option><![CDATA[-e [A,B,...] ]]></option> (default: all)</term>
1447 <para>Show the current per-thread, exclusive cost values of event
1448 counters. If no explicit event names are given, figures for all event
1449 types which are collected in the given Callgrind run are
1450 shown. Otherwise, only figures for event types A, B, ... are shown. If
1451 this option is combined with -b, inclusive cost for the functions of
1452 each active stack frame is provided, too.
1458 <term><option><![CDATA[--dump[=<desc>] ]]></option> (default: no description)</term>
1460 <para>Request the dumping of profile information. Optionally, a
1461 description can be specified which is written into the dump as part of
1462 the information giving the reason which triggered the dump action. This
1463 can be used to distinguish multiple dumps.</para>
1468 <term><option>-z --zero</option></term>
1470 <para>Zero all event counters.</para>
1475 <term><option>-k --kill</option></term>
1477 <para>Force a Callgrind run to be terminated.</para>
1482 <term><option><![CDATA[--instr=<on|off>]]></option></term>
1484 <para>Switch instrumentation mode on or off. If a Callgrind run has
1485 instrumentation disabled, no simulation is done and no events are
1486 counted. This is useful to skip uninteresting program parts, as there
1487 is much less slowdown (same as with the Valgrind tool "none"). See also
1488 the Callgrind option <option>--instr-atstart</option>.</para>
1493 <term><option><![CDATA[--vgdb-prefix=<prefix>]]></option></term>
1495 <para>Specify the vgdb prefix to be used by callgrind_control.
1496 callgrind_control internally uses vgdb to find and control the active
1497 Callgrind runs. If the <option>--vgdb-prefix</option> option was used
1498 for launching valgrind, then the same option must be given to
1499 callgrind_control.</para>
1503 <!-- end of xi:include in the manpage -->