callgrind/docs/cl-manual.xml

   1 <?xml version="1.0"?> <!-- -*- sgml -*- -->
   2 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
   3   "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
   4 [ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
   5
   6 <chapter id="cl-manual" xreflabel="Callgrind Manual">
   7 <title>Callgrind: a call-graph generating cache and branch prediction profiler</title>
   8
   9
  10 <para>To use this tool, you must specify
  11 <option>--tool=callgrind</option> on the
  12 Valgrind command line.</para>
  13
  14 <sect1 id="cl-manual.use" xreflabel="Overview">
  15 <title>Overview</title>
  16
  17 <para>Callgrind is a profiling tool that records the call history among
  18 functions in a program's run as a call-graph.
  19 By default, the collected data consists of
  20 the number of instructions executed, their relationship
  21 to source lines, the caller/callee relationship between functions,
  22 and the numbers of such calls.
  23 Optionally, cache simulation and/or branch prediction (similar to Cachegrind)
  24 can produce further information about the runtime behavior of an application.
  25 </para>
  26
  27 <para>The profile data is written out to a file at program
  28 termination. For presentation of the data, and interactive control
  29 of the profiling, two command line tools are provided:</para>
  30 <variablelist>
  31   <varlistentry>
  32   <term><command>callgrind_annotate</command></term>
  33   <listitem>
  34     <para>This command reads in the profile data, and prints a
  35     sorted list of functions, optionally with source annotation.</para>
  36
  37     <para>For graphical visualization of the data, try
  38     <ulink url="&cl-gui-url;">KCachegrind</ulink>, which is a KDE/Qt based
  39     GUI that makes it easy to navigate the large amount of data that
  40     Callgrind produces.</para>
  41
  42   </listitem>
  43   </varlistentry>
  44
  45   <varlistentry>
  46   <term><command>callgrind_control</command></term>
  47   <listitem>
  48     <para>This command enables you to interactively observe and control
  49     the status of a program currently running under Callgrind's control,
  50     without stopping the program.  You can get statistics information as
  51     well as the current stack trace, and you can request zeroing of counters
  52     or dumping of profile data.</para>
  53   </listitem>
  54   </varlistentry>
  55 </variablelist>
  56
  57   <sect2 id="cl-manual.functionality" xreflabel="Functionality">
  58   <title>Functionality</title>
  59
  60 <para>Cachegrind collects flat profile data: event counts (data reads,
  61 cache misses, etc.) are attributed directly to the function they
  62 occurred in.  This cost attribution mechanism is
  63 called <emphasis>self</emphasis> or <emphasis>exclusive</emphasis>
  64 attribution.</para>
  65
  66 <para>Callgrind extends this functionality by propagating costs
  67 across function call boundaries.  If function <function>foo</function> calls
  68 <function>bar</function>, the costs from <function>bar</function> are added into
  69 <function>foo</function>'s costs.  When applied to the program as a whole,
  70 this builds up a picture of so-called <emphasis>inclusive</emphasis>
  71 costs, that is, where the cost of each function includes the costs of
  72 all functions it called, directly or indirectly.</para>
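<para>The propagation described above can be sketched in a few lines of
Python.  This is only an illustration under simplifying assumptions: the
function names, the cost numbers and the acyclic call graph in which every
function has a single caller are hypothetical, and the code does not reflect
how Callgrind itself is implemented.</para>

```python
# Hypothetical self (exclusive) costs per function, e.g. instructions executed.
self_cost = {"main": 10, "foo": 30, "bar": 60}

# Hypothetical acyclic call graph: caller mapped to the functions it calls.
calls = {"main": ["foo"], "foo": ["bar"], "bar": []}


def inclusive_cost(fn):
    """Self cost of fn plus the inclusive cost of everything it calls."""
    return self_cost[fn] + sum(inclusive_cost(callee) for callee in calls[fn])


inclusive = {fn: inclusive_cost(fn) for fn in self_cost}
print(inclusive)
```

<para>In this sketch, <function>bar</function> calls nothing, so its
inclusive cost equals its self cost, while <function>main</function>
accumulates the cost of the whole chain.</para>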
  73
  74 <para>As an example, the inclusive cost of
  75 <function>main</function> should be almost 100 percent
  76 of the total program cost.  Because of costs arising before
  77 <function>main</function> is run, such as
  78 initialization of the run time linker and construction of global C++
  79 objects, the inclusive cost of <function>main</function>
  80 is not exactly 100 percent of the total program cost.</para>
  81
  82 <para>Together with the call graph, this allows you to find the
  83 specific call chains starting from
  84 <function>main</function> in which the majority of the
  85 program's costs occur.  Caller/callee cost attribution is also useful
  86 for profiling functions called from multiple call sites, and where
  87 optimization opportunities depend on changing code in the callers, in
  88 particular by reducing the call count.</para>
  89
  90 <para>Callgrind's cache simulation is based on that of Cachegrind.
  91 Read the documentation for <xref linkend="&vg-cg-manual-id;"/> first.  The material
  92 below describes the features supported in addition to Cachegrind's
  93 features.</para>
  94
  95 <para>Callgrind's ability to detect function calls and returns depends
  96 on the instruction set of the platform it is run on.  It works best on
  97 x86 and amd64, and unfortunately currently does not work so well on
  98 PowerPC, ARM, Thumb or MIPS code.  This is because there are no explicit
  99 call or return instructions in these instruction sets, so Callgrind
 100 has to rely on heuristics to detect calls and returns.</para>
 101
 102   </sect2>
 103
 104   <sect2 id="cl-manual.basics" xreflabel="Basic Usage">
 105   <title>Basic Usage</title>
 106
 107   <para>As with Cachegrind, you probably want to compile with debugging info
 108   (the <option>-g</option> option) and with optimization turned on.</para>
 109
 110   <para>To start a profile run for a program, execute:
 111   <screen>valgrind --tool=callgrind [callgrind options] your-program [program options]</screen>
 112   </para>
 113
 114   <para>While the simulation is running, you can observe execution with:
 115   <screen>callgrind_control -b</screen>
 116   This will print out the current backtrace. To annotate the backtrace with
 117   event counts, run
 118   <screen>callgrind_control -e -b</screen>
 119   </para>
 120
 121   <para>After program termination, a profile data file named
 122   <computeroutput>callgrind.out.&lt;pid&gt;</computeroutput>
 123   is generated, where <emphasis>pid</emphasis> is the process ID
 124   of the program being profiled.
 125   The data file contains information about the calls made in the
 126   program among the functions executed, together with
 127   <command>Instruction Read</command> (Ir) event counts.</para>
 128
 129   <para>To generate a function-by-function summary from the profile
 130   data file, use
 131   <screen>callgrind_annotate [options] callgrind.out.&lt;pid&gt;</screen>
 132   This summary is similar to the output you get from a Cachegrind
 133   run with cg_annotate: the list of functions is ordered by
 134   exclusive cost, and exclusive costs are the ones
 135   that are shown.
 136   Two options are important for the additional features of
 137   Callgrind:</para>
 138
 139   <itemizedlist>
 140     <listitem>
 141       <para><option>--inclusive=yes</option>: Instead of using
 142       exclusive cost of functions as sorting order, use and show
 143       inclusive cost.</para>
 144     </listitem>
 145
 146     <listitem>
 147       <para><option>--tree=both</option>: Interleave into the
 148       top level list of functions, information on the callers and the callees
 149       of each function. In these lines, which represent executed
 150       calls, the cost gives the number of events spent in the call.
 151       Indented, above each function, there is the list of callers,
 152       and below, the list of callees. The sum of events in calls to
 153       a given function (caller lines), as well as the sum of events in
 154       calls from the function (callee lines) together with the self
 155       cost, gives the total inclusive cost of the function.</para>
 156      </listitem>
 157   </itemizedlist>
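  <para>The relation stated for <option>--tree=both</option> can be checked
  with a little arithmetic.  The function names and event counts below are
  made up for illustration, and the identity is assumed to hold only when no
  recursion is involved.</para>

```python
# Hypothetical numbers for one function "foo" in a --tree=both listing.
caller_lines = {"main": 70, "helper": 20}  # events spent in calls to foo
callee_lines = {"bar": 50, "baz": 10}      # events spent in calls from foo
self_cost = 30                             # events spent in foo itself

# Both views should add up to the same inclusive cost of foo.
inclusive_from_callers = sum(caller_lines.values())
inclusive_from_callees = self_cost + sum(callee_lines.values())
print(inclusive_from_callers, inclusive_from_callees)
```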
 158
 159   <para>By default, you will also get annotated source code
 160   for all relevant functions for which the source can be found. In
 161   addition to source annotation as produced by
 162   <computeroutput>cg_annotate</computeroutput>, you will see the
 163   annotated call sites with call counts. For all other options,
 164   consult the (Cachegrind) documentation for
 165   <computeroutput>cg_annotate</computeroutput>.
 166   </para>
 167
 168   <para>For a better call graph browsing experience, it is highly recommended
 169   to use <ulink url="&cl-gui-url;">KCachegrind</ulink>.
 170   If your code
 171   has a significant fraction of its cost in <emphasis>cycles</emphasis> (sets
 172   of functions calling each other in a recursive manner), you have to
 173   use KCachegrind, as <computeroutput>callgrind_annotate</computeroutput>
 174   currently does not do any cycle detection, which is important to get correct
 175   results in this case.</para>
 176
 177   <para>If you are additionally interested in measuring the
 178   cache behavior of your program, use Callgrind with the option
 179   <option><link linkend="clopt.cache-sim">--cache-sim=yes</link></option>.
 180   For branch prediction simulation, use
 181   <option><link linkend="clopt.branch-sim">--branch-sim=yes</link></option>.
 182   Expect a further slowdown by a factor of approximately 2.</para>
 183
 184   <para>If the program section you want to profile is somewhere in the
 185   middle of the run, it is beneficial to
 186   <emphasis>fast forward</emphasis> to this section without any
 187   profiling, and then enable profiling.  This is achieved by using
 188   the command line option
 189   <option><link linkend="opt.instr-atstart">--instr-atstart=no</link></option>
 190   and running, in a shell:
 191   <computeroutput>callgrind_control -i on</computeroutput> just before the
 192   interesting code section is executed. To exactly specify
 193   the code position where profiling should start, use the client request
 194   <computeroutput><link linkend="cr.start-instr">CALLGRIND_START_INSTRUMENTATION</link></computeroutput>.</para>
 195
 196   <para>If you want to be able to see assembly code level annotation, specify
 197   <option><link linkend="opt.dump-instr">--dump-instr=yes</link></option>.
 198   This will produce profile data at instruction granularity.
 199   Note that the resulting profile data
 200   can only be viewed with KCachegrind. For assembly annotation, it also is
 201   interesting to see more details of the control flow inside of functions,
 202   i.e. (conditional) jumps. This will be collected by further specifying
 203   <option><link linkend="opt.collect-jumps">--collect-jumps=yes</link></option>.
 204   </para>
 205
 206   </sect2>
 207
 208 </sect1>
 209
 210 <sect1 id="cl-manual.usage" xreflabel="Advanced Usage">
 211 <title>Advanced Usage</title>
 212
 213   <sect2 id="cl-manual.dumps"
 214          xreflabel="Multiple dumps from one program run">
 215   <title>Multiple profiling dumps from one program run</title>
 216
 217   <para>Sometimes you are not interested in characteristics of a full
 218   program run, but only of a small part of it, for example execution of one
 219   algorithm.  If there are multiple algorithms, or one algorithm
 220   running with different input data, it may even be useful to get different
 221   profile information for different parts of a single program run.</para>
 222
 223   <para>Profile data files have names of the form
 224 <screen>
 225 callgrind.out.<emphasis>pid</emphasis>.<emphasis>part</emphasis>-<emphasis>threadID</emphasis>
 226 </screen>
 227   </para>
 228   <para>where <emphasis>pid</emphasis> is the PID of the running
 229   program, <emphasis>part</emphasis> is a number incremented on each
 230   dump (".part" is skipped for the dump at program termination), and
 231   <emphasis>threadID</emphasis> is a thread identification
 232   ("-threadID" is only used if you request dumps of individual
 233   threads with
 234   <option><link linkend="opt.separate-threads">--separate-threads=yes</link></option>).
 235   </para>
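  <para>The naming scheme can be summarized by a small helper.  This is a
  sketch based only on the description above: the helper name
  <computeroutput>dump_file_name</computeroutput> is hypothetical, and the
  exact formatting of the part and thread numbers (for example any zero
  padding) is an assumption.</para>

```python
def dump_file_name(pid, part=None, thread_id=None, base="callgrind.out"):
    """Build a dump file name of the form base.pid[.part][-threadID]."""
    name = f"{base}.{pid}"
    if part is not None:        # skipped for the dump at program termination
        name += f".{part}"
    if thread_id is not None:   # only present with --separate-threads=yes
        name += f"-{thread_id}"
    return name


print(dump_file_name(1234))
print(dump_file_name(1234, part=1))
print(dump_file_name(1234, part=2, thread_id=3))
```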
 236
 237   <para>There are different ways to generate multiple profile dumps
 238   while a program is running under Callgrind's supervision.  Nevertheless,
 239   all methods trigger the same action, which is "dump all profile
 240   information since the last dump or program start, and zero cost
 241   counters afterwards".  To allow for zeroing cost counters without
 242   dumping, there is a second action "zero all cost counters now".
 243   The different methods are:</para>
 244   <itemizedlist>
 245
 246     <listitem>
 247       <para><command>Dump on program termination.</command>
 248       This method is the standard way and doesn't need any special
 249       action on your part.</para>
 250     </listitem>
 251
 252     <listitem>
 253       <para><command>Spontaneous, interactive dumping.</command> Use
 254       <screen>callgrind_control -d [hint [PID/Name]]</screen> to
 255       request the dumping of profile information of the supervised
 256       application with PID or Name.  <emphasis>hint</emphasis> is an
 257       arbitrary string you can optionally specify to later be able to
 258       distinguish profile dumps.  The control program will not terminate
 259       before the dump is completely written.  Note that the application
 260       must be actively running for detection of the dump command. So,
 261       for a GUI application, resize the window, or for a server, send a
 262       request.</para>
 263       <para>If you are using <ulink url="&cl-gui-url;">KCachegrind</ulink>
 264       for browsing of profile information, you can use the toolbar
 265       button <command>Force dump</command>. This will request a dump
 266       and trigger a reload after the dump is written.</para>
 267     </listitem>
 268
 269     <listitem>
 270       <para><command>Periodic dumping after execution of a specified
 271       number of basic blocks</command>. For this, use the command line
 272       option <option><link linkend="opt.dump-every-bb">--dump-every-bb=count</link></option>.
 273       </para>
 274     </listitem>
 275
 276     <listitem>
 277       <para><command>Dumping at enter/leave of specified functions.</command>
 278       Use the
 279       option <option><link linkend="opt.dump-before">--dump-before=function</link></option>
 280       and <option><link linkend="opt.dump-after">--dump-after=function</link></option>.
 281       To zero cost counters before entering a function, use
 282       <option><link linkend="opt.zero-before">--zero-before=function</link></option>.</para>
 283       <para>You can specify these options multiple times for different
 284       functions. Function specifications support wildcards: e.g. use
 285       <option><link linkend="opt.dump-before">--dump-before='foo*'</link></option> to
 286       generate dumps before entering any function starting with
 287       <emphasis>foo</emphasis>.</para>
 288     </listitem>
 289
 290     <listitem>
 291       <para><command>Program controlled dumping.</command>
 292       Insert
 293       <computeroutput><link linkend="cr.dump-stats">CALLGRIND_DUMP_STATS</link>;</computeroutput>
 294       at the position in your code where you want a profile dump to
 295       happen. Use
 296       <computeroutput><link linkend="cr.zero-stats">CALLGRIND_ZERO_STATS</link>;</computeroutput> to only
 297       zero profile counters.
 298       See <xref linkend="cl-manual.clientrequests"/> for more information on
 299       Callgrind specific client requests.</para>
 300     </listitem>
 301   </itemizedlist>
 302
 303   <para>If you are running a multi-threaded application and specify the
 304   command line option
 305   <option><link linkend="opt.separate-threads">--separate-threads=yes</link></option>,
 306   every thread will be profiled on its own and will create its own
 307   profile dump. Thus, the last two methods will only generate one dump
 308   of the currently running thread. With the other methods, you will get
 309   multiple dumps (one for each thread) on a dump request.</para>
 310
 311   </sect2>
 312
 313
 314
 315   <sect2 id="cl-manual.limits"
 316          xreflabel="Limiting range of event collection">
 317   <title>Limiting the range of collected events</title>
 318
 319   <para>By default, whenever events are happening (such as an
 320     instruction execution or cache hit/miss), Callgrind is aggregating
 321     them into event counters. However, you may be interested only in
 322     what is happening within a given function or starting from a given
 323     program phase. To this end, you can disable event aggregation for
 324     uninteresting program parts. While attribution of events to
 325     functions as well as producing separate output per program phase
 326     can be done by other means (see previous section), there are two
 327     benefits to disabling aggregation. First, this is very
 328     fine-grained (e.g. just for a loop within a function).  Second,
 329     disabling event aggregation for complete program phases allows you to
 330     switch off time-consuming cache simulation and allows Callgrind to
 331     progress at much higher speed, with a slowdown of around a factor of 2
 332     (identical to <computeroutput>valgrind
 333     --tool=none</computeroutput>).
 334   </para>
 335
 336   <para>There are two aspects which influence whether Callgrind is
 337     aggregating events at some point in time of program execution.
 338     First, there is the <emphasis>collection state</emphasis>. If this
 339     is off, no aggregation will be done.  By changing the collection
 340     state, you can control event aggregation at a very fine
 341     granularity.  However, there is not much difference in regard to
 342     execution speed of Callgrind.  By default, collection is switched
 343     on, but can be disabled by different means (see below).  Second,
 344     there is the <emphasis>instrumentation mode</emphasis> in which
 345     Callgrind is running. This mode either can be on or off. If
 346     instrumentation is off, no observation of actions in the program
 347     will be done and thus, no actions will be forwarded to the
 348     simulator which could trigger events. In the end, no events will
 349     be aggregated.  The huge benefit is the much higher speed with
 350     instrumentation switched off.  However, this only should be used
 351     with care and in a coarse fashion: every mode change resets the
 352     simulator state (i.e. whether a memory block is cached or not) and
 353     flushes Valgrind's internal cache of instrumented code blocks,
 354     resulting in a latency penalty at switching time. Also, cache
 355     simulator results directly after switching on instrumentation will
 356     be skewed due to identified cache misses which would not happen in
 357     reality (if you care about this warm-up effect, you should make
 358     sure to temporarily have collection state switched off directly
 359     after turning instrumentation mode on). However, switching
 360     instrumentation state is very useful to skip larger program phases
 361     such as an initialization phase. By default, instrumentation is
 362     switched on, but as with the collection state, can be changed by
 363     various means.
 364   </para>
 365
 366   <para>Callgrind can start with instrumentation mode switched off by
 367     specifying option
 368     <option><link linkend="opt.instr-atstart">--instr-atstart=no</link></option>.
 369     Afterwards, instrumentation can be controlled in two ways: first,
 370     interactively with: <screen>callgrind_control -i on</screen> (and
 371     switching off again by specifying "off" instead of "on").  Second,
 372     instrumentation state can be programmatically changed with the
 373     macros <computeroutput><link linkend="cr.start-instr">CALLGRIND_START_INSTRUMENTATION</link>;</computeroutput>
 374     and <computeroutput><link linkend="cr.stop-instr">CALLGRIND_STOP_INSTRUMENTATION</link>;</computeroutput>.
 375   </para>
 376
 377   <para>Similarly, the collection state at program start can be
 378     switched off by
 379     <option><link linkend="opt.collect-atstart">--collect-atstart=no</link></option>.
 380     During execution, it can be controlled programmatically with the
 381     macro <computeroutput>CALLGRIND_TOGGLE_COLLECT;</computeroutput>.
 382     Further, you can limit event collection to a specific function by
 383     using <option><link linkend="opt.toggle-collect">--toggle-collect=function</link></option>.
 384     This will toggle the collection state on entering and leaving the
 385     specified function.  When this option is in effect, the default
 386     collection state at program start is "off".  Only events happening
 387     while running inside of the given function will be
 388     collected. Recursive calls of the given function do not trigger
 389     any action. This option can be given multiple times to specify
 390     different functions of interest.</para>
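  <para>The toggling behavior described for
  <option>--toggle-collect=function</option> can be modeled as a small state
  machine.  The following is a simplified sketch, not Callgrind's
  implementation: the trace format, the function name
  <computeroutput>work</computeroutput> and the event counts are hypothetical,
  and threads are ignored.</para>

```python
def collected_events(trace, toggle_fn):
    """Sum the events seen while inside toggle_fn (collection starts off)."""
    collecting = False
    depth = 0       # nesting depth of toggle_fn, used to ignore recursion
    collected = 0
    for kind, value in trace:
        if kind == "enter" and value == toggle_fn:
            depth += 1
            if depth == 1:              # only the outermost entry toggles
                collecting = not collecting
        elif kind == "leave" and value == toggle_fn:
            if depth == 1:              # only the outermost exit toggles
                collecting = not collecting
            depth -= 1
        elif kind == "events" and collecting:
            collected += value
    return collected


# Hypothetical trace: 5 events before, 12 inside (with one recursive call),
# and 11 after the function of interest.
trace = [
    ("events", 5),
    ("enter", "work"), ("events", 7),
    ("enter", "work"), ("events", 3), ("leave", "work"),
    ("events", 2), ("leave", "work"),
    ("events", 11),
]
print(collected_events(trace, "work"))
```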
 391   </sect2>
 392
 393   <sect2 id="cl-manual.busevents" xreflabel="Counting global bus events">
 394   <title>Counting global bus events</title>
 395
 396   <para>For access to shared data among threads in
 397   multithreaded code, synchronization is required to avoid race conditions.
 398   Synchronization primitives are usually implemented via atomic instructions.
 399   However, excessive use of such instructions can lead to performance
 400   issues.</para>
 401
 402   <para>To enable analysis of this problem, Callgrind optionally can count
 403   the number of atomic instructions executed. More precisely, for x86/x86_64,
 404   these are instructions using a lock prefix. For architectures supporting
 405   LL/SC, these are the number of SC instructions executed. For both, the term
 406   "global bus events" is used.</para>
 407
 408   <para>The short name of the event type used for global bus events is "Ge".
 409   To count global bus events, use
 410   <option><link linkend="clopt.collect-bus">--collect-bus=yes</link></option>.
 411   </para>
 412   </sect2>
 413
 414   <sect2 id="cl-manual.cycles" xreflabel="Avoiding cycles">
 415   <title>Avoiding cycles</title>
 416
 417   <para>Informally speaking, a cycle is a group of functions which
 418   call each other in a recursive way.</para>
 419
 420   <para>Formally speaking, a cycle is a nonempty set S of functions,
 421   such that for every pair of functions F and G in S, it is possible
 422   to call from F to G (possibly via intermediate functions) and also
 423   from G to F.  Furthermore, S must be maximal -- that is, be the
 424   largest set of functions satisfying this property.  For example, if
 425   a third function H is called from inside S and calls back into S,
 426   then H is also part of the cycle and should be included in S.</para>
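  <para>The formal definition corresponds to what graph theory calls a
  strongly connected component of the call graph.  The sketch below finds
  such sets by checking mutual reachability, which is simple but only
  suitable for small graphs; the call graph is hypothetical, and this is not
  necessarily the algorithm used by KCachegrind.</para>

```python
def reachable(graph, start):
    """All functions reachable from start via one or more calls."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(graph.get(fn, ()))
    return seen


def cycles(graph):
    """Maximal sets of functions that can all reach each other."""
    reach = {fn: reachable(graph, fn) for fn in graph}
    found = []
    for fn in graph:
        if fn in reach[fn] and not any(fn in cycle for cycle in found):
            found.append({g for g in reach[fn] if fn in reach.get(g, ())})
    return found


# Hypothetical call graph: F and G call each other, and H is called from G
# and calls back into G, so H belongs to the same cycle.
graph = {"main": ["F"], "F": ["G"], "G": ["F", "H"], "H": ["G"], "leaf": []}
print(cycles(graph))
```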
 427
 428   <para>Recursion is quite usual in programs, and therefore, cycles
 429   sometimes appear in the call graph output of Callgrind. However,
 430   the title of this section should raise two questions: What is bad
 431   about cycles which makes you want to avoid them? And: How can
 432   cycles be avoided without changing program code?</para>
 433
 434   <para>Cycles are not bad in themselves, but they tend to make performance
 435   analysis of your code harder. This is because inclusive costs
 436   for calls inside of a cycle are meaningless. The definition of
 437   inclusive cost, i.e. self cost of a function plus inclusive cost
 438   of its callees, needs a topological order among functions. For
 439   cycles, this does not hold true: callees of a function in a cycle include
 440   the function itself. Therefore, KCachegrind does cycle detection
 441   and skips visualization of any inclusive cost for calls inside
 442   of cycles. Further, all functions in a cycle are collapsed into artificial
 443   functions with names like <computeroutput>Cycle 1</computeroutput>.</para>
 444
 445   <para>Now, when a program exhibits really big cycles (as is
 446   true for some GUI code, or in general for code using an event or callback based
 447   programming style), you lose the ability to pinpoint
 448   the bottlenecks by following call chains from
 449   <function>main</function>, guided via
 450   inclusive cost. In addition, KCachegrind loses its ability to show
 451   interesting parts of the call graph, as it uses inclusive costs to
 452   cut off uninteresting areas.</para>
 453
 454   <para>Despite the meaninglessness of inclusive costs in cycles, the big
 455   drawback for visualization motivates the possibility to temporarily
 456   switch off cycle detection in KCachegrind, which can lead to
 457   misleading visualization. However, often cycles appear because of
 458   unlucky superposition of independent call chains in a way that
 459   the profile result will see a cycle. Neglecting uninteresting
 460   calls with very small measured inclusive cost would break these
 461   cycles. In such cases, incorrect handling of cycles by not detecting
 462   them still gives meaningful profiling visualization.</para>
 463
 464   <para>Note that currently, <command>callgrind_annotate</command>
 465   does not do any cycle detection at all. For program executions with function
 466   recursion, it can, for example, print nonsensical inclusive costs way above 100%.</para>
 467
 468   <para>After describing why cycles are bad for profiling, it is worth
 469   talking about cycle avoidance. The key insight here is that symbols in
 470   the profile data do not have to exactly match the symbols found in the
 471   program. Instead, the symbol name could encode additional information
 472   from the current execution context such as recursion level of the
 473   current function, or even some part of the call chain leading to the
 474   function. While encoding of additional information into symbols is
 475   quite capable of avoiding cycles, it has to be used carefully to not cause
 476   symbol explosion. The latter imposes large memory requirements on Callgrind,
 477   with possible out-of-memory conditions, and produces big profile data files.</para>
 478
 479   <para>A further possibility to avoid cycles in Callgrind's profile data
 480   output is to simply leave out given functions in the call graph. Of course, this
 481   also skips any call information from and to an ignored function, and thus can
 482   break a cycle. Candidates for this typically are dispatcher functions in event
 483   driven code. The option to ignore calls to a function is
 484   <option><link linkend="opt.fn-skip">--fn-skip=function</link></option>.
 485   Aside from possibly breaking cycles, this is used in Callgrind to skip
 486   trampoline functions in the PLT sections
 487   for calls to functions in shared libraries. You can see the difference
 488   if you profile with
 489   <option><link linkend="opt.skip-plt">--skip-plt=no</link></option>.
 490   If a call is ignored, its cost events will be propagated to the
 491   enclosing function.</para>
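  <para>The effect of ignoring a function can be illustrated with a toy
  attribution model.  Everything here is hypothetical (the sample format, the
  function names and the event counts); the sketch only shows the idea that
  costs of a skipped function end up in the enclosing function.</para>

```python
from collections import Counter


def attribute_self_cost(samples, skipped):
    """Charge each sample to the innermost function that is not skipped."""
    totals = Counter()
    for stack, cost in samples:
        visible = [fn for fn in stack if fn not in skipped]
        if visible:
            totals[visible[-1]] += cost
    return dict(totals)


# Hypothetical samples: (call stack from outermost to innermost, events).
samples = [
    (["main"], 5),
    (["main", "dispatch"], 20),
    (["main", "dispatch", "handler"], 40),
]

print(attribute_self_cost(samples, skipped=set()))
print(attribute_self_cost(samples, skipped={"dispatch"}))
```

  <para>With <computeroutput>dispatch</computeroutput> skipped, its 20 events
  are charged to <function>main</function>, and
  <computeroutput>handler</computeroutput> appears as if it were called
  directly from <function>main</function>.</para>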
 492
 493   <para>If you have a recursive function, you can distinguish the first
 494   10 recursion levels by specifying
 495   <option><link linkend="opt.separate-recs-num">--separate-recs10=function</link></option>.
 496   Or for all functions with
 497   <option><link linkend="opt.separate-recs">--separate-recs=10</link></option>,
 498   but this will
 499   give you much bigger profile data files.  In the profile data, you will see
 500   the recursion levels of "func" as the different functions with names
 501   "func", "func'2", "func'3" and so on.</para>
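  <para>The renaming by recursion level can be sketched as follows.  The
  helper name is hypothetical, and folding deeper recursion levels into the
  last distinguished level is an assumption of this sketch rather than
  documented behavior.</para>

```python
def recursion_names(call_stack, levels):
    """Rename each frame by its recursion depth, capped at `levels`."""
    depth = {}
    names = []
    for fn in call_stack:
        depth[fn] = depth.get(fn, 0) + 1
        level = min(depth[fn], levels)
        # The first level keeps the plain name; deeper levels get a suffix.
        names.append(fn if level == 1 else f"{fn}'{level}")
    return names


print(recursion_names(["main", "func", "func", "func", "func"], 3))
```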
 502
 503   <para>If you have call chains "A &gt; B &gt; C" and "A &gt; C &gt; B"
 504   in your program, you usually get a "false" cycle "B &lt;&gt; C". Use
 505   <option><link linkend="opt.separate-callers-num">--separate-callers2=B</link></option>
 506   <option><link linkend="opt.separate-callers-num">--separate-callers2=C</link></option>,
 507   and functions "B" and "C" will be treated as different functions
 508   depending on the direct caller. Using the apostrophe for appending
 509   this "context" to the function name, you get "A &gt; B'A &gt; C'B"
 510   and "A &gt; C'A &gt; B'C", and there will be no cycle. Use
 511   <option><link linkend="opt.separate-callers">--separate-callers=2</link></option> to get a 2-caller
 512   dependency for all functions.  Note that doing this will increase
 513   the size of profile data files.</para>
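  <para>The following sketch reproduces the names from the example above.  It
  assumes that a caller dependency of <emphasis>n</emphasis> means the
  function itself plus up to <emphasis>n</emphasis>-1 callers, which matches
  the names shown for <option>--separate-callers2</option>; the helper name
  and the ordering of callers for larger values of <emphasis>n</emphasis> are
  assumptions.</para>

```python
def contextual_names(chain, n, functions):
    """Append up to n-1 callers to the name of each selected function."""
    names = []
    for i, fn in enumerate(chain):
        if fn in functions:
            callers = chain[max(0, i - (n - 1)):i]
            # Nearest caller first, separated by apostrophes.
            fn = "'".join([fn] + callers[::-1])
        names.append(fn)
    return names


first = contextual_names(["A", "B", "C"], 2, {"B", "C"})
second = contextual_names(["A", "C", "B"], 2, {"B", "C"})
print(" > ".join(first))
print(" > ".join(second))
```

  <para>Because the two chains now share no function name apart from
  <computeroutput>A</computeroutput>, the calls between B and C no longer
  form a cycle in the profile data.</para>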
 514
 515   </sect2>
 516
 517   <sect2 id="cl-manual.forkingprograms" xreflabel="Forking Programs">
 518   <title>Forking Programs</title>
 519
 520   <para>If your program forks, the child will inherit all the profiling
 521   data that has been gathered for the parent. To start with empty profile
 522   counter values in the child, the client request
 523   <computeroutput><link linkend="cr.zero-stats">CALLGRIND_ZERO_STATS</link>;</computeroutput>
 524   can be inserted into code to be executed by the child, directly
 525   after
 526   <computeroutput>fork</computeroutput>.</para>
 527
 528   <para>However, you will have to make sure that the output file format string
 529   (controlled by <option>--callgrind-out-file</option>) contains
 530   <option>%p</option> (which is true by default). Otherwise, the
 531   outputs from the parent and child will overwrite each other or will be
 532   intermingled, which almost certainly is not what you want.</para>
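  <para>Why <option>%p</option> keeps the outputs apart can be seen from a
  small model of the format expansion.  The helper below is hypothetical and
  handles only <option>%p</option> and the
  <computeroutput>%q{VAR}</computeroutput> form described for the core option
  <option>--log-file</option>; the process IDs are made up.</para>

```python
import re


def expand_out_file(fmt, pid, env):
    """Expand %p (process ID) and %q{VAR} (environment variable) in fmt."""
    def replace(match):
        if match.group(0) == "%p":
            return str(pid)
        return env[match.group(1)]

    return re.sub(r"%p|%q\{([^}]*)\}", replace, fmt)


# Parent and child have different process IDs, so the names differ.
parent = expand_out_file("callgrind.out.%p", 4242, {})
child = expand_out_file("callgrind.out.%p", 4243, {})
print(parent, child)

# Without %p, both processes would write to the same file name.
print(expand_out_file("profile.%q{RUN}", 4242, {"RUN": "nightly"}))
```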
 533
 534   <para>You will be able to control the new child independently from
 535   the parent via callgrind_control.</para>
 536
 537   </sect2>
 538
 539 </sect1>
 540
 541
 542 <sect1 id="cl-manual.options" xreflabel="Callgrind Command-line Options">
 543 <title>Callgrind Command-line Options</title>
 544
 545 <para>
 546 In the following, options are grouped into classes.
 547 </para>
 548 <para>
 549 Some options allow the specification of a function/symbol name, such as
 550 <option><link linkend="opt.dump-before">--dump-before=function</link></option>, or
 551 <option><link linkend="opt.fn-skip">--fn-skip=function</link></option>.
 552 All these options can be specified multiple times for different functions.
 553 In addition, the function specifications are actually patterns: they support
 554 the use of wildcards '*' (zero or more arbitrary characters) and '?'
 555 (exactly one arbitrary character), similar to file name globbing in the
 556 shell. This feature is important especially for C++, as without wildcard
 557 usage, the function would have to be specified in full, including
 558 the parameter signature. </para>
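<para>The wildcard semantics can be tried out with Python's
<computeroutput>fnmatch</computeroutput> module, which implements shell-style
globbing.  This is only an approximation for experimenting with patterns:
<computeroutput>fnmatch</computeroutput> also gives special meaning to square
brackets, which goes beyond what is described here, and the function names
are hypothetical.</para>

```python
from fnmatch import fnmatchcase

# Hypothetical symbol names, including a C++ function with its signature.
functions = ["foo", "foo_init", "bar", "ns::Widget::draw(int, int)"]


def matching(pattern):
    """Return the functions whose full name matches the glob pattern."""
    return [fn for fn in functions if fnmatchcase(fn, pattern)]


print(matching("foo*"))               # any function starting with foo
print(matching("ba?"))                # exactly one character after "ba"
print(matching("ns::Widget::draw*"))  # avoids spelling out the signature
```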
 559
 560 <sect2 id="cl-manual.options.creation"
 561        xreflabel="Dump creation options">
 562 <title>Dump creation options</title>
 563
 564 <para>
 565 These options influence the name and format of the profile data files.
 566 </para>
 567
 568 <!-- start of xi:include in the manpage -->
 569 <variablelist id="cl.opts.list.creation">
 570
 571   <varlistentry id="opt.callgrind-out-file" xreflabel="--callgrind-out-file">
 572     <term>
 573       <option><![CDATA[--callgrind-out-file=<file> ]]></option>
 574     </term>
 575     <listitem>
 576       <para>Write the profile data to
 577             <computeroutput>file</computeroutput> rather than to the default
 578             output file,
 579             <computeroutput>callgrind.out.&lt;pid&gt;</computeroutput>.  The
 580             <option>%p</option> and <option>%q</option> format specifiers
 581             can be used to embed the process ID and/or the contents of an
 582             environment variable in the name, as is the case for the core
 583             option
 584             <option><link linkend="opt.log-file">--log-file</link></option>.
 585             When multiple dumps are made, the file name
 586             is modified further; see below.</para>
 587     </listitem>
 588   </varlistentry>
 589
 590   <varlistentry id="opt.dump-line" xreflabel="--dump-line">
 591     <term>
 592       <option><![CDATA[--dump-line=<no|yes> [default: yes] ]]></option>
 593     </term>
 594     <listitem>
 595       <para>This specifies that event counting should be performed at
 596       source line granularity. This allows source annotation for sources
 597       which are compiled with debug information
 598       (<option>-g</option>).</para>
 599   </listitem>
 600   </varlistentry>
 601
 602   <varlistentry id="opt.dump-instr" xreflabel="--dump-instr">
 603     <term>
 604       <option><![CDATA[--dump-instr=<no|yes> [default: no] ]]></option>
 605     </term>
 606     <listitem>
 607       <para>This specifies that event counting should be performed at
 608       per-instruction granularity.
 609       This allows for assembly code
 610       annotation.  Currently the results can only be
 611       displayed by KCachegrind.</para>
 612   </listitem>
 613   </varlistentry>
 614
 615   <varlistentry id="opt.compress-strings" xreflabel="--compress-strings">
 616     <term>
 617       <option><![CDATA[--compress-strings=<no|yes> [default: yes] ]]></option>
 618     </term>
 619     <listitem>
 620       <para>This option influences the output format of the profile data.
 621       It specifies whether strings (file and function names) should be
 622       identified by numbers. This shrinks the file,
 623       but makes it more difficult
 624       for humans to read (which is not recommended in any case).</para>
 625     </listitem>
 626   </varlistentry>
 627
 628   <varlistentry id="opt.compress-pos" xreflabel="--compress-pos">
 629     <term>
 630       <option><![CDATA[--compress-pos=<no|yes> [default: yes] ]]></option>
 631     </term>
 632     <listitem>
 633       <para>This option influences the output format of the profile data.
 634       It specifies whether numerical positions are always written as absolute
 635       values or may be written relative to previous numbers.
 636       Allowing relative positions shrinks the file.</para>
 637     </listitem>
 638   </varlistentry>
 639
 640   <varlistentry id="opt.combine-dumps" xreflabel="--combine-dumps">
 641     <term>
 642       <option><![CDATA[--combine-dumps=<no|yes> [default: no] ]]></option>
 643     </term>
 644     <listitem>
 645       <para>When enabled, multiple profile data parts are appended
 646       to the same output file instead of being written to separate files.
 647       Not recommended.</para>
 648   </listitem>
 649   </varlistentry>
 650
 651 </variablelist>
 652 </sect2>
 653
 654 <sect2 id="cl-manual.options.activity"
 655        xreflabel="Activity options">
 656 <title>Activity options</title>
 657
 658 <para>
 659 These options specify when actions relating to event counts are to
 660 be executed. For interactive control, use <command>callgrind_control</command>.
 661 </para>
 662
 663 <!-- start of xi:include in the manpage -->
 664 <variablelist id="cl.opts.list.activity">
 665
 666   <varlistentry id="opt.dump-every-bb" xreflabel="--dump-every-bb">
 667     <term>
 668       <option><![CDATA[--dump-every-bb=<count> [default: 0, never] ]]></option>
 669     </term>
 670     <listitem>
 671       <para>Dump profile data every <option>count</option> basic blocks.
 672       Whether a dump is needed is only checked when Valgrind's internal
 673       scheduler is run. Therefore, the minimum useful setting is about 100000.
 674       The count is a 64-bit value to make long dump periods possible.
 675       </para>
 676     </listitem>
 677   </varlistentry>
 678
 679   <varlistentry id="opt.dump-before" xreflabel="--dump-before">
 680     <term>
 681       <option><![CDATA[--dump-before=<function> ]]></option>
 682     </term>
 683     <listitem>
 684       <para>Dump when entering <option>function</option>.</para>
 685     </listitem>
 686   </varlistentry>
 687
 688   <varlistentry id="opt.zero-before" xreflabel="--zero-before">
 689     <term>
 690       <option><![CDATA[--zero-before=<function> ]]></option>
 691     </term>
 692     <listitem>
 693       <para>Zero all costs when entering <option>function</option>.</para>
 694     </listitem>
 695   </varlistentry>
 696
 697   <varlistentry id="opt.dump-after" xreflabel="--dump-after">
 698     <term>
 699       <option><![CDATA[--dump-after=<function> ]]></option>
 700     </term>
 701     <listitem>
 702       <para>Dump when leaving <option>function</option>.</para>
 703     </listitem>
 704   </varlistentry>
 705
 706 </variablelist>
 707 <!-- end of xi:include in the manpage -->
 708 </sect2>
 709
 710 <sect2 id="cl-manual.options.collection"
 711        xreflabel="Data collection options">
 712 <title>Data collection options</title>
 713
 714 <para>
 715 These options specify when events are to be aggregated into event counts.
 716 Also see <xref linkend="cl-manual.limits"/>.</para>
 717
 718 <!-- start of xi:include in the manpage -->
 719 <variablelist id="cl.opts.list.collection">
 720
 721   <varlistentry id="opt.instr-atstart" xreflabel="--instr-atstart">
 722     <term>
 723       <option><![CDATA[--instr-atstart=<yes|no> [default: yes] ]]></option>
 724     </term>
 725     <listitem>
 726       <para>Specify whether Callgrind should start simulation and
 727       profiling from the beginning of the program.
 728       When set to <computeroutput>no</computeroutput>,
 729       Callgrind will not be able
 730       to collect any information, including calls, but the slowdown will
 731       be at most a factor of around 4, which is the minimum Valgrind
 732       overhead.  Instrumentation can be interactively enabled via
 733       <computeroutput>callgrind_control -i on</computeroutput>.</para>
 734       <para>Note that the resulting call graph will most probably not
 735       contain <function>main</function>, but will contain all the
 736       functions executed after instrumentation was enabled.
 737       Instrumentation can also be programmatically enabled/disabled. See the
 738       Callgrind include file
 739       <computeroutput>callgrind.h</computeroutput> for the macro
 740       you have to use in your source code.</para> <para>For cache
 741       simulation, results will be less accurate when switching on
 742       instrumentation later in the program run, as the simulator starts
 743       with an empty cache at that moment.  Switch on event collection
 744       later to cope with this error.</para>
 745     </listitem>
 746   </varlistentry>
 747
 748   <varlistentry id="opt.collect-atstart" xreflabel="--collect-atstart">
 749     <term>
 750       <option><![CDATA[--collect-atstart=<yes|no> [default: yes] ]]></option>
 751     </term>
 752     <listitem>
 753       <para>Specify whether event collection is enabled at beginning
 754       of the profile run.</para>
 755       <para>To only look at parts of your program, you have two
 756       possibilities:</para>
 757       <orderedlist>
 758       <listitem>
 759         <para>Zero event counters before entering the program part you
 760         want to profile, and dump the event counters to a file after
 761         leaving that program part.</para>
 762         </listitem>
 763         <listitem>
 764           <para>Switch on/off collection state as needed to only see
 765           event counters happening while inside of the program part you
 766           want to profile.</para>
 767         </listitem>
 768       </orderedlist>
 769       <para>The second option can be used if the program part you want to
 770       profile is called many times. Option 1, i.e. creating a lot of
 771       dumps, is not practical here.</para>
 772       <para>Collection state can be
 773       toggled at entry and exit of a given function with the
 774       option <option><link linkend="opt.toggle-collect">--toggle-collect</link></option>.  If you
 775       use this option, collection
 776       state should be disabled at the beginning.  Note that the
 777       specification of <option>--toggle-collect</option>
 778       implicitly sets
 779       <option>--collect-atstart=no</option>.</para>
 780       <para>Collection state can also be toggled by inserting the client request
 781       <computeroutput>
 782       <!-- commented out because it causes broken links in the man page
 783       <xref linkend="cr.toggle-collect"/>;
 784       -->
 785       CALLGRIND_TOGGLE_COLLECT
 786       ;</computeroutput>
 787       at the needed code positions.</para>
 788     </listitem>
 789   </varlistentry>
 790
 791   <varlistentry id="opt.toggle-collect" xreflabel="--toggle-collect">
 792     <term>
 793       <option><![CDATA[--toggle-collect=<function> ]]></option>
 794     </term>
 795     <listitem>
 796       <para>Toggle collection on entry/exit of <option>function</option>.</para>
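      <para>The toggling can be pictured with a toy model. The sketch
      below is not Callgrind code: the trace format, the
      <computeroutput>collected_cost</computeroutput> helper and the cost
      numbers are made up for illustration, and recursion is assumed not to
      occur. Collection starts switched off and is flipped on every entry
      to and exit from the chosen function.</para>
<programlisting><![CDATA[
```python
def collected_cost(trace, toggle_function):
    """Sum the costs seen while collection is switched on.

    trace is a list of ("enter", name), ("leave", name) or ("cost", n)
    tuples. Collection starts off and is flipped on entry to and exit
    from toggle_function. Toy model: recursion is not handled.
    """
    collecting = False
    total = 0
    for kind, value in trace:
        if kind in ("enter", "leave") and value == toggle_function:
            collecting = not collecting
        elif kind == "cost" and collecting:
            total += value
    return total


# Hypothetical run: main does some setup, calls work() twice, then cleans up.
trace = [
    ("enter", "main"), ("cost", 10),
    ("enter", "work"), ("cost", 50), ("leave", "work"),
    ("cost", 5),
    ("enter", "work"), ("cost", 70), ("leave", "work"),
    ("cost", 20), ("leave", "main"),
]

print(collected_cost(trace, "work"))  # 120: only cost inside work()
print(collected_cost(trace, "main"))  # 155: everything inside main()
```
]]></programlisting>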
 797     </listitem>
 798   </varlistentry>
 799
 800   <varlistentry id="opt.collect-jumps" xreflabel="--collect-jumps">
 801     <term>
 802       <option><![CDATA[--collect-jumps=<no|yes> [default: no] ]]></option>
 803     </term>
 804     <listitem>
 805       <para>This specifies whether information for (conditional) jumps
 806       should be collected.  As with <option>--dump-instr</option>, callgrind_annotate is currently not
 807       able to show you the data.  You have to use KCachegrind to get jump
 808       arrows in the annotated code.</para>
 809     </listitem>
 810   </varlistentry>
 811
 812   <varlistentry id="opt.collect-systime" xreflabel="--collect-systime">
 813     <term>
 814       <option><![CDATA[--collect-systime=<no|yes|msec|usec|nsec> [default: no] ]]></option>
 815     </term>
 816     <listitem>
 817       <para>This specifies whether information for system call times
 818         should be collected.</para>
 819       <para>The value <computeroutput>no</computeroutput> means that
 820         no system call information is recorded.</para>
 821       <para>The other values record the number of system calls
 822         done (sysCount event) and the elapsed time (sysTime event) spent
 823         in system calls.
 824         The <computeroutput>--collect-systime</computeroutput> value gives
 825         the unit used for sysTime: milliseconds, microseconds or
 826         nanoseconds.  With the value <computeroutput>nsec</computeroutput>,
 827         Callgrind also records the CPU time spent during system calls
 828         (sysCpuTime).</para>
 829       <para>The value <computeroutput>yes</computeroutput> is a synonym
 830         of <computeroutput>msec</computeroutput>.
 831         The value <computeroutput>nsec</computeroutput> is not supported
 832         on Darwin.</para>
 833     </listitem>
 834   </varlistentry>
 835
 836   <varlistentry id="clopt.collect-bus" xreflabel="--collect-bus">
 837     <term>
 838       <option><![CDATA[--collect-bus=<no|yes> [default: no] ]]></option>
 839     </term>
 840     <listitem>
 841       <para>This specifies whether the number of global bus events executed
 842       should be collected. The event type "Ge" is used for these events.</para>
 843     </listitem>
 844   </varlistentry>
 845
 846 </variablelist>
 847 <!-- end of xi:include in the manpage -->
 848 </sect2>
 849
 850 <sect2 id="cl-manual.options.separation"
 851        xreflabel="Cost entity separation options">
 852 <title>Cost entity separation options</title>
 853
 854 <para>
 855 These options specify how event counts should be attributed to execution
 856 contexts.
 857 For example, they specify whether the recursion level or the
 858 call chain leading to a function should be taken into account,
 859 and whether the thread ID should be considered.
 860 Also see <xref linkend="cl-manual.cycles"/>.</para>
 861
 862 <!-- start of xi:include in the manpage -->
 863 <variablelist id="cmd-options.separation">
 864
 865   <varlistentry id="opt.separate-threads" xreflabel="--separate-threads">
 866     <term>
 867       <option><![CDATA[--separate-threads=<no|yes> [default: no] ]]></option>
 868     </term>
 869     <listitem>
 870       <para>This option specifies whether profile data should be generated
 871       separately for every thread. If yes, the file names get "-threadID"
 872       appended.</para>
 873     </listitem>
 874   </varlistentry>
 875
 876   <varlistentry id="opt.separate-callers" xreflabel="--separate-callers">
 877     <term>
 878       <option><![CDATA[--separate-callers=<callers> [default: 0] ]]></option>
 879     </term>
 880     <listitem>
 881       <para>Separate contexts by at most &lt;callers&gt; functions in the
 882       call chain. See <xref linkend="cl-manual.cycles"/>.</para>
 883     </listitem>
 884   </varlistentry>
 885
 886   <varlistentry id="opt.separate-callers-num" xreflabel="--separate-callers2">
 887     <term>
 888       <option><![CDATA[--separate-callers<number>=<function> ]]></option>
 889     </term>
 890     <listitem>
 891       <para>Separate <option>number</option> callers for <option>function</option>.
 892       See <xref linkend="cl-manual.cycles"/>.</para>
 893     </listitem>
 894   </varlistentry>
 895
 896   <varlistentry id="opt.separate-recs" xreflabel="--separate-recs">
 897     <term>
 898       <option><![CDATA[--separate-recs=<level> [default: 2] ]]></option>
 899     </term>
 900     <listitem>
 901       <para>Separate function recursions by at most <option>level</option> levels.
 902       See <xref linkend="cl-manual.cycles"/>.</para>
 903     </listitem>
 904   </varlistentry>
 905
 906   <varlistentry id="opt.separate-recs-num" xreflabel="--separate-recs10">
 907     <term>
 908       <option><![CDATA[--separate-recs<number>=<function> ]]></option>
 909     </term>
 910     <listitem>
 911       <para>Separate <option>number</option> recursions for <option>function</option>.
 912       See <xref linkend="cl-manual.cycles"/>.</para>
 913     </listitem>
 914   </varlistentry>
 915
 916   <varlistentry id="opt.skip-plt" xreflabel="--skip-plt">
 917     <term>
 918       <option><![CDATA[--skip-plt=<no|yes> [default: yes] ]]></option>
 919     </term>
 920     <listitem>
 921       <para>Ignore calls to/from PLT sections.</para>
 922     </listitem>
 923   </varlistentry>
 924
 925   <varlistentry id="opt.skip-direct-rec" xreflabel="--skip-direct-rec">
 926     <term>
 927       <option><![CDATA[--skip-direct-rec=<no|yes> [default: yes] ]]></option>
 928     </term>
 929     <listitem>
 930       <para>Ignore direct recursions.</para>
 931     </listitem>
 932   </varlistentry>
 933
 934   <varlistentry id="opt.fn-skip" xreflabel="--fn-skip">
 935     <term>
 936       <option><![CDATA[--fn-skip=<function> ]]></option>
 937     </term>
 938     <listitem>
 939       <para>Ignore calls to/from a given function.  E.g. if you have a
 940       call chain A &gt; B &gt; C, and you specify function B to be
 941       ignored, you will only see A &gt; C.</para>
 942       <para>This is very convenient for skipping functions that handle callback
 943       behavior.  For example, with the signal/slot mechanism in the
 944       Qt graphics library, you only want
 945       to see the function emitting a signal call the slots connected
 946       to that signal. First, determine the real call chain to find the
 947       functions that need to be skipped, then use this option.</para>
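      <para>The A, B, C example can be written down as a minimal sketch.
      The <computeroutput>visible_chain</computeroutput> helper is
      hypothetical and only models which functions remain visible in the
      call chain; it says nothing about how Callgrind attributes the cost
      of the skipped function.</para>
<programlisting><![CDATA[
```python
def visible_chain(chain, skipped):
    """Drop skipped functions from a call chain, keeping the call order."""
    return [fn for fn in chain if fn not in skipped]


real_chain = ["A", "B", "C"]
print(" > ".join(real_chain))                        # A > B > C
print(" > ".join(visible_chain(real_chain, {"B"})))  # A > C
```
]]></programlisting>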
 948     </listitem>
 949   </varlistentry>
 950
 951 <!--
 952     commenting out as it is only enabled with CLG_EXPERIMENTAL.  (Nb: I had to
 953     insert a space between the double dash to avoid XML comment problems.)
 954
 955   <varlistentry id="opt.fn-group">
 956     <term>
 957       <option><![CDATA[- -fn-group<number>=<function> ]]></option>
 958     </term>
 959     <listitem>
 960       <para>Put a function into a separate group. This influences the
 961       context name for cycle avoidance. All functions inside such a
 962       group are treated as being the same for context name building, which
 963       resembles the call chain leading to a context. By specifying function
 964       groups with this option, you can shorten the context name, as functions
 965       in the same group will not appear in sequence in the name. </para>
 966     </listitem>
 967   </varlistentry>
 968 -->
 969
 970 </variablelist>
 971 <!-- end of xi:include in the manpage -->
 972 </sect2>
 973
 974
 975 <sect2 id="cl-manual.options.simulation"
 976        xreflabel="Simulation options">
 977 <title>Simulation options</title>
 978
 979 <!-- start of xi:include in the manpage -->
 980 <variablelist id="cl.opts.list.simulation">
 981
 982   <varlistentry id="clopt.cache-sim" xreflabel="--cache-sim">
 983     <term>
 984       <option><![CDATA[--cache-sim=<yes|no> [default: no] ]]></option>
 985     </term>
 986     <listitem>
 987       <para>Specify if you want to do full cache simulation.  By default,
 988       only instruction read accesses will be counted ("Ir").
 989       With cache simulation, further event counters are enabled:
 990       Cache misses on instruction reads ("I1mr"/"ILmr"),
 991       data read accesses ("Dr") and related cache misses ("D1mr"/"DLmr"),
 992       data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw").
 993       For more information, see <xref linkend="&vg-cg-manual-id;"/>.
 994       </para>
 995     </listitem>
 996   </varlistentry>
 997
 998   <varlistentry id="clopt.branch-sim" xreflabel="--branch-sim">
 999     <term>
1000       <option><![CDATA[--branch-sim=<yes|no> [default: no] ]]></option>
1001     </term>
1002     <listitem>
1003       <para>Specify if you want to do branch prediction simulation.
1004       Further event counters are enabled: Number of executed conditional
1005       branches and related predictor misses ("Bc"/"Bcm"), executed indirect
1006       jumps and related misses of the jump address predictor ("Bi"/"Bim").
1007       </para>
1008     </listitem>
1009   </varlistentry>
1010
1011 </variablelist>
1012 <!-- end of xi:include in the manpage -->
1013 </sect2>
1014
1015
1016 <sect2 id="cl-manual.options.cachesimulation"
1017        xreflabel="Cache simulation options">
1018 <title>Cache simulation options</title>
1019
1020 <!-- start of xi:include in the manpage -->
1021 <variablelist id="cl.opts.list.cachesimulation">
1022
1023   <varlistentry id="opt.simulate-wb" xreflabel="--simulate-wb">
1024     <term>
1025       <option><![CDATA[--simulate-wb=<yes|no> [default: no] ]]></option>
1026     </term>
1027     <listitem>
1028       <para>Specify whether write-back behavior should be simulated, which
1029       makes it possible to distinguish LL cache misses with and without write-backs.
1030       The cache model of Cachegrind/Callgrind does not specify write-through
1031       vs. write-back behavior, and this is also not relevant for the number
1032       of misses counted. However, with explicit write-back simulation
1033       it can be decided whether a miss triggers not only the loading of a new
1034       cache line, but also a write-back of a dirty cache line that had to take
1035       place first. The new dirty miss events are ILdmr, DLdmr, and DLdmw,
1036       for misses caused by instruction reads, data reads, and data writes,
1037       respectively. As they produce two memory transactions, they should
1038       be given twice the estimated time of a normal miss.
1039       </para>
1040     </listitem>
1041   </varlistentry>
1042
1043   <varlistentry id="opt.simulate-hwpref" xreflabel="--simulate-hwpref">
1044     <term>
1045       <option><![CDATA[--simulate-hwpref=<yes|no> [default: no] ]]></option>
1046     </term>
1047     <listitem>
1048       <para>Specify whether to add simulation of a hardware prefetcher
1049       that is able to detect stream access in the second-level cache
1050       by comparing accesses separately for each page.
1051       As the simulation cannot model the timing of prefetching,
1052       it is assumed that any hardware prefetch triggered completes before a
1053       real access is done. Thus, this gives a best-case scenario by covering
1054       all possible stream accesses.</para>
1055     </listitem>
1056   </varlistentry>
1057
1058   <varlistentry id="opt.cacheuse" xreflabel="--cacheuse">
1059     <term>
1060       <option><![CDATA[--cacheuse=<yes|no> [default: no] ]]></option>
1061     </term>
1062     <listitem>
1063       <para>Specify whether cache line use should be collected. For every
1064       cache line, from loading to eviction, the number of accesses
1065       as well as the number of actually used bytes is determined. This
1066       behavior is attributed to the code which triggered loading of the cache
1067       line. In contrast to miss counters, which show the position where
1068       the symptoms of bad cache behavior (i.e. latencies) happen, the
1069       use counters try to pinpoint the cause (i.e. the code with the
1070       bad access behavior). The new counters are defined in a way such
1071       that worse behavior results in higher cost.
1072       AcCost1 and AcCost2 are counters showing bad temporal locality
1073       for L1 and LL caches, respectively. This is done by summing up
1074       reciprocal values of the numbers of accesses of each cache line,
1075       multiplied by 1000 (as only integer costs are allowed). E.g. for
1076       a given source line with 5 read accesses, a value of 5000 AcCost
1077       means that for every access, a new cache line was loaded and directly
1078       evicted afterwards without further accesses. Similarly, SpLoss1/2
1079       show bad spatial locality for L1 and LL caches, respectively. They
1080       give the <emphasis>spatial loss</emphasis> count of bytes which
1081       were loaded into cache but never accessed. This pinpoints code
1082       accessing data in a way such that cache space is wasted, which hints
1083       at bad layout of data structures in memory. Assuming a cache line
1084       size of 64 bytes and 100 L1 misses for a given source line, the
1085       loading of 6400 bytes into L1 was triggered. If SpLoss1 shows a
1086       value of 3200 for this line, this means that half of the loaded data was
1087       never used, or, with a better data layout, only half of the cache
1088       space would have been needed.
1089       Please note that for cache line use counters, it currently is
1090       not possible to provide meaningful inclusive costs. Therefore,
1091       inclusive cost of these counters should be ignored.
1092       </para>
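      <para>The two worked examples above can be checked with a few lines
      of arithmetic. This is only a sketch of the description given here,
      not of Callgrind's implementation: the helper names are hypothetical,
      and rounding by integer division is an assumption.</para>
<programlisting><![CDATA[
```python
def ac_cost(accesses_per_line):
    """Sum 1000/accesses over the loaded cache lines, as integer costs.

    accesses_per_line holds, for each cache line loaded, the number of
    accesses it received before eviction. Integer division is assumed.
    """
    return sum(1000 // n for n in accesses_per_line)


def spatial_loss_fraction(misses, sp_loss, line_size=64):
    """Fraction of the bytes loaded into the cache that were never accessed."""
    loaded_bytes = misses * line_size
    return sp_loss / loaded_bytes


# Five accesses, each to a line that is evicted before it is used again.
print(ac_cost([1, 1, 1, 1, 1]))          # 5000
# The same five accesses all served by a single loaded cache line.
print(ac_cost([5]))                      # 200
# 100 L1 misses with 64-byte lines load 6400 bytes; 3200 were never used.
print(spatial_loss_fraction(100, 3200))  # 0.5
```
]]></programlisting>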
1093     </listitem>
1094   </varlistentry>
1095
1096   <varlistentry id="cl.opt.I1" xreflabel="--I1">
1097     <term>
1098       <option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
1099     </term>
1100     <listitem>
1101       <para>Specify the size, associativity and line size of the level 1
1102       instruction cache.  </para>
1103     </listitem>
1104   </varlistentry>
1105
1106   <varlistentry id="cl.opt.D1" xreflabel="--D1">
1107     <term>
1108       <option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
1109     </term>
1110     <listitem>
1111       <para>Specify the size, associativity and line size of the level 1
1112       data cache.</para>
1113     </listitem>
1114   </varlistentry>
1115
1116   <varlistentry id="cl.opt.LL" xreflabel="--LL">
1117     <term>
1118       <option><![CDATA[--LL=<size>,<associativity>,<line size> ]]></option>
1119     </term>
1120     <listitem>
1121       <para>Specify the size, associativity and line size of the last-level
1122       cache.</para>
1123     </listitem>
1124   </varlistentry>
1125 </variablelist>
1126 <!-- end of xi:include in the manpage -->
1127
1128 </sect2>
1129
1130 </sect1>
1131
1132 <sect1 id="cl-manual.monitor-commands" xreflabel="Callgrind Monitor Commands">
1133 <title>Callgrind Monitor Commands</title>
1134 <para>The Callgrind tool provides monitor commands handled by the Valgrind
1135 gdbserver (see <xref linkend="manual-core-adv.gdbserver-commandhandling"/>).
1136 Valgrind Python code provides GDB front end commands that make the callgrind
1137 monitor commands easier to use (see
1138 <xref linkend="manual-core-adv.gdbserver-gdbmonitorfrontend"/>).  To launch a
1139 callgrind monitor command via its GDB front end command, instead of prefixing
1140 the command with "monitor", you must use the GDB <varname>callgrind</varname>
1141 command (or the shorter alias <varname>cg</varname>).  Using the callgrind GDB
1142 front end command provides more flexible usage, such as auto-completion of the
1143 command by GDB. In GDB, you can use <varname>help callgrind</varname> to get
1144 help about the callgrind front end monitor commands and you can
1145 use <varname>apropos callgrind</varname> to get all the commands mentioning the
1146 word "callgrind" in their name or on-line help.
1147 </para>
1148
1149 <itemizedlist>
1150   <listitem>
1151     <para><varname>dump [&lt;dump_hint&gt;]</varname> requests to dump the
1152     profile data. </para>
1153   </listitem>
1154
1155   <listitem>
1156     <para><varname>zero</varname> requests to zero the profile data
1157     counters. </para>
1158   </listitem>
1159
1160   <listitem>
1161     <para><varname>instrumentation [on|off]</varname> requests to set
1162     (if parameter on/off is given) or get the current instrumentation state.
1163     </para>
1164   </listitem>
1165
1166   <listitem>
1167     <para><varname>status</varname> requests to print out some status
1168     information.</para>
1169   </listitem>
1170
1171 </itemizedlist>
1172 </sect1>
1173
1174 <sect1 id="cl-manual.clientrequests" xreflabel="Client request reference">
1175 <title>Callgrind specific client requests</title>
1176
1177 <para>Callgrind provides the following specific client requests in
1178 <filename>callgrind.h</filename>.  See that file for the exact details of
1179 their arguments.</para>
1180
1181 <variablelist id="cl.clientrequests.list">
1182
1183   <varlistentry id="cr.dump-stats" xreflabel="CALLGRIND_DUMP_STATS">
1184     <term>
1185       <computeroutput>CALLGRIND_DUMP_STATS</computeroutput>
1186     </term>
1187     <listitem>
1188       <para>Force generation of a profile dump at the specified position
1189       in the code, for the current thread only. Written counters will be reset
1190       to zero.</para>
1191     </listitem>
1192   </varlistentry>
1193
1194   <varlistentry id="cr.dump-stats-at" xreflabel="CALLGRIND_DUMP_STATS_AT">
1195     <term>
1196       <computeroutput>CALLGRIND_DUMP_STATS_AT(string)</computeroutput>
1197     </term>
1198     <listitem>
1199       <para>Same as <computeroutput>CALLGRIND_DUMP_STATS</computeroutput>,
1200       but lets you specify a string that can be used to distinguish profile
1201       dumps.</para>
1202     </listitem>
1203   </varlistentry>
1204
1205   <varlistentry id="cr.zero-stats" xreflabel="CALLGRIND_ZERO_STATS">
1206     <term>
1207       <computeroutput>CALLGRIND_ZERO_STATS</computeroutput>
1208     </term>
1209     <listitem>
1210       <para>Reset the profile counters for the current thread to zero.</para>
1211     </listitem>
1212   </varlistentry>
1213
1214   <varlistentry id="cr.toggle-collect" xreflabel="CALLGRIND_TOGGLE_COLLECT">
1215     <term>
1216       <computeroutput>CALLGRIND_TOGGLE_COLLECT</computeroutput>
1217     </term>
1218     <listitem>
1219       <para>Toggle the collection state. This makes it possible to exclude events
1220       from the profile counters. See also options
1221       <option><link linkend="opt.collect-atstart">--collect-atstart</link></option>
1222       and
1223       <option><link linkend="opt.toggle-collect">--toggle-collect</link></option>.</para>
1224     </listitem>
1225   </varlistentry>
1226
1227   <varlistentry id="cr.start-instr" xreflabel="CALLGRIND_START_INSTRUMENTATION">
1228     <term>
1229       <computeroutput>CALLGRIND_START_INSTRUMENTATION</computeroutput>
1230     </term>
1231     <listitem>
1232       <para>Start full Callgrind instrumentation if not already enabled.
1233       When cache simulation is done, this will flush the simulated cache
1234       and lead to an artificial cache warmup phase afterwards with
1235       cache misses which would not have happened in reality.  See also
1236       option
1237       <option><link linkend="opt.instr-atstart">--instr-atstart</link></option>.
1238       </para>
1239     </listitem>
1240   </varlistentry>
1241
1242   <varlistentry id="cr.stop-instr" xreflabel="CALLGRIND_STOP_INSTRUMENTATION">
1243     <term>
1244       <computeroutput>CALLGRIND_STOP_INSTRUMENTATION</computeroutput>
1245     </term>
1246     <listitem>
1247       <para>Stop full Callgrind instrumentation if not already disabled.
1248       This flushes Valgrind's translation cache, and does no additional
1249       instrumentation afterwards: it will effectively run at the same
1250       speed as Nulgrind, i.e. at minimal slowdown. Use this to
1251       speed up the Callgrind run for uninteresting code parts. Use
1252       <computeroutput><link linkend="cr.start-instr">CALLGRIND_START_INSTRUMENTATION</link></computeroutput>
1253       to enable instrumentation again.  See also option
1254       <option><link linkend="opt.instr-atstart">--instr-atstart</link></option>.
1255       </para>
1256     </listitem>
1257   </varlistentry>
1258
1259 </variablelist>
1260
1261 </sect1>
1262
1263
1264
1265 <sect1 id="cl-manual.callgrind_annotate-options" xreflabel="callgrind_annotate Command-line Options">
1266 <title>callgrind_annotate Command-line Options</title>
1267
1268 <!-- start of xi:include in the manpage -->
1269 <variablelist id="callgrind_annotate.opts.list">
1270
1271   <varlistentry>
1272     <term><option>-h --help</option></term>
1273     <listitem>
1274       <para>Show summary of options.</para>
1275     </listitem>
1276   </varlistentry>
1277
1278   <varlistentry>
1279     <term><option>--version</option></term>
1280     <listitem>
1281       <para>Show version of callgrind_annotate.</para>
1282     </listitem>
1283   </varlistentry>
1284
1285   <varlistentry>
1286     <term>
1287       <option>--show=A,B,C [default: all]</option>
1288     </term>
1289     <listitem>
1290       <para>Only show figures for events A,B,C.</para>
1291     </listitem>
1292   </varlistentry>
1293
1294   <varlistentry>
1295     <term>
1296       <option><![CDATA[--threshold=<0--100> [default: 99%] ]]></option>
1297     </term>
1298     <listitem>
1299       <para>Percentage of counts (of primary sort event) we are
1300         interested in.</para>
1301       <para>callgrind_annotate stops printing functions when the sum
1302         of the cost percentages of the printed functions is greater than or equal
1303         to the given threshold percentage.</para>
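      <para>The stopping rule can be sketched as follows. This is an
      illustration of the rule as stated here, not the code of
      callgrind_annotate: the
      <computeroutput>functions_to_print</computeroutput> helper and the
      cost numbers are hypothetical.</para>
<programlisting><![CDATA[
```python
def functions_to_print(costs, threshold=99.0):
    """Return the functions listed before the cumulative cost share
    reaches the threshold percentage (costs maps name to event count)."""
    total = sum(costs.values())
    if total == 0:
        return []
    printed, covered = [], 0
    ranked = sorted(costs.items(), key=lambda item: item[1], reverse=True)
    for name, cost in ranked:
        printed.append(name)
        covered += cost
        # Stop once the printed functions cover the requested share.
        if 100.0 * covered / total >= threshold:
            break
    return printed


# Hypothetical event counts per function for the primary sort event.
costs = {"compute": 700, "parse": 200, "init": 90, "log": 10}
print(functions_to_print(costs))        # ['compute', 'parse', 'init']
print(functions_to_print(costs, 90.0))  # ['compute', 'parse']
```
]]></programlisting>
      <para>With the default threshold, the cheapest function in this
      example is not listed because the first three already account
      for 99% of the cost.</para>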
1304     </listitem>
1305   </varlistentry>
1306
1307   <varlistentry>
1308     <term>
1309       <option>--sort=A,B,C</option>
1310     </term>
1311     <listitem>
1312       <para>Sort columns by events A,B,C [event column order].</para>
1313       <para>Optionally, each event is followed by a : and a threshold,
1314         to specify different thresholds depending on the event.</para>
1315       <para>callgrind_annotate stops printing functions when the sum
1316         of the cost percentages of the printed functions for all the events
1317         is greater than or equal to the given event threshold percentages.</para>
1318       <para>When one or more thresholds are given via this option,
1319         the value of <option>--threshold</option> is ignored.</para>
1320     </listitem>
1321   </varlistentry>
1322
1323   <varlistentry>
1324     <term>
1325       <option><![CDATA[--show-percs=<no|yes> [default: no] ]]></option>
1326     </term>
1327     <listitem>
1328       <para>When enabled, a percentage is printed next to all event counts.
1329       This helps gauge the relative importance of each function and line.
1330       </para>
1331     </listitem>
1332   </varlistentry>
1333
1334   <varlistentry>
1335     <term>
1336       <option><![CDATA[--auto=<yes|no> [default: yes] ]]></option>
1337     </term>
1338     <listitem>
1339       <para>Annotate all source files containing functions that helped
1340       reach the event count threshold.</para>
1341     </listitem>
1342   </varlistentry>
1343
1344   <varlistentry>
1345     <term>
1346       <option>--context=N [default: 8] </option>
1347     </term>
1348     <listitem>
1349       <para>Print N lines of context before and after annotated
1350       lines.</para>
1351     </listitem>
1352   </varlistentry>
1353
1354   <varlistentry>
1355     <term>
1356       <option><![CDATA[--inclusive=<yes|no> [default: no] ]]></option>
1357     </term>
1358     <listitem>
1359       <para>Add subroutine costs to function calls.</para>
1360     </listitem>
1361   </varlistentry>
1362
1363   <varlistentry>
1364     <term>
1365       <option><![CDATA[--tree=<none|caller|calling|both> [default: none] ]]></option>
1366     </term>
1367     <listitem>
1368       <para>For each function, print its callers, the functions it
1369       calls, or both.</para>
1370     </listitem>
1371   </varlistentry>
1372
1373   <varlistentry>
1374     <term>
1375       <option><![CDATA[-I, --include=<dir> ]]></option>
1376     </term>
1377     <listitem>
1378       <para>Add <option>dir</option> to the list of directories to search
1379       for source files.</para>
1380   </listitem>
1381   </varlistentry>
1382
1383 </variablelist>
1384 <!-- end of xi:include in the manpage -->
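
<para>As a rough illustration of how the options above combine, the
invocations below limit the listing to the functions that account for 90%
of the cost, print inclusive costs together with callers and callees, and
annotate source files with a reduced amount of context. The profile file
name <computeroutput>callgrind.out.12345</computeroutput> and the
directory <computeroutput>src</computeroutput> are placeholders chosen for
this sketch; substitute the file written by your own Callgrind run and
the directory holding your sources.</para>

<screen><![CDATA[
# Placeholder file name: use the profile written by your own run.
callgrind_annotate --threshold=90 --show-percs=yes callgrind.out.12345

# Inclusive costs, with callers and called functions for each function.
callgrind_annotate --inclusive=yes --tree=both callgrind.out.12345

# Source annotation with 3 lines of context; "src" is a placeholder directory.
callgrind_annotate --auto=yes --context=3 --include=src callgrind.out.12345
]]></screen>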
1385
1386
1387 </sect1>
1388
1389
1390
1391
1392 <sect1 id="cl-manual.callgrind_control-options" xreflabel="callgrind_control Command-line Options">
1393 <title>callgrind_control Command-line Options</title>
1394
1395 <para>By default, callgrind_control acts on all programs run by the
1396   current user under Callgrind.  It is possible to limit the actions to
1397   specified Callgrind runs by providing a list of pids or program names as
1398   arguments.  The default action is to give some brief information about the
1399   applications being run under Callgrind.</para>
1400
1401 <!-- start of xi:include in the manpage -->
1402 <variablelist id="callgrind_control.opts.list">
1403
1404   <varlistentry>
1405     <term><option>-h --help</option></term>
1406     <listitem>
1407       <para>Show a short description, usage, and summary of options.</para>
1408     </listitem>
1409   </varlistentry>
1410
1411   <varlistentry>
1412     <term><option>--version</option></term>
1413     <listitem>
1414       <para>Show version of callgrind_control.</para>
1415     </listitem>
1416   </varlistentry>
1417
1418   <varlistentry>
1419     <term><option>-l --long</option></term>
1420     <listitem>
1421       <para>Also show the working directory, in addition to the brief
1422       information given by default.
1423       </para>
1424     </listitem>
1425   </varlistentry>
1426
1427   <varlistentry>
1428     <term><option>-s --stat</option></term>
1429     <listitem>
1430       <para>Show statistics about active Callgrind runs.</para>
1431     </listitem>
1432   </varlistentry>
1433
1434   <varlistentry>
1435     <term><option>-b --back</option></term>
1436     <listitem>
1437       <para>Show stack/back traces of each thread in active Callgrind runs. For
1438       each active function in the stack trace, the number of invocations
1439       since program start (or the last dump) is also shown. This option can be
1440       combined with -e to show the inclusive cost of active functions.</para>
1441     </listitem>
1442   </varlistentry>
1443
1444   <varlistentry>
1445     <term><option><![CDATA[-e [A,B,...] ]]></option> (default: all)</term>
1446     <listitem>
1447       <para>Show the current per-thread, exclusive cost values of event
1448       counters. If no explicit event names are given, figures for all event
1449       types which are collected in the given Callgrind run are
1450       shown. Otherwise, only figures for event types A, B, ... are shown. If
1451       this option is combined with -b, inclusive cost for the functions of
1452       each active stack frame is provided, too.
1453       </para>
1454     </listitem>
1455   </varlistentry>
1456
1457   <varlistentry>
1458     <term><option><![CDATA[--dump[=<desc>] ]]></option> (default: no description)</term>
1459     <listitem>
1460       <para>Request the dumping of profile information. Optionally, a
1461       description can be specified; it is written into the dump as part of
1462       the information describing what triggered the dump. This
1463       can be used to distinguish multiple dumps.</para>
1464     </listitem>
1465   </varlistentry>
1466
1467   <varlistentry>
1468     <term><option>-z --zero</option></term>
1469     <listitem>
1470       <para>Zero all event counters.</para>
1471     </listitem>
1472   </varlistentry>
1473
1474   <varlistentry>
1475     <term><option>-k --kill</option></term>
1476     <listitem>
1477       <para>Force a Callgrind run to be terminated.</para>
1478     </listitem>
1479   </varlistentry>
1480
1481   <varlistentry>
1482     <term><option><![CDATA[--instr=<on|off>]]></option></term>
1483     <listitem>
1484       <para>Switch instrumentation mode on or off. If a Callgrind run has
1485       instrumentation disabled, no simulation is done and no events are
1486       counted. This is useful to skip uninteresting program parts, as there
1487       is much less slowdown (same as with the Valgrind tool "none"). See also
1488       the Callgrind option <option>--instr-atstart</option>.</para>
1489     </listitem>
1490   </varlistentry>
1491
1492   <varlistentry>
1493     <term><option><![CDATA[--vgdb-prefix=<prefix>]]></option></term>
1494     <listitem>
1495       <para>Specify the vgdb prefix to be used by callgrind_control.
1496       callgrind_control internally uses vgdb to find and control the active
1497       Callgrind runs. If the <option>--vgdb-prefix</option> option was used
1498       for launching valgrind, then the same option must be given to
1499       callgrind_control.</para>
1500     </listitem>
1501   </varlistentry>
1502 </variablelist>
1503 <!-- end of xi:include in the manpage -->
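
<para>One possible interactive session using the options above is
sketched here. It assumes a program started with instrumentation switched
off and controlled from a second terminal. The program name
<computeroutput>./myprog</computeroutput> and the dump description
<computeroutput>after-init</computeroutput> are hypothetical, and since no
pid or program name is given, each callgrind_control command acts on all
of the current user's Callgrind runs.</para>

<screen><![CDATA[
# Terminal 1: start the (hypothetical) program with instrumentation off.
valgrind --tool=callgrind --instr-atstart=no ./myprog

# Terminal 2: control the run while it executes.
callgrind_control --instr=on          # start counting events
callgrind_control -e -b               # event counters plus back traces
callgrind_control --dump=after-init   # request a dump with a description
callgrind_control -z                  # zero all event counters
callgrind_control --instr=off         # stop counting events again
]]></screen>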
1504
1505 </sect1>
1506
1507 </chapter>