Note, 11 May 2009. The XML format evolved over several versions,
as expected. This file describes 3 different versions of the
format (called Protocols 1, 2 and 3 respectively). As of 11 May 09
a fourth version, Protocol 4, was defined, and that is described
in xml-output-protocol4.txt.

The original May 2005 introduction follows. These comments are
correct up to and including Protocol 3, which was used in the Valgrind
3.4.x series. However, there were some more significant changes in
the format and the required flags for Valgrind, in Protocol 4.

----------------------

As of May 2005, Valgrind can produce its output in XML form. The
intention is to provide an easily parsed, stable format which is
suitable for GUIs to read.

* Produce XML output which is easily parsed.

* Have a stable output format which does not change much over time, so
  that investments in parser-writing by GUI developers are not lost as
  new versions of Valgrind appear.

* Have an extensible output format, so that future changes to the
  format do not break backwards compatibility with existing parsers of
  it.

* Produce output in a form which is suitable for both offline GUIs (run
  all the way to the end, then examine output) and interactive GUIs
  (parse XML incrementally, update display as we go).

* Put as much information as possible into the XML and let the GUIs
  decide what to show the user (a.k.a. provide mechanism, not policy).

* Make XML which is actually parseable by standard XML tools.

Run with flag --xml=yes. That's all. Note however several
caveats.

* At the present time only Memcheck is supported. The scheme extends
  easily enough to cover Helgrind if needed.

* When XML output is selected, various other settings are made.
  This is in order that the output format is more controlled.
  The settings which are changed are:

  - Suppression generation is disabled, as that would require user
    input.

  - Attaching to GDB is disabled for the same reason.

  - The verbosity level is set to 1 (-v).

  - Error limits are disabled. Usually if the program generates a lot
    of errors, Valgrind slows down and eventually stops collecting
    them. When outputting XML this is not the case.

  - VEX emulation warnings are not shown.

  - File descriptor leak checking is disabled. This could be
    re-enabled at some future point.

  - Maximum-detail leak checking is selected (--leak-check=full).
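
As a rough illustration of the invocation described above (Protocols 1
to 3 only), the following sketch assembles a command line. The flags
--xml=yes and --xml-user-comment are the ones named in this file; the
helper name build_valgrind_argv and the example program path are
hypothetical, and the sketch only builds the argument list rather than
running anything.

```python
from typing import List, Optional


def build_valgrind_argv(program: str,
                        program_args: Optional[List[str]] = None,
                        user_comment: Optional[str] = None) -> List[str]:
    """Assemble an argv list asking Valgrind for XML output.

    Hypothetical helper: it does not launch Valgrind, it only builds
    the list that could be handed to subprocess.run().
    """
    argv = ["valgrind", "--xml=yes"]
    if user_comment is not None:
        # The comment is passed through unescaped, as described in this file.
        argv.append("--xml-user-comment=" + user_comment)
    argv.append(program)
    argv.extend(program_args or [])
    return argv
```

For example, build_valgrind_argv("./a.out", ["-n", "3"]) gives a list
starting with "valgrind" and "--xml=yes", followed by the program and
its own arguments.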

For the most part this should be self-descriptive. It is printed in a
sort-of human-readable way for easy understanding. You may want to
read the rest of this together with the results of "valgrind --xml=yes
memcheck/tests/xml1" as an example.

All tags are balanced: a <foo> tag is always closed by </foo>. Hence
in the description that follows, mention of a tag <foo> implicitly
means there is a matching closing tag </foo>.

Symbols in CAPITALS are nonterminals in the grammar and are defined
somewhere below. The root nonterminal is TOPLEVEL.

The following nonterminals are not described further:
   INT   is a 64-bit signed decimal integer.
   TEXT  is arbitrary text.
   HEX64 is a 64-bit hexadecimal number, with leading "0x".
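
A parser might convert the INT and HEX64 terminals along the following
lines. This is a minimal sketch: the function names are made up for
illustration, and treating HEX64 as an unsigned value is an assumption,
since the text above only says "64-bit hexadecimal number".

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1
UINT64_MAX = 2 ** 64 - 1


def parse_int(text: str) -> int:
    """Parse an INT terminal: a 64-bit signed decimal integer."""
    value = int(text.strip(), 10)
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError("INT out of 64-bit signed range: " + text)
    return value


def parse_hex64(text: str) -> int:
    """Parse a HEX64 terminal: hexadecimal with a leading "0x".

    Assumption: the value is interpreted as unsigned.
    """
    text = text.strip()
    if not text.lower().startswith("0x"):
        raise ValueError('HEX64 must start with "0x": ' + text)
    value = int(text, 16)
    if value > UINT64_MAX:
        raise ValueError("HEX64 wider than 64 bits: " + text)
    return value
```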

Text strings are escaped so as to remove the <, > and & characters
which would otherwise mess up parsing. They are replaced with the
standard encodings "&lt;", "&gt;" and "&amp;" respectively.
Note this is not (yet) done throughout, only for function names in
<frame>..</frame> tag-pairs.
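
The escaping described above matches what Python's standard
xml.sax.saxutils module does by default, so a quick sketch of the
round trip looks like this. The function name used as input is just an
illustrative C++-style symbol, not taken from any real Valgrind output.

```python
from xml.sax.saxutils import escape, unescape

# A function name containing all three characters that need escaping.
raw_name = "operator<<(std::ostream&, Foo<int> const&)"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped_name = escape(raw_name)

# unescape() reverses the substitution, recovering the original text.
recovered_name = unescape(escaped_name)
```

A standard XML parser performs the unescape step itself, so this is only
needed when handling the text by hand.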

TOPLEVEL
--------

The first line output is always this:

   <?xml version="1.0"?>

All remaining output is contained within the tag-pair
<valgrindoutput>.

Inside that, the first entity is an indication of the protocol
version. This is provided so that existing parsers can identify XML
created by future versions of Valgrind merely by observing that the
protocol version is one they don't understand. Hence TOPLEVEL is:

   <?xml version="1.0"?>
   <valgrindoutput>
     <protocolversion>INT</protocolversion>
     PROTOCOL
   </valgrindoutput>

Valgrind versions 3.0.0 and 3.0.1 emit protocol version 1. Versions
3.1.X and 3.2.X emit protocol version 2. 3.4.X emits protocol version
3.
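
The version check that a parser is expected to make could be sketched
as follows. The names read_protocol_version, check_protocol and
SUPPORTED_PROTOCOLS are hypothetical, the sample document is hand-written
rather than real Valgrind output, and the sketch assumes the
<protocolversion> element is a direct child of the root and is properly
closed.

```python
import xml.etree.ElementTree as ET

# Protocols this (hypothetical) parser claims to understand.
SUPPORTED_PROTOCOLS = {1, 2, 3}


def read_protocol_version(xml_text: str) -> int:
    """Return the integer held in the root's <protocolversion> child."""
    root = ET.fromstring(xml_text)
    text = root.findtext("protocolversion")
    if text is None:
        raise ValueError("no <protocolversion> element found")
    return int(text.strip())


def check_protocol(xml_text: str) -> int:
    """Return the version, or raise if it is one we do not understand."""
    version = read_protocol_version(xml_text)
    if version not in SUPPORTED_PROTOCOLS:
        raise ValueError("unsupported protocol version %d" % version)
    return version


# Hand-written sample, reduced to the parts needed for the check.
SAMPLE = """<?xml version="1.0"?>
<valgrindoutput>
  <protocolversion>3</protocolversion>
</valgrindoutput>
"""
```

Rejecting unknown versions outright is one possible policy; a GUI could
equally warn and carry on.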

PROTOCOL for version 3
----------------------
Changes in 3.4.X (tentative): (jrs, 1 March 2008)

* There may be more than one <logfilequalifier> clause.

* Some errors may have two <auxwhat> blocks, rather than just one
  (resulting from merge of the DATASYMS branch)

* Some errors may have an ORIGIN component, indicating the origins of
  uninitialised values. This results from the merge of the
  OTRACK_BY_INSTRUMENTATION branch.

PROTOCOL for version 2
----------------------
Version 2 is identical in every way to version 1, except that the time
string in

   <time>human-readable-time-string</time>

has changed format, and is also elapsed wallclock time since process
start, and not local time or any such. In fact version 1 does not
define the format of the string so in some ways this revision is

PROTOCOL for version 1
----------------------
This is the main top-level construction. Roughly speaking, it
contains a load of preamble, the errors from the run of the
program, and the result of the final leak check. Hence the
following in sequence:

* Various preamble lines which give version info for the various
  components. The text in them can be anything; it is not intended
  for interpretation by the GUI:

     <line>Misc version/copyright text</line>  (zero or more of)

* The PID of this process and of its parent:

* The name of the tool being used:

* OPTIONALLY, if --log-file-qualifier=VAR flag was given:

  <logfilequalifier> <var>VAR</var> <value>$VAR</value>
  </logfilequalifier>

  That is, both the name of the environment variable and its value
  are given.
  [update: as of v3.3.0, this is not present, as the --log-file-qualifier
  option has been removed, replaced by the %q format specifier in --log-file.]

* OPTIONALLY, if --xml-user-comment=STRING was given:

  <usercomment>STRING</usercomment>

  STRING is not escaped in any way, so that it itself may be a piece
  of XML with arbitrary tags etc.

* The program and args: first those pertaining to Valgrind itself, and
  then those pertaining to the program to be run under Valgrind (the
  client):

      <arg>TEXT</arg>  (zero or more of)

      <arg>TEXT</arg>  (zero or more of)

* The following, indicating that the program has now started:

  <status> <state>RUNNING</state>
           <time>human-readable-time-string</time>
  </status>

* Zero or more of (either ERROR or ERRORCOUNTS).

* The following, indicating that the program has now finished, and
  that the wrapup (leak checking) is happening.

  <status> <state>FINISHED</state>
           <time>human-readable-time-string</time>
  </status>

* SUPPCOUNTS, indicating how many times each suppression was used.

* Zero or more ERRORs, each of which is a complaint from the
  leak checker.
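
Since the sequence above arrives over time, an interactive GUI may want
to consume it incrementally. The sketch below uses Python's
xml.etree.ElementTree.XMLPullParser for that. The function name
follow_events is hypothetical, and the enclosing tag name <error> is an
assumption, as the tag for the ERROR nonterminal is not spelled out at
this point in the text.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Tuple


def follow_events(chunks: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield ("status", state) and ("error", what) as elements complete.

    `chunks` is any iterable of text fragments, split at arbitrary
    positions, e.g. successive reads from a pipe.
    """
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        # read_events() returns only elements whose closing tag has been seen.
        for _event, elem in parser.read_events():
            if elem.tag == "status":
                yield ("status", (elem.findtext("state") or "").strip())
            elif elem.tag == "error":  # assumed tag name for ERROR
                yield ("error", (elem.findtext("what") or "").strip())
```

A display could switch to a "running" state on the first status event,
append errors as they show up, and switch to "finished" on the second.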

ERROR
-----
This shows an error, and is the most complex nonterminal. The format
is:

     <unique>HEX64</unique>

     optionally: <leakedbytes>INT</leakedbytes>
     optionally: <leakedblocks>INT</leakedblocks>

     optionally: <auxwhat>TEXT</auxwhat>

* Each error contains a unique, arbitrary 64-bit hex number. This is
  used to refer to the error in ERRORCOUNTS nonterminals (see below).

* The <tid> tag indicates the Valgrind thread number. This value
  is arbitrary but may be used to determine which threads produced
  which errors (at least, the first instance of each error).

* The <kind> tag specifies one of a small number of fixed error
  types (enumerated below), so that GUIs may roughly categorise
  errors by type if they want.

* The <what> tag gives a human-understandable description of the
  error.

* For <kind> tags specifying a KIND of the form "Leak_*", the
  optional <leakedbytes> and <leakedblocks> indicate the number of
  bytes and blocks leaked by this error.

* The primary STACK for this error, indicating where it occurred.

* Some error types may have auxiliary information attached:

     <auxwhat>TEXT</auxwhat> gives an auxiliary human-readable
     description (usually of invalid addresses)

     STACK gives an auxiliary stack (usually the allocation/free
     point of a block). If this STACK is present then
     <auxwhat>TEXT</auxwhat> will precede it.
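
The scalar fields of an ERROR could be collected roughly as below. This
is only a sketch: the ErrorRecord class and parse_error function are
hypothetical, the enclosing <error> tag and the exact types of <tid>
(decimal) and <unique> (hex with "0x") are assumptions based on the
bullets above, the kind value in the sample is a placeholder rather than
a real KIND name, and stacks are ignored here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ErrorRecord:
    unique: int
    tid: int
    kind: str
    what: str
    leaked_bytes: Optional[int] = None
    leaked_blocks: Optional[int] = None
    auxwhat: List[str] = field(default_factory=list)

    @property
    def is_leak(self) -> bool:
        # Leak kinds are those of the form "Leak_*".
        return self.kind.startswith("Leak_")


def _optional_int(elem: ET.Element, tag: str) -> Optional[int]:
    text = elem.findtext(tag)
    return int(text) if text is not None else None


def parse_error(elem: ET.Element) -> ErrorRecord:
    """Build an ErrorRecord from one error element (stacks not handled)."""
    return ErrorRecord(
        unique=int(elem.findtext("unique"), 16),
        tid=int(elem.findtext("tid")),
        kind=elem.findtext("kind").strip(),
        what=elem.findtext("what").strip(),
        leaked_bytes=_optional_int(elem, "leakedbytes"),
        leaked_blocks=_optional_int(elem, "leakedblocks"),
        # Protocol 3 allows more than one <auxwhat>, so keep a list.
        auxwhat=[a.text or "" for a in elem.findall("auxwhat")],
    )


# Hand-written sample; "Leak_ExampleKind" is a placeholder kind.
SAMPLE_ERROR = ET.fromstring(
    "<error><unique>0x1a</unique><tid>1</tid>"
    "<kind>Leak_ExampleKind</kind><what>example description</what>"
    "<leakedbytes>40</leakedbytes><leakedblocks>1</leakedblocks></error>"
)
```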

KIND
----
This is a small enumeration indicating roughly the nature of an error.
The possible values are:

      free/delete/delete[] on an invalid pointer

      free/delete/delete[] does not match allocation function
      (eg doing new[] then free on the result)

      read of an invalid address

      write of an invalid address

      jump to an invalid address

      args overlap or are otherwise bogus in eg memcpy

      invalid mem pool specified in client request

      conditional jump/move depends on undefined value

      other use of undefined value (primarily memory addresses)

      system call params are undefined or point to
      undefined/unaddressable memory

      "error" resulting from a client check request

      memory leak; the referenced blocks are definitely lost

      memory leak; the referenced blocks are lost because all pointers
      to them are also in leaked blocks

      memory leak; only interior pointers to referenced blocks were
      found

      memory leak; pointers to un-freed blocks are still available

STACK
-----
STACK indicates locations in the program being debugged. A STACK
is one or more FRAMEs. The first is the innermost frame, the
next its caller, etc.

FRAME
-----
FRAME records a single program location:

     optionally <obj>TEXT</obj>
     optionally <fn>TEXT</fn>
     optionally <dir>TEXT</dir>
     optionally <file>TEXT</file>
     optionally <line>INT</line>

Only the <ip> field is guaranteed to be present. It indicates a
code ("instruction pointer") address.

The optional fields, if present, appear in the order stated:

* obj: gives the name of the ELF object containing the code address

* fn: gives the name of the function containing the code address

* dir: gives the source directory associated with the name specified
  by <file>. Note the current implementation often does not
  put anything useful in this field.

* file: gives the name of the source file containing the code address

* line: gives the line number in the source file
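
Because every field except <ip> may be missing, code that displays a
FRAME has to degrade gracefully. The following sketch shows one way to
do that. The names parse_frame and describe_frame are hypothetical, the
display layout is an arbitrary choice, and reading <ip> as a "0x"
hexadecimal value is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

Frame = Dict[str, Union[int, str]]


def parse_frame(elem: ET.Element) -> Frame:
    """Collect the fields of one <frame>; only <ip> is required."""
    ip_text = elem.findtext("ip")
    if ip_text is None:
        raise ValueError("frame without <ip>")
    frame: Frame = {"ip": int(ip_text, 16)}  # assumed hex with leading 0x
    for tag in ("obj", "fn", "dir", "file"):
        text = elem.findtext(tag)
        if text is not None:
            frame[tag] = text
    line_text = elem.findtext("line")
    if line_text is not None:
        frame["line"] = int(line_text)
    return frame


def describe_frame(frame: Frame) -> str:
    """Render a frame, falling back to less detail as fields are absent."""
    where = frame.get("fn", "???")
    prefix = "0x%X: %s" % (frame["ip"], where)
    if "file" in frame:
        location = str(frame["file"])
        if "line" in frame:
            location += ":%d" % frame["line"]
        return "%s (%s)" % (prefix, location)
    if "obj" in frame:
        return "%s (in %s)" % (prefix, frame["obj"])
    return prefix
```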

ORIGIN
------
ORIGIN shows the origin of uninitialised data in errors that involve
uninitialised data. STACK shows the origin of the uninitialised
value. TEXT gives a human-understandable hint as to the meaning of
the information in STACK.

ERRORCOUNTS
-----------
This specifies, for each error that has been so far presented,
the number of occurrences of that error.

     <pair> <count>INT</count> <unique>HEX64</unique> </pair>

Each <pair> gives the current error count <count> for the error with
unique tag <unique>. The counts do not have to give a count for each
error so far presented - partial information is allowable.

As of Valgrind rev 3793, error counts are only emitted at program
termination. However, it is perfectly acceptable to periodically emit
error counts as the program is running. Doing so would allow a
GUI to dynamically update its error-count display as the program runs.
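
Since each ERRORCOUNTS block carries current counts and may be partial,
a GUI would overwrite the entries it is told about and leave the rest
alone. A minimal sketch, assuming the pairs sit inside an enclosing
<errorcounts> element (that tag name is not shown above) and using the
hypothetical function name update_error_counts:

```python
import xml.etree.ElementTree as ET
from typing import Dict


def update_error_counts(counts: Dict[int, int], block: ET.Element) -> None:
    """Merge one ERRORCOUNTS block into `counts`, keyed by unique id.

    Each <pair> holds the current total for that error, so the stored
    value is replaced rather than added to. Errors not mentioned in the
    block keep whatever count they already had.
    """
    for pair in block.iter("pair"):
        unique = int(pair.findtext("unique"), 16)
        counts[unique] = int(pair.findtext("count"))
```

Called once per block as the run progresses, this keeps `counts` in step
with the most recent information for each error.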

SUPPCOUNTS
----------
A SUPPCOUNTS block appears exactly once, after the program terminates.
It specifies the number of times each error-suppression was used.
Suppressions not mentioned were used zero times.

     <pair> <count>INT</count> <name>TEXT</name> </pair>

The <name> is as specified in the suppression name fields in .supp
files.
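
Reading a SUPPCOUNTS block is then a matter of collecting the pairs and
treating anything absent as zero. The sketch below assumes an enclosing
<suppcounts> element (the tag name is not shown above); the function
names and the suppression names in the usage note are made up for
illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_suppcounts(block: ET.Element) -> Dict[str, int]:
    """Map each suppression name in the block to its use count."""
    usage: Dict[str, int] = {}
    for pair in block.iter("pair"):
        usage[pair.findtext("name")] = int(pair.findtext("count"))
    return usage


def times_used(usage: Dict[str, int], name: str) -> int:
    """Suppressions not mentioned in the block were used zero times."""
    return usage.get(name, 0)
```

For a block containing a single pair with count 3 and name
"example-supp", times_used(usage, "example-supp") is 3 and
times_used(usage, "some-other-supp") is 0.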