1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3 <html>
4 <head>
5 <title>Checker Developer Manual</title>
6 <link type="text/css" rel="stylesheet" href="menu.css">
7 <link type="text/css" rel="stylesheet" href="content.css">
8 <script type="text/javascript" src="scripts/menu.js"></script>
9 </head>
10 <body>
12 <div id="page">
13 <!--#include virtual="menu.html.incl"-->
15 <div id="content">
17 <h3 style="color:red">This Page Is Under Construction</h3>
19 <h1>Checker Developer Manual</h1>
<p>The static analyzer engine performs path-sensitive exploration of the program and
relies on a set of checkers to implement the logic for detecting and
constructing specific bug reports. Anyone interested in implementing their own
checker should check out the Building a Checker in 24 Hours talk
(<a href="https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
<a href="https://youtu.be/kdxlsP5QVPw">video</a>)
and refer to this page for additional information on writing a checker. The static analyzer is
part of the Clang project, so consult <a href="https://clang.llvm.org/hacking.html">Hacking on Clang</a>
and the <a href="https://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for developer guidelines, and post your questions and proposals to the
<a href="https://discourse.llvm.org/c/clang/static-analyzer/">Static Analyzer</a> subcategory on
the official <a href="https://discourse.llvm.org/">LLVM Discourse server</a>.
</p>
35 <ul>
36 <li><a href="#start">Getting Started</a></li>
37 <li><a href="#analyzer">Static Analyzer Overview</a>
38 <ul>
39 <li><a href="#interaction">Interaction with Checkers</a></li>
40 <li><a href="#values">Representing Values</a></li>
41 </ul></li>
42 <li><a href="#idea">Idea for a Checker</a></li>
43 <li><a href="#registration">Checker Registration</a></li>
44 <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li>
45 <li><a href="#extendingstates">Custom Program States</a></li>
46 <li><a href="#bugs">Bug Reports</a></li>
47 <li><a href="#ast">AST Visitors</a></li>
48 <li><a href="#testing">Testing</a></li>
49 <li><a href="#commands">Useful Commands/Debugging Hints</a>
50 <ul>
51 <li><a href="#attaching">Attaching the Debugger</a></li>
52 <li><a href="#narrowing">Narrowing Down the Problem</a></li>
53 <li><a href="#visualizing">Visualizing the Analysis</a></li>
54 <li><a href="#debugprints">Debug Prints and Tricks</a></li>
55 </ul></li>
56 <li><a href="#additioninformation">Additional Sources of Information</a></li>
57 <li><a href="#links">Useful Links</a></li>
58 </ul>
60 <h2 id=start>Getting Started</h2>
61 <ul>
62 <li>To check out the source code and build the project, follow steps 1-4 of
63 the <a href="https://clang.llvm.org/get_started.html">Clang Getting Started</a>
64 page.</li>
66 <li>The analyzer source code is located under the Clang source tree:
67 <br><tt>
68 $ <b>cd llvm/tools/clang</b>
69 </tt>
70 <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
71 <tt>test/Analysis</tt>.</li>
<li>The analyzer regression tests can be executed from Clang's build
directory:
75 <br><tt>
76 $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
77 </tt></li>
79 <li>Analyze a file with the specified checker:
80 <br><tt>
81 $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
82 </tt></li>
84 <li>List the available checkers:
85 <br><tt>
86 $ <b>clang -cc1 -analyzer-checker-help</b>
87 </tt></li>
89 <li>See the analyzer help for different output formats, fine tuning, and
90 debug options:
91 <br><tt>
92 $ <b>clang -cc1 -help | grep "analyzer"</b>
93 </tt></li>
95 </ul>
97 <h2 id=analyzer>Static Analyzer Overview</h2>
The analyzer core performs symbolic execution of the given program. All the
input values are represented with symbolic values; further, the engine deduces
the values of all the expressions in the program based on the input symbols
and the path. The execution is path sensitive and every possible path through
the program is explored. The explored execution traces are represented by an
<a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
Each node of the graph is an
<a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
<p>
<a href="https://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
represents the corresponding location in the program (or the CFG).
<tt>ProgramPoint</tt> is also used to record additional information on
when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
kind means that the state is the result of purging dead symbols - the
analyzer's equivalent of garbage collection.
<p>
<a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
represents the abstract state of the program. It consists of:
117 <ul>
118 <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
119 values
120 <li><tt>Store</tt> - a mapping from memory locations to symbolic values
121 <li><tt>GenericDataMap</tt> - constraints on symbolic values
122 </ul>
124 <h3 id=interaction>Interaction with Checkers</h3>
<p>
Checkers are not merely passive receivers of changes made by the analyzer core - they
actively participate in <tt>ProgramState</tt> construction through the
<tt>GenericDataMap</tt>, which can be used to store the checker-defined part
of the state. Each time the analyzer engine explores a new statement, it
notifies each checker registered to listen for that statement, giving it an
opportunity to either report a bug or modify the state. (As a rule of thumb,
the checker itself should be stateless.) The checkers are called one after another
in a predefined order; thus, calling all the checkers adds a chain of nodes to the
<tt>ExplodedGraph</tt>.
136 </p>
138 <h3 id=values>Representing Values</h3>
<p>
During symbolic execution, <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
objects are used to represent the semantic evaluation of expressions.
They can represent things like concrete
integers, symbolic values, or memory locations (which are memory regions).
They are a discriminated union of "values", symbolic and otherwise.
If a value isn't symbolic, usually that means there is no symbolic
information to track. For example, if the value was an integer, such as
<tt>42</tt>, it would be a <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>,
and the checker doesn't usually need to track any state with the concrete
number. In some cases, an <tt>SVal</tt> is not a symbol even though it really should be
a symbolic value. This happens when the analyzer cannot reason about something
(yet). An example is floating point numbers. In such cases, the
<tt>SVal</tt> will evaluate to <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
This represents a case that is outside the realm of the analyzer's reasoning
capabilities. <tt>SVals</tt> are value objects, and their contents can be viewed
using the <tt>.dump()</tt> method. Often they wrap persistent objects such as
symbols or regions.
158 </p>
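<p>As a rough illustration of the "discriminated union" idea, the sketch below
models the value kinds with <tt>std::variant</tt>. It is a self-contained toy and
not the analyzer's implementation: <tt>ToySVal</tt>, <tt>SymbolVal</tt>,
<tt>LocVal</tt>, <tt>describe</tt>, and <tt>isSymbolic</tt> are hypothetical names
introduced here, and the real <tt>SVal</tt> hierarchy is considerably richer.</p>

```cpp
#include <cassert>
#include <string>
#include <variant>

// Toy stand-ins for the analyzer's value kinds (not the real Clang classes).
struct UnknownVal {};                     // a value the engine cannot model
struct ConcreteInt { long long Value; };  // a known integer such as 42
struct SymbolVal { unsigned Id; };        // an abstract named value such as $0
struct LocVal { std::string Region; };    // a memory location, named by its region

// A discriminated union over the kinds above.
using ToySVal = std::variant<UnknownVal, ConcreteInt, SymbolVal, LocVal>;

// Render a value as text, in the spirit of a debugging dump.
std::string describe(const ToySVal &V) {
  struct Visitor {
    std::string operator()(const UnknownVal &) const { return "Unknown"; }
    std::string operator()(const ConcreteInt &C) const {
      return std::to_string(C.Value);
    }
    std::string operator()(const SymbolVal &S) const {
      return "$" + std::to_string(S.Id);
    }
    std::string operator()(const LocVal &L) const { return "&" + L.Region; }
  };
  return std::visit(Visitor{}, V);
}

// True only for the symbol alternative, the kind a checker usually tracks.
bool isSymbolic(const ToySVal &V) {
  return std::holds_alternative<SymbolVal>(V);
}
```

<p>In this toy, a concrete integer and an unknown value both report
<tt>isSymbolic(...) == false</tt>, which mirrors the point above that there is
usually nothing for a checker to track about them.</p>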
<p>
<a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
is meant to represent an abstract, but named, symbolic value. A symbol represents
an actual (immutable) value. We might not know what its specific value is, but
we can associate constraints with that value as we analyze a path. For
example, we might record that the value of a symbol is greater than
<tt>0</tt>, etc.
167 </p>
<p>
<a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
It is used to provide a lexicon of how to describe abstract memory. Regions can
layer on top of other regions, providing a layered approach to representing memory.
For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
be used to represent the memory associated with a specific field of that object.
So how do we represent symbolic memory regions? That's what
<a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
symbol is unique and has a unique name, that symbol names the region.
180 </p>
<p>
Let's see how the analyzer processes the expressions in the following example:
184 </p>
<pre class="code_example">
int foo(int x) {
  int y = x * 2;
  int z = x;
}
</pre>
194 </p>
<p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
which references the value <b>currently bound</b> to <tt>x</tt>. That value is
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
and create a new <tt>SVal</tt> that represents their multiplication (which in
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
to the <tt>MemRegion</tt> in the symbolic store.
<br>
The second line is similar. When we evaluate <tt>x</tt> again, we do the same
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt>
might reference the same underlying value.
215 </p>
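<p>The walk-through above can be mimicked with a very small model. The sketch
below is only an illustration under simplifying assumptions: values are plain
strings, the store is a <tt>std::map</tt> from a region name to the value bound to
it, and <tt>ToyEngine</tt>, <tt>Definitions</tt>, and <tt>analyzeFoo</tt> are
hypothetical names that do not exist in the analyzer.</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy engine: values are strings ("$0" is a symbol, "2" a concrete integer).
struct ToyEngine {
  std::map<std::string, std::string> Store;       // region name -> bound value
  std::map<std::string, std::string> Definitions; // derived symbol -> expression
  unsigned NextSymbol = 0;

  // Create a new, uniquely named symbol: $0, $1, ...
  std::string freshSymbol() { return "$" + std::to_string(NextSymbol++); }

  // Lvalue-to-rvalue conversion: read the value currently bound to a region.
  std::string load(const std::string &Region) const { return Store.at(Region); }

  // Bind a value to a region in the store.
  void bind(const std::string &Region, const std::string &Value) {
    Store[Region] = Value;
  }

  // Multiplication yields a new symbolic expression, named by a new symbol.
  std::string multiply(const std::string &LHS, const std::string &RHS) {
    std::string Result = freshSymbol();
    Definitions[Result] = LHS + " * " + RHS;
    return Result;
  }
};

// Mirrors the body of foo(int x) from the example above.
ToyEngine analyzeFoo() {
  ToyEngine E;
  E.bind("x", E.freshSymbol());                       // x starts bound to $0
  std::string Product = E.multiply(E.load("x"), "2"); // x * 2 becomes $1
  E.bind("y", Product);                               // int y = x * 2;
  E.bind("z", E.load("x"));                           // int z = x;
  return E;
}
```

<p>After <tt>analyzeFoo()</tt>, the regions <tt>x</tt> and <tt>z</tt> are both
bound to <tt>$0</tt>, while <tt>y</tt> is bound to <tt>$1</tt>, which the toy
records as <tt>$0 * 2</tt>.</p>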
<p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are
unique names for abstract symbolic values. Some MemRegions represent abstract
symbolic chunks of memory, and thus are also based on symbols. SVals are just
references to values, and can reference either MemRegions, Symbols, or concrete
values (e.g., the number 1).
223 </p>
<!--
TODO: Add a picture.
<br>
Symbols<br>
FunctionalObjects are used throughout.
-->
232 <h2 id=idea>Idea for a Checker</h2>
233 Here are several questions which you should consider when evaluating your
234 checker idea:
235 <ul>
236 <li>Can the check be effectively implemented without path-sensitive
237 analysis? See <a href="#ast">AST Visitors</a>.</li>
<li>How high is the false positive rate going to be? Looking at the occurrences
of the issue you want to write a checker for in existing code bases might
give you some ideas.</li>

<li>How will the current limitations of the analysis affect the false alarm
rate? Currently, the analyzer only reasons about one procedure at a time (no
inter-procedural analysis). Also, it uses a simple range-tracking-based
solver to model symbolic execution.</li>
248 <li>Consult the <a
249 href="https://github.com/llvm/llvm-project/labels/clang%3Astatic%20analyzer">GitHub Issues</a>
250 to get some ideas for new checkers and consider starting with improving/fixing
251 bugs in the existing checkers.</li>
252 </ul>
254 <p>Once an idea for a checker has been chosen, there are two key decisions that
255 need to be made:
256 <ul>
257 <li> Which events the checker should be tracking. This is discussed in more
258 detail in the section <a href="#events_callbacks">Events, Callbacks, and
259 Checker Class Structure</a>.
260 <li> What checker-specific data needs to be stored as part of the program
261 state (if any). This should be minimized as much as possible. More detail about
262 implementing custom program state is given in section <a
263 href="#extendingstates">Custom Program States</a>.
264 </ul>
267 <h2 id=registration>Checker Registration</h2>
All checker implementation files are located in the
<tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe
how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of
stream APIs, was registered with the analyzer.
Similar steps should be followed for a new checker.
273 <ol>
274 <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was
275 created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>.
276 <li>The following registration code was added to the implementation file:
<pre class="code_example">
void ento::registerSimpleStreamChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;SimpleStreamChecker&gt;();
}
</pre>
282 <li>A package was selected for the checker and the checker was defined in the
283 table of checkers at <tt>include/clang/StaticAnalyzer/Checkers/Checkers.td</tt>.
284 Since all checkers should first be developed as "alpha", and the SimpleStreamChecker
285 performs UNIX API checks, the correct package is "alpha.unix", and the following
286 was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>:
<pre class="code_example">
let ParentPackage = UnixAlpha in {

def SimpleStreamChecker : Checker&lt;"SimpleStream"&gt;,
  HelpText&lt;"Check for misuses of stream APIs"&gt;,
  DescFile&lt;"SimpleStreamChecker.cpp"&gt;;

} // end "alpha.unix"
</pre>
297 <li>The source code file was made visible to CMake by adding it to
298 <tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.
300 </ol>
302 After adding a new checker to the analyzer, one can verify that the new checker
303 was successfully added by seeing if it appears in the list of available checkers:
<br> <tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
306 <h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2>
308 <p> All checkers inherit from the <tt><a
309 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html">
310 Checker</a></tt> template class; the template parameter(s) describe the type of
311 events that the checker is interested in processing. The various types of events
312 that are available are described in the file <a
313 href="https://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a>.
316 <p> For each event type requested, a corresponding callback function must be
317 defined in the checker class (<a
318 href="https://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
319 CheckerDocumentation.cpp</a> shows the
320 correct function name and signature for each event type).
322 <p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to
323 take action at the following times:
325 <ul>
326 <li>Before making a call to a function, check if the function is <tt>fclose</tt>.
327 If so, check the parameter being passed.
328 <li>After making a function call, check if the function is <tt>fopen</tt>. If
329 so, process the return value.
330 <li>When values go out of scope, check whether they are still-open file
331 descriptors, and report a bug if so. In addition, remove any information about
332 them from the program state in order to keep the state as small as possible.
333 <li>When file pointers "escape" (are used in a way that the analyzer can no longer
334 track them), mark them as such. This prevents false positives in the cases where
335 the analyzer cannot be sure whether the file was closed or not.
336 </ul>
<p>The events that will be used for each of these actions are, respectively, <a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>,
<a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>,
<a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>,
and <a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>.
The high-level structure of the checker's class is thus:
<pre class="code_example">
class SimpleStreamChecker : public Checker&lt;check::PreCall,
                                           check::PostCall,
                                           check::DeadSymbols,
                                           check::PointerEscape&gt; {
public:

  void checkPreCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  void checkPostCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  void checkDeadSymbols(SymbolReaper &amp;SR, CheckerContext &amp;C) const;

  ProgramStateRef checkPointerEscape(ProgramStateRef State,
                                     const InvalidatedSymbols &amp;Escaped,
                                     const CallEvent *Call,
                                     PointerEscapeKind Kind) const;
};
</pre>
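<p>To make the four responsibilities listed above more concrete, here is a
self-contained toy model of the bookkeeping involved. It deliberately avoids the
Clang API: symbols are plain integers, the state is a <tt>std::map</tt> passed in
and returned by value, and reports are appended to a vector. All names
(<tt>StreamMap</tt>, <tt>preCall</tt>, <tt>postCall</tt>, <tt>deadSymbols</tt>,
<tt>pointerEscape</tt>) and the message strings are hypothetical, and treating a
second <tt>fclose</tt> as the misuse being checked is an assumption of this sketch.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class StreamState { Opened, Closed };
using StreamMap = std::map<int, StreamState>; // symbol id -> stream state
using Reports = std::vector<std::string>;     // stand-in for emitted bug reports

// After a call: if it was fopen, start tracking the returned symbol.
StreamMap postCall(StreamMap State, const std::string &Callee, int RetSym) {
  if (Callee == "fopen")
    State[RetSym] = StreamState::Opened;
  return State;
}

// Before a call: if it is fclose, check the argument and mark it closed.
StreamMap preCall(StreamMap State, const std::string &Callee, int ArgSym,
                  Reports &Out) {
  if (Callee != "fclose")
    return State;
  auto It = State.find(ArgSym);
  if (It != State.end() && It->second == StreamState::Closed) {
    Out.push_back("double close"); // illustrative message
    return State;
  }
  State[ArgSym] = StreamState::Closed;
  return State;
}

// When symbols die: report still-open streams and drop them from the state.
StreamMap deadSymbols(StreamMap State, const std::vector<int> &Dead,
                      Reports &Out) {
  for (int Sym : Dead) {
    auto It = State.find(Sym);
    if (It == State.end())
      continue;
    if (It->second == StreamState::Opened)
      Out.push_back("leak"); // illustrative message
    State.erase(It);
  }
  return State;
}

// When pointers escape: stop tracking them to avoid false positives.
StreamMap pointerEscape(StreamMap State, const std::vector<int> &Escaped) {
  for (int Sym : Escaped)
    State.erase(Sym);
  return State;
}
```

<p>Each function takes the old state and returns the new one instead of storing
anything in the checker itself, which matches the rule of thumb that a checker
should be stateless.</p>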
368 <h2 id=extendingstates>Custom Program States</h2>
370 <p> Checkers often need to keep track of information specific to the checks they
371 perform. However, since checkers have no guarantee about the order in which the
372 program will be explored, or even that all possible paths will be explored, this
373 state information cannot be kept within individual checkers. Therefore, if
374 checkers need to store custom information, they need to add new categories of
375 data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of
376 several macros designed for this purpose. They are:
378 <ul>
379 <li><a
380 href="https://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>:
381 Used when the state information is a single value. The methods available for
382 state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and
383 <tt>remove</tt>.
384 <li><a
385 href="https://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>:
386 Used when the state information is a list of values. The methods available for
387 state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
388 <tt>remove</tt>, and <tt>contains</tt>.
389 <li><a
390 href="https://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>:
391 Used when the state information is a set of values. The methods available for
392 state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
393 <tt>remove</tt>, and <tt>contains</tt>.
394 <li><a
395 href="https://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>:
396 Used when the state information is a map from a key to a value. The methods
397 available for state types declared with this macro are <tt>add</tt>,
398 <tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>.
399 </ul>
401 <p>All of these macros take as parameters the name to be used for the custom
402 category of state information and the data type(s) to be used for storage. The
403 data type(s) specified will become the parameter type and/or return type of the
methods that manipulate the new category of state information. Each of these
methods is templated with the name of the custom data type.
407 <p>For example, a common case is the need to track data associated with a
408 symbolic expression; a map type is the most logical way to implement this. The
409 key for this map will be a pointer to a symbolic expression
410 (<tt>SymbolRef</tt>). If the data type to be associated with the symbolic
411 expression is an integer, then the custom category of state information would be
412 declared as
414 <pre class="code_example">
415 REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
416 </pre>
The data would be accessed with the function

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;

// For a map, get() returns a pointer to the stored value,
// or a null pointer if nothing is associated with Sym.
const int *currentValue = state-&gt;get&lt;ExampleDataType&gt;(Sym);
</pre>
427 and set with the function
429 <pre class="code_example">
430 ProgramStateRef state;
431 SymbolRef Sym;
432 int newValue;
434 ProgramStateRef newState = state-&gt;set&lt;ExampleDataType&gt;(Sym, newValue);
435 </pre>
<p>In addition, the macros define a data type used for storing the data of the
new data category; the name of this type is the name of the data category with
"Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply
be the passed data type; for the other three macros, this will be a specialized
version of the <a
href="https://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>,
<a
href="https://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>,
or <a
href="https://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a>
templated class. For the <tt>ExampleDataType</tt> example above, the type
created would be equivalent to writing the declaration:
450 <pre class="code_example">
451 using ExampleDataTypeTy = llvm::ImmutableMap&lt;SymbolRef, int&gt;;
452 </pre>
454 <p>These macros will cover a majority of use cases; however, they still have a
455 few limitations. They cannot be used inside namespaces (since they expand to
456 contain top-level namespace references), and the data types that they define
457 cannot be referenced from more than one file.
<p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing
one, functions that modify the state will return a copy of the previous state
with the change applied. This updated state must then be provided to the
analyzer core by calling the <tt>CheckerContext::addTransition</tt> function.
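<p>The immutability rule is easy to get wrong, so here is a minimal sketch of the
pattern using only the standard library. <tt>ToyState</tt>, <tt>ToyStateRef</tt>,
and <tt>emptyState</tt> are hypothetical names; the toy copies the whole map on
every update, whereas the analyzer relies on the LLVM immutable containers
mentioned above.</p>

```cpp
#include <cassert>
#include <map>
#include <memory>

class ToyState;
using ToyStateRef = std::shared_ptr<const ToyState>;

// Toy immutable state: every "modification" returns a new state object.
class ToyState {
  std::map<int, int> Data; // symbol id -> associated integer

public:
  // Look up a symbol; returns a null pointer when nothing is recorded.
  const int *get(int Sym) const {
    auto It = Data.find(Sym);
    return It == Data.end() ? nullptr : &It->second;
  }

  // Return a copy of this state with Sym mapped to Value.
  ToyStateRef set(int Sym, int Value) const {
    auto Copy = std::make_shared<ToyState>(*this);
    Copy->Data[Sym] = Value;
    return Copy;
  }

  // Return a copy of this state without any entry for Sym.
  ToyStateRef remove(int Sym) const {
    auto Copy = std::make_shared<ToyState>(*this);
    Copy->Data.erase(Sym);
    return Copy;
  }
};

// Create a state with no recorded data.
ToyStateRef emptyState() { return std::make_shared<ToyState>(); }
```

<p>Calling <tt>set</tt> on a state leaves that state untouched; only the returned
state contains the new entry. Forgetting to use the returned state (in the real
analyzer, forgetting to pass it to <tt>addTransition</tt>) silently discards the
update.</p>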
463 <h2 id=bugs>Bug Reports</h2>
466 <p> When a checker detects a mistake in the analyzed code, it needs a way to
467 report it to the analyzer core so that it can be displayed. The two classes used
468 to construct this report are <tt><a
469 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt>
470 and <tt><a
471 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html">
472 BugReport</a></tt>.
<p>
<tt>BugType</tt>, as the name would suggest, represents a type of bug. The
constructor for <tt>BugType</tt> takes two parameters: the name of the bug
type, and the name of the category of the bug. These are used, for example, in the
summary page generated by the scan-build tool.
<p>
The <tt>BugReport</tt> class represents a specific occurrence of a bug. In
the most common case, three parameters are used to form a <tt>BugReport</tt>:
483 <ol>
484 <li>The type of bug, specified as an instance of the <tt>BugType</tt> class.
485 <li>A short descriptive string. This is placed at the location of the bug in
486 the detailed line-by-line output generated by scan-build.
487 <li>The context in which the bug occurred. This includes both the location of
488 the bug in the program and the program's state when the location is reached. These are
489 both encapsulated in an <tt>ExplodedNode</tt>.
490 </ol>
492 <p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made
493 as to whether or not analysis can continue along the current path. This decision
494 is based on whether the detected bug is one that would prevent the program under
495 analysis from continuing. For example, leaking of a resource should not stop
496 analysis, as the program can continue to run after the leak. Dereferencing a
497 null pointer, on the other hand, should stop analysis, as there is no way for
498 the program to meaningfully continue after such an error.
500 <p>If analysis can continue, then the most recent <tt>ExplodedNode</tt>
501 generated by the checker can be passed to the <tt>BugReport</tt> constructor
502 without additional modification. This <tt>ExplodedNode</tt> will be the one
503 returned by the most recent call to <a
504 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a9717efea3fbc71523984160ae7ae9d41">CheckerContext::addTransition</a>.
505 If no transition has been performed during the current callback, the checker should call <a
506 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a9717efea3fbc71523984160ae7ae9d41">CheckerContext::addTransition()</a>
507 and use the returned node for bug reporting.
<p>If analysis cannot continue, then the current state should be transitioned
510 into a so-called <i>sink node</i>, a node from which no further analysis will be
511 performed. This is done by calling the <a
512 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a249245cdf2384738921f134c8d7d909a">
513 CheckerContext::generateSink</a> function; this function is the same as the
514 <tt>addTransition</tt> function, but marks the state as a sink node. Like
515 <tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated
516 state, which can then be passed to the <tt>BugReport</tt> constructor.
<p>
After a <tt>BugReport</tt> is created, it should be passed to the analyzer core
by calling <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#af50a9f46f6ea787a2a8e4ad7f86576e7">CheckerContext::emitReport</a>.
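<p>The difference between an ordinary transition and a sink can be pictured with
a small model. The sketch below is an assumption-laden toy, not the behavior of
the real <tt>CheckerContext</tt>: <tt>ToyNode</tt>, <tt>ToyContext</tt>, and
<tt>canContinue</tt> are hypothetical names, and refusing to create nodes after a
sink is simply how this toy expresses "no further analysis will be performed".</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy exploded-graph node: a state label, a sink flag, and a predecessor.
struct ToyNode {
  std::string State;
  bool IsSink;
  const ToyNode *Pred;
};

// Toy stand-in for a checker context; only sink bookkeeping is modeled.
class ToyContext {
  std::vector<std::unique_ptr<ToyNode>> Nodes; // owns every node created
  const ToyNode *Current = nullptr;            // most recently created node

  const ToyNode *makeNode(const std::string &State, bool IsSink) {
    // Assumption of this toy: nothing can be added after a sink.
    if (Current && Current->IsSink)
      return nullptr;
    Nodes.push_back(std::make_unique<ToyNode>(ToyNode{State, IsSink, Current}));
    Current = Nodes.back().get();
    return Current;
  }

public:
  ToyContext() { makeNode("entry", false); }

  // Non-fatal case: record the new state and keep exploring this path.
  const ToyNode *addTransition(const std::string &State) {
    return makeNode(State, false);
  }

  // Fatal case: record the new state and end exploration of this path.
  const ToyNode *generateSink(const std::string &State) {
    return makeNode(State, true);
  }

  // Whether the current path may still be extended.
  bool canContinue() const { return !Current->IsSink; }
};
```

<p>In terms of the examples above, a resource leak would use the node returned by
<tt>addTransition</tt>, after which <tt>canContinue()</tt> is still true, while a
null dereference would use the node returned by <tt>generateSink</tt>, after
which the path is finished.</p>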
522 <h2 id=ast>AST Visitors</h2>
Some checks might not require path-sensitivity to be effective. A simple AST walk
might be sufficient. If that is the case, consider implementing a Clang
compiler warning. On the other hand, a check might not be acceptable as a compiler
warning; for example, because of a relatively high false positive rate. In this
situation, the AST callbacks <tt><b>checkASTDecl</b></tt> and
<tt><b>checkASTCodeBody</b></tt> are your best friends.
530 <h2 id=testing>Testing</h2>
Every patch should be well tested with Clang regression tests. The checker tests
live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
execute the following from the <tt>clang</tt> build directory:
534 <pre class="code">
535 $ <b>bin/llvm-lit -sv ../llvm/tools/clang/test/Analysis</b>
536 </pre>
538 <h2 id=commands>Useful Commands/Debugging Hints</h2>
540 <h3 id=attaching>Attaching the Debugger</h3>
542 <p>When your command contains the <tt><b>-cc1</b></tt> flag, you can attach the
543 debugger to it directly:</p>
545 <pre class="code">
546 $ <b>gdb --args clang -cc1 -analyze -analyzer-checker=core test.c</b>
547 $ <b>lldb -- clang -cc1 -analyze -analyzer-checker=core test.c</b>
548 </pre>
<p>
Otherwise, if your command line contains <tt><b>--analyze</b></tt>,
552 the actual clang instance would be run in a separate process. In
553 order to debug it, use the <tt><b>-###</b></tt> flag for obtaining
554 the command line of the child process:
555 </p>
557 <pre class="code">
558 $ <b>clang --analyze test.c -\#\#\#</b>
559 </pre>
<p>
Below we describe a few useful command line arguments, all of which assume that
563 you are running <tt><b>clang -cc1</b></tt>.
564 </p>
566 <h3 id=narrowing>Narrowing Down the Problem</h3>
568 <p>While investigating a checker-related issue, instruct the analyzer to only
569 execute a single checker:
570 </p>
571 <pre class="code">
572 $ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
573 </pre>
575 <p>If you are experiencing a crash, to see which function is failing while
576 processing a large file use the <tt><b>-analyzer-display-progress</b></tt>
577 option.</p>
579 <p>To selectively analyze only the given function, use the
580 <tt><b>-analyze-function</b></tt> option:</p>
581 <pre class="code">
582 $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress</b>
583 ANALYZE (Syntax): test.c foo
584 ANALYZE (Syntax): test.c bar
585 ANALYZE (Path, Inline_Regular): test.c bar
586 ANALYZE (Path, Inline_Regular): test.c foo
587 $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress -analyze-function=foo</b>
588 ANALYZE (Syntax): test.c foo
589 ANALYZE (Path, Inline_Regular): test.c foo
590 </pre>
592 <b>Note: </b> a fully qualified function name has to be used when selecting
593 C++ functions and methods, Objective-C methods and blocks, e.g.:
595 <pre class="code">
596 $ <b>clang -cc1 -analyze -analyzer-checker=core test.cc -analyze-function='foo(int)'</b>
597 </pre>
599 The fully qualified name can be found from the
600 <tt><b>-analyzer-display-progress</b></tt> output.
602 <p>The bug reporter mechanism removes path diagnostics inside intermediate
603 function calls that have returned by the time the bug was found and contain
604 no interesting pieces. Usually it is up to the checkers to produce more
605 interesting pieces by adding custom <tt>BugReporterVisitor</tt> objects.
606 However, you can disable path pruning while debugging with the
607 <tt><b>-analyzer-config prune-paths=false</b></tt> option.
609 <h3 id=visualizing>Visualizing the Analysis</h3>
<p>To dump the AST, which often helps in understanding how the program should
behave:</p>
613 <pre class="code">
614 $ <b>clang -cc1 -ast-dump test.c</b>
615 </pre>
<p>To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt>
checkers:</p>
619 <pre class="code">
620 $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
621 </pre>
623 <p><tt>ExplodedGraph</tt> (the state graph explored by the analyzer) can be
624 visualized with another debug checker:</p>
625 <pre class="code">
626 $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewExplodedGraph test.c</b>
627 </pre>
628 <p>Or, equivalently, with <tt><b>-analyzer-viz-egraph-graphviz</b></tt>
629 option, which does the same thing - dumps the exploded graph in graphviz
630 <tt><b>.dot</b></tt> format.</p>
632 <p>You can convert <tt><b>.dot</b></tt> files into other formats - in
633 particular, converting to <tt><b>.svg</b></tt> and viewing in your web
634 browser might be more comfortable than using a <tt><b>.dot</b></tt> viewer:</p>
635 <pre class="code">
636 $ <b>dot -Tsvg ExprEngine-501e2e.dot -o ExprEngine-501e2e.svg</b>
637 </pre>
639 <p>The <tt><b>-trim-egraph</b></tt> option removes all paths except those
640 leading to bug reports from the exploded graph dump. This is useful
641 because exploded graphs are often huge and hard to navigate.</p>
643 <p>Viewing <tt>ExplodedGraph</tt> is your most powerful tool for understanding
644 the analyzer's false positives, because it gives comprehensive information
645 on every decision made by the analyzer across all analysis paths.</p>
647 <p>There are more debug checkers available. To see all available debug checkers:
648 </p>
649 <pre class="code">
650 $ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
651 </pre>
653 <h3 id=debugprints>Debug Prints and Tricks</h3>
655 <p>To view "half-baked" <tt>ExplodedGraph</tt> while debugging, jump to a frame
656 that has <tt>clang::ento::ExprEngine</tt> object and execute:</p>
657 <pre class="code">
658 (gdb) <b>p ViewGraph(0)</b>
659 </pre>
<p>To see the <tt>ProgramState</tt> while debugging, use the following command:</p>
662 <pre class="code">
663 (gdb) <b>p State->dump()</b>
664 </pre>
666 <p>To see <tt>clang::Expr</tt> while debugging use the following command. If you
667 pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the
668 source code.</p>
669 <pre class="code">
670 (gdb) <b>p E->dump()</b>
671 </pre>
673 <p>To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs
674 to:</p>
675 <pre class="code">
676 (gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
677 </pre>
679 <h2 id=links>Making Your Checker Better</h2>
680 <ul>
681 <li>User-facing documentation is important for adoption! Make sure the <a href="/available_checks.html">checker list</a> is updated
682 at the homepage of the analyzer. Also ensure the description in <tt>Checkers.td</tt> is clear to
683 non-analyzer-developers.</li>
684 <li>Warning and note messages should be clear and easy to understand, even if a bit long.</li>
685 <ul>
686 <li>Messages should start with a capital letter (unlike Clang warnings!) and should not
687 end with <tt>.</tt>.</li>
688 <li>Articles are usually omitted, e.g. <tt>Dereference of a null pointer</tt> ->
689 <tt>Dereference of null pointer</tt>.</li>
690 <li>Introduce <tt>BugReporterVisitor</tt>s to emit additional notes that explain the warning
691 to the user better. There are some existing visitors that might be useful for your check,
692 e.g. <tt>trackNullOrUndefValue</tt>. For example, SimpleStreamChecker should highlight
693 the event of opening the file when reporting a file descriptor leak.</li>
694 </ul>
695 <li>If the check tracks anything in the program state, it needs to implement the
696 <tt>checkDeadSymbols</tt> callback to clean the state up.</li>
697 <li>The check should conservatively assume that the program is correct when a tracked symbol
698 is passed to a function that is unknown to the analyzer.
699 The <tt>checkPointerEscape</tt> callback can help you handle that case.</li>
700 <li>Use safe and convenient APIs!</li>
701 <ul>
702 <li>Always use <tt>CheckerContext::generateErrorNode</tt> and
703 <tt>CheckerContext::generateNonFatalErrorNode</tt> for emitting bug reports.
704 Most importantly, never emit a report against <tt>CheckerContext::getPredecessor</tt>.</li>
705 <li>Prefer <tt>checkPreCall</tt> and <tt>checkPostCall</tt> to
706 <tt>checkPreStmt&lt;CallExpr&gt;</tt> and <tt>checkPostStmt&lt;CallExpr&gt;</tt>.</li>
707 <li>Use <tt>CallDescription</tt> to detect hardcoded API calls in the program.</li>
708 <li>Simplify <tt>C.getState()->getSVal(E, C.getLocationContext())</tt> to <tt>C.getSVal(E)</tt>.</li>
709 </ul>
710 <li>Common sources of crashes:</li>
711 <ul>
712 <li><tt>CallEvent::getOriginExpr</tt> is nullable - for example, it returns null for an
713 automatic destructor of a variable. The same applies to some values generated while the
714 call was modeled, e.g. <tt>SymbolConjured::getStmt</tt> is nullable.</li>
715 <li><tt>CallEvent::getDecl</tt> is nullable - for example, it returns null for a
716 call through a symbolic function pointer.</li>
717 <li><tt>addTransition</tt>, <tt>generateSink</tt>, <tt>generateNonFatalErrorNode</tt>,
718 <tt>generateErrorNode</tt> are nullable because you can transition to a node that you have already visited.</li>
719 <li>Methods of <tt>CallExpr</tt>/<tt>FunctionDecl</tt>/<tt>CallEvent</tt> that
720 return arguments crash when the argument index is out of bounds. Checking the function name
721 does not guarantee that the function has the expected number of arguments,
722 which is why you should use <tt>CallDescription</tt>.</li>
723 <li>Nullability of different entities within different kinds of symbols and regions is usually
724 documented via assertions in their constructors.</li>
725 <li><tt>NamedDecl::getName</tt> will fail if the name of the declaration is not a single token,
726 e.g. for destructors. You could use <tt>NamedDecl::getNameAsString</tt> for those cases.
727 Note that this method is much slower and should be used sparingly, e.g. only when generating reports
728 but not during analysis.</li>
729 <li>Is <tt>-analyzer-checker=core</tt> included in all test <tt>RUN:</tt> lines? Running the analyzer
730 with the core checks disabled has never been supported. It might cause unexpected behavior and
731 crashes. You should do all your testing with the core checks enabled.</li>
732 </ul>
733 </ul>
734 <li>Patterns that you should most likely avoid even if they're not technically wrong:</li>
735 <ul>
736 <li><tt>BugReporterVisitor</tt> should most likely not match the AST of the current program point
737 to decide when to emit a note. It is much easier to determine that by observing changes in
738 the program state.</li>
739 <li>In <tt>State->getSVal(Region)</tt>, if <tt>Region</tt> is not known to be a <tt>TypedValueRegion</tt>
740 and the optional type argument is not specified, the checker may accidentally try to dereference a
741 void pointer.</li>
742 <li>Checker logic should not depend on whether a certain value is a <tt>Loc</tt> or <tt>NonLoc</tt>.
743 It should be immediately obvious whether the <tt>SVal</tt> is a <tt>Loc</tt> or a
744 <tt>NonLoc</tt> depending on the AST that is being checked. Checking whether a value
745 is <tt>Loc</tt> or <tt>Unknown</tt>/<tt>Undefined</tt> or whether the value is
746 <tt>NonLoc</tt> or <tt>Unknown</tt>/<tt>Undefined</tt> is totally fine.</li>
747 <li>New symbols should not be constructed in the checker via direct calls to <tt>SymbolManager</tt>,
748 unless they are of <tt>SymbolMetadata</tt> class tagged by the checker,
749 or they represent newly created values such as the return value in <tt>evalCall</tt>.
750 For modeling arithmetic/bitwise/comparison operations, <tt>SValBuilder</tt> should be used.</li>
751 <li>Custom <tt>ProgramPointTag</tt>s should not be created within the checker. There is usually
752 no good reason for a checker to chain multiple nodes together, because checkers aren't worklists.</li>
753 </ul>
754 <li>Checkers are encouraged to actively participate in the analysis by sharing
755 their knowledge about the program state with the rest of the analyzer,
756 but they should not be disrupting the analysis unnecessarily:</li>
757 <ul>
758 <li>If a checker splits program state, this must be based on knowledge that
759 the newly appearing branches are definitely possible and worth exploring
760 from the user's perspective. Otherwise the state split should be delayed
761 until there's an indication that one of the paths is taken, or one of the
762 paths needs to be dropped entirely. For example, it is fine to eagerly split
763 paths while modeling <tt>isalpha(x)</tt> as long as <tt>x</tt> is constrained accordingly on
764 each path. At the same time, it is not a good idea to split paths over the
765 return value of <tt>printf()</tt> while modeling the call because nobody ever checks
766 for errors in <tt>printf</tt>; at best, it'd just double the remaining analysis time.
767 </li>
768 <li>Caution is advised when using <tt>CheckerContext::generateNonFatalErrorNode</tt>
769 because it generates an independent transition, much like <tt>addTransition</tt>.
770 It is easy to accidentally split paths while using it. Ideally, try to
771 structure the code so that it is obvious that every <tt>addTransition</tt> or
772 <tt>generateNonFatalErrorNode</tt> (or sequence of such if the split is intended) is
773 immediately followed by return from the checker callback.</li>
774 <li>Multiple implementations of <tt>evalCall</tt> in different checkers should not conflict.</li>
775 <li>When implementing <tt>evalAssume</tt>, the checker should always return a non-null state
776 for either the true assumption or the false assumption (or both).</li>
777 <li>Checkers shall not mutate values of expressions, i.e. use the <tt>ProgramState::BindExpr</tt> API,
778 unless they are fully responsible for computing the value.
779 Under no circumstances should they change non-<tt>Unknown</tt> values of expressions.
780 Currently the only valid use case for this API in checkers is to model the return value in the <tt>evalCall</tt> callback.
781 If expression values are incorrect, <tt>ExprEngine</tt> needs to be fixed instead.</li>
782 </ul>
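<p>The "common sources of crashes" items above can be sketched as a defensive
pattern. The following is a minimal, self-contained illustration only: <tt>MockExpr</tt>,
<tt>MockDecl</tt>, <tt>MockCallEvent</tt>, <tt>describeOrigin</tt> and <tt>firstArgIfMatches</tt>
are hypothetical stand-ins invented for this example and are not Clang classes or APIs.
They only mimic the documented behavior that the origin expression and the declaration
of a call may be null, and that a matching function name says nothing about the number
of arguments.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT Clang's classes.
struct MockExpr { std::string Text; };
struct MockDecl { std::string Name; };

struct MockCallEvent {
  const MockExpr *OriginExpr; // may be null, e.g. for an automatic destructor
  const MockDecl *Decl;       // may be null, e.g. for a symbolic function pointer
  std::vector<int> Args;      // argument values, simplified to ints

  const MockExpr *getOriginExpr() const { return OriginExpr; }
  const MockDecl *getDecl() const { return Decl; }
  unsigned getNumArgs() const { return static_cast<unsigned>(Args.size()); }
  int getArg(unsigned I) const {
    assert(I < Args.size() && "argument index out of bounds");
    return Args[I];
  }
};

// Null-check the origin expression before using it.
std::string describeOrigin(const MockCallEvent &Call) {
  const MockExpr *E = Call.getOriginExpr();
  return E ? E->Text : std::string("<no origin expression>");
}

// Stores the first argument in Out and returns true only if the callee is
// known, its name matches, and the call has exactly Arity (>= 1) arguments.
bool firstArgIfMatches(const MockCallEvent &Call, const std::string &Name,
                       unsigned Arity, int &Out) {
  const MockDecl *D = Call.getDecl();
  if (!D)                  // nullable declaration: bail out, do not dereference
    return false;
  if (D->Name != Name)     // a matching name alone is not enough...
    return false;
  if (Arity == 0 || Call.getNumArgs() != Arity) // ...also check the arity
    return false;
  Out = Call.getArg(0);    // safe: index 0 is in bounds here
  return true;
}
```

<p>In a real checker, <tt>CallDescription</tt> is the preferred way to perform the
name and argument-count matching; the sketch only shows why both checks, and the
null checks, matter.</p>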
784 <h2 id=additioninformation>Additional Sources of Information</h2>
786 Here are some additional resources that are useful when working on the Clang
787 Static Analyzer:
789 <ul>
790 <li><a href="https://lcs.ios.ac.cn/~xzx/memmodel.pdf">Xu, Zhongxing &amp;
791 Kremenek, Ted &amp; Zhang, Jian. (2010). A Memory Model for Static Analysis of C
792 Programs.</a></li>
793 <li><a href="https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/README.txt">
794 The Clang Static Analyzer README</a></li>
795 <li><a href="https://github.com/llvm/llvm-project/blob/main/clang/docs/analyzer/developer-docs/RegionStore.rst">
796 Documentation for how the Store works</a></li>
797 <li><a href="https://github.com/llvm/llvm-project/blob/main/clang/docs/analyzer/developer-docs/IPA.rst">
798 Documentation about inlining</a></li>
799 <li> The "Building a Checker in 24 hours" presentation given at the <a
800 href="https://llvm.org/devmtg/2012-11">November 2012 LLVM Developer's
801 meeting</a>. Describes the construction of SimpleStreamChecker. <a
802 href="https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a>
803 and <a
804 href="https://youtu.be/kdxlsP5QVPw">video</a>
805 are available.</li>
806 <li>
807 <a href="https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf">
808 Artem Dergachev: Clang Static Analyzer: A Checker Developer's Guide
809 </a> (reading the previous items first might be a good idea)</li>
810 <li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li>
811 <li> <a href="https://clang.llvm.org/doxygen">Clang doxygen</a>. Contains
812 up-to-date documentation about the APIs available in Clang. Relevant entries
813 have been linked throughout this page. Also of use is the
814 <a href="https://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes
815 from LLVM.</li>
816 <li>
817 The <a href="https://discourse.llvm.org/c/clang/"> Clang Frontend Discourse site</a>.
818 This is the primary forum for discussing ideas and posting questions about Clang development.
819 For posting Clang Static Analyzer specific questions, please visit the
820 <a href="https://discourse.llvm.org/c/clang/static-analyzer/"> Static Analyzer subcategory</a>
821 of the same site. In the past, Static Analyzer discussions took place at the
822 <a href="https://lists.llvm.org/pipermail/cfe-dev/"> cfe-dev</a> mailing list, which is now
823 archived and superseded by the mentioned Discourse site.
824 </li>
825 </ul>
827 </div>
828 </div>
829 </body>
830 </html>