clang/www/analyzer/checker_dev_manual.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2           "http://www.w3.org/TR/html4/strict.dtd">
   3 <html>
   4 <head>
   5   <title>Checker Developer Manual</title>
   6   <link type="text/css" rel="stylesheet" href="menu.css">
   7   <link type="text/css" rel="stylesheet" href="content.css">
   8   <script type="text/javascript" src="scripts/menu.js"></script>
   9 </head>
  10 <body>
  11
  12 <div id="page">
  13 <!--#include virtual="menu.html.incl"-->
  14
  15 <div id="content">
  16
  17 <h3 style="color:red">This Page Is Under Construction</h3>
  18
  19 <h1>Checker Developer Manual</h1>
  20
  21 <p>The static analyzer engine performs path-sensitive exploration of the program and
  22 relies on a set of checkers to implement the logic for detecting and
  23 constructing specific bug reports. Anyone interested in implementing their own
  24 checker should check out the Building a Checker in 24 Hours talk
  25 (<a href="https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
  26  <a href="https://youtu.be/kdxlsP5QVPw">video</a>)
  27 and refer to this page for additional information on writing a checker. The static analyzer is a
  28 part of the Clang project, so consult <a href="https://clang.llvm.org/hacking.html">Hacking on Clang</a>
  29 and <a href="https://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
  30 for developer guidelines and post your questions and proposals to the
  31 <a href="https://discourse.llvm.org/c/clang/static-analyzer/"> Static Analyzer</a> subcategory at
  32 the official <a href="https://discourse.llvm.org/"> LLVM Discourse server</a>.
  33 </p>
  34
  35     <ul>
  36       <li><a href="#start">Getting Started</a></li>
  37       <li><a href="#analyzer">Static Analyzer Overview</a>
  38       <ul>
  39         <li><a href="#interaction">Interaction with Checkers</a></li>
  40         <li><a href="#values">Representing Values</a></li>
  41       </ul></li>
  42       <li><a href="#idea">Idea for a Checker</a></li>
  43       <li><a href="#registration">Checker Registration</a></li>
  44       <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li>
  45       <li><a href="#extendingstates">Custom Program States</a></li>
  46       <li><a href="#bugs">Bug Reports</a></li>
  47       <li><a href="#ast">AST Visitors</a></li>
  48       <li><a href="#testing">Testing</a></li>
  49       <li><a href="#commands">Useful Commands/Debugging Hints</a>
  50       <ul>
  51         <li><a href="#attaching">Attaching the Debugger</a></li>
  52         <li><a href="#narrowing">Narrowing Down the Problem</a></li>
  53         <li><a href="#visualizing">Visualizing the Analysis</a></li>
  54         <li><a href="#debugprints">Debug Prints and Tricks</a></li>
  55       </ul></li>
  56       <li><a href="#additioninformation">Additional Sources of Information</a></li>
  57       <li><a href="#links">Useful Links</a></li>
  58     </ul>
  59
  60 <h2 id=start>Getting Started</h2>
  61   <ul>
  62     <li>To check out the source code and build the project, follow steps 1-4 of
  63     the <a href="https://clang.llvm.org/get_started.html">Clang Getting Started</a>
  64   page.</li>
  65
  66     <li>The analyzer source code is located under the Clang source tree:
  67     <br><tt>
  68     $ <b>cd llvm/tools/clang</b>
  69     </tt>
  70     <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
  71      <tt>test/Analysis</tt>.</li>
  72
  73     <li>The analyzer regression tests can be executed from Clang's build
  74     directory:
  75     <br><tt>
  76     $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
  77     </tt></li>
  78
  79     <li>Analyze a file with the specified checker:
  80     <br><tt>
  81     $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
  82     </tt></li>
  83
  84     <li>List the available checkers:
  85     <br><tt>
  86     $ <b>clang -cc1 -analyzer-checker-help</b>
  87     </tt></li>
  88
  89     <li>See the analyzer help for different output formats, fine tuning, and
  90     debug options:
  91     <br><tt>
  92     $ <b>clang -cc1 -help | grep "analyzer"</b>
  93     </tt></li>
  94
  95   </ul>
  96
  97 <h2 id=analyzer>Static Analyzer Overview</h2>
  98   The analyzer core performs symbolic execution of the given program. All the
  99   input values are represented with symbolic values; further, the engine deduces
 100   the values of all the expressions in the program based on the input symbols
 101   and the path. The execution is path sensitive and every possible path through
 102   the program is explored. The explored execution traces are represented by an
 103   <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
 104   Each node of the graph is an
 105   <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
 106   which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
 107   <p>
 108   <a href="https://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
 109   represents the corresponding location in the program (or the CFG).
 110   <tt>ProgramPoint</tt> is also used to record additional information on
 111   when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
 112   kind means that the state is the result of purging dead symbols, which is the
 113   analyzer's equivalent of garbage collection.
 114   <p>
 115   <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
 116   represents the abstract state of the program. It consists of:
 117   <ul>
 118     <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
 119     values
 120     <li><tt>Store</tt> - a mapping from memory locations to symbolic values
 121     <li><tt>GenericDataMap</tt> - constraints on symbolic values, plus any checker-defined state
 122   </ul>
 123
 124   <h3 id=interaction>Interaction with Checkers</h3>
 125
 126   <p>
 127   Checkers are not merely passive receivers of the analyzer core changes - they
 128   actively participate in the <tt>ProgramState</tt> construction through the
 129   <tt>GenericDataMap</tt> which can be used to store the checker-defined part
 130   of the state. Each time the analyzer engine explores a new statement, it
 131   notifies each checker registered to listen for that statement, giving it an
 132   opportunity to either report a bug or modify the state. (As a rule of thumb,
 133   the checker itself should be stateless.) The checkers are called one after another
 134   in the predefined order; thus, calling all the checkers adds a chain to the
 135   <tt>ExplodedGraph</tt>.
 136   </p>
 137
 138   <h3 id=values>Representing Values</h3>
 139
 140   <p>
 141   During symbolic execution, <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
 142   objects are used to represent the semantic evaluation of expressions.
 143   They can represent things like concrete
 144   integers, symbolic values, or memory locations (which are memory regions).
 145   They are a discriminated union of "values", symbolic and otherwise.
 146   If a value isn't symbolic, usually that means there is no symbolic
 147   information to track. For example, if the value was an integer, such as
 148   <tt>42</tt>, it would be a <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>,
 149   and the checker doesn't usually need to track any state with the concrete
 150   number. In some cases, <tt>SVal</tt> is not a symbol, but it really should be
 151   a symbolic value. This happens when the analyzer cannot reason about something
 152   (yet). An example is floating point numbers. In such cases, the
 153   <tt>SVal</tt> will evaluate to <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
 154   This represents a case that is outside the realm of the analyzer's reasoning
 155   capabilities. <tt>SVals</tt> are value objects and their values can be viewed
 156   using the <tt>.dump()</tt> method. Often they wrap persistent objects such as
 157   symbols or regions.
 158   </p>
 159
 160   <p>
 161   <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
 162   is meant to represent an abstract, but named, symbolic value. A symbol represents
 163   an actual (immutable) value. We might not know what its specific value is, but
 164   we can associate constraints with that value as we analyze a path. For
 165   example, we might record that the value of a symbol is greater than
 166   <tt>0</tt>, etc.
 167   </p>
 168
 169   <p>
 170   <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
 171   It is used to provide a lexicon of how to describe abstract memory. Regions can
 172   layer on top of other regions, providing a layered approach to representing memory.
 173   For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
 174   but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
 175   be used to represent the memory associated with a specific field of that object.
 176   So how do we represent symbolic memory regions? That's what
 177   <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
 178   is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
 179   symbol is unique and has a unique name, that symbol names the region.
 180   </p>
 181
 182   <p>
 183   Let's see how the analyzer processes the expressions in the following example:
 184   </p>
 185
 186   <p>
 187   <pre class="code_example">
 188   int foo(int x) {
 189      int y = x * 2;
 190      int z = x;
 191      ...
 192   }
 193   </pre>
 194   </p>
 195
 196   <p>
 197 Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
 198 we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
 199 this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
 200 Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
 201 which references the value <b>currently bound</b> to <tt>x</tt>. That value is
 202 symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
 203 Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
 204 and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
 205 we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
 206 and create a new <tt>SVal</tt> that represents their multiplication (which in
 207 this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
 208 evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
 209 and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
 210 to the <tt>MemRegion</tt> in the symbolic store.
 211 <br>
 212 The second line is similar. When we evaluate <tt>x</tt> again, we do the same
 213 dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt>
 214 might reference the same underlying values.
 215   </p>
 216
 217 <p>
 218 To summarize, MemRegions are unique names for blocks of memory. Symbols are
 219 unique names for abstract symbolic values. Some MemRegions represent abstract
 220 symbolic chunks of memory, and thus are also based on symbols. SVals are just
 221 references to values, and can reference either MemRegions, Symbols, or concrete
 222 values (e.g., the number 1).
 223 </p>
 224
 225   <!--
 226   TODO: Add a picture.
 227   <br>
 228   Symbols<br>
 229   FunctionalObjects are used throughout.
 230   -->
 231
 232 <h2 id=idea>Idea for a Checker</h2>
 233   Here are several questions which you should consider when evaluating your
 234   checker idea:
 235   <ul>
 236     <li>Can the check be effectively implemented without path-sensitive
 237     analysis? See <a href="#ast">AST Visitors</a>.</li>
 238
 239     <li>How high is the false positive rate going to be? Looking at the occurrences
 240     of the issue you want to write a checker for in the existing code bases might
 241     give you some ideas. </li>
 242
 243     <li>How will the current limitations of the analysis affect the false alarm
 244     rate? Currently, the analyzer only reasons about one procedure at a time (no
 245     inter-procedural analysis). Also, it uses a simple range tracking based
 246     solver to model symbolic execution.</li>
 247
 248     <li>Consult the <a
 249     href="https://github.com/llvm/llvm-project/labels/clang%3Astatic%20analyzer">GitHub Issues</a>
 250     to get some ideas for new checkers and consider starting with improving/fixing
 251     bugs in the existing checkers.</li>
 252   </ul>
 253
 254 <p>Once an idea for a checker has been chosen, there are two key decisions that
 255 need to be made:
 256   <ul>
 257     <li> Which events the checker should be tracking. This is discussed in more
 258     detail in the section <a href="#events_callbacks">Events, Callbacks, and
 259     Checker Class Structure</a>.
 260     <li> What checker-specific data needs to be stored as part of the program
 261     state (if any). This should be minimized as much as possible. More detail about
 262     implementing custom program state is given in section <a
 263     href="#extendingstates">Custom Program States</a>.
 264   </ul>
 265
 266
 267 <h2 id=registration>Checker Registration</h2>
 268   All checker implementation files are located in the
 269   <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe
 270   how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of
 271   stream APIs, was registered with the analyzer.
 272   Similar steps should be followed for a new checker.
 273 <ol>
 274   <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was
 275   created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>.
 276   <li>The following registration code was added to the implementation file:
 277 <pre class="code_example">
 278 void ento::registerSimpleStreamChecker(CheckerManager &amp;mgr) {
 279   mgr.registerChecker&lt;SimpleStreamChecker&gt;();
 280 }
 281 </pre>
 282 <li>A package was selected for the checker and the checker was defined in the
 283 table of checkers at <tt>include/clang/StaticAnalyzer/Checkers/Checkers.td</tt>.
 284 Since all checkers should first be developed as "alpha", and the SimpleStreamChecker
 285 performs UNIX API checks, the correct package is "alpha.unix", and the following
 286 was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>:
 287 <pre class="code_example">
 288 let ParentPackage = UnixAlpha in {
 289 ...
 290 def SimpleStreamChecker : Checker&lt;"SimpleStream"&gt;,
 291   HelpText&lt;"Check for misuses of stream APIs"&gt;,
 292   DescFile&lt;"SimpleStreamChecker.cpp"&gt;;
 293 ...
 294 } // end "alpha.unix"
 295 </pre>
 296
 297 <li>The source code file was made visible to CMake by adding it to
 298 <tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.
 299
 300 </ol>
 301
 302 After adding a new checker to the analyzer, one can verify that the new checker
 303 was successfully added by seeing if it appears in the list of available checkers:
 304 <br> <tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
 305
 306 <h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2>
 307
 308 <p> All checkers inherit from the <tt><a
 309 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html">
 310 Checker</a></tt> template class; the template parameter(s) describe the type of
 311 events that the checker is interested in processing. The various types of events
 312 that are available are described in the file <a
 313 href="https://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
 314 CheckerDocumentation.cpp</a>.
 315
 316 <p> For each event type requested, a corresponding callback function must be
 317 defined in the checker class (<a
 318 href="https://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
 319 CheckerDocumentation.cpp</a> shows the
 320 correct function name and signature for each event type).
 321
 322 <p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to
 323 take action at the following times:
 324
 325 <ul>
 326 <li>Before making a call to a function, check if the function is <tt>fclose</tt>.
 327 If so, check the parameter being passed.
 328 <li>After making a function call, check if the function is <tt>fopen</tt>. If
 329 so, process the return value.
 330 <li>When values go out of scope, check whether they are still-open file
 331 descriptors, and report a bug if so. In addition, remove any information about
 332 them from the program state in order to keep the state as small as possible.
 333 <li>When file pointers "escape" (are used in a way that the analyzer can no longer
 334 track them), mark them as such. This prevents false positives in the cases where
 335 the analyzer cannot be sure whether the file was closed or not.
 336 </ul>
 337
 338 <p>The events that will be used for each of these actions are, respectively, <a
 339 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>,
 340 <a
 341 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>,
 342 <a
 343 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>,
 344 and <a
 345 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>.
 346 The high-level structure of the checker's class is thus:
 347
 348 <pre class="code_example">
 349 class SimpleStreamChecker : public Checker&lt;check::PreCall,
 350                                            check::PostCall,
 351                                            check::DeadSymbols,
 352                                            check::PointerEscape&gt; {
 353 public:
 354
 355   void checkPreCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;
 356
 357   void checkPostCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;
 358
 359   void checkDeadSymbols(SymbolReaper &amp;SR, CheckerContext &amp;C) const;
 360
 361   ProgramStateRef checkPointerEscape(ProgramStateRef State,
 362                                      const InvalidatedSymbols &amp;Escaped,
 363                                      const CallEvent *Call,
 364                                      PointerEscapeKind Kind) const;
 365 };
 366 </pre>
 367
 368 <h2 id=extendingstates>Custom Program States</h2>
 369
 370 <p> Checkers often need to keep track of information specific to the checks they
 371 perform. However, since checkers have no guarantee about the order in which the
 372 program will be explored, or even that all possible paths will be explored, this
 373 state information cannot be kept within individual checkers. Therefore, if
 374 checkers need to store custom information, they need to add new categories of
 375 data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of
 376 several macros designed for this purpose. They are:
 377
 378 <ul>
 379 <li><a
 380 href="https://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>:
 381 Used when the state information is a single value. The methods available for
 382 state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and
 383 <tt>remove</tt>.
 384 <li><a
 385 href="https://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>:
 386 Used when the state information is a list of values. The methods available for
 387 state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
 388 <tt>remove</tt>, and <tt>contains</tt>.
 389 <li><a
 390 href="https://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>:
 391 Used when the state information is a set of values. The methods available for
 392 state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
 393 <tt>remove</tt>, and <tt>contains</tt>.
 394 <li><a
 395 href="https://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>:
 396 Used when the state information is a map from a key to a value. The methods
 397 available for state types declared with this macro are <tt>add</tt>,
 398 <tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>.
 399 </ul>
 400
 401 <p>All of these macros take as parameters the name to be used for the custom
 402 category of state information and the data type(s) to be used for storage. The
 403 data type(s) specified will become the parameter type and/or return type of the
 404 methods that manipulate the new category of state information. Each of these
 405 methods is templated on the name of the custom data type.
 406
 407 <p>For example, a common case is the need to track data associated with a
 408 symbolic expression; a map type is the most logical way to implement this. The
 409 key for this map will be a pointer to a symbolic expression
 410 (<tt>SymbolRef</tt>). If the data type to be associated with the symbolic
 411 expression is an integer, then the custom category of state information would be
 412 declared as
 413
 414 <pre class="code_example">
 415 REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
 416 </pre>
 417
 418 The data would be accessed with the function
 419
 420 <pre class="code_example">
 421 ProgramStateRef state;
 422 SymbolRef Sym;
 423 ...
 424 const int *currentValue = state-&gt;get&lt;ExampleDataType&gt;(Sym); // null if Sym has no entry
 425 </pre>
 426
 427 and set with the function
 428
 429 <pre class="code_example">
 430 ProgramStateRef state;
 431 SymbolRef Sym;
 432 int newValue;
 433 ...
 434 ProgramStateRef newState = state-&gt;set&lt;ExampleDataType&gt;(Sym, newValue);
 435 </pre>
 436
 437 <p>In addition, the macros define a data type used for storing the data of the
 438 new data category; the name of this type is the name of the data category with
 439 "Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply
 440 be the data type passed to the macro; for the other three macros, this will be a specialized
 441 version of the <a
 442 href="https://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>,
 443 <a
 444 href="https://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>,
 445 or <a
 446 href="https://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a>
 447 templated class. For the <tt>ExampleDataType</tt> example above, the type
 448 created would be equivalent to writing the declaration:
 449
 450 <pre class="code_example">
 451 using ExampleDataTypeTy = llvm::ImmutableMap&lt;SymbolRef, int&gt;;
 452 </pre>
 453
 454 <p>These macros will cover a majority of use cases; however, they still have a
 455 few limitations. They cannot be used inside namespaces (since they expand to
 456 contain top-level namespace references), and the data types that they define
 457 cannot be referenced from more than one file.
 458
 459 <p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing
 460 one, functions that modify the state will return a copy of the previous state
 461 with the change applied. This updated state must then be provided to the
 462 analyzer core by calling the <tt>CheckerContext::addTransition</tt> function.
 463 <h2 id=bugs>Bug Reports</h2>
 464
 465
 466 <p> When a checker detects a mistake in the analyzed code, it needs a way to
 467 report it to the analyzer core so that it can be displayed. The two classes used
 468 to construct this report are <tt><a
 469 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt>
 470 and <tt><a
 471 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html">
 472 BugReport</a></tt>.
 473
 474 <p>
 475 <tt>BugType</tt>, as the name would suggest, represents a type of bug. The
 476 constructor for <tt>BugType</tt> takes two parameters: The name of the bug
 477 type, and the name of the category of the bug. These are used (e.g.) in the
 478 summary page generated by the scan-build tool.
 479
 480 <P>
 481   The <tt>BugReport</tt> class represents a specific occurrence of a bug. In
 482   the most common case, three parameters are used to form a <tt>BugReport</tt>:
 483 <ol>
 484 <li>The type of bug, specified as an instance of the <tt>BugType</tt> class.
 485 <li>A short descriptive string. This is placed at the location of the bug in
 486 the detailed line-by-line output generated by scan-build.
 487 <li>The context in which the bug occurred. This includes both the location of
 488 the bug in the program and the program's state when the location is reached. These are
 489 both encapsulated in an <tt>ExplodedNode</tt>.
 490 </ol>
 491
 492 <p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made
 493 as to whether or not analysis can continue along the current path. This decision
 494 is based on whether the detected bug is one that would prevent the program under
 495 analysis from continuing. For example, leaking of a resource should not stop
 496 analysis, as the program can continue to run after the leak. Dereferencing a
 497 null pointer, on the other hand, should stop analysis, as there is no way for
 498 the program to meaningfully continue after such an error.
 499
 500 <p>If analysis can continue, then the most recent <tt>ExplodedNode</tt>
 501 generated by the checker can be passed to the <tt>BugReport</tt> constructor
 502 without additional modification. This <tt>ExplodedNode</tt> will be the one
 503 returned by the most recent call to <a
 504 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a9717efea3fbc71523984160ae7ae9d41">CheckerContext::addTransition</a>.
 505 If no transition has been performed during the current callback, the checker should call <a
 506 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a9717efea3fbc71523984160ae7ae9d41">CheckerContext::addTransition()</a>
 507 and use the returned node for bug reporting.
 508
 509 <p>If analysis cannot continue, then the current state should be transitioned
 510 into a so-called <i>sink node</i>, a node from which no further analysis will be
 511 performed. This is done by calling the <a
 512 href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a249245cdf2384738921f134c8d7d909a">
 513 CheckerContext::generateSink</a> function; this function is the same as the
 514 <tt>addTransition</tt> function, but marks the state as a sink node. Like
 515 <tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated
 516 state, which can then be passed to the <tt>BugReport</tt> constructor.
 517
 518 <p>
 519 After a <tt>BugReport</tt> is created, it should be passed to the analyzer core
 520 by calling <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#af50a9f46f6ea787a2a8e4ad7f86576e7">CheckerContext::emitReport</a>.
 521
 522 <h2 id=ast>AST Visitors</h2>
 523   Some checks might not require path-sensitivity to be effective. A simple AST walk
 524   might be sufficient. If that is the case, consider implementing a Clang
 525   compiler warning. On the other hand, a check might not be acceptable as a compiler
 526   warning; for example, because of a relatively high false positive rate. In this
 527   situation, AST callbacks <tt><b>checkASTDecl</b></tt> and
 528   <tt><b>checkASTCodeBody</b></tt> are your best friends.
 529
 530 <h2 id=testing>Testing</h2>
 531   Every patch should be well tested with Clang regression tests. The checker tests
 532   live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
 533   execute the following from the <tt>clang</tt> build directory:
 534     <pre class="code">
 535     $ <b>bin/llvm-lit -sv ../llvm/tools/clang/test/Analysis</b>
 536     </pre>
 537
 538 <h2 id=commands>Useful Commands/Debugging Hints</h2>
 539
 540 <h3 id=attaching>Attaching the Debugger</h3>
 541
 542 <p>When your command contains the <tt><b>-cc1</b></tt> flag, you can attach the
 543 debugger to it directly:</p>
 544
 545 <pre class="code">
 546     $ <b>gdb --args clang -cc1 -analyze -analyzer-checker=core test.c</b>
 547     $ <b>lldb -- clang -cc1 -analyze -analyzer-checker=core test.c</b>
 548 </pre>
 549
 550 <p>
 551 Otherwise, if your command line contains <tt><b>--analyze</b></tt>,
 552 the actual clang instance is run in a separate process. In
 553 order to debug it, use the <tt><b>-###</b></tt> flag for obtaining
 554 the command line of the child process:
 555 </p>
 556
 557 <pre class="code">
 558     $ <b>clang --analyze test.c -\#\#\#</b>
 559 </pre>
 560
 561 <p>
 562 Below we describe a few useful command line arguments, all of which assume that
 563 you are running <tt><b>clang -cc1</b></tt>.
 564 </p>
 565
 566 <h3 id=narrowing>Narrowing Down the Problem</h3>
 567
 568 <p>While investigating a checker-related issue, instruct the analyzer to only
 569 execute a single checker:
 570 </p>
 571 <pre class="code">
 572     $ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
 573 </pre>
 574
 575 <p>If you are experiencing a crash, to see which function is failing while
 576 processing a large file use the  <tt><b>-analyzer-display-progress</b></tt>
 577 option.</p>
 578
 579 <p>To selectively analyze only the given function, use the
 580 <tt><b>-analyze-function</b></tt> option:</p>
 581 <pre class="code">
 582     $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress</b>
 583     ANALYZE (Syntax): test.c foo
 584     ANALYZE (Syntax): test.c bar
 585     ANALYZE (Path,  Inline_Regular): test.c bar
 586     ANALYZE (Path,  Inline_Regular): test.c foo
 587     $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress -analyze-function=foo</b>
 588     ANALYZE (Syntax): test.c foo
 589     ANALYZE (Path,  Inline_Regular): test.c foo
 590 </pre>
 591
 592 <b>Note: </b> a fully qualified function name has to be used when selecting
 593 C++ functions and methods, Objective-C methods and blocks, e.g.:
 594
 595 <pre class="code">
 596     $ <b>clang -cc1 -analyze -analyzer-checker=core test.cc -analyze-function='foo(int)'</b>
 597 </pre>
 598
 599 The fully qualified name can be found from the
 600 <tt><b>-analyzer-display-progress</b></tt> output.
 601
 602 <p>The bug reporter mechanism removes path diagnostics inside intermediate
 603 function calls that have returned by the time the bug was found and contain
 604 no interesting pieces. Usually it is up to the checkers to produce more
 605 interesting pieces by adding custom <tt>BugReporterVisitor</tt> objects.
 606 However, you can disable path pruning while debugging with the
 607 <tt><b>-analyzer-config prune-paths=false</b></tt> option.
 608
 609 <h3 id=visualizing>Visualizing the Analysis</h3>
 610
 611 <p>To dump the AST, which often helps in understanding how the program should
 612 behave:</p>
 613 <pre class="code">
 614     $ <b>clang -cc1 -ast-dump test.c</b>
 615 </pre>
 616
 617 <p>To view/dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt>
 618 checkers:</p>
 619 <pre class="code">
 620     $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
 621 </pre>
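<p>When trying these commands for the first time, a small function with one
loop and one branch keeps the output readable. The following is a hypothetical
input (the function name and body are made up); its CFG should contain only a
handful of basic blocks, which makes it easy to match them against the source:</p>

```cpp
// Hypothetical input for experimenting with the AST and CFG dumps.
// One loop and one branch give a small but non-trivial control flow graph.
int sum_positive(const int *values, int count) {
  int total = 0;
  for (int i = 0; i < count; ++i) {
    if (values[i] > 0)
      total += values[i];
  }
  return total;
}
```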
 622
 623 <p><tt>ExplodedGraph</tt> (the state graph explored by the analyzer) can be
 624 visualized with another debug checker:</p>
 625 <pre class="code">
 626     $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewExplodedGraph test.c</b>
 627 </pre>
<p>Or, equivalently, with the <tt><b>-analyzer-viz-egraph-graphviz</b></tt>
option, which does the same thing: it dumps the exploded graph in Graphviz
<tt><b>.dot</b></tt> format.</p>
 631
 632 <p>You can convert <tt><b>.dot</b></tt> files into other formats - in
 633 particular, converting to <tt><b>.svg</b></tt> and viewing in your web
 634 browser might be more comfortable than using a <tt><b>.dot</b></tt> viewer:</p>
 635 <pre class="code">
 636     $ <b>dot -Tsvg ExprEngine-501e2e.dot -o ExprEngine-501e2e.svg</b>
 637 </pre>
 638
 639 <p>The <tt><b>-trim-egraph</b></tt> option removes all paths except those
 640 leading to bug reports from the exploded graph dump. This is useful
 641 because exploded graphs are often huge and hard to navigate.</p>
 642
 643 <p>Viewing <tt>ExplodedGraph</tt> is your most powerful tool for understanding
 644 the analyzer's false positives, because it gives comprehensive information
 645 on every decision made by the analyzer across all analysis paths.</p>
 646
 647 <p>There are more debug checkers available. To see all available debug checkers:
 648 </p>
 649 <pre class="code">
 650     $ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
 651 </pre>
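<p>One debug checker that is often handy is <tt>debug.ExprInspection</tt>,
which reacts to calls to specially named functions such as
<tt>clang_analyzer_eval</tt>. The sketch below is a hypothetical input: the
function <tt>classify</tt> is made up, and the empty definition of
<tt>clang_analyzer_eval</tt> is an assumption added only so that the file also
builds as an ordinary program (analyzer regression tests normally just declare
it). When analyzed with <tt>-analyzer-checker=core,debug.ExprInspection</tt>,
the first call would be expected to be reported as <tt>TRUE</tt> and the second
as <tt>UNKNOWN</tt>:</p>

```cpp
// Hypothetical input for debug.ExprInspection.
// The empty body is an assumption so the file also links as a normal program.
void clang_analyzer_eval(bool) {}

int classify(int x) {
  if (x > 10) {
    clang_analyzer_eval(x > 5); // on this path x > 10, so x > 5 must hold
    return 1;
  }
  clang_analyzer_eval(x > 5); // here x <= 10, so x > 5 is not decided
  return 0;
}
```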
 652
 653 <h3 id=debugprints>Debug Prints and Tricks</h3>
 654
 655 <p>To view "half-baked" <tt>ExplodedGraph</tt> while debugging, jump to a frame
 656 that has <tt>clang::ento::ExprEngine</tt> object and execute:</p>
 657 <pre class="code">
 658     (gdb) <b>p ViewGraph(0)</b>
 659 </pre>
 660
<p>To see the <tt>ProgramState</tt> while debugging, use the following command:</p>
 662 <pre class="code">
 663     (gdb) <b>p State->dump()</b>
 664 </pre>
 665
<p>To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the
source code.</p>
 669 <pre class="code">
 670     (gdb) <b>p E->dump()</b>
 671 </pre>
 672
<p>To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs
to:</p>
 675 <pre class="code">
 676     (gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
 677 </pre>
 678
 679 <h2 id=links>Making Your Checker Better</h2>
 680 <ul>
<li>User-facing documentation is important for adoption! Make sure the <a href="/available_checks.html">checker list</a> on the
    analyzer's homepage is updated. Also ensure the description in <tt>Checkers.td</tt> is clear to
    non-analyzer-developers.</li>
 684 <li>Warning and note messages should be clear and easy to understand, even if a bit long.</li>
 685 <ul>
 686   <li>Messages should start with a capital letter (unlike Clang warnings!) and should not
 687       end with <tt>.</tt>.</li>
  <li>Articles are usually omitted, e.g. <tt>Dereference of a null pointer</tt> -&gt;
      <tt>Dereference of null pointer</tt>.</li>
 690   <li>Introduce <tt>BugReporterVisitor</tt>s to emit additional notes that explain the warning
 691       to the user better. There are some existing visitors that might be useful for your check,
 692       e.g. <tt>trackNullOrUndefValue</tt>. For example, SimpleStreamChecker should highlight
 693       the event of opening the file when reporting a file descriptor leak.</li>
 694 </ul>
<li>If the check tracks anything in the program state, it needs to implement the
    <tt>checkDeadSymbols</tt> callback to clean the state up.</li>
<li>The check should conservatively assume that the program is correct when a tracked symbol
    is passed to a function that is unknown to the analyzer.
    The <tt>checkPointerEscape</tt> callback can help you handle that case.</li>
 700 <li>Use safe and convenient APIs!</li>
 701 <ul>
 702   <li>Always use <tt>CheckerContext::generateErrorNode</tt> and
 703     <tt>CheckerContext::generateNonFatalErrorNode</tt> for emitting bug reports.
    Most importantly, never emit a report against <tt>CheckerContext::getPredecessor</tt>.</li>
 705   <li>Prefer <tt>checkPreCall</tt> and <tt>checkPostCall</tt> to
 706     <tt>checkPreStmt&lt;CallExpr&gt;</tt> and <tt>checkPostStmt&lt;CallExpr&gt;</tt>.</li>
 707   <li>Use <tt>CallDescription</tt> to detect hardcoded API calls in the program.</li>
 708   <li>Simplify <tt>C.getState()->getSVal(E, C.getLocationContext())</tt> to <tt>C.getSVal(E)</tt>.</li>
 709 </ul>
 710 <li>Common sources of crashes:</li>
 711 <ul>
  <li><tt>CallEvent::getOriginExpr</tt> is nullable - for example, it returns null for an
    automatic destructor of a variable. The same applies to some values generated while the
    call was modeled, e.g. <tt>SymbolConjured::getStmt</tt> is nullable.</li>
  <li><tt>CallEvent::getDecl</tt> is nullable - for example, it returns null for a
      call through a symbolic function pointer.</li>
 717   <li><tt>addTransition</tt>, <tt>generateSink</tt>, <tt>generateNonFatalErrorNode</tt>,
 718     <tt>generateErrorNode</tt> are nullable because you can transition to a node that you have already visited.</li>
 719   <li>Methods of <tt>CallExpr</tt>/<tt>FunctionDecl</tt>/<tt>CallEvent</tt> that
 720     return arguments crash when the argument is out-of-bounds. If you checked the function name,
 721     it doesn't mean that the function has the expected number of arguments!
 722     Which is why you should use <tt>CallDescription</tt>.</li>
 723   <li>Nullability of different entities within different kinds of symbols and regions is usually
 724       documented via assertions in their constructors.</li>
  <li><tt>NamedDecl::getName</tt> will fail if the name of the declaration is not a single token,
    e.g. for destructors. You could use <tt>NamedDecl::getNameAsString</tt> for those cases.
    Note that this method is much slower and should be used sparingly, e.g. only when generating reports
    but not during analysis.</li>
  <li>Is <tt>-analyzer-checker=core</tt> included in all test <tt>RUN:</tt> lines? Running the analyzer
    with the core checks disabled has never been supported. It might cause unexpected behavior and
    crashes. You should do all your testing with the core checks enabled.</li>
 732 </ul>
 733 </ul>
 734 <li>Patterns that you should most likely avoid even if they're not technically wrong:</li>
 735 <ul>
 736   <li><tt>BugReporterVisitor</tt> should most likely not match the AST of the current program point
 737       to decide when to emit a note. It is much easier to determine that by observing changes in
 738       the program state.</li>
 739   <li>In <tt>State->getSVal(Region)</tt>, if <tt>Region</tt> is not known to be a <tt>TypedValueRegion</tt>
 740       and the optional type argument is not specified, the checker may accidentally try to dereference a
 741       void pointer.</li>
 742   <li>Checker logic should not depend on whether a certain value is a <tt>Loc</tt> or <tt>NonLoc</tt>.
 743     It should be immediately obvious whether the <tt>SVal</tt> is a <tt>Loc</tt> or a
 744     <tt>NonLoc</tt> depending on the AST that is being checked. Checking whether a value
 745     is <tt>Loc</tt> or <tt>Unknown</tt>/<tt>Undefined</tt> or whether the value is
 746     <tt>NonLoc</tt> or <tt>Unknown</tt>/<tt>Undefined</tt> is totally fine.</li>
 747   <li>New symbols should not be constructed in the checker via direct calls to <tt>SymbolManager</tt>,
 748     unless they are of <tt>SymbolMetadata</tt> class tagged by the checker,
 749     or they represent newly created values such as the return value in <tt>evalCall</tt>.
 750     For modeling arithmetic/bitwise/comparison operations, <tt>SValBuilder</tt> should be used.</li>
 751   <li>Custom <tt>ProgramPointTag</tt>s should not be created within the checker. There is usually
 752     no good reason for a checker to chain multiple nodes together, because checkers aren't worklists.</li>
 753 </ul>
<li>Checkers are encouraged to actively participate in the analysis by sharing
  their knowledge about the program state with the rest of the analyzer,
  but they should not disrupt the analysis unnecessarily:</li>
 757 <ul>
 758   <li>If a checker splits program state, this must be based on knowledge that
 759     the newly appearing branches are definitely possible and worth exploring
 760     from the user's perspective. Otherwise the state split should be delayed
 761     until there's an indication that one of the paths is taken, or one of the
 762     paths needs to be dropped entirely. For example, it is fine to eagerly split
 763     paths while modeling <tt>isalpha(x)</tt> as long as <tt>x</tt> is constrained accordingly on
 764     each path. At the same time, it is not a good idea to split paths over the
 765     return value of <tt>printf()</tt> while modeling the call because nobody ever checks
 766     for errors in <tt>printf</tt>; at best, it'd just double the remaining analysis time.
 767   </li>
  <li>Caution is advised when using <tt>CheckerContext::generateNonFatalErrorNode</tt>
    because it generates an independent transition, much like <tt>addTransition</tt>.
    It is easy to accidentally split paths while using it. Ideally, try to
    structure the code so that it is obvious that every <tt>addTransition</tt> or
    <tt>generateNonFatalErrorNode</tt> (or sequence of such if the split is intended) is
    immediately followed by a return from the checker callback.</li>
 774   <li>Multiple implementations of <tt>evalCall</tt> in different checkers should not conflict.</li>
 775   <li>When implementing <tt>evalAssume</tt>, the checker should always return a non-null state
 776       for either the true assumption or the false assumption (or both).</li>
 777   <li>Checkers shall not mutate values of expressions, i.e. use the <tt>ProgramState::BindExpr</tt> API,
 778     unless they are fully responsible for computing the value.
 779     Under no circumstances should they change non-<tt>Unknown</tt> values of expressions.
 780     Currently the only valid use case for this API in checkers is to model the return value in the <tt>evalCall</tt> callback.
 781     If expression values are incorrect, <tt>ExprEngine</tt> needs to be fixed instead.</li>
 782 </ul>
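<p>The argument-count pitfall mentioned above can be illustrated without any
Clang headers. The following is a plain C++ model, not the analyzer API:
<tt>FakeCall</tt>, <tt>FakeCallDescription</tt> and <tt>checkedArg</tt> are
hypothetical stand-ins invented for this sketch. It only shows the idea of
matching on both the name and the number of arguments before touching an
argument, which is the kind of check <tt>CallDescription</tt> is recommended
for in real checkers:</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a call seen by a checker; NOT a Clang type.
struct FakeCall {
  std::string name;
  std::vector<int> args;
};

// Stand-in for a call description: a name plus the required argument count.
struct FakeCallDescription {
  std::string name;
  std::size_t required_args;

  // A call matches only if both the name and the argument count agree.
  bool matches(const FakeCall &call) const {
    return call.name == name && call.args.size() == required_args;
  }
};

// Stores argument `index` in `out` and returns true only when the call
// matches the description and the index is in bounds.
bool checkedArg(const FakeCallDescription &desc, const FakeCall &call,
                std::size_t index, int &out) {
  if (!desc.matches(call) || index >= call.args.size())
    return false;
  out = call.args[index];
  return true;
}
```

<p>A call that merely shares the expected name but has too few arguments is
rejected by <tt>matches</tt>, so the out-of-bounds access never happens.</p>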
 783
 784 <h2 id=additioninformation>Additional Sources of Information</h2>
 785
 786 Here are some additional resources that are useful when working on the Clang
 787 Static Analyzer:
 788
 789 <ul>
<li><a href="https://lcs.ios.ac.cn/~xzx/memmodel.pdf">Xu, Zhongxing &amp;
Kremenek, Ted &amp; Zhang, Jian. (2010). A Memory Model for Static Analysis of C
Programs.</a></li>
 793 <li><a href="https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/README.txt">
 794 The Clang Static Analyzer README</a></li>
 795 <li><a href="https://github.com/llvm/llvm-project/blob/main/clang/docs/analyzer/developer-docs/RegionStore.rst">
 796 Documentation for how the Store works</a></li>
 797 <li><a href="https://github.com/llvm/llvm-project/blob/main/clang/docs/analyzer/developer-docs/IPA.rst">
 798 Documentation about inlining</a></li>
 799 <li> The "Building a Checker in 24 hours" presentation given at the <a
 800 href="https://llvm.org/devmtg/2012-11">November 2012 LLVM Developer's
 801 meeting</a>. Describes the construction of SimpleStreamChecker. <a
 802 href="https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a>
 803 and <a
 804 href="https://youtu.be/kdxlsP5QVPw">video</a>
 805 are available.</li>
 806 <li>
 807 <a href="https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf">
Artem Dergachev: Clang Static Analyzer: A Checker Developer's Guide
 809 </a> (reading the previous items first might be a good idea)</li>
 810 <li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li>
 811 <li> <a href="https://clang.llvm.org/doxygen">Clang doxygen</a>. Contains
 812 up-to-date documentation about the APIs available in Clang. Relevant entries
 813 have been linked throughout this page. Also of use is the
 814 <a href="https://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes
 815 from LLVM.</li>
 816 <li>
  The <a href="https://discourse.llvm.org/c/clang/"> Clang Frontend Discourse site</a>.
  This is the primary forum for discussing ideas and posting questions about Clang development.
  For posting Clang Static Analyzer specific questions, please visit the
  <a href="https://discourse.llvm.org/c/clang/static-analyzer/"> Static Analyzer subcategory</a>
  of the same site. In the past, Static Analyzer discussions took place on the
  <a href="https://lists.llvm.org/pipermail/cfe-dev/"> cfe-dev</a> mailing list, which is now
  archived and superseded by the mentioned Discourse site.
 824 </li>
 825 </ul>
 826
 827 </div>
 828 </div>
 829 </body>
 830 </html>