<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<title>Open Projects</title>
<link type="text/css" rel="stylesheet" href="menu.css">
<link type="text/css" rel="stylesheet" href="content.css">
<script type="text/javascript" src="scripts/menu.js"></script>
<!--#include virtual="menu.html.incl"-->
<h1>Open Projects</h1>
<p>This page lists several projects that would boost the analyzer's usability and
power. Most of the projects listed here are infrastructure-related, so this list
is an addition to the <a href="potential_checkers.html">potential checkers
list</a>. If you are interested in tackling one of these, please send an email
to the <a href="https://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev
mailing list</a> to notify other members of the community.</p>
<li>Release checkers from "alpha"
<p>New checkers which were contributed to the analyzer,
but have not passed a rigorous evaluation process,
are committed as "alpha checkers" (from "alpha version"),
and are not enabled by default.</p>
<p>Ideally, only the checkers which are actively being worked on should be in
alpha,
but over the years the development of many of those has stalled.
Such checkers should either be improved
up to a point where they can be enabled by default,
or removed from the analyzer entirely.
<li><code>alpha.security.ArrayBound</code> and
<code>alpha.security.ArrayBoundV2</code>
<p>Array bounds checking is a desired feature,
but having an acceptable rate of false positives might not be possible
without
<a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">loop widening</a> support.
Additionally, it might be more promising to perform index checking based on
<a href="https://en.wikipedia.org/wiki/Taint_checking">tainted</a> index values.
<p><i>(Difficulty: Medium)</i></p></p>
<li>Improve C++ support
<li>Handle construction as part of aggregate initialization.
<p><a href="https://en.cppreference.com/w/cpp/language/aggregate_initialization">Aggregates</a>
are objects that can be brace-initialized without calling a
constructor (that is, <code><a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructExpr.html">CXXConstructExpr</a></code> does not occur in the AST),
but potentially calling
constructors for their fields and base classes.
These
constructors of sub-objects need to know what object they are constructing.
Moreover, if the aggregate contains
references, lifetime extension needs to be properly modeled.
One can start untangling this problem by trying to replace the
current ad-hoc <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ParentMap.html">ParentMap</a></code> lookup in the
<a href="https://clang.llvm.org/doxygen/ExprEngineCXX_8cpp_source.html#l00430"><code>CXXConstructionKind::NonVirtualBase</code></a> branch of
<code>ExprEngine::VisitCXXConstructExpr()</code>
with proper support for the feature.
<p><i>(Difficulty: Medium)</i></p></p>
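<p>A minimal sketch of the situation described above. The types
<code>Field</code> and <code>Pair</code> are hypothetical and exist only for
this illustration: brace-initializing <code>Pair</code> calls no constructor of
<code>Pair</code> itself, yet the constructor of each field still has to run
and needs to know which sub-object it is building.</p>

```cpp
#include <cassert>

// Hypothetical types, named for this illustration only.
struct Field {
  int value;
  Field(int v) : value(v) {}  // user-provided constructor
};

// No user-declared constructors, no bases, all members public:
// this is an aggregate.
struct Pair {
  Field first;
  Field second;
};

int sumOfPair() {
  // Brace-initialization runs no constructor of Pair itself,
  // but Field::Field(int) runs once for each member.
  Pair p = {Field(1), Field(2)};
  return p.first.value + p.second.value;
}
```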
<li>Handle array constructors.
<p>When an array of objects is allocated (say, using
<code>operator new[]</code> or defining a stack array),
constructors for all elements of the array are called.
We should model (potentially some of) such evaluations,
and the same applies for destructors called from
<code>operator delete[]</code>.
See test cases in <a href="https://github.com/llvm/llvm-project/tree/main/clang/test/Analysis/handle_constructors_with_new_array.cpp">handle_constructors_with_new_array.cpp</a>.
Constructing an array requires invoking a (potentially unknown)
number of constructors with the same construct-expression.
Apart from the technical difficulties of juggling program points around
correctly to avoid accidentally merging paths together, we'll have to
decide when to exit the loop and how to widen it.
Given that the constructor is going to be a default constructor,
a nice 95% solution might be to execute exactly one constructor and
then default-bind the resulting LazyCompoundVal to the whole array;
it'll work whenever the default constructor doesn't touch global state
but only initializes the object to various default values.
But if, say, we're making an array of strings,
depending on the implementation you might have to allocate a new buffer
for each string, and in this case default-binding won't cut it.
We might want to come up with an auxiliary analysis in order to perform
widening of these simple loops more precisely.
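<p>As an illustration, here is a small sketch with a hypothetical type
<code>Counted</code> whose default constructor touches global state. It is
exactly the kind of constructor for which running it once and default-binding
the result to the whole array would lose information, because every element
construction has an observable side effect.</p>

```cpp
#include <cassert>

// Hypothetical type, named for this illustration only.
struct Counted {
  static int constructed;  // global state touched by the constructor
  int value;
  Counted() : value(0) { ++constructed; }
};
int Counted::constructed = 0;

int constructArrays() {
  Counted::constructed = 0;
  Counted onStack[3];              // three default-constructor calls
  Counted *onHeap = new Counted[4];  // four more, via operator new[]
  int total = Counted::constructed;
  delete[] onHeap;                 // four destructor calls
  (void)onStack;
  return total;
}
```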
<li>Handle constructors that can be elided due to Named Return Value Optimization (NRVO)
<p>Local variables which are returned by value on all return statements
may be stored directly at the address for the return value,
eliding the copy or move constructor call.
Such variables can be identified using the AST call <code>VarDecl::isNRVOVariable</code>.
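<p>A minimal sketch of an NRVO candidate, using a hypothetical
<code>Buffer</code> type. Whether the copy is actually elided is up to the
compiler, so the example only assumes that at most one copy happens.</p>

```cpp
#include <cassert>

// Hypothetical type, named for this illustration only.
struct Buffer {
  static int copies;  // counts copy-constructor calls
  int size = 0;
  Buffer() = default;
  Buffer(const Buffer &other) : size(other.size) { ++copies; }
};
int Buffer::copies = 0;

Buffer makeBuffer() {
  Buffer result;  // returned by every return statement: an NRVO candidate
  result.size = 42;
  return result;  // the copy into the return slot may be elided
}

int sizeOfMadeBuffer() { return makeBuffer().size; }
```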
<li>Handle constructors of lambda captures
<p>Variables which are captured by value into a lambda require a call to
a copy constructor.
This call is not currently modeled.
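<p>The following sketch, built around a hypothetical <code>Token</code> type,
shows the constructor call in question: capturing <code>t</code> by value
copy-constructs it into the closure object.</p>

```cpp
#include <cassert>

// Hypothetical type, named for this illustration only.
struct Token {
  static int copies;  // counts copy-constructor calls
  Token() = default;
  Token(const Token &) { ++copies; }
};
int Token::copies = 0;

int copiesMadeByCapture() {
  Token::copies = 0;
  Token t;
  // The by-value capture copy-constructs t into the closure object.
  auto lambda = [t]() { (void)t; };
  lambda();
  return Token::copies;
}
```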
<li>Handle constructors for default arguments
<p>Default arguments in C++ are recomputed at every call,
and are therefore local, and not static, variables.
See test cases in <a href="https://github.com/llvm/llvm-project/tree/main/clang/test/Analysis/handle_constructors_for_default_arguments.cpp">handle_constructors_for_default_arguments.cpp</a>.
Default arguments are annoying because the initializer expression is
evaluated at the call site but doesn't syntactically belong to the
caller's AST; instead it belongs to the ParmVarDecl for the default
parameter. This can lead to situations when the same expression has to
carry different values simultaneously -
when multiple instances of the same function are evaluated as part of the
same full-expression without specifying the default arguments.
Even simply calling the function twice (not necessarily within the
same full-expression) may lead to program points agglutinating because
it's the same expression. There are some nasty test cases already
in temporaries.cpp (struct DefaultParam and so on). I recommend adding a
new LocationContext kind specifically to deal with this problem. It'll
also help you figure out the construction context when you evaluate the
construct-expression (though you might still need to do some additional
CFG work to get construction contexts right).
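<p>A small sketch of the "same expression, different values" problem. The
names <code>Arg</code> and <code>idOf</code> are hypothetical. The single
default-argument expression <code>Arg()</code> is evaluated twice within one
full-expression and produces two distinct temporaries.</p>

```cpp
#include <cassert>

// Hypothetical type, named for this illustration only.
struct Arg {
  static int constructed;  // counts constructor calls
  int id;
  Arg() : id(++constructed) {}
};
int Arg::constructed = 0;

// The default argument is one expression in the AST, owned by the
// parameter declaration, but it is re-evaluated at every call site.
int idOf(const Arg &a = Arg()) { return a.id; }

int sumOfTwoCalls() {
  Arg::constructed = 0;
  // Two evaluations of the same default-argument expression in one
  // full-expression: the temporaries get ids 1 and 2 (in either order).
  return idOf() + idOf();
}
```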
<li>Enhance the modeling of the standard library.
<p>The analyzer needs a better understanding of the STL in order to be more
useful on C++ codebases.
While full library modeling is not an easy task,
large gains can be achieved by supporting only a few cases:
e.g. calling <code>.length()</code> on an empty
<code>std::string</code> always yields zero.
<p><i>(Difficulty: Medium)</i></p><p>
<li>Enhance CFG to model exception-handling.
<p>Currently exceptions are treated as "black holes", and exception-handling
control structures are poorly modeled in order to be conservative.
This could be improved for both C++ and Objective-C exceptions.
<p><i>(Difficulty: Hard)</i></p></p>
<li>Core Analyzer Infrastructure
<p>Currently in the analyzer the value of a union is always regarded as
unknown.
This problem was previously <a href="https://lists.llvm.org/pipermail/cfe-dev/2017-March/052864.html">discussed</a>
on the mailing list, but no solution was implemented.
<p><i>(Difficulty: Medium)</i></p></p>
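<p>A minimal sketch of code affected by this limitation, with a hypothetical
union <code>IntOrFloat</code>. At run time the function always returns 7, but
an analysis that does not track the value stored in a union cannot conclude
that.</p>

```cpp
#include <cassert>

// Hypothetical union, named for this illustration only.
union IntOrFloat {
  int asInt;
  float asFloat;
};

int storeAndLoad() {
  IntOrFloat u;
  u.asInt = 7;     // store through one member...
  return u.asInt;  // ...and read the same member back
}
```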
<li>Floating-point support.
<p>Currently, the analyzer treats all floating-point values as unknown.
This project would involve adding a new <code>SVal</code> kind
for constant floats, generalizing the constraint manager to handle floats,
and auditing existing code to make sure it doesn't
make incorrect assumptions (most notably, that <code>X == X</code>
is always true, since it does not hold for <code>NaN</code>).
<p><i>(Difficulty: Medium)</i></p></p>
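<p>The <code>NaN</code> pitfall mentioned above can be seen in a few lines. The
helper <code>equalsItself</code> is a hypothetical name; the sketch assumes
IEEE 754 semantics (that is, no fast-math style compiler options).</p>

```cpp
#include <cassert>
#include <limits>

// Returns false only when x is NaN, under IEEE 754 comparison rules.
bool equalsItself(double x) { return x == x; }

bool nanEqualsItself() {
  return equalsItself(std::numeric_limits<double>::quiet_NaN());
}
```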
<li>Improved loop execution modeling.
<p>The analyzer simply unrolls each loop <tt>N</tt> times before
dropping the path, for a fixed constant <tt>N</tt>.
However, that results in lost coverage in cases where the loop always
executes more than <tt>N</tt> times.
A Google Summer of Code
<a href="https://summerofcode.withgoogle.com/archive/2017/projects/6071606019358720/">project</a>
was completed to make the loop bound parameterizable,
but the <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">widening</a>
problem still remains open.
<p><i>(Difficulty: Hard)</i></p></p>
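<p>A sketch of code where coverage is lost. The loop below always runs 100
times, so for any unroll bound <tt>N</tt> smaller than 100 every path is
dropped before the loop exits, and the statement after the loop is never
analyzed. The function name is hypothetical.</p>

```cpp
#include <cassert>

int lastElementAfterFill() {
  int buffer[100];
  // Always executes exactly 100 iterations.
  for (int i = 0; i < 100; ++i)
    buffer[i] = i;
  // Only reachable after all 100 iterations have been modeled.
  return buffer[99];
}
```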
<li>Basic function summarization support
<p>The analyzer performs inter-procedural analysis using
either inlining or "conservative evaluation" (invalidating all data
passed to the function).
Often, a very simple summary
(e.g. "this function is <a href="https://en.wikipedia.org/wiki/Pure_function">pure</a>") would be
enough to be a large improvement over conservative evaluation.
Such summaries could be obtained either syntactically,
or using a dataflow framework.
<p><i>(Difficulty: Hard)</i></p><p>
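<p>To illustrate what a "pure" summary would buy, consider the sketch below.
The names are hypothetical, and the reader should imagine that the body of
<code>square</code> lives in another translation unit, so it cannot be inlined.
Knowing only that <code>square</code> is pure would be enough to keep the
known value of <code>counter</code> across the call.</p>

```cpp
#include <cassert>

int counter = 0;  // global state visible to any opaque callee

// Pure: the result depends only on x, and nothing else is read or written.
// Imagine this definition is not visible at the call site.
int square(int x) { return x * x; }

int useSquare() {
  counter = 5;
  int r = square(3);
  // With a "pure" summary for square(), counter is still known to be 5.
  return counter + r;
}
```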
<li>Implement a dataflow framework.
<p>The analyzer
implements a <a href="https://en.wikipedia.org/wiki/Symbolic_execution">symbolic execution</a>
engine, which performs checks
(use-after-free, uninitialized value read, etc.)
over a <em>single</em> program path.
However, many useful properties
(dead code, check-after-use, etc.) require
reasoning over <em>all</em> possible paths in a program.
Such reasoning requires a
<a href="https://en.wikipedia.org/wiki/Data-flow_analysis">dataflow analysis</a> framework.
Clang already implements
a few dataflow analyses (most notably, liveness),
but they are implemented in an ad-hoc fashion.
A proper framework would enable us to write many more useful checkers.
<p><i>(Difficulty: Hard)</i></p></p>
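<p>As an example of a check-after-use pattern, consider this sketch with
hypothetical function names. The null check comes after the pointer has
already been dereferenced on every path that reaches it, which is the kind of
all-paths fact a dataflow analysis could establish.</p>

```cpp
#include <cassert>

int checkAfterUse(int *p) {
  int v = *p;        // use: p is dereferenced unconditionally
  if (p == nullptr)  // check: too late, p was already dereferenced
    return -1;
  return v;
}

// Calls the function with a valid pointer only.
int callWithValidPointer() {
  int x = 5;
  return checkAfterUse(&x);
}
```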
<li>Track type information through casts more precisely.
<p>The <code>DynamicTypePropagation</code>
checker is in charge of inferring a region's
dynamic type based on what operations the code is performing.
Casts are a rich source of type information that the analyzer currently ignores.
<p><i>(Difficulty: Medium)</i></p></p>
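<p>A sketch of the kind of information a cast carries, using hypothetical
classes <code>Base</code> and <code>Derived</code>. In well-defined code, the
<code>static_cast</code> below implies that the object behind <code>b</code>
is a <code>Derived</code> (or a subclass of it), which could narrow down the
target of the following virtual call.</p>

```cpp
#include <cassert>

// Hypothetical class hierarchy, named for this illustration only.
struct Base {
  virtual int id() const { return 0; }
  virtual ~Base() = default;
};
struct Derived : Base {
  int id() const override { return 1; }
};

int idViaCast(Base *b) {
  // The downcast is only valid if *b really is a Derived object.
  Derived *d = static_cast<Derived *>(b);
  return d->id();
}

int callWithDerived() {
  Derived object;
  return idViaCast(&object);
}
```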
<li>Fixing miscellaneous bugs
<p>Apart from the open projects listed above,
contributors are welcome to fix any of the outstanding
<a href="https://bugs.llvm.org/buglist.cgi?component=Static%20Analyzer&list_id=147756&product=clang&resolution=---">bugs</a>.
<p><i>(Difficulty: Anything)</i></p></p>