# Data flow analysis: an informal introduction
This document introduces data flow analysis in an informal way. The goal is to
give the reader an intuitive understanding of how it works, and show how it
applies to a range of refactoring and bug finding problems.
Data flow analysis is a well-established technique; it is described in many
papers, books, and videos. If you would like a more formal or more thorough
explanation of the concepts mentioned in this document, please refer to the
following resources:
* [The Lattice article in Wikipedia](https://en.wikipedia.org/wiki/Lattice_\(order\)).
* Videos on the PacketPrep YouTube channel that introduce lattices and the
  necessary background information:
  [#20](https://www.youtube.com/watch?v=73j_FXBXGm8),
  [#21](https://www.youtube.com/watch?v=b5sDjo9tfE8),
  [#22](https://www.youtube.com/watch?v=saOG7Uooeho),
  [#23](https://www.youtube.com/watch?v=3EAYX-wZH0g),
  [#24](https://www.youtube.com/watch?v=KRkHwQtW6Cc),
  [#25](https://www.youtube.com/watch?v=7Gwzsc4rAgw).
* [Introduction to Dataflow Analysis](https://www.youtube.com/watch?v=OROXJ9-wUQE)
* [Introduction to abstract interpretation](http://www.cs.tau.ac.il/~msagiv/courses/asv/absint-1.pdf).
* [Introduction to symbolic execution](https://www.cs.umd.edu/~mwh/se-tutorial/symbolic-exec.pdf).
* [Static Program Analysis by Anders Møller and Michael I. Schwartzbach](https://cs.au.dk/~amoeller/spa/).
* [EXE: automatically generating inputs of death](https://css.csail.mit.edu/6.858/2020/readings/exe.pdf)
  (a paper that successfully applies symbolic execution to real-world
  software).
### The purpose of data flow analysis

Data flow analysis is a static analysis technique that proves facts about a
program or its fragment. It can make conclusions about all paths through the
program, while taking control flow into account and scaling to large programs.
The basic idea is propagating facts about the program through the edges of the
control flow graph (CFG) until a fixpoint is reached.

### Sample problem and an ad-hoc solution
We would like to explain data flow analysis by discussing an example. Let's
imagine that we want to track the possible values of an integer variable in our
program. Here is how a human could annotate the code:
We use sets of integers to represent possible values of `x`. Local variables
have unambiguous values between statements, so we annotate program points
between statements with sets of possible values.

Here is how we arrived at these annotations. Assigning a constant to `x` allows
us to conclude that `x` can only have one value. When control flow from the
"then" and "else" branches joins, `x` can have either value.
Abstract algebra provides a nice formalism that models this kind of structure,
namely, a lattice. A join-semilattice is a partially ordered set in which every
two elements have a least upper bound (called a *join*):

```
join(a, b) ⩾ a and join(a, b) ⩾ b and join(x, x) = x
```
For this problem we will use the lattice of subsets of integers, with the set
inclusion relation as ordering and set union as the join.
Lattices are often represented visually as Hasse diagrams. Here is a Hasse
diagram for our lattice that tracks subsets of integers:

![Hasse diagram for a lattice of integer sets](DataFlowAnalysisIntroImages/IntegerSetsInfiniteLattice.svg)
Computing the join in the lattice corresponds to finding the lowest common
ancestor (LCA) between two nodes in its Hasse diagram. There is a vast amount
of literature on efficiently implementing LCA queries for a DAG; however,
Efficient Implementation of Lattice Operations (1989)
([CiteSeerX](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.4911),
[doi](https://doi.org/10.1145%2F59287.59293)) describes a scheme that is
particularly well-suited for programmatic implementation.
### Too much information and "top" values

Let's try to find the possible sets of values of `x` in a function that
modifies `x` in a loop:

```
void ExampleOfInfiniteSets() {
  int x = 0; // x is {0}
  while (condition()) {
    x += 1; // x is {0; 1; 2; …}
  }
  print(x); // x is {0; 1; 2; …}
}
```
We have an issue: if the program operated on mathematical integers, `x` could
have any non-negative value, which is an infinite set. In C++ `int` is limited
by `INT_MAX`, so technically we have the set `{0; 1; …; INT_MAX}`, which is
finite but still impractically large to track.
To make our analysis practical to compute, we have to limit the amount of
information that we track. In this case, we can, for example, arbitrarily limit
the size of sets to 3 elements. If at a certain program point `x` has more than
3 possible values, we stop tracking specific values at that program point.
Instead, we denote the possible values of `x` with the symbol `⊤` (pronounced
"top" according to a convention in abstract algebra).

```
void ExampleOfTopWithALoop() {
  int x = 0; // x is {0}
  while (condition()) {
    x += 1; // x is ⊤
  }
  print(x); // x is ⊤
}
```
The statement "at this program point, `x`'s possible values are `⊤`" is
understood as "at this program point `x` can have any value because we have too
much information, or the information is conflicting".

Note that we can get more than 3 possible values even without a loop:
```
void ExampleOfTopWithoutLoops(int n) {
  int x = 0; // x is {0}
  switch (n) {
  case 0: x = 1; break; // x is {1}
  case 1: x = 9; break; // x is {9}
  case 2: x = 7; break; // x is {7}
  default: x = 3; break; // x is {3}
  }
  // x is ⊤
}
```
### Uninitialized variables and "bottom" values

When `x` is declared but not initialized, it has no possible values. We
represent this fact symbolically as `⊥` (pronounced "bottom").

```
void ExampleOfBottom() {
  int x; // x is ⊥
  x = 42; // x is {42}
  print(x); // x is {42}
}
```
Note that using values read from uninitialized variables is undefined behavior
in C++. Generally, compilers and static analysis tools can assume undefined
behavior does not happen. We must model uninitialized variables only when we
are implementing a checker that specifically tries to find uninitialized reads.
In this example we show how to model uninitialized variables only to
demonstrate the concept of "bottom", and how it applies to possible value
analysis. We describe an analysis that finds uninitialized reads in a section
below.
### A practical lattice that tracks sets of concrete values

Taking into account all corner cases covered above, we can put together a
lattice that we can use in practice to track possible values of integer
variables. This lattice represents sets of integers with 1, 2, or 3 elements,
as well as top and bottom. Here is a Hasse diagram for it:

![Hasse diagram for a lattice of integer sets](DataFlowAnalysisIntroImages/IntegerSetsFiniteLattice.svg)

Let's consider a slightly more complex example, and think about how we can
compute the sets of possible values algorithmically.
```
void Example(int n) {
  int x = 0; // x is {0}
  if (n > 0) {
    x = 44; // x is {44}
  } else {
    x = 5; // x is {5}
  }
  print(x); // x is {44; 5}
}
```
As humans, we understand the control flow from the program text. We used our
understanding of control flow to find program points where two flows join.
Formally, control flow is represented by a CFG (control flow graph):

![CFG for the code above](DataFlowAnalysisIntroImages/CFGExample.svg)
We can compute sets of possible values by propagating them through the CFG of
the function:

* When `x` is declared but not initialized, its possible values are `{}`. The
  empty set plays the role of `⊥` in this lattice.

* When `x` is assigned a concrete value, its possible set of values contains
  just that specific value.

* When `x` is assigned some unknown value, it can have any value. We represent
  this fact as `⊤`.

* When two control flow paths join, we compute the set union of incoming
  values (limiting the number of elements to 3, representing larger sets as
  `⊤`).
The sets of possible values are influenced by:

* Statements, for example, assignments.

* Joins in control flow, for example, ones that appear at the end of "if"
  statements and loops.
**Effects of statements** are modeled by what is formally known as a transfer
function. A transfer function takes two arguments: the statement, and the state
of `x` at the previous program point. It produces the state of `x` at the next
program point. For example, the transfer function for assignment ignores the
state at the previous program point:
```
// GIVEN: x is {42; 44}
x = 0;
// CONCLUSION: x is {0}
```
The transfer function for `+` performs arithmetic on every set member:

```
// GIVEN: x is {42; 44}
x += 100;
// CONCLUSION: x is {142; 144}
```
**Effects of control flow** are modeled by joining the knowledge from all
possible previous program points:

```
if (...) {
  x = 42;
  // GIVEN: x is {42}
} else {
  x = 44;
  // GIVEN: x is {44}
}
// CONCLUSION: x is {42; 44}
```
The predicate that we marked "given" is usually called a precondition, and the
conclusion is called a postcondition.

In terms of the CFG, we join the information from all predecessor basic blocks.

![Modeling the effects of a CFG basic block](DataFlowAnalysisIntroImages/CFGJoinRule.svg)

Putting it all together, to model the effects of a basic block we compute:

```
out = transfer(basic_block, join(in_1, in_2, ..., in_n))
```
(Note that there are other ways to write this equation that produce higher
precision analysis results. The trick is to keep exploring the execution paths
separately and delay joining until later. However, we won't discuss those
variations here.)

To make a conclusion about all paths through the program, we repeat this
computation on all basic blocks until we reach a fixpoint. In other words, we
keep propagating information through the CFG until the computed sets of values
stop changing.
If the lattice has a finite height and the transfer functions are monotonic,
the algorithm is guaranteed to terminate. Each iteration of the algorithm can
change computed values only to larger values from the lattice. In the worst
case, all computed values become `⊤`, which is not very useful, but at least
the analysis terminates at that point, because it can't change any of the
values.
Fixpoint iteration can be optimized by only reprocessing basic blocks whose
inputs changed on the previous iteration. This is typically implemented using a
worklist queue. With this optimization the time complexity becomes
`O(m * |L|)`, where `m` is the number of basic blocks in the CFG and `|L|` is
the size of the lattice used by the analysis.
## Symbolic execution: a very short informal introduction

In the previous example where we tried to figure out what values a variable can
have, the analysis had to be seeded with a concrete value. What if there are no
assignments of concrete values in the program? We can still deduce some
interesting information by representing unknown input values symbolically, and
computing results as symbolic expressions:
```
void PrintAbs(int x) {
  int result;
  if (x >= 0) {
    result = x; // result is {x}
  } else {
    result = -x; // result is {-x}
  }
  print(result); // result is {x; -x}
}
```

We can't say what specific value gets printed, but we know that it is either
`x` or `-x`.
Data flow analysis is an instance of abstract interpretation, and does not
dictate how exactly the lattice and transfer functions should be designed,
beyond the necessary conditions for the analysis to converge. Nevertheless, we
can use symbolic execution ideas to guide our design of the lattice and
transfer functions: lattice values can be symbolic expressions, and transfer
functions can construct more complex symbolic expressions from the symbolic
expressions that represent their arguments. See [this StackOverflow
discussion](https://cstheory.stackexchange.com/questions/19708/symbolic-execution-is-a-case-of-abstract-interpretation)
for a further comparison of abstract interpretation and symbolic execution.
A human can say about the previous example that the function returns `x` when
`x >= 0`, and `-x` when `x < 0`. We can make this conclusion programmatically
by tracking a flow condition. A flow condition is a predicate written in terms
of the program state that is true at a specific program point regardless of the
execution path that led to this statement. For example, the flow condition for
the program point right before evaluating `result = x` is `x >= 0`.

If we enhance the lattice to be a set of pairs of values and predicates, the
data flow analysis computes the following values:
```
void PrintAbs(int x) {
  int result;
  if (x >= 0) {
    // Flow condition: x >= 0.
    result = x; // result is {x if x >= 0}
  } else {
    // Flow condition: x < 0.
    result = -x; // result is {-x if x < 0}
  }
  print(result); // result is {x if x >= 0; -x if x < 0}
}
```
Of course, in a program with loops, symbolic expressions for flow conditions
can grow unbounded. A practical static analysis system must control this growth
to keep the symbolic representations manageable and ensure that the data flow
analysis terminates. For example, it can use a constraint solver to prune
impossible flow conditions, and/or it can abstract them, losing precision,
after their symbolic representations grow beyond some threshold. This is
similar to how we had to limit the sizes of computed sets of possible values to
3 elements.
### Symbolic pointers

This approach proves to be particularly useful for modeling pointer values,
since we don't care about specific addresses but just want to give a unique
identifier to a memory location.
```
void ExampleOfSymbolicPointers(bool b) {
  int x = 0;     // x is {0}
  int* ptr = &x; // x is {0}      ptr is {&x}
  if (b) {
    *ptr = 42;   // x is {42}     ptr is {&x}
  }
  print(x);      // x is {0; 42}  ptr is {&x}
}
```
## Example: finding output parameters

Let's explore how data flow analysis can help with a problem that is hard to
solve with other tools in Clang.

### Problem description
Output parameters are function parameters of pointer or reference type whose
pointee is completely overwritten by the function, and not read before it is
overwritten. They are common in pre-C++11 code due to the absence of move
semantics. In modern C++ output parameters are non-idiomatic, and return values
are used instead.

Imagine that we would like to refactor output parameters to return values to
modernize old code. The first step is to identify refactoring candidates
through static analysis.

For example, in the following code snippet the pointer `c` is an output
parameter:
```
struct Customer {
  int account_id;
  string name;
};

void GetCustomer(Customer *c) {
  c->account_id = ...;
  if (...) {
    c->name = ...;
  } else {
    c->name = ...;
  }
}
```
We would like to refactor this code into:

```
Customer GetCustomer() {
  Customer c;
  c.account_id = ...;
  if (...) {
    c.name = ...;
  } else {
    c.name = ...;
  }
  return c;
}
```
However, in the function below the parameter `c` is not an output parameter
because its field `name` is not overwritten on every path through the function:

```
void GetCustomer(Customer *c) {
  c->account_id = ...;
  if (...) {
    c->name = ...;
  }
}
```
The function also must not read the value of the parameter before overwriting
it:

```
void GetCustomer(Customer *c) {
  print(c->name); // Reads the pointee before overwriting it.
  c->account_id = ...;
  c->name = ...;
}
```
Functions that escape the pointer also block the refactoring:

```
Customer* kGlobalCustomer;

void GetCustomer(Customer *c) {
  c->account_id = ...;
  c->name = ...;
  kGlobalCustomer = c; // The pointer escapes.
}
```
To identify a candidate function for refactoring, we need to do the following:

* Find a function with a non-const pointer or reference parameter.

* Find the definition of that function.

* Prove that the function completely overwrites the pointee on all paths
  before returning.

* Prove that the function reads the pointee only after overwriting it.

* Prove that the function does not persist the pointer in a data structure
  that is live after the function returns.
There are also requirements that all usage sites of the candidate function must
satisfy, for example, that function arguments do not alias, that users are not
taking the address of the function, and so on. Let's consider verifying usage
site conditions to be a separate static analysis problem.
To analyze the function body we can use a lattice which consists of normal
states and failure states. A normal state describes program points where we are
sure that no behaviors that block the refactoring have occurred. Normal states
keep track of all of the parameter's member fields that are known to be
overwritten on every path from function entry to the corresponding program
point. Failure states accumulate observed violations (unsafe reads and pointer
escapes) that block the refactoring.
In the partial order of the lattice, failure states compare greater than normal
states, which guarantees that they "win" when joined with normal states. The
order between failure states is determined by the inclusion relation on the set
of accumulated violations (the lattice's `⩽` is `⊆` on the set of violations).
The order between normal states is determined by the reversed inclusion
relation on the set of overwritten parameter's member fields (the lattice's `⩽`
is `⊇` on the set of overwritten fields).
![Lattice for data flow analysis that identifies output parameters](DataFlowAnalysisIntroImages/OutputParameterIdentificationLattice.svg)

To determine whether a statement reads or writes a field we can implement
symbolic evaluation of `DeclRefExpr`s, `LValueToRValue` casts, the pointer
dereference operator, and `MemberExpr`s.
### Using data flow results to identify output parameters

Let's take a look at how we use data flow analysis to identify an output
parameter. The refactoring can be safely done when the data flow algorithm
computes a normal state with all of the fields proven to be overwritten in the
exit basic block of the function:
```
void GetCustomer(Customer* c) {
  // Overwritten: {}
  c->account_id = ...; // Overwritten: {c->account_id}
  if (...) {
    c->name = ...; // Overwritten: {c->account_id, c->name}
  } else {
    c->name = ...; // Overwritten: {c->account_id, c->name}
  }
  // Overwritten: {c->account_id, c->name}
}
```
When the data flow algorithm computes a normal state, but not all fields are
proven to be overwritten, we can't perform the refactoring:

```
void target(bool b, Customer* c) {
  // Overwritten: {}
  if (b) {
    c->account_id = 42; // Overwritten: {c->account_id}
  } else {
    c->name = "Konrad"; // Overwritten: {c->name}
  }
  // Overwritten: {}
}
```
Similarly, when the data flow algorithm computes a failure state, we also can't
perform the refactoring:

```
Customer* kGlobalCustomer;

void GetCustomer(Customer* c) {
  // Overwritten: {}
  c->account_id = ...; // Overwritten: {c->account_id}
  if (...) {
    print(c->name); // Unsafe read
  } else {
    kGlobalCustomer = c; // Pointer escape
  }
  // Unsafe read, Pointer escape
}
```
## Example: finding dead stores

Let's say we want to find redundant stores, because they indicate potential
bugs:

```
x = ...; // This value is never read.
x = ...;
print(x);
```

The first store to `x` is never read; probably there is a bug.
The implementation of dead store analysis is very similar to output parameter
analysis: we need to track stores and loads, and find stores that were never
read.
[Liveness analysis](https://en.wikipedia.org/wiki/Live_variable_analysis) is a
generalization of this idea, which is often used to answer many related
questions, for example:

* finding dead stores,
* finding uninitialized variables,
* finding a good point to deallocate memory,
* finding out if it would be safe to move an object.
## Example: definitive initialization

Definitive initialization proves that variables are known to be initialized
when read. If we find a variable which is read when not initialized, then we
generate a warning:

```
int x; // x is uninitialized
if (cond()) {
  x = 10; // x is initialized
} else {
  x = 20; // x is initialized
}
print(x); // x is initialized
```

However, if `x` is initialized only on some of the paths, reading it is
flagged:

```
int x; // x is uninitialized
if (cond()) {
  x = 10; // x is initialized
}
print(x); // x is maybe uninitialized, x is being read, report a bug.
```
For this purpose we can use a lattice in the form of a mapping from variable
declarations to initialization states; each initialization state is represented
by the following lattice:
![Lattice for definitive initialization analysis](DataFlowAnalysisIntroImages/DefinitiveInitializationLattice.svg)

A lattice element could also capture the source locations of the branches that
lead us to the corresponding program point. Diagnostics would use this
information to show a sample buggy code path to the user.
## Example: refactoring raw pointers to `unique_ptr`

Modern idiomatic C++ uses smart pointers to express memory ownership; however,
in pre-C++11 code one can often find raw pointers that own heap memory blocks.
Imagine that we would like to refactor raw pointers that own memory to
`unique_ptr`. There are multiple ways to design a data flow analysis for this
problem; let's look at one way to do it.
For example, we would like to refactor the following code that uses raw
pointers:

```
void UniqueOwnership1() {
  int *pi;
  pi = new int;
  if (...) {
    Borrow(pi);
    delete pi;
  } else {
    TakeOwnership(pi);
  }
}
```

into code that uses `unique_ptr`:
```
void UniqueOwnership1() {
  auto pi = std::make_unique<int>();
  if (...) {
    Borrow(pi.get());
  } else {
    TakeOwnership(pi.release());
  }
}
```
This problem can be solved with a lattice in the form of a map from value
declarations to pointer states:

![Lattice that identifies candidates for unique_ptr refactoring](DataFlowAnalysisIntroImages/UniquePtrLattice.svg)

We can perform the refactoring if at the exit of a function `pi` is in the
`Compatible` state:
```
void UniqueOwnership1() {
  int *pi; // pi is Compatible
  pi = new int; // pi is Defined
  if (...) {
    Borrow(pi); // pi is Defined
    delete pi; // pi is Compatible
  } else {
    TakeOwnership(pi); // pi is Compatible
  }
  // pi is Compatible
}
```
Let's look at an example where the raw pointer owns two different memory
blocks:

```
void UniqueOwnership2() {
  int *pi = new int; // pi is Defined
  Borrow(pi);
  delete pi; // pi is Compatible

  pi = new int; // pi is Defined
  Borrow(pi);
  delete pi; // pi is Compatible
}
```
It can be refactored to use `unique_ptr` like this:

```
void UniqueOwnership2() {
  auto pi = make_unique<int>();
  Borrow(pi.get());

  pi = make_unique<int>();
  Borrow(pi.get());
}
```
In the following example, the raw pointer is used to access the heap object
after the ownership has been transferred:

```
void UniqueOwnership3() {
  int *pi = new int; // pi is Defined
  if (...) {
    Borrow(pi);
    delete pi; // pi is Compatible
  } else {
    vector<unique_ptr<int>> v = {std::unique_ptr(pi)}; // pi is Compatible
    print(*pi);
  }
}
```
We can refactor this code to use `unique_ptr`, however we would have to
introduce a non-owning pointer variable, since we can't use the moved-from
`unique_ptr` to access the object:

```
void UniqueOwnership3() {
  std::unique_ptr<int> pi = std::make_unique<int>();
  if (...) {
    Borrow(pi.get());
  } else {
    int *pi_non_owning = pi.get();
    vector<unique_ptr<int>> v = {std::move(pi)};
    print(*pi_non_owning);
  }
}
```
If the original code didn't call `delete` at the very end of the function, then
our refactoring may change the point at which we run the destructor and release
memory. Specifically, if there is some user code after `delete`, then extending
the lifetime of the object until the end of the function may hold locks for
longer than necessary, introduce memory overhead, etc.
One solution is to always replace `delete` with a call to `reset()`, and then
perform another analysis that removes unnecessary `reset()` calls:

```
void AddedMemoryOverhead() {
  HugeObject *ho = new HugeObject();
  Use(ho);
  delete ho; // Release the large amount of memory quickly.
  LongRunningFunction();
}
```
This analysis will refuse to refactor code that mixes borrowed pointer values
and unique ownership. In the following code, `GetPtr()` returns a borrowed
pointer, which is assigned to `pi`. Then, `pi` is used to hold a uniquely-owned
pointer. We don't distinguish between these two assignments, and we want each
assignment to be paired with a corresponding sink; otherwise, we transition the
pointer to a `Conflicting` state, as in this example:

```
void ConflictingOwnership() {
  int *pi; // pi is Compatible
  pi = GetPtr(); // pi is Defined
  Borrow(pi); // pi is Defined

  pi = new int; // pi is Conflicting
  Borrow(pi); // pi is Conflicting
  delete pi; // pi is Conflicting
}
```
We could still handle this case by finding a maximal range in the code where
`pi` could be in the `Compatible` state, and only refactoring that part:

```
void ConflictingOwnership() {
  int *pi; // Keeps holding the borrowed pointer.
  pi = GetPtr();
  Borrow(pi);

  std::unique_ptr<int> pi_unique = std::make_unique<int>();
  Borrow(pi_unique.get());
}
```
## Example: finding redundant branch conditions

In the code below `b1` should not be checked in both the outer and inner "if"
statements. It is likely there is a bug in this code:
```
int F(bool b1, bool b2) {
  if (b1) {
    if (b1 && b2) { // Check `b1` again -- unnecessary!
      return 1;
    }
  }
  return 0;
}
```
A checker that finds this pattern syntactically is already implemented in
ClangTidy using AST matchers (`bugprone-redundant-branch-condition`).

To implement it using the data flow analysis framework, we can produce a
warning if any part of the branch condition is implied by the flow condition:
```
int F(bool b1, bool b2) {
  // Flow condition: true.
  if (b1) {
    // Flow condition: b1.
    if (b1 && b2) { // `b1` is implied by the flow condition.
      return 1;
    }
  }
  return 0;
}
```
One way to check this implication is to use a SAT solver. Without a SAT solver,
we could keep the flow condition in CNF form, which would make it easy to check
the implication.
## Example: finding unchecked `std::optional` unwraps

Calling `optional::value()` is only valid if `optional::has_value()` is true.
We want to show that when `x.value()` is executed, the flow condition implies
`x.has_value()`.
In the example below `x.value()` is accessed safely because it is guarded by
the `x.has_value()` check:

```
void Example(std::optional<int> &x) {
  if (x.has_value()) {
    print(x.value());
  }
}
```
When entering the "if" branch we deduce that `x.has_value()` is implied by the
flow condition:

```
void Example(std::optional<int> x) {
  // Flow condition: true.
  if (x.has_value()) {
    // Flow condition: x.has_value() == true.
    print(x.value());
  }
  // Flow condition: true.
}
```
We also need to prove that `x` is not modified between the check and the value
access. The modification of `x` may be very subtle:

```
void unknown_function(std::optional<int> &x);

void Example(std::optional<int> &x) {
  if (x.has_value()) {
    // Flow condition: x.has_value() == true.
    unknown_function(x); // may change x.
    // Flow condition: true.
    print(x.value()); // Can't prove that this access is safe.
  }
}
```
## Example: finding dead code behind A/B experiment flags

Finding dead code is a classic application of data flow analysis.

Unused A/B experiment flags hide dead code. However, this flavor of dead code
is invisible to the compiler because the flag can be turned on at any moment.

We could make a tool that deletes experiment flags. The user tells us which
flag they want to delete, and we assume that the flag's value is a given
constant.
For example, the user could use the tool to remove `example_flag` from this
code:

```
DEFINE_FLAG(std::string, example_flag, "", "A sample flag.");

void Example() {
  bool x = GetFlag(FLAGS_example_flag).empty();
  if (x) {
    ...
  } else {
    ...
  }
}
```

The tool would simplify the code to:

```
void Example() {
  ...
}
```
We can solve this problem with a classic constant propagation lattice combined
with symbolic evaluation.
## Example: finding inefficient usages of associative containers

Real-world code often accidentally performs repeated lookups in associative
containers:

```
map<int, Employee> xs;
xs[42].name = "...";
xs[42].title = "...";
```

To find the above inefficiency we can use the available expressions analysis to
understand that `xs[42]` is evaluated twice. The code can then be rewritten to
do the lookup once:
```
map<int, Employee> xs;
Employee &e = xs[42];
e.name = "...";
e.title = "...";
```
We can also track the `xs.contains()` check in the flow condition to find
redundant checks, like in the example below:

```
std::map<int, Employee> xs;
if (!xs.contains(42)) {
  xs.insert({42, someEmployee});
}
```
## Example: refactoring types that implicitly convert to each other

Refactoring one strong type to another is difficult, but the compiler can help:
once you refactor one reference to the type, the compiler will flag other
places where this information flows, with type mismatch errors. Unfortunately,
this strategy does not work when you are refactoring types that implicitly
convert to each other, for example, replacing `int32_t` with `int64_t`.
Imagine that we want to change user IDs from 32 to 64-bit integers. In other
words, we need to find all integers tainted with user IDs. We can use data flow
analysis to implement taint analysis.
```
void UseUser(int32_t user_id) {
  int32_t id = user_id;
  // Variable `id` is tainted with a user ID.
  ...
}
```

Taint analysis is very well suited to this problem because the program rarely
branches on user IDs, and almost certainly does not perform any computation on
them.