tools/fuzzing/docs/index.rst

   1 Fuzzing
   2 =======
   3
   4 .. toctree::
   5   :maxdepth: 1
   6   :hidden:
   7   :glob:
   8   :reversed:
   9
  10   *
  11
  12 This section focuses on explaining the software testing technique called
  13 “Fuzzing” or “Fuzz Testing” and its application to the Mozilla codebase.
  14 The overall goal is to educate developers about the capabilities and
  15 usefulness of fuzzing and also allow them to write their own fuzzing
  16 targets. Note that not all fuzzing tools used at Mozilla are open
  17 source. Some tools are for internal use only because they can easily
  18 find critical security vulnerabilities.
  19
  20 What is Fuzzing?
  21 ----------------
  22
  23 Fuzzing (or Fuzz Testing) is a technique to randomly use a program or
  24 parts of it with the goal to uncover bugs. Random usage can take a wide
  25 variety of forms; a few common ones are
  26
  27 -  random input data (e.g. file formats, network data, source code, etc.)
  28
  29 -  random API usage
  30
  31 -  random UI interaction
  32
  33 with the first two being the most practical methods used in the field.
  34 Of course, these methods are not entirely separate; combinations are
  35 possible. Fuzzing is a great way to find quality issues, some of which
  36 are also security issues.
  37
  38 Random input data
  39 ~~~~~~~~~~~~~~~~~
  40
  41 This is probably the most obvious fuzzing method: You have code that
  42 processes data and you provide it with random or mutated data, hoping
  43 that it will uncover bugs in your implementation. Examples are media
  44 formats like JPEG or H.264, but basically anything that involves
  45 processing a “blob” of data can be a valuable target. Countless security
  46 vulnerabilities in a variety of libraries and programs have been found
  47 using this method (the AFLFuzz
  48 `bug-o-rama <http://lcamtuf.coredump.cx/afl/#bugs>`__ gives a good
  49 impression).
  50
  51 Common tools for this task include
  52 `libFuzzer <https://llvm.org/docs/LibFuzzer.html>`__ and
  53 `AFLFuzz <http://lcamtuf.coredump.cx/afl/>`__, as well as specialized
  54 tools with custom logic like
  55 `LangFuzz <https://www.usenix.org/system/files/conference/usenixsecurity12/sec12-final73.pdf>`__
  56 and `Avalanche <https://github.com/MozillaSecurity/avalanche>`__.
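
The mutation idea described above can be sketched in a few lines. This is a
toy illustration only, not how libFuzzer or AFLFuzz work internally: the
``parse_record`` format, its deliberate bug and all function names are
assumptions made up for this example.

```python
import random


def parse_record(data):
    """Toy parser (hypothetical): 1-byte length prefix followed by a payload."""
    if not data:
        raise ValueError("empty input")  # expected, well-defined rejection
    length = data[0]
    payload = data[1:1 + length]
    # Deliberate bug for the demo: the declared length is never validated,
    # so the assertion below acts as the defect oracle.
    assert len(payload) == length, "declared length exceeds available data"
    return payload


def mutate(sample, rng):
    """Return a copy of `sample` with one randomly chosen byte replaced."""
    buf = bytearray(sample)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed_sample, iterations, rng_seed):
    """Feed mutated inputs to the parser and collect those that trip the oracle."""
    rng = random.Random(rng_seed)  # fixed seed keeps the run reproducible
    failures = []
    for _ in range(iterations):
        candidate = mutate(seed_sample, rng)
        try:
            parse_record(candidate)
        except ValueError:
            pass  # graceful rejection is not a bug
        except AssertionError:
            failures.append(candidate)
    return failures


failures = fuzz(b"\x03abc", iterations=2000, rng_seed=0)
```

Every input collected in ``failures`` declares a length larger than the data
that follows it. A fuzzer starting from the valid sample reaches this case
without knowing anything about the format.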
  57
  58 Random API Usage
  59 ~~~~~~~~~~~~~~~~
  60
  61 Randomly testing APIs is especially helpful with parts of software that
  62 expose a well-defined interface (see also :ref:`Well-defined
  63 behavior and Safety <Well defined behaviour and safety>`). If this interface is additionally exposed to
  64 untrusted parties/content, then this is a strong sign that random API
  65 testing would be worthwhile here, also for security reasons. APIs can be
  66 anything from C++ layer code to APIs offered in the browser.
  67
  68 A good example of a fuzzing target here is the DOM (Document Object
  69 Model) and various other browser APIs. The browser exposes a variety of
  70 different APIs for working with documents, media, communication,
  71 storage, etc. with growing complexity. Each of these APIs has
  72 potential bugs that can be uncovered with fuzzing. At Mozilla, we
  73 currently use Domino (an internal tool) for this purpose.
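
As a rough sketch of random API testing, the following example issues random
calls against a small object and checks an invariant after each call. The
``Counter`` class, its deliberate bug and the helper names are hypothetical
and only stand in for a real API surface; this is not how Domino works.

```python
import random


class Counter:
    """Hypothetical stand-in for an API surface: the value must stay >= 0."""

    def __init__(self):
        self.value = 0

    def increment(self, amount):
        self.value += amount

    def decrement(self, amount):
        # Deliberate bug for the demo: no check against going negative.
        self.value -= amount

    def reset(self):
        self.value = 0


def fuzz_api(rng_seed, max_calls):
    """Issue random calls; return the call sequence that broke the invariant."""
    rng = random.Random(rng_seed)
    target = Counter()
    calls = []
    for _ in range(max_calls):
        name = rng.choice(["increment", "decrement", "reset"])
        args = () if name == "reset" else (rng.randrange(10),)
        calls.append((name, args))
        getattr(target, name)(*args)
        if target.value < 0:  # the invariant acts as the defect oracle
            return calls
    return None


def replay(calls):
    """Re-run a recorded call sequence and return the final counter value."""
    target = Counter()
    for name, args in calls:
        getattr(target, name)(*args)
    return target.value


sequence = fuzz_api(rng_seed=1, max_calls=1000)
```

Recording the call sequence matters as much as finding the failure: ``replay``
turns the random run into a deterministic test case.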
  74
  75 Random UI Interaction
  76 ~~~~~~~~~~~~~~~~~~~~~
  77
  78 A third way to test programs and in particular user interfaces is by
  79 directly interacting with the UI in a random way, typically in
  80 combination with other actions the program has to perform. Imagine for
  81 example an automated browser that surfs through the web and randomly
  82 performs actions such as scrolling, zooming and clicking links. The nice
  83 thing about this approach is that you are likely to find many issues that
  84 the end-user also experiences. However, this approach typically suffers
  85 from poor reproducibility (see also :ref:`Reproducibility <Reproducibility>`) and is therefore
  86 often of limited use.
  87
  88 An example of a fuzzing tool using this technique is `Android
  89 Monkey <https://developer.android.com/studio/test/monkey>`__. At
  90 Mozilla, however, we currently don’t make much use of this approach.
  91
  92 Why Fuzzing Helps You
  93 ---------------------
  94
  95 Understanding the value of fuzzing for you as a developer and software
  96 quality in general is important to justify the support this testing
  97 method might need from you. When your component is fuzzed for the first
  98 time there are two common things you will be confronted with:
  99
 100 **Bug reports that don’t seem to be real bugs or don’t seem important:** Fuzzers
 101 find all sorts of bugs in various corners of your component, even
 102 obscure ones. This automatically leads to a larger number of bugs that
 103 either don’t seem to be bugs (see also the :ref:`Well-defined behavior and
 104 safety <Well defined behaviour and safety>` section below) or that don’t seem to be important bugs.
 105
 106 Fixing these bugs is still important for the fuzzers because ignoring them
 107 in fuzzing costs resources (performance, human resources) and might even
 108 prevent the fuzzer from hitting other bugs. For example, certain fuzzing tools
 109 like libFuzzer run in-process and have to restart on every crash, involving a
 110 costly re-read of the fuzzing samples.
 111
 112 Also, as some of our code evolves quickly, a corner case might become a
 113 hot code path in a few months.
 114
 115 **New steps to reproduce:** Fuzzing tools are very likely to exercise
 116 your component using different methods than an average end-user. A
 117 common technique is to modify existing parts of a program or write entirely
 118 new code to yield a fuzzing "target". This target is specifically
 119 designed to work with the fuzzing tools in use. Reproducing the reported
 120 bugs might require you to learn these new steps to reproduce, including
 121 building/acquiring that target and having the right environment.
 122
 123 Both of these issues might seem like a waste of time in some cases.
 124 However, it is important to realize that both are a one-time investment
 125 for a constant stream of valuable bug reports. Helping your
 126 security engineers to overcome these issues will ensure that future
 127 regressions in your code can be detected at an earlier stage and in a
 128 form that is more easily actionable. Especially if you are dealing with
 129 regressions in your code already, fuzzing has the potential to make your
 130 job as a developer easier.
 131
 132 One of the best examples at Mozilla is the JavaScript engine. The JS
 133 team has put a great deal of effort into getting fuzzing started and
 134 supporting our work. Here’s what Jan de Mooij, a senior platform
 135 engineer for the JavaScript engine, has to say about it:
 136
 137 *“Bugs in the engine can cause mysterious browser crashes and bugs that
 138 are incredibly hard to track down. Fortunately, we don't have to deal
 139 with these time consuming browser issues very often: usually the fuzzers
 140 find a reliable shell test long before the bug makes it into a release.
 141 Fuzzing is invaluable to us and I cannot imagine working on this project
 142 without it.”*
 143
 144 Levels of Fuzzing in Firefox/Gecko
 145 ----------------------------------
 146
 147 Applying fuzzing to e.g. Firefox happens at different "levels", similar
 148 to the different types of automated tests we have:
 149
 150 Full Browser Fuzzing
 151 ~~~~~~~~~~~~~~~~~~~~
 152
 153 The most obvious method of testing would be to test the full browser and
 154 doing so is required for certain features like the DOM and other APIs.
 155 The advantage here is that we have all the features of the browser
 156 available and testing happens close to what we actually ship. The
 157 downside here though is that browser testing is by far the slowest of
 158 all testing methods. In addition, it involves the most
 159 non-determinism (resulting e.g. in intermittent testcases).
 160 Browser fuzzing at Mozilla is largely done with the `Grizzly
 161 framework <https://blog.mozilla.org/security/2019/07/10/grizzly/>`__
 162 (`meta bug <https://bugzilla.mozilla.org/show_bug.cgi?id=grizzly>`__)
 163 and one of the most successful fuzzers is the Domino tool (`meta
 164 bug <https://bugzilla.mozilla.org/show_bug.cgi?id=domino>`__).
 165
 166 In summary, full browser fuzzing is the right technique to investigate
 167 if your feature really requires it. Consider using the other methods (see
 168 below) if your code can be exercised with them.
 169
 170 The Fuzzing Interface
 171 ~~~~~~~~~~~~~~~~~~~~~
 172
 175 The fuzzing interface is glue code living in mozilla-central in order to make it
 176 easier for developers and security researchers to test C/C++ code with either libFuzzer or afl-fuzz.
 177
 178 This interface offers a component-based fuzzing approach at the gtest
 179 (C++ unit test) level and is suitable for anything that could also be
 180 tested/exercised using a gtest. This method is by far the fastest, but
 181 usually limited to testing isolated components that can be instantiated
 182 on this level. Utilizing this method requires you to write a fuzzing
 183 target similar to writing a gtest. This target will automatically be
 184 usable with libFuzzer and AFLFuzz. We offer a :ref:`comprehensive manual <Fuzzing Interface>`
 185 that describes how to write and utilize your own target.
 186
 187 A simple example here is the `SDP parser
 188 target <https://searchfox.org/mozilla-central/rev/efdf9bb55789ea782ae3a431bda6be74a87b041e/media/webrtc/signaling/fuzztest/sdp_parser_libfuzz.cpp#30>`__,
 189 which tests the SipccSdpParser in our codebase.
 190
 191 Shell-based Fuzzing
 192 ~~~~~~~~~~~~~~~~~~~
 193
 194 Some of our fuzzing, e.g. JS Engine testing, happens in a separate shell
 195 program. For JS, this is the JS shell also used for most of the JS tests
 196 and development. In theory, xpcshell could also be used for testing but
 197 so far, there has not been a use case for this (most things that can be
 198 reached through xpcshell can also be tested on the gtest level).
 199
 200 Identifying the right level of fuzzing is the first step towards
 201 continuous fuzz testing of your code.
 202
 203 Code/Process Requirements for Fuzzing
 204 -------------------------------------
 205
 206 In this section, we are going to discuss how code should be written in
 207 order to yield optimal results with fuzzing.
 208
 209 Defect Oracles
 210 ~~~~~~~~~~~~~~
 211
 212 Fuzzing is only effective if you are able to know when a problem has
 213 been found. Crashes are typically problems if the unit being tested is
 214 safe for fuzzing (see :ref:`Well-defined behavior and Safety <Well defined behaviour and safety>`).
 215 But there are many more problems that you would want to find: correctness
 216 issues, corruptions that don’t necessarily crash, etc. For this, you need an
 217 *oracle* that tells you something is wrong.
 218
 219 The simplest defect oracle is the assertion (e.g. ``MOZ_ASSERT``).
 220 Assertions are a very powerful instrument because they can be used to
 221 determine if your program is performing correctly, even if the bug would
 222 not lead to any sort of crash. They can encode arbitrarily complex
 223 information about what is considered correct, information that might
 224 otherwise only exist in the developers’ minds.
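
To illustrate how an assertion catches a bug that would never crash, consider
the following sketch. ``clamp_volume`` and its deliberate bug are hypothetical;
in Gecko code the same role would be played by an assertion macro such as
``MOZ_ASSERT``.

```python
def clamp_volume(level):
    """Hypothetical helper that should map any integer into the range 0..100."""
    # Deliberate bug for the demo: the lower bound is never enforced.
    result = min(level, 100)
    # Oracle: encodes the contract that would otherwise live only in the
    # developer's head. Without it, a negative result is silently returned.
    assert 0 <= result <= 100, f"volume out of range: {result}"
    return result


def find_violations(inputs):
    """Return the inputs for which the assertion fires."""
    violations = []
    for value in inputs:
        try:
            clamp_volume(value)
        except AssertionError:
            violations.append(value)
    return violations


violations = find_violations([150, 42, -7, 0])
```

Without the assertion, the negative input would produce a wrong result and
nothing would signal a problem to the fuzzer.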
 225
 226 External tools like the sanitizers (AddressSanitizer aka ASan,
 227 ThreadSanitizer aka TSan, MemorySanitizer aka MSan and
 228 UndefinedBehaviorSanitizer aka UBSan) can also serve as oracles for
 229 sometimes severe issues that would not necessarily crash. Making sure
 230 that these tools can be used on your code is highly useful.
 231
 232 Examples of bugs found with sanitizers are `bug
 233 1419608 <https://bugzilla.mozilla.org/show_bug.cgi?id=1419608>`__,
 234 `bug 1580288 <https://bugzilla.mozilla.org/show_bug.cgi?id=1580288>`__
 235 and `bug 922603 <https://bugzilla.mozilla.org/show_bug.cgi?id=922603>`__,
 236 but since we started using sanitizers, we have found over 1000 bugs with
 237 these tools.
 238
 239 Another defect oracle can be a reference implementation. Comparing
 240 program behavior (typically output) between two programs or two modes of
 241 the same program that should produce the same outputs can find complex
 242 correctness issues. This method is often called differential testing.
 243
 244 One example where this is regularly used to find issues is the Mozilla
 245 JavaScript engine: Running random programs with and without JIT
 246 compilation enabled finds lots of problems with the JIT implementation.
 247 One example of such a bug is `Bug
 248 1404636 <https://bugzilla.mozilla.org/show_bug.cgi?id=1404636>`__.
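
A minimal sketch of differential testing, assuming two hypothetical
implementations of the same function. The "fast" variant and its deliberate
bug are invented for this example and have nothing to do with the actual JIT.

```python
import random


def sum_of_squares_reference(values):
    """Straightforward implementation, treated as the trusted reference."""
    total = 0
    for v in values:
        total += v * v
    return total


def sum_of_squares_fast(values):
    """Hypothetical 'optimized' mode with a deliberate bug for the demo."""
    # Bug: v * abs(v) equals v * v only for non-negative v.
    return sum(v * abs(v) for v in values)


def differential_fuzz(rng_seed, iterations):
    """Return the first random input on which the two modes disagree."""
    rng = random.Random(rng_seed)
    for _ in range(iterations):
        values = [rng.randint(-5, 5) for _ in range(rng.randint(0, 4))]
        if sum_of_squares_fast(values) != sum_of_squares_reference(values):
            return values
    return None


mismatch = differential_fuzz(rng_seed=0, iterations=1000)
```

Neither implementation crashes or asserts; the disagreement between the two
is the only signal that something is wrong.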
 249
 250 Component Decoupling
 251 ~~~~~~~~~~~~~~~~~~~~
 252
 253 Being able to test components in isolation can be an advantage for
 254 fuzzing (both for performance and reproducibility). Clear boundaries
 255 between different components and documentation that explains the
 256 contracts usually help with this goal. Sometimes it might be useful to
 257 mock a certain component that the target component is interacting with
 258 and that is much harder if the components are tightly coupled and their
 259 contracts unclear. Of course, this does not mean that one should only
 260 test components in isolation. Sometimes, testing the interaction between
 261 them is even desirable and does not hurt performance at all.
 262
 263 Avoiding external I/O
 264 ~~~~~~~~~~~~~~~~~~~~~
 265
 266 External I/O, such as network or file interaction, is bad for performance
 267 and can introduce additional non-determinism. Providing interfaces to
 268 process data directly from memory instead is usually much more helpful.
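
One way to structure code along these lines is to keep the parsing logic free
of I/O and confine file access to a thin wrapper. The header format and both
function names below are assumptions chosen for illustration.

```python
def parse_headers(data):
    """Parse 'Name: value' lines from an in-memory buffer (toy format)."""
    headers = {}
    for line in data.splitlines():
        name, sep, value = line.partition(b":")
        if sep:  # ignore lines without a separator
            headers[name.strip().lower()] = value.strip()
    return headers


def parse_headers_from_file(path):
    """Thin convenience wrapper; the only place that touches the file system."""
    with open(path, "rb") as handle:
        return parse_headers(handle.read())


parsed = parse_headers(b"Host: example.org\nbroken line\nX-Id:7\n")
```

A fuzzing target can call ``parse_headers`` directly with generated bytes,
while regular callers keep using the file-based wrapper.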
 269
 270 .. _Well defined behaviour and safety:
 271
 272 Well-defined Behavior and Safety
 273 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 274
 275 This requirement picks up where defect oracles left off and is one of
 276 the most important problems seen with fuzzing in the wild today. If a
 277 part of your program’s behavior is unspecified, then this potentially
 278 leads to trouble if the behavior is considered a defect by fuzzing.
 279 For example, if your code has crashes that are not considered bugs, then
 280 your code might be unsuitable for fuzzing. Your component should be
 281 fuzzing safe, meaning that any defect oracle (e.g. assertion or crash)
 282 triggered by the fuzzer is considered a bug. This important aspect is
 283 often neglected. Be aware that any false positives cause both
 284 performance degradation and additional manual work for your fuzzing
 285 team. The Mozilla JS developers for example have implemented this
 286 concept in a ``--fuzzing-safe`` switch which disables harmful functions.
 287 Sometimes, crashes cannot be avoided for handling certain error
 288 conditions. In such situations, it is important to mark these crashes in
 289 a way the fuzzer can recognize and distinguish them from undesired
 290 crashes. However, keep in mind that crashes in general can be disruptive
 291 to the fuzzing process. Performance is an important aspect of fuzzing
 292 and frequent crashes can severely degrade performance.
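
The idea of marking unavoidable crashes so that they can be told apart from
real defects can be sketched as follows. The ``ExpectedAbort`` marker, the
``process`` function and its deliberate bugs are hypothetical; in native code
the marker would more likely be a recognizable crash message or signature.

```python
class ExpectedAbort(Exception):
    """Hypothetical marker for deliberate, documented aborts."""


def process(data):
    """Toy component under test."""
    if len(data) > 8:
        # Documented limit: aborting here is intended behavior.
        raise ExpectedAbort("input exceeds configured limit")
    # Deliberate bugs for the demo: fails on empty input and on a zero byte.
    return 256 // data[0]


def classify(data):
    """Decide how the fuzzing harness should treat the outcome of one input."""
    try:
        process(data)
    except ExpectedAbort:
        return "expected"  # recognized and ignored
    except Exception:
        return "bug"  # everything else must be a real defect
    return "ok"


outcomes = [classify(sample) for sample in (b"\x01", b"\x00", b"x" * 9, b"")]
```

The harness can only treat every unmarked failure as a bug if the component
guarantees that all intended aborts carry the marker.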
 293
 294 .. _Reproducibility:
 295
 296 Reproducibility
 297 ~~~~~~~~~~~~~~~
 298
 299 Being able to reproduce issues found with fuzzing is necessary for
 300 several reasons: First, you as the developer probably want a test that
 301 reproduces the issue so you can debug it better. Our feedback from most
 302 developers is that traces without a reproducible test can help to find a
 303 problem, but working without one makes the whole process very complicated. Some of these
 304 non-reproducible bugs never get fixed. Second, having a reproducible
 305 test also helps the triage process by allowing an automated bisection to
 306 find the responsible developer. Last but not least, the test can be
 307 added to a test suite, used for automated verification of fixes and even
 308 serve as a basis for more fuzzing.
 309
 310 Adding functionality to the program that improves reproducibility is
 311 therefore a good idea in case non-reproducible issues are found. Some
 312 examples are shown in the next section.
 313
 314 While many problems with reproducibility are specific to the project
 315 you are working on, there is one source of these problems that many
 316 programs have in common: Threading. While some bugs only occur in the
 317 first place due to concurrency, some other bugs would be perfectly
 318 reproducible without threads, but are intermittent and hard to reproduce
 319 with threading enabled. If the bug is indeed caused by a data race, then
 320 tools like ThreadSanitizer will help and we are currently working on
 321 making ThreadSanitizer usable on Firefox. For bugs that are not caused
 322 by threading, it sometimes makes sense to be able to disable threading
 323 or limit the number of worker threads involved.
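
A switch for limiting or disabling worker threads might look like the
following sketch. The ``run_tasks`` helper and its ``worker_threads``
parameter are assumptions for this example, not an existing Firefox or JS
shell interface.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(tasks, worker_threads):
    """Run callables; worker_threads == 0 runs them inline in a fixed order."""
    if worker_threads == 0:
        return [task() for task in tasks]
    with ThreadPoolExecutor(max_workers=worker_threads) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def make_tasks(log):
    """Build five tasks that record their execution order in `log`."""
    def make(i):
        def task():
            log.append(i)
            return i * i
        return task
    return [make(i) for i in range(5)]


inline_log = []
inline_results = run_tasks(make_tasks(inline_log), worker_threads=0)

threaded_log = []
threaded_results = run_tasks(make_tasks(threaded_log), worker_threads=4)
```

Both modes return the same results, but only the inline mode guarantees the
execution order recorded in the log, which is what makes an order-dependent
bug reproducible.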
 324
 325 Supporting Code
 326 ~~~~~~~~~~~~~~~
 327
 328 Some possibilities of what support implementations for fuzzing can do
 329 have already been named in the previous sections: Additional defect
 330 oracles and functionality to improve reproducibility and safety. In
 331 fact, many features added specifically for fuzzing fit into one of these
 332 categories. However, there’s room for more: Often, there are ways to
 333 make it easier for fuzzers to exercise complex and hard to reach parts
 334 of your code. For example, if a certain optimization feature is only
 335 turned on under very specific conditions (that are not a requirement for
 336 the optimization), then it makes sense to add functionality to force
 337 it on. Then, a fuzzer can hit the optimization code much more
 338 frequently, increasing the chance to find issues. Some examples from
 339 Firefox and SpiderMonkey:
 340
 341 - The `FuzzingFunctions <https://searchfox.org/mozilla-central/rev/efdf9bb55789ea782ae3a431bda6be74a87b041e/dom/webidl/FuzzingFunctions.webidl#15>`__
 342   interface in the browser allows fuzzing tools to perform GC/CC, tune various
 343   settings related to garbage collection or enable features like accessibility
 344   mode. Being able to force a garbage collection at a specific time helped
 345   identify lots of problems in the past.
 346
 347 - The ``--ion-eager`` and ``--baseline-eager`` flags for the JS shell force JIT
 348   compilation at various stages, rather than using the builtin
 349   heuristic to enable it only for hot functions.
 350
 351 - The ``--no-threads`` flag disables all threading (if possible) in the JS shell.
 352   This makes some bugs reproduce deterministically that would otherwise be
 353   intermittent and harder to find. However, some bugs that only occur with
 354   threading can’t be found with this option enabled.
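
The effect of an "eager" switch can be shown with a toy model. The ``Engine``
class, its threshold and its ``eager`` parameter are hypothetical and only
mirror the idea behind the flags above; they do not reflect how SpiderMonkey
implements them.

```python
class Engine:
    """Toy engine: switches to an optimized path once a function is hot."""

    HOT_THRESHOLD = 1000  # assumed heuristic for the demo

    def __init__(self, eager=False):
        self.eager = eager
        self.calls = 0
        self.optimized_runs = 0

    def double(self, x):
        self.calls += 1
        if self.eager or self.calls > self.HOT_THRESHOLD:
            self.optimized_runs += 1
            return x << 1  # "optimized" path
        return x * 2  # baseline path


baseline = Engine()
baseline_results = [baseline.double(i) for i in range(10)]

eager = Engine(eager=True)
eager_results = [eager.double(i) for i in range(10)]
```

With the default heuristic, a short fuzzing testcase never reaches the
optimized path at all; with the switch enabled, every call exercises it.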
 355
 356 Another important feature that must be turned off for fuzzing is
 357 checksums. Many file formats use checksums to validate a file before
 358 processing it. If a checksum feature is still enabled, fuzzers are
 359 likely never going to produce valid files. The same often holds for
 360 cryptographic signatures. Being able to turn off the validation of these
 361 features as part of a fuzzing switch is extremely helpful.
 362
 363 An example for such a checksum can be found in the
 364 `FlacDemuxer <https://searchfox.org/mozilla-central/rev/efdf9bb55789ea782ae3a431bda6be74a87b041e/dom/media/flac/FlacDemuxer.cpp#494>`__.
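
A fuzzing switch for checksum validation could be sketched like this. The
container format (a 4-byte CRC32 followed by the payload) and the
``verify_checksum`` parameter are assumptions for illustration and are
unrelated to the FLAC format.

```python
import struct
import zlib


def pack(payload):
    """Prefix the payload with its CRC32 (big-endian, 4 bytes)."""
    return struct.pack(">I", zlib.crc32(payload)) + payload


def unpack(blob, verify_checksum=True):
    """Return the payload; optionally skip validation when fuzzing."""
    if len(blob) < 4:
        raise ValueError("truncated header")
    (stored,) = struct.unpack(">I", blob[:4])
    payload = blob[4:]
    if verify_checksum and zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch")
    return payload


def is_rejected(blob):
    """Report whether the default (validating) mode refuses the input."""
    try:
        unpack(blob)
    except ValueError:
        return True
    return False


valid = pack(b"hello")
mutated = valid[:-1] + b"!"  # what a mutation-based fuzzer might produce
```

With validation enabled, the mutated input is rejected before any interesting
code runs. With ``verify_checksum=False``, the payload reaches the code behind
the check.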
 365
 366 Test Samples
 367 ~~~~~~~~~~~~
 368
 369 Some fuzzing strategies make use of existing data that is mutated to
 370 produce the new random data. In fact, mutation-based strategies are
 371 typically superior to others if the original samples are of good quality
 372 because the originals carry a lot of semantics that the fuzzer does not
 373 have to know about or implement. However, success here really stands or
 374 falls with the quality of the samples. If the originals don’t cover
 375 certain parts of the implementation, then the fuzzer will also have to
 376 do more work to get there.
 377
 378
 379 Fuzz Blockers
 380 ~~~~~~~~~~~~~
 381
 382 Fuzz blockers are issues that prevent fuzzers from being as
 383 effective as possible. Depending on the fuzzer and its scope, a fuzz blocker
 384 in one area (or component) can impede performance in other areas and in
 385 some cases block the fuzzer altogether. Some examples are:
 386
 387 - Frequent crashes - These can block code paths and waste compute
 388   resources due to the need to relaunch the fuzzing target and handle
 389   the results (regardless of whether they are ignored or reported). This can also
 390   include assertions that are benign in most cases but are easily
 391   triggered by fuzzers.
 392
 393 - Frequent hangs / timeouts - This includes any issue that slows down
 394   or blocks execution of the fuzzer or the target.
 395
 396 - Hard to bucket - This includes crashes such as stack overflows or any issue
 397   that crashes in an inconsistent location. This also includes issues that
 398   corrupt logs/debugger output or provide a broken/invalid crash report.
 399
 400 - Broken builds - This is fairly straightforward: without up-to-date builds,
 401   fuzzers are unable to run or verify fixes.
 402
 403 - Missing instrumentation - In some cases tools such as ASan are used as
 404   defect oracles and are required by the fuzzing tools to allow for proper
 405   automation. In other cases incomplete instrumentation can give a false sense
 406   of stability or make investigating issues much more time consuming. Although
 407   this is not necessarily blocking the fuzzers, it should be prioritized
 408   appropriately.
 409
 410 Since these types of issues harm the overall fuzzing progress, it is important
 411 for them to be addressed in a timely manner. Even if the bug itself might seem
 412 trivial and low priority for the product, it can still have devastating effects
 413 on fuzzing and hence prevent finding other critical issues.
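
To see why crashes in an inconsistent location are hard to bucket, consider a
simple bucketing scheme that groups crashes by their topmost stack frames.
This is one common approach and an assumption for this example, not a
description of Mozilla's tooling; the frame names are made up.

```python
import hashlib


def bucket_id(stack_frames, depth=3):
    """Derive a bucket from the top `depth` frames of a crash stack."""
    top = stack_frames[:depth]
    digest = hashlib.sha1("\n".join(top).encode("utf-8")).hexdigest()
    return digest[:12]


# Same crash location, reached from different callers.
crash_a = ["Parser::readToken", "Parser::parse", "Document::load", "main"]
crash_b = ["Parser::readToken", "Parser::parse", "Document::load", "worker_main"]

# Same underlying problem, but the crash happens one frame deeper.
crash_c = ["memcpy", "Parser::readToken", "Parser::parse", "Document::load", "main"]
```

``crash_a`` and ``crash_b`` share a bucket, while ``crash_c`` ends up in a new
one even though it may be the same bug. An issue that crashes in many
different places therefore produces a large number of apparently distinct
reports.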
 414
 415 Issues in Bugzilla are marked as fuzz blockers by adding “[fuzzblocker]”
 416 to the “Whiteboard” field. A list of open issues marked as fuzz blockers
 417 can be found on `Bugzilla <https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=fuzzblockers&sharer_id=486634>`__.
 418
 419
 420 Documentation
 421 ~~~~~~~~~~~~~
 422
 423 It is important for the fuzzing team to know how your software, tests
 424 and designs work. Even seemingly obvious details, such as how a test
 425 program is supposed to be invoked or which options are safe, might be
 426 hard to figure out for the person doing the testing, just as you are reading
 427 this manual right now to find out what is important in fuzzing.
 428
 429 Contact Us
 430 ~~~~~~~~~~
 431
 432 The fuzzing team can be reached at
 433 `fuzzing@mozilla.com <mailto:fuzzing@mozilla.com>`__ or
 434 `on Matrix <https://chat.mozilla.org/#/room/#fuzzing:mozilla.org>`__
 435 and will be happy to help you with any questions about fuzzing
 436 you might have. We can help you find the right method of fuzzing for
 437 your feature, collaborate on the implementation and provide the
 438 infrastructure to run it and process the results accordingly.