libc/benchmarks/README.md

# Libc mem* benchmarks

This framework is designed to evaluate and compare the relative performance of memory function implementations on a particular machine.

It relies on:
 - `libc.src.string.<mem_function>_benchmark` to run the benchmarks for a particular `<mem_function>`,
 - `libc-benchmark-analysis.py3`, a tool to process the measurements into reports.

## Benchmarking tool

### Setup

```shell
cd llvm-project
cmake -B/tmp/build -Sllvm -DLLVM_ENABLE_PROJECTS='clang;clang-tools-extra;libc' -DCMAKE_BUILD_TYPE=Release -DLIBC_INCLUDE_BENCHMARKS=Yes -G Ninja
ninja -C /tmp/build libc.src.string.<mem_function>_benchmark
```

> Note: The machine should run in `performance` mode. This is achieved by running:
```shell
cpupower frequency-set --governor performance
```
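
To confirm the setting took effect, the active governor can be read back from sysfs. The sketch below assumes a Linux machine that exposes the standard cpufreq interface under `/sys/devices/system/cpu`. The helper names `read_governors` and `all_in_performance_mode` are hypothetical and not part of the benchmark framework.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU exposing cpufreq in sysfs."""
    governors = {}
    # Each logical CPU has a file such as cpu0/cpufreq/scaling_governor.
    for path in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu_name = path.parent.parent.name
        governors[cpu_name] = path.read_text().strip()
    return governors


def all_in_performance_mode(governors):
    """True only if at least one CPU was found and all use `performance`."""
    return bool(governors) and all(g == "performance" for g in governors.values())
```

`all_in_performance_mode(read_governors())` returns `False` if any CPU still uses another governor, or if no cpufreq entries are found (as can happen in some virtual machines).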

### Usage

The benchmark can run in two modes:
 - **stochastic mode** returns the average time per call for a particular size distribution (this is the default),
 - **sweep mode** returns the average time per size over a range of sizes.
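
The difference between the two modes can be sketched as follows. This is an illustration only, not the framework's implementation: the function names are hypothetical, the size probabilities are supplied by the caller, and the assumption that a sweep visits every size from 0 up to the maximum is a simplification.

```python
import random


def stochastic_sizes(size_probabilities, num_calls, seed=0):
    """Draw one size per call from a discrete size distribution.

    size_probabilities[i] is the (hypothetical) probability that a call
    operates on i bytes.
    """
    rng = random.Random(seed)
    sizes = range(len(size_probabilities))
    return rng.choices(sizes, weights=size_probabilities, k=num_calls)


def sweep_sizes(max_size):
    """Visit every size from 0 to max_size inclusive, in order (assumed)."""
    return list(range(max_size + 1))
```

In the stochastic case consecutive calls see unrelated sizes, whereas a sweep measures each size in isolation.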

Each benchmark requires `--study-name` to be set. This name identifies a run and provides a label during analysis. If **stochastic mode** is used, you must also provide `--size-distribution-name` to pick one of the available `MemorySizeDistribution`s.

It also provides optional flags:
 - `--num-trials`: repeats the benchmark the given number of times; the analysis tool can take this into account and give confidence intervals,
 - `--output`: specifies a file to write the report to; the report goes to standard output if not set.
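
When scripting several runs, the flag rules above can be captured in a small helper. This is a sketch under assumptions: `benchmark_command` is a hypothetical name, it only uses the flags mentioned in this document, and it treats a provided `sweep_max_size` as the signal to select sweep mode.

```python
def benchmark_command(binary, study_name, size_distribution_name=None,
                      sweep_max_size=None, num_trials=None, output=None):
    """Build the argument list for one benchmark run (hypothetical helper)."""
    args = [binary, f"--study-name={study_name}"]
    if sweep_max_size is not None:
        # Sweep mode: sizes are swept up to sweep_max_size.
        args += ["--sweep-mode", f"--sweep-max-size={sweep_max_size}"]
    elif size_distribution_name is not None:
        # Stochastic mode (the default) needs a size distribution.
        args.append(f"--size-distribution-name={size_distribution_name}")
    else:
        raise ValueError("stochastic mode requires a size distribution name")
    if num_trials is not None:
        args.append(f"--num-trials={num_trials}")
    if output is not None:
        args.append(f"--output={output}")
    return args
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with names that contain spaces.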

### Stochastic mode

This is the preferred mode. The function parameters are randomized, so the branch predictor is less likely to kick in.

```shell
/tmp/build/bin/libc.src.string.memcpy_benchmark \
    --study-name="new memcpy" \
    --size-distribution-name="memcpy Google A" \
    --num-trials=30 \
    --output=/tmp/benchmark_result.json
```

The `--size-distribution-name` flag is mandatory and points to one of the [predefined distributions](MemorySizeDistributions.h).

> Note: These distributions are gathered from several important binaries at Google (servers, databases, realtime and batch jobs) and reflect the importance of focusing on small sizes.

Profiling the size distributions of calls into libc functions showed that
most operations act on a small number of bytes.

Function           | % of calls with size ≤ 128 | % of calls with size ≤ 1024
------------------ | --------------------------: | ---------------------------:
memcpy             | 96%                         | 99%
memset             | 91%                         | 99.9%
memcmp<sup>1</sup> | 99.5%                       | ~100%

_<sup>1</sup> - The size refers to the size of the buffers to compare and not
the number of bytes until the first difference._
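
The percentages in the table are cumulative shares of calls at or below a size threshold. A minimal sketch of that computation is shown below. The function name is hypothetical and the call counts are made up for illustration; they are not the profiled data.

```python
def share_of_calls_at_most(call_counts, threshold):
    """Fraction of calls whose size is <= threshold.

    call_counts maps a size in bytes to the number of observed calls.
    """
    total = sum(call_counts.values())
    if total == 0:
        raise ValueError("no calls recorded")
    small = sum(n for size, n in call_counts.items() if size <= threshold)
    return small / total


# Illustrative counts only; these are NOT the measurements behind the table.
example_counts = {8: 500, 64: 300, 128: 100, 512: 60, 4096: 40}
print(f"{share_of_calls_at_most(example_counts, 128):.0%}")   # size <= 128
print(f"{share_of_calls_at_most(example_counts, 1024):.0%}")  # size <= 1024
```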

### Sweep mode

This mode measures call latency per size over a range of sizes. Because it exercises the same size over and over again, the branch predictor can kick in. It can still be useful for comparing the strengths and weaknesses of particular implementations.

```shell
/tmp/build/bin/libc.src.string.memcpy_benchmark \
    --study-name="new memcpy" \
    --sweep-mode \
    --sweep-max-size=128 \
    --output=/tmp/benchmark_result.json
```

## Analysis tool

### Setup

Make sure `matplotlib`, `pandas` and `seaborn` are set up correctly:

```shell
apt-get install python3-pip
pip3 install matplotlib pandas seaborn
```

You may need `python3-gtk` or a similar package to display the graphs.

### Usage

```shell
python3 libc/benchmarks/libc-benchmark-analysis.py3 /tmp/benchmark_result.json ...
```

When used with __multiple trials Sweep Mode data__, the tool displays the 95% confidence interval.
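
For reference, a 95% confidence interval over repeated trials can be approximated as shown below. This sketch uses the normal approximation (mean ± 1.96 standard errors). That choice is an assumption made for illustration; the analysis tool may compute its intervals differently, for example by bootstrapping.

```python
import math
import statistics


def confidence_interval_95(samples):
    """Approximate 95% CI of the mean using the normal approximation."""
    if len(samples) < 2:
        raise ValueError("need at least two trials")
    mean = statistics.fmean(samples)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    half_width = 1.96 * sem
    return mean - half_width, mean + half_width
```

Under this approximation the interval narrows with the square root of the number of trials, which is why a larger `--num-trials` value gives tighter bounds.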

When provided with multiple reports at the same time, all the graphs from the same machine are displayed side by side to allow for comparison.

The Y-axis unit can be changed via the `--mode` flag:
 - `time` displays the measured time (this is the default),
 - `cycles` displays the number of cycles computed from the CPU frequency,
 - `bytespercycle` displays the number of bytes per cycle (for `Sweep Mode` reports only).
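
The relationship between the three units can be sketched as follows. The function names are hypothetical, and the sketch assumes a fixed CPU frequency during the measurement, which is one reason to run with the `performance` governor.

```python
def time_to_cycles(seconds, cpu_frequency_hz):
    """Convert a measured duration into CPU cycles at a fixed frequency."""
    return seconds * cpu_frequency_hz


def bytes_per_cycle(size_bytes, seconds, cpu_frequency_hz):
    """Throughput of a call that handled size_bytes in the given time."""
    cycles = time_to_cycles(seconds, cpu_frequency_hz)
    if cycles <= 0:
        raise ValueError("duration and frequency must be positive")
    return size_bytes / cycles
```

For example, a 20 ns call on a 2 GHz CPU corresponds to roughly 40 cycles, and 128 bytes handled in that time to roughly 3.2 bytes per cycle.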

## Under the hood

To learn more about the design decisions behind the benchmarking framework,
have a look at the [RATIONALE.md](RATIONALE.md) file.