llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

SYNOPSIS
--------

:program:`llvm-exegesis` [*options*]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics like latency,
throughput, or port decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial (resp. as parallel) as
possible so that we can measure the latency (resp. inverse throughput/uop
decomposition) of the instruction.
The code snippet is jitted and executed on the host subtarget. The time taken
(resp. resource usage) is measured using hardware performance counters. The
result is printed out as YAML to the standard output.

The main goal of this tool is to automatically (in)validate LLVM's TableGen
scheduling models. To that end, we also provide analysis of the results.

:program:`llvm-exegesis` can also benchmark arbitrary user-provided code
snippets.

EXAMPLE 1: benchmarking instructions
------------------------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

    $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition or inverse throughput of an instruction works similarly:

.. code-block:: bash

    $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
    $ llvm-exegesis -mode=inverse_throughput -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

    llvm_triple: x86_64-unknown-linux-gnu
    num_repetitions: 10000
      - { key: latency, value: 1.0058, debug_string: '' }
    info: 'explicit self cycles, selecting one aliasing configuration.

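
For quick scripting, the measurement entries can be pulled out of this output
without a full YAML parser. The Python sketch below assumes that each
measurement is written as a one-line flow mapping, exactly like the `key:
latency` line shown above. The function name `extract_measurements` is
hypothetical, and a real YAML library is the more robust choice.

.. code-block:: python

    import re

    # Matches entries such as: - { key: latency, value: 1.0058, debug_string: '' }
    MEASUREMENT = re.compile(
        r"-\s*\{\s*key:\s*(?P<key>[\w-]+),\s*value:\s*(?P<value>[-+0-9.eE]+)"
    )

    def extract_measurements(yaml_text):
        """Return (key, value) pairs for every one-line measurement entry."""
        return [(m.group("key"), float(m.group("value")))
                for m in MEASUREMENT.finditer(yaml_text)]

    sample = "  - { key: latency, value: 1.0058, debug_string: '' }\n"
    print(extract_measurements(sample))
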
To measure the latency of all instructions for the host architecture, run:

.. code-block:: bash

    readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
    for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
    do
      # Benchmark one opcode and keep only the YAML document (from '---' on).
      ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
    done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.

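
The same loop can be driven from a script. The following is a minimal sketch in
Python. It assumes that the `INSTRUCTION_LIST_END` line has the form
`INSTRUCTION_LIST_END = <N>`, which is what the `cut -f2 -d=` in the shell
version implies. The helper names `last_opcode_index` and `latency_commands`
and the sample value are hypothetical. The sketch only builds the command
lines; running them is left to the caller.

.. code-block:: python

    import re

    def last_opcode_index(inc_text):
        """Return INSTRUCTION_LIST_END - 1, like the shell arithmetic."""
        match = re.search(r"INSTRUCTION_LIST_END\s*=\s*(\d+)", inc_text)
        if match is None:
            raise ValueError("INSTRUCTION_LIST_END not found")
        return int(match.group(1)) - 1

    def latency_commands(inc_text, exe="./build/bin/llvm-exegesis"):
        """Build one latency-mode command line per opcode index, from 1."""
        last = last_opcode_index(inc_text)
        return [[exe, "-mode=latency", "-opcode-index={}".format(i)]
                for i in range(1, last + 1)]

    # Hypothetical excerpt of X86GenInstrInfo.inc; the real value is larger.
    sample_inc = "    INSTRUCTION_LIST_END = 4\n"
    commands = latency_commands(sample_inc)
    for argv in commands:
        print(" ".join(argv))

Each list can be passed to `subprocess.run` as its argument vector, which
avoids shell quoting issues.
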
EXAMPLE 2: benchmarking a custom code snippet
---------------------------------------------

To measure the latency/uops of a custom piece of code, you can specify the
`snippets-file` option (`-` reads from standard input).

.. code-block:: bash

    $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-

Real-life code snippets typically depend on registers or memory.
:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
use has a corresponding def or is a "live in"). If your code depends on the
value of some registers, you have two options:

- Mark the register as requiring a definition. :program:`llvm-exegesis` will
  automatically assign a value to the register. This can be done using the
  directive `LLVM-EXEGESIS-DEFREG <reg name> <hex_value>`, where `<hex_value>`
  is a bit pattern used to fill `<reg name>`. If `<hex_value>` is smaller than
  the register width, it will be sign-extended.
- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark
  using whatever value was in this register on entry. This can be done using
  the directive `LLVM-EXEGESIS-LIVEIN <reg name>`.

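
To make the sign-extension rule concrete, here is a minimal sketch in Python.
It assumes that the width of `<hex_value>` is four bits per hex digit. That
assumption and the helper name `sign_extend_hex` are illustrative and are not
taken from this document.

.. code-block:: python

    def sign_extend_hex(hex_value, register_bits):
        """Sign-extend a hex bit pattern to register_bits; return it as hex.

        Assumption: the pattern is 4 bits wide per hex digit.
        """
        value_bits = 4 * len(hex_value)
        if value_bits > register_bits:
            raise ValueError("pattern is wider than the register")
        value = int(hex_value, 16)
        # If the top bit of the pattern is set, fill the upper bits with ones.
        if value & (1 << (value_bits - 1)):
            value |= ((1 << register_bits) - 1) & ~((1 << value_bits) - 1)
        return format(value, "0{}x".format(register_bits // 4))

    print(sign_extend_hex("42", 32))  # top bit clear: upper bits are zeros
    print(sign_extend_hex("F0", 32))  # top bit set: upper bits are ones
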
For example, the following code snippet depends on the values of XMM1 (which
will be set by the tool) and the memory buffer passed in RDI (live in).

.. code-block:: none

    # LLVM-EXEGESIS-LIVEIN RDI
    # LLVM-EXEGESIS-DEFREG XMM1 42
    vmulps (%rdi), %xmm1, %xmm2
    vhaddps %xmm2, %xmm2, %xmm3

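
As an illustration of the directive syntax, the following Python sketch scans a
snippet for the two directives and collects the registers they name. It is not
how the tool itself parses snippets. The function name `parse_directives` is
hypothetical, and the sketch assumes that directives appear on comment lines
starting with `#`, as in the example above.

.. code-block:: python

    def parse_directives(snippet):
        """Collect live-in registers and register definitions from a snippet."""
        live_ins, defs = [], {}
        for line in snippet.splitlines():
            line = line.strip()
            if not line.startswith("#"):
                continue  # only comment lines can carry directives
            words = line.lstrip("#").split()
            if not words:
                continue
            if words[0] == "LLVM-EXEGESIS-LIVEIN" and len(words) == 2:
                live_ins.append(words[1])
            elif words[0] == "LLVM-EXEGESIS-DEFREG" and len(words) == 3:
                defs[words[1]] = words[2]  # register name -> hex bit pattern
        return live_ins, defs

    snippet = """\
    # LLVM-EXEGESIS-LIVEIN RDI
    # LLVM-EXEGESIS-DEFREG XMM1 42
    vmulps (%rdi), %xmm1, %xmm2
    vhaddps %xmm2, %xmm2, %xmm3
    """
    live_ins, defs = parse_directives(snippet)
    print(live_ins, defs)
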
EXAMPLE 3: analysis
-------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

    $ llvm-exegesis -mode=analysis \
      -benchmarks-file=/tmp/benchmarks.yaml \
      -analysis-clusters-output-file=/tmp/clusters.csv \
      -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

    cluster_id,opcode_name,config,sched_class
    2,ADD32ri8_DB,,WriteALU,1.00
    2,ADD32ri_DB,,WriteALU,1.01
    2,ADD32rr,,WriteALU,1.01
    2,ADD32rr_DB,,WriteALU,1.00
    2,ADD32rr_REV,,WriteALU,1.00
    2,ADD64i32,,WriteALU,1.01
    2,ADD64ri32,,WriteALU,1.01
    2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
    2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
    2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
    2,ADD64ri8,,WriteALU,1.00
    2,SETBr,,WriteSETCC,1.01

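
A file in this format is easy to post-process. The Python sketch below groups
opcodes by scheduling class within each cluster. It assumes that the trailing
column, which has no name in the header shown above, holds the measured value.
The sample rows are copied from the listing, and the function name
`group_by_sched_class` is hypothetical.

.. code-block:: python

    import csv
    import io
    from collections import defaultdict

    def group_by_sched_class(csv_text):
        """Map (cluster_id, sched_class) to [(opcode_name, measurement)]."""
        groups = defaultdict(list)
        rows = csv.reader(io.StringIO(csv_text))
        next(rows, None)  # skip the header line
        for row in rows:
            if len(row) < 5:
                continue  # ignore blank or malformed rows
            cluster_id, opcode_name, _config, sched_class = row[:4]
            # Assumption: the unnamed fifth column is the measured value.
            groups[(cluster_id, sched_class)].append(
                (opcode_name, float(row[4])))
        return dict(groups)

    sample = """\
    cluster_id,opcode_name,config,sched_class
    2,ADD32rr,,WriteALU,1.01
    2,ADD64ri8,,WriteALU,1.00
    2,SETBr,,WriteSETCC,1.01
    """
    groups = group_by_sched_class(sample)
    for (cluster_id, sched_class), members in sorted(groups.items()):
        print(cluster_id, sched_class, members)

When one cluster contains several scheduling classes, as `WriteALU` and
`WriteSETCC` do in cluster 2 here, those instructions were measured to have the
same performance characteristics.
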
:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
will be shown. This does not invalidate any of the analysis results though.

OPTIONS
-------

.. option:: -help

 Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

 Specify the opcode to measure, by index. See example 1 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,...

 Specify the opcode to measure, by name. Several opcodes can be specified as
 a comma-separated list. See example 1 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -snippets-file=<filename>

 Specify the custom code snippet to measure. See example 2 for details.
 Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

.. option:: -mode=[latency|uops|inverse_throughput|analysis]

 Specify the run mode. Note that if you pick `analysis` mode, you also need
 to specify at least one of the `-analysis-clusters-output-file=` and
 `-analysis-inconsistencies-output-file=` options.

.. option:: -num-repetitions=<Number of repetitions>

 Specify the number of repetitions of the asm snippet.
 Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -benchmarks-file=</path/to/file>

 File to read (`analysis` mode) or write (`latency`/`uops`/`inverse_throughput`
 modes) benchmark results. `-` uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

 If provided, write the analysis clusters as CSV to this file. `-` prints to
 stdout. By default, this analysis is not run.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

 If non-empty, write inconsistencies found during analysis to this file. `-`
 prints to stdout. By default, this analysis is not run.

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

 Specify the numPoints parameter to be used for DBSCAN clustering.

.. option:: -analysis-epsilon=<dbscan epsilon parameter>

 Specify the epsilon parameter to be used for DBSCAN clustering.

.. option:: -ignore-invalid-sched-class=false

 If set, ignore instructions that do not have a sched class (class idx = 0).

.. option:: -mcpu=<cpu name>

 If set, measure the CPU characteristics using the counters for this CPU. This
 is useful when creating new sched models (the host CPU is unknown to LLVM).

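
To give an intuition for the `-analysis-numpoints` and `-analysis-epsilon`
options, the following is a textbook DBSCAN sketch in Python, simplified to
one-dimensional measurements. It is not the tool's implementation. The function
name `dbscan_1d`, the sample values, and the convention that a point counts as
its own neighbor are assumptions made for illustration.

.. code-block:: python

    def dbscan_1d(values, epsilon, num_points):
        """Label each value with a cluster id, or -1 for noise."""
        UNVISITED, NOISE = None, -1
        labels = [UNVISITED] * len(values)

        def neighbors(i):
            # Points within epsilon of point i, including i itself.
            return [j for j, v in enumerate(values)
                    if abs(v - values[i]) <= epsilon]

        cluster_id = 0
        for i in range(len(values)):
            if labels[i] is not UNVISITED:
                continue
            seeds = neighbors(i)
            if len(seeds) < num_points:
                labels[i] = NOISE  # may later be claimed as a border point
                continue
            labels[i] = cluster_id
            queue = [j for j in seeds if j != i]
            while queue:
                j = queue.pop()
                if labels[j] == NOISE:
                    labels[j] = cluster_id  # border point, not expanded
                if labels[j] is not UNVISITED:
                    continue
                labels[j] = cluster_id
                reachable = neighbors(j)
                if len(reachable) >= num_points:
                    queue.extend(reachable)  # j is a core point: expand
            cluster_id += 1
        return labels

    # Hypothetical latency measurements, one per benchmarked instruction.
    latencies = [1.00, 1.01, 1.01, 1.02, 3.00, 3.01, 3.02, 7.50]
    labels = dbscan_1d(latencies, epsilon=0.05, num_points=3)
    print(labels)

In this sketch, a larger epsilon merges measurements that are further apart
into one cluster, while a larger numPoints requires more nearby measurements
before a cluster is formed and leaves isolated ones as noise.
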
EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a nonzero value.