docs/CommandGuide/llvm-exegesis.rst

llvm-exegesis - LLVM Machine Instruction Benchmark
==================================================

SYNOPSIS
--------

:program:`llvm-exegesis` [*options*]

DESCRIPTION
-----------

:program:`llvm-exegesis` is a benchmarking tool that uses information available
in LLVM to measure host machine instruction characteristics such as latency or
port decomposition.

Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis`
generates a code snippet that makes execution as serial as possible (to measure
latency) or as parallel as possible (to measure uop decomposition).
The code snippet is JIT-compiled and executed on the host subtarget. The time
taken (for latency) or the resource usage (for uops) is measured using hardware
performance counters. The result is printed as YAML to standard output.

The main goal of this tool is to automatically validate or invalidate LLVM's
TableGen scheduling models. To that end, the tool also provides analysis of the
results.

EXAMPLES: benchmarking
----------------------

Assume you have an X86-64 machine. To measure the latency of a single
instruction, run:

.. code-block:: bash

    $ llvm-exegesis -mode=latency -opcode-name=ADD64rr

Measuring the uop decomposition of an instruction works similarly:

.. code-block:: bash

    $ llvm-exegesis -mode=uops -opcode-name=ADD64rr

The output is a YAML document (the default is to write to stdout, but you can
redirect the output to a file using `-benchmarks-file`):

.. code-block:: none

  ---
  key:
    opcode_name:     ADD64rr
    mode:            latency
    config:          ''
  cpu_name:        haswell
  llvm_triple:     x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:
    - { key: latency, value: 1.0058, debug_string: '' }
  error:           ''
  info:            'explicit self cycles, selecting one aliasing configuration.
  Snippet:
  ADD64rr R8, R8, R10
  '
  ...

To measure the latency of all instructions for the host architecture, run:

.. code-block:: bash

  #!/bin/bash
  # Highest opcode index, read from the generated instruction table.
  readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1))
  for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
  do
    # Print only the YAML document (everything from the first '---' onward).
    ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
  done

FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.
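
The loop above prints every result to stdout. As a rough sketch, the results
for a handful of opcodes could instead be collected into one file. The helper
name ``collect_latencies`` and the ``EXEGESIS`` variable are hypothetical, and
the sketch assumes that YAML documents can simply be concatenated, as the loop
above already does on stdout:

.. code-block:: bash

  #!/bin/bash
  # Hypothetical helper, not part of llvm-exegesis.
  # EXEGESIS is an assumed variable pointing at the binary.
  EXEGESIS=${EXEGESIS:-./build/bin/llvm-exegesis}

  # Usage: collect_latencies <output file> <opcode name>...
  collect_latencies() {
    local out=$1
    shift
    : > "${out}"  # Start from an empty output file.
    local opcode yaml
    for opcode in "$@"; do
      # A non-zero exit status means the run failed; skip that opcode.
      if yaml=$("${EXEGESIS}" -mode=latency -opcode-name="${opcode}"); then
        # Keep only the YAML document, as in the loop above.
        printf '%s\n' "${yaml}" | sed -n '/---/,$p' >> "${out}"
      else
        echo "warning: benchmarking ${opcode} failed" >&2
      fi
    done
  }

For instance, ``collect_latencies /tmp/benchmarks.yaml ADD64rr ADD32rr`` would
benchmark two opcodes and leave both results in `/tmp/benchmarks.yaml`.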

EXAMPLES: analysis
------------------

Assuming you have a set of benchmarked instructions (either latency or uops) as
YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
following command:

.. code-block:: bash

    $ llvm-exegesis -mode=analysis \
        -benchmarks-file=/tmp/benchmarks.yaml \
        -analysis-clusters-output-file=/tmp/clusters.csv \
        -analysis-inconsistencies-output-file=/tmp/inconsistencies.html

This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
following format:

.. code-block:: none

  cluster_id,opcode_name,config,sched_class,latency
  ...
  2,ADD32ri8_DB,,WriteALU,1.00
  2,ADD32ri_DB,,WriteALU,1.01
  2,ADD32rr,,WriteALU,1.01
  2,ADD32rr_DB,,WriteALU,1.00
  2,ADD32rr_REV,,WriteALU,1.00
  2,ADD64i32,,WriteALU,1.01
  2,ADD64ri32,,WriteALU,1.01
  2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
  2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
  2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
  2,ADD64ri8,,WriteALU,1.00
  2,SETBr,,WriteSETCC,1.01
  ...
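
For a quick look at this file, the following sketch counts how many distinct
scheduling classes end up in each cluster. It is independent of the tool's own
inconsistency analysis. The helper name ``sched_classes_per_cluster`` is
hypothetical, and the sketch assumes that no field contains a comma and that
cluster ids are numeric:

.. code-block:: bash

  # Hypothetical helper: print "cluster_id,number of distinct sched_class values".
  # Assumes no field contains a comma and that cluster ids are numeric, so the
  # header line and any non-numeric rows are skipped.
  sched_classes_per_cluster() {
    awk -F, '
      $1 ~ /^[0-9]+$/ {
        if (!(($1, $4) in seen)) {
          seen[$1, $4] = 1
          count[$1]++
        }
      }
      END { for (id in count) print id "," count[id] }
    ' "$1" | sort -t, -k1,1n
  }

Run on only the rows shown above, ``sched_classes_per_cluster /tmp/clusters.csv``
would report ``2,4``, because cluster 2 mixes four scheduling classes.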

:program:`llvm-exegesis` will also analyze the clusters to point out
inconsistencies in the scheduling information. The output is an HTML file. For
example, `/tmp/inconsistencies.html` will contain messages like the following:

.. image:: llvm-exegesis-analysis.png
  :align: center

Note that the scheduling class names will be resolved only when
:program:`llvm-exegesis` is compiled in debug mode; otherwise only the class id
will be shown. This does not invalidate any of the analysis results.


OPTIONS
-------

.. option:: -help

 Print a summary of command line options.

.. option:: -opcode-index=<LLVM opcode index>

 Specify the opcode to measure, by index.
 Either `opcode-index` or `opcode-name` must be set.

.. option:: -opcode-name=<LLVM opcode name>

 Specify the opcode to measure, by name.
 Either `opcode-index` or `opcode-name` must be set.

.. option:: -mode=[latency|uops|analysis]

 Specify the run mode.

.. option:: -num-repetitions=<Number of repetitions>

 Specify the number of repetitions of the asm snippet.
 Higher values lead to more accurate measurements but lengthen the benchmark.

.. option:: -benchmarks-file=</path/to/file>

 File to read (`analysis` mode) or write (`latency`/`uops` modes) benchmark
 results. "-" uses stdin/stdout.

.. option:: -analysis-clusters-output-file=</path/to/file>

 If provided, write the analysis clusters as CSV to this file. "-" prints to
 stdout.

.. option:: -analysis-inconsistencies-output-file=</path/to/file>

 If non-empty, write inconsistencies found during analysis to this file. "-"
 prints to stdout.

.. option:: -analysis-numpoints=<dbscan numPoints parameter>

 Specify the numPoints parameter to be used for DBSCAN clustering
 (`analysis` mode).

.. option:: -analysis-epsilon=<dbscan epsilon parameter>

 Specify the epsilon parameter to be used for DBSCAN clustering
 (`analysis` mode).


EXIT STATUS
-----------

:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
printed to standard error, and the tool returns a non-zero value.
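
A script can branch on this exit status. The wrapper below is a minimal sketch
and works for any command. The name ``run_checked`` is hypothetical:

.. code-block:: bash

  # Hypothetical wrapper: run a command, report a failure on stderr, and
  # propagate the command's exit status to the caller.
  run_checked() {
    if "$@"; then
      return 0
    else
      local status=$?  # Exit status of the command that just failed.
      echo "command failed with status ${status}: $*" >&2
      return "${status}"
    fi
  }

For example, ``run_checked llvm-exegesis -mode=latency -opcode-name=ADD64rr``
returns 0 when the benchmark succeeds and the tool's non-zero status otherwise.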