llvm/docs/BranchWeightMetadata.rst

   1 ===========================
   2 LLVM Branch Weight Metadata
   3 ===========================
   4
   5 .. contents::
   6    :local:
   7
   8 Introduction
   9 ============
  10
  11 Branch Weight Metadata represents branch weights as its likeliness to be taken
  12 (see :doc:`BlockFrequencyTerminology`). Metadata is assigned to an
  13 ``Instruction`` that is a terminator as a ``MDNode`` of the ``MD_prof`` kind.
  14 The first operator is always a ``MDString`` node with the string
  15 "branch_weights".  Number of operators depends on the terminator type.
  16
  17 Branch weights might be fetch from the profiling file, or generated based on
  18 `__builtin_expect`_ and `__builtin_expect_with_probability`_ instruction.
  19
  20 All weights are represented as an unsigned 32-bit values, where higher value
  21 indicates greater chance to be taken.
  22
  23 Supported Instructions
  24 ======================
  25
  26 ``BranchInst``
  27 ^^^^^^^^^^^^^^
  28
  29 Metadata is only assigned to the conditional branches. There are two extra
  30 operands for the true and the false branch.
  31
  32 .. code-block:: none
  33
  34   !0 = !{
  35     !"branch_weights",
  36     i32 <TRUE_BRANCH_WEIGHT>,
  37     i32 <FALSE_BRANCH_WEIGHT>
  38   }
  39
  40 ``SwitchInst``
  41 ^^^^^^^^^^^^^^
  42
  43 Branch weights are assigned to every case (including the ``default`` case which
  44 is always case #0).
  45
  46 .. code-block:: none
  47
  48   !0 = !{
  49     !"branch_weights",
  50     i32 <DEFAULT_BRANCH_WEIGHT>
  51     [ , i32 <CASE_BRANCH_WEIGHT> ... ]
  52   }
  53
  54 ``IndirectBrInst``
  55 ^^^^^^^^^^^^^^^^^^
  56
  57 Branch weights are assigned to every destination.
  58
  59 .. code-block:: none
  60
  61   !0 = !{
  62     !"branch_weights",
  63     i32 <LABEL_BRANCH_WEIGHT>
  64     [ , i32 <LABEL_BRANCH_WEIGHT> ... ]
  65   }
  66
  67 ``CallInst``
  68 ^^^^^^^^^^^^^^^^^^
  69
  70 Calls may have branch weight metadata, containing the execution count of
  71 the call. It is currently used in SamplePGO mode only, to augment the
  72 block and entry counts which may not be accurate with sampling.
  73
  74 .. code-block:: none
  75
  76   !0 = !{
  77     !"branch_weights",
  78     i32 <CALL_BRANCH_WEIGHT>
  79   }
  80
  81 ``InvokeInst``
  82 ^^^^^^^^^^^^^^^^^^
  83
  84 Invoke instruction may have branch weight metadata with one or two weights.
  85 The second weight is optional and corresponds to the unwind branch.
  86 If only one weight is set then it contains the execution count of the call
  87 and used in SamplePGO mode only as described for the call instruction. If both
  88 weights are specified then the second weight contains count of unwind branch
  89 taken and the first weights contains the execution count of the call minus
  90 the count of unwind branch taken. Both weights specified are used to calculate
  91 BranchProbability as for BranchInst and for SamplePGO the sum of both weights
  92 is used.
  93
  94 .. code-block:: none
  95
  96   !0 = !{
  97     !"branch_weights",
  98     i32 <INVOKE_NORMAL_WEIGHT>
  99     [ , i32 <INVOKE_UNWIND_WEIGHT> ]
 100   }
 101
 102 Other
 103 ^^^^^
 104
 105 Other terminator instructions are not allowed to contain Branch Weight Metadata.
 106
 107 .. _\__builtin_expect:
 108
 109 Built-in ``expect`` Instructions
 110 ================================
 111
 112 ``__builtin_expect(long exp, long c)`` instruction provides branch prediction
 113 information. The return value is the value of ``exp``.
 114
 115 It is especially useful in conditional statements. Currently Clang supports two
 116 conditional statements:
 117
 118 ``if`` statement
 119 ^^^^^^^^^^^^^^^^
 120
 121 The ``exp`` parameter is the condition. The ``c`` parameter is the expected
 122 comparison value. If it is equal to 1 (true), the condition is likely to be
 123 true, in other case condition is likely to be false. For example:
 124
 125 .. code-block:: c++
 126
 127   if (__builtin_expect(x > 0, 1)) {
 128     // This block is likely to be taken.
 129   }
 130
 131 ``switch`` statement
 132 ^^^^^^^^^^^^^^^^^^^^
 133
 134 The ``exp`` parameter is the value. The ``c`` parameter is the expected
 135 value. If the expected value doesn't show on the cases list, the ``default``
 136 case is assumed to be likely taken.
 137
 138 .. code-block:: c++
 139
 140   switch (__builtin_expect(x, 5)) {
 141   default: break;
 142   case 0:  // ...
 143   case 3:  // ...
 144   case 5:  // This case is likely to be taken.
 145   }
 146
 147 .. _\__builtin_expect_with_probability:
 148
 149 Built-in ``expect.with.probability`` Instruction
 150 ================================================
 151
 152 ``__builtin_expect_with_probability(long exp, long c, double probability)`` has
 153 the same semantics as ``__builtin_expect``, but the caller provides the
 154 probability that ``exp == c``. The last argument ``probability`` must be
 155 constant floating-point expression and be in the range [0.0, 1.0] inclusive.
 156 The usage is also similar as ``__builtin_expect``, for example:
 157
 158 ``if`` statement
 159 ^^^^^^^^^^^^^^^^
 160
 161 If the expect comparison value ``c`` is equal to 1(true), and probability
 162 value ``probability`` is set to 0.8, that means the probability of condition
 163 to be true is 80% while that of false is 20%.
 164
 165 .. code-block:: c++
 166
 167   if (__builtin_expect_with_probability(x > 0, 1, 0.8)) {
 168     // This block is likely to be taken with probability 80%.
 169   }
 170
 171 ``switch`` statement
 172 ^^^^^^^^^^^^^^^^^^^^
 173
 174 This is basically the same as ``switch`` statement in ``__builtin_expect``.
 175 The probability that ``exp`` is equal to the expect value is given in
 176 the third argument ``probability``, while the probability of other value is
 177 the average of remaining probability(``1.0 - probability``). For example:
 178
 179 .. code-block:: c++
 180
 181   switch (__builtin_expect_with_probability(x, 5, 0.7)) {
 182   default: break;  // Take this case with probability 10%
 183   case 0:  break;  // Take this case with probability 10%
 184   case 3:  break;  // Take this case with probability 10%
 185   case 5:  break;  // This case is likely to be taken with probability 70%
 186   }
 187
 188 CFG Modifications
 189 =================
 190
 191 Branch Weight Metatada is not proof against CFG changes. If terminator operands'
 192 are changed some action should be taken. In other case some misoptimizations may
 193 occur due to incorrect branch prediction information.
 194
 195 Function Entry Counts
 196 =====================
 197
 198 To allow comparing different functions during inter-procedural analysis and
 199 optimization, ``MD_prof`` nodes can also be assigned to a function definition.
 200 The first operand is a string indicating the name of the associated counter.
 201
 202 Currently, one counter is supported: "function_entry_count". The second operand
 203 is a 64-bit counter that indicates the number of times that this function was
 204 invoked (in the case of instrumentation-based profiles). In the case of
 205 sampling-based profiles, this operand is an approximation of how many times
 206 the function was invoked.
 207
 208 For example, in the code below, the instrumentation for function foo()
 209 indicates that it was called 2,590 times at runtime.
 210
 211 .. code-block:: llvm
 212
 213   define i32 @foo() !prof !1 {
 214     ret i32 0
 215   }
 216   !1 = !{!"function_entry_count", i64 2590}
 217
 218 If "function_entry_count" has more than 2 operands, the later operands are
 219 the GUID of the functions that needs to be imported by ThinLTO. This is only
 220 set by sampling based profile. It is needed because the sampling based profile
 221 was collected on a binary that had already imported and inlined these functions,
 222 and we need to ensure the IR matches in the ThinLTO backends for profile
 223 annotation. The reason why we cannot annotate this on the callsite is that it
 224 can only goes down 1 level in the call chain. For the cases where
 225 foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc(), we will need to go down 2 levels
 226 in the call chain to import both bar_in_b_cc and baz_in_c_cc.