1 ============================
2 Global Instruction Selection
3 ============================
10 This document is a work in progress. It reflects the current state of the
11 implementation, as well as open design and implementation issues.
16 GlobalISel is a framework that provides a set of reusable passes and utilities
17 for instruction selection --- translation from LLVM IR to target-specific
20 GlobalISel is intended to be a replacement for SelectionDAG and FastISel, to
21 solve three major problems:
23 * **Performance** --- SelectionDAG introduces a dedicated intermediate
24 representation, which has a compile-time cost.
26 GlobalISel directly operates on the post-isel representation used by the
27 rest of the code generator, MIR.
28 It does require extensions to that representation to support arbitrary
29 incoming IR: :ref:`gmir`.
31 * **Granularity** --- SelectionDAG and FastISel operate on individual basic
32 blocks, losing some global optimization opportunities.
34 GlobalISel operates on the whole function.
36 * **Modularity** --- SelectionDAG and FastISel are radically different and share
39 GlobalISel is built in a way that enables code reuse. For instance, both the
40 optimized and fast selectors share the :ref:`pipeline`, and targets can
41 configure that pipeline to better suit their needs.
49 Machine IR operates on physical registers, register classes, and (mostly)
50 target-specific instructions.
52 To bridge the gap with LLVM IR, GlobalISel introduces "generic" extensions to
59 The generic MIR (GMIR) representation still contains references to IR
60 constructs (such as ``GlobalValue``). Removing those should let us write more
61 accurate tests, or delete IR after building the initial MIR. However, it is
62 not part of the GlobalISel effort.
64 .. _gmir-instructions:
69 The main addition is support for pre-isel generic machine instructions (e.g.,
70 ``G_ADD``). Like other target-independent instructions (e.g., ``COPY`` or
71 ``PHI``), these are available on all targets.
74 While we're progressively adding instructions, one kind in particular exposes
75 interesting problems: compares and how to represent condition codes.
76 Some targets (x86, ARM) have generic comparisons setting multiple flags,
77 which are then used by predicated variants.
78 Others (IR) specify the predicate in the comparison and users just get a single
79 bit. SelectionDAG uses SETCC/CONDBR vs BR_CC (and similar for select) to
82 The ``MachineIRBuilder`` class wraps the ``MachineInstrBuilder`` and provides
83 a convenient way to create these generic instructions.
87 Generic Virtual Registers
88 -------------------------
90 Generic instructions operate on a new kind of register: "generic" virtual
91 registers. As opposed to non-generic vregs, they are not assigned a Register
92 Class. Instead, generic vregs have a :ref:`gmir-llt`, and can be assigned
93 a :ref:`gmir-regbank`.
95 ``MachineRegisterInfo`` tracks the same information that it does for
96 non-generic vregs (e.g., use-def chains). Additionally, it also tracks the
97 :ref:`gmir-llt` of the register, and, instead of the ``TargetRegisterClass``,
98 its :ref:`gmir-regbank`, if any.
100 For simplicity, most generic instructions only accept generic vregs:
102 * instead of immediates, they use a gvreg defined by an instruction
103 materializing the immediate value (see :ref:`irtranslator-constants`).
104 * instead of physical register, they use a gvreg defined by a ``COPY``.
107 We started with an alternative representation, where MRI tracks a size for
108 each gvreg, and instructions have lists of types.
109 That had two flaws: the type and size are redundant, and there was no generic
110 way of getting a given operand's type (as there was no 1:1 mapping between
111 instruction types and operands).
112 We considered putting the type in some variant of MCInstrDesc instead:
113 See `PR26576 <http://llvm.org/PR26576>`_: [GlobalISel] Generic MachineInstrs
114 need a type but this increases the memory footprint of the related objects
121 A Register Bank is a set of register classes defined by the target.
122 A bank has a size, which is the maximum store size of all covered classes.
124 In general, cross-class copies inside a bank are expected to be cheaper than
125 copies across banks. They are also coalesceable by the register coalescer,
126 whereas cross-bank copies are not.
128 Also, equivalent operations can be performed on different banks using different
131 For example, X86 can be seen as having 3 main banks: general-purpose, x87, and
132 vector (which could be further split into a bank per domain for single vs
133 double precision instructions).
135 Register banks are described by a target-provided API,
136 :ref:`RegisterBankInfo <api-registerbankinfo>`.
143 Additionally, every generic virtual register has a type, represented by an
144 instance of the ``LLT`` class.
146 Like ``EVT``/``MVT``/``Type``, it has no distinction between unsigned and signed
147 integer types. Furthermore, it also has no distinction between integer and
148 floating-point types: it mainly conveys absolutely necessary information, such
149 as size and number of vector lanes:
152 * ``pN`` for pointers
153 * ``<N x sM>`` for vectors
154 * ``unsized`` for labels, etc..
156 ``LLT`` is intended to replace the usage of ``EVT`` in SelectionDAG.
158 Here are some LLT examples and their ``EVT`` and ``Type`` equivalents:
160 ============= ========= ======================================
162 ============= ========= ======================================
165 ``s32`` ``i32`` ``i32``
166 ``s32`` ``f32`` ``float``
167 ``s17`` ``i17`` ``i17``
168 ``s16`` N/A ``{i8, i8}``
169 ``s32`` N/A ``[4 x i8]``
170 ``p0`` ``iPTR`` ``i8*``, ``i32*``, ``%opaque*``
171 ``p2`` ``iPTR`` ``i8 addrspace(2)*``
172 ``<4 x s32>`` ``v4f32`` ``<4 x float>``
173 ``s64`` ``v1f64`` ``<1 x double>``
174 ``<3 x s32>`` ``v3i32`` ``<3 x i32>``
175 ``unsized`` ``Other`` ``label``
176 ============= ========= ======================================
179 Rationale: instructions already encode a specific interpretation of types
180 (e.g., ``add`` vs. ``fadd``, or ``sdiv`` vs. ``udiv``). Also encoding that
181 information in the type system requires introducing bitcast with no real
182 advantage for the selector.
184 Pointer types are distinguished by address space. This matches IR, as opposed
185 to SelectionDAG where address space is an attribute on operations.
186 This representation better supports pointers having different sizes depending
187 on their addressspace.
190 Currently, LLT requires at least 2 elements in vectors, but some targets have
191 the concept of a '1-element vector'. Representing them as their underlying
192 scalar type is a nice simplification.
195 Currently, non-generic virtual registers, defined by non-pre-isel-generic
196 instructions, cannot have a type, and thus cannot be used by a pre-isel generic
197 instruction. Instead, they are given a type using a COPY. We could relax that
198 and allow types on all vregs: this would reduce the number of MI required when
199 emitting target-specific MIR early in the pipeline. This should purely be
200 a compile-time optimization.
207 There are four required passes, regardless of the optimization mode:
212 Additional passes can then be inserted at higher optimization levels or for
213 specific targets. For example, to match the current SelectionDAG set of
214 transformations: MachineCSE and a better MachineCombiner between every pass.
217 In theory, not all passes are always necessary.
218 As an additional compile-time optimization, we could skip some of the passes by
219 setting the relevant MachineFunction properties. For instance, if the
220 IRTranslator did not encounter any illegal instruction, it would set the
221 ``legalized`` property to avoid running the :ref:`milegalizer`.
222 Similarly, we considered specializing the IRTranslator per-target to directly
223 emit target-specific MI.
224 However, we instead decided to keep the core pipeline simple, and focus on
225 minimizing the overhead of the passes in the no-op cases.
233 This pass translates the input LLVM IR ``Function`` to a GMIR
237 This currently doesn't support the more complex instructions, in particular
238 those involving control flow (``switch``, ``invoke``, ...).
239 For ``switch`` in particular, we can initially use the ``LowerSwitch`` pass.
241 .. _api-calllowering:
246 The ``IRTranslator`` (using the ``CallLowering`` target-provided utility) also
247 implements the ABI's calling convention by lowering calls, returns, and
248 arguments to the appropriate physical register usage and instruction sequences.
250 .. _irtranslator-aggregates:
255 Aggregates are lowered to a single scalar vreg.
256 This differs from SelectionDAG's multiple vregs via ``GetValueVTs``.
259 As some of the bits are undef (padding), we should consider augmenting the
260 representation with additional metadata (in effect, caching computeKnownBits
261 information on vregs).
262 See `PR26161 <http://llvm.org/PR26161>`_: [GlobalISel] Value to vreg during
263 IR to MachineInstr translation for aggregate type
265 .. _irtranslator-constants:
270 The ``IRTranslator`` lowers ``Constant`` operands into uses of gvregs defined
271 by ``G_CONSTANT`` or ``G_FCONSTANT`` instructions.
272 Currently, these instructions are always emitted in the entry basic block.
273 In a ``MachineFunction``, each ``Constant`` is materialized by a single gvreg.
275 This is beneficial as it allows us to fold constants into immediate operands
276 during :ref:`instructionselect`, while still avoiding redundant materializations
277 for expensive non-foldable constants.
278 However, this can lead to unnecessary spills and reloads in an -O0 pipeline, as
279 these vregs can have long live ranges.
282 We're investigating better placement of these instructions, in fast and
291 This pass transforms the generic machine instructions such that they are legal.
293 A legal instruction is defined as:
295 * **selectable** --- the target will later be able to select it to a
296 target-specific (non-generic) instruction.
298 * operating on **vregs that can be loaded and stored** -- if necessary, the
299 target can select a ``G_LOAD``/``G_STORE`` of each gvreg operand.
301 As opposed to SelectionDAG, there are no legalization phases. In particular,
302 'type' and 'operation' legalization are not separate.
304 Legalization is iterative, and all state is contained in GMIR. To maintain the
305 validity of the intermediate code, instructions are introduced:
307 * ``G_SEQUENCE`` --- concatenate multiple registers into a single wider
310 * ``G_EXTRACT`` --- extract multiple registers (as contiguous sequences of bits)
311 from a single wider register.
313 As they are expected to be temporary byproducts of the legalization process,
314 they are combined at the end of the :ref:`milegalizer` pass.
315 If any remain, they are expected to always be selectable, using loads and stores
318 .. _api-legalizerinfo:
323 Currently the API is broadly similar to SelectionDAG/TargetLowering, but
324 extended in two ways:
326 * The set of available actions is wider, avoiding the currently very
327 overloaded ``Expand`` (which can cover everything from libcalls to
328 scalarization depending on the node's opcode).
330 * Since there's no separate type legalization, independently varying
331 types on an instruction can have independent actions. For example a
332 ``G_ICMP`` has 2 independent types: the result and the inputs; we need
333 to be able to say that comparing 2 s32s is OK, but the s1 result
334 must be dealt with in another way.
336 As such, the primary key when deciding what to do is the ``InstrAspect``,
337 essentially a tuple consisting of ``(Opcode, TypeIdx, Type)`` and mapping to a
338 suggested course of action.
340 An example use might be:
344 // The CPU can't deal with an s1 result, do something about it.
345 setAction({G_ICMP, 0, s1}, WidenScalar);
346 // An s32 input (the second type) is fine though.
347 setAction({G_ICMP, 1, s32}, Legal);
351 An alternative worth investigating is to generalize the API to represent
352 actions using ``std::function`` that implements the action, instead of explicit
353 enum tokens (``Legal``, ``WidenScalar``, ...).
356 Moreover, we could use TableGen to initially infer legality of operation from
357 existing patterns (as any pattern we can select is by definition legal).
358 Expanding that to describe legalization actions is a much larger but
359 potentially useful project.
361 .. _milegalizer-non-power-of-2:
367 Types which have a size that isn't a power of 2 aren't currently supported.
368 The setAction API will probably require changes to support them.
369 Even notionally explicitly specified operations only make suggestions
370 like "Widen" or "Narrow". The eventual type is still unspecified and a
371 search is performed by repeated doubling/halving of the type's
373 This is incorrect for types that aren't a power of 2. It's reasonable to
374 expect we could construct an efficient set of side-tables for more general
375 lookups though, encoding a map from the integers (i.e. the size of the current
376 type) to types (the legal size).
378 .. _milegalizer-vector:
383 Vectors first get their element type legalized: ``<A x sB>`` becomes
384 ``<A x sC>`` such that at least one operation is legal with ``sC``.
386 This is currently specified by the function ``setScalarInVectorAction``, called
389 setScalarInVectorAction(G_ICMP, s1, WidenScalar);
391 Next the number of elements is chosen so that the entire operation is
392 legal. This aspect is not controllable at the moment, but probably
393 should be (you could imagine disagreements on whether a ``<2 x s8>``
394 operation should be scalarized or extended to ``<8 x s8>``).
402 This pass constrains the :ref:`gmir-gvregs` operands of generic
403 instructions to some :ref:`gmir-regbank`.
405 It iteratively maps instructions to a set of per-operand bank assignment.
406 The possible mappings are determined by the target-provided
407 :ref:`RegisterBankInfo <api-registerbankinfo>`.
408 The mapping is then applied, possibly introducing ``COPY`` instructions if
411 It traverses the ``MachineFunction`` top down so that all operands are already
412 mapped when analyzing an instruction.
414 This pass could also remap target-specific instructions when beneficial.
415 In the future, this could replace the ExeDepsFix pass, as we can directly
416 select the best variant for an instruction that's available on multiple banks.
418 .. _api-registerbankinfo:
420 API: RegisterBankInfo
421 ^^^^^^^^^^^^^^^^^^^^^
423 The ``RegisterBankInfo`` class describes multiple aspects of register banks.
425 * **Banks**: ``addRegBankCoverage`` --- which register bank covers each
428 * **Cross-Bank Copies**: ``copyCost`` --- the cost of a ``COPY`` from one bank
431 * **Default Mapping**: ``getInstrMapping`` --- the default bank assignments for
434 * **Alternative Mapping**: ``getInstrAlternativeMapping`` --- the other
435 possible bank assignments for a given instruction.
438 All this information should eventually be static and generated by TableGen,
439 mostly using existing information augmented by bank descriptions.
442 ``getInstrMapping`` is currently separate from ``getInstrAlternativeMapping``
443 because the latter is more expensive: as we move to static mapping info,
444 both methods should be free, and we should merge them.
446 .. _regbankselect-modes:
451 ``RegBankSelect`` currently has two modes:
453 * **Fast** --- For each instruction, pick a target-provided "default" bank
454 assignment. This is the default at -O0.
456 * **Greedy** --- For each instruction, pick the cheapest of several
457 target-provided bank assignment alternatives.
459 We intend to eventually introduce an additional optimizing mode:
461 * **Global** --- Across multiple instructions, pick the cheapest combination of
465 On AArch64, we are considering using the Greedy mode even at -O0 (or perhaps at
466 backend -O1): because :ref:`gmir-llt` doesn't distinguish floating point from
467 integer scalars, the default assignment for loads and stores is the integer
468 bank, introducing cross-bank copies on most floating point operations.
471 .. _instructionselect:
476 This pass transforms generic machine instructions into equivalent
477 target-specific instructions. It traverses the ``MachineFunction`` bottom-up,
478 selecting uses before definitions, enabling trivial dead code elimination.
480 .. _api-instructionselector:
482 API: InstructionSelector
483 ^^^^^^^^^^^^^^^^^^^^^^^^
485 The target implements the ``InstructionSelector`` class, containing the
486 target-specific selection logic proper.
488 The instance is provided by the subtarget, so that it can specialize the
489 selector by subtarget feature (with, e.g., a vector selector overriding parts
490 of a general-purpose common selector).
491 We might also want to parameterize it by MachineFunction, to enable selector
492 variants based on function attributes like optsize.
494 The simple API consists of:
498 virtual bool select(MachineInstr &MI)
500 This target-provided method is responsible for mutating (or replacing) a
501 possibly-generic MI into a fully target-specific equivalent.
502 It is also responsible for doing the necessary constraining of gvregs into the
503 appropriate register classes.
505 The ``InstructionSelector`` can fold other instructions into the selected MI,
506 by walking the use-def chain of the vreg operands.
507 As GlobalISel is Global, this folding can occur across basic blocks.
510 Currently, the Select pass is implemented with hand-written c++, similar to
511 FastISel, rather than backed by tblgen'erated pattern-matching.
512 We intend to eventually reuse SelectionDAG patterns.
520 .. _maintainability-iterative:
522 Iterative Transformations
523 -------------------------
525 Passes are split into small, iterative transformations, with all state
526 represented in the MIR.
528 This differs from SelectionDAG (in particular, the legalizer) using various
529 in-memory side-tables.
532 .. _maintainability-mir:
537 .. FIXME: Update the MIRLangRef to include GMI additions.
539 :ref:`gmir` is serializable (see :doc:`MIRLangRef`).
540 Combined with :ref:`maintainability-iterative`, this enables much finer-grained
541 testing, rather than requiring large and fragile IR-to-assembly tests.
543 The current "stage" in the :ref:`pipeline` is represented by a set of
544 ``MachineFunctionProperties``:
547 * ``regBankSelected``
551 .. _maintainability-verifier:
556 The pass approach lets us use the ``MachineVerifier`` to enforce invariants.
557 For instance, a ``regBankSelected`` function may not have gvregs without
561 The ``MachineVerifier`` being monolithic, some of the checks we want to do
562 can't be integrated to it: GlobalISel is a separate library, so we can't
563 directly reference it from CodeGen. For instance, legality checks are
564 currently done in RegBankSelect/InstructionSelect proper. We could #ifdef out
565 the checks, or we could add some sort of verifier API.
570 Progress and Future Work
571 ========================
573 The initial goal is to replace FastISel on AArch64. The next step will be to
574 replace SelectionDAG as the optimized ISel.
577 While we iterate on GlobalISel, we strive to avoid affecting the performance of
578 SelectionDAG, FastISel, or the other MIR passes. For instance, the types of
579 :ref:`gmir-gvregs` are stored in a separate table in ``MachineRegisterInfo``,
580 that is destroyed after :ref:`instructionselect`.
582 .. _progress-fastisel:
587 For the initial FastISel replacement, we intend to fallback to SelectionDAG on
590 Currently, compile-time of the fast pipeline is within 1.5x of FastISel.
591 We're optimistic we can get to within 1.1/1.2x, but beating FastISel will be
592 challenging given the multi-pass approach.
593 Still, supporting all IR (via a complete legalizer) and avoiding the fallback
594 to SelectionDAG in the worst case should enable better amortized performance
595 than SelectionDAG+FastISel.
598 We considered never having a fallback to SelectionDAG, instead deciding early
599 whether a given function is supported by GlobalISel or not. The decision would
600 be based on :ref:`milegalizer` queries.
601 We abandoned that for two reasons:
602 a) on IR inputs, we'd need to basically simulate the :ref:`irtranslator`;
603 b) to be robust against unforeseen failures and to enable iterative
606 .. _progress-targets:
608 Support For Other Targets
609 -------------------------
611 In parallel, we're investigating adding support for other - ideally quite
612 different - targets. For instance, there is some initial AMDGPU support.
617 Porting GlobalISel to A New Target
618 ==================================
620 There are four major classes to implement by the target:
622 * :ref:`CallLowering <api-calllowering>` --- lower calls, returns, and arguments
623 according to the ABI.
624 * :ref:`RegisterBankInfo <api-registerbankinfo>` --- describe
625 :ref:`gmir-regbank` coverage, cross-bank copy cost, and the mapping of
626 operands onto banks for each instruction.
627 * :ref:`LegalizerInfo <api-legalizerinfo>` --- describe what is legal, and how
628 to legalize what isn't.
629 * :ref:`InstructionSelector <api-instructionselector>` --- select generic MIR
630 to target-specific MIR.
634 * ``TargetPassConfig`` --- create the passes constituting the pipeline,
635 including additional passes not included in the :ref:`pipeline`.
636 * ``GISelAccessor`` --- setup the various subtarget-provided classes, with a
637 graceful fallback to no-op when GlobalISel isn't enabled.