1 ============================
2 Global Instruction Selection
3 ============================
10 This document is a work in progress. It reflects the current state of the
11 implementation, as well as open design and implementation issues.
16 GlobalISel is a framework that provides a set of reusable passes and utilities
17 for instruction selection --- translation from LLVM IR to target-specific
20 GlobalISel is intended to be a replacement for SelectionDAG and FastISel, to
21 solve three major problems:
23 * **Performance** --- SelectionDAG introduces a dedicated intermediate
24 representation, which has a compile-time cost.
26 GlobalISel directly operates on the post-isel representation used by the
27 rest of the code generator, MIR.
28 It does require extensions to that representation to support arbitrary
29 incoming IR: :ref:`gmir`.
31 * **Granularity** --- SelectionDAG and FastISel operate on individual basic
32 blocks, losing some global optimization opportunities.
34 GlobalISel operates on the whole function.
36 * **Modularity** --- SelectionDAG and FastISel are radically different and share
39 GlobalISel is built in a way that enables code reuse. For instance, both the
40 optimized and fast selectors share the :ref:`pipeline`, and targets can
41 configure that pipeline to better suit their needs.
49 Machine IR operates on physical registers, register classes, and (mostly)
50 target-specific instructions.
52 To bridge the gap with LLVM IR, GlobalISel introduces "generic" extensions to
59 The generic MIR (GMIR) representation still contains references to IR
60 constructs (such as ``GlobalValue``). Removing those should let us write more
61 accurate tests, or delete IR after building the initial MIR. However, it is
62 not part of the GlobalISel effort.
64 .. _gmir-instructions:
69 The main addition is support for pre-isel generic machine instructions (e.g.,
70 ``G_ADD``). Like other target-independent instructions (e.g., ``COPY`` or
71 ``PHI``), these are available on all targets.
74 While we're progressively adding instructions, one kind in particular exposes
75 interesting problems: compares and how to represent condition codes.
76 Some targets (x86, ARM) have generic comparisons setting multiple flags,
77 which are then used by predicated variants.
78 Others (IR) specify the predicate in the comparison and users just get a single
79 bit. SelectionDAG uses SETCC/CONDBR vs BR_CC (and similar for select) to
82 The ``MachineIRBuilder`` class wraps the ``MachineInstrBuilder`` and provides
83 a convenient way to create these generic instructions.
87 Generic Virtual Registers
88 -------------------------
90 Generic instructions operate on a new kind of register: "generic" virtual
91 registers. As opposed to non-generic vregs, they are not assigned a Register
92 Class. Instead, generic vregs have a :ref:`gmir-llt`, and can be assigned
93 a :ref:`gmir-regbank`.
95 ``MachineRegisterInfo`` tracks the same information that it does for
96 non-generic vregs (e.g., use-def chains). Additionally, it also tracks the
97 :ref:`gmir-llt` of the register, and, instead of the ``TargetRegisterClass``,
98 its :ref:`gmir-regbank`, if any.
100 For simplicity, most generic instructions only accept generic vregs:
102 * instead of immediates, they use a gvreg defined by an instruction
103 materializing the immediate value (see :ref:`irtranslator-constants`).
104 * instead of physical register, they use a gvreg defined by a ``COPY``.
107 We started with an alternative representation, where MRI tracks a size for
108 each gvreg, and instructions have lists of types.
109 That had two flaws: the type and size are redundant, and there was no generic
110 way of getting a given operand's type (as there was no 1:1 mapping between
111 instruction types and operands).
112 We considered putting the type in some variant of MCInstrDesc instead:
113 See `PR26576 <http://llvm.org/PR26576>`_: [GlobalISel] Generic MachineInstrs
114 need a type but this increases the memory footprint of the related objects
121 A Register Bank is a set of register classes defined by the target.
122 A bank has a size, which is the maximum store size of all covered classes.
124 In general, cross-class copies inside a bank are expected to be cheaper than
125 copies across banks. They are also coalesceable by the register coalescer,
126 whereas cross-bank copies are not.
128 Also, equivalent operations can be performed on different banks using different
131 For example, X86 can be seen as having 3 main banks: general-purpose, x87, and
132 vector (which could be further split into a bank per domain for single vs
133 double precision instructions).
135 Register banks are described by a target-provided API,
136 :ref:`RegisterBankInfo <api-registerbankinfo>`.
143 Additionally, every generic virtual register has a type, represented by an
144 instance of the ``LLT`` class.
146 Like ``EVT``/``MVT``/``Type``, it has no distinction between unsigned and signed
147 integer types. Furthermore, it also has no distinction between integer and
148 floating-point types: it mainly conveys absolutely necessary information, such
149 as size and number of vector lanes:
152 * ``pN`` for pointers
153 * ``<N x sM>`` for vectors
154 * ``unsized`` for labels, etc..
156 ``LLT`` is intended to replace the usage of ``EVT`` in SelectionDAG.
158 Here are some LLT examples and their ``EVT`` and ``Type`` equivalents:
160 ============= ========= ======================================
162 ============= ========= ======================================
165 ``s32`` ``i32`` ``i32``
166 ``s32`` ``f32`` ``float``
167 ``s17`` ``i17`` ``i17``
168 ``s16`` N/A ``{i8, i8}``
169 ``s32`` N/A ``[4 x i8]``
170 ``p0`` ``iPTR`` ``i8*``, ``i32*``, ``%opaque*``
171 ``p2`` ``iPTR`` ``i8 addrspace(2)*``
172 ``<4 x s32>`` ``v4f32`` ``<4 x float>``
173 ``s64`` ``v1f64`` ``<1 x double>``
174 ``<3 x s32>`` ``v3i32`` ``<3 x i32>``
175 ``unsized`` ``Other`` ``label``
176 ============= ========= ======================================
179 Rationale: instructions already encode a specific interpretation of types
180 (e.g., ``add`` vs. ``fadd``, or ``sdiv`` vs. ``udiv``). Also encoding that
181 information in the type system requires introducing bitcast with no real
182 advantage for the selector.
184 Pointer types are distinguished by address space. This matches IR, as opposed
185 to SelectionDAG where address space is an attribute on operations.
186 This representation better supports pointers having different sizes depending
187 on their addressspace.
190 Currently, LLT requires at least 2 elements in vectors, but some targets have
191 the concept of a '1-element vector'. Representing them as their underlying
192 scalar type is a nice simplification.
195 Currently, non-generic virtual registers, defined by non-pre-isel-generic
196 instructions, cannot have a type, and thus cannot be used by a pre-isel generic
197 instruction. Instead, they are given a type using a COPY. We could relax that
198 and allow types on all vregs: this would reduce the number of MI required when
199 emitting target-specific MIR early in the pipeline. This should purely be
200 a compile-time optimization.
207 There are four required passes, regardless of the optimization mode:
212 Additional passes can then be inserted at higher optimization levels or for
213 specific targets. For example, to match the current SelectionDAG set of
214 transformations: MachineCSE and a better MachineCombiner between every pass.
217 In theory, not all passes are always necessary.
218 As an additional compile-time optimization, we could skip some of the passes by
219 setting the relevant MachineFunction properties. For instance, if the
220 IRTranslator did not encounter any illegal instruction, it would set the
221 ``legalized`` property to avoid running the :ref:`milegalizer`.
222 Similarly, we considered specializing the IRTranslator per-target to directly
223 emit target-specific MI.
224 However, we instead decided to keep the core pipeline simple, and focus on
225 minimizing the overhead of the passes in the no-op cases.
233 This pass translates the input LLVM IR ``Function`` to a GMIR
237 This currently doesn't support the more complex instructions, in particular
238 those involving control flow (``switch``, ``invoke``, ...).
239 For ``switch`` in particular, we can initially use the ``LowerSwitch`` pass.
241 .. _api-calllowering:
246 The ``IRTranslator`` (using the ``CallLowering`` target-provided utility) also
247 implements the ABI's calling convention by lowering calls, returns, and
248 arguments to the appropriate physical register usage and instruction sequences.
250 .. _irtranslator-aggregates:
255 Aggregates are lowered to a single scalar vreg.
256 This differs from SelectionDAG's multiple vregs via ``GetValueVTs``.
259 As some of the bits are undef (padding), we should consider augmenting the
260 representation with additional metadata (in effect, caching computeKnownBits
261 information on vregs).
262 See `PR26161 <http://llvm.org/PR26161>`_: [GlobalISel] Value to vreg during
263 IR to MachineInstr translation for aggregate type
265 .. _irtranslator-constants:
270 The ``IRTranslator`` lowers ``Constant`` operands into uses of gvregs defined
271 by ``G_CONSTANT`` or ``G_FCONSTANT`` instructions.
272 Currently, these instructions are always emitted in the entry basic block.
273 In a ``MachineFunction``, each ``Constant`` is materialized by a single gvreg.
275 This is beneficial as it allows us to fold constants into immediate operands
276 during :ref:`instructionselect`, while still avoiding redundant materializations
277 for expensive non-foldable constants.
278 However, this can lead to unnecessary spills and reloads in an -O0 pipeline, as
279 these vregs can have long live ranges.
282 We're investigating better placement of these instructions, in fast and
291 This pass transforms the generic machine instructions such that they are legal.
293 A legal instruction is defined as:
295 * **selectable** --- the target will later be able to select it to a
296 target-specific (non-generic) instruction.
298 * operating on **vregs that can be loaded and stored** -- if necessary, the
299 target can select a ``G_LOAD``/``G_STORE`` of each gvreg operand.
301 As opposed to SelectionDAG, there are no legalization phases. In particular,
302 'type' and 'operation' legalization are not separate.
304 Legalization is iterative, and all state is contained in GMIR. To maintain the
305 validity of the intermediate code, instructions are introduced:
307 * ``G_MERGE_VALUES`` --- concatenate multiple registers of the same
308 size into a single wider register.
310 * ``G_UNMERGE_VALUES`` --- extract multiple registers of the same size
311 from a single wider register.
313 * ``G_EXTRACT`` --- extract a simple register (as contiguous sequences of bits)
314 from a single wider register.
316 As they are expected to be temporary byproducts of the legalization process,
317 they are combined at the end of the :ref:`milegalizer` pass.
318 If any remain, they are expected to always be selectable, using loads and stores
321 The legality of an instruction may only depend on the instruction itself and
322 must not depend on any context in which the instruction is used. However, after
323 deciding that an instruction is not legal, using the context of the instruction
324 to decide how to legalize the instruction is permitted. As an example, if we
325 have a ``G_FOO`` instruction of the form::
327 %1:_(s32) = G_CONSTANT i32 1
328 %2:_(s32) = G_FOO %0:_(s32), %1:_(s32)
330 it's impossible to say that G_FOO is legal iff %1 is a ``G_CONSTANT`` with
331 value ``1``. However, the following::
333 %2:_(s32) = G_FOO %0:_(s32), i32 1
335 can say that it's legal iff operand 2 is an immediate with value ``1`` because
336 that information is entirely contained within the single instruction.
338 .. _api-legalizerinfo:
343 The recommended [#legalizer-legacy-footnote]_ API looks like this::
345 getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL, G_AND, G_OR, G_XOR, G_SHL})
346 .legalFor({s32, s64, v2s32, v4s32, v2s64})
347 .clampScalar(0, s32, s64)
348 .widenScalarToNextPow2(0)
349 .clampNumElements(0, v2s32, v4s32)
350 .clampNumElements(0, v2s64, v2s64)
351 .moreElementsToNextPow2(0);
353 and describes a set of rules by which we can either declare an instruction legal
354 or decide which action to take to make it more legal.
356 At the core of this ruleset is the ``LegalityQuery`` which describes the
357 instruction. We use a description rather than the instruction to both allow other
358 passes to determine legality without having to create an instruction and also to
359 limit the information available to the predicates to that which is safe to rely
360 on. Currently, the information available to the predicates that determine
363 * The opcode for the instruction
365 * The type of each type index (see ``type0``, ``type1``, etc.)
367 * The size in bytes and atomic ordering for each MachineMemOperand
369 Rule Processing and Declaring Rules
370 """""""""""""""""""""""""""""""""""
372 The ``getActionDefinitionsBuilder`` function generates a ruleset for the given
373 opcode(s) that rules can be added to. If multiple opcodes are given, they are
374 all permanently bound to the same ruleset. The rules in a ruleset are executed
375 from top to bottom and will start again from the top if an instruction is
376 legalized as a result of the rules. If the ruleset is exhausted without
377 satisfying any rule, then it is considered unsupported.
379 When it doesn't declare the instruction legal, each pass over the rules may
380 request that one type changes to another type. Sometimes this can cause multiple
381 types to change but we avoid this as much as possible as making multiple changes
382 can make it difficult to avoid infinite loops where, for example, narrowing one
383 type causes another to be too small and widening that type causes the first one
386 In general, it's advisable to declare instructions legal as close to the top of
387 the rule as possible and to place any expensive rules as low as possible. This
388 helps with performance as testing for legality happens more often than
389 legalization and legalization can require multiple passes over the rules.
391 As a concrete example, consider the rule::
393 getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL, G_AND, G_OR, G_XOR, G_SHL})
394 .legalFor({s32, s64, v2s32, v4s32, v2s64})
395 .clampScalar(0, s32, s64)
396 .widenScalarToNextPow2(0);
398 and the instruction::
400 %2:_(s7) = G_ADD %0:_(s7), %1:_(s7)
402 this doesn't meet the predicate for the :ref:`.legalFor() <legalfor>` as ``s7``
403 is not one of the listed types so it falls through to the
404 :ref:`.clampScalar() <clampscalar>`. It does meet the predicate for this rule
405 as the type is smaller than the ``s32`` and this rule instructs the legalizer
406 to change type 0 to ``s32``. It then restarts from the top. This time it does
407 satisfy ``.legalFor()`` and the resulting output is::
409 %3:_(s32) = G_ANYEXT %0:_(s7)
410 %4:_(s32) = G_ANYEXT %1:_(s7)
411 %5:_(s32) = G_ADD %3:_(s32), %4:_(s32)
412 %2:_(s7) = G_TRUNC %5:_(s32)
414 where the ``G_ADD`` is legal and the other instructions are scheduled for
415 processing by the legalizer.
420 There are various rule factories that append rules to a ruleset but they have a
421 few actions in common:
425 * ``legalIf()``, ``legalFor()``, etc. declare an instruction to be legal if the
426 predicate is satisfied.
428 * ``narrowScalarIf()``, ``narrowScalarFor()``, etc. declare an instruction to be illegal
429 if the predicate is satisfied and indicates that narrowing the scalars in one
430 of the types to a specific type would make it more legal. This action supports
431 both scalars and vectors.
433 * ``widenScalarIf()``, ``widenScalarFor()``, etc. declare an instruction to be illegal
434 if the predicate is satisfied and indicates that widening the scalars in one
435 of the types to a specific type would make it more legal. This action supports
436 both scalars and vectors.
438 * ``fewerElementsIf()``, ``fewerElementsFor()``, etc. declare an instruction to be
439 illegal if the predicate is satisfied and indicates reducing the number of
440 vector elements in one of the types to a specific type would make it more
441 legal. This action supports vectors.
443 * ``moreElementsIf()``, ``moreElementsFor()``, etc. declare an instruction to be illegal
444 if the predicate is satisfied and indicates increasing the number of vector
445 elements in one of the types to a specific type would make it more legal.
446 This action supports vectors.
448 * ``lowerIf()``, ``lowerFor()``, etc. declare an instruction to be illegal if the
449 predicate is satisfied and indicates that replacing it with equivalent
450 instruction(s) would make it more legal. Support for this action differs for
453 * ``libcallIf()``, ``libcallFor()``, etc. declare an instruction to be illegal if the
454 predicate is satisfied and indicates that replacing it with a libcall would
455 make it more legal. Support for this action differs for
458 * ``customIf()``, ``customFor()``, etc. declare an instruction to be illegal if the
459 predicate is satisfied and indicates that the backend developer will supply
460 a means of making it more legal.
462 * ``unsupportedIf()``, ``unsupportedFor()``, etc. declare an instruction to be illegal
463 if the predicate is satisfied and indicates that there is no way to make it
464 legal and the compiler should fail.
466 * ``fallback()`` falls back on an older API and should only be used while porting
467 existing code from that API.
472 The rule factories also have predicates in common:
474 * ``legal()``, ``lower()``, etc. are always satisfied.
476 * ``legalIf()``, ``narrowScalarIf()``, etc. are satisfied if the user-supplied
477 ``LegalityPredicate`` function returns true. This predicate has access to the
478 information in the ``LegalityQuery`` to make its decision.
479 User-supplied predicates can also be combined using ``all(P0, P1, ...)``.
481 * ``legalFor()``, ``narrowScalarFor()``, etc. are satisfied if the type matches one in
482 a given set of types. For example ``.legalFor({s16, s32})`` declares the
483 instruction legal if type 0 is either s16 or s32. Additional versions for two
484 and three type indices are generally available. For these, all the type
485 indices considered together must match all the types in one of the tuples. So
486 ``.legalFor({{s16, s32}, {s32, s64}})`` will only accept ``{s16, s32}``, or
487 ``{s32, s64}`` but will not accept ``{s16, s64}``.
489 * ``legalForTypesWithMemSize()``, ``narrowScalarForTypesWithMemSize()``, etc. are
490 similar to ``legalFor()``, ``narrowScalarFor()``, etc. but additionally require a
491 MachineMemOperand to have a given size in each tuple.
493 * ``legalForCartesianProduct()``, ``narrowScalarForCartesianProduct()``, etc. are
494 satisfied if each type index matches one element in each of the independent
495 sets. So ``.legalForCartesianProduct({s16, s32}, {s32, s64})`` will accept
496 ``{s16, s32}``, ``{s16, s64}``, ``{s32, s32}``, and ``{s32, s64}``.
501 There are some composite rules for common situations built out of the above facilities:
503 * ``widenScalarToNextPow2()`` is like ``widenScalarIf()`` but is satisfied iff the type
504 size in bits is not a power of 2 and selects a target type that is the next
509 * ``minScalar()`` is like ``widenScalarIf()`` but is satisfied iff the type
510 size in bits is smaller than the given minimum and selects the minimum as the
511 target type. Similarly, there is also a ``maxScalar()`` for the maximum and a
512 ``clampScalar()`` to do both at once.
514 * ``minScalarSameAs()`` is like ``minScalar()`` but the minimum is taken from another
517 * ``moreElementsToNextMultiple()`` is like ``moreElementsToNextPow2()`` but is based on
518 multiples of X rather than powers of 2.
524 An alternative worth investigating is to generalize the API to represent
525 actions using ``std::function`` that implements the action, instead of explicit
526 enum tokens (``Legal``, ``WidenScalar``, ...).
529 Moreover, we could use TableGen to initially infer legality of operation from
530 existing patterns (as any pattern we can select is by definition legal).
531 Expanding that to describe legalization actions is a much larger but
532 potentially useful project.
534 .. rubric:: Footnotes
536 .. [#legalizer-legacy-footnote] An API is broadly similar to
537 SelectionDAG/TargetLowering is available but is not recommended as a more
538 powerful API is available.
540 .. _min-legalizerinfo:
545 GlobalISel's legalizer has a great deal of flexibility in how a given target
546 shapes the GMIR that the rest of the backend must handle. However, there are
547 a small number of requirements that all targets must meet.
549 Before discussing the minimum requirements, we'll need some terminology:
552 The set of types which is the union of all possible types produced by at
553 least one legal instruction.
556 The set of types which is the union of all possible types consumed by at
557 least one legal instruction.
559 Both sets are often identical but there's no guarantee of that. For example,
560 it's not uncommon to be unable to consume s64 but still be able to produce it
561 for a few specific instructions.
563 Minimum Rules For Scalars
564 """""""""""""""""""""""""
566 * G_ANYEXT must be legal for all inputs from the producer type set and all larger
567 outputs from the consumer type set.
568 * G_TRUNC must be legal for all inputs from the producer type set and all
569 smaller outputs from the consumer type set.
571 G_ANYEXT, and G_TRUNC have mandatory legality since the GMIR requires a means to
572 connect operations with different type sizes. They are usually trivial to support
573 since G_ANYEXT doesn't define the value of the additional bits and G_TRUNC is
574 discarding bits. The other conversions can be lowered into G_ANYEXT/G_TRUNC
575 with some additional operations that are subject to further legalization. For
576 example, G_SEXT can lower to:
581 and the G_CONSTANT/G_SHL/G_ASHR can further lower to other operations or target
582 instructions. Similarly, G_FPEXT has no legality requirement since it can lower
583 to a G_ANYEXT followed by a target instruction.
585 G_MERGE_VALUES and G_UNMERGE_VALUES do not have legality requirements since the
586 former can lower to G_ANYEXT and some other legalizable instructions, while the
587 latter can lower to some legalizable instructions followed by G_TRUNC.
589 Minimum Legality For Vectors
590 """"""""""""""""""""""""""""
592 Within the vector types, there aren't any defined conversions in LLVM IR as
593 vectors are often converted by reinterpreting the bits or by decomposing the
594 vector and reconstituting it as a different type. As such, G_BITCAST is the
595 only operation to account for. We generally don't require that it's legal
596 because it can usually be lowered to COPY (or to nothing using
597 replaceAllUses()). However, there are situations where G_BITCAST is non-trivial
598 (e.g. little-endian vectors of big-endian data such as on big-endian MIPS MSA and
599 big-endian ARM NEON, see `_i_bitcast`). To account for this G_BITCAST must be
600 legal for all type combinations that change the bit pattern in the value.
602 There are no legality requirements for G_BUILD_VECTOR, or G_BUILD_VECTOR_TRUNC
603 since these can be handled by:
604 * Declaring them legal.
606 * Lowering them to G_TRUNC+G_ANYEXT and some legalizable instructions.
607 * Lowering them to target instructions which are legal by definition.
609 The same reasoning also allows G_UNMERGE_VALUES to lack legality requirements
612 Minimum Legality for Pointers
613 """""""""""""""""""""""""""""
615 There are no minimum rules for pointers since G_INTTOPTR and G_PTRTOINT can
616 be selected to a COPY from register class to another by the legalizer.
618 Minimum Legality For Operations
619 """""""""""""""""""""""""""""""
621 The rules for G_ANYEXT, G_MERGE_VALUES, G_BITCAST, G_BUILD_VECTOR,
622 G_BUILD_VECTOR_TRUNC, G_CONCAT_VECTORS, G_UNMERGE_VALUES, G_PTRTOINT, and
623 G_INTTOPTR have already been noted above. In addition to those, the following
624 operations have requirements:
625 * At least one G_IMPLICIT_DEF must be legal. This is usually trivial as it
626 requires no code to be selected.
627 * G_PHI must be legal for all types in the producer and consumer typesets. This
628 is usually trivial as it requires no code to be selected.
629 * At least one G_FRAME_INDEX must be legal
630 * At least one G_BLOCK_ADDR must be legal
632 There are many other operations you'd expect to have legality requirements but
633 they can be lowered to target instructions which are legal by definition.
640 This pass constrains the :ref:`gmir-gvregs` operands of generic
641 instructions to some :ref:`gmir-regbank`.
643 It iteratively maps instructions to a set of per-operand bank assignment.
644 The possible mappings are determined by the target-provided
645 :ref:`RegisterBankInfo <api-registerbankinfo>`.
646 The mapping is then applied, possibly introducing ``COPY`` instructions if
649 It traverses the ``MachineFunction`` top down so that all operands are already
650 mapped when analyzing an instruction.
652 This pass could also remap target-specific instructions when beneficial.
653 In the future, this could replace the ExeDepsFix pass, as we can directly
654 select the best variant for an instruction that's available on multiple banks.
656 .. _api-registerbankinfo:
658 API: RegisterBankInfo
659 ^^^^^^^^^^^^^^^^^^^^^
661 The ``RegisterBankInfo`` class describes multiple aspects of register banks.
663 * **Banks**: ``addRegBankCoverage`` --- which register bank covers each
666 * **Cross-Bank Copies**: ``copyCost`` --- the cost of a ``COPY`` from one bank
669 * **Default Mapping**: ``getInstrMapping`` --- the default bank assignments for
672 * **Alternative Mapping**: ``getInstrAlternativeMapping`` --- the other
673 possible bank assignments for a given instruction.
676 All this information should eventually be static and generated by TableGen,
677 mostly using existing information augmented by bank descriptions.
680 ``getInstrMapping`` is currently separate from ``getInstrAlternativeMapping``
681 because the latter is more expensive: as we move to static mapping info,
682 both methods should be free, and we should merge them.
684 .. _regbankselect-modes:
689 ``RegBankSelect`` currently has two modes:
691 * **Fast** --- For each instruction, pick a target-provided "default" bank
692 assignment. This is the default at -O0.
694 * **Greedy** --- For each instruction, pick the cheapest of several
695 target-provided bank assignment alternatives.
697 We intend to eventually introduce an additional optimizing mode:
699 * **Global** --- Across multiple instructions, pick the cheapest combination of
703 On AArch64, we are considering using the Greedy mode even at -O0 (or perhaps at
704 backend -O1): because :ref:`gmir-llt` doesn't distinguish floating point from
705 integer scalars, the default assignment for loads and stores is the integer
706 bank, introducing cross-bank copies on most floating point operations.
709 .. _instructionselect:
714 This pass transforms generic machine instructions into equivalent
715 target-specific instructions. It traverses the ``MachineFunction`` bottom-up,
716 selecting uses before definitions, enabling trivial dead code elimination.
718 .. _api-instructionselector:
720 API: InstructionSelector
721 ^^^^^^^^^^^^^^^^^^^^^^^^
723 The target implements the ``InstructionSelector`` class, containing the
724 target-specific selection logic proper.
726 The instance is provided by the subtarget, so that it can specialize the
727 selector by subtarget feature (with, e.g., a vector selector overriding parts
728 of a general-purpose common selector).
729 We might also want to parameterize it by MachineFunction, to enable selector
730 variants based on function attributes like optsize.
732 The simple API consists of:
736 virtual bool select(MachineInstr &MI)
738 This target-provided method is responsible for mutating (or replacing) a
739 possibly-generic MI into a fully target-specific equivalent.
740 It is also responsible for doing the necessary constraining of gvregs into the
741 appropriate register classes as well as passing through COPY instructions to
742 the register allocator.
744 The ``InstructionSelector`` can fold other instructions into the selected MI,
745 by walking the use-def chain of the vreg operands.
746 As GlobalISel is Global, this folding can occur across basic blocks.
748 SelectionDAG Rule Imports
749 ^^^^^^^^^^^^^^^^^^^^^^^^^
751 TableGen will import SelectionDAG rules and provide the following function to
756 bool selectImpl(MachineInstr &MI)
758 The ``--stats`` option can be used to determine what proportion of rules were
759 successfully imported. The easiest way to use this is to copy the
760 ``-gen-globalisel`` tablegen command from ``ninja -v`` and modify it.
762 Similarly, the ``--warn-on-skipped-patterns`` option can be used to obtain the
763 reasons that rules weren't imported. This can be used to focus on the most
764 important rejection reasons.
769 PatLeafs cannot be imported because their C++ is implemented in terms of
770 ``SDNode`` objects. PatLeafs that handle immediate predicates should be
771 replaced by ``ImmLeaf``, ``IntImmLeaf``, or ``FPImmLeaf`` as appropriate.
773 There's no standard answer for other PatLeafs. Some standard predicates have
774 been baked into TableGen but this should not generally be done.
779 Custom SDNodes should be mapped to Target Pseudos using ``GINodeEquiv``. This
780 will cause the instruction selector to import them but you will also need to
781 ensure the target pseudo is introduced to the MIR before the instruction
782 selector. Any preceding pass is suitable but the legalizer will be a
783 particularly common choice.
788 ComplexPatterns cannot be imported because their C++ is implemented in terms of
789 ``SDNode`` objects. GlobalISel versions should be defined with
790 ``GIComplexOperandMatcher`` and mapped to ComplexPattern with
791 ``GIComplexPatternEquiv``.
793 The following predicates are useful for porting ComplexPattern:
795 * isBaseWithConstantOffset() - Check for base+offset structures
796 * isOperandImmEqual() - Check for a particular constant
797 * isObviouslySafeToFold() - Check for reasons an instruction can't be sunk and folded into another.
799 There are some important points for the C++ implementation:
801 * Don't modify MIR in the predicate
802 * Renderer lambdas should capture by value to avoid use-after-free. They will be used after the predicate returns.
803 * Only create instructions in a renderer lambda. GlobalISel won't clean up things you create but don't use.
811 .. _maintainability-iterative:
813 Iterative Transformations
814 -------------------------
816 Passes are split into small, iterative transformations, with all state
817 represented in the MIR.
819 This differs from SelectionDAG (in particular, the legalizer) using various
820 in-memory side-tables.
823 .. _maintainability-mir:
828 .. FIXME: Update the MIRLangRef to include GMI additions.
830 :ref:`gmir` is serializable (see :doc:`MIRLangRef`).
831 Combined with :ref:`maintainability-iterative`, this enables much finer-grained
832 testing, rather than requiring large and fragile IR-to-assembly tests.
834 The current "stage" in the :ref:`pipeline` is represented by a set of
835 ``MachineFunctionProperties``:
838 * ``regBankSelected``
842 .. _maintainability-verifier:
847 The pass approach lets us use the ``MachineVerifier`` to enforce invariants.
848 For instance, a ``regBankSelected`` function may not have gvregs without
852 The ``MachineVerifier`` being monolithic, some of the checks we want to do
853 can't be integrated to it: GlobalISel is a separate library, so we can't
854 directly reference it from CodeGen. For instance, legality checks are
855 currently done in RegBankSelect/InstructionSelect proper. We could #ifdef out
856 the checks, or we could add some sort of verifier API.
861 Progress and Future Work
862 ========================
864 The initial goal is to replace FastISel on AArch64. The next step will be to
865 replace SelectionDAG as the optimized ISel.
868 While we iterate on GlobalISel, we strive to avoid affecting the performance of
869 SelectionDAG, FastISel, or the other MIR passes. For instance, the types of
870 :ref:`gmir-gvregs` are stored in a separate table in ``MachineRegisterInfo``,
871 that is destroyed after :ref:`instructionselect`.
873 .. _progress-fastisel:
878 For the initial FastISel replacement, we intend to fallback to SelectionDAG on
881 Currently, compile-time of the fast pipeline is within 1.5x of FastISel.
882 We're optimistic we can get to within 1.1/1.2x, but beating FastISel will be
883 challenging given the multi-pass approach.
884 Still, supporting all IR (via a complete legalizer) and avoiding the fallback
885 to SelectionDAG in the worst case should enable better amortized performance
886 than SelectionDAG+FastISel.
889 We considered never having a fallback to SelectionDAG, instead deciding early
890 whether a given function is supported by GlobalISel or not. The decision would
891 be based on :ref:`milegalizer` queries.
892 We abandoned that for two reasons:
893 a) on IR inputs, we'd need to basically simulate the :ref:`irtranslator`;
894 b) to be robust against unforeseen failures and to enable iterative
897 .. _progress-targets:
899 Support For Other Targets
900 -------------------------
902 In parallel, we're investigating adding support for other - ideally quite
903 different - targets. For instance, there is some initial AMDGPU support.
908 Porting GlobalISel to A New Target
909 ==================================
911 There are four major classes to implement by the target:
913 * :ref:`CallLowering <api-calllowering>` --- lower calls, returns, and arguments
914 according to the ABI.
915 * :ref:`RegisterBankInfo <api-registerbankinfo>` --- describe
916 :ref:`gmir-regbank` coverage, cross-bank copy cost, and the mapping of
917 operands onto banks for each instruction.
918 * :ref:`LegalizerInfo <api-legalizerinfo>` --- describe what is legal, and how
919 to legalize what isn't.
920 * :ref:`InstructionSelector <api-instructionselector>` --- select generic MIR
921 to target-specific MIR.
925 * ``TargetPassConfig`` --- create the passes constituting the pipeline,
926 including additional passes not included in the :ref:`pipeline`.
933 * `Global Instruction Selection - A Proposal by Quentin Colombet @LLVMDevMeeting 2015 <https://www.youtube.com/watch?v=F6GGbYtae3g>`_
934 * `Global Instruction Selection - Status by Quentin Colombet, Ahmed Bougacha, and Tim Northover @LLVMDevMeeting 2016 <https://www.youtube.com/watch?v=6tfb344A7w8>`_
935 * `GlobalISel - LLVM's Latest Instruction Selection Framework by Diana Picus @FOSDEM17 <https://www.youtube.com/watch?v=d6dF6E4BPeU>`_
936 * GlobalISel: Past, Present, and Future by Quentin Colombet and Ahmed Bougacha @LLVMDevMeeting 2017
937 * Head First into GlobalISel by Daniel Sanders, Aditya Nandakumar, and Justin Bogner @LLVMDevMeeting 2017