==============================
LLVM Language Reference Manual
==============================
This document is a reference manual for the LLVM assembly language. LLVM
is a Static Single Assignment (SSA) based representation that provides
type safety, low-level operations, flexibility, and the capability of
representing 'all' high-level languages cleanly. It is the common code
representation used throughout all phases of the LLVM compilation
strategy.
22 The LLVM code representation is designed to be used in three different
23 forms: as an in-memory compiler IR, as an on-disk bitcode representation
24 (suitable for fast loading by a Just-In-Time compiler), and as a human
25 readable assembly language representation. This allows LLVM to provide a
26 powerful intermediate representation for efficient compiler
27 transformations and analysis, while providing a natural means to debug
28 and visualize the transformations. The three different forms of LLVM are
29 all equivalent. This document describes the human readable
30 representation and notation.
32 The LLVM representation aims to be light-weight and low-level while
33 being expressive, typed, and extensible at the same time. It aims to be
34 a "universal IR" of sorts, by being at a low enough level that
35 high-level ideas may be cleanly mapped to it (similar to how
36 microprocessors are "universal IR's", allowing many source languages to
37 be mapped to them). By providing type information, LLVM can be used as
38 the target of optimizations: for example, through pointer analysis, it
39 can be proven that a C automatic variable is never accessed outside of
40 the current function, allowing it to be promoted to a simple SSA value
41 instead of a memory location.
48 It is important to note that this document describes 'well formed' LLVM
49 assembly language. There is a difference between what the parser accepts
and what is considered 'well formed'. For example, the instruction
``%x = add i32 1, %x`` is syntactically okay, but not well formed,
because the definition of ``%x`` does not dominate all of its uses. The
58 LLVM infrastructure provides a verification pass that may be used to
59 verify that an LLVM module is well formed. This pass is automatically
60 run by the parser after parsing input assembly and by the optimizer
61 before it outputs bitcode. The violations pointed out by the verifier
62 pass indicate bugs in transformation passes or input to the parser.
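
The dominance requirement can also be violated across basic blocks. The
following is a minimal sketch, assuming a hypothetical function named
``@not_well_formed``; it is an illustration of the rule, not an example taken
from the verifier's own documentation. The text parses, but the verifier is
expected to reject it:

.. code-block:: llvm

    define i32 @not_well_formed(i1 %c) {
    entry:
      br i1 %c, label %then, label %join

    then:
      %v = add i32 1, 2
      br label %join

    join:
      ; %v is defined only on the path through %then, so its definition
      ; does not dominate this use.
      ret i32 %v
    }

Routing the value through a ``phi`` in ``%join`` that names both predecessors
would make the function well formed.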
69 LLVM identifiers come in two basic types: global and local. Global
70 identifiers (functions, global variables) begin with the ``'@'``
71 character. Local identifiers (register names, types) begin with the
72 ``'%'`` character. Additionally, there are three different formats for
73 identifiers, for different purposes:
75 #. Named values are represented as a string of characters with their
76 prefix. For example, ``%foo``, ``@DivisionByZero``,
77 ``%a.really.long.identifier``. The actual regular expression used is
78 '``[%@][-a-zA-Z$._][-a-zA-Z$._0-9]*``'. Identifiers that require other
79 characters in their names can be surrounded with quotes. Special
80 characters may be escaped using ``"\xx"`` where ``xx`` is the ASCII
81 code for the character in hexadecimal. In this way, any character can
82 be used in a name value, even quotes themselves. The ``"\01"`` prefix
83 can be used on global values to suppress mangling.
84 #. Unnamed values are represented as an unsigned numeric value with
85 their prefix. For example, ``%12``, ``@2``, ``%44``.
86 #. Constants, which are described in the section Constants_ below.
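
As a small sketch of the first two identifier formats, the module below uses
names chosen purely for illustration (``@"name\20with\20spaces"`` and
``@example`` are hypothetical):

.. code-block:: llvm

    ; Named global identifier matching the regular expression above.
    @DivisionByZero = global i32 0

    ; A quoted name may contain other characters; \20 is the hexadecimal
    ; ASCII escape for a space.
    @"name\20with\20spaces" = global i32 1

    define i32 @example(i32 %a.really.long.identifier) {
    entry:
      ; %sum is a named local value.
      %sum = add i32 %a.really.long.identifier, 1
      ret i32 %sum
    }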
LLVM requires that values start with a prefix for two reasons: Compilers
don't need to worry about name clashes with reserved words, and the set
of reserved words may be expanded in the future without penalty.
Additionally, unnamed identifiers allow a compiler to quickly come up
with a temporary variable without having to avoid symbol table
conflicts.
Reserved words in LLVM are very similar to reserved words in other
languages. There are keywords for different opcodes ('``add``',
'``bitcast``', '``ret``', etc.), for primitive type names ('``void``',
'``i32``', etc.), and others. These reserved words cannot conflict
with variable names, because none of them start with a prefix character
(``'%'`` or ``'@'``).
Here is an example of LLVM code to multiply the integer variable
'``%X``' by 8:

The easy way:

.. code-block:: llvm

    %result = mul i32 %X, 8

After strength reduction:

.. code-block:: llvm

    %result = shl i32 %X, 3

And the hard way:

.. code-block:: llvm

    %0 = add i32 %X, %X           ; yields i32:%0
    %1 = add i32 %0, %0           ; yields i32:%1
    %result = add i32 %1, %1
125 This last way of multiplying ``%X`` by 8 illustrates several important
126 lexical features of LLVM:
128 #. Comments are delimited with a '``;``' and go until the end of line.
129 #. Unnamed temporaries are created when the result of a computation is
130 not assigned to a named value.
131 #. Unnamed temporaries are numbered sequentially (using a per-function
132 incrementing counter, starting with 0). Note that basic blocks and unnamed
133 function parameters are included in this numbering. For example, if the
134 entry basic block is not given a label name and all function parameters are
135 named, then it will get number 0.
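
The numbering rule can be sketched with two hypothetical functions
(``@times8`` and ``@inc`` are names invented for this illustration):

.. code-block:: llvm

    ; All parameters are named and the entry block has no label, so the
    ; entry block implicitly takes number 0 and the first unnamed
    ; temporary is %1.
    define i32 @times8(i32 %X) {
      %1 = add i32 %X, %X
      %2 = add i32 %1, %1
      %3 = add i32 %2, %2
      ret i32 %3
    }

    ; Here the single parameter is unnamed and takes number 0. The
    ; unlabeled entry block takes number 1, so the first temporary is %2.
    define i32 @inc(i32) {
      %2 = add i32 %0, 1
      ret i32 %2
    }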
It also shows a convention that we follow in this document. When
demonstrating instructions, we will follow an instruction with a comment
that defines the type and name of the value produced.
LLVM programs are composed of ``Module``\ s, each of which is a
translation unit of the input programs. Each module consists of
functions, global variables, and symbol table entries. Modules may be
combined together with the LLVM linker, which merges function (and
global variable) definitions, resolves forward declarations, and merges
symbol table entries. Here is an example of the "hello world" module:
.. code-block:: llvm

    ; Declare the string constant as a global constant.
    @.str = private unnamed_addr constant [13 x i8] c"hello world\0A\00"

    ; External declaration of the puts function
    declare i32 @puts(i8* nocapture) nounwind

    ; Definition of main function
    define i32 @main() {   ; i32()*
      ; Convert [13 x i8]* to i8*...
      %cast210 = getelementptr [13 x i8], [13 x i8]* @.str, i64 0, i64 0

      ; Call puts function to write out the string to stdout.
      call i32 @puts(i8* %cast210)
      ret i32 0
    }

    ; Named metadata
    !0 = !{i32 42, null, !"string"}
    !foo = !{!0}

176 This example is made up of a :ref:`global variable <globalvars>` named
177 "``.str``", an external declaration of the "``puts``" function, a
178 :ref:`function definition <functionstructure>` for "``main``" and
179 :ref:`named metadata <namedmetadatastructure>` "``foo``".
181 In general, a module is made up of a list of global values (where both
182 functions and global variables are global values). Global values are
183 represented by a pointer to a memory location (in this case, a pointer
184 to an array of char, and a pointer to a function), and have one of the
185 following :ref:`linkage types <linkage>`.
All Global Variables and Functions have one of the following types of
linkage:

``private``
    Global values with "``private``" linkage are only directly
    accessible by objects in the current module. In particular, linking
    code into a module with a private global value may cause the
    private to be renamed as necessary to avoid collisions. Because the
    symbol is private to the module, all references can be updated. This
    doesn't show up in any symbol table in the object file.
``internal``
    Similar to private, but the value shows as a local symbol
    (``STB_LOCAL`` in the case of ELF) in the object file. This
    corresponds to the notion of the '``static``' keyword in C.
``available_externally``
    Globals with "``available_externally``" linkage are never emitted into
    the object file corresponding to the LLVM module. From the linker's
    perspective, an ``available_externally`` global is equivalent to
    an external declaration. They exist to allow inlining and other
    optimizations to take place given knowledge of the definition of the
    global, which is known to be somewhere outside the module. Globals
    with ``available_externally`` linkage are allowed to be discarded at
    will, and allow inlining and other optimizations. This linkage type is
    only allowed on definitions, not declarations.
``linkonce``
    Globals with "``linkonce``" linkage are merged with other globals of
    the same name when linkage occurs. This can be used to implement
    some forms of inline functions, templates, or other code which must
    be generated in each translation unit that uses it, but where the
    body may be overridden with a more definitive definition later.
    Unreferenced ``linkonce`` globals are allowed to be discarded. Note
    that ``linkonce`` linkage does not actually allow the optimizer to
    inline the body of this function into callers because it doesn't
    know if this definition of the function is the definitive definition
    within the program or whether it will be overridden by a stronger
    definition. To enable inlining and other optimizations, use
    "``linkonce_odr``" linkage.
``weak``
    "``weak``" linkage has the same merging semantics as ``linkonce``
    linkage, except that unreferenced globals with ``weak`` linkage may
    not be discarded. This is used for globals that are declared "weak"
    in C source code.
``common``
    "``common``" linkage is most similar to "``weak``" linkage, but it
    is used for tentative definitions in C, such as "``int X;``" at
    global scope. Symbols with "``common``" linkage are merged in the
    same way as ``weak`` symbols, and they may not be deleted if
    unreferenced. ``common`` symbols may not have an explicit section,
    must have a zero initializer, and may not be marked
    ':ref:`constant <globalvars>`'. Functions and aliases may not have
    common linkage.

.. _linkage_appending:

``appending``
    "``appending``" linkage may only be applied to global variables of
    pointer to array type. When two global variables with appending
    linkage are linked together, the two global arrays are appended
    together. This is the LLVM, typesafe, equivalent of having the
    system linker append together "sections" with identical names when
    .o files are linked.

    Unfortunately this doesn't correspond to any feature in .o files, so it
    can only be used for variables like ``llvm.global_ctors`` which LLVM
    interprets specially.

``extern_weak``
    The semantics of this linkage follow the ELF object file model: the
    symbol is weak until linked; if not linked, the symbol becomes null
    instead of being an undefined reference.
``linkonce_odr``, ``weak_odr``
    Some languages allow differing globals to be merged, such as two
    functions with different semantics. Other languages, such as
    ``C++``, ensure that only equivalent globals are ever merged (the
    "one definition rule" --- "ODR"). Such languages can use the
    ``linkonce_odr`` and ``weak_odr`` linkage types to indicate that the
    global will only be merged with equivalent globals. These linkage
    types are otherwise the same as their non-``odr`` versions.
``external``
    If none of the above identifiers are used, the global is externally
    visible, meaning that it participates in linkage and can be used to
    resolve external symbol references.

It is illegal for a global variable or function *declaration* to have any
linkage type other than ``external`` or ``extern_weak``.
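
The following sketch shows where the linkage keyword is written on globals and
functions. All symbol names are hypothetical, and the comments only restate the
descriptions above:

.. code-block:: llvm

    ; private: may be renamed freely; absent from the object file's
    ; symbol table.
    @.msg = private unnamed_addr constant [3 x i8] c"hi\00"

    ; internal: a local symbol, like a C 'static' variable.
    @counter = internal global i32 0

    ; common: a C tentative definition such as 'int X;'
    ; (a zero initializer is required).
    @X = common global i32 0

    ; extern_weak declaration: becomes null if never defined at link time.
    @optional = extern_weak global i32

    ; linkonce_odr: merged only with equivalent definitions.
    define linkonce_odr i32 @twice(i32 %v) {
      %r = add i32 %v, %v
      ret i32 %r
    }

    ; weak: may be overridden, but is not discarded when unreferenced.
    define weak i32 @hook() {
      ret i32 0
    }

    ; No linkage keyword: external linkage.
    define i32 @api() {
      %r = call i32 @twice(i32 21)
      ret i32 %r
    }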
LLVM :ref:`functions <functionstructure>`, :ref:`calls <i_call>` and
:ref:`invokes <i_invoke>` can all have an optional calling convention
specified for the call. The calling convention of any pair of dynamic
caller/callee must match, or the behavior of the program is undefined.
The following calling conventions are supported by LLVM, and more may be
added in the future:
290 "``ccc``" - The C calling convention
291 This calling convention (the default if no other calling convention
292 is specified) matches the target C calling conventions. This calling
293 convention supports varargs function calls and tolerates some
294 mismatch in the declared prototype and implemented declaration of
295 the function (as does normal C).
296 "``fastcc``" - The fast calling convention
297 This calling convention attempts to make calls as fast as possible
298 (e.g. by passing things in registers). This calling convention
299 allows the target to use whatever tricks it wants to produce fast
300 code for the target, without having to conform to an externally
301 specified ABI (Application Binary Interface). `Tail calls can only
302 be optimized when this, the tailcc, the GHC or the HiPE convention is
303 used. <CodeGenerator.html#id80>`_ This calling convention does not
304 support varargs and requires the prototype of all callees to exactly
305 match the prototype of the function definition.
306 "``coldcc``" - The cold calling convention
307 This calling convention attempts to make code in the caller as
308 efficient as possible under the assumption that the call is not
309 commonly executed. As such, these calls often preserve all registers
310 so that the call does not break any live ranges in the caller side.
This calling convention does not support varargs and requires the
prototype of all callees to exactly match the prototype of the
function definition. Furthermore, the inliner doesn't consider such function
calls for inlining.
315 "``cc 10``" - GHC convention
316 This calling convention has been implemented specifically for use by
317 the `Glasgow Haskell Compiler (GHC) <http://www.haskell.org/ghc>`_.
318 It passes everything in registers, going to extremes to achieve this
319 by disabling callee save registers. This calling convention should
320 not be used lightly but only for specific situations such as an
321 alternative to the *register pinning* performance technique often
used when implementing functional programming languages. At the
moment only X86 supports this convention and it has the following
limitations:

- On *X86-32*, at most 4 bit type parameters are supported. No
  floating-point types are supported.
- On *X86-64*, at most 10 bit type parameters and 6
  floating-point parameters are supported.

This calling convention supports `tail call
optimization <CodeGenerator.html#id80>`_ but requires that both the
caller and the callee use it.
334 "``cc 11``" - The HiPE calling convention
335 This calling convention has been implemented specifically for use by
336 the `High-Performance Erlang
337 (HiPE) <http://www.it.uu.se/research/group/hipe/>`_ compiler, *the*
338 native code compiler of the `Ericsson's Open Source Erlang/OTP
339 system <http://www.erlang.org/download.shtml>`_. It uses more
340 registers for argument passing than the ordinary C calling
341 convention and defines no callee-saved registers. The calling
342 convention properly supports `tail call
343 optimization <CodeGenerator.html#id80>`_ but requires that both the
344 caller and the callee use it. It uses a *register pinning*
345 mechanism, similar to GHC's convention, for keeping frequently
346 accessed runtime components pinned to specific hardware registers.
At the moment only X86 supports this convention (both 32 and 64
bit).
349 "``webkit_jscc``" - WebKit's JavaScript calling convention
350 This calling convention has been implemented for `WebKit FTL JIT
351 <https://trac.webkit.org/wiki/FTLJIT>`_. It passes arguments on the
352 stack right to left (as cdecl does), and returns a value in the
353 platform's customary return register.
354 "``anyregcc``" - Dynamic calling convention for code patching
355 This is a special convention that supports patching an arbitrary code
356 sequence in place of a call site. This convention forces the call
357 arguments into registers but allows them to be dynamically
allocated. This can currently only be used with calls to
``llvm.experimental.patchpoint`` because only this intrinsic records
the location of its arguments in a side table. See :doc:`StackMaps`.
361 "``preserve_mostcc``" - The `PreserveMost` calling convention
362 This calling convention attempts to make the code in the caller as
363 unintrusive as possible. This convention behaves identically to the `C`
364 calling convention on how arguments and return values are passed, but it
365 uses a different set of caller/callee-saved registers. This alleviates the
366 burden of saving and recovering a large register set before and after the
367 call in the caller. If the arguments are passed in callee-saved registers,
368 then they will be preserved by the callee across the call. This doesn't
369 apply for values returned in callee-saved registers.
371 - On X86-64 the callee preserves all general purpose registers, except for
372 R11. R11 can be used as a scratch register. Floating-point registers
373 (XMMs/YMMs) are not preserved and need to be saved by the caller.
375 The idea behind this convention is to support calls to runtime functions
376 that have a hot path and a cold path. The hot path is usually a small piece
377 of code that doesn't use many registers. The cold path might need to call out to
378 another function and therefore only needs to preserve the caller-saved
379 registers, which haven't already been saved by the caller. The
380 `PreserveMost` calling convention is very similar to the `cold` calling
381 convention in terms of caller/callee-saved registers, but they are used for
382 different types of function calls. `coldcc` is for function calls that are
383 rarely executed, whereas `preserve_mostcc` function calls are intended to be
384 on the hot path and definitely executed a lot. Furthermore `preserve_mostcc`
385 doesn't prevent the inliner from inlining the function call.
This calling convention will be used by a future version of the Objective-C
runtime and should therefore still be considered experimental at this time.
Although this convention was created to optimize certain runtime calls to
the Objective-C runtime, it is not limited to this runtime and might be used
by other runtimes in the future too. The current implementation only
supports X86-64, but the intention is to support more architectures in the
future.
394 "``preserve_allcc``" - The `PreserveAll` calling convention
395 This calling convention attempts to make the code in the caller even less
396 intrusive than the `PreserveMost` calling convention. This calling
convention also behaves identically to the `C` calling convention on how
398 arguments and return values are passed, but it uses a different set of
399 caller/callee-saved registers. This removes the burden of saving and
400 recovering a large register set before and after the call in the caller. If
401 the arguments are passed in callee-saved registers, then they will be
402 preserved by the callee across the call. This doesn't apply for values
403 returned in callee-saved registers.
405 - On X86-64 the callee preserves all general purpose registers, except for
406 R11. R11 can be used as a scratch register. Furthermore it also preserves
407 all floating-point registers (XMMs/YMMs).
409 The idea behind this convention is to support calls to runtime functions
410 that don't need to call out to any other functions.
This calling convention, like the `PreserveMost` calling convention, will be
used by a future version of the Objective-C runtime and should be considered
experimental at this time.
415 "``cxx_fast_tlscc``" - The `CXX_FAST_TLS` calling convention for access functions
Clang generates an access function to access C++-style TLS. The access
function generally has an entry block, an exit block and an initialization
block that is run the first time. The entry and exit blocks can access
a few TLS IR variables; each access will be lowered to a platform-specific
sequence.
422 This calling convention aims to minimize overhead in the caller by
423 preserving as many registers as possible (all the registers that are
424 preserved on the fast path, composed of the entry and exit blocks).
This calling convention behaves identically to the `C` calling convention on
427 how arguments and return values are passed, but it uses a different set of
428 caller/callee-saved registers.
430 Given that each platform has its own lowering sequence, hence its own set
431 of preserved registers, we can't use the existing `PreserveMost`.
433 - On X86-64 the callee preserves all general purpose registers, except for
435 "``tailcc``" - Tail callable calling convention
436 This calling convention ensures that calls in tail position will always be
437 tail call optimized. This calling convention is equivalent to fastcc,
438 except for an additional guarantee that tail calls will be produced
439 whenever possible. `Tail calls can only be optimized when this, the fastcc,
440 the GHC or the HiPE convention is used. <CodeGenerator.html#id80>`_ This
441 calling convention does not support varargs and requires the prototype of
442 all callees to exactly match the prototype of the function definition.
"``swiftcc``" - This calling convention is used for the Swift language.
- On X86-64 RCX and R8 are available for additional integer returns, and
  XMM2 and XMM3 are available for additional FP/vector returns.
- On iOS platforms, we use the AAPCS-VFP calling convention.
"``swifttailcc``"
This calling convention is like ``swiftcc`` in most respects, but also the
callee pops the argument area of the stack so that mandatory tail calls are
possible as in ``tailcc``.
451 "``cfguard_checkcc``" - Windows Control Flow Guard (Check mechanism)
452 This calling convention is used for the Control Flow Guard check function,
453 calls to which can be inserted before indirect calls to check that the call
454 target is a valid function address. The check function has no return value,
455 but it will trigger an OS-level error if the address is not a valid target.
456 The set of registers preserved by the check function, and the register
457 containing the target address are architecture-specific.
459 - On X86 the target address is passed in ECX.
460 - On ARM the target address is passed in R0.
461 - On AArch64 the target address is passed in X15.
462 "``cc <n>``" - Numbered convention
463 Any calling convention may be specified by number, allowing
464 target-specific calling conventions to be used. Target specific
465 calling conventions start at 64.
More calling conventions can be added/defined on an as-needed basis, to
support Pascal conventions or any other well-known target-independent
convention.
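
The matching requirement between caller and callee can be sketched as follows.
The function names are hypothetical; the point is only that the convention
keyword appears both on the definition and on every call site:

.. code-block:: llvm

    ; The callee is defined with fastcc ...
    define fastcc i32 @add_one(i32 %x) {
      %r = add i32 %x, 1
      ret i32 %r
    }

    ; A rarely executed helper marked coldcc.
    define coldcc void @report_failure() {
      ret void
    }

    define i32 @caller(i32 %x) {
    entry:
      ; ... so the call site names fastcc too. Omitting it would default
      ; to ccc, and the mismatch would be undefined behavior.
      %r = call fastcc i32 @add_one(i32 %x)
      %bad = icmp slt i32 %r, 0
      br i1 %bad, label %fail, label %done

    fail:
      call coldcc void @report_failure()
      br label %done

    done:
      ret i32 %r
    }

    ; The numbered spelling: cc 10 is the GHC convention listed above.
    declare cc 10 void @ghc_style_entry()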
471 .. _visibilitystyles:
All Global Variables and Functions have one of the following visibility
styles:
479 "``default``" - Default style
480 On targets that use the ELF object file format, default visibility
481 means that the declaration is visible to other modules and, in
482 shared libraries, means that the declared entity may be overridden.
483 On Darwin, default visibility means that the declaration is visible
484 to other modules. Default visibility corresponds to "external
485 linkage" in the language.
486 "``hidden``" - Hidden style
487 Two declarations of an object with hidden visibility refer to the
488 same object if they are in the same shared object. Usually, hidden
489 visibility indicates that the symbol will not be placed into the
490 dynamic symbol table, so no other module (executable or shared
491 library) can reference it directly.
492 "``protected``" - Protected style
493 On ELF, protected visibility indicates that the symbol will be
494 placed in the dynamic symbol table, but that references within the
495 defining module will bind to the local symbol. That is, the symbol
496 cannot be overridden by another module.
A symbol with ``internal`` or ``private`` linkage must have ``default``
visibility.
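
A brief sketch of the three styles, using hypothetical symbol names:

.. code-block:: llvm

    ; Default visibility (no keyword): visible to other modules.
    @exported_counter = global i32 0

    ; Hidden: usually kept out of the dynamic symbol table.
    @impl_detail = hidden global i32 0

    ; Protected: placed in the dynamic symbol table, but references from
    ; the defining module bind to the local symbol.
    define protected i32 @stable_entry() {
      %v = load i32, i32* @impl_detail
      ret i32 %v
    }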
All Global Variables, Functions and Aliases can have one of the following
DLL storage classes:

``dllimport``
    "``dllimport``" causes the compiler to reference a function or variable via
    a global pointer to a pointer that is set up by the DLL exporting the
    symbol. On Microsoft Windows targets, the pointer name is formed by
    combining ``__imp_`` and the function or variable name.
``dllexport``
    "``dllexport``" causes the compiler to provide a global pointer to a pointer
    in a DLL, so that it can be referenced with the ``dllimport`` attribute. On
    Microsoft Windows targets, the pointer name is formed by combining
    ``__imp_`` and the function or variable name. Since this storage class
    exists for defining a DLL interface, the compiler, assembler and linker know
    it is externally referenced and must refrain from deleting the symbol.
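
In textual IR the storage class is written after the linkage, as in this sketch
(all names are hypothetical):

.. code-block:: llvm

    ; Imported from a DLL; on Windows the access goes through the
    ; corresponding __imp_ pointer.
    declare dllimport i32 @imported_fn(i32)
    @imported_var = external dllimport global i32

    ; Exported from this module.
    define dllexport i32 @exported_fn(i32 %x) {
      %r = call i32 @imported_fn(i32 %x)
      ret i32 %r
    }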
Thread Local Storage Models
---------------------------

A variable may be defined as ``thread_local``, which means that it will
not be shared by threads (each thread will have a separate copy of the
variable). Not all targets support thread-local variables. Optionally, a
TLS model may be specified:

``localdynamic``
    For variables that are only used within the current shared library.
``initialexec``
    For variables in modules that will not be loaded dynamically.
``localexec``
    For variables defined in the executable and only used within it.

If no explicit model is given, the "general dynamic" model is used.
541 The models correspond to the ELF TLS models; see `ELF Handling For
542 Thread-Local Storage <http://people.redhat.com/drepper/tls.pdf>`_ for
543 more information on under which circumstances the different models may
544 be used. The target may choose a different TLS model if the specified
545 model is not supported, or if a better choice of model can be made.
547 A model can also be specified in an alias, but then it only governs how
548 the alias is accessed. It will not have any effect in the aliasee.
For platforms without linker support of the ELF TLS model, the
``-femulated-tls`` flag can be used to generate GCC-compatible emulated TLS
code.
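
The model is written in parentheses after ``thread_local``. A minimal sketch
with hypothetical variable names:

.. code-block:: llvm

    ; No model given: the general dynamic model is used.
    @tls_default = thread_local global i32 0

    ; Explicit models.
    @tls_ld = thread_local(localdynamic) global i32 0
    @tls_ie = thread_local(initialexec) global i32 0
    @tls_le = thread_local(localexec) global i32 0

As noted above, the target may still substitute a different model than the one
requested.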
.. _runtime_preemption_model:

Runtime Preemption Specifiers
-----------------------------

Global variables, functions and aliases may have an optional runtime preemption
specifier. If a preemption specifier isn't given explicitly, then a
symbol is assumed to be ``dso_preemptable``.

``dso_preemptable``
    Indicates that the function or variable may be replaced by a symbol from
    outside the linkage unit at runtime.

``dso_local``
    The compiler may assume that a function or variable marked as ``dso_local``
    will resolve to a symbol within the same linkage unit. Direct access will
    be generated even if the definition is not within this compilation unit.
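
A short sketch of both specifiers, with hypothetical names:

.. code-block:: llvm

    ; Assumed to resolve within this linkage unit, so direct access is
    ; generated, even for the declaration whose body lives elsewhere.
    @local_state = dso_local global i32 0
    declare dso_local void @same_unit_helper()

    ; Explicitly preemptable, which is also the default when nothing is
    ; written.
    @shared_state = dso_preemptable global i32 0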
576 LLVM IR allows you to specify both "identified" and "literal" :ref:`structure
577 types <t_struct>`. Literal types are uniqued structurally, but identified types
578 are never uniqued. An :ref:`opaque structural type <t_opaque>` can also be used
579 to forward declare a type that is not yet available.
An example of an identified structure specification is:

.. code-block:: llvm

    %mytype = type { %mytype*, i32 }

587 Prior to the LLVM 3.0 release, identified types were structurally uniqued. Only
588 literal types are uniqued in recent versions of LLVM.
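
The three kinds of structure type can be contrasted in a small sketch. The type
and variable names are hypothetical:

.. code-block:: llvm

    ; Identified types: never uniqued, so these two remain distinct types
    ; even though their bodies have the same shape.
    %list.a = type { %list.a*, i32 }
    %list.b = type { %list.b*, i32 }

    ; Opaque type: a forward declaration with no body yet.
    %handle = type opaque

    ; Literal struct type written inline; uniqued structurally.
    @origin = global { i32, i32 } { i32 0, i32 0 }

    @first = global %list.a zeroinitializer
    @h = global %handle* null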
Non-Integral Pointer Type
-------------------------
595 Note: non-integral pointer types are a work in progress, and they should be
596 considered experimental at this time.
598 LLVM IR optionally allows the frontend to denote pointers in certain address
599 spaces as "non-integral" via the :ref:`datalayout string<langref_datalayout>`.
600 Non-integral pointer types represent pointers that have an *unspecified* bitwise
601 representation; that is, the integral representation may be target dependent or
602 unstable (not backed by a fixed integer).
``inttoptr`` and ``ptrtoint`` instructions have the same semantics as for
integral (i.e. normal) pointers in that they convert integers to and from
corresponding pointer types, but there are additional implications to be
aware of. Because the bit-representation of a non-integral pointer may
not be stable, two identical casts of the same operand may or may not
return the same value. Said differently, the conversion to or from the
non-integral type depends on environmental state in an implementation
defined manner.

If the frontend wishes to observe a *particular* value following a cast, the
generated IR must fence with the underlying environment in an implementation
defined manner. (In practice, this tends to require ``noinline`` routines for
such operations.)
618 From the perspective of the optimizer, ``inttoptr`` and ``ptrtoint`` for
619 non-integral types are analogous to ones on integral types with one
620 key exception: the optimizer may not, in general, insert new dynamic
621 occurrences of such casts. If a new cast is inserted, the optimizer would
622 need to either ensure that a) all possible values are valid, or b)
623 appropriate fencing is inserted. Since the appropriate fencing is
624 implementation defined, the optimizer can't do the latter. The former is
625 challenging as many commonly expected properties, such as
626 ``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
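
The sketch below illustrates that last property. It assumes a minimal
datalayout string, ``"e-ni:1"``, that marks address space 1 as non-integral;
a real frontend would use its target's full datalayout, and the function name
is hypothetical:

.. code-block:: llvm

    ; Assumed minimal datalayout: address space 1 is non-integral.
    target datalayout = "e-ni:1"

    define i64 @observe(i8 addrspace(1)* %p) {
      ; These two casts are textually identical, yet for a non-integral
      ; pointer they are not guaranteed to yield the same integer ...
      %a = ptrtoint i8 addrspace(1)* %p to i64
      %b = ptrtoint i8 addrspace(1)* %p to i64
      ; ... so %d must not be assumed to be zero.
      %d = sub i64 %a, %b
      ret i64 %d
    }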
Global variables define regions of memory allocated at compilation time
instead of run-time.
636 Global variable definitions must be initialized.
638 Global variables in other translation units can also be declared, in which
639 case they don't have an initializer.
641 Global variables can optionally specify a :ref:`linkage type <linkage>`.
643 Either global variable definitions or declarations may have an explicit section
644 to be placed in and may have an optional explicit alignment specified. If there
645 is a mismatch between the explicit or inferred section information for the
646 variable declaration and its definition the resulting behavior is undefined.
A variable may be defined as a global ``constant``, which indicates that
the contents of the variable will **never** be modified (enabling better
optimization, allowing the global data to be placed in the read-only
section of an executable, etc). Note that variables that need runtime
initialization cannot be marked ``constant`` as there is a store to the
variable.
655 LLVM explicitly allows *declarations* of global variables to be marked
656 constant, even if the final definition of the global is not. This
657 capability can be used to enable slightly better optimization of the
658 program, but requires the language definition to guarantee that
659 optimizations based on the 'constantness' are valid for the translation
660 units that do not include the definition.
As SSA values, global variables define pointer values that are in scope
(i.e. they dominate) all basic blocks in the program. Global variables
always define a pointer to their "content" type because they describe a
region of memory, and all memory objects in LLVM are accessed through
pointers.
668 Global variables can be marked with ``unnamed_addr`` which indicates
669 that the address is not significant, only the content. Constants marked
670 like this can be merged with other constants if they have the same
671 initializer. Note that a constant with significant address *can* be
672 merged with a ``unnamed_addr`` constant, the result being a constant
673 whose address is significant.
675 If the ``local_unnamed_addr`` attribute is given, the address is known to
676 not be significant within the module.
678 A global variable may be declared to reside in a target-specific
679 numbered address space. For targets that support them, address spaces
680 may affect how optimizations are performed and/or what target
681 instructions are used to access the variable. The default address space
682 is zero. The address space qualifier must precede any other attributes.
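
The properties described so far can be sketched together. The names and the
address space number 5 are assumptions made for illustration:

.. code-block:: llvm

    ; A definition must have an initializer; a declaration of a global
    ; defined in another translation unit has none.
    @defined = global i32 7
    @elsewhere = external global i32

    ; constant: the contents are never modified.
    ; unnamed_addr: only the contents matter, not the address.
    @.greeting = private unnamed_addr constant [3 x i8] c"hi\00"

    ; The address is known not to be significant within this module.
    @table = local_unnamed_addr constant [2 x i32] [i32 1, i32 2]

    ; Placed in a target-specific numbered address space.
    @in_as5 = addrspace(5) global i32 0

    ; A global's name denotes a pointer to its content type.
    define i32 @read() {
      %v = load i32, i32* @defined
      ret i32 %v
    }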
LLVM allows an explicit section to be specified for globals. If the
target supports it, it will emit globals to the section specified.
Additionally, the global can be placed in a comdat if the target has the
necessary support.
689 External declarations may have an explicit section specified. Section
690 information is retained in LLVM IR for targets that make use of this
691 information. Attaching section information to an external declaration is an
692 assertion that its definition is located in the specified section. If the
693 definition is located in a different section, the behavior is undefined.
695 By default, global initializers are optimized by assuming that global
696 variables defined within the module are not modified from their
697 initial values before the start of the global initializer. This is
698 true even for variables potentially accessible from outside the
699 module, including those with external linkage or appearing in
700 ``@llvm.used`` or dllexported variables. This assumption may be suppressed
701 by marking the variable with ``externally_initialized``.
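
For instance, a global that something outside the module may write before the
module's own initializers run could be declared as follows (``@device_count``
is a hypothetical name):

.. code-block:: llvm

    ; The initial value 0 must not be assumed when optimizing global
    ; initializers, because the variable may already have been modified.
    @device_count = externally_initialized global i32 0
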
703 An explicit alignment may be specified for a global, which must be a
704 power of 2. If not present, or if the alignment is set to zero, the
705 alignment of the global is set by the target to whatever it feels
706 convenient. If an explicit alignment is specified, the global is forced
707 to have exactly that alignment. Targets and optimizers are not allowed
708 to over-align the global if the global has an assigned section. In this
709 case, the extra alignment could be observable: for example, code could
710 assume that the globals are densely packed in their section and try to
711 iterate over them as an array; alignment padding would break this
712 iteration. The maximum alignment is ``1 << 29``.
714 For global variable declarations, as well as definitions that may be
715 replaced at link time (``linkonce``, ``weak``, ``extern_weak`` and ``common``
716 linkage types), LLVM makes no assumptions about the allocation size of the
717 variables, except that they may not overlap. The alignment of a global variable
718 declaration or replaceable definition must not be greater than the alignment of
719 the definition it resolves to.
721 Globals can also have a :ref:`DLL storage class <dllstorageclass>`,
722 an optional :ref:`runtime preemption specifier <runtime_preemption_model>`,
723 optional :ref:`global attributes <glattrs>` and
724 an optional list of attached :ref:`metadata <metadata>`.
726 Variables and aliases can have a
727 :ref:`Thread Local Storage Model <tls_model>`.
729 :ref:`Scalable vectors <t_vector>` cannot be global variables or members of
730 arrays because their size is unknown at compile time. They are allowed in
731 structs to facilitate intrinsics returning multiple values. Structs containing
732 scalable vectors cannot be used in loads, stores, allocas, or GEPs.
736 @<GlobalVarName> = [Linkage] [PreemptionSpecifier] [Visibility]
737 [DLLStorageClass] [ThreadLocal]
738 [(unnamed_addr|local_unnamed_addr)] [AddrSpace]
739 [ExternallyInitialized]
740 <global | constant> <Type> [<InitializerConstant>]
741 [, section "name"] [, comdat [($name)]]
742 [, align <Alignment>] (, !name !N)*
744 For example, the following defines a global in a numbered address space
745 with an initializer, section, and alignment:
749 @G = addrspace(5) constant float 1.0, section "foo", align 4
751 The following example just declares a global variable
755 @G = external global i32
757 The following example defines a thread-local global with the
758 ``initialexec`` TLS model:
762 @G = thread_local(initialexec) global i32 0, align 4
764 .. _functionstructure:
769 LLVM function definitions consist of the "``define``" keyword, an
770 optional :ref:`linkage type <linkage>`, an optional :ref:`runtime preemption
771 specifier <runtime_preemption_model>`, an optional :ref:`visibility
772 style <visibility>`, an optional :ref:`DLL storage class <dllstorageclass>`,
773 an optional :ref:`calling convention <callingconv>`,
774 an optional ``unnamed_addr`` attribute, a return type, an optional
775 :ref:`parameter attribute <paramattrs>` for the return type, a function
776 name, a (possibly empty) argument list (each with optional :ref:`parameter
777 attributes <paramattrs>`), optional :ref:`function attributes <fnattrs>`,
778 an optional address space, an optional section, an optional alignment,
779 an optional :ref:`comdat <langref_comdats>`,
780 an optional :ref:`garbage collector name <gc>`, an optional :ref:`prefix <prefixdata>`,
781 an optional :ref:`prologue <prologuedata>`,
782 an optional :ref:`personality <personalityfn>`,
783 an optional list of attached :ref:`metadata <metadata>`,
784 an opening curly brace, a list of basic blocks, and a closing curly brace.
786 LLVM function declarations consist of the "``declare``" keyword, an
787 optional :ref:`linkage type <linkage>`, an optional :ref:`visibility style
788 <visibility>`, an optional :ref:`DLL storage class <dllstorageclass>`, an
789 optional :ref:`calling convention <callingconv>`, an optional ``unnamed_addr``
790 or ``local_unnamed_addr`` attribute, an optional address space, a return type,
791 an optional :ref:`parameter attribute <paramattrs>` for the return type, a function name, a possibly
792 empty list of arguments, an optional alignment, an optional :ref:`garbage
793 collector name <gc>`, an optional :ref:`prefix <prefixdata>`, and an optional
794 :ref:`prologue <prologuedata>`.
796 A function definition contains a list of basic blocks, forming the CFG (Control
797 Flow Graph) for the function. Each basic block may optionally start with a label
798 (giving the basic block a symbol table entry), contains a list of instructions,
799 and ends with a :ref:`terminator <terminators>` instruction (such as a branch or
800 function return). If an explicit label name is not provided, a block is assigned
801 an implicit numbered label, using the next value from the same counter as used
802 for unnamed temporaries (:ref:`see above<identifiers>`). For example, if a
803 function entry block does not have an explicit label, it will be assigned label
804 "%0", then the first unnamed temporary in that block will be "%1", etc. If a
805 numeric label is explicitly specified, it must match the numeric label that
806 would be used implicitly.
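
The numbering rule can be sketched with a small function (``@no_args`` is a
hypothetical name) that has no arguments, an unlabeled entry block, and one
explicitly numbered block:

.. code-block:: llvm

    define i32 @no_args() {
      ; The entry block has no explicit label, so it is implicitly %0.
      %1 = add i32 2, 3      ; first unnamed temporary
      br label %2

    2:                       ; explicit numeric label; it must be the next number
      ret i32 %1
    }
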
808 The first basic block in a function is special in two ways: it is
809 immediately executed on entrance to the function, and it is not allowed
810 to have predecessor basic blocks (i.e. there cannot be any branches to
811 the entry block of a function). Because the block can have no
812 predecessors, it also cannot have any :ref:`PHI nodes <i_phi>`.
814 LLVM allows an explicit section to be specified for functions. If the
815 target supports it, it will emit functions to the section specified.
816 Additionally, the function can be placed in a COMDAT.
818 An explicit alignment may be specified for a function. If not present,
819 or if the alignment is set to zero, the alignment of the function is set
820 by the target to whatever it feels convenient. If an explicit alignment
821 is specified, the function is forced to have at least that much
822 alignment. All alignments must be a power of 2.
824 If the ``unnamed_addr`` attribute is given, the address is known to not
825 be significant and two identical functions can be merged.
827 If the ``local_unnamed_addr`` attribute is given, the address is known to
828 not be significant within the module.
830 If an explicit address space is not given, it will default to the program
831 address space from the :ref:`datalayout string<langref_datalayout>`.
835 define [linkage] [PreemptionSpecifier] [visibility] [DLLStorageClass]
837 <ResultType> @<FunctionName> ([argument list])
838 [(unnamed_addr|local_unnamed_addr)] [AddrSpace] [fn Attrs]
839 [section "name"] [comdat [($name)]] [align N] [gc] [prefix Constant]
840 [prologue Constant] [personality Constant] (!name !N)* { ... }
842 The argument list is a comma separated sequence of arguments where each
843 argument is of the following form:
847 <type> [parameter Attrs] [name]
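
As an illustrative sketch of how these pieces are ordered (the function name
``@sum`` and the section name are hypothetical), a definition with a linkage
type, calling convention, ``unnamed_addr``, a function attribute, a section,
and an alignment might look like this:

.. code-block:: llvm

    define internal fastcc i32 @sum(i32 %a, i32 %b) unnamed_addr nounwind section ".text.sum" align 16 {
    entry:
      %r = add i32 %a, %b
      ret i32 %r
    }
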
855 Aliases, unlike functions or variables, don't create any new data. They
856 are just a new symbol and metadata for an existing position.
858 Aliases have a name and an aliasee that is either a global value or a
861 Aliases may have an optional :ref:`linkage type <linkage>`, an optional
862 :ref:`runtime preemption specifier <runtime_preemption_model>`, an optional
863 :ref:`visibility style <visibility>`, an optional :ref:`DLL storage class
864 <dllstorageclass>` and an optional :ref:`tls model <tls_model>`.
868 @<Name> = [Linkage] [PreemptionSpecifier] [Visibility] [DLLStorageClass] [ThreadLocal] [(unnamed_addr|local_unnamed_addr)] alias <AliaseeTy>, <AliaseeTy>* @<Aliasee>
870 The linkage must be one of ``private``, ``internal``, ``linkonce``, ``weak``,
871 ``linkonce_odr``, ``weak_odr``, ``external``. Note that some system linkers
872 might not correctly handle dropping a weak symbol that is aliased.
874 Aliases that are not ``unnamed_addr`` are guaranteed to have the same address as
875 the aliasee expression. ``unnamed_addr`` ones are only guaranteed to point
878 If the ``local_unnamed_addr`` attribute is given, the address is known to
879 not be significant within the module.
881 Since aliases are only a second name, some restrictions apply, of which
882 some can only be checked when producing an object file:
884 * The expression defining the aliasee must be computable at assembly
885 time. Since it is just a name, no relocations can be used.
887 * No alias in the expression can be weak as the possibility of the
888 intermediate alias being overridden cannot be represented in an
891 * No global value in the expression can be a declaration, since that
892 would require a relocation, which is not possible.
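
A minimal sketch of alias syntax, assuming a function ``@impl`` and a global
``@data`` defined in the same module (all names here are hypothetical):

.. code-block:: llvm

    define void @impl() {
      ret void
    }

    @data = global i32 7

    ; A second, weak name for the function @impl.
    @impl_alias = weak alias void (), void ()* @impl

    ; An unnamed_addr alias is only guaranteed to refer to the same content.
    @data_alias = unnamed_addr alias i32, i32* @data

Both aliasees are definitions, which satisfies the restriction that no global
value in the aliasee expression may be a declaration.
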
899 IFuncs, like aliases, don't create any new data or functions. They are just a new
900 symbol that the dynamic linker resolves at runtime by calling a resolver function.
902 IFuncs have a name and a resolver, which is a function called by the dynamic linker
903 that returns the address of another function associated with the name.
905 IFuncs may have an optional :ref:`linkage type <linkage>` and an optional
906 :ref:`visibility style <visibility>`.
910 @<Name> = [Linkage] [Visibility] ifunc <IFuncTy>, <ResolverTy>* @<Resolver>
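
The following sketch shows one possible shape of an IFunc and its resolver.
The names ``@add``, ``@add_generic`` and ``@add_resolver`` are hypothetical,
and a real resolver would typically choose among several implementations:

.. code-block:: llvm

    define i32 @add_generic(i32 %a, i32 %b) {
      %r = add i32 %a, %b
      ret i32 %r
    }

    ; The resolver takes no arguments and returns the address of the
    ; implementation that the symbol should be bound to.
    define internal i32 (i32, i32)* @add_resolver() {
      ret i32 (i32, i32)* @add_generic
    }

    @add = ifunc i32 (i32, i32), i32 (i32, i32)* ()* @add_resolver
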
918 Comdat IR provides access to object file COMDAT/section group functionality
919 which represents interrelated sections.
921 Comdats have a name which represents the COMDAT key and a selection kind to
922 provide input on how the linker deduplicates comdats with the same key in two
923 different object files. A comdat must be included or omitted as a unit.
924 Discarding the whole comdat is allowed but discarding a subset is not.
926 A global object may be a member of at most one comdat. Aliases are placed in the
927 same COMDAT that their aliasee computes to, if any.
931 $<Name> = comdat SelectionKind
933 For selection kinds other than ``nodeduplicate``, only one of the duplicate
934 comdats may be retained by the linker and the members of the remaining comdats
935 must be discarded. The following selection kinds are supported:
938 The linker may choose any COMDAT key; the choice is arbitrary.
940 The linker may choose any COMDAT key but the sections must contain the
943 The linker will choose the section containing the largest COMDAT key.
945 No deduplication is performed.
947 The linker may choose any COMDAT key but the sections must contain the
950 - XCOFF and Mach-O don't support COMDATs.
951 - COFF supports all selection kinds. Non-``nodeduplicate`` selection kinds need
952 a non-local linkage COMDAT symbol.
953 - ELF supports ``any`` and ``nodeduplicate``.
954 - WebAssembly only supports ``any``.
956 Here is an example of a COFF COMDAT where a function will only be selected if
957 the COMDAT key's section is the largest:
961 $foo = comdat largest
962 @foo = global i32 2, comdat($foo)
964 define void @bar() comdat($foo) {
968 In a COFF object file, this will create a COMDAT section with selection kind
969 ``IMAGE_COMDAT_SELECT_LARGEST`` containing the contents of the ``@foo`` symbol
970 and another COMDAT section with selection kind
971 ``IMAGE_COMDAT_SELECT_ASSOCIATIVE`` which is associated with the first COMDAT
972 section and contains the contents of the ``@bar`` symbol.
974 As syntactic sugar, the ``$name`` can be omitted if the name is the same as
980 @foo = global i32 2, comdat
981 @bar = global i32 3, comdat($foo)
983 There are some restrictions on the properties of the global object.
984 It, or an alias to it, must have the same name as the COMDAT group when
986 The contents and size of this object may be used during link-time to determine
987 which COMDAT groups get selected depending on the selection kind.
988 Because the name of the object must match the name of the COMDAT group, the
989 linkage of the global object must not be local; local symbols can get renamed
990 if a collision occurs in the symbol table.
992 The combined use of COMDATs and section attributes may yield surprising results.
999 @g1 = global i32 42, section "sec", comdat($foo)
1000 @g2 = global i32 42, section "sec", comdat($bar)
1002 From the object file perspective, this requires the creation of two sections
1003 with the same name. This is necessary because both globals belong to different
1004 COMDAT groups and COMDATs, at the object file level, are represented by
1007 Note that certain IR constructs like global variables and functions may
1008 create COMDATs in the object file in addition to any which are specified using
1009 COMDAT IR. This arises when the code generator is configured to emit globals
1010 in individual sections (e.g. when ``-data-sections`` or ``-function-sections``
1011 is supplied to ``llc``).
1013 .. _namedmetadatastructure:
1018 Named metadata is a collection of metadata. :ref:`Metadata
1019 nodes <metadata>` (but not metadata strings) are the only valid
1020 operands for a named metadata.
1022 #. Named metadata are represented as a string of characters with the
1023 metadata prefix. The rules for metadata names are the same as for
1024 identifiers, but quoted names are not allowed. ``"\xx"`` type escapes
1025 are still valid, which allows any character to be part of a name.
1029 ; Some unnamed metadata nodes, which are referenced by the named metadata.
1034 !name = !{!0, !1, !2}
1038 Parameter Attributes
1039 --------------------
1041 The return type and each parameter of a function type may have a set of
1042 *parameter attributes* associated with them. Parameter attributes are
1043 used to communicate additional information about the result or
1044 parameters of a function. Parameter attributes are considered to be part
1045 of the function, not of the function type, so functions with different
1046 parameter attributes can have the same function type.
1048 Parameter attributes are simple keywords that follow the type specified.
1049 If multiple parameter attributes are needed, they are space separated.
1052 .. code-block:: llvm
1054 declare i32 @printf(i8* noalias nocapture, ...)
1055 declare i32 @atoi(i8 zeroext)
1056 declare signext i8 @returns_signed_char()
1058 Note that any attributes for the function itself (``nounwind``,
1059 ``readonly``) come immediately after the argument list.
1061 Currently, only the following parameter attributes are defined:
1064 This indicates to the code generator that the parameter or return
1065 value should be zero-extended to the extent required by the target's
1066 ABI by the caller (for a parameter) or the callee (for a return value).
1068 This indicates to the code generator that the parameter or return
1069 value should be sign-extended to the extent required by the target's
1070 ABI (which is usually 32 bits) by the caller (for a parameter) or
1071 the callee (for a return value).
1073 This indicates that this parameter or return value should be treated
1074 in a special target-dependent fashion while emitting code for
1075 a function call or return (usually, by putting it in a register as
1076 opposed to memory, though some targets use it to distinguish between
1077 two different kinds of registers). Use of this attribute is
1080 This indicates that the pointer parameter should really be passed by
1081 value to the function. The attribute implies that a hidden copy of
1082 the pointee is made between the caller and the callee, so the callee
1083 is unable to modify the value in the caller. This attribute is only
1084 valid on LLVM pointer arguments. It is generally used to pass
1085 structs and arrays by value, but is also valid on pointers to
1086 scalars. The copy is considered to belong to the caller not the
1087 callee (for example, ``readonly`` functions should not write to
1088 ``byval`` parameters). This is not a valid attribute for return
1091 The byval type argument indicates the in-memory value type, and
1092 must be the same as the pointee type of the argument.
1094 The byval attribute also supports specifying an alignment with the
1095 align attribute. It indicates the alignment of the stack slot to
1096 form and the known alignment of the pointer specified to the call
1097 site. If the alignment is not specified, then the code generator
1098 makes a target-specific assumption.
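
A small sketch of ``byval`` with an explicit type and alignment, assuming a
hypothetical struct ``%struct.S`` and callee ``@take``:

.. code-block:: llvm

    %struct.S = type { i32, i32 }

    declare void @take(%struct.S* byval(%struct.S) align 4)

    define void @caller(%struct.S* %p) {
      ; @take receives a copy of *%p, so it cannot modify the caller's value.
      call void @take(%struct.S* byval(%struct.S) align 4 %p)
      ret void
    }
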
1104 The ``byref`` argument attribute allows specifying the pointee
1105 memory type of an argument. This is similar to ``byval``, but does
1106 not imply a copy is made anywhere, or that the argument is passed
1107 on the stack. This implies the pointer is dereferenceable up to
1108 the storage size of the type.
1110 It is not generally permissible to introduce a write to a
1111 ``byref`` pointer. The pointer may have any address space and may
1114 This is not a valid attribute for return values.
1116 The alignment for a ``byref`` parameter can be explicitly
1117 specified by combining it with the ``align`` attribute, similar to
1118 ``byval``. If the alignment is not specified, then the code generator
1119 makes a target-specific assumption.
1121 This is intended for representing ABI constraints, and is not
1122 intended to be inferred for optimization use.
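
For comparison with ``byval``, a ``byref`` parameter might be written as in
this sketch (``%struct.Big`` and ``@first`` are hypothetical names):

.. code-block:: llvm

    %struct.Big = type { [16 x i32] }

    ; No copy is implied; %p is dereferenceable up to the storage size of
    ; %struct.Big, so the load below cannot trap.
    define i32 @first(%struct.Big* byref(%struct.Big) align 4 %p) {
      %q = getelementptr inbounds %struct.Big, %struct.Big* %p, i32 0, i32 0, i32 0
      %v = load i32, i32* %q
      ret i32 %v
    }
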
1124 .. _attr_preallocated:
1126 ``preallocated(<ty>)``
1127 This indicates that the pointer parameter should really be passed by
1128 value to the function, and that the pointer parameter's pointee has
1129 already been initialized before the call instruction. This attribute
1130 is only valid on LLVM pointer arguments. The argument must be the value
1131 returned by the appropriate
1132 :ref:`llvm.call.preallocated.arg<int_call_preallocated_arg>` on non
1133 ``musttail`` calls, or the corresponding caller parameter in ``musttail``
1134 calls, although it is ignored during codegen.
1136 A non ``musttail`` function call with a ``preallocated`` attribute in
1137 any parameter must have a ``"preallocated"`` operand bundle. A ``musttail``
1138 function call cannot have a ``"preallocated"`` operand bundle.
1140 The preallocated attribute requires a type argument, which must be
1141 the same as the pointee type of the argument.
1143 The preallocated attribute also supports specifying an alignment with the
1144 align attribute. It indicates the alignment of the stack slot to
1145 form and the known alignment of the pointer specified to the call
1146 site. If the alignment is not specified, then the code generator
1147 makes a target-specific assumption.
1153 The ``inalloca`` argument attribute allows the caller to take the
1154 address of outgoing stack arguments. An ``inalloca`` argument must
1155 be a pointer to stack memory produced by an ``alloca`` instruction.
1156 The alloca, or argument allocation, must also be tagged with the
1157 inalloca keyword. Only the last argument may have the ``inalloca``
1158 attribute, and that argument is guaranteed to be passed in memory.
1160 An argument allocation may be used by a call at most once because
1161 the call may deallocate it. The ``inalloca`` attribute cannot be
1162 used in conjunction with other attributes that affect argument
1163 storage, like ``inreg``, ``nest``, ``sret``, or ``byval``. The
1164 ``inalloca`` attribute also disables LLVM's implicit lowering of
1165 large aggregate return values, which means that frontend authors
1166 must lower them with ``sret`` pointers.
1168 When the call site is reached, the argument allocation must have
1169 been the most recent stack allocation that is still live, or the
1170 behavior is undefined. It is possible to allocate additional stack
1171 space after an argument allocation and before its call site, but it
1172 must be cleared off with :ref:`llvm.stackrestore
1173 <int_stackrestore>`.
1175 The inalloca attribute requires a type argument, which must be the
1176 same as the pointee type of the argument.
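
A simplified sketch of an ``inalloca`` call, assuming a hypothetical argument
frame type ``%frame`` and callee ``@f``. A real frontend would usually bracket
the allocation with stack save and restore intrinsics:

.. code-block:: llvm

    %frame = type { i32 }

    declare void @f(%frame* inalloca(%frame))

    define void @caller() {
      ; The argument allocation is tagged with the inalloca keyword.
      %args = alloca inalloca %frame
      %x = getelementptr %frame, %frame* %args, i32 0, i32 0
      store i32 42, i32* %x
      ; The allocation is used by at most one call, as the last argument.
      call void @f(%frame* inalloca(%frame) %args)
      ret void
    }
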
1178 See :doc:`InAlloca` for more information on how to use this
1182 This indicates that the pointer parameter specifies the address of a
1183 structure that is the return value of the function in the source
1184 program. This pointer must be guaranteed by the caller to be valid:
1185 loads and stores to the structure may be assumed by the callee not
1186 to trap and to be properly aligned. This is not a valid attribute
1189 The sret type argument specifies the in-memory type, which must be
1190 the same as the pointee type of the argument.
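
As a sketch, a function returning a hypothetical ``%struct.Pair`` through an
``sret`` pointer could be declared and called like this (``@make_pair`` and
``@use`` are illustrative names):

.. code-block:: llvm

    %struct.Pair = type { i64, i64 }

    declare void @make_pair(%struct.Pair* sret(%struct.Pair) align 8)

    define i64 @use() {
      ; The caller provides valid, properly aligned storage for the result.
      %tmp = alloca %struct.Pair, align 8
      call void @make_pair(%struct.Pair* sret(%struct.Pair) align 8 %tmp)
      %p = getelementptr inbounds %struct.Pair, %struct.Pair* %tmp, i32 0, i32 0
      %v = load i64, i64* %p
      ret i64 %v
    }
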
1192 .. _attr_elementtype:
1194 ``elementtype(<ty>)``
1196 The ``elementtype`` argument attribute can be used to specify a pointer
1197 element type in a way that is compatible with `opaque pointers
1198 <OpaquePointers.html>`__.
1200 The ``elementtype`` attribute by itself does not carry any specific
1201 semantics. However, certain intrinsics may require this attribute to be
1202 present and assign it particular semantics. This will be documented on
1203 individual intrinsics.
1205 The attribute may only be applied to pointer typed arguments of intrinsic
1206 calls. It cannot be applied to non-intrinsic calls, and cannot be applied
1207 to parameters on function declarations. For non-opaque pointers, the type
1208 passed to ``elementtype`` must match the pointer element type.
1212 ``align <n>`` or ``align(<n>)``
1213 This indicates that the pointer value has the specified alignment.
1214 If the pointer value does not have the specified alignment,
1215 a :ref:`poison value <poisonvalues>` is returned or passed instead. The
1216 ``align`` attribute should be combined with the ``noundef`` attribute to
1217 ensure a pointer is aligned, or otherwise the behavior is undefined. Note
1218 that ``align 1`` has no effect on non-byval, non-preallocated arguments.
1220 Note that this attribute has additional semantics when combined with the
1221 ``byval`` or ``preallocated`` attribute, which are documented there.
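
A brief sketch of combining ``align`` with ``noundef`` on a pointer parameter
(``@load_aligned`` is a hypothetical name):

.. code-block:: llvm

    ; With noundef, passing a pointer that is not 16-byte aligned is
    ; undefined behavior rather than merely producing poison.
    define i32 @load_aligned(i32* noundef align 16 %p) {
      %v = load i32, i32* %p, align 16
      ret i32 %v
    }
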
1226 This indicates that memory locations accessed via pointer values
1227 :ref:`based <pointeraliasing>` on the argument or return value are not also
1228 accessed, during the execution of the function, via pointer values not
1229 *based* on the argument or return value. This guarantee only holds for
1230 memory locations that are *modified*, by any means, during the execution of
1231 the function. The attribute on a return value also has additional semantics
1232 described below. The caller shares the responsibility with the callee for
1233 ensuring that these requirements are met. For further details, please see
1234 the discussion of the NoAlias response in :ref:`alias analysis <Must, May,
1237 Note that this definition of ``noalias`` is intentionally similar
1238 to the definition of ``restrict`` in C99 for function arguments.
1240 For function return values, C99's ``restrict`` is not meaningful,
1241 while LLVM's ``noalias`` is. Furthermore, the semantics of the ``noalias``
1242 attribute on return values are stronger than the semantics of the attribute
1243 when used on function arguments. On function return values, the ``noalias``
1244 attribute indicates that the function acts like a system memory allocation
1245 function, returning a pointer to allocated storage disjoint from the
1246 storage for any other object accessible to the caller.
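
The two uses of ``noalias`` can be sketched as follows, with a hypothetical
allocator ``@my_alloc`` and a hypothetical function ``@copy_one``:

.. code-block:: llvm

    ; On a return value: the result behaves like freshly allocated storage.
    declare noalias i8* @my_alloc(i64)

    ; On arguments: the memory written through %dst is not accessed through
    ; pointers that are not based on %dst while @copy_one executes.
    define void @copy_one(i32* noalias %dst, i32* noalias %src) {
      %v = load i32, i32* %src
      store i32 %v, i32* %dst
      ret void
    }
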
1251 This indicates that the callee does not :ref:`capture <pointercapture>` the
1252 pointer. This is not a valid attribute for return values.
1253 This attribute applies only to the particular copy of the pointer passed in
1254 this argument. A caller could pass two copies of the same pointer with one
1255 being annotated nocapture and the other not, and the callee could validly
1256 capture through the non-annotated parameter.
1258 .. code-block:: llvm
1260 define void @f(i8* nocapture %a, i8* %b) {
1264 call void @f(i8* @glb, i8* @glb) ; well-defined
1267 This indicates that the callee does not free the pointer argument. This is not
1268 a valid attribute for return values.
1273 This indicates that the pointer parameter can be excised using the
1274 :ref:`trampoline intrinsics <int_trampoline>`. This is not a valid
1275 attribute for return values and can only be applied to one parameter.
1278 This indicates that the function always returns the argument as its return
1279 value. This is a hint to the optimizer and code generator used when
1280 generating the caller, allowing value propagation, tail call optimization,
1281 and omission of register saves and restores in some cases; it is not
1282 checked or enforced when generating the callee. The parameter and the
1283 function return type must be valid operands for the
1284 :ref:`bitcast instruction <i_bitcast>`. This is not a valid attribute for
1285 return values and can only be applied to one parameter.
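
A minimal example of ``returned``, using a hypothetical identity function:

.. code-block:: llvm

    ; Callers may assume the result is the same value as the argument %p.
    define i8* @identity(i8* returned %p) {
      ret i8* %p
    }
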
1288 This indicates that the parameter or return pointer is not null. This
1289 attribute may only be applied to pointer typed parameters. This is not
1290 checked or enforced by LLVM; if the parameter or return pointer is null,
1291 a :ref:`poison value <poisonvalues>` is returned or passed instead.
1292 The ``nonnull`` attribute should be combined with the ``noundef`` attribute
1293 to ensure a pointer is not null or otherwise the behavior is undefined.
1295 ``dereferenceable(<n>)``
1296 This indicates that the parameter or return pointer is dereferenceable. This
1297 attribute may only be applied to pointer typed parameters. A pointer that
1298 is dereferenceable can be loaded from speculatively without a risk of
1299 trapping. The number of bytes known to be dereferenceable must be provided
1300 in parentheses. It is legal for the number of bytes to be less than the
1301 size of the pointee type. The ``nonnull`` attribute does not imply
1302 dereferenceability (consider a pointer to one element past the end of an
1303 array), however ``dereferenceable(<n>)`` does imply ``nonnull`` in
1304 ``addrspace(0)`` (which is the default address space), except if the
1305 ``null_pointer_is_valid`` function attribute is present.
1306 ``n`` should be a positive number. The pointer should be well defined,
1307 otherwise it is undefined behavior. This means ``dereferenceable(<n>)``
1308 implies ``noundef``.
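
As a sketch of why dereferenceability matters (``@pick`` is a hypothetical
name), the load below is performed unconditionally even though its result is
only needed when ``%c`` is true:

.. code-block:: llvm

    define i32 @pick(i1 %c, i32* dereferenceable(4) %q) {
      ; Four bytes at %q are known to be dereferenceable, so this load
      ; cannot trap regardless of the value of %c.
      %v = load i32, i32* %q
      %r = select i1 %c, i32 %v, i32 0
      ret i32 %r
    }
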
1310 ``dereferenceable_or_null(<n>)``
1311 This indicates that the parameter or return value isn't both
1312 non-null and non-dereferenceable (up to ``<n>`` bytes) at the same
1313 time. All non-null pointers tagged with
1314 ``dereferenceable_or_null(<n>)`` are ``dereferenceable(<n>)``.
1315 For address space 0 ``dereferenceable_or_null(<n>)`` implies that
1316 a pointer is exactly one of ``dereferenceable(<n>)`` or ``null``,
1317 and in other address spaces ``dereferenceable_or_null(<n>)``
1318 implies that a pointer is at least one of ``dereferenceable(<n>)``
1319 or ``null`` (i.e. it may be both ``null`` and
1320 ``dereferenceable(<n>)``). This attribute may only be applied to
1321 pointer typed parameters.
1324 This indicates that the parameter is the self/context parameter. This is not
1325 a valid attribute for return values and can only be applied to one
1329 This indicates that the parameter is the asynchronous context parameter and
1330 triggers the creation of a target-specific extended frame record to store
1331 this pointer. This is not a valid attribute for return values and can only
1332 be applied to one parameter.
1335 This attribute is intended to model and optimize Swift error handling. It
1336 can be applied to a parameter with pointer to pointer type or a
1337 pointer-sized alloca. At the call site, the actual argument that corresponds
1338 to a ``swifterror`` parameter has to come from a ``swifterror`` alloca or
1339 the ``swifterror`` parameter of the caller. A ``swifterror`` value (either
1340 the parameter or the alloca) can only be loaded from and stored to, or used as
1341 a ``swifterror`` argument. This is not a valid attribute for return values
1342 and can only be applied to one parameter.
1344 These constraints allow the calling convention to optimize access to
1345 ``swifterror`` variables by associating them with a specific register at
1346 call boundaries rather than placing them in memory. Since this does change
1347 the calling convention, a function which uses the ``swifterror`` attribute
1348 on a parameter is not ABI-compatible with one which does not.
1350 These constraints also allow LLVM to assume that a ``swifterror`` argument
1351 does not alias any other memory visible within a function and that a
1352 ``swifterror`` alloca passed as an argument does not escape.
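
The constraints above can be sketched with a hypothetical callee
``@may_fail`` and a caller that owns the ``swifterror`` slot:

.. code-block:: llvm

    declare void @may_fail(i8** swifterror)

    define i8* @caller() {
      ; A pointer-sized alloca tagged as swifterror.
      %err = alloca swifterror i8*
      store i8* null, i8** %err
      ; The argument must also carry the swifterror attribute at the call site.
      call void @may_fail(i8** swifterror %err)
      %e = load i8*, i8** %err
      ret i8* %e
    }
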
1355 This indicates the parameter is required to be an immediate
1356 value. This must be a trivial immediate integer or floating-point
1357 constant. Undef or constant expressions are not valid. This is
1358 only valid on intrinsic declarations and cannot be applied to a
1359 call site or arbitrary function.
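
The ``llvm.memcpy`` intrinsic is one place where ``immarg`` appears: its final
``i1`` parameter (the volatile flag) must be a literal constant. A sketch,
with ``@copy16`` as a hypothetical caller:

.. code-block:: llvm

    declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly,
                                            i8* noalias nocapture readonly,
                                            i64, i1 immarg)

    define void @copy16(i8* %dst, i8* %src) {
      ; The last operand is the immediate 'false'; a variable, undef, or
      ; constant expression would not be accepted here.
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 16, i1 false)
      ret void
    }
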
1362 This attribute applies to parameters and return values. If the value
1363 representation contains any undefined or poison bits, the behavior is
1364 undefined. Note that this does not refer to padding introduced by the
1365 type's storage representation.
1368 This indicates the alignment that should be considered by the backend when
1369 assigning this parameter to a stack slot during calling convention
1370 lowering. The enforcement of the specified alignment is target-dependent,
1371 as target-specific calling convention rules may override this value. This
1372 attribute serves the purpose of carrying language specific alignment
1373 information that is not mapped to base types in the backend (for example,
1374 over-alignment specification through language attributes).
1378 Garbage Collector Strategy Names
1379 --------------------------------
1381 Each function may specify a garbage collector strategy name, which is simply a
1384 .. code-block:: llvm
1386 define void @f() gc "name" { ... }
1388 The supported values of *name* include those :ref:`built in to LLVM
1389 <builtin-gc-strategies>` and any provided by loaded plugins. Specifying a GC
1390 strategy will cause the compiler to alter its output in order to support the
1391 named garbage collection algorithm. Note that LLVM itself does not contain a
1392 garbage collector; this functionality is restricted to generating machine code
1393 which can interoperate with a collector provided externally.
1400 Prefix data is data associated with a function which the code
1401 generator will emit immediately before the function's entrypoint.
1402 The purpose of this feature is to allow frontends to associate
1403 language-specific runtime metadata with specific functions and make it
1404 available through the function pointer while still allowing the
1405 function pointer to be called.
1407 To access the data for a given function, a program may bitcast the
1408 function pointer to a pointer to the constant's type and dereference
1409 index -1. This implies that the IR symbol points just past the end of
1410 the prefix data. For instance, take the example of a function annotated
1411 with a single ``i32``,
1413 .. code-block:: llvm
1415 define void @f() prefix i32 123 { ... }
1417 The prefix data can be referenced as,
1419 .. code-block:: llvm
1421 %0 = bitcast void ()* @f to i32*
1422 %a = getelementptr inbounds i32, i32* %0, i32 -1
1423 %b = load i32, i32* %a
1425 Prefix data is laid out as if it were an initializer for a global variable
1426 of the prefix data's type. The function will be placed such that the
1427 beginning of the prefix data is aligned. This means that if the size
1428 of the prefix data is not a multiple of the alignment size, the
1429 function's entrypoint will not be aligned. If alignment of the
1430 function's entrypoint is desired, padding must be added to the prefix
1433 A function may have prefix data but no body. This has similar semantics
1434 to the ``available_externally`` linkage in that the data may be used by the
1435 optimizers but will not be emitted in the object file.
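As a rough sketch of the padding approach, the prefix data below is padded to 16 bytes so that the entry point stays 16-byte aligned, assuming the start of the prefix data is placed on a 16-byte boundary. The type name ``%prefix_ty``, the 16-byte alignment, and the choice to put the padding first (so the ``i32`` still sits at index -1) are illustrative assumptions, not requirements.

.. code-block:: llvm

    ; Hypothetical packed layout: 12 bytes of padding, then the i32 tag.
    ; Total size is 16 bytes, a multiple of the assumed alignment.
    %prefix_ty = type <{ [12 x i8], i32 }>

    define void @f() align 16 prefix %prefix_ty <{ [12 x i8] zeroinitializer, i32 123 }> {
      ret void
    }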
1442 The ``prologue`` attribute allows arbitrary code (encoded as bytes) to
1443 be inserted prior to the function body. This can be used for enabling
1444 function hot-patching and instrumentation.
1446 To maintain the semantics of ordinary function calls, the prologue data must
1447 have a particular format. Specifically, it must begin with a sequence of
1448 bytes which decode to a sequence of machine instructions, valid for the
1449 module's target, which transfer control to the point immediately succeeding
1450 the prologue data, without performing any other visible action. This allows
1451 the inliner and other passes to reason about the semantics of the function
1452 definition without needing to reason about the prologue data. Obviously this
1453 makes the format of the prologue data highly target dependent.
1455 A trivial example of valid prologue data for the x86 architecture is ``i8 144``,
1456 which encodes the ``nop`` instruction:
1458 .. code-block:: text
1460 define void @f() prologue i8 144 { ... }
1462 Generally prologue data can be formed by encoding a relative branch instruction
1463 which skips the metadata, as in this example of valid prologue data for the
1464 x86_64 architecture, where the first two bytes encode ``jmp .+10``:
1466 .. code-block:: text
1468 %0 = type <{ i8, i8, i8* }>
1470 define void @f() prologue %0 <{ i8 235, i8 8, i8* @md}> { ... }
1472 A function may have prologue data but no body. This has similar semantics
1473 to the ``available_externally`` linkage in that the data may be used by the
1474 optimizers but will not be emitted in the object file.
1478 Personality Function
1479 --------------------
1481 The ``personality`` attribute permits functions to specify what function
1482 to use for exception handling.
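A minimal sketch of a function that names a personality routine, written in the typed-pointer syntax used elsewhere in this section. ``@__gxx_personality_v0`` is the personality routine commonly used for C++ on Itanium-ABI targets; ``@may_throw`` is a hypothetical callee introduced only for illustration.

.. code-block:: llvm

    declare i32 @__gxx_personality_v0(...)
    declare void @may_throw()            ; hypothetical callee

    define void @f() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
    entry:
      invoke void @may_throw()
              to label %cont unwind label %lpad

    cont:
      ret void

    lpad:
      ; Cleanup-only landing pad: simply continue unwinding.
      %lp = landingpad { i8*, i32 }
              cleanup
      resume { i8*, i32 } %lp
    }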
1489 Attribute groups are groups of attributes that are referenced by objects within
1490 the IR. They are important for keeping ``.ll`` files readable, because a lot of
1491 functions will use the same set of attributes. In the degenerative case of a
1492 ``.ll`` file that corresponds to a single ``.c`` file, the single attribute
1493 group will capture the important command line flags used to build that file.
1495 An attribute group is a module-level object. To use an attribute group, an
1496 object references the attribute group's ID (e.g. ``#37``). An object may refer
1497 to more than one attribute group. In that situation, the attributes from the
1498 different groups are merged.
1500 Here is an example of attribute groups for a function that should always be
1501 inlined, has a stack alignment of 4, and which shouldn't use SSE instructions:
1503 .. code-block:: llvm
1505 ; Target-independent attributes:
1506 attributes #0 = { alwaysinline alignstack=4 }
1508 ; Target-dependent attributes:
1509 attributes #1 = { "no-sse" }
1511 ; Function @f has attributes: alwaysinline, alignstack=4, and "no-sse".
1512 define void @f() #0 #1 { ... }
1519 Function attributes are set to communicate additional information about
1520 a function. Function attributes are considered to be part of the
1521 function, not of the function type, so functions with different function
1522 attributes can have the same function type.
1524 Function attributes are simple keywords that follow the type specified.
1525 If multiple attributes are needed, they are space separated. For
1528 .. code-block:: llvm
1530 define void @f() noinline { ... }
1531 define void @f() alwaysinline { ... }
1532 define void @f() alwaysinline optsize { ... }
1533 define void @f() optsize { ... }
1536 This attribute indicates that, when emitting the prologue and
1537 epilogue, the backend should forcibly align the stack pointer.
1538 Specify the desired alignment, which must be a power of two, in
1540 ``allocsize(<EltSizeParam>[, <NumEltsParam>])``
1541 This attribute indicates that the annotated function will always return at
1542 least a given number of bytes (or null). Its arguments are zero-indexed
1543 parameter numbers; if one argument is provided, then it's assumed that at
1544 least ``CallSite.Args[EltSizeParam]`` bytes will be available at the
1545 returned pointer. If two are provided, then it's assumed that
1546 ``CallSite.Args[EltSizeParam] * CallSite.Args[NumEltsParam]`` bytes are
1547 available. The referenced parameters must be integer types. No assumptions
1548 are made about the contents of the returned block of memory.
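For illustration, the two forms of ``allocsize`` might be attached to allocator declarations as follows. The function names are hypothetical; only the attribute syntax comes from the description above.

.. code-block:: llvm

    ; One argument: at least %size bytes are available at the returned pointer.
    declare i8* @my_malloc(i64 %size) allocsize(0)

    ; Two arguments: at least %count * %size bytes are available.
    declare i8* @my_calloc(i64 %count, i64 %size) allocsize(0, 1)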
1550 This attribute indicates that the inliner should attempt to inline
1551 this function into callers whenever possible, ignoring any active
1552 inlining size threshold for this caller.
1554 This indicates that the callee function at a call site should be
1555 recognized as a built-in function, even though the function's declaration
1556 uses the ``nobuiltin`` attribute. This is only valid at call sites for
1557 direct calls to functions that are declared with the ``nobuiltin``
1560 This attribute indicates that this function is rarely called. When
1561 computing edge weights, basic blocks post-dominated by a cold
1562 function call are also considered to be cold; and, thus, given low
1565 In some parallel execution models, there exist operations that cannot be
1566 made control-dependent on any additional values. We call such operations
1567 ``convergent``, and mark them with this attribute.
1569 The ``convergent`` attribute may appear on functions or call/invoke
1570 instructions. When it appears on a function, it indicates that calls to
1571 this function should not be made control-dependent on additional values.
1572 For example, the intrinsic ``llvm.nvvm.barrier0`` is ``convergent``, so
1573 calls to this intrinsic cannot be made control-dependent on additional
1576 When it appears on a call/invoke, the ``convergent`` attribute indicates
1577 that we should treat the call as though we're calling a convergent
1578 function. This is particularly useful on indirect calls; without this we
1579 may treat such calls as though the target is non-convergent.
1581 The optimizer may remove the ``convergent`` attribute on functions when it
1582 can prove that the function does not execute any convergent operations.
1583 Similarly, the optimizer may remove ``convergent`` on calls/invokes when it
1584 can prove that the call/invoke cannot call a convergent function.
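The two placements of ``convergent`` can be sketched as follows. ``@my_barrier`` and ``@kernel`` are hypothetical names; the enclosing function is also marked ``convergent`` here because it executes convergent operations.

.. code-block:: llvm

    ; On a declaration: every call to @my_barrier is convergent.
    declare void @my_barrier() convergent

    define void @kernel(void ()* %fp) convergent {
      call void @my_barrier()
      ; On an indirect call: treat the unknown target as convergent.
      call void %fp() convergent
      ret void
    }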
1585 ``disable_sanitizer_instrumentation``
1586 When instrumenting code with sanitizers, it can be important to skip certain
1587 functions to ensure no instrumentation is applied to them.
This attribute is not always equivalent to the absence of ``sanitize_<name>``
1590 attributes: depending on the specific sanitizer, code can be inserted into
1591 functions regardless of the ``sanitize_<name>`` attribute to prevent false
1594 ``disable_sanitizer_instrumentation`` disables all kinds of instrumentation,
1595 taking precedence over the ``sanitize_<name>`` attributes and other compiler
1599 This attribute tells the code generator whether the function
1600 should keep the frame pointer. The code generator may emit the frame pointer
1601 even if this attribute says the frame pointer can be eliminated.
1602 The allowed string values are:
1604 * ``"none"`` (default) - the frame pointer can be eliminated.
1605 * ``"non-leaf"`` - the frame pointer should be kept if the function calls
1607 * ``"all"`` - the frame pointer should be kept.
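A small sketch of how these values might appear on function definitions, assuming the string attribute is spelled ``"frame-pointer"`` as in upstream LLVM. The function names are hypothetical.

.. code-block:: llvm

    ; Keep the frame pointer only if this function makes calls.
    define void @maybe_leaf() "frame-pointer"="non-leaf" {
      ret void
    }

    ; Always keep the frame pointer.
    define void @always_fp() "frame-pointer"="all" {
      ret void
    }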
1609 This attribute indicates that this function is a hot spot of the program
1610 execution. The function will be optimized more aggressively and will be
placed into a special subsection of the text section to improve locality.
When profile feedback is enabled, this attribute takes precedence over
1614 the profile information. By marking a function ``hot``, users can work
1615 around the cases where the training input does not have good coverage
1616 on all the hot functions.
1617 ``inaccessiblememonly``
1618 This attribute indicates that the function may only access memory that
1619 is not accessible by the module being compiled. This is a weaker form
1620 of ``readnone``. If the function reads or writes other memory, the
1621 behavior is undefined.
1622 ``inaccessiblemem_or_argmemonly``
1623 This attribute indicates that the function may only access memory that is
1624 either not accessible by the module being compiled, or is pointed to
1625 by its pointer arguments. This is a weaker form of ``argmemonly``. If the
1626 function reads or writes other memory, the behavior is undefined.
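As an illustration, an allocator-like routine that only touches its own hidden bookkeeping could carry the first attribute, and a routine that additionally reads or writes through its pointer argument could carry the second. Both function names are hypothetical.

.. code-block:: llvm

    ; Only accesses memory the current module cannot see (e.g. allocator state).
    declare noalias i8* @my_alloc(i64) inaccessiblememonly

    ; May also access the memory pointed to by its argument.
    declare void @my_release(i8*) inaccessiblemem_or_argmemonly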
1628 This attribute indicates that the source code contained a hint that
1629 inlining this function is desirable (such as the "inline" keyword in
1630 C/C++). It is just a hint; it imposes no requirements on the
1633 This attribute indicates that the function should be added to a
1634 jump-instruction table at code-generation time, and that all address-taken
1635 references to this function should be replaced with a reference to the
1636 appropriate jump-instruction-table function pointer. Note that this creates
1637 a new pointer for the original function, which means that code that depends
1638 on function-pointer identity can break. So, any function annotated with
1639 ``jumptable`` must also be ``unnamed_addr``.
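A minimal sketch of the required pairing, with a hypothetical function name:

.. code-block:: llvm

    ; jumptable is only accepted together with unnamed_addr.
    define void @handler() unnamed_addr jumptable {
      ret void
    }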
1641 This attribute suggests that optimization passes and code generator
1642 passes make choices that keep the code size of this function as small
1643 as possible and perform optimizations that may sacrifice runtime
1644 performance in order to minimize the size of the generated code.
1646 This attribute disables prologue / epilogue emission for the
1647 function. This can have very system-specific consequences.
1648 ``"no-inline-line-tables"``
1649 When this attribute is set to true, the inliner discards source locations
1650 when inlining code and instead uses the source location of the call site.
1651 Breakpoints set on code that was inlined into the current function will
1652 not fire during the execution of the inlined call sites. If the debugger
1653 stops inside an inlined call site, it will appear to be stopped at the
1654 outermost inlined call site.
1656 When this attribute is set to true, the jump tables and lookup tables that
1657 can be generated from a switch case lowering are disabled.
1659 This indicates that the callee function at a call site is not recognized as
1660 a built-in function. LLVM will retain the original call and not replace it
1661 with equivalent code based on the semantics of the built-in function, unless
1662 the call site uses the ``builtin`` attribute. This is valid at call sites
1663 and on function declarations and definitions.
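The interaction between ``nobuiltin`` on a declaration and ``builtin`` at a call site can be sketched as below. ``@_Znwm`` is the Itanium-mangled name of ``operator new(unsigned long)``; ``@alloc16`` and ``@alloc16_opaque`` are hypothetical callers.

.. code-block:: llvm

    ; The declaration opts out of built-in treatment by default.
    declare i8* @_Znwm(i64) nobuiltin

    define i8* @alloc16() {
      ; This particular call may be treated as a call to the built-in.
      %p = call i8* @_Znwm(i64 16) builtin
      ret i8* %p
    }

    define i8* @alloc16_opaque() {
      ; Without 'builtin', the call is kept as an ordinary call.
      %p = call i8* @_Znwm(i64 16)
      ret i8* %p
    }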
1665 This attribute indicates that calls to the function cannot be
1666 duplicated. A call to a ``noduplicate`` function may be moved
1667 within its parent function, but may not be duplicated within
1668 its parent function.
1670 A function containing a ``noduplicate`` call may still
1671 be an inlining candidate, provided that the call is not
1672 duplicated by inlining. That implies that the function has
1673 internal linkage and only has one call site, so the original
1674 call is dead after inlining.
1676 This function attribute indicates that the function does not, directly or
1677 transitively, call a memory-deallocation function (``free``, for example)
1678 on a memory allocation which existed before the call.
1680 As a result, uncaptured pointers that are known to be dereferenceable
1681 prior to a call to a function with the ``nofree`` attribute are still
1682 known to be dereferenceable after the call. The capturing condition is
1683 necessary in environments where the function might communicate the
1684 pointer to another thread which then deallocates the memory. Alternatively,
1685 ``nosync`` would ensure such communication cannot happen and even captured
1686 pointers cannot be freed by the function.
1688 A ``nofree`` function is explicitly allowed to free memory which it
1689 allocated or (if not ``nosync``) arrange for another thread to free
memory on its behalf. As a result, perhaps surprisingly, a ``nofree``
1691 function can return a pointer to a previously deallocated memory object.
1693 Disallows implicit floating-point code. This inhibits optimizations that
1694 use floating-point code and floating-point/SIMD/vector registers for
1695 operations that are not nominally floating-point. LLVM instructions that
1696 perform floating-point operations or require access to floating-point
1697 registers may still cause floating-point code to be generated.
1699 This attribute indicates that the inliner should never inline this
1700 function in any situation. This attribute may not be used together
1701 with the ``alwaysinline`` attribute.
1703 This attribute indicates that calls to this function should never be merged
1704 during optimization. For example, it will prevent tail merging otherwise
1705 identical code sequences that raise an exception or terminate the program.
1706 Tail merging normally reduces the precision of source location information,
1707 making stack traces less useful for debugging. This attribute gives the
1708 user control over the tradeoff between code size and debug information
1711 This attribute suppresses lazy symbol binding for the function. This
1712 may make calls to the function faster, at the cost of extra program
1713 startup time if the function is not called during program startup.
1715 This function attribute prevents instrumentation based profiling, used for
1716 coverage or profile based optimization, from being added to a function,
1719 This attribute indicates that the code generator should not use a
1720 red zone, even if the target-specific ABI normally permits it.
``"indirect-tls-seg-refs"``
1722 This attribute indicates that the code generator should not use
1723 direct TLS access through segment registers, even if the
1724 target-specific ABI normally permits it.
This function attribute indicates that the function never returns
normally, that is, through a return instruction. This produces undefined
behavior at runtime if the function ever does dynamically return. Annotated
functions may still raise an exception, i.e., ``nounwind`` is not implied.
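A short sketch of a ``noreturn`` callee and a caller that relies on it. ``@my_fatal`` and ``@check`` are hypothetical names.

.. code-block:: llvm

    declare void @my_fatal() noreturn

    define void @check(i1 %ok) {
    entry:
      br i1 %ok, label %done, label %fail

    fail:
      call void @my_fatal()
      ; Control never returns here normally, so the block ends in unreachable.
      unreachable

    done:
      ret void
    }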
1731 This function attribute indicates that the function does not call itself
1732 either directly or indirectly down any possible call path. This produces
1733 undefined behavior at runtime if the function ever does recurse.
1735 This function attribute indicates that a call of this function will
either exhibit undefined behavior or come back and continue execution
at a point in the existing call stack that includes the current invocation.
Annotated functions may still raise an exception, i.e., ``nounwind`` is not implied.
1739 If an invocation of an annotated function does not return control back
1740 to a point in the call stack, the behavior is undefined.
1742 This function attribute indicates that the function does not communicate
1743 (synchronize) with another thread through memory or other well-defined means.
Synchronization is considered possible in the presence of ``atomic`` accesses
that enforce an order (that is, anything other than "unordered" and "monotonic"),
``volatile`` accesses, as well as ``convergent`` function calls. Note that
``convergent`` function calls allow non-memory communication, e.g., cross-lane
operations, and this is also considered synchronization. However, ``convergent``
does not contradict ``nosync``.
1749 If an annotated function does ever synchronize with another thread,
1750 the behavior is undefined.
1752 This function attribute indicates that the function never raises an
1753 exception. If the function does raise an exception, its runtime
behavior is undefined. However, functions marked ``nounwind`` may still
1755 trap or generate asynchronous exceptions. Exception handling schemes
1756 that are recognized by LLVM to handle asynchronous exceptions, such
1757 as SEH, will still provide their implementation defined semantics.
1758 ``nosanitize_coverage``
1759 This attribute indicates that SanitizerCoverage instrumentation is disabled
1761 ``null_pointer_is_valid``
1762 If ``null_pointer_is_valid`` is set, then the ``null`` address
1763 in address-space 0 is considered to be a valid address for memory loads and
1764 stores. Any analysis or optimization should not treat dereferencing a
1765 pointer to ``null`` as undefined behavior in this function.
1766 Note: Comparing address of a global variable to ``null`` may still
1767 evaluate to false because of a limitation in querying this attribute inside
1768 constant expressions.
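For example, a function that deliberately reads address zero (as some embedded or kernel code does) might be written like this. The function name is hypothetical.

.. code-block:: llvm

    define i32 @load_from_zero() null_pointer_is_valid {
      ; With the attribute set, this load is not treated as undefined behavior.
      %v = load i32, i32* null, align 4
      ret i32 %v
    }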
1770 This attribute indicates that this function should be optimized
1771 for maximum fuzzing signal.
1773 This function attribute indicates that most optimization passes will skip
1774 this function, with the exception of interprocedural optimization passes.
1775 Code generation defaults to the "fast" instruction selector.
1776 This attribute cannot be used together with the ``alwaysinline``
1777 attribute; this attribute is also incompatible
1778 with the ``minsize`` attribute and the ``optsize`` attribute.
1780 This attribute requires the ``noinline`` attribute to be specified on
1781 the function as well, so the function is never inlined into any caller.
1782 Only functions with the ``alwaysinline`` attribute are valid
1783 candidates for inlining into the body of this function.
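A minimal sketch of the required combination, with a hypothetical function name:

.. code-block:: llvm

    ; optnone must be accompanied by noinline.
    define void @debug_me() noinline optnone {
      ret void
    }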
1785 This attribute suggests that optimization passes and code generator
1786 passes make choices that keep the code size of this function low,
1787 and otherwise do optimizations specifically to reduce code size as
1788 long as they do not significantly impact runtime performance.
1789 ``"patchable-function"``
1790 This attribute tells the code generator that the code
1791 generated for this function needs to follow certain conventions that
1792 make it possible for a runtime function to patch over it later.
1793 The exact effect of this attribute depends on its string value,
1794 for which there currently is one legal possibility:
1796 * ``"prologue-short-redirect"`` - This style of patchable
1797 function is intended to support patching a function prologue to
1798 redirect control away from the function in a thread safe
1799 manner. It guarantees that the first instruction of the
1800 function will be large enough to accommodate a short jump
1801 instruction, and will be sufficiently aligned to allow being
1802 fully changed via an atomic compare-and-swap instruction.
While the first requirement can be satisfied by inserting a large
enough NOP, LLVM can and will try to re-purpose an existing
1805 instruction (i.e. one that would have to be emitted anyway) as
1806 the patchable instruction larger than a short jump.
1808 ``"prologue-short-redirect"`` is currently only supported on
1811 This attribute by itself does not imply restrictions on
1812 inter-procedural optimizations. All of the semantic effects the
patching may have must be separately conveyed via the linkage type.
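A sketch of how the attribute might be attached, using a hypothetical function name:

.. code-block:: llvm

    define void @patch_me() "patchable-function"="prologue-short-redirect" {
      ret void
    }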
1815 This attribute indicates that the function will trigger a guard region
at the end of the stack. It ensures that accesses to the stack must be
1817 no further apart than the size of the guard region to a previous
1818 access of the stack. It takes one required string value, the name of
1819 the stack probing function that will be called.
1821 If a function that has a ``"probe-stack"`` attribute is inlined into
1822 a function with another ``"probe-stack"`` attribute, the resulting
1823 function has the ``"probe-stack"`` attribute of the caller. If a
1824 function that has a ``"probe-stack"`` attribute is inlined into a
1825 function that has no ``"probe-stack"`` attribute at all, the resulting
1826 function has the ``"probe-stack"`` attribute of the callee.
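As an illustration, a function with a large frame might name its probing routine like this. Both ``@big_frame`` and the probe symbol ``my_probe_stack`` are hypothetical.

.. code-block:: llvm

    define void @big_frame() "probe-stack"="my_probe_stack" {
      ; A frame this large is the kind of case stack probing is meant for.
      %buf = alloca [65536 x i8], align 16
      ret void
    }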
1828 On a function, this attribute indicates that the function computes its
1829 result (or decides to unwind an exception) based strictly on its arguments,
1830 without dereferencing any pointer arguments or otherwise accessing
1831 any mutable state (e.g. memory, control registers, etc) visible to
1832 caller functions. It does not write through any pointer arguments
1833 (including ``byval`` arguments) and never changes any state visible
1834 to callers. This means while it cannot unwind exceptions by calling
1835 the ``C++`` exception throwing methods (since they write to memory), there may
1836 be non-``C++`` mechanisms that throw exceptions without writing to LLVM
1839 On an argument, this attribute indicates that the function does not
1840 dereference that pointer argument, even though it may read or write the
1841 memory that the pointer points to if accessed through other pointers.
1843 If a readnone function reads or writes memory visible to the program, or
1844 has other side-effects, the behavior is undefined. If a function reads from
1845 or writes to a readnone pointer argument, the behavior is undefined.
1847 On a function, this attribute indicates that the function does not write
1848 through any pointer arguments (including ``byval`` arguments) or otherwise
1849 modify any state (e.g. memory, control registers, etc) visible to
1850 caller functions. It may dereference pointer arguments and read
1851 state that may be set in the caller. A readonly function always
1852 returns the same value (or unwinds an exception identically) when
1853 called with the same set of arguments and global state. This means while it
1854 cannot unwind exceptions by calling the ``C++`` exception throwing methods
1855 (since they write to memory), there may be non-``C++`` mechanisms that throw
1856 exceptions without writing to LLVM visible memory.
1858 On an argument, this attribute indicates that the function does not write
1859 through this pointer argument, even though it may write to the memory that
1860 the pointer points to.
1862 If a readonly function writes memory visible to the program, or
1863 has other side-effects, the behavior is undefined. If a function writes to
1864 a readonly pointer argument, the behavior is undefined.
1865 ``"stack-probe-size"``
1866 This attribute controls the behavior of stack probes: either
1867 the ``"probe-stack"`` attribute, or ABI-required stack probes, if any.
1868 It defines the size of the guard region. It ensures that if the function
1869 may use more stack space than the size of the guard region, stack probing
1870 sequence will be emitted. It takes one required integer value, which
1873 If a function that has a ``"stack-probe-size"`` attribute is inlined into
1874 a function with another ``"stack-probe-size"`` attribute, the resulting
1875 function has the ``"stack-probe-size"`` attribute that has the lower
1876 numeric value. If a function that has a ``"stack-probe-size"`` attribute is
1877 inlined into a function that has no ``"stack-probe-size"`` attribute
1878 at all, the resulting function has the ``"stack-probe-size"`` attribute
1880 ``"no-stack-arg-probe"``
1881 This attribute disables ABI-required stack probes, if any.
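The stack-probing attributes might be combined as sketched below. The function names, the probe symbol ``my_probe_stack``, and the value ``8192`` are illustrative assumptions.

.. code-block:: llvm

    ; Assume an 8 KiB guard region for this function.
    define void @probe_8k() "probe-stack"="my_probe_stack" "stack-probe-size"="8192" {
      %buf = alloca [16384 x i8], align 16
      ret void
    }

    ; Opt out of ABI-required stack probes.
    define void @no_arg_probe() "no-stack-arg-probe" {
      ret void
    }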
1883 On a function, this attribute indicates that the function may write to but
1884 does not read from memory.
1886 On an argument, this attribute indicates that the function may write to but
1887 does not read through this pointer argument (even though it may read from
1888 the memory that the pointer points to).
1890 If a writeonly function reads memory visible to the program, or
1891 has other side-effects, the behavior is undefined. If a function reads
1892 from a writeonly pointer argument, the behavior is undefined.
This attribute indicates that the only memory accesses inside the function are
1895 loads and stores from objects pointed to by its pointer-typed arguments,
1896 with arbitrary offsets. Or in other words, all memory operations in the
1897 function can refer to memory only using pointers based on its function
Note that ``argmemonly`` can be used together with the ``readonly`` attribute
in order to specify that the function reads only from its arguments.
1903 If an argmemonly function reads or writes memory other than the pointer
1904 arguments, or has other side-effects, the behavior is undefined.
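The memory-access attributes described above can be sketched side by side on a few declarations, using the attribute spellings from this section. All function names are hypothetical.

.. code-block:: llvm

    ; Result depends only on the argument values; no memory is accessed.
    declare i32 @pure_add(i32, i32) readnone

    ; Reads, but never writes, and only through its pointer argument.
    declare i64 @my_strlen(i8* readonly) readonly argmemonly

    ; Writes, but never reads, and only through its pointer argument.
    declare void @my_fill(i8* writeonly, i8, i64) writeonly argmemonly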
1906 This attribute indicates that this function can return twice. The C
1907 ``setjmp`` is an example of such a function. The compiler disables
1908 some optimizations (like tail calls) in the caller of these
1911 This attribute indicates that
1912 `SafeStack <https://clang.llvm.org/docs/SafeStack.html>`_
1913 protection is enabled for this function.
1915 If a function that has a ``safestack`` attribute is inlined into a
1916 function that doesn't have a ``safestack`` attribute or which has an
1917 ``ssp``, ``sspstrong`` or ``sspreq`` attribute, then the resulting
1918 function will have a ``safestack`` attribute.
1919 ``sanitize_address``
1920 This attribute indicates that AddressSanitizer checks
1921 (dynamic address safety analysis) are enabled for this function.
1923 This attribute indicates that MemorySanitizer checks (dynamic detection
1924 of accesses to uninitialized memory) are enabled for this function.
1926 This attribute indicates that ThreadSanitizer checks
1927 (dynamic thread safety analysis) are enabled for this function.
1928 ``sanitize_hwaddress``
1929 This attribute indicates that HWAddressSanitizer checks
1930 (dynamic address safety analysis based on tagged pointers) are enabled for
1933 This attribute indicates that MemTagSanitizer checks
1934 (dynamic address safety analysis based on Armv8 MTE) are enabled for
1936 ``speculative_load_hardening``
1937 This attribute indicates that
1938 `Speculative Load Hardening <https://llvm.org/docs/SpeculativeLoadHardening.html>`_
1939 should be enabled for the function body.
1941 Speculative Load Hardening is a best-effort mitigation against
1942 information leak attacks that make use of control flow
misspeculation, specifically misspeculation of whether a branch
is taken or not. Typically vulnerabilities enabling such attacks are
classified as "Spectre variant #1". Notably, this does not attempt to
mitigate against misspeculation of branch target, classified as
1947 "Spectre variant #2" vulnerabilities.
1949 When inlining, the attribute is sticky. Inlining a function that carries
1950 this attribute will cause the caller to gain the attribute. This is intended
1951 to provide a maximally conservative model where the code in a function
1952 annotated with this attribute will always (even after inlining) end up
1955 This function attribute indicates that the function does not have any
1956 effects besides calculating its result and does not have undefined behavior.
1957 Note that ``speculatable`` is not enough to conclude that along any
1958 particular execution path the number of calls to this function will not be
1959 externally observable. This attribute is only valid on functions
1960 and declarations, not on individual call sites. If a function is
1961 incorrectly marked as speculatable and really does exhibit
1962 undefined behavior, the undefined behavior may be observed even
1963 if the call site is dead code.
1966 This attribute indicates that the function should emit a stack
1967 smashing protector. It is in the form of a "canary" --- a random value
1968 placed on the stack before the local variables that's checked upon
1969 return from the function to see if it has been overwritten. A
1970 heuristic is used to determine if a function needs stack protectors
1971 or not. The heuristic used will enable protectors for functions with:
1973 - Character arrays larger than ``ssp-buffer-size`` (default 8).
1974 - Aggregates containing character arrays larger than ``ssp-buffer-size``.
1975 - Calls to alloca() with variable sizes or constant sizes greater than
1976 ``ssp-buffer-size``.
1978 Variables that are identified as requiring a protector will be arranged
1979 on the stack such that they are adjacent to the stack protector guard.
1981 A function with the ``ssp`` attribute but without the ``alwaysinline``
1982 attribute cannot be inlined into a function without a
1983 ``ssp/sspreq/sspstrong`` attribute. If inlined, the caller will get the
1984 ``ssp`` attribute. ``call``, ``invoke``, and ``callbr`` instructions with
1985 the ``alwaysinline`` attribute force inlining.
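A small sketch of a function that the ``ssp`` heuristic would be expected to protect, given the default threshold of 8 stated above: it holds a 16-byte character array on the stack. ``@use`` and ``@fill`` are hypothetical names.

.. code-block:: llvm

    declare void @use(i8*)

    define void @fill() ssp {
    entry:
      ; A character array larger than the default ssp-buffer-size of 8.
      %buf = alloca [16 x i8], align 1
      %p = getelementptr inbounds [16 x i8], [16 x i8]* %buf, i64 0, i64 0
      call void @use(i8* %p)
      ret void
    }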
1987 This attribute indicates that the function should emit a stack smashing
1988 protector. This attribute causes a strong heuristic to be used when
1989 determining if a function needs stack protectors. The strong heuristic
1990 will enable protectors for functions with:
1992 - Arrays of any size and type
1993 - Aggregates containing an array of any size and type.
1994 - Calls to alloca().
1995 - Local variables that have had their address taken.
1997 Variables that are identified as requiring a protector will be arranged
1998 on the stack such that they are adjacent to the stack protector guard.
1999 The specific layout rules are:
2001 #. Large arrays and structures containing large arrays
2002 (``>= ssp-buffer-size``) are closest to the stack protector.
2003 #. Small arrays and structures containing small arrays
2004 (``< ssp-buffer-size``) are 2nd closest to the protector.
2005 #. Variables that have had their address taken are 3rd closest to the
2008 This overrides the ``ssp`` function attribute.
2010 A function with the ``sspstrong`` attribute but without the
2011 ``alwaysinline`` attribute cannot be inlined into a function without a
2012 ``ssp/sspstrong/sspreq`` attribute. If inlined, the caller will get the
2013 ``sspstrong`` attribute unless the ``sspreq`` attribute exists. ``call``,
2014 ``invoke``, and ``callbr`` instructions with the ``alwaysinline`` attribute
2017 This attribute indicates that the function should *always* emit a stack
2018 smashing protector. This overrides the ``ssp`` and ``sspstrong`` function
2021 Variables that are identified as requiring a protector will be arranged
2022 on the stack such that they are adjacent to the stack protector guard.
2023 The specific layout rules are:
2025 #. Large arrays and structures containing large arrays
2026 (``>= ssp-buffer-size``) are closest to the stack protector.
2027 #. Small arrays and structures containing small arrays
2028 (``< ssp-buffer-size``) are 2nd closest to the protector.
2029 #. Variables that have had their address taken are 3rd closest to the
2032 A function with the ``sspreq`` attribute but without the ``alwaysinline``
2033 attribute cannot be inlined into a function without a
2034 ``ssp/sspstrong/sspreq`` attribute. If inlined, the caller will get the
2035 ``sspreq`` attribute. ``call``, ``invoke``, and ``callbr`` instructions
2036 with the ``alwaysinline`` attribute force inlining.
2039 This attribute indicates that the function was called from a scope that
2040 requires strict floating-point semantics. LLVM will not attempt any
2041 optimizations that require assumptions about the floating-point rounding
2042 mode or that might alter the state of floating-point status flags that
2043 might otherwise be set or cleared by calling this function. LLVM will
2044 not introduce any new floating-point instructions that may trap.
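A sketch of strict floating-point code, assuming the attribute described here is the one spelled ``strictfp`` in upstream LLVM. It uses the constrained ``fadd`` intrinsic with dynamic rounding and strict exception behavior; ``@strict_add`` is a hypothetical function name.

.. code-block:: llvm

    declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)

    define double @strict_add(double %a, double %b) strictfp {
      %sum = call double @llvm.experimental.constrained.fadd.f64(
                 double %a, double %b,
                 metadata !"round.dynamic",
                 metadata !"fpexcept.strict") strictfp
      ret double %sum
    }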
2046 ``"denormal-fp-math"``
2047 This indicates the denormal (subnormal) handling that may be
2048 assumed for the default floating-point environment. This is a
2049 comma separated pair. The elements may be one of ``"ieee"``,
2050 ``"preserve-sign"``, or ``"positive-zero"``. The first entry
2051 indicates the flushing mode for the result of floating point
2052 operations. The second indicates the handling of denormal inputs
2053 to floating point instructions. For compatibility with older
2054 bitcode, if the second value is omitted, both input and output
2055 modes will assume the same mode.
2057 If this attribute is not specified, the default is
2060 If the output mode is ``"preserve-sign"``, or ``"positive-zero"``,
2061 denormal outputs may be flushed to zero by standard floating-point
2062 operations. It is not mandated that flushing to zero occurs, but if
2063 a denormal output is flushed to zero, it must respect the sign
2064 mode. Not all targets support all modes. While this indicates the
2065 expected floating point mode the function will be executed with,
2066 this does not make any attempt to ensure the mode is
2067 consistent. User or platform code is expected to set the floating
2068 point mode appropriately before function entry.
2070 If the input mode is ``"preserve-sign"``, or ``"positive-zero"``, a
2071 floating-point operation must treat any input denormal value as
2072 zero. In some situations, if an instruction does not respect this
2073 mode, the input may need to be converted to 0 as if by
2074 ``@llvm.canonicalize`` during lowering for correctness.
2076 ``"denormal-fp-math-f32"``
2077 Same as ``"denormal-fp-math"``, but only controls the behavior of
2078 the 32-bit float type (or vectors of 32-bit floats). If both
2079 are present, this overrides ``"denormal-fp-math"``. Not all targets
2080 support separately setting the denormal mode per type, and no
2081 attempt is made to diagnose unsupported uses. Currently this
2082 attribute is respected by the AMDGPU and NVPTX backends.
2085 This attribute indicates that the function will delegate to some other
2086 function with a tail call. The prototype of a thunk should not be used for
2087 optimization purposes. The caller is expected to cast the thunk prototype to
2088 match the thunk target prototype.
2090 This attribute indicates that the ABI being targeted requires that
2091 an unwind table entry be produced for this function even if we can
2092 show that no exceptions pass by it. This is normally the case for
2093 the ELF x86-64 ABI, but it can be disabled for some compilation
2096 This attribute indicates that no control-flow check will be performed on
2097 the attributed entity. It disables ``-fcf-protection=<>`` for a specific
2098 entity to fine-tune the HW control flow protection mechanism. The flag
2099 is target independent and currently appertains to a function or function
2102 This attribute indicates that the ShadowCallStack checks are enabled for
2103 the function. The instrumentation checks that the return address for the
2104 function has not changed between the function prolog and epilog. It is
2105 currently x86_64-specific.
2107 This attribute indicates that the function is required to return, unwind,
2108 or interact with the environment in an observable way, e.g. via a volatile
2109 memory access, I/O, or other synchronization. The ``mustprogress``
2110 attribute is intended to model the requirements of the first section of
2111 [intro.progress] of the C++ Standard. As a consequence, a loop in a
2112 function with the ``mustprogress`` attribute can be assumed to terminate if
2113 it does not interact with the environment in an observable way, and
2114 terminating loops without side-effects can be removed. If a ``mustprogress``
2115 function does not satisfy this contract, the behavior is undefined. This
2116 attribute does not apply transitively to callees, but does apply to call
2117 sites within the function. Note that ``willreturn`` implies ``mustprogress``.
2118 ``"warn-stack-size"="<threshold>"``
2119 This attribute sets a threshold to emit diagnostics once the frame size is
2120 known should the frame size exceed the specified value. It takes one
2121 required value, which should be a non-negative integer less
2122 than ``UINT_MAX``. It's unspecified which threshold will be used when
2123 duplicate definitions are linked together with differing values.
2124 ``vscale_range(<min>[, <max>])``
2125 This attribute indicates the minimum and maximum vscale value for the given
2126 function. A value of 0 means unbounded. If the optional max value is omitted
2127 then max is set to the value of min. If the attribute is not present, no
2128 assumptions are made about the range of vscale.
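As a rough illustration of how several of the attributes above are spelled,
the following sketch attaches them to a function through an attribute group.
The function name ``@halve`` and all attribute values are assumptions chosen
for the example, not recommended settings.

.. code-block:: llvm

    ; Hypothetical function: multiplies the float stored at %p by 0.5.
    define void @halve(float* %p) #0 {
      %v = load float, float* %p
      ; With the "preserve-sign" output mode, a denormal result here may be
      ; flushed to a zero that keeps the sign of the result.
      %r = fmul float %v, 5.000000e-01
      store float %r, float* %p
      ret void
    }

    attributes #0 = { mustprogress vscale_range(1,16)
                      "denormal-fp-math"="preserve-sign,preserve-sign"
                      "warn-stack-size"="4096" }

In this sketch ``vscale_range(1,16)`` bounds vscale to the range 1 through 16,
and ``"warn-stack-size"="4096"`` requests a diagnostic if the frame size turns
out to exceed 4096.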
2130 Call Site Attributes
2131 --------------------
2133 In addition to function attributes the following call site only
2134 attributes are supported:
2136 ``vector-function-abi-variant``
2137 This attribute can be attached to a :ref:`call <i_call>` to list
2138 the vector functions associated with the function. Note that the
2139 attribute cannot be attached to an :ref:`invoke <i_invoke>` or a
2140 :ref:`callbr <i_callbr>` instruction. The attribute consists of a
2141 comma separated list of mangled names. The order of the list does
2142 not imply preference (it is logically a set). The compiler is free
2143 to pick any listed vector function of its choosing.
2145 The syntax for the mangled names is as follows::
2147 _ZGV<isa><mask><vlen><parameters>_<scalar_name>[(<vector_redirection>)]
2149 When present, the attribute informs the compiler that the function
2150 ``<scalar_name>`` has a corresponding vector variant that can be
2151 used to perform the concurrent invocation of ``<scalar_name>`` on
2152 vectors. The shape of the vector function is described by the
2153 tokens between the prefix ``_ZGV`` and the ``<scalar_name>``
2154 token. The standard name of the vector function is
2155 ``_ZGV<isa><mask><vlen><parameters>_<scalar_name>``. When present,
2156 the optional token ``(<vector_redirection>)`` informs the compiler
2157 that a custom name is provided in addition to the standard one
2158 (custom names can be provided for example via the use of ``declare
2159 variant`` in OpenMP 5.0). The declaration of the variant must be
2160 present in the IR Module. The signature of the vector variant is
2161 determined by the rules of the Vector Function ABI (VFABI)
2162 specifications of the target. For Arm and X86, the VFABI can be
2163 found at https://github.com/ARM-software/abi-aa and
2164 https://software.intel.com/content/www/us/en/develop/download/vector-simd-function-abi.html,
2167 For X86 and Arm targets, the values of the tokens in the standard
2168 name are those that are defined in the VFABI. LLVM has an internal
2169 ``<isa>`` token that can be used to create scalar-to-vector
2170 mappings for functions that are not directly associated with any of
2171 the target ISAs (for example, some of the mappings stored in the
2172 TargetLibraryInfo). Valid values for the ``<isa>`` token are::
2174 <isa>:= b | c | d | e -> X86 SSE, AVX, AVX2, AVX512
2175 | n | s -> Armv8 Advanced SIMD, SVE
2176 | __LLVM__ -> Internal LLVM Vector ISA
2178 For all targets currently supported (x86, Arm and Internal LLVM),
2179 the remaining tokens can have the following values::
2181 <mask>:= M | N -> mask | no mask
2183 <vlen>:= number -> number of lanes
2184 | x -> VLA (Vector Length Agnostic)
2186 <parameters>:= v -> vector
2187 | l | l <number> -> linear
2188 | R | R <number> -> linear with ref modifier
2189 | L | L <number> -> linear with val modifier
2190 | U | U <number> -> linear with uval modifier
2191 | ls <pos> -> runtime linear
2192 | Rs <pos> -> runtime linear with ref modifier
2193 | Ls <pos> -> runtime linear with val modifier
2194 | Us <pos> -> runtime linear with uval modifier
2197 <scalar_name>:= name of the scalar function
2199 <vector_redirection>:= optional, custom name of the vector function
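
The grammar above can be read against a small example. This is a sketch under
stated assumptions: it uses the standard name only (no
``(<vector_redirection>)``), and the variant ``@_ZGVnN2v_sin`` is a
hypothetical declaration that stands in for whatever vector library provides
it.

.. code-block:: llvm

    declare double @sin(double)

    ; The declaration of the variant must be present in the module.
    declare <2 x double> @_ZGVnN2v_sin(<2 x double>)

    define double @f(double %x) {
      ; The attribute is attached to the call site, not to @sin itself.
      %r = call double @sin(double %x) #0
      ret double %r
    }

    attributes #0 = { "vector-function-abi-variant"="_ZGVnN2v_sin" }

Reading the name left to right: ``n`` selects Armv8 Advanced SIMD, ``N`` means
no mask, ``2`` is the number of lanes, and ``v`` marks the single parameter as
a vector.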
2201 ``preallocated(<ty>)``
2202 This attribute is required on calls to ``llvm.call.preallocated.arg``
2203 and cannot be used on any other call. See
2204 :ref:`llvm.call.preallocated.arg<int_call_preallocated_arg>` for more
2212 Attributes may be set to communicate additional information about a global variable.
2213 Unlike :ref:`function attributes <fnattrs>`, attributes on a global variable
2214 are grouped into a single :ref:`attribute group <attrgrp>`.
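
A minimal sketch of a global variable that refers to an attribute group. The
global name and the string attribute are hypothetical and carry no meaning to
LLVM itself.

.. code-block:: llvm

    ; The global refers to attribute group #0 by its ID.
    @counter = global i32 0 #0

    attributes #0 = { "my-key"="my-value" }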
2221 Operand bundles are tagged sets of SSA values that can be associated
2222 with certain LLVM instructions (currently only ``call``\ s and
2223 ``invoke``\ s). In a way they are like metadata, but dropping them is
2224 incorrect and will change program semantics.
2228 operand bundle set ::= '[' operand bundle (, operand bundle )* ']'
2229 operand bundle ::= tag '(' [ bundle operand ] (, bundle operand )* ')'
2230 bundle operand ::= SSA value
2231 tag ::= string constant
2233 Operand bundles are **not** part of a function's signature, and a
2234 given function may be called from multiple places with different kinds
2235 of operand bundles. This reflects the fact that the operand bundles
2236 are conceptually a part of the ``call`` (or ``invoke``), not the
2237 callee being dispatched to.
2239 Operand bundles are a generic mechanism intended to support
2240 runtime-introspection-like functionality for managed languages. While
2241 the exact semantics of an operand bundle depend on the bundle tag,
2242 there are certain limitations to how much the presence of an operand
2243 bundle can influence the semantics of a program. These restrictions
2244 are described as the semantics of an "unknown" operand bundle. As
2245 long as the behavior of an operand bundle is describable within these
2246 restrictions, LLVM does not need to have special knowledge of the
2247 operand bundle to not miscompile programs containing it.
2249 - The bundle operands for an unknown operand bundle escape in unknown
2250 ways before control is transferred to the callee or invokee.
2251 - Calls and invokes with operand bundles have unknown read / write
2252 effect on the heap on entry and exit (even if the call target is
2253 ``readnone`` or ``readonly``), unless they're overridden with
2254 callsite specific attributes.
2255 - An operand bundle at a call site cannot change the implementation
2256 of the called function. Inter-procedural optimizations work as
2257 usual as long as they take into account the first two properties.
2259 More specific types of operand bundles are described below.
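
To make the syntax concrete, the sketch below attaches two operand bundles
with made-up tags to a call. The tags ``"tag-a"`` and ``"tag-b"`` and both
function names are hypothetical. Because LLVM has no special knowledge of
these tags, the call is treated under the "unknown" operand bundle rules
listed above.

.. code-block:: llvm

    declare void @callee(i32)

    define void @caller(i32 %x, i8* %p) {
      ; %p is a bundle operand, so it must be assumed to escape in unknown
      ; ways before control is transferred to @callee.
      call void @callee(i32 %x) [ "tag-a"(i32 %x, i8* %p), "tag-b"() ]
      ret void
    }

The bundles are not part of the type of ``@callee``. Another call to the same
function could carry different bundles or none at all.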
2261 .. _deopt_opbundles:
2263 Deoptimization Operand Bundles
2264 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2266 Deoptimization operand bundles are characterized by the ``"deopt"``
2267 operand bundle tag. These operand bundles represent an alternate
2268 "safe" continuation for the call site they're attached to, and can be
2269 used by a suitable runtime to deoptimize the compiled frame at the
2270 specified call site. There can be at most one ``"deopt"`` operand
2271 bundle attached to a call site. Exact details of deoptimization are
2272 out of scope for the language reference, but it usually involves
2273 rewriting a compiled frame into a set of interpreted frames.
2275 From the compiler's perspective, deoptimization operand bundles make
2276 the call sites they're attached to at least ``readonly``. They read
2277 through all of their pointer typed operands (even if they're not
2278 otherwise escaped) and the entire visible heap. Deoptimization
2279 operand bundles do not capture their operands except during
2280 deoptimization, in which case control will not be returned to the
2283 The inliner knows how to inline through calls that have deoptimization
2284 operand bundles. Just like inlining through a normal call site
2285 involves composing the normal and exceptional continuations, inlining
2286 through a call site with a deoptimization operand bundle needs to
2287 appropriately compose the "safe" deoptimization continuation. The
2288 inliner does this by prepending the parent's deoptimization
2289 continuation to every deoptimization continuation in the inlined body.
2290 E.g. inlining ``@f`` into ``@g`` in the following example
2292 .. code-block:: llvm
2295 call void @x() ;; no deopt state
2296 call void @y() [ "deopt"(i32 10) ]
2297 call void @y() [ "deopt"(i32 10), "unknown"(i8* null) ]
2302 call void @f() [ "deopt"(i32 20) ]
2308 .. code-block:: llvm
2311 call void @x() ;; still no deopt state
2312 call void @y() [ "deopt"(i32 20, i32 10) ]
2313 call void @y() [ "deopt"(i32 20, i32 10), "unknown"(i8* null) ]
2317 It is the frontend's responsibility to structure or encode the
2318 deoptimization state in a way that syntactically prepending the
2319 caller's deoptimization state to the callee's deoptimization state is
2320 semantically equivalent to composing the caller's deoptimization
2321 continuation after the callee's deoptimization continuation.
2325 Funclet Operand Bundles
2326 ^^^^^^^^^^^^^^^^^^^^^^^
2328 Funclet operand bundles are characterized by the ``"funclet"``
2329 operand bundle tag. These operand bundles indicate that a call site
2330 is within a particular funclet. There can be at most one
2331 ``"funclet"`` operand bundle attached to a call site and it must have
2332 exactly one bundle operand.
2334 If any funclet EH pads have been "entered" but not "exited" (per the
2335 `description in the EH doc\ <ExceptionHandling.html#wineh-constraints>`_),
2336 it is undefined behavior to execute a ``call`` or ``invoke`` which:
2338 * does not have a ``"funclet"`` bundle and is not a ``call`` to a nounwind
2340 * has a ``"funclet"`` bundle whose operand is not the most-recently-entered
2341 not-yet-exited funclet EH pad.
2343 Similarly, if no funclet EH pads have been entered-but-not-yet-exited,
2344 executing a ``call`` or ``invoke`` with a ``"funclet"`` bundle is undefined behavior.
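
A sketch of a call inside a funclet, assuming an MSVC-style C++ personality.
The functions ``@may_throw`` and ``@log`` are hypothetical. The call in the
handler names the ``catchpad`` token of the funclet it executes in.

.. code-block:: llvm

    declare i32 @__CxxFrameHandler3(...)
    declare void @may_throw()
    declare void @log()

    define void @f() personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
    entry:
      ; No funclet EH pad has been entered here, so no "funclet" bundle.
      invoke void @may_throw()
              to label %done unwind label %dispatch

    dispatch:
      %cs = catchswitch within none [label %handler] unwind to caller

    handler:
      %cp = catchpad within %cs [i8* null, i32 64, i8* null]
      ; %cp is the most-recently-entered, not-yet-exited funclet EH pad.
      call void @log() [ "funclet"(token %cp) ]
      catchret from %cp to label %done

    done:
      ret void
    }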
2346 GC Transition Operand Bundles
2347 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2349 GC transition operand bundles are characterized by the
2350 ``"gc-transition"`` operand bundle tag. These operand bundles mark a
2351 call as a transition between a function with one GC strategy to a
2352 function with a different GC strategy. If coordinating the transition
2353 between GC strategies requires additional code generation at the call
2354 site, these bundles may contain any values that are needed by the
2355 generated code. For more details, see :ref:`GC Transitions
2356 <gc_transition_args>`.
2358 The bundle contains an arbitrary list of Values which need to be passed
2359 to GC transition code. They will be lowered and passed as operands to
2360 the appropriate GC_TRANSITION nodes in the selection DAG. It is assumed
2361 that these arguments must be available before and after (but not
2362 necessarily during) the execution of the callee.
2364 .. _assume_opbundles:
2366 Assume Operand Bundles
2367 ^^^^^^^^^^^^^^^^^^^^^^
2369 Operand bundles on an :ref:`llvm.assume <int_assume>` allow representing
2370 assumptions that a :ref:`parameter attribute <paramattrs>` or a
2371 :ref:`function attribute <fnattrs>` holds for a certain value at a certain
2372 location. Operand bundles enable assumptions that are either hard or impossible
2373 to represent as a boolean argument of an :ref:`llvm.assume <int_assume>`.
2375 An assume operand bundle has the form:
2379 "<tag>"([ <holds for value> [, <attribute argument>] ])
2381 * The tag of the operand bundle is usually the name of the attribute that can be
2382 assumed to hold. It can also be ``ignore``; this tag doesn't contain any
2383 information and should be ignored.
2384 * The first argument, if present, is the value for which the attribute holds.
2385 * The second argument, if present, is an argument of the attribute.
2387 If there are no arguments the attribute is a property of the call location.
2389 If the represented attribute expects a constant argument, the argument provided
2390 to the operand bundle should be a constant as well.
2394 .. code-block:: llvm
2396 call void @llvm.assume(i1 true) ["align"(i32* %val, i32 8)]
2398 allows the optimizer to assume that at the location of the call to
2399 :ref:`llvm.assume <int_assume>` ``%val`` has an alignment of at least 8.
2401 .. code-block:: llvm
2403 call void @llvm.assume(i1 %cond) ["cold"(), "nonnull"(i64* %val)]
2405 allows the optimizer to assume that the :ref:`llvm.assume <int_assume>`
2406 call location is cold and that ``%val`` is not null.
2408 Just like for the argument of :ref:`llvm.assume <int_assume>`, if any of the
2409 provided guarantees are violated at runtime the behavior is undefined.
2411 Even if the assumed property can be encoded as a boolean value, like
2412 ``nonnull``, using operand bundles to express the property can still have
2415 * Attributes that can be expressed via operand bundles are directly the
2416 property that the optimizer uses and cares about. Encoding attributes as
2417 operand bundles removes the need for an instruction sequence that represents
2418 the property (e.g., ``icmp ne i32* %p, null`` for ``nonnull``) and for the
2419 optimizer to deduce the property from that instruction sequence.
2420 * Expressing the property using operand bundles makes it easy to identify the
2421 use of the value as a use in an :ref:`llvm.assume <int_assume>`. This then
2422 simplifies and improves heuristics, e.g., for use "use-sensitive"
2425 .. _ob_preallocated:
2427 Preallocated Operand Bundles
2428 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2430 Preallocated operand bundles are characterized by the ``"preallocated"``
2431 operand bundle tag. These operand bundles allow separation of the allocation
2432 of the call argument memory from the call site. This is necessary to pass
2433 non-trivially copyable objects by value in a way that is compatible with MSVC
2434 on some targets. There can be at most one ``"preallocated"`` operand bundle
2435 attached to a call site and it must have exactly one bundle operand, which is
2436 a token generated by ``@llvm.call.preallocated.setup``. A call with this
2437 operand bundle should not adjust the stack before entering the function, as
2438 that will have been done by one of the ``@llvm.call.preallocated.*`` intrinsics.
2440 .. code-block:: llvm
2442 %foo = type { i64, i32 }
2446 %t = call token @llvm.call.preallocated.setup(i32 1)
2447 %a = call i8* @llvm.call.preallocated.arg(token %t, i32 0) preallocated(%foo)
2448 %b = bitcast i8* %a to %foo*
2450 call void @bar(i32 42, %foo* preallocated(%foo) %b) ["preallocated"(token %t)]
2454 GC Live Operand Bundles
2455 ^^^^^^^^^^^^^^^^^^^^^^^
2457 A "gc-live" operand bundle is only valid on a :ref:`gc.statepoint <gc_statepoint>`
2458 intrinsic. The operand bundle must contain every pointer to a garbage collected
2459 object which potentially needs to be updated by the garbage collector.
2461 When lowered, any relocated value will be recorded in the corresponding
2462 :ref:`stackmap entry <statepoint-stackmap-format>`. See the intrinsic description
2463 for further details.
2465 ObjC ARC Attached Call Operand Bundles
2466 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2468 A ``"clang.arc.attachedcall"`` operand bundle on a call indicates the call is
2469 implicitly followed by a marker instruction and a call to an ObjC runtime
2470 function that uses the result of the call. If the argument passed to the operand
2471 bundle is 0, ``@objc_retainAutoreleasedReturnValue`` is called. If 1 is passed,
2472 ``@objc_unsafeClaimAutoreleasedReturnValue`` is called. The return value of a
2473 call with this bundle is used by a call to ``@llvm.objc.clang.arc.noop.use``
2474 unless the called function's return type is void, in which case the operand
2477 The operand bundle is needed to ensure the call is immediately followed by the
2478 marker instruction or the ObjC runtime call in the final output.
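
The following sketch shows the shape of such a call. It rests on assumptions:
``@make_object`` is a hypothetical function, and the ``i64`` type of the bundle
operand is not stated in the text above.

.. code-block:: llvm

    declare i8* @make_object()
    declare void @llvm.objc.clang.arc.noop.use(...)

    define void @use_object() {
      ; Operand 0 selects @objc_retainAutoreleasedReturnValue.
      %obj = call i8* @make_object() [ "clang.arc.attachedcall"(i64 0) ]
      ; The return value is used by a call to the no-op use intrinsic.
      call void (...) @llvm.objc.clang.arc.noop.use(i8* %obj)
      ret void
    }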
2482 Module-Level Inline Assembly
2483 ----------------------------
2485 Modules may contain "module-level inline asm" blocks, which correspond
2486 to the GCC "file scope inline asm" blocks. These blocks are internally
2487 concatenated by LLVM and treated as a single unit, but may be separated
2488 in the ``.ll`` file if desired. The syntax is very simple:
2490 .. code-block:: llvm
2492 module asm "inline asm code goes here"
2493 module asm "more can go here"
2495 The strings can contain any character by escaping non-printable
2496 characters. The escape sequence used is simply "\\xx" where "xx" is the
2497 two digit hex code for the number.
2499 Note that the assembly string *must* be parseable by LLVM's integrated assembler
2500 (unless it is disabled), even when emitting a ``.s`` file.
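
A small sketch of the escape sequence in use. It assumes a target whose
assembler accepts the ``.globl`` directive and ``name:`` labels, and the symbol
name ``marker`` is hypothetical.

.. code-block:: llvm

    ; "\09" is the two digit hex escape for a tab character.
    module asm "\09.globl\09marker"
    module asm "marker:"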
2502 .. _langref_datalayout:
2507 A module may specify a target specific data layout string that specifies
2508 how data is to be laid out in memory. The syntax for the data layout is
2511 .. code-block:: llvm
2513 target datalayout = "layout specification"
2515 The *layout specification* consists of a list of specifications
2516 separated by the minus sign character ('-'). Each specification starts
2517 with a letter and may include other information after the letter to
2518 define some aspect of the data layout. The specifications accepted are
2522 Specifies that the target lays out data in big-endian form. That is,
2523 the bits with the most significance have the lowest address
2526 Specifies that the target lays out data in little-endian form. That
2527 is, the bits with the least significance have the lowest address
2530 Specifies the natural alignment of the stack in bits. Alignment
2531 promotion of stack variables is limited to the natural stack
2532 alignment to avoid dynamic stack realignment. The stack alignment
2533 must be a multiple of 8 bits. If omitted, the natural stack
2534 alignment defaults to "unspecified", which does not prevent any
2535 alignment promotions.
2536 ``P<address space>``
2537 Specifies the address space that corresponds to program memory.
2538 Harvard architectures can use this to specify what space LLVM
2539 should place things such as functions into. If omitted, the
2540 program memory space defaults to the default address space of 0,
2541 which corresponds to a Von Neumann architecture that has code
2542 and data in the same space.
2543 ``G<address space>``
2544 Specifies the address space to be used by default when creating global
2545 variables. If omitted, the globals address space defaults to the default
2547 Note: variable declarations without an address space are always created in
2548 address space 0; this property only affects the default value to be used
2549 when creating globals without additional contextual information (e.g. in
2551 ``A<address space>``
2552 Specifies the address space of objects created by '``alloca``'.
2553 Defaults to the default address space of 0.
2554 ``p[n]:<size>:<abi>:<pref>:<idx>``
2555 This specifies the *size* of a pointer and its ``<abi>`` and
2556 ``<pref>``\erred alignments for address space ``n``. The fourth parameter
2557 ``<idx>`` is the size of the index that is used for address calculation. If not
2558 specified, the default index size is equal to the pointer size. All sizes
2559 are in bits. The address space, ``n``, is optional, and if not specified,
2560 denotes the default address space 0. The value of ``n`` must be
2561 in the range [1,2^23).
2562 ``i<size>:<abi>:<pref>``
2563 This specifies the alignment for an integer type of a given bit
2564 ``<size>``. The value of ``<size>`` must be in the range [1,2^23).
2565 ``v<size>:<abi>:<pref>``
2566 This specifies the alignment for a vector type of a given bit
2568 ``f<size>:<abi>:<pref>``
2569 This specifies the alignment for a floating-point type of a given bit
2570 ``<size>``. Only values of ``<size>`` that are supported by the target
2571 will work. 32 (float) and 64 (double) are supported on all targets; 80
2572 or 128 (different flavors of long double) are also supported on some
2575 This specifies the alignment for an object of aggregate type.
2577 This specifies the alignment for function pointers.
2578 The options for ``<type>`` are:
2580 * ``i``: The alignment of function pointers is independent of the alignment
2581 of functions, and is a multiple of ``<abi>``.
2582 * ``n``: The alignment of function pointers is a multiple of the explicit
2583 alignment specified on the function, and is a multiple of ``<abi>``.
2585 If present, specifies that LLVM names are mangled in the output. Symbols
2586 prefixed with the mangling escape character ``\01`` are passed through
2587 directly to the assembler without the escape character. The mangling style
2590 * ``e``: ELF mangling: Private symbols get a ``.L`` prefix.
2591 * ``m``: Mips mangling: Private symbols get a ``$`` prefix.
2592 * ``o``: Mach-O mangling: Private symbols get an ``L`` prefix. Other
2593 symbols get a ``_`` prefix.
2594 * ``x``: Windows x86 COFF mangling: Private symbols get the usual prefix.
2595 Regular C symbols get a ``_`` prefix. Functions with ``__stdcall``,
2596 ``__fastcall``, and ``__vectorcall`` have custom mangling that appends
2597 ``@N`` where N is the number of bytes used to pass parameters. C++ symbols
2598 starting with ``?`` are not mangled in any way.
2599 * ``w``: Windows COFF mangling: Similar to ``x``, except that normal C
2600 symbols do not receive a ``_`` prefix.
2601 * ``a``: XCOFF mangling: Private symbols get an ``L..`` prefix.
2602 ``n<size1>:<size2>:<size3>...``
2603 This specifies a set of native integer widths for the target CPU in
2604 bits. For example, it might contain ``n32`` for 32-bit PowerPC,
2605 ``n32:64`` for PowerPC 64, or ``n8:16:32:64`` for X86-64. Elements of
2606 this set are considered to support most general arithmetic operations
2608 ``ni:<address space0>:<address space1>:<address space2>...``
2609 This specifies pointer types with the specified address spaces
2610 as :ref:`Non-Integral Pointer Type <nointptrtype>`\ s. The ``0``
2611 address space cannot be specified as non-integral.
2613 On every specification that takes a ``<abi>:<pref>``, specifying the
2614 ``<pref>`` alignment is optional. If omitted, the preceding ``:``
2615 should be omitted too and ``<pref>`` will be equal to ``<abi>``.
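
As an illustration of how the specifications combine, the string below
resembles the layout commonly used for x86-64 Linux targets. It is shown only
as a sketch for reading the syntax. The string a module needs is whatever its
code generator expects.

.. code-block:: llvm

    target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

    ; e            little-endian
    ; m:e          ELF mangling
    ; p270:32:32   pointers in address space 270 are 32 bits wide with 32-bit
    ;              ABI alignment; <pref> is omitted, so it equals <abi>
    ; i64:64       i64 has 64-bit ABI alignment
    ; f80:128      80-bit floating-point values are 128-bit aligned
    ; n8:16:32:64  native integer widths
    ; S128         natural stack alignment is 128 bits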
2617 When constructing the data layout for a given target, LLVM starts with a
2618 default set of specifications which are then (possibly) overridden by
2619 the specifications in the ``datalayout`` keyword. The default
2620 specifications are given in this list:
2622 - ``E`` - big endian
2623 - ``p:64:64:64`` - 64-bit pointers with 64-bit alignment.
2624 - ``p[n]:64:64:64`` - Other address spaces are assumed to be the
2625 same as the default address space.
2626 - ``S0`` - natural stack alignment is unspecified
2627 - ``i1:8:8`` - i1 is 8-bit (byte) aligned
2628 - ``i8:8:8`` - i8 is 8-bit (byte) aligned
2629 - ``i16:16:16`` - i16 is 16-bit aligned
2630 - ``i32:32:32`` - i32 is 32-bit aligned
2631 - ``i64:32:64`` - i64 has ABI alignment of 32-bits but preferred
2632 alignment of 64-bits
2633 - ``f16:16:16`` - half is 16-bit aligned
2634 - ``f32:32:32`` - float is 32-bit aligned
2635 - ``f64:64:64`` - double is 64-bit aligned
2636 - ``f128:128:128`` - quad is 128-bit aligned
2637 - ``v64:64:64`` - 64-bit vector is 64-bit aligned
2638 - ``v128:128:128`` - 128-bit vector is 128-bit aligned
2639 - ``a:0:64`` - aggregates are 64-bit aligned
2641 When LLVM is determining the alignment for a given type, it uses the
2644 #. If the type sought is an exact match for one of the specifications,
2645 that specification is used.
2646 #. If no match is found, and the type sought is an integer type, then
2647 the smallest integer type that is larger than the bitwidth of the
2648 sought type is used. If none of the specifications are larger than
2649 the bitwidth then the largest integer type is used. For example,
2650 given the default specifications above, the i7 type will use the
2651 alignment of i8 (next largest) while both i65 and i256 will use the
2652 alignment of i64 (largest specified).
2653 #. If no match is found, and the type sought is a vector type, then the
2654 largest vector type that is smaller than the sought vector type will
2655 be used as a fall back. This happens because <128 x double> can be
2656 implemented in terms of 64 <2 x double>, for example.
2658 The function of the data layout string may not be what you expect.
2659 Notably, this is not a specification from the frontend of what alignment
2660 the code generator should use.
2662 Instead, if specified, the target data layout is required to match what
2663 the ultimate *code generator* expects. This string is used by the
2664 mid-level optimizers to improve code, and this only works if it matches
2665 what the ultimate code generator uses. There is no way to generate IR
2666 that does not embed this target-specific detail into the IR. If you
2667 don't specify the string, the default specifications will be used to
2668 generate a Data Layout and the optimization phases will operate
2669 accordingly and introduce target specificity into the IR with respect to
2670 these default specifications.
2677 A module may specify a target triple string that describes the target
2678 host. The syntax for the target triple is simply:
2680 .. code-block:: llvm
2682 target triple = "x86_64-apple-macosx10.7.0"
2684 The *target triple* string consists of a series of identifiers delimited
2685 by the minus sign character ('-'). The canonical forms are:
2689 ARCHITECTURE-VENDOR-OPERATING_SYSTEM
2690 ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT
2692 This information is passed along to the backend so that it generates
2693 code for the proper architecture. It's possible to override this on the
2694 command line with the ``-mtriple`` command line option.
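
For illustration, the two canonical forms might look as follows. The specific
triples are examples only, and a module carries a single ``target triple``
line.

.. code-block:: llvm

    ; Three-component form, ARCHITECTURE-VENDOR-OPERATING_SYSTEM:
    ;   target triple = "armv7-apple-ios"
    ; Four-component form, ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT:
    target triple = "x86_64-unknown-linux-gnu"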
2699 ----------------------
2701 A memory object, or simply object, is a region of a memory space that is
2702 reserved by a memory allocation such as :ref:`alloca <i_alloca>`, heap
2703 allocation calls, and global variable definitions.
2704 Once it is allocated, the bytes stored in the region can only be read or written
2705 through a pointer that is :ref:`based on <pointeraliasing>` the allocation
2707 If a pointer that is not based on the object tries to read or write to the
2708 object, it is undefined behavior.
2710 The lifetime of a memory object is a property that decides its accessibility.
2711 Unless stated otherwise, a memory object is alive from its allocation onward,
2712 and dead after its deallocation.
2713 It is undefined behavior to access a memory object that isn't alive, but
2714 operations that don't dereference it such as
2715 :ref:`getelementptr <i_getelementptr>`, :ref:`ptrtoint <i_ptrtoint>` and
2716 :ref:`icmp <i_icmp>` return a valid result.
2717 This explains code motion of these instructions across operations that
2718 impact the object's lifetime.
2719 A stack object's lifetime can be explicitly specified using
2720 :ref:`llvm.lifetime.start <int_lifestart>` and
2721 :ref:`llvm.lifetime.end <int_lifeend>` intrinsic function calls.
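
A sketch of an explicitly delimited stack object lifetime. The function name
and buffer size are hypothetical. The typed-pointer intrinsic names follow the
``i8*`` spelling used elsewhere in this document.

.. code-block:: llvm

    declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture)
    declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture)

    define i8 @g() {
      %buf = alloca [16 x i8]
      %p = getelementptr [16 x i8], [16 x i8]* %buf, i64 0, i64 0
      call void @llvm.lifetime.start.p0i8(i64 16, i8* %p)
      ; The object is alive here, so it may be read and written.
      store i8 1, i8* %p
      %v = load i8, i8* %p
      call void @llvm.lifetime.end.p0i8(i64 16, i8* %p)
      ; The object is dead here. Address arithmetic is still allowed, but a
      ; load or store through %q would be undefined behavior.
      %q = getelementptr i8, i8* %p, i64 4
      ret i8 %v
    }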
2723 .. _pointeraliasing:
2725 Pointer Aliasing Rules
2726 ----------------------
2728 Any memory access must be done through a pointer value associated with
2729 an address range of the memory access, otherwise the behavior is
2730 undefined. Pointer values are associated with address ranges according
2731 to the following rules:
2733 - A pointer value is associated with the addresses associated with any
2734 value it is *based* on.
2735 - An address of a global variable is associated with the address range
2736 of the variable's storage.
2737 - The result value of an allocation instruction is associated with the
2738 address range of the allocated storage.
2739 - A null pointer in the default address-space is associated with no
2741 - An :ref:`undef value <undefvalues>` in *any* address-space is
2742 associated with no address.
2743 - An integer constant other than zero or a pointer value returned from
2744 a function not defined within LLVM may be associated with address
2745 ranges allocated through mechanisms other than those provided by
2746 LLVM. Such ranges shall not overlap with any ranges of addresses
2747 allocated by mechanisms provided by LLVM.
2749 A pointer value is *based* on another pointer value according to the
2752 - A pointer value formed from a scalar ``getelementptr`` operation is *based* on
2753 the pointer-typed operand of the ``getelementptr``.
2754 - The pointer in lane *l* of the result of a vector ``getelementptr`` operation
2755 is *based* on the pointer in lane *l* of the vector-of-pointers-typed operand
2756 of the ``getelementptr``.
- The result value of a ``bitcast`` is *based* on the operand of the
  ``bitcast``.
2759 - A pointer value formed by an ``inttoptr`` is *based* on all pointer
2760 values that contribute (directly or indirectly) to the computation of
2761 the pointer's value.
2762 - The "*based* on" relationship is transitive.
2764 Note that this definition of *"based"* is intentionally similar to the
2765 definition of *"based"* in C99, though it is slightly weaker.
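
The rules above can be traced through a short chain of instructions. This is
only an illustrative sketch; the function name ``@based`` and the value names
are hypothetical.

.. code-block:: llvm

    define i32 @based(i32* %p) {
      %q = getelementptr i32, i32* %p, i64 1   ; %q is based on %p
      %r = bitcast i32* %q to i8*              ; %r is based on %q, and transitively on %p
      %i = ptrtoint i8* %r to i64
      %s = inttoptr i64 %i to i32*             ; %s is based on %r, which contributes to %i
      %v = load i32, i32* %s, align 4          ; accesses an address associated with %p
      ret i32 %v
    }
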
2767 LLVM IR does not associate types with memory. The result type of a
2768 ``load`` merely indicates the size and alignment of the memory from
2769 which to load, as well as the interpretation of the value. The first
2770 operand type of a ``store`` similarly only indicates the size and
2771 alignment of the store.
2773 Consequently, type-based alias analysis, aka TBAA, aka
2774 ``-fstrict-aliasing``, is not applicable to general unadorned LLVM IR.
2775 :ref:`Metadata <metadata>` may be used to encode additional information
which specialized optimization passes may use to implement type-based
alias analysis.
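
Because memory carries no type, the same bytes may be written with one type
and read back with another. The sketch below (with the hypothetical name
``@float_bits``) stores a ``float`` and reloads its bits as an ``i32``; the
load is well defined because only size and alignment matter.

.. code-block:: llvm

    define i32 @float_bits(float %f) {
      %slot = alloca float, align 4
      store float %f, float* %slot, align 4
      %as.int = bitcast float* %slot to i32*
      ; Same address, same size, different interpretation of the bytes.
      %bits = load i32, i32* %as.int, align 4
      ret i32 %bits
    }
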

Pointer Capture
---------------

Given a function call and a pointer that is passed as an argument or stored in
the memory before the call, a pointer is *captured* by the call if it makes a
copy of any part of the pointer that outlives the call.
To be precise, a pointer is captured if one or more of the following conditions
hold:

1. The call stores any bit of the pointer carrying information into a place,
   and the stored bits can be read from the place by the caller after this call
   exits.

   .. code-block:: llvm

       @glb  = global i8* null
       @glb2 = global i8* null
       @glb3 = global i8* null
       @glbi = global i32 0

       declare void @g()

       define i8* @f(i8* %a, i8* %b, i8* %c, i8* %d, i8* %e) {
         store i8* %a, i8** @glb ; %a is captured by this call

         store i8* %b, i8** @glb2 ; %b isn't captured because the stored value is overwritten by the store below
         store i8* null, i8** @glb2

         store i8* %c, i8** @glb3
         call void @g() ; If @g makes a copy of %c that outlives this call (@f), %c is captured
         store i8* null, i8** @glb3

         %i = ptrtoint i8* %d to i64
         %j = trunc i64 %i to i32
         store i32 %j, i32* @glbi ; %d is captured

         ret i8* %e ; %e is captured
       }

2. The call stores any bit of the pointer carrying information into a place,
   and the stored bits can be safely read from the place by another thread via
   synchronization.

   .. code-block:: llvm

       @glb = global i8* null
       @lock = global i1 true

       define void @f(i8* %a) {
         store i8* %a, i8** @glb
         store atomic i1 false, i1* @lock release, align 1 ; %a is captured because another thread can safely read @glb
         store i8* null, i8** @glb
         ret void
       }

2833 3. The call's behavior depends on any bit of the pointer carrying information.
   .. code-block:: llvm

       @glb = global i8 0

       declare void @exit()

       define void @f(i8* %a) {
         %c = icmp eq i8* %a, @glb
         br i1 %c, label %BB_EXIT, label %BB_CONTINUE ; escapes %a
       BB_EXIT:
         call void @exit()
         unreachable
       BB_CONTINUE:
         ret void
       }

2849 4. The pointer is used in a volatile access as its address.
2854 Volatile Memory Accesses
2855 ------------------------
2857 Certain memory accesses, such as :ref:`load <i_load>`'s,
2858 :ref:`store <i_store>`'s, and :ref:`llvm.memcpy <int_memcpy>`'s may be
2859 marked ``volatile``. The optimizers must not change the number of
2860 volatile operations or change their order of execution relative to other
2861 volatile operations. The optimizers *may* change the order of volatile
2862 operations relative to non-volatile operations. This is not Java's
2863 "volatile" and has no cross-thread synchronization behavior.
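
As a small sketch of the ordering rule, the two volatile loads below must both
be performed, in this order, even though they read the same address. The
function name ``@read_twice`` and the idea that ``%reg`` is a device register
are assumptions made for illustration.

.. code-block:: llvm

    define i32 @read_twice(i32* %reg) {
      %a = load volatile i32, i32* %reg, align 4
      ; This load may not be merged with %a or moved before it.
      %b = load volatile i32, i32* %reg, align 4
      %sum = add i32 %a, %b
      ret i32 %sum
    }
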
2865 A volatile load or store may have additional target-specific semantics.
2866 Any volatile operation can have side effects, and any volatile operation
2867 can read and/or modify state which is not accessible via a regular load
2868 or store in this module. Volatile operations may use addresses which do
2869 not point to memory (like MMIO registers). This means the compiler may
2870 not use a volatile operation to prove a non-volatile access to that
2871 address has defined behavior.
2873 The allowed side-effects for volatile accesses are limited. If a
2874 non-volatile store to a given address would be legal, a volatile
2875 operation may modify the memory at that address. A volatile operation
2876 may not modify any other memory accessible by the module being compiled.
2877 A volatile operation may not call any code in the current module.
2879 The compiler may assume execution will continue after a volatile operation,
2880 so operations which modify memory or may have undefined behavior can be
2881 hoisted past a volatile operation.
2883 As an exception to the preceding rule, the compiler may not assume execution
2884 will continue after a volatile store operation. This restriction is necessary
2885 to support the somewhat common pattern in C of intentionally storing to an
2886 invalid pointer to crash the program. In the future, it might make sense to
2887 allow frontends to control this behavior.
2889 IR-level volatile loads and stores cannot safely be optimized into llvm.memcpy
2890 or llvm.memmove intrinsics even when those intrinsics are flagged volatile.
2891 Likewise, the backend should never split or merge target-legal volatile
2892 load/store instructions. Similarly, IR-level volatile loads and stores cannot
2893 change from integer to floating-point or vice versa.
.. admonition:: Rationale

   Platforms may rely on volatile loads and stores of natively supported
   data width to be executed as a single instruction. For example, in C
   this holds for an l-value of volatile primitive type with native
   hardware support, but not necessarily for aggregate types. The
   frontend upholds these expectations, which are intentionally
   unspecified in the IR. The rules above ensure that IR transformations
   do not violate the frontend's contract with the language.

2907 Memory Model for Concurrent Operations
2908 --------------------------------------
2910 The LLVM IR does not define any way to start parallel threads of
2911 execution or to register signal handlers. Nonetheless, there are
2912 platform-specific ways to create them, and we define LLVM IR's behavior
2913 in their presence. This model is inspired by the C++0x memory model.
2915 For a more informal introduction to this model, see the :doc:`Atomics`.
We define a *happens-before* partial order as the least partial order
that:

- Is a superset of single-thread program order, and
- When ``a`` *synchronizes-with* ``b``, includes an edge from ``a`` to
  ``b``. *Synchronizes-with* pairs are introduced by platform-specific
  techniques, like pthread locks, thread creation, thread joining,
  etc., and by atomic instructions. (See also :ref:`Atomic Memory Ordering
  Constraints <ordering>`).

2927 Note that program order does not introduce *happens-before* edges
2928 between a thread and signals executing inside that thread.
2930 Every (defined) read operation (load instructions, memcpy, atomic
2931 loads/read-modify-writes, etc.) R reads a series of bytes written by
2932 (defined) write operations (store instructions, atomic
2933 stores/read-modify-writes, memcpy, etc.). For the purposes of this
2934 section, initialized globals are considered to have a write of the
2935 initializer which is atomic and happens before any other read or write
2936 of the memory in question. For each byte of a read R, R\ :sub:`byte`
2937 may see any write to the same byte, except:
2939 - If write\ :sub:`1` happens before write\ :sub:`2`, and
2940 write\ :sub:`2` happens before R\ :sub:`byte`, then
2941 R\ :sub:`byte` does not see write\ :sub:`1`.
2942 - If R\ :sub:`byte` happens before write\ :sub:`3`, then
2943 R\ :sub:`byte` does not see write\ :sub:`3`.
2945 Given that definition, R\ :sub:`byte` is defined as follows:
- If R is volatile, the result is target-dependent. (Volatile is
  supposed to give guarantees which can support ``sig_atomic_t`` in
  C/C++, and may be used for accesses to addresses that do not behave
  like normal memory. It does not generally provide cross-thread
  synchronization.)
2952 - Otherwise, if there is no write to the same byte that happens before
2953 R\ :sub:`byte`, R\ :sub:`byte` returns ``undef`` for that byte.
2954 - Otherwise, if R\ :sub:`byte` may see exactly one write,
2955 R\ :sub:`byte` returns the value written by that write.
2956 - Otherwise, if R is atomic, and all the writes R\ :sub:`byte` may
2957 see are atomic, it chooses one of the values written. See the :ref:`Atomic
2958 Memory Ordering Constraints <ordering>` section for additional
2959 constraints on how the choice is made.
2960 - Otherwise R\ :sub:`byte` returns ``undef``.
2962 R returns the value composed of the series of bytes it read. This
2963 implies that some bytes within the value may be ``undef`` **without**
2964 the entire value being ``undef``. Note that this only defines the
2965 semantics of the operation; it doesn't mean that targets will emit more
2966 than one instruction to read the series of bytes.
2968 Note that in cases where none of the atomic intrinsics are used, this
2969 model places only one restriction on IR transformations on top of what
2970 is required for single-threaded execution: introducing a store to a byte
2971 which might not otherwise be stored is not allowed in general.
2972 (Specifically, in the case where another thread might write to and read
2973 from an address, introducing a store can change a load that may see
2974 exactly one write into a load that may see multiple writes.)
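
The restriction on introducing stores can be illustrated with a conditional
store. The sketch below uses hypothetical names (``@x``, ``@cond_store``,
``@cond_store_bad``). The second function is equivalent to the first in a
single thread, but it is not a valid replacement in general, because it
writes ``@x`` on a path where the original performs no write.

.. code-block:: llvm

    @x = global i32 0

    ; Original: @x is written only when %c is true.
    define void @cond_store(i1 %c) {
    entry:
      br i1 %c, label %then, label %exit
    then:
      store i32 1, i32* @x, align 4
      br label %exit
    exit:
      ret void
    }

    ; Not allowed in general: when %c is false this stores the old value
    ; back, which can race with a write to @x from another thread.
    define void @cond_store_bad(i1 %c) {
      %old = load i32, i32* @x, align 4
      %new = select i1 %c, i32 1, i32 %old
      store i32 %new, i32* @x, align 4
      ret void
    }
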
2978 Atomic Memory Ordering Constraints
2979 ----------------------------------
2981 Atomic instructions (:ref:`cmpxchg <i_cmpxchg>`,
2982 :ref:`atomicrmw <i_atomicrmw>`, :ref:`fence <i_fence>`,
2983 :ref:`atomic load <i_load>`, and :ref:`atomic store <i_store>`) take
2984 ordering parameters that determine which other atomic instructions on
2985 the same address they *synchronize with*. These semantics are borrowed
2986 from Java and C++0x, but are somewhat more colloquial. If these
2987 descriptions aren't precise enough, check those specs (see spec
2988 references in the :doc:`atomics guide <Atomics>`).
2989 :ref:`fence <i_fence>` instructions treat these orderings somewhat
2990 differently since they don't take an address. See that instruction's
2991 documentation for details.
For a simpler introduction to the ordering constraints, see the
:doc:`Atomics`.

``unordered``
    The set of values that can be read is governed by the happens-before
    partial order. A value cannot be read unless some operation wrote
    it. This is intended to provide a guarantee strong enough to model
    Java's non-volatile shared variables. This ordering cannot be
    specified for read-modify-write operations; it is not strong enough
    to make them atomic in any interesting way.
``monotonic``
    In addition to the guarantees of ``unordered``, there is a single
    total order for modifications by ``monotonic`` operations on each
    address. All modification orders must be compatible with the
    happens-before order. There is no guarantee that the modification
    orders can be combined to a global total order for the whole program
    (and this often will not be possible). The read in an atomic
    read-modify-write operation (:ref:`cmpxchg <i_cmpxchg>` and
    :ref:`atomicrmw <i_atomicrmw>`) reads the value in the modification
    order immediately before the value it writes. If one atomic read
    happens before another atomic read of the same address, the later
    read must see the same value or a later value in the address's
    modification order. This disallows reordering of ``monotonic`` (or
    stronger) operations on the same address. If an address is written
    ``monotonic``-ally by one thread, and other threads ``monotonic``-ally
    read that address repeatedly, the other threads must eventually see
    the write. This corresponds to the C++0x/C1x
    ``memory_order_relaxed``.
``acquire``
    In addition to the guarantees of ``monotonic``, a
    *synchronizes-with* edge may be formed with a ``release`` operation.
    This is intended to model C++'s ``memory_order_acquire``.
``release``
    In addition to the guarantees of ``monotonic``, if this operation
    writes a value which is subsequently read by an ``acquire``
    operation, it *synchronizes-with* that operation. (This isn't a
    complete description; see the C++0x definition of a release
    sequence.) This corresponds to the C++0x/C1x
    ``memory_order_release``.
``acq_rel`` (acquire+release)
    Acts as both an ``acquire`` and ``release`` operation on its
    address. This corresponds to the C++0x/C1x ``memory_order_acq_rel``.
``seq_cst`` (sequentially consistent)
    In addition to the guarantees of ``acq_rel`` (``acquire`` for an
    operation that only reads, ``release`` for an operation that only
    writes), there is a global total order on all
    sequentially-consistent operations on all addresses, which is
    consistent with the *happens-before* partial order and with the
    modification orders of all the affected addresses. Each
    sequentially-consistent read sees the last preceding write to the
    same address in this global order. This corresponds to the C++0x/C1x
    ``memory_order_seq_cst`` and Java volatile.

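
A common use of ``release`` and ``acquire`` is publishing data through a flag.
The following is a sketch under assumed names (``@data``, ``@ready``,
``@producer``, ``@consumer``); it presumes the two functions run on different
threads created by some platform-specific mechanism.

.. code-block:: llvm

    @data  = global i32 0
    @ready = global i32 0

    define void @producer() {
      store i32 42, i32* @data, align 4
      ; The release store makes the write to @data visible to any thread
      ; whose acquire load reads the value 1 from @ready.
      store atomic i32 1, i32* @ready release, align 4
      ret void
    }

    define i32 @consumer() {
    entry:
      br label %spin
    spin:
      %flag = load atomic i32, i32* @ready acquire, align 4
      %set = icmp eq i32 %flag, 1
      br i1 %set, label %done, label %spin
    done:
      ; The producer's store to @data happens before this load.
      %v = load i32, i32* @data, align 4
      ret i32 %v
    }
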
3048 If an atomic operation is marked ``syncscope("singlethread")``, it only
3049 *synchronizes with* and only participates in the seq\_cst total orderings of
3050 other operations running in the same thread (for example, in signal handlers).
3052 If an atomic operation is marked ``syncscope("<target-scope>")``, where
3053 ``<target-scope>`` is a target specific synchronization scope, then it is target
3054 dependent if it *synchronizes with* and participates in the seq\_cst total
3055 orderings of other operations.
3057 Otherwise, an atomic operation that is not marked ``syncscope("singlethread")``
3058 or ``syncscope("<target-scope>")`` *synchronizes with* and participates in the
3059 seq\_cst total orderings of other operations that are not marked
3060 ``syncscope("singlethread")`` or ``syncscope("<target-scope>")``.
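
For illustration, the sketch below marks a fence and an atomic store with
``syncscope("singlethread")``, as might be done when communicating with a
signal handler running in the same thread. The function name
``@notify_handler`` is hypothetical.

.. code-block:: llvm

    define void @notify_handler(i32* %flag) {
      ; Orders memory operations only with respect to code in this thread.
      fence syncscope("singlethread") seq_cst
      store atomic i32 1, i32* %flag syncscope("singlethread") release, align 4
      ret void
    }
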
3064 Floating-Point Environment
3065 --------------------------
3067 The default LLVM floating-point environment assumes that floating-point
3068 instructions do not have side effects. Results assume the round-to-nearest
3069 rounding mode. No floating-point exception state is maintained in this
3070 environment. Therefore, there is no attempt to create or preserve invalid
3071 operation (SNaN) or division-by-zero exceptions.
3073 The benefit of this exception-free assumption is that floating-point
3074 operations may be speculated freely without any other fast-math relaxations
3075 to the floating-point model.
3077 Code that requires different behavior than this should use the
3078 :ref:`Constrained Floating-Point Intrinsics <constrainedfp>`.
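
As a sketch of what such code might look like, the function below performs an
addition through a constrained intrinsic with a dynamic rounding mode and
strict exception semantics. The function name ``@strict_add`` is hypothetical;
see the intrinsic's own documentation for the authoritative description of the
metadata arguments.

.. code-block:: llvm

    declare float @llvm.experimental.constrained.fadd.f32(float, float, metadata, metadata)

    define float @strict_add(float %a, float %b) strictfp {
      %sum = call float @llvm.experimental.constrained.fadd.f32(
                            float %a, float %b,
                            metadata !"round.dynamic",
                            metadata !"fpexcept.strict") strictfp
      ret float %sum
    }
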

Fast-Math Flags
---------------

LLVM IR floating-point operations (:ref:`fneg <i_fneg>`, :ref:`fadd <i_fadd>`,
:ref:`fsub <i_fsub>`, :ref:`fmul <i_fmul>`, :ref:`fdiv <i_fdiv>`,
:ref:`frem <i_frem>`, :ref:`fcmp <i_fcmp>`), :ref:`phi <i_phi>`,
:ref:`select <i_select>` and :ref:`call <i_call>`
may use the following flags to enable otherwise unsafe
floating-point transformations.

``nnan``
   No NaNs - Allow optimizations to assume the arguments and result are not
   NaN. If an argument is a NaN, or the result would be a NaN, it produces
   a :ref:`poison value <poisonvalues>` instead.

``ninf``
   No Infs - Allow optimizations to assume the arguments and result are not
   +/-Inf. If an argument is +/-Inf, or the result would be +/-Inf, it
   produces a :ref:`poison value <poisonvalues>` instead.

``nsz``
   No Signed Zeros - Allow optimizations to treat the sign of a zero
   argument or result as insignificant. This does not imply that -0.0
   is poison and/or guaranteed to not exist in the operation.

``arcp``
   Allow Reciprocal - Allow optimizations to use the reciprocal of an
   argument rather than perform division.

``contract``
   Allow floating-point contraction (e.g. fusing a multiply followed by an
   addition into a fused multiply-and-add). This does not enable reassociating
   to form arbitrary contractions. For example, ``(a*b) + (c*d) + e`` cannot
   be transformed into ``(a*b) + ((c*d) + e)`` to create two fma operations.

``afn``
   Approximate functions - Allow substitution of approximate calculations for
   functions (sin, log, sqrt, etc). See floating-point intrinsic definitions
   for places where this can apply to LLVM's intrinsic math functions.

``reassoc``
   Allow reassociation transformations for floating-point instructions.
   This may dramatically change results in floating-point.

``fast``
   This flag implies all of the others.

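
Flags are written between the opcode and the type. The sketch below, using
the hypothetical function names ``@mul_add`` and ``@divide``, shows
``contract`` on a multiply and add pair and ``arcp`` on a division.

.. code-block:: llvm

    define float @mul_add(float %a, float %b, float %c) {
      %prod = fmul contract float %a, %b
      ; With contract on both operations, this pair may be fused into an fma.
      %sum = fadd contract float %prod, %c
      ret float %sum
    }

    define float @divide(float %x, float %y) {
      ; With arcp, this may be computed as %x * (1.0 / %y).
      %q = fdiv arcp float %x, %y
      ret float %q
    }
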
3131 Use-list Order Directives
3132 -------------------------
3134 Use-list directives encode the in-memory order of each use-list, allowing the
3135 order to be recreated. ``<order-indexes>`` is a comma-separated list of
3136 indexes that are assigned to the referenced value's uses. The referenced
3137 value's use-list is immediately sorted by these indexes.
3139 Use-list directives may appear at function scope or global scope. They are not
3140 instructions, and have no effect on the semantics of the IR. When they're at
3141 function scope, they must appear after the terminator of the final basic block.
If basic blocks have their address taken via ``blockaddress()`` expressions,
``uselistorder_bb`` can be used to reorder their use-lists from outside their
function's scope.

:Syntax:

::

    uselistorder <ty> <value>, { <order-indexes> }
    uselistorder_bb @function, %block, { <order-indexes> }

:Examples:

::

    define void @foo(i32 %arg1, i32 %arg2) {
    entry:
      ; ... instructions ...
    bb:
      ; ... instructions ...

      ; At function scope.
      uselistorder i32 %arg1, { 1, 0, 2 }
      uselistorder label %bb, { 1, 0 }
    }

    ; At global scope.
    uselistorder i32* @global, { 1, 2, 0 }
    uselistorder i32 7, { 1, 0 }
    uselistorder i32 (i32) @bar, { 1, 0 }
    uselistorder_bb @foo, %bb, { 5, 1, 3, 2, 0, 4 }

.. _source_filename:

Source Filename
---------------

The *source filename* string is set to the original module identifier,
which will be the name of the compiled source file when compiling from
source through the clang front end, for example. It is then preserved through
the IR and bitcode.

3185 This is currently necessary to generate a consistent unique global
3186 identifier for local functions used in profile data, which prepends the
3187 source file name to the local function name.
3189 The syntax for the source file name is simply:

.. code-block:: text

    source_filename = "/path/to/source.c"

Type System
===========

The LLVM type system is one of the most important features of the
3201 intermediate representation. Being typed enables a number of
3202 optimizations to be performed on the intermediate representation
3203 directly, without having to do extra analyses on the side before the
3204 transformation. A strong type system makes it easier to read the
3205 generated code and enables novel analyses and transformations that are
3206 not feasible to perform on normal three address code representations.

Void Type
---------

The void type does not represent any value and has no size.

.. _t_function:

Function Type
-------------

The function type can be thought of as a function signature. It consists of a
3235 return type and a list of formal parameter types. The return type of a function
3236 type is a void type or first class type --- except for :ref:`label <t_label>`
3237 and :ref:`metadata <t_metadata>` types.
3243 <returntype> (<parameter list>)
3245 ...where '``<parameter list>``' is a comma-separated list of type
3246 specifiers. Optionally, the parameter list may include a type ``...``, which
3247 indicates that the function takes a variable number of arguments. Variable
3248 argument functions can access their arguments with the :ref:`variable argument
3249 handling intrinsic <int_varargs>` functions. '``<returntype>``' is any type
3250 except :ref:`label <t_label>` and :ref:`metadata <t_metadata>`.
3254 +---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3255 | ``i32 (i32)`` | function taking an ``i32``, returning an ``i32`` |
3256 +---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3257 | ``float (i16, i32 *) *`` | :ref:`Pointer <t_pointer>` to a function that takes an ``i16`` and a :ref:`pointer <t_pointer>` to ``i32``, returning ``float``. |
3258 +---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3259 | ``i32 (i8*, ...)`` | A vararg function that takes at least one :ref:`pointer <t_pointer>` to ``i8`` (char in C), which returns an integer. This is the signature for ``printf`` in LLVM. |
3260 +---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3261 | ``{i32, i32} (i32)`` | A function taking an ``i32``, returning a :ref:`structure <t_struct>` containing two ``i32`` values |
3262 +---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
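
A function type also appears at call sites of variable argument functions,
where it is spelled out in full. As a small sketch, the hypothetical function
``@show`` below calls ``printf`` through the ``i32 (i8*, ...)`` signature
listed in the table; the global ``@.fmt`` is likewise an assumption made for
the example.

.. code-block:: llvm

    @.fmt = private unnamed_addr constant [4 x i8] c"%d\0A\00"

    declare i32 @printf(i8*, ...)

    define i32 @show(i32 %x) {
      %fmt = getelementptr inbounds [4 x i8], [4 x i8]* @.fmt, i64 0, i64 0
      ; Calls to vararg functions name the full function type.
      %n = call i32 (i8*, ...) @printf(i8* %fmt, i32 %x)
      ret i32 %n
    }
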

.. _t_firstclass:

First Class Types
-----------------

The :ref:`first class <t_firstclass>` types are perhaps the most important.
Values of these types are the only ones which can be produced by
instructions.

Single Value Types
^^^^^^^^^^^^^^^^^^

These are the types that are valid in registers from CodeGen's perspective.

Integer Type
""""""""""""

The integer type specifies an arbitrary bit width for the desired
integer. Any bit width from 1
bit to 2\ :sup:`23`\ -1 (about 8 million) can be specified.

The number of bits the integer will occupy is specified by the ``N``
value.

3303 +----------------+------------------------------------------------+
3304 | ``i1`` | a single-bit integer. |
3305 +----------------+------------------------------------------------+
3306 | ``i32`` | a 32-bit integer. |
3307 +----------------+------------------------------------------------+
3308 | ``i1942652`` | a really big integer of over 1 million bits. |
3309 +----------------+------------------------------------------------+
3313 Floating-Point Types
3314 """"""""""""""""""""

.. list-table::
   :header-rows: 1

   * - Type
     - Description

   * - ``half``
     - 16-bit floating-point value

   * - ``bfloat``
     - 16-bit "brain" floating-point value (7-bit significand). Provides the
       same number of exponent bits as ``float``, so that it matches its dynamic
       range, but with greatly reduced precision. Used in Intel's AVX-512 BF16
       extensions and Arm's ARMv8.6-A extensions, among others.

   * - ``float``
     - 32-bit floating-point value

   * - ``double``
     - 64-bit floating-point value

   * - ``fp128``
     - 128-bit floating-point value (113-bit significand)

   * - ``x86_fp80``
     - 80-bit floating-point value (X87)

   * - ``ppc_fp128``
     - 128-bit floating-point value (two 64-bits)

The binary formats of half, float, double, and fp128 correspond to the
IEEE-754-2008 specifications for binary16, binary32, binary64, and binary128
respectively.

X86_amx Type
""""""""""""

The x86_amx type represents a value held in an AMX tile register on an x86
machine. The operations allowed on it are quite limited. Only a few intrinsics
are allowed: stride load and store, zero and dot product. No instruction is
allowed for this type. There are no arguments, arrays, pointers, vectors
or constants of this type.

X86_mmx Type
""""""""""""

The x86_mmx type represents a value held in an MMX register on an x86
machine. The operations allowed on it are quite limited: parameters and
return values, load and store, and bitcast. User-specified MMX
instructions are represented as intrinsic or asm calls with arguments
and/or results of this type. There are no arrays, vectors or constants
of this type.

.. _t_pointer:

Pointer Type
""""""""""""

The pointer type is used to specify memory locations. Pointers are
commonly used to reference objects in memory.

3397 Pointer types may have an optional address space attribute defining the
3398 numbered address space where the pointed-to object resides. The default
3399 address space is number zero. The semantics of non-zero address spaces
3400 are target-specific.
3402 Note that LLVM does not permit pointers to void (``void*``) nor does it
3403 permit pointers to labels (``label*``). Use ``i8*`` instead.
3405 LLVM is in the process of transitioning to
3406 `opaque pointers <OpaquePointers.html#opaque-pointers>`_.
3407 Opaque pointers do not have a pointee type. Rather, instructions
3408 interacting through pointers specify the type of the underlying memory
3409 they are interacting with. Opaque pointers are still in the process of
3410 being worked on and are not complete.
3421 +-------------------------+--------------------------------------------------------------------------------------------------------------+
3422 | ``[4 x i32]*`` | A :ref:`pointer <t_pointer>` to :ref:`array <t_array>` of four ``i32`` values. |
3423 +-------------------------+--------------------------------------------------------------------------------------------------------------+
3424 | ``i32 (i32*) *`` | A :ref:`pointer <t_pointer>` to a :ref:`function <t_function>` that takes an ``i32*``, returning an ``i32``. |
3425 +-------------------------+--------------------------------------------------------------------------------------------------------------+
3426 | ``i32 addrspace(5)*`` | A :ref:`pointer <t_pointer>` to an ``i32`` value that resides in address space 5. |
3427 +-------------------------+--------------------------------------------------------------------------------------------------------------+
3428 | ``ptr`` | An opaque pointer type to a value that resides in address space 0. |
3429 +-------------------------+--------------------------------------------------------------------------------------------------------------+
3430 | ``ptr addrspace(5)`` | An opaque pointer type to a value that resides in address space 5. |
3431 +-------------------------+--------------------------------------------------------------------------------------------------------------+

.. _t_vector:

Vector Type
"""""""""""

A vector type is a simple derived type that represents a vector of
elements. Vector types are used when multiple primitive data are
operated on in parallel using a single instruction (SIMD). A vector type
requires a size (number of elements), an underlying primitive data type,
and a scalable property to represent vectors where the exact hardware
vector length is unknown at compile time. Vector types are considered
:ref:`first class <t_firstclass>`.

3450 In general vector elements are laid out in memory in the same way as
3451 :ref:`array types <t_array>`. Such an analogy works fine as long as the vector
3452 elements are byte sized. However, when the elements of the vector aren't byte
sized it gets a bit more complicated. One way to describe the layout is by
describing what happens when a vector such as ``<N x iM>`` is bitcast to an
integer type with N*M bits, and then following the rules for storing such an
integer to memory.

A bitcast from a vector type to a scalar integer type will see the elements
being packed together (without padding). The order in which elements are
inserted in the integer depends on endianness. For little endian element zero
is put in the least significant bits of the integer, and for big endian
element zero is put in the most significant bits.

3464 Using a vector such as ``<i4 1, i4 2, i4 3, i4 5>`` as an example, together
3465 with the analogy that we can replace a vector store by a bitcast followed by
3466 an integer store, we get this for big endian:

.. code-block:: llvm

      %val = bitcast <4 x i4> <i4 1, i4 2, i4 3, i4 5> to i16

      ; Bitcasting from a vector to an integral type can be seen as
      ; concatenating the values:
      ;   %val now has the hexadecimal value 0x1235.

      store i16 %val, i16* %ptr

      ; In memory the content will be (8-bit addressing):
      ;
      ;    [%ptr + 0]: 00010010  (0x12)
      ;    [%ptr + 1]: 00110101  (0x35)

3483 The same example for little endian:

.. code-block:: llvm

      %val = bitcast <4 x i4> <i4 1, i4 2, i4 3, i4 5> to i16

      ; Bitcasting from a vector to an integral type can be seen as
      ; concatenating the values:
      ;   %val now has the hexadecimal value 0x5321.

      store i16 %val, i16* %ptr

      ; In memory the content will be (8-bit addressing), with the least
      ; significant byte stored at the lowest address:
      ;
      ;    [%ptr + 0]: 00100001  (0x21)
      ;    [%ptr + 1]: 01010011  (0x53)

3500 When ``<N*M>`` isn't evenly divisible by the byte size the exact memory layout
3501 is unspecified (just like it is for an integral type of the same size). This
3502 is because different targets could put the padding at different positions when
3503 the type size is smaller than the type's store size.
3509 < <# elements> x <elementtype> > ; Fixed-length vector
3510 < vscale x <# elements> x <elementtype> > ; Scalable vector
3512 The number of elements is a constant integer value larger than 0;
3513 elementtype may be any integer, floating-point or pointer type. Vectors
3514 of size zero are not allowed. For scalable vectors, the total number of
3515 elements is a constant multiple (called vscale) of the specified number
3516 of elements; vscale is a positive integer that is unknown at compile time
3517 and the same hardware-dependent constant for all scalable vectors at run
3518 time. The size of a specific scalable vector type is thus constant within
3519 IR, even if the exact size in bytes cannot be determined until run time.
3523 +------------------------+----------------------------------------------------+
3524 | ``<4 x i32>`` | Vector of 4 32-bit integer values. |
3525 +------------------------+----------------------------------------------------+
3526 | ``<8 x float>`` | Vector of 8 32-bit floating-point values. |
3527 +------------------------+----------------------------------------------------+
3528 | ``<2 x i64>`` | Vector of 2 64-bit integer values. |
3529 +------------------------+----------------------------------------------------+
3530 | ``<4 x i64*>`` | Vector of 4 pointers to 64-bit integer values. |
3531 +------------------------+----------------------------------------------------+
3532 | ``<vscale x 4 x i32>`` | Vector with a multiple of 4 32-bit integer values. |
3533 +------------------------+----------------------------------------------------+
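
Most arithmetic instructions accept vector operands and apply the operation
to each lane. As a brief sketch with hypothetical function names, the first
function adds two fixed-length vectors, the second adds two scalable vectors,
and the third reads a single lane.

.. code-block:: llvm

    define <4 x i32> @add4(<4 x i32> %a, <4 x i32> %b) {
      %sum = add <4 x i32> %a, %b          ; four independent 32-bit additions
      ret <4 x i32> %sum
    }

    define <vscale x 4 x i32> @addn(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
      %sum = add <vscale x 4 x i32> %a, %b ; vscale * 4 lanes, known only at run time
      ret <vscale x 4 x i32> %sum
    }

    define i32 @lane0(<4 x i32> %v) {
      %x = extractelement <4 x i32> %v, i32 0
      ret i32 %x
    }
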

.. _t_label:

Label Type
^^^^^^^^^^

The label type represents code labels.

Token Type
^^^^^^^^^^

The token type is used when a value is associated with an instruction
3558 but all uses of the value must not attempt to introspect or obscure it.
3559 As such, it is not appropriate to have a :ref:`phi <i_phi>` or
3560 :ref:`select <i_select>` of type token.

.. _t_metadata:

Metadata Type
^^^^^^^^^^^^^

The metadata type represents embedded metadata. No derived types may be
3578 created from metadata except for :ref:`function <t_function>` arguments.

Aggregate Types
^^^^^^^^^^^^^^^

Aggregate Types are a subset of derived types that can contain multiple
member types. :ref:`Arrays <t_array>` and :ref:`structs <t_struct>` are
aggregate types. :ref:`Vectors <t_vector>` are not considered to be
aggregate types.

.. _t_array:

Array Type
""""""""""

The array type is a very simple derived type that arranges elements
sequentially in memory. The array type requires a size (number of
elements) and an underlying data type.

3611 [<# elements> x <elementtype>]
3613 The number of elements is a constant integer value; ``elementtype`` may
3614 be any type with a size.
3618 +------------------+--------------------------------------+
3619 | ``[40 x i32]``   | Array of 40 32-bit integer values.   |
3620 +------------------+--------------------------------------+
3621 | ``[41 x i32]``   | Array of 41 32-bit integer values.   |
3622 +------------------+--------------------------------------+
3623 | ``[4 x i8]``     | Array of 4 8-bit integer values.     |
3624 +------------------+--------------------------------------+
3626 Here are some examples of multidimensional arrays:
3628 +-----------------------------+----------------------------------------------------------+
3629 | ``[3 x [4 x i32]]``         | 3x4 array of 32-bit integer values.                      |
3630 +-----------------------------+----------------------------------------------------------+
3631 | ``[12 x [10 x float]]``     | 12x10 array of single precision floating-point values.   |
3632 +-----------------------------+----------------------------------------------------------+
3633 | ``[2 x [3 x [4 x i16]]]``   | 2x3x4 array of 16-bit integer values.                    |
3634 +-----------------------------+----------------------------------------------------------+
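As a minimal sketch of how a multidimensional array type is indexed, the
following module loads one element of a 3x4 array. The names ``@grid`` and
``@get`` are hypothetical, and the typed-pointer syntax used elsewhere in this
document is assumed. The leading ``i64 0`` index steps through the pointer to
the global; the next two indices select the row and the column.

.. code-block:: llvm

    ; Hypothetical 3x4 array of i32, zero-initialized.
    @grid = global [3 x [4 x i32]] zeroinitializer

    define i32 @get(i64 %row, i64 %col) {
      ; Compute the address of @grid[%row][%col].
      %p = getelementptr inbounds [3 x [4 x i32]], [3 x [4 x i32]]* @grid, i64 0, i64 %row, i64 %col
      %v = load i32, i32* %p
      ret i32 %v
    }
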
3636 There is no restriction on indexing beyond the end of the array implied
3637 by a static type (though there are restrictions on indexing beyond the
3638 bounds of an allocated object in some cases). This means that
3639 single-dimension 'variable sized array' addressing can be implemented in
3640 LLVM with a zero length array type. An implementation of 'pascal style
3641 arrays' in LLVM could use the type "``{ i32, [0 x float]}``", for
3651 The structure type is used to represent a collection of data members
3652 together in memory. The elements of a structure may be any type that has
3655 Structures in memory are accessed using '``load``' and '``store``' by
3656 getting a pointer to a field with the '``getelementptr``' instruction.
3657 Structures in registers are accessed using the '``extractvalue``' and
3658 '``insertvalue``' instructions.
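The two access styles described above can be sketched as follows. The type
``%pair`` and the function names are hypothetical and chosen only for
illustration; the typed-pointer syntax of this document is assumed.

.. code-block:: llvm

    %pair = type { i32, float }

    ; Structure in memory: form a pointer to field 1, then load through it.
    define float @second_from_memory(%pair* %p) {
      %field = getelementptr inbounds %pair, %pair* %p, i32 0, i32 1
      %v = load float, float* %field
      ret float %v
    }

    ; Structure in a register: read field 1 directly.
    define float @second_from_register(%pair %agg) {
      %v = extractvalue %pair %agg, 1
      ret float %v
    }

    ; Structure in a register: produce a copy with field 0 replaced.
    define %pair @set_first(%pair %agg, i32 %x) {
      %updated = insertvalue %pair %agg, i32 %x, 0
      ret %pair %updated
    }
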
3660 Structures may optionally be "packed" structures, which indicate that
3661 the alignment of the struct is one byte, and that there is no padding
3662 between the elements. In non-packed structs, padding between field types
3663 is inserted as defined by the DataLayout string in the module, which is
3664 required to match what the underlying code generator expects.
3666 Structures can either be "literal" or "identified". A literal structure
3667 is defined inline with other types (e.g. ``{i32, i32}*``) whereas
3668 identified types are always defined at the top level with a name.
3669 Literal types are uniqued by their contents and can never be recursive
3670 or opaque since there is no way to write one. Identified types can be
3671 recursive, can be opaque, and are never uniqued.
3677 %T1 = type { <type list> } ; Identified normal struct type
3678 %T2 = type <{ <type list> }> ; Identified packed struct type
3682 +------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3683 | ``{ i32, i32, i32 }`` | A triple of three ``i32`` values |
3684 +------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3685 | ``{ float, i32 (i32) * }`` | A pair, where the first element is a ``float`` and the second element is a :ref:`pointer <t_pointer>` to a :ref:`function <t_function>` that takes an ``i32``, returning an ``i32``. |
3686 +------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3687 | ``<{ i8, i32 }>`` | A packed struct known to be 5 bytes in size. |
3688 +------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
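To illustrate the distinction between literal and identified structures, here
is a small sketch. All names (``%node``, ``%header``, ``@origin``,
``@empty_list``, ``@hdr``) are hypothetical. It shows an identified type that
refers to itself, an identified packed type, and a literal type written inline.

.. code-block:: llvm

    ; Identified struct: named at the top level, so it may be recursive.
    %node = type { i32, %node* }

    ; Identified packed struct: no padding between the i8 and the i32.
    %header = type <{ i8, i32 }>

    ; Literal struct type, written inline where it is used.
    @origin = global { i32, i32 } { i32 0, i32 0 }

    @empty_list = global %node* null
    @hdr = global %header <{ i8 1, i32 7 }>
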
3692 Opaque Structure Types
3693 """"""""""""""""""""""
3697 Opaque structure types are used to represent structure types that
3698 do not have a body specified. This corresponds (for example) to the C
3699 notion of a forward declared structure. They can be named (``%X``) or
3711 +--------------+-------------------+
3712 | ``opaque``   | An opaque type.   |
3713 +--------------+-------------------+
3720 LLVM has several different basic types of constants. This section
3721 describes them all and their syntax.
3726 **Boolean constants**
3727 The two strings '``true``' and '``false``' are both valid constants
3729 **Integer constants**
3730 Standard integers (such as '4') are constants of the
3731 :ref:`integer <t_integer>` type. Negative numbers may be used with
3733 **Floating-point constants**
3734 Floating-point constants use standard decimal notation (e.g.
3735 123.421), exponential notation (e.g. 1.23421e+2), or a more precise
3736 hexadecimal notation (see below). The assembler requires the exact
3737 decimal value of a floating-point constant. For example, the
3738 assembler accepts 1.25 but rejects 1.3 because 1.3 is a repeating
3739 decimal in binary. Floating-point constants must have a
3740 :ref:`floating-point <t_floating>` type.
3741 **Null pointer constants**
3742 The identifier '``null``' is recognized as a null pointer constant
3743 and must be of :ref:`pointer type <t_pointer>`.
3745 The identifier '``none``' is recognized as an empty token constant
3746 and must be of :ref:`token type <t_token>`.
3748 The one non-intuitive notation for constants is the hexadecimal form of
3749 floating-point constants. For example, the form
3750 '``double 0x432ff973cafa8000``' is equivalent to (but harder to read
3751 than) '``double 4.5e+15``'. The only time hexadecimal floating-point
3752 constants are required (and the only time that they are generated by the
3753 disassembler) is when a floating-point constant must be emitted but it
3754 cannot be represented as a decimal floating-point number in a reasonable
3755 number of digits. For example, NaNs, infinities, and other special
3756 values are represented in their IEEE hexadecimal format so that assembly
3757 and disassembly do not cause any bits to change in the constants.
3759 When using the hexadecimal form, constants of types bfloat, half, float, and
3760 double are represented using the 16-digit form shown above (which matches the
3761 IEEE754 representation for double); bfloat, half and float values must, however,
3762 be exactly representable as bfloat, IEEE 754 half, and IEEE 754 single
3763 precision respectively. Hexadecimal format is always used for long double, and
3764 there are three forms of long double. The 80-bit format used by x86 is
3765 represented as ``0xK`` followed by 20 hexadecimal digits. The 128-bit format
3766 used by PowerPC (two adjacent doubles) is represented by ``0xM`` followed by 32
3767 hexadecimal digits. The IEEE 128-bit format is represented by ``0xL`` followed
3768 by 32 hexadecimal digits. Long doubles will only work if they match the long
3769 double format on your target. The IEEE 16-bit format (half precision) is
3770 represented by ``0xH`` followed by 4 hexadecimal digits. The bfloat 16-bit
3771 format is represented by ``0xR`` followed by 4 hexadecimal digits. All
3772 hexadecimal formats are big-endian (sign bit at the left).
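The following sketch shows a few of the hexadecimal forms side by side. The
global names are hypothetical; the bit patterns are the usual IEEE 754 (and
bfloat / x87 extended) encodings of the values named in the comments. As noted
above, the ``x86_fp80`` constant is only meaningful on a target whose long
double uses that format.

.. code-block:: llvm

    ; 16-digit form (double encoding), used for double and float.
    @pos_inf   = global double 0x7FF0000000000000          ; +infinity
    @quiet_nan = global float 0x7FF8000000000000           ; a quiet NaN

    ; Prefixed forms for the other floating-point types.
    @one_half  = global half 0xH3C00                       ; 1.0
    @one_bf    = global bfloat 0xR3F80                     ; 1.0
    @one_x87   = global x86_fp80 0xK3FFF8000000000000000   ; 1.0

    ; Decimal notation is accepted when the value is exact.
    @exact     = global double 1.25
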
3774 There are no constants of type x86_mmx or x86_amx.
3776 .. _complexconstants:
3781 Complex constants are a (potentially recursive) combination of simple
3782 constants and smaller complex constants.
3784 **Structure constants**
3785 Structure constants are represented with notation similar to
3786 structure type definitions (a comma separated list of elements,
3787 surrounded by braces (``{}``)). For example:
3788 "``{ i32 4, float 17.0, i32* @G }``", where "``@G``" is declared as
3789 "``@G = external global i32``". Structure constants must have
3790 :ref:`structure type <t_struct>`, and the number and types of elements
3791 must match those specified by the type.
3793 Array constants are represented with notation similar to array type
3794 definitions (a comma separated list of elements, surrounded by
3795 square brackets (``[]``)). For example:
3796 "``[ i32 42, i32 11, i32 74 ]``". Array constants must have
3797 :ref:`array type <t_array>`, and the number and types of elements must
3798 match those specified by the type. As a special case, character array
3799 constants may also be represented as a double-quoted string using the ``c``
3800 prefix. For example: "``c"Hello World\0A\00"``".
3801 **Vector constants**
3802 Vector constants are represented with notation similar to vector
3803 type definitions (a comma separated list of elements, surrounded by
3804 less-than/greater-than signs (``<>``)). For example:
3805 "``< i32 42, i32 11, i32 74, i32 100 >``". Vector constants
3806 must have :ref:`vector type <t_vector>`, and the number and types of
3807 elements must match those specified by the type.
3808 **Zero initialization**
3809 The string '``zeroinitializer``' can be used to zero-initialize a
3810 value of *any* type, including scalar and
3811 :ref:`aggregate <t_aggregate>` types. This is often used to avoid
3812 having to print large zero initializers (e.g. for large arrays) and
3813 is always exactly equivalent to using explicit zero initializers.
3815 A metadata node is a constant tuple without types. For example:
3816 "``!{!0, !{!2, !0}, !"test"}``". Metadata can reference constant values,
3817 for example: "``!{!0, i32 0, i8* @global, i64 (i64)* @function, !"str"}``".
3818 Unlike other typed constants that are meant to be interpreted as part of
3819 the instruction stream, metadata is a place to attach additional
3820 information such as debug info.
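The complex constant forms above can be seen together in the following sketch,
which reuses the example values from this section as global initializers. The
global names other than ``@G`` are hypothetical.

.. code-block:: llvm

    @G = external global i32

    ; Structure constant.
    @s = global { i32, float, i32* } { i32 4, float 17.0, i32* @G }

    ; Array constant, and the string shorthand for a character array
    ; (11 characters, a newline, and a terminating zero byte: 13 x i8).
    @a = global [3 x i32] [ i32 42, i32 11, i32 74 ]
    @msg = constant [13 x i8] c"Hello World\0A\00"

    ; Vector constant.
    @v = global <4 x i32> < i32 42, i32 11, i32 74, i32 100 >

    ; Zero initialization of a large aggregate.
    @z = global [1024 x i32] zeroinitializer
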
3822 Global Variable and Function Addresses
3823 --------------------------------------
3825 The addresses of :ref:`global variables <globalvars>` and
3826 :ref:`functions <functionstructure>` are always implicitly valid
3827 (link-time) constants. These constants are explicitly referenced when
3828 the :ref:`identifier for the global <identifiers>` is used and always have
3829 :ref:`pointer <t_pointer>` type. For example, the following is a legal LLVM
3832 .. code-block:: llvm
3836 @Z = global [2 x i32*] [ i32* @X, i32* @Y ]
3843 The string '``undef``' can be used anywhere a constant is expected, and
3844 indicates that the user of the value may receive an unspecified
3845 bit-pattern. Undefined values may be of any type (other than '``label``'
3846 or '``void``') and be used anywhere a constant is permitted.
3848 Undefined values are useful because they indicate to the compiler that
3849 the program is well defined no matter what value is used. This gives the
3850 compiler more freedom to optimize. Here are some examples of
3851 (potentially surprising) transformations that are valid (in pseudo IR):
3853 .. code-block:: llvm
3863 This is safe because all of the output bits are affected by the undef
3864 bits. Any output bit can have a zero or one depending on the input bits.
3866 .. code-block:: llvm
3874 %A = %X ;; By choosing undef as 0
3875 %B = %X ;; By choosing undef as -1
3880 These logical operations have bits that are not always affected by the
3881 input. For example, if ``%X`` has a zero bit, then the output of the
3882 '``and``' operation will always be a zero for that bit, no matter what
3883 the corresponding bit from the '``undef``' is. As such, it is unsafe to
3884 optimize or assume that the result of the '``and``' is '``undef``'.
3885 However, it is safe to assume that all bits of the '``undef``' could be
3886 0, and optimize the '``and``' to 0. Likewise, it is safe to assume that
3887 all the bits of the '``undef``' operand to the '``or``' could be set,
3888 allowing the '``or``' to be folded to -1.
3890 .. code-block:: llvm
3892 %A = select undef, %X, %Y
3893 %B = select undef, 42, %Y
3894 %C = select %X, %Y, undef
3904 This set of examples shows that undefined '``select``' (and conditional
3905 branch) conditions can go *either way*, but they have to come from one
3906 of the two operands. In the ``%A`` example, if ``%X`` and ``%Y`` were
3907 both known to have a clear low bit, then ``%A`` would have to have a
3908 cleared low bit. However, in the ``%C`` example, the optimizer is
3909 allowed to assume that the '``undef``' operand could be the same as
3910 ``%Y``, allowing the whole '``select``' to be eliminated.
3912 .. code-block:: llvm
3914 %A = xor undef, undef
3931 This example points out that two '``undef``' operands are not
3932 necessarily the same. This matches C semantics, but it can surprise people
3933 who assume that "``X^X``" is always zero, even if
3934 ``X`` is undefined. This isn't true for a number of reasons, but the
3935 short answer is that an '``undef``' "variable" can arbitrarily change
3936 its value over its "live range". This is true because the variable
3937 doesn't actually *have a live range*. Instead, the value is logically
3938 read from arbitrary registers that happen to be around when needed, so
3939 the value is not necessarily consistent over time. In fact, ``%A`` and
3940 ``%C`` need to have the same semantics or the core LLVM "replace all
3941 uses with" concept would not hold.
3943 To ensure all uses of a given register observe the same value (even if
3944 '``undef``'), the :ref:`freeze instruction <i_freeze>` can be used.
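As a minimal sketch (the function name ``@xor_frozen`` is hypothetical),
freezing an ``undef`` once and reusing the frozen register makes the
"``X^X``" identity hold again:

.. code-block:: llvm

    define i32 @xor_frozen() {
      %x = freeze i32 undef   ; some arbitrary, but fixed, i32 value
      %r = xor i32 %x, %x     ; both operands observe the same value
      ret i32 %r              ; so the result is 0
    }
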
3946 .. code-block:: llvm
3954 These examples show the crucial difference between an *undefined value*
3955 and *undefined behavior*. An undefined value (like '``undef``') is
3956 allowed to have an arbitrary bit-pattern. This means that the ``%A``
3957 operation can be constant folded to '``0``', because the '``undef``'
3958 could be zero, and zero divided by any value is zero.
3959 However, in the second example, we can make a more aggressive
3960 assumption: because the ``undef`` is allowed to be an arbitrary value,
3961 we are allowed to assume that it could be zero. Since a divide by zero
3962 has *undefined behavior*, we are allowed to assume that the operation
3963 does not execute at all. This allows us to delete the divide and all
3964 code after it. Because the undefined operation "can't happen", the
3965 optimizer can assume that it occurs in dead code.
3967 .. code-block:: text
3969 a: store undef -> %X
3970 b: store %X -> undef
3975 A store *of* an undefined value can be assumed to not have any effect;
3976 we can assume that the value is overwritten with bits that happen to
3977 match what was already there. However, a store *to* an undefined
3978 location could clobber arbitrary memory, therefore, it has undefined
3981 Branching on an undefined value is undefined behavior.
3982 This explains optimizations that depend on branch conditions to construct
3983 predicates, such as Correlated Value Propagation and Global Value Numbering.
3984 In the case of a switch instruction, the condition should be frozen; otherwise
3985 the behavior is undefined.
3987 .. code-block:: llvm
3990 br undef, BB1, BB2 ; UB
3992 %X = and i32 undef, 255
3993 switch %X, label %ret [ .. ] ; UB
3995 store undef, i8* %ptr
3996 %X = load i8* %ptr ; %X is undef
3997 switch i8 %X, label %ret [ .. ] ; UB
4000 %X = or i8 undef, 255 ; always 255
4001 switch i8 %X, label %ret [ .. ] ; Well-defined
4003 %X = freeze i1 undef
4004 br %X, BB1, BB2 ; Well-defined (non-deterministic jump)
4007 This is also consistent with the behavior of MemorySanitizer.
4008 MemorySanitizer, a detector of uses of uninitialized memory,
4009 defines a branch with a condition that depends on an undef value (or
4010 certain other values, e.g. the result of a load from heap-allocated
4011 memory that has never been stored to) to have an externally visible
4012 side effect. For this reason, functions with the ``sanitize_memory``
4013 attribute are not allowed to produce such branches "out of thin
4014 air". More strictly, an optimization that inserts a conditional branch
4015 is only valid if in all executions where the branch condition has at
4016 least one undefined bit, the same branch condition is evaluated in the
4024 A poison value is a result of an erroneous operation.
4025 In order to facilitate speculative execution, many instructions do not
4026 invoke immediate undefined behavior when provided with illegal operands,
4027 and return a poison value instead.
4028 The string '``poison``' can be used anywhere a constant is expected, and
4029 operations such as :ref:`add <i_add>` with the ``nsw`` flag can produce
4032 Poison value behavior is defined in terms of value *dependence*:
4034 - Values other than :ref:`phi <i_phi>` nodes, :ref:`select <i_select>`, and
4035 :ref:`freeze <i_freeze>` instructions depend on their operands.
4036 - :ref:`Phi <i_phi>` nodes depend on the operand corresponding to
4037 their dynamic predecessor basic block.
4038 - :ref:`Select <i_select>` instructions depend on their condition operand and
4039 their selected operand.
4040 - Function arguments depend on the corresponding actual argument values
4041 in the dynamic callers of their functions.
4042 - :ref:`Call <i_call>` instructions depend on the :ref:`ret <i_ret>`
4043 instructions that dynamically transfer control back to them.
4044 - :ref:`Invoke <i_invoke>` instructions depend on the
4045 :ref:`ret <i_ret>`, :ref:`resume <i_resume>`, or exception-throwing
4046 call instructions that dynamically transfer control back to them.
4047 - Non-volatile loads and stores depend on the most recent stores to all
4048 of the referenced memory addresses, following the order in the IR
4049 (including loads and stores implied by intrinsics such as
4050 :ref:`@llvm.memcpy <int_memcpy>`).
4051 - An instruction with externally visible side effects depends on the
4052 most recent preceding instruction with externally visible side
4053 effects, following the order in the IR. (This includes :ref:`volatile
4054 operations <volatile>`.)
4055 - An instruction *control-depends* on a :ref:`terminator
4056 instruction <terminators>` if the terminator instruction has
4057 multiple successors and the instruction is always executed when
4058 control transfers to one of the successors, and may not be executed
4059 when control is transferred to another.
4060 - Additionally, an instruction also *control-depends* on a terminator
4061 instruction if the set of instructions it otherwise depends on would
4062 be different if the terminator had transferred control to a different
4064 - Dependence is transitive.
4065 - Vector elements may be independently poisoned. Therefore, transforms
4066 on instructions such as shufflevector must be careful to propagate
4067 poison across values or elements only as allowed by the original code.
4069 An instruction that *depends* on a poison value produces a poison value
4070 itself. A poison value may be relaxed into an
4071 :ref:`undef value <undefvalues>`, which takes an arbitrary bit-pattern.
4072 Propagation of poison can be stopped with the
4073 :ref:`freeze instruction <i_freeze>`.
4075 This means that immediate undefined behavior occurs if a poison value is
4076 used as an instruction operand for which some possible value would trigger
4077 undefined behavior. Notably this includes (but is not limited to):
4079 - The pointer operand of a :ref:`load <i_load>`, :ref:`store <i_store>` or
4080 any other pointer dereferencing instruction (independent of address
4082 - The divisor operand of a ``udiv``, ``sdiv``, ``urem`` or ``srem``
4084 - The condition operand of a :ref:`br <i_br>` instruction.
4085 - The callee operand of a :ref:`call <i_call>` or :ref:`invoke <i_invoke>`
4087 - The parameter operand of a :ref:`call <i_call>` or :ref:`invoke <i_invoke>`
4088 instruction, when the function or invoking call site has a ``noundef``
4089 attribute in the corresponding position.
4090 - The operand of a :ref:`ret <i_ret>` instruction if the function or invoking
4091 call site has a ``noundef`` attribute in the return value position.
4093 Here are some examples:
4095 .. code-block:: llvm
4098 %poison = sub nuw i32 0, 1 ; Results in a poison value.
4099 %poison2 = sub i32 poison, 1 ; Also results in a poison value.
4100 %still_poison = and i32 %poison, 0 ; 0, but also poison.
4101 %poison_yet_again = getelementptr i32, i32* @h, i32 %still_poison
4102 store i32 0, i32* %poison_yet_again ; Undefined behavior due to
4105 store i32 %poison, i32* @g ; Poison value stored to memory.
4106 %poison3 = load i32, i32* @g ; Poison value loaded back from memory.
4108 %narrowaddr = bitcast i32* @g to i16*
4109 %wideaddr = bitcast i32* @g to i64*
4110 %poison4 = load i16, i16* %narrowaddr ; Returns a poison value.
4111 %poison5 = load i64, i64* %wideaddr ; Returns a poison value.
4113 %cmp = icmp slt i32 %poison, 0 ; Returns a poison value.
4114 br i1 %cmp, label %end, label %end ; undefined behavior
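
As noted earlier in this section, propagation of poison can be stopped with
``freeze``. The following is a minimal sketch; the function name
``@frozen_inc`` is hypothetical.

.. code-block:: llvm

    define i32 @frozen_inc(i32 %a) {
      %inc = add nsw i32 %a, 1     ; poison if %a is the largest signed i32
      %fr = freeze i32 %inc        ; arbitrary but fixed value if %inc was poison
      %masked = and i32 %fr, 255   ; depends only on %fr, so it is never poison
      ret i32 %masked
    }
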
4118 .. _welldefinedvalues:
4123 Given a program execution, a value is *well defined* if the value does not
4124 have an undef bit and is not poison in the execution.
4125 An aggregate value or vector is well defined if its elements are well defined.
4126 The padding of an aggregate isn't considered, since it isn't visible
4127 without storing it into memory and loading it with a different type.
4129 A constant of a :ref:`single value <t_single_value>`, non-vector type is well
4130 defined if it is neither an '``undef``' constant nor a '``poison``' constant.
4131 The result of a :ref:`freeze instruction <i_freeze>` is well defined regardless
4136 Addresses of Basic Blocks
4137 -------------------------
4139 ``blockaddress(@function, %block)``
4141 The '``blockaddress``' constant computes the address of the specified
4142 basic block in the specified function.
4144 It always has an ``i8 addrspace(P)*`` type, where ``P`` is the address space
4145 of the function containing ``%block`` (usually ``addrspace(0)``).
4147 Taking the address of the entry block is illegal.
4149 This value only has defined behavior when used as an operand to the
4150 ':ref:`indirectbr <i_indirectbr>`' or ':ref:`callbr <i_callbr>`' instruction, or
4151 for comparisons against null. Pointer equality tests between label addresses
4152 result in undefined behavior --- though, again, comparison against null is ok,
4153 and no label is equal to the null pointer. This may be passed around as an
4154 opaque pointer sized value as long as the bits are not inspected. This
4155 allows ``ptrtoint`` and arithmetic to be performed on these values so
4156 long as the original value is reconstituted before the ``indirectbr`` or
4157 ``callbr`` instruction.
4159 Finally, some targets may provide defined semantics when using the value
4160 as the operand to an inline assembly, but that is target specific.
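A minimal sketch of the intended use, with hypothetical names (``@dispatch``,
``%one``, ``%two``): a block address is selected as an ``i8*`` value and then
consumed by ``indirectbr``, which lists every block it may jump to.

.. code-block:: llvm

    define i32 @dispatch(i1 %c) {
    entry:
      ; Pick one of two block addresses; neither is the entry block.
      %dest = select i1 %c, i8* blockaddress(@dispatch, %one),
                            i8* blockaddress(@dispatch, %two)
      indirectbr i8* %dest, [label %one, label %two]

    one:
      ret i32 1

    two:
      ret i32 2
    }
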
4162 .. _dso_local_equivalent:
4164 DSO Local Equivalent
4165 --------------------
4167 ``dso_local_equivalent @func``
4169 A '``dso_local_equivalent``' constant represents a function which is
4170 functionally equivalent to a given function, but is always defined in the
4171 current linkage unit. The resulting pointer has the same type as the underlying
4172 function. The resulting pointer is permitted, but not required, to be different
4173 from a pointer to the function, and it may have different values in different
4176 The target function may not have ``extern_weak`` linkage.
4178 ``dso_local_equivalent`` can be implemented as such:
4180 - If the function has local linkage, hidden visibility, or is
4181 ``dso_local``, ``dso_local_equivalent`` can be implemented as simply a pointer
4183 - ``dso_local_equivalent`` can be implemented with a stub that tail-calls the
4184 function. Many targets support relocations that resolve at link time to either
4185 a function or a stub for it, depending on whether the function is defined within the
4186 linkage unit; LLVM will use this when available. (This is commonly called a
4187 "PLT stub".) On other targets, the stub may need to be emitted explicitly.
4189 This can be used wherever a ``dso_local`` instance of a function is needed without
4190 needing to explicitly make the original function ``dso_local``. An instance where
4191 this can be used is for static offset calculations between a function and some other
4192 ``dso_local`` symbol. This is especially useful for the Relative VTables C++ ABI,
4193 where dynamic relocations for function pointers in VTables can be replaced with
4194 static relocations for offsets between the VTable and virtual functions which
4195 may not be ``dso_local``.
4197 This is currently only supported for ELF binary formats.
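The static offset calculation mentioned above might look like the following
sketch. The names ``@vfunc`` and ``@vtable`` are hypothetical, and the
one-entry table is a simplification of what a relative-vtable layout would
actually contain. The entry is the 32-bit distance from the table to a
``dso_local`` equivalent of the function.

.. code-block:: llvm

    ; May be defined in another linkage unit, so it is not dso_local.
    declare void @vfunc()

    @vtable = dso_local unnamed_addr constant [1 x i32] [
      i32 trunc (i64 sub (i64 ptrtoint (void ()* dso_local_equivalent @vfunc to i64),
                          i64 ptrtoint ([1 x i32]* @vtable to i64)) to i32)
    ]
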
4201 Constant Expressions
4202 --------------------
4204 Constant expressions are used to allow expressions involving other
4205 constants to be used as constants. Constant expressions may be of any
4206 :ref:`first class <t_firstclass>` type and may involve any LLVM operation
4207 that does not have side effects (e.g. load and call are not supported).
4208 The following is the syntax for constant expressions:
4210 ``trunc (CST to TYPE)``
4211 Perform the :ref:`trunc operation <i_trunc>` on constants.
4212 ``zext (CST to TYPE)``
4213 Perform the :ref:`zext operation <i_zext>` on constants.
4214 ``sext (CST to TYPE)``
4215 Perform the :ref:`sext operation <i_sext>` on constants.
4216 ``fptrunc (CST to TYPE)``
4217 Truncate a floating-point constant to another floating-point type.
4218 The size of CST must be larger than the size of TYPE. Both types
4219 must be floating-point.
4220 ``fpext (CST to TYPE)``
4221 Floating-point extend a constant to another type. The size of CST
4222 must be smaller than or equal to the size of TYPE. Both types must be
4224 ``fptoui (CST to TYPE)``
4225 Convert a floating-point constant to the corresponding unsigned
4226 integer constant. TYPE must be a scalar or vector integer type. CST
4227 must be of scalar or vector floating-point type. Both CST and TYPE
4228 must be scalars, or vectors of the same number of elements. If the
4229 value won't fit in the integer type, the result is a
4230 :ref:`poison value <poisonvalues>`.
4231 ``fptosi (CST to TYPE)``
4232 Convert a floating-point constant to the corresponding signed
4233 integer constant. TYPE must be a scalar or vector integer type. CST
4234 must be of scalar or vector floating-point type. Both CST and TYPE
4235 must be scalars, or vectors of the same number of elements. If the
4236 value won't fit in the integer type, the result is a
4237 :ref:`poison value <poisonvalues>`.
4238 ``uitofp (CST to TYPE)``
4239 Convert an unsigned integer constant to the corresponding
4240 floating-point constant. TYPE must be a scalar or vector floating-point
4241 type. CST must be of scalar or vector integer type. Both CST and TYPE must
4242 be scalars, or vectors of the same number of elements.
4243 ``sitofp (CST to TYPE)``
4244 Convert a signed integer constant to the corresponding floating-point
4245 constant. TYPE must be a scalar or vector floating-point type.
4246 CST must be of scalar or vector integer type. Both CST and TYPE must
4247 be scalars, or vectors of the same number of elements.
4248 ``ptrtoint (CST to TYPE)``
4249 Perform the :ref:`ptrtoint operation <i_ptrtoint>` on constants.
4250 ``inttoptr (CST to TYPE)``
4251 Perform the :ref:`inttoptr operation <i_inttoptr>` on constants.
4252 This one is *really* dangerous!
4253 ``bitcast (CST to TYPE)``
4254 Convert a constant, CST, to another TYPE.
4255 The constraints of the operands are the same as those for the
4256 :ref:`bitcast instruction <i_bitcast>`.
4257 ``addrspacecast (CST to TYPE)``
4258 Convert a constant pointer or constant vector of pointers, CST, to another
4259 TYPE in a different address space. The constraints of the operands are the
4260 same as those for the :ref:`addrspacecast instruction <i_addrspacecast>`.
4261 ``getelementptr (TY, CSTPTR, IDX0, IDX1, ...)``, ``getelementptr inbounds (TY, CSTPTR, IDX0, IDX1, ...)``
4262 Perform the :ref:`getelementptr operation <i_getelementptr>` on
4263 constants. As with the :ref:`getelementptr <i_getelementptr>`
4264 instruction, the index list may have one or more indexes, which are
4265 required to make sense for the type of "pointer to TY".
4266 ``select (COND, VAL1, VAL2)``
4267 Perform the :ref:`select operation <i_select>` on constants.
4268 ``icmp COND (VAL1, VAL2)``
4269 Perform the :ref:`icmp operation <i_icmp>` on constants.
4270 ``fcmp COND (VAL1, VAL2)``
4271 Perform the :ref:`fcmp operation <i_fcmp>` on constants.
4272 ``extractelement (VAL, IDX)``
4273 Perform the :ref:`extractelement operation <i_extractelement>` on
4275 ``insertelement (VAL, ELT, IDX)``
4276 Perform the :ref:`insertelement operation <i_insertelement>` on
4278 ``shufflevector (VEC1, VEC2, IDXMASK)``
4279 Perform the :ref:`shufflevector operation <i_shufflevector>` on
4281 ``extractvalue (VAL, IDX0, IDX1, ...)``
4282 Perform the :ref:`extractvalue operation <i_extractvalue>` on
4283 constants. The index list is interpreted in a similar manner as
4284 indices in a ':ref:`getelementptr <i_getelementptr>`' operation. At
4285 least one index value must be specified.
4286 ``insertvalue (VAL, ELT, IDX0, IDX1, ...)``
4287 Perform the :ref:`insertvalue operation <i_insertvalue>` on constants.
4288 The index list is interpreted in a similar manner as indices in a
4289 ':ref:`getelementptr <i_getelementptr>`' operation. At least one index
4290 value must be specified.
4291 ``OPCODE (LHS, RHS)``
4292 Perform the specified operation on the LHS and RHS constants. OPCODE
4293 may be any of the :ref:`binary <binaryops>` or :ref:`bitwise
4294 binary <bitwiseops>` operations. The constraints on operands are
4295 the same as those for the corresponding instruction (e.g. no bitwise
4296 operations on floating-point values are allowed).
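A few of these forms are sketched below as global initializers, which is where
constant expressions most often appear. The global names are hypothetical and
the typed-pointer syntax of this document is assumed.

.. code-block:: llvm

    @arr = global [4 x i32] [ i32 1, i32 2, i32 3, i32 4 ]
    @x = global i32 0

    ; Address of @arr[2], computed as a constant.
    @third = global i32* getelementptr inbounds ([4 x i32], [4 x i32]* @arr, i64 0, i64 2)

    ; Pointer casts on a global's address.
    @x_addr = global i64 ptrtoint (i32* @x to i64)
    @x_bytes = global i8* bitcast (i32* @x to i8*)

    ; A binary operation on two constants.
    @sum = global i32 add (i32 40, i32 2)
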
4303 Inline Assembler Expressions
4304 ----------------------------
4306 LLVM supports inline assembler expressions (as opposed to :ref:`Module-Level
4307 Inline Assembly <moduleasm>`) through the use of a special value. This value
4308 represents the inline assembler as a template string (containing the
4309 instructions to emit), a list of operand constraints (stored as a string), a
4310 flag that indicates whether or not the inline asm expression has side effects,
4311 and a flag indicating whether the function containing the asm needs to align its
4312 stack conservatively.
4314 The template string supports argument substitution of the operands using "``$``"
4315 followed by a number, to indicate substitution of the given register/memory
4316 location, as specified by the constraint string. "``${NUM:MODIFIER}``" may also
4317 be used, where ``MODIFIER`` is a target-specific annotation for how to print the
4318 operand (see :ref:`inline-asm-modifiers`).
4320 A literal "``$``" may be included by using "``$$``" in the template. To include
4321 other special characters into the output, the usual "``\XX``" escapes may be
4322 used, just as in other strings. Note that after template substitution, the
4323 resulting assembly string is parsed by LLVM's integrated assembler unless it is
4324 disabled -- even when emitting a ``.s`` file -- and thus must contain assembly
4325 syntax known to LLVM.
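As an illustration of the "``$$``" escape, the sketch below assumes an x86
target and the default AT&T dialect, where immediates are written with a
leading ``$``. The function name ``@forty_two`` is hypothetical. In the
template, ``$$42`` emits the literal text ``$42``, while ``$0`` is replaced by
the register chosen for the output operand.

.. code-block:: llvm

    define i32 @forty_two() {
      ; "=r" requests a register output, referred to as $0 in the template.
      %v = call i32 asm "movl $$42, $0", "=r"()
      ret i32 %v
    }
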
4327 LLVM also supports a few more substitutions useful for writing inline assembly:
4329 - ``${:uid}``: Expands to a decimal integer unique to this inline assembly blob.
4330 This substitution is useful when declaring a local label. Many standard
4331 compiler optimizations, such as inlining, may duplicate an inline asm blob.
4332 Adding a blob-unique identifier ensures that the two labels will not conflict
4333 during assembly. This is used to implement `GCC's %= special format
4334 string <https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html>`_.
4335 - ``${:comment}``: Expands to the comment character of the current target's
4336 assembly dialect. This is usually ``#``, but many targets use other strings,
4337 such as ``;``, ``//``, or ``!``.
4338 - ``${:private}``: Expands to the assembler private label prefix. Labels with
4339 this prefix will not appear in the symbol table of the assembled object.
  Typically the prefix is ``L``, but targets may use other strings. ``.L`` is
  relatively popular.
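
A minimal sketch of how these substitutions might be combined to build a local
label that survives duplication of the asm blob. It assumes an x86 target with
AT&T syntax and a positive ``%n``; the function name ``@spin_down`` and the
label stem ``loop`` are hypothetical:

.. code-block:: llvm

    define i32 @spin_down(i32 %n) {
      ; ${:private} expands to the private label prefix and ${:uid} to a number
      ; unique to this blob, so every copy of the asm gets its own label.
      ; The input is tied to the output ("0") because decl modifies the register.
      %r = call i32 asm "${:private}loop${:uid}:\0A\09decl $0\0A\09jnz ${:private}loop${:uid}", "=r,0,~{flags}"(i32 %n)
      ret i32 %r
    }
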
4343 LLVM's support for inline asm is modeled closely on the requirements of Clang's
4344 GCC-compatible inline-asm support. Thus, the feature-set and the constraint and
4345 modifier codes listed here are similar or identical to those in GCC's inline asm
4346 support. However, to be clear, the syntax of the template and constraint strings
4347 described here is *not* the same as the syntax accepted by GCC and Clang, and,
4348 while most constraint letters are passed through as-is by Clang, some get
translated to other codes when converting from the C source to the LLVM
assembly.
4352 An example inline assembler expression is:
4354 .. code-block:: llvm
4356 i32 (i32) asm "bswap $0", "=r,r"
4358 Inline assembler expressions may **only** be used as the callee operand
4359 of a :ref:`call <i_call>` or an :ref:`invoke <i_invoke>` instruction.
4360 Thus, typically we have:
4362 .. code-block:: llvm
4364 %X = call i32 asm "bswap $0", "=r,r"(i32 %Y)
4366 Inline asms with side effects not visible in the constraint list must be
4367 marked as having side effects. This is done through the use of the
4368 '``sideeffect``' keyword, like so:
4370 .. code-block:: llvm
4372 call void asm sideeffect "eieio", ""()
4374 In some cases inline asms will contain code that will not work unless
4375 the stack is aligned in some way, such as calls or SSE instructions on
4376 x86, yet will not contain code that does that alignment within the asm.
4377 The compiler should make conservative assumptions about what the asm
4378 might contain and should generate its usual stack alignment code in the
4379 prologue if the '``alignstack``' keyword is present:
4381 .. code-block:: llvm
4383 call void asm alignstack "eieio", ""()
4385 Inline asms also support using non-standard assembly dialects. The
4386 assumed dialect is ATT. When the '``inteldialect``' keyword is present,
4387 the inline asm is using the Intel dialect. Currently, ATT and Intel are
4388 the only supported dialects. An example is:
4390 .. code-block:: llvm
4392 call void asm inteldialect "eieio", ""()
4394 In the case that the inline asm might unwind the stack,
4395 the '``unwind``' keyword must be used, so that the compiler emits
4396 unwinding information:
4398 .. code-block:: llvm
4400 call void asm unwind "call func", ""()
4402 If the inline asm unwinds the stack and isn't marked with
4403 the '``unwind``' keyword, the behavior is undefined.
4405 If multiple keywords appear, the '``sideeffect``' keyword must come
4406 first, the '``alignstack``' keyword second, the '``inteldialect``' keyword
4407 third and the '``unwind``' keyword last.
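
For instance, a call combining several of these keywords might look like the
following sketch, which assumes an x86 target and uses a placeholder
instruction:

.. code-block:: llvm

    ; Keywords appear in the required order: sideeffect, alignstack,
    ; inteldialect.
    call void asm sideeffect alignstack inteldialect "nop", ""()
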
4409 Inline Asm Constraint String
4410 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4412 The constraint list is a comma-separated string, each element containing one or
4413 more constraint codes.
4415 For each element in the constraint list an appropriate register or memory
4416 operand will be chosen, and it will be made available to assembly template
string expansion as ``$0`` for the first constraint in the list, ``$1`` for the
second, etc.
4420 There are three different types of constraints, which are distinguished by a
4421 prefix symbol in front of the constraint code: Output, Input, and Clobber. The
4422 constraints must always be given in that order: outputs first, then inputs, then
4423 clobbers. They cannot be intermingled.
4425 There are also three different categories of constraint codes:
4427 - Register constraint. This is either a register class, or a fixed physical
4428 register. This kind of constraint will allocate a register, and if necessary,
4429 bitcast the argument or result to the appropriate type.
4430 - Memory constraint. This kind of constraint is for use with an instruction
4431 taking a memory operand. Different constraints allow for different addressing
4432 modes used by the target.
4433 - Immediate value constraint. This kind of constraint is for an integer or other
4434 immediate value which can be rendered directly into an instruction. The
4435 various target-specific constraints allow the selection of a value in the
4436 proper range for the instruction you wish to use it with.
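
To make the categories more concrete, the following sketch (assuming an x86
target with AT&T syntax; the value names are hypothetical) feeds the same
template once through a register constraint and once through an immediate
constraint:

.. code-block:: llvm

    ; "r": %x is placed in a register, giving e.g. "movl %ecx, %eax".
    %a = call i32 asm "movl $1, $0", "=r,r"(i32 %x)
    ; "i": the constant is rendered directly into the instruction,
    ; giving e.g. "movl $42, %eax".
    %b = call i32 asm "movl $1, $0", "=r,i"(i32 42)
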

Output constraints
""""""""""""""""""

Output constraints are specified by an "``=``" prefix (e.g. "``=r``"). This
4442 indicates that the assembly will write to this operand, and the operand will
4443 then be made available as a return value of the ``asm`` expression. Output
constraints do not consume an argument from the call instruction (except for
indirect outputs, described below).
4447 Normally, it is expected that no output locations are written to by the assembly
4448 expression until *all* of the inputs have been read. As such, LLVM may assign
4449 the same register to an output and an input. If this is not safe (e.g. if the
4450 assembly contains two instructions, where the first writes to one output, and
4451 the second reads an input and writes to a second output), then the "``&``"
4452 modifier must be used (e.g. "``=&r``") to specify that the output is an
4453 "early-clobber" output. Marking an output as "early-clobber" ensures that LLVM
will not use the same register for any inputs (other than an input tied to this
output).
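
A sketch of the situation described above, assuming an x86 target with AT&T
syntax and hypothetical value names: the first instruction writes the output
before the second instruction reads ``$2``, so the output is marked
early-clobber to keep it out of the input registers:

.. code-block:: llvm

    ; Without "&", LLVM could assign $0 and $2 to the same register, and the
    ; movl would overwrite %b before the addl reads it.
    %sum = call i32 asm "movl $1, $0\0Aaddl $2, $0", "=&r,r,r,~{flags}"(i32 %a, i32 %b)
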

Input constraints
"""""""""""""""""

Input constraints do not have a prefix -- just the constraint codes. Each input
4461 constraint will consume one argument from the call instruction. It is not
4462 permitted for the asm to write to any input register or memory location (unless
4463 that input is tied to an output). Note also that multiple inputs may all be
4464 assigned to the same register, if LLVM can determine that they necessarily all
4465 contain the same value.
4467 Instead of providing a Constraint Code, input constraints may also "tie"
4468 themselves to an output constraint, by providing an integer as the constraint
4469 string. Tied inputs still consume an argument from the call instruction, and
4470 take up a position in the asm template numbering as is usual -- they will simply
4471 be constrained to always use the same register as the output they've been tied
4472 to. For example, a constraint string of "``=r,0``" says to assign a register for
output, and use that register as an input as well (it being the 0'th
constraint).
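
As an illustration, assuming an x86 target (where ``bswap`` reverses the bytes
of its single register operand in place) and hypothetical value names, a tied
input lets one register carry both the incoming value and the result:

.. code-block:: llvm

    ; Operand 0 is the output; operand 1 is the input, tied to operand 0 by
    ; the constraint "0", so both refer to the same register.
    %swapped = call i32 asm "bswap $0", "=r,0"(i32 %value)
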
4476 It is permitted to tie an input to an "early-clobber" output. In that case, no
4477 *other* input may share the same register as the input tied to the early-clobber
4478 (even when the other input has the same value).
4480 You may only tie an input to an output which has a register constraint, not a
4481 memory constraint. Only a single input may be tied to an output.
4483 There is also an "interesting" feature which deserves a bit of explanation: if a
4484 register class constraint allocates a register which is too small for the value
4485 type operand provided as input, the input value will be split into multiple
4486 registers, and all of them passed to the inline asm.
4488 However, this feature is often not as useful as you might think.
Firstly, the registers are *not* guaranteed to be consecutive. So, on those
architectures that have instructions which operate on multiple consecutive
registers, this is not an appropriate way to support them. (e.g. the 32-bit
SparcV8 has a 64-bit load instruction that names a single 32-bit register. The
hardware then loads into both the named register and the next register. This
feature of inline asm would not be useful to support that.)
4497 A few of the targets provide a template string modifier allowing explicit access
4498 to the second register of a two-register operand (e.g. MIPS ``L``, ``M``, and
4499 ``D``). On such an architecture, you can actually access the second allocated
4500 register (yet, still, not any subsequent ones). But, in that case, you're still
4501 probably better off simply splitting the value into two separate operands, for
4502 clarity. (e.g. see the description of the ``A`` constraint on X86, which,
despite existing only for use with this feature, is not really a good idea to
use.)
4506 Indirect inputs and outputs
4507 """""""""""""""""""""""""""
4509 Indirect output or input constraints can be specified by the "``*``" modifier
4510 (which goes after the "``=``" in case of an output). This indicates that the asm
4511 will write to or read from the contents of an *address* provided as an input
4512 argument. (Note that in this way, indirect outputs act more like an *input* than
4513 an output: just like an input, they consume an argument of the call expression,
4514 rather than producing a return value. An indirect output constraint is an
4515 "output" only in that the asm is expected to write to the contents of the input
4516 memory location, instead of just read from it).
This is most typically used for memory constraints, e.g. "``=*m``", to pass the
address of a variable as a value.
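
A minimal sketch of an indirect memory output, assuming an x86 target and a
recent LLVM release in which pointers are opaque and indirect operands carry an
``elementtype`` attribute (older releases spelled the argument with a typed
pointer instead). ``%slot`` is a hypothetical pointer to an ``i32``:

.. code-block:: llvm

    ; The asm stores 0 through the address passed as an argument. No value is
    ; returned, because an indirect output consumes an argument instead.
    ; "$$0" is the literal immediate $0; the trailing "$0" is operand 0.
    call void asm "movl $$0, $0", "=*m"(ptr elementtype(i32) %slot)
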
4521 It is also possible to use an indirect *register* constraint, but only on output
4522 (e.g. "``=*r``"). This will cause LLVM to allocate a register for an output
4523 value normally, and then, separately emit a store to the address provided as
input, after the provided inline asm. (It's not clear what value this
functionality provides, compared to writing the store explicitly after the asm
statement, and it can only produce worse code, since it bypasses many
optimization passes. Using it is not recommended.)

Clobber constraints
"""""""""""""""""""

A clobber constraint is indicated by a "``~``" prefix. A clobber does not
4534 consume an input operand, nor generate an output. Clobbers cannot use any of the
4535 general constraint code letters -- they may use only explicit register
4536 constraints, e.g. "``~{eax}``". The one exception is that a clobber string of
4537 "``~{memory}``" indicates that the assembly writes to arbitrary undeclared
memory locations -- not only the memory pointed to by a declared indirect
output.
4541 Note that clobbering named registers that are also present in output
4542 constraints is not legal.
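
Two common uses, sketched here under the assumption of an x86 target with AT&T
syntax: an empty template with a ``~{memory}`` clobber, which acts purely as a
compiler-level memory barrier, and a template that overwrites a register it
does not declare as an output:

.. code-block:: llvm

    ; Compiler-level barrier: emits no instructions, but tells LLVM that
    ; arbitrary memory may have been written.
    call void asm sideeffect "", "~{memory}"()
    ; The template overwrites EAX and the flags without declaring them as
    ; outputs, so both are listed as clobbers.
    call void asm sideeffect "xorl %eax, %eax", "~{eax},~{flags}"()
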

Constraint Codes
""""""""""""""""

After a potential prefix comes the constraint code, or codes.
4549 A Constraint Code is either a single letter (e.g. "``r``"), a "``^``" character
followed by two letters (e.g. "``^wc``"), or "``{``" register-name "``}``"
(e.g. "``{eax}``").
4553 The one and two letter constraint codes are typically chosen to be the same as
4554 GCC's constraint codes.
A single constraint may include one or more constraint codes, leaving
it up to LLVM to choose which one to use. This is included mainly for
compatibility with the translation of GCC inline asm coming from Clang.
4560 There are two ways to specify alternatives, and either or both may be used in an
4561 inline asm constraint list:
4563 1) Append the codes to each other, making a constraint code set. E.g. "``im``"
4564 or "``{eax}m``". This means "choose any of the options in the set". The
   choice of constraint is made independently for each constraint in the
   constraint list.
4568 2) Use "``|``" between constraint code sets, creating alternatives. Every
4569 constraint in the constraint list must have the same number of alternative
4570 sets. With this syntax, the same alternative in *all* of the items in the
4571 constraint list will be chosen together.
4573 Putting those together, you might have a two operand constraint string like
4574 ``"rm|r,ri|rm"``. This indicates that if operand 0 is ``r`` or ``m``, then
operand 1 may be one of ``r`` or ``i``. If operand 0 is ``r``, then operand 1
may be one of ``r`` or ``m``. But operands 0 and 1 cannot both be of type ``m``.
4578 However, the use of either of the alternatives features is *NOT* recommended, as
4579 LLVM is not able to make an intelligent choice about which one to use. (At the
4580 point it currently needs to choose, not enough information is available to do so
in a smart way.) Thus, it simply tries to make a choice that's most likely to
compile, not one that will give optimal performance. (e.g., given "``rm``", it'll
always choose to use memory, not registers). And, if given multiple registers,
or multiple register classes, it will simply choose the first one. (In fact, it
doesn't currently even ensure explicitly specified physical registers are
unique, so specifying multiple physical registers as alternatives, like
``{r11}{r12},{r11}{r12}``, will assign r11 to both operands, not at all what was
intended.)
4590 Supported Constraint Code List
4591 """"""""""""""""""""""""""""""
4593 The constraint codes are, in general, expected to behave the same way they do in
4594 GCC. LLVM's support is often implemented on an 'as-needed' basis, to support C
4595 inline asm code which was supported by GCC. A mismatch in behavior between LLVM
4596 and GCC likely indicates a bug in LLVM.
4598 Some constraint codes are typically supported by all targets:
4600 - ``r``: A register in the target's general purpose register class.
- ``m``: A memory address operand. It is target-specific what addressing modes
  are supported; typical examples are register, or register + register offset,
4603 or register + immediate offset (of some target-specific size).
4604 - ``i``: An integer constant (of target-specific width). Allows either a simple
4605 immediate, or a relocatable value.
4606 - ``n``: An integer constant -- *not* including relocatable values.
4607 - ``s``: An integer constant, but allowing *only* relocatable values.
4608 - ``X``: Allows an operand of any kind, no constraint whatsoever. Typically
4609 useful to pass a label for an asm branch or call.
4611 .. FIXME: but that surely isn't actually okay to jump out of an asm
4612 block without telling llvm about the control transfer???)
4614 - ``{register-name}``: Requires exactly the named physical register.
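
As a sketch of the explicit register form, assuming an x86 target: ``rdtsc``
always writes its result to EDX:EAX, so both outputs name fixed physical
registers. With more than one output, the ``asm`` expression returns a struct
with one member per output (the value names here are hypothetical):

.. code-block:: llvm

    %tsc = call { i32, i32 } asm sideeffect "rdtsc", "={eax},={edx}"()
    %lo = extractvalue { i32, i32 } %tsc, 0   ; low half, from EAX
    %hi = extractvalue { i32, i32 } %tsc, 1   ; high half, from EDX
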
4616 Other constraints are target-specific:

AArch64:

4620 - ``z``: An immediate integer 0. Outputs ``WZR`` or ``XZR``, as appropriate.
4621 - ``I``: An immediate integer valid for an ``ADD`` or ``SUB`` instruction,
4622 i.e. 0 to 4095 with optional shift by 12.
4623 - ``J``: An immediate integer that, when negated, is valid for an ``ADD`` or
4624 ``SUB`` instruction, i.e. -1 to -4095 with optional left shift by 12.
4625 - ``K``: An immediate integer that is valid for the 'bitmask immediate 32' of a
4626 logical instruction like ``AND``, ``EOR``, or ``ORR`` with a 32-bit register.
4627 - ``L``: An immediate integer that is valid for the 'bitmask immediate 64' of a
4628 logical instruction like ``AND``, ``EOR``, or ``ORR`` with a 64-bit register.
4629 - ``M``: An immediate integer for use with the ``MOV`` assembly alias on a
4630 32-bit register. This is a superset of ``K``: in addition to the bitmask
4631 immediate, also allows immediate integers which can be loaded with a single
  ``MOVZ`` or ``MOVN`` instruction.
4633 - ``N``: An immediate integer for use with the ``MOV`` assembly alias on a
4634 64-bit register. This is a superset of ``L``.
4635 - ``Q``: Memory address operand must be in a single register (no
  offsets). (However, LLVM currently does this for the ``m`` constraint as
  well.)
4638 - ``r``: A 32 or 64-bit integer register (W* or X*).
4639 - ``w``: A 32, 64, or 128-bit floating-point, SIMD or SVE vector register.
4640 - ``x``: Like w, but restricted to registers 0 to 15 inclusive.
4641 - ``y``: Like w, but restricted to SVE vector registers Z0 to Z7 inclusive.
4642 - ``Upl``: One of the low eight SVE predicate registers (P0 to P7)
4643 - ``Upa``: Any of the SVE predicate registers (P0 to P15)

AMDGPU:

4647 - ``r``: A 32 or 64-bit integer register.
4648 - ``[0-9]v``: The 32-bit VGPR register, number 0-9.
4649 - ``[0-9]s``: The 32-bit SGPR register, number 0-9.
4650 - ``[0-9]a``: The 32-bit AGPR register, number 0-9.
4651 - ``I``: An integer inline constant in the range from -16 to 64.
4652 - ``J``: A 16-bit signed integer constant.
4653 - ``A``: An integer or a floating-point inline constant.
4654 - ``B``: A 32-bit signed integer constant.
4655 - ``C``: A 32-bit unsigned integer constant or an integer inline constant in the range from -16 to 64.
4656 - ``DA``: A 64-bit constant that can be split into two "A" constants.
4657 - ``DB``: A 64-bit constant that can be split into two "B" constants.

All ARM modes:

4661 - ``Q``, ``Um``, ``Un``, ``Uq``, ``Us``, ``Ut``, ``Uv``, ``Uy``: Memory address
4662 operand. Treated the same as operand ``m``, at the moment.
4663 - ``Te``: An even general-purpose 32-bit integer register: ``r0,r2,...,r12,r14``
4664 - ``To``: An odd general-purpose 32-bit integer register: ``r1,r3,...,r11``
4666 ARM and ARM's Thumb2 mode:
4668 - ``j``: An immediate integer between 0 and 65535 (valid for ``MOVW``)
4669 - ``I``: An immediate integer valid for a data-processing instruction.
4670 - ``J``: An immediate integer between -4095 and 4095.
4671 - ``K``: An immediate integer whose bitwise inverse is valid for a
4672 data-processing instruction. (Can be used with template modifier "``B``" to
4673 print the inverted value).
4674 - ``L``: An immediate integer whose negation is valid for a data-processing
  instruction. (Can be used with template modifier "``n``" to print the negated
  value).
4677 - ``M``: A power of two or an integer between 0 and 32.
4678 - ``N``: Invalid immediate constraint.
4679 - ``O``: Invalid immediate constraint.
4680 - ``r``: A general-purpose 32-bit integer register (``r0-r15``).
- ``l``: In Thumb2 mode, low 32-bit GPR registers (``r0-r7``). In ARM mode, same
  as ``r``.
- ``h``: In Thumb2 mode, a high 32-bit GPR register (``r8-r15``). In ARM mode,
  invalid.
4685 - ``w``: A 32, 64, or 128-bit floating-point/SIMD register in the ranges
4686 ``s0-s31``, ``d0-d31``, or ``q0-q15``, respectively.
4687 - ``t``: A 32, 64, or 128-bit floating-point/SIMD register in the ranges
4688 ``s0-s31``, ``d0-d15``, or ``q0-q7``, respectively.
4689 - ``x``: A 32, 64, or 128-bit floating-point/SIMD register in the ranges
4690 ``s0-s15``, ``d0-d7``, or ``q0-q3``, respectively.

ARM's Thumb1 mode:

4694 - ``I``: An immediate integer between 0 and 255.
4695 - ``J``: An immediate integer between -255 and -1.
- ``K``: An immediate integer between 0 and 255, with optional left-shift by
  some amount.
4698 - ``L``: An immediate integer between -7 and 7.
4699 - ``M``: An immediate integer which is a multiple of 4 between 0 and 1020.
4700 - ``N``: An immediate integer between 0 and 31.
4701 - ``O``: An immediate integer which is a multiple of 4 between -508 and 508.
4702 - ``r``: A low 32-bit GPR register (``r0-r7``).
4703 - ``l``: A low 32-bit GPR register (``r0-r7``).
- ``h``: A high GPR register (``r8-r15``).
4705 - ``w``: A 32, 64, or 128-bit floating-point/SIMD register in the ranges
4706 ``s0-s31``, ``d0-d31``, or ``q0-q15``, respectively.
4707 - ``t``: A 32, 64, or 128-bit floating-point/SIMD register in the ranges
4708 ``s0-s31``, ``d0-d15``, or ``q0-q7``, respectively.
4709 - ``x``: A 32, 64, or 128-bit floating-point/SIMD register in the ranges
4710 ``s0-s15``, ``d0-d7``, or ``q0-q3``, respectively.

Hexagon:

- ``o``, ``v``: A memory address operand, treated the same as constraint ``m``,
  at the moment.
4717 - ``r``: A 32 or 64-bit register.

MSP430:

4721 - ``r``: An 8 or 16-bit register.

MIPS:

4725 - ``I``: An immediate signed 16-bit integer.
4726 - ``J``: An immediate integer zero.
4727 - ``K``: An immediate unsigned 16-bit integer.
4728 - ``L``: An immediate 32-bit integer, where the lower 16 bits are 0.
4729 - ``N``: An immediate integer between -65535 and -1.
4730 - ``O``: An immediate signed 15-bit integer.
4731 - ``P``: An immediate integer between 1 and 65535.
4732 - ``m``: A memory address operand. In MIPS-SE mode, allows a base address
4733 register plus 16-bit immediate offset. In MIPS mode, just a base register.
4734 - ``R``: A memory address operand. In MIPS-SE mode, allows a base address
  register plus a 9-bit signed offset. In MIPS mode, the same as constraint
  ``m``.
4737 - ``ZC``: A memory address operand, suitable for use in a ``pref``, ``ll``, or
4738 ``sc`` instruction on the given subtarget (details vary).
4739 - ``r``, ``d``, ``y``: A 32 or 64-bit GPR register.
4740 - ``f``: A 32 or 64-bit FPU register (``F0-F31``), or a 128-bit MSA register
4741 (``W0-W31``). In the case of MSA registers, it is recommended to use the ``w``
4742 argument modifier for compatibility with GCC.
- ``c``: A 32-bit or 64-bit GPR register suitable for indirect jump (always
  ``25``).
4745 - ``l``: The ``lo`` register, 32 or 64-bit.

NVPTX:

4750 - ``b``: A 1-bit integer register.
4751 - ``c`` or ``h``: A 16-bit integer register.
4752 - ``r``: A 32-bit integer register.
4753 - ``l`` or ``N``: A 64-bit integer register.
4754 - ``f``: A 32-bit float register.
4755 - ``d``: A 64-bit float register.

PowerPC:

4760 - ``I``: An immediate signed 16-bit integer.
4761 - ``J``: An immediate unsigned 16-bit integer, shifted left 16 bits.
4762 - ``K``: An immediate unsigned 16-bit integer.
4763 - ``L``: An immediate signed 16-bit integer, shifted left 16 bits.
4764 - ``M``: An immediate integer greater than 31.
4765 - ``N``: An immediate integer that is an exact power of 2.
4766 - ``O``: The immediate integer constant 0.
- ``P``: An immediate integer constant whose negation is a signed 16-bit
  constant.
4769 - ``es``, ``o``, ``Q``, ``Z``, ``Zy``: A memory address operand, currently
4770 treated the same as ``m``.
4771 - ``r``: A 32 or 64-bit integer register.
- ``b``: A 32 or 64-bit integer register, excluding ``R0`` (that is:
  ``R1-R31``).
- ``f``: A 32 or 64-bit float register (``F0-F31``).
4775 - ``v``: For ``4 x f32`` or ``4 x f64`` types, a 128-bit altivec vector
4776 register (``V0-V31``).
4778 - ``y``: Condition register (``CR0-CR7``).
4779 - ``wc``: An individual CR bit in a CR register.
4780 - ``wa``, ``wd``, ``wf``: Any 128-bit VSX vector register, from the full VSX
4781 register set (overlapping both the floating-point and vector register files).
- ``ws``: A 32 or 64-bit floating-point register, from the full VSX register
  set.

RISC-V:

- ``A``: An address operand (using a general-purpose register, without an
  offset).
4789 - ``I``: A 12-bit signed integer immediate operand.
4790 - ``J``: A zero integer immediate operand.
4791 - ``K``: A 5-bit unsigned integer immediate operand.
4792 - ``f``: A 32- or 64-bit floating-point register (requires F or D extension).
- ``r``: A 32- or 64-bit general-purpose register (depending on the platform
  ``XLEN``).
4795 - ``vr``: A vector register. (requires V extension).
4796 - ``vm``: A vector mask register. (requires V extension).

Sparc:

4800 - ``I``: An immediate 13-bit signed integer.
4801 - ``r``: A 32-bit integer register.
4802 - ``f``: Any floating-point register on SparcV8, or a floating-point
4803 register in the "low" half of the registers on SparcV9.
4804 - ``e``: Any floating-point register. (Same as ``f`` on SparcV8.)

SystemZ:

4808 - ``I``: An immediate unsigned 8-bit integer.
4809 - ``J``: An immediate unsigned 12-bit integer.
4810 - ``K``: An immediate signed 16-bit integer.
4811 - ``L``: An immediate signed 20-bit integer.
4812 - ``M``: An immediate integer 0x7fffffff.
4813 - ``Q``: A memory address operand with a base address and a 12-bit immediate
4814 unsigned displacement.
4815 - ``R``: A memory address operand with a base address, a 12-bit immediate
4816 unsigned displacement, and an index register.
4817 - ``S``: A memory address operand with a base address and a 20-bit immediate
4818 signed displacement.
4819 - ``T``: A memory address operand with a base address, a 20-bit immediate
4820 signed displacement, and an index register.
4821 - ``r`` or ``d``: A 32, 64, or 128-bit integer register.
4822 - ``a``: A 32, 64, or 128-bit integer address register (excludes R0, which in an
4823 address context evaluates as zero).
- ``h``: A 32-bit value in the high part of a 64-bit data register
4826 - ``f``: A 32, 64, or 128-bit floating-point register.

X86:

4830 - ``I``: An immediate integer between 0 and 31.
4831 - ``J``: An immediate integer between 0 and 64.
4832 - ``K``: An immediate signed 8-bit integer.
- ``L``: An immediate integer, 0xff or 0xffff or (in 64-bit mode only)
  0xffffffff.
4835 - ``M``: An immediate integer between 0 and 3.
4836 - ``N``: An immediate unsigned 8-bit integer.
4837 - ``O``: An immediate integer between 0 and 127.
4838 - ``e``: An immediate 32-bit signed integer.
4839 - ``Z``: An immediate 32-bit unsigned integer.
4840 - ``o``, ``v``: Treated the same as ``m``, at the moment.
4841 - ``q``: An 8, 16, 32, or 64-bit register which can be accessed as an 8-bit
4842 ``l`` integer register. On X86-32, this is the ``a``, ``b``, ``c``, and ``d``
4843 registers, and on X86-64, it is all of the integer registers.
4844 - ``Q``: An 8, 16, 32, or 64-bit register which can be accessed as an 8-bit
4845 ``h`` integer register. This is the ``a``, ``b``, ``c``, and ``d`` registers.
4846 - ``r`` or ``l``: An 8, 16, 32, or 64-bit integer register.
4847 - ``R``: An 8, 16, 32, or 64-bit "legacy" integer register -- one which has
4848 existed since i386, and can be accessed without the REX prefix.
4849 - ``f``: A 32, 64, or 80-bit '387 FPU stack pseudo-register.
4850 - ``y``: A 64-bit MMX register, if MMX is enabled.
4851 - ``x``: If SSE is enabled: a 32 or 64-bit scalar operand, or 128-bit vector
4852 operand in a SSE register. If AVX is also enabled, can also be a 256-bit
4853 vector operand in an AVX register. If AVX-512 is also enabled, can also be a
  512-bit vector operand in an AVX512 register. Otherwise, an error.
4855 - ``Y``: The same as ``x``, if *SSE2* is enabled, otherwise an error.
4856 - ``A``: Special case: allocates EAX first, then EDX, for a single operand (in
4857 32-bit mode, a 64-bit integer operand will get split into two registers). It
4858 is not recommended to use this constraint, as in 64-bit mode, the 64-bit
4859 operand will get allocated only to RAX -- if two 32-bit operands are needed,
  you're better off splitting it yourself, before passing it to the asm
  statement.

XCore:

4865 - ``r``: A 32-bit integer register.
4868 .. _inline-asm-modifiers:
4870 Asm template argument modifiers
4871 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In the asm template string, modifiers can be used on the operand reference, like
"``${0:n}``".
4876 The modifiers are, in general, expected to behave the same way they do in
4877 GCC. LLVM's support is often implemented on an 'as-needed' basis, to support C
4878 inline asm code which was supported by GCC. A mismatch in behavior between LLVM
4879 and GCC likely indicates a bug in LLVM.

Target-independent:

4883 - ``c``: Print an immediate integer constant unadorned, without
4884 the target-specific immediate punctuation (e.g. no ``$`` prefix).
4885 - ``n``: Negate and print immediate integer constant unadorned, without the
4886 target-specific immediate punctuation (e.g. no ``$`` prefix).
4887 - ``l``: Print as an unadorned label, without the target-specific label
4888 punctuation (e.g. no ``$`` prefix).

AArch64:

4892 - ``w``: Print a GPR register with a ``w*`` name instead of ``x*`` name. E.g.,
4893 instead of ``x30``, print ``w30``.
4894 - ``x``: Print a GPR register with a ``x*`` name. (this is the default, anyhow).
4895 - ``b``, ``h``, ``s``, ``d``, ``q``: Print a floating-point/SIMD register with a
  ``b*``, ``h*``, ``s*``, ``d*``, or ``q*`` name, rather than the default of
  ``v*``.
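
A brief sketch of the ``w`` modifier, assuming an AArch64 target and
hypothetical value names: the operands are 64-bit values, which would print as
``x*`` registers by default, but the template asks for their 32-bit ``w*``
names:

.. code-block:: llvm

    ; Emits something like "mov w0, w1": only the low 32 bits are copied, and
    ; writing a w register zeroes the upper half of the x register.
    %low = call i64 asm "mov ${0:w}, ${1:w}", "=r,r"(i64 %wide)
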

ARM:

- ``a``: Print an operand as an address (with ``[`` and ``]`` surrounding a
  register).
4909 - ``y``: Print a VFP single-precision register as an indexed double (e.g. print
4910 as ``d4[1]`` instead of ``s9``)
- ``B``: Bitwise invert and print an immediate integer constant without ``#``
  prefix.
- ``L``: Print the low 16 bits of an immediate integer constant.
4914 - ``M``: Print as a register set suitable for ldm/stm. Also prints *all*
4915 register operands subsequent to the specified one (!), so use carefully.
4916 - ``Q``: Print the low-order register of a register-pair, or the low-order
4917 register of a two-register operand.
4918 - ``R``: Print the high-order register of a register-pair, or the high-order
4919 register of a two-register operand.
- ``H``: Print the second register of a register-pair. (On a big-endian system,
  ``H`` is equivalent to ``Q``, and on a little-endian system, ``H`` is equivalent
  to ``R``.)
4924 .. FIXME: H doesn't currently support printing the second register
4925 of a two-register operand.
4927 - ``e``: Print the low doubleword register of a NEON quad register.
4928 - ``f``: Print the high doubleword register of a NEON quad register.
- ``m``: Print the base register of a memory operand without the ``[`` and ``]``
  adornment.

Hexagon:

4934 - ``L``: Print the second register of a two-register operand. Requires that it
4935 has been allocated consecutively to the first.
4937 .. FIXME: why is it restricted to consecutive ones? And there's
4938 nothing that ensures that happens, is there?
4940 - ``I``: Print the letter 'i' if the operand is an integer constant, otherwise
4941 nothing. Used to print 'addi' vs 'add' instructions.

MSP430:

4945 No additional modifiers.

MIPS:

4949 - ``X``: Print an immediate integer as hexadecimal
4950 - ``x``: Print the low 16 bits of an immediate integer as hexadecimal.
4951 - ``d``: Print an immediate integer as decimal.
4952 - ``m``: Subtract one and print an immediate integer as decimal.
4953 - ``z``: Print $0 if an immediate zero, otherwise print normally.
4954 - ``L``: Print the low-order register of a two-register operand, or prints the
4955 address of the low-order word of a double-word memory operand.
4957 .. FIXME: L seems to be missing memory operand support.
4959 - ``M``: Print the high-order register of a two-register operand, or prints the
4960 address of the high-order word of a double-word memory operand.
4962 .. FIXME: M seems to be missing memory operand support.
4964 - ``D``: Print the second register of a two-register operand, or prints the
  second word of a double-word memory operand. (On a big-endian system, ``D`` is
  equivalent to ``L``, and on a little-endian system, ``D`` is equivalent to
  ``M``.)
4968 - ``w``: No effect. Provided for compatibility with GCC which requires this
  modifier in order to print MSA registers (``W0-W31``) with the ``f``
  constraint.

PowerPC:

4978 - ``L``: Print the second register of a two-register operand. Requires that it
4979 has been allocated consecutively to the first.
4981 .. FIXME: why is it restricted to consecutive ones? And there's
4982 nothing that ensures that happens, is there?
4984 - ``I``: Print the letter 'i' if the operand is an integer constant, otherwise
4985 nothing. Used to print 'addi' vs 'add' instructions.
4986 - ``y``: For a memory operand, prints formatter for a two-register X-form
4987 instruction. (Currently always prints ``r0,OPERAND``).
4988 - ``U``: Prints 'u' if the memory operand is an update form, and nothing
4989 otherwise. (NOTE: LLVM does not support update form, so this will currently
4990 always print nothing)
4991 - ``X``: Prints 'x' if the memory operand is an indexed form. (NOTE: LLVM does
4992 not support indexed form, so this will currently always print nothing)

RISC-V:

4996 - ``i``: Print the letter 'i' if the operand is not a register, otherwise print
4997 nothing. Used to print 'addi' vs 'add' instructions, etc.
- ``z``: Print the register ``zero`` if an immediate zero, otherwise print
  normally.

SystemZ:

5007 SystemZ implements only ``n``, and does *not* support any of the other
5008 target-independent modifiers.

X86:

5012 - ``c``: Print an unadorned integer or symbol name. (The latter is
5013 target-specific behavior for this typically target-independent modifier).
5014 - ``A``: Print a register name with a '``*``' before it.
- ``b``: Print an 8-bit register name (e.g. ``al``); do nothing on a memory
  operand.
- ``h``: Print the upper 8-bit register name (e.g. ``ah``); do nothing on a
  memory operand.
- ``w``: Print the 16-bit register name (e.g. ``ax``); do nothing on a memory
  operand.
- ``k``: Print the 32-bit register name (e.g. ``eax``); do nothing on a memory
  operand.
5023 - ``q``: Print the 64-bit register name (e.g. ``rax``), if 64-bit registers are
5024 available, otherwise the 32-bit register name; do nothing on a memory operand.
5025 - ``n``: Negate and print an unadorned integer, or, for operands other than an
5026 immediate integer (e.g. a relocatable symbol expression), print a '-' before
5027 the operand. (The behavior for relocatable symbol expressions is a
5028 target-specific behavior for this typically target-independent modifier)
5029 - ``H``: Print a memory reference with additional offset +8.
5030 - ``P``: Print a memory reference or operand for use as the argument of a call
5031 instruction. (E.g. omit ``(rip)``, even though it's PC-relative.)
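
As a rough illustration of the register-size modifiers listed above, the
following sketch assumes an x86-64 target and AT&T assembly syntax. The
function name ``@zext_low_byte`` and the constraint string are hypothetical
choices made for this example only. The ``b`` modifier makes the input operand
print as its 8-bit register name, so the template zero-extends the low byte of
the input into the 32-bit output register.

.. code-block:: llvm

    target triple = "x86_64-unknown-linux-gnu"

    define i32 @zext_low_byte(i32 %x) {
    entry:
      ; ${1:b} prints the 8-bit name of the input register (e.g. al for eax);
      ; $0 prints the full 32-bit name of the output register.
      %r = call i32 asm "movzbl ${1:b}, $0", "=r,q"(i32 %x)
      ret i32 %r
    }
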
5035 No additional modifiers.
The call instructions that wrap inline asm nodes may have a
"``!srcloc``" MDNode attached to them that contains a list of constant
integers. If present, the code generator will use the integer as the
location cookie value when reporting errors through the ``LLVMContext``
error reporting mechanisms. This allows a front-end to correlate backend
errors that occur with inline asm back to the source code that produced
it. For example:
5049 .. code-block:: llvm
5051 call void asm sideeffect "something bad", ""(), !srcloc !42
5053 !42 = !{ i32 1234567 }
It is up to the front-end to make sense of the magic numbers it places
in the IR. If the MDNode contains multiple constants, the code generator
will use the one that corresponds to the line of the asm that the error
occurs on.
5065 LLVM IR allows metadata to be attached to instructions and global objects in the
5066 program that can convey extra information about the code to the optimizers and
5067 code generator. One example application of metadata is source-level
5068 debug information. There are two metadata primitives: strings and nodes.
5070 Metadata does not have a type, and is not a value. If referenced from a
5071 ``call`` instruction, it uses the ``metadata`` type.
5073 All metadata are identified in syntax by an exclamation point ('``!``').
5075 .. _metadata-string:
5077 Metadata Nodes and Metadata Strings
5078 -----------------------------------
A metadata string is a string surrounded by double quotes. It can
contain any character by escaping non-printable characters with
"``\xx``" where "``xx``" is the two digit hex code. For example:
"``!"test\00"``".
5085 Metadata nodes are represented with notation similar to structure
5086 constants (a comma separated list of elements, surrounded by braces and
preceded by an exclamation point). Metadata nodes can have any values as
their operands. For example:
5090 .. code-block:: llvm
5092 !{ !"test\00", i32 10}
5094 Metadata nodes that aren't uniqued use the ``distinct`` keyword. For example:
5096 .. code-block:: text
5098 !0 = distinct !{!"test\00", i32 10}
5100 ``distinct`` nodes are useful when nodes shouldn't be merged based on their
5101 content. They can also occur when transformations cause uniquing collisions
5102 when metadata operands change.
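
The difference between uniqued and ``distinct`` nodes can be sketched with a
small module fragment. The named metadata ``!example`` is a hypothetical name
used only to keep the nodes referenced.

.. code-block:: llvm

    ; !0 and !1 have identical content and are not distinct, so after
    ; parsing they refer to one and the same uniqued node.
    !0 = !{!"test\00", i32 10}
    !1 = !{!"test\00", i32 10}

    ; !2 and !3 have the same content too, but each distinct node keeps
    ; its own identity and is never merged with another node.
    !2 = distinct !{!"test\00", i32 10}
    !3 = distinct !{!"test\00", i32 10}

    !example = !{!0, !1, !2, !3}
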
A :ref:`named metadata <namedmetadatastructure>` is a collection of
metadata nodes, which can be looked up in the module symbol table. For
example:

.. code-block:: llvm

    !foo = !{!4, !3}

5112 Metadata can be used as function arguments. Here the ``llvm.dbg.value``
5113 intrinsic is using three metadata arguments:
5115 .. code-block:: llvm
5117 call void @llvm.dbg.value(metadata !24, metadata !25, metadata !26)
5119 Metadata can be attached to an instruction. Here metadata ``!21`` is attached
5120 to the ``add`` instruction using the ``!dbg`` identifier:
5122 .. code-block:: llvm
5124 %indvar.next = add i64 %indvar, 1, !dbg !21
Instructions may not have multiple metadata attachments with the same
identifier.
5129 Metadata can also be attached to a function or a global variable. Here metadata
5130 ``!22`` is attached to the ``f1`` and ``f2`` functions, and the globals ``g1``
5131 and ``g2`` using the ``!dbg`` identifier:
.. code-block:: llvm

    declare !dbg !22 void @f1()
    define void @f2() !dbg !22 {
      ret void
    }

    @g1 = global i32 0, !dbg !22
    @g2 = external global i32, !dbg !22

5143 Unlike instructions, global objects (functions and global variables) may have
5144 multiple metadata attachments with the same identifier.
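
A minimal sketch of a global carrying two attachments with the same
identifier is shown here. The ``!type`` kind is used only because it is a kind
that is commonly attached more than once; the global ``@g`` and the type
identifier strings are hypothetical.

.. code-block:: llvm

    ; Two attachments of the same kind on one global variable.
    @g = global i32 0, !type !0, !type !1

    !0 = !{i64 0, !"typeid.a"}
    !1 = !{i64 0, !"typeid.b"}
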
A transformation is required to drop any metadata attachment that it does not
know or knows it can't preserve. Currently there is an exception for metadata
5148 attachment to globals for ``!type`` and ``!absolute_symbol`` which can't be
5149 unconditionally dropped unless the global is itself deleted.
5151 Metadata attached to a module using named metadata may not be dropped, with
5152 the exception of debug metadata (named metadata with the name ``!llvm.dbg.*``).
5154 More information about specific metadata nodes recognized by the
5155 optimizers and code generator is found below.
5157 .. _specialized-metadata:
5159 Specialized Metadata Nodes
5160 ^^^^^^^^^^^^^^^^^^^^^^^^^^
5162 Specialized metadata nodes are custom data structures in metadata (as opposed
to generic tuples). Their fields are labelled, and can be specified in any
order.
5166 These aren't inherently debug info centric, but currently all the specialized
5167 metadata nodes are related to debug info.
5174 ``DICompileUnit`` nodes represent a compile unit. The ``enums:``,
5175 ``retainedTypes:``, ``globals:``, ``imports:`` and ``macros:`` fields are tuples
5176 containing the debug info to be emitted along with the compile unit, regardless
5177 of code optimizations (some nodes are only emitted if there are references to
5178 them from instructions). The ``debugInfoForProfiling:`` field is a boolean
5179 indicating whether or not line-table discriminators are updated to provide
5180 more-accurate debug info for profiling results.
5182 .. code-block:: text
5184 !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang",
5185 isOptimized: true, flags: "-O2", runtimeVersion: 2,
5186 splitDebugFilename: "abc.debug", emissionKind: FullDebug,
5187 enums: !2, retainedTypes: !3, globals: !4, imports: !5,
5188 macros: !6, dwoId: 0x0abcd)
5190 Compile unit descriptors provide the root scope for objects declared in a
5191 specific compilation unit. File descriptors are defined using this scope. These
5192 descriptors are collected by a named metadata node ``!llvm.dbg.cu``. They keep
5193 track of global variables, type information, and imported entities (declarations
5201 ``DIFile`` nodes represent files. The ``filename:`` can include slashes.
5203 .. code-block:: none
5205 !0 = !DIFile(filename: "path/to/file", directory: "/path/to/dir",
5206 checksumkind: CSK_MD5,
5207 checksum: "000102030405060708090a0b0c0d0e0f")
5209 Files are sometimes used in ``scope:`` fields, and are the only valid target
5210 for ``file:`` fields.
Valid values for the ``checksumkind:`` field are: {CSK_None, CSK_MD5, CSK_SHA1, CSK_SHA256}.
5218 ``DIBasicType`` nodes represent primitive types, such as ``int``, ``bool`` and
5219 ``float``. ``tag:`` defaults to ``DW_TAG_base_type``.
5221 .. code-block:: text
5223 !0 = !DIBasicType(name: "unsigned char", size: 8, align: 8,
5224 encoding: DW_ATE_unsigned_char)
5225 !1 = !DIBasicType(tag: DW_TAG_unspecified_type, name: "decltype(nullptr)")
5227 The ``encoding:`` describes the details of the type. Usually it's one of the
5230 .. code-block:: text
5236 DW_ATE_signed_char = 6
5238 DW_ATE_unsigned_char = 8
5240 .. _DISubroutineType:
5245 ``DISubroutineType`` nodes represent subroutine types. Their ``types:`` field
5246 refers to a tuple; the first operand is the return type, while the rest are the
5247 types of the formal arguments in order. If the first operand is ``null``, that
5248 represents a function with no return value (such as ``void foo() {}`` in C++).
.. code-block:: text

    !0 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
    !1 = !DIBasicType(name: "char", size: 8, align: 8, encoding: DW_ATE_signed_char)
    !2 = !DISubroutineType(types: !{null, !0, !1}) ; void (int, char)

5261 ``DIDerivedType`` nodes represent types derived from other types, such as
5264 .. code-block:: text
5266 !0 = !DIBasicType(name: "unsigned char", size: 8, align: 8,
5267 encoding: DW_ATE_unsigned_char)
5268 !1 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !0, size: 32,
5271 The following ``tag:`` values are valid:
5273 .. code-block:: text
5276 DW_TAG_pointer_type = 15
5277 DW_TAG_reference_type = 16
5279 DW_TAG_inheritance = 28
5280 DW_TAG_ptr_to_member_type = 31
5281 DW_TAG_const_type = 38
5283 DW_TAG_volatile_type = 53
5284 DW_TAG_restrict_type = 55
5285 DW_TAG_atomic_type = 71
5287 .. _DIDerivedTypeMember:
5289 ``DW_TAG_member`` is used to define a member of a :ref:`composite type
5290 <DICompositeType>`. The type of the member is the ``baseType:``. The
5291 ``offset:`` is the member's bit offset. If the composite type has an ODR
``identifier:`` and does not set ``flags: DIFlagFwdDecl``, then the member is
5293 uniqued based only on its ``name:`` and ``scope:``.
5295 ``DW_TAG_inheritance`` and ``DW_TAG_friend`` are used in the ``elements:``
5296 field of :ref:`composite types <DICompositeType>` to describe parents and
5299 ``DW_TAG_typedef`` is used to provide a name for the ``baseType:``.
5301 ``DW_TAG_pointer_type``, ``DW_TAG_reference_type``, ``DW_TAG_const_type``,
5302 ``DW_TAG_volatile_type``, ``DW_TAG_restrict_type`` and ``DW_TAG_atomic_type``
5303 are used to qualify the ``baseType:``.
5305 Note that the ``void *`` type is expressed as a type derived from NULL.
5307 .. _DICompositeType:
5312 ``DICompositeType`` nodes represent types composed of other types, like
5313 structures and unions. ``elements:`` points to a tuple of the composed types.
5315 If the source language supports ODR, the ``identifier:`` field gives the unique
5316 identifier used for type merging between modules. When specified,
5317 :ref:`subprogram declarations <DISubprogramDeclaration>` and :ref:`member
5318 derived types <DIDerivedTypeMember>` that reference the ODR-type in their
5319 ``scope:`` change uniquing rules.
5321 For a given ``identifier:``, there should only be a single composite type that
5322 does not have ``flags: DIFlagFwdDecl`` set. LLVM tools that link modules
5323 together will unique such definitions at parse time via the ``identifier:``
5324 field, even if the nodes are ``distinct``.
5326 .. code-block:: text
5328 !0 = !DIEnumerator(name: "SixKind", value: 7)
5329 !1 = !DIEnumerator(name: "SevenKind", value: 7)
5330 !2 = !DIEnumerator(name: "NegEightKind", value: -8)
5331 !3 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "Enum", file: !12,
5332 line: 2, size: 32, align: 32, identifier: "_M4Enum",
5333 elements: !{!0, !1, !2})
5335 The following ``tag:`` values are valid:
5337 .. code-block:: text
5339 DW_TAG_array_type = 1
5340 DW_TAG_class_type = 2
5341 DW_TAG_enumeration_type = 4
5342 DW_TAG_structure_type = 19
5343 DW_TAG_union_type = 23
5345 For ``DW_TAG_array_type``, the ``elements:`` should be :ref:`subrange
5346 descriptors <DISubrange>`, each representing the range of subscripts at that
5347 level of indexing. The ``DIFlagVector`` flag to ``flags:`` indicates that an
array type is a native packed vector. The optional ``dataLocation`` is a
DIExpression that describes how to get from an object's address to the actual
raw data, if they aren't equivalent. This is only supported for array types,
particularly to describe Fortran arrays, which have an array descriptor in
addition to the array data. Alternatively, it can be a DIVariable that
holds the address of the actual raw data. Fortran supports pointer
arrays that can be attached to actual arrays; this attachment between pointer
and pointee is called association. The optional ``associated`` is a
DIExpression that describes whether the pointer array is currently associated.
The optional ``allocated`` is a DIExpression that describes whether the
allocatable array is currently allocated. The optional ``rank`` is a
DIExpression that describes the rank (number of dimensions) of a Fortran
assumed-rank array (whose rank is known at run time).
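
As a small sketch of the array case, the fragment below describes a
hypothetical C declaration ``int m[2][5]``. Each level of indexing gets its own
subrange in ``elements:``, outermost dimension first. The sizes are assumptions
chosen for a 32-bit ``int``.

.. code-block:: llvm

    !0 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

    ; int m[2][5]: 2 * 5 * 32 bits = 320 bits in total.
    !1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !0, size: 320,
                          elements: !2)
    !2 = !{!3, !4}
    !3 = !DISubrange(count: 2, lowerBound: 0) ; first (outer) dimension
    !4 = !DISubrange(count: 5, lowerBound: 0) ; second (inner) dimension
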
5362 For ``DW_TAG_enumeration_type``, the ``elements:`` should be :ref:`enumerator
5363 descriptors <DIEnumerator>`, each representing the definition of an enumeration
5364 value for the set. All enumeration type descriptors are collected in the
5365 ``enums:`` field of the :ref:`compile unit <DICompileUnit>`.
5367 For ``DW_TAG_structure_type``, ``DW_TAG_class_type``, and
5368 ``DW_TAG_union_type``, the ``elements:`` should be :ref:`derived types
5369 <DIDerivedType>` with ``tag: DW_TAG_member``, ``tag: DW_TAG_inheritance``, or
5370 ``tag: DW_TAG_friend``; or :ref:`subprograms <DISubprogram>` with
5371 ``isDefinition: false``.
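
The structure case can be sketched as follows for a hypothetical C type
``struct Point { int x; int y; };``. The file name, line numbers and layout are
assumptions for illustration. Each member is a ``DW_TAG_member`` derived type
whose ``scope:`` points back at the composite type and whose ``offset:`` is
given in bits.

.. code-block:: llvm

    !0 = !DIFile(filename: "point.c", directory: "/tmp")
    !1 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

    ; The composite type lists its members through the elements: tuple.
    !2 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "Point",
                                   file: !0, line: 1, size: 64, elements: !3)
    !3 = !{!4, !5}

    ; x lives at bit offset 0 (the default), y at bit offset 32.
    !4 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !2, file: !0,
                        line: 2, baseType: !1, size: 32)
    !5 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !2, file: !0,
                        line: 3, baseType: !1, size: 32, offset: 32)
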
5378 ``DISubrange`` nodes are the elements for ``DW_TAG_array_type`` variants of
5379 :ref:`DICompositeType`.
5381 - ``count: -1`` indicates an empty array.
- ``count: !10`` describes the count with a :ref:`DILocalVariable`.
- ``count: !12`` describes the count with a :ref:`DIGlobalVariable`.
5385 .. code-block:: text
5387 !0 = !DISubrange(count: 5, lowerBound: 0) ; array counting from 0
5388 !1 = !DISubrange(count: 5, lowerBound: 1) ; array counting from 1
5389 !2 = !DISubrange(count: -1) ; empty array.
5391 ; Scopes used in rest of example
5392 !6 = !DIFile(filename: "vla.c", directory: "/path/to/file")
5393 !7 = distinct !DICompileUnit(language: DW_LANG_C99, file: !6)
5394 !8 = distinct !DISubprogram(name: "foo", scope: !7, file: !6, line: 5)
5396 ; Use of local variable as count value
5397 !9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
5398 !10 = !DILocalVariable(name: "count", scope: !8, file: !6, line: 42, type: !9)
5399 !11 = !DISubrange(count: !10, lowerBound: 0)
5401 ; Use of global variable as count value
5402 !12 = !DIGlobalVariable(name: "count", scope: !8, file: !6, line: 22, type: !9)
5403 !13 = !DISubrange(count: !12, lowerBound: 0)
5410 ``DIEnumerator`` nodes are the elements for ``DW_TAG_enumeration_type``
5411 variants of :ref:`DICompositeType`.
5413 .. code-block:: text
5415 !0 = !DIEnumerator(name: "SixKind", value: 7)
5416 !1 = !DIEnumerator(name: "SevenKind", value: 7)
5417 !2 = !DIEnumerator(name: "NegEightKind", value: -8)
5419 DITemplateTypeParameter
5420 """""""""""""""""""""""
5422 ``DITemplateTypeParameter`` nodes represent type parameters to generic source
5423 language constructs. They are used (optionally) in :ref:`DICompositeType` and
5424 :ref:`DISubprogram` ``templateParams:`` fields.
5426 .. code-block:: text
5428 !0 = !DITemplateTypeParameter(name: "Ty", type: !1)
5430 DITemplateValueParameter
5431 """"""""""""""""""""""""
5433 ``DITemplateValueParameter`` nodes represent value parameters to generic source
5434 language constructs. ``tag:`` defaults to ``DW_TAG_template_value_parameter``,
5435 but if specified can also be set to ``DW_TAG_GNU_template_template_param`` or
5436 ``DW_TAG_GNU_template_param_pack``. They are used (optionally) in
5437 :ref:`DICompositeType` and :ref:`DISubprogram` ``templateParams:`` fields.
5439 .. code-block:: text
5441 !0 = !DITemplateValueParameter(name: "Ty", type: !1, value: i32 7)
5446 ``DINamespace`` nodes represent namespaces in the source language.
5448 .. code-block:: text
5450 !0 = !DINamespace(name: "myawesomeproject", scope: !1, file: !2, line: 7)
5452 .. _DIGlobalVariable:
5457 ``DIGlobalVariable`` nodes represent global variables in the source language.
.. code-block:: text

    @foo = global i32 0, !dbg !0
    !0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
    !1 = !DIGlobalVariable(name: "foo", linkageName: "foo", scope: !2,
                           file: !3, line: 7, type: !4, isLocal: true,
                           isDefinition: false, declaration: !5)

5468 DIGlobalVariableExpression
5469 """"""""""""""""""""""""""
5471 ``DIGlobalVariableExpression`` nodes tie a :ref:`DIGlobalVariable` together
5472 with a :ref:`DIExpression`.
.. code-block:: text

    @lower = global i32 0, !dbg !0
    @upper = global i32 0, !dbg !1
    !0 = !DIGlobalVariableExpression(
             var: !2,
             expr: !DIExpression(DW_OP_LLVM_fragment, 0, 32)
             )
    !1 = !DIGlobalVariableExpression(
             var: !2,
             expr: !DIExpression(DW_OP_LLVM_fragment, 32, 32)
             )
    !2 = !DIGlobalVariable(name: "split64", linkageName: "split64", scope: !3,
                           file: !4, line: 8, type: !5, declaration: !6)

All global variable expressions should be referenced by the ``globals:`` field of
5490 a :ref:`compile unit <DICompileUnit>`.
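
Putting these pieces together, a minimal self-contained module might look like
the sketch below. The variable ``counter``, the file name and the producer
string are hypothetical. The global carries the expression node through
``!dbg``, and the same node is listed in the compile unit's ``globals:`` tuple.

.. code-block:: llvm

    @counter = global i32 0, !dbg !0

    !llvm.dbg.cu = !{!2}
    !llvm.module.flags = !{!6}

    ; Ties the variable description (!1) to an empty expression.
    !0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
    !1 = distinct !DIGlobalVariable(name: "counter", scope: !2, file: !3,
                                    line: 1, type: !5, isLocal: false,
                                    isDefinition: true)
    ; The compile unit references every global variable expression.
    !2 = distinct !DICompileUnit(language: DW_LANG_C99, file: !3,
                                 producer: "example", emissionKind: FullDebug,
                                 globals: !4)
    !3 = !DIFile(filename: "example.c", directory: "/tmp")
    !4 = !{!0}
    !5 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    !6 = !{i32 2, !"Debug Info Version", i32 3}
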
5497 ``DISubprogram`` nodes represent functions from the source language. A distinct
5498 ``DISubprogram`` may be attached to a function definition using ``!dbg``
5499 metadata. A unique ``DISubprogram`` may be attached to a function declaration
5500 used for call site debug info. The ``retainedNodes:`` field is a list of
5501 :ref:`variables <DILocalVariable>` and :ref:`labels <DILabel>` that must be
5502 retained, even if their IR counterparts are optimized out of the IR. The
``type:`` field must point at a :ref:`DISubroutineType`.
5505 .. _DISubprogramDeclaration:
5507 When ``isDefinition: false``, subprograms describe a declaration in the type
5508 tree as opposed to a definition of a function. If the scope is a composite
type with an ODR ``identifier:`` and that does not set ``flags: DIFlagFwdDecl``,
5510 then the subprogram declaration is uniqued based only on its ``linkageName:``
5513 .. code-block:: text
5515 define void @_Z3foov() !dbg !0 {
    !0 = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov", scope: !1,
5520 file: !2, line: 7, type: !3, isLocal: true,
5521 isDefinition: true, scopeLine: 8,
5523 virtuality: DW_VIRTUALITY_pure_virtual,
5524 virtualIndex: 10, flags: DIFlagPrototyped,
5525 isOptimized: true, unit: !5, templateParams: !6,
5526 declaration: !7, retainedNodes: !8,
5534 ``DILexicalBlock`` nodes describe nested blocks within a :ref:`subprogram
5535 <DISubprogram>`. The line number and column numbers are used to distinguish
two lexical blocks at the same depth. They are valid targets for ``scope:``
5539 .. code-block:: text
5541 !0 = distinct !DILexicalBlock(scope: !1, file: !2, line: 7, column: 35)
5543 Usually lexical blocks are ``distinct`` to prevent node merging based on
5546 .. _DILexicalBlockFile:
5551 ``DILexicalBlockFile`` nodes are used to discriminate between sections of a
5552 :ref:`lexical block <DILexicalBlock>`. The ``file:`` field can be changed to
5553 indicate textual inclusion, or the ``discriminator:`` field can be used to
5554 discriminate between control flow within a single block in the source language.
5556 .. code-block:: text
5558 !0 = !DILexicalBlock(scope: !3, file: !4, line: 7, column: 35)
5559 !1 = !DILexicalBlockFile(scope: !0, file: !4, discriminator: 0)
5560 !2 = !DILexicalBlockFile(scope: !0, file: !4, discriminator: 1)
5567 ``DILocation`` nodes represent source debug locations. The ``scope:`` field is
mandatory, and points at a :ref:`DILexicalBlockFile`, a
:ref:`DILexicalBlock`, or a :ref:`DISubprogram`.
5571 .. code-block:: text
5573 !0 = !DILocation(line: 2900, column: 42, scope: !1, inlinedAt: !2)
5575 .. _DILocalVariable:
5580 ``DILocalVariable`` nodes represent local variables in the source language. If
5581 the ``arg:`` field is set to non-zero, then this variable is a subprogram
5582 parameter, and it will be included in the ``retainedNodes:`` field of its
5583 :ref:`DISubprogram`.
5585 .. code-block:: text
5587 !0 = !DILocalVariable(name: "this", arg: 1, scope: !3, file: !2, line: 7,
5588 type: !3, flags: DIFlagArtificial)
5589 !1 = !DILocalVariable(name: "x", arg: 2, scope: !4, file: !2, line: 7,
5591 !2 = !DILocalVariable(name: "y", scope: !5, file: !2, line: 7, type: !3)
5598 ``DIExpression`` nodes represent expressions that are inspired by the DWARF
5599 expression language. They are used in :ref:`debug intrinsics<dbg_intrinsics>`
5600 (such as ``llvm.dbg.declare`` and ``llvm.dbg.value``) to describe how the
referenced LLVM variable relates to the source language variable. Debug
expressions are interpreted left-to-right: start by pushing the value/address
5603 operand of the intrinsic onto a stack, then repeatedly push and evaluate
5604 opcodes from the DIExpression until the final variable description is produced.
The currently supported opcode vocabulary is limited:
5608 - ``DW_OP_deref`` dereferences the top of the expression stack.
5609 - ``DW_OP_plus`` pops the last two entries from the expression stack, adds
5610 them together and appends the result to the expression stack.
5611 - ``DW_OP_minus`` pops the last two entries from the expression stack, subtracts
5612 the last entry from the second last entry and appends the result to the
5614 - ``DW_OP_plus_uconst, 93`` adds ``93`` to the working expression.
5615 - ``DW_OP_LLVM_fragment, 16, 8`` specifies the offset and size (``16`` and ``8``
5616 here, respectively) of the variable fragment from the working expression. Note
5617 that contrary to DW_OP_bit_piece, the offset is describing the location
5618 within the described source variable.
5619 - ``DW_OP_LLVM_convert, 16, DW_ATE_signed`` specifies a bit size and encoding
5620 (``16`` and ``DW_ATE_signed`` here, respectively) to which the top of the
5621 expression stack is to be converted. Maps into a ``DW_OP_convert`` operation
5622 that references a base type constructed from the supplied values.
5623 - ``DW_OP_LLVM_tag_offset, tag_offset`` specifies that a memory tag should be
5624 optionally applied to the pointer. The memory tag is derived from the
5625 given tag offset in an implementation-defined manner.
5626 - ``DW_OP_swap`` swaps top two stack entries.
5627 - ``DW_OP_xderef`` provides extended dereference mechanism. The entry at the top
5628 of the stack is treated as an address. The second stack entry is treated as an
5629 address space identifier.
5630 - ``DW_OP_stack_value`` marks a constant value.
5631 - ``DW_OP_LLVM_entry_value, N`` may only appear in MIR and at the
5632 beginning of a ``DIExpression``. In DWARF a ``DBG_VALUE``
5633 instruction binding a ``DIExpression(DW_OP_LLVM_entry_value`` to a
5634 register is lowered to a ``DW_OP_entry_value [reg]``, pushing the
5635 value the register had upon function entry onto the stack. The next
5636 ``(N - 1)`` operations will be part of the ``DW_OP_entry_value``
5637 block argument. For example, ``!DIExpression(DW_OP_LLVM_entry_value,
5638 1, DW_OP_plus_uconst, 123, DW_OP_stack_value)`` specifies an
5639 expression where the entry value of the debug value instruction's
5640 value/address operand is pushed to the stack, and is added
5641 with 123. Due to framework limitations ``N`` can currently only
5644 The operation is introduced by the ``LiveDebugValues`` pass, which
5645 applies it only to function parameters that are unmodified
5646 throughout the function. Support is limited to simple register
5647 location descriptions, or as indirect locations (e.g., when a struct
5648 is passed-by-value to a callee via a pointer to a temporary copy
5649 made in the caller). The entry value op is also introduced by the
5650 ``AsmPrinter`` pass when a call site parameter value
5651 (``DW_AT_call_site_parameter_value``) is represented as entry value
- ``DW_OP_LLVM_arg, N`` is used in debug intrinsics that refer to more than one
  value, such as one that calculates the sum of two registers. This is always
  used in combination with an ordered list of values, such that
  ``DW_OP_LLVM_arg, N`` refers to the ``N``\ th element in that list. For
  example, ``!DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_minus,
  DW_OP_stack_value)`` used with the list ``(%reg1, %reg2)`` would evaluate to
  ``%reg1 - %reg2``. This list of values should be provided by the containing
  intrinsic/instruction.
- ``DW_OP_breg`` (or ``DW_OP_bregx``) represents the content at the provided
  signed offset from the specified register. The opcode is only generated by the
  ``AsmPrinter`` pass to describe a call site parameter value that requires an
  expression over two registers.
- ``DW_OP_push_object_address`` pushes the address of the object, which can then
  serve as a descriptor in subsequent calculations. This opcode can be used to
  calculate the bounds of a Fortran allocatable array that has an array
  descriptor.
- ``DW_OP_over`` duplicates the entry currently second in the stack at the top
  of the stack. This opcode can be used to calculate the bounds of a Fortran
  assumed-rank array, whose rank is known at run time and whose current
  dimension number is implicitly the first element of the stack.
- ``DW_OP_LLVM_implicit_pointer`` specifies the dereferenced value. It can
  be used to represent pointer variables that are optimized out but whose
  pointed-to value is known. This operator is required because it differs from
  the DWARF operator ``DW_OP_implicit_pointer`` in representation and
  specification (number and types of operands), and the latter cannot be used
  for multiple levels of indirection.
  .. code-block:: text

      ; One level of indirection: ptr1 points to an int.
      call void @llvm.dbg.value(metadata i32 4, metadata !17, metadata !20)
      !17 = !DILocalVariable(name: "ptr1", scope: !12, file: !3, line: 5,
                             type: !18)
      !18 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !19, size: 64)
      !19 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
      !20 = !DIExpression(DW_OP_LLVM_implicit_pointer)

      ; Two levels of indirection: ptr1 points to a pointer to an int.
      call void @llvm.dbg.value(metadata i32 4, metadata !17, metadata !21)
      !17 = !DILocalVariable(name: "ptr1", scope: !12, file: !3, line: 5,
                             type: !18)
      !18 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !19, size: 64)
      !19 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !20, size: 64)
      !20 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
      !21 = !DIExpression(DW_OP_LLVM_implicit_pointer,
                          DW_OP_LLVM_implicit_pointer)

DWARF specifies three kinds of simple location descriptions: register, memory,
and implicit location descriptions. Note that a location description is
defined over certain ranges of a program, i.e., the location of a variable may
5703 change over the course of the program. Register and memory location
5704 descriptions describe the *concrete location* of a source variable (in the
5705 sense that a debugger might modify its value), whereas *implicit locations*
5706 describe merely the actual *value* of a source variable which might not exist
5707 in registers or in memory (see ``DW_OP_stack_value``).
5709 A ``llvm.dbg.addr`` or ``llvm.dbg.declare`` intrinsic describes an indirect
5710 value (the address) of a source variable. The first operand of the intrinsic
5711 must be an address of some kind. A DIExpression attached to the intrinsic
5712 refines this address to produce a concrete location for the source variable.
5714 A ``llvm.dbg.value`` intrinsic describes the direct value of a source variable.
5715 The first operand of the intrinsic may be a direct or indirect value. A
5716 DIExpression attached to the intrinsic refines the first operand to produce a
5717 direct value. For example, if the first operand is an indirect value, it may be
5718 necessary to insert ``DW_OP_deref`` into the DIExpression in order to produce a
5719 valid debug intrinsic.
5723 A DIExpression is interpreted in the same way regardless of which kind of
5724 debug intrinsic it's attached to.
.. code-block:: text

    !0 = !DIExpression(DW_OP_deref)
    !1 = !DIExpression(DW_OP_plus_uconst, 3)
    !2 = !DIExpression(DW_OP_constu, 3, DW_OP_plus)
    !3 = !DIExpression(DW_OP_bit_piece, 3, 7)
    !4 = !DIExpression(DW_OP_deref, DW_OP_constu, 3, DW_OP_plus, DW_OP_LLVM_fragment, 3, 7)
    !5 = !DIExpression(DW_OP_constu, 2, DW_OP_swap, DW_OP_xderef)
    !6 = !DIExpression(DW_OP_constu, 42, DW_OP_stack_value)

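
To show where such an expression sits in a complete module, here is a minimal
sketch. All names (``@f``, ``example.c``, the variable ``y``) are hypothetical.
The ``llvm.dbg.value`` call says that the source variable ``y`` has the value
of ``%x`` plus 3: the operand ``%x`` is pushed first, ``DW_OP_plus_uconst, 3``
adds 3 to it, and ``DW_OP_stack_value`` marks the result as a value rather
than a location.

.. code-block:: llvm

    define i32 @f(i32 %x) !dbg !3 {
    entry:
      %y = add i32 %x, 3, !dbg !8
      ; Describe y in terms of %x instead of referring to %y directly.
      call void @llvm.dbg.value(metadata i32 %x, metadata !7, metadata !DIExpression(DW_OP_plus_uconst, 3, DW_OP_stack_value)), !dbg !8
      ret i32 %y, !dbg !8
    }

    declare void @llvm.dbg.value(metadata, metadata, metadata)

    !llvm.dbg.cu = !{!0}
    !llvm.module.flags = !{!2}

    !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1,
                                 producer: "example", emissionKind: FullDebug)
    !1 = !DIFile(filename: "example.c", directory: "/tmp")
    !2 = !{i32 2, !"Debug Info Version", i32 3}
    !3 = distinct !DISubprogram(name: "f", scope: !1, file: !1, line: 1,
                                type: !4, isDefinition: true, scopeLine: 1,
                                unit: !0)
    !4 = !DISubroutineType(types: !5)
    !5 = !{!6, !6}
    !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    !7 = !DILocalVariable(name: "y", scope: !3, file: !1, line: 2, type: !6)
    !8 = !DILocation(line: 2, column: 3, scope: !3)
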
5739 ``DIArgList`` nodes hold a list of constant or SSA value references. These are
5740 used in :ref:`debug intrinsics<dbg_intrinsics>` (currently only in
5741 ``llvm.dbg.value``) in combination with a ``DIExpression`` that uses the
5742 ``DW_OP_LLVM_arg`` operator. Because a DIArgList may refer to local values
5743 within a function, it must only be used as a function argument, must always be
5744 inlined, and cannot appear in named metadata.
5746 .. code-block:: text
5748 llvm.dbg.value(metadata !DIArgList(i32 %a, i32 %b),
5750 metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus))
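
A complete module using ``DIArgList`` could look like the following sketch.
The function ``@sum``, the variable ``s`` and the file name are hypothetical.
The expression pushes element 0 and element 1 of the list, adds them, and
marks the result as the value of ``s``.

.. code-block:: llvm

    define i32 @sum(i32 %a, i32 %b) !dbg !3 {
    entry:
      ; s is described as %a + %b, using both values from the DIArgList.
      call void @llvm.dbg.value(metadata !DIArgList(i32 %a, i32 %b), metadata !7, metadata !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus, DW_OP_stack_value)), !dbg !8
      %s = add i32 %a, %b, !dbg !8
      ret i32 %s, !dbg !8
    }

    declare void @llvm.dbg.value(metadata, metadata, metadata)

    !llvm.dbg.cu = !{!0}
    !llvm.module.flags = !{!2}

    !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1,
                                 producer: "example", emissionKind: FullDebug)
    !1 = !DIFile(filename: "sum.c", directory: "/tmp")
    !2 = !{i32 2, !"Debug Info Version", i32 3}
    !3 = distinct !DISubprogram(name: "sum", scope: !1, file: !1, line: 1,
                                type: !4, isDefinition: true, scopeLine: 1,
                                unit: !0)
    !4 = !DISubroutineType(types: !5)
    !5 = !{!6, !6, !6}
    !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    !7 = !DILocalVariable(name: "s", scope: !3, file: !1, line: 2, type: !6)
    !8 = !DILocation(line: 2, column: 3, scope: !3)
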
5755 These flags encode various properties of DINodes.
The ``ExportSymbols`` flag marks a class, struct or union whose members
may be referenced as if they were defined in the containing class or
union. This flag is used to decide whether the ``DW_AT_export_symbols``
attribute can be used for the structure type.
5765 ``DIObjCProperty`` nodes represent Objective-C property nodes.
5767 .. code-block:: text
5769 !3 = !DIObjCProperty(name: "foo", file: !1, line: 7, setter: "setFoo",
5770 getter: "getFoo", attributes: 7, type: !2)
5775 ``DIImportedEntity`` nodes represent entities (such as modules) imported into a
5778 .. code-block:: text
5780 !2 = !DIImportedEntity(tag: DW_TAG_imported_module, name: "foo", scope: !0,
5781 entity: !1, line: 7)
``DIMacro`` nodes represent the definition or undefinition of a macro
identifier. The ``name:`` field is the macro identifier, followed by macro
parameters when defining a function-like macro, and the ``value:`` field is the
token-string used to expand the macro identifier.
5791 .. code-block:: text
5793 !2 = !DIMacro(macinfo: DW_MACINFO_define, line: 7, name: "foo(x)",
5795 !3 = !DIMacro(macinfo: DW_MACINFO_undef, line: 30, name: "foo")
5800 ``DIMacroFile`` nodes represent inclusion of source files.
5801 The ``nodes:`` field is a list of ``DIMacro`` and ``DIMacroFile`` nodes that
5802 appear in the included source file.
5804 .. code-block:: text
5806 !2 = !DIMacroFile(macinfo: DW_MACINFO_start_file, line: 7, file: !2,
5814 ``DILabel`` nodes represent labels within a :ref:`DISubprogram`. All fields of
a ``DILabel`` are mandatory. The ``scope:`` field must be a
:ref:`DILexicalBlockFile`, a :ref:`DILexicalBlock`, or a :ref:`DISubprogram`.
5817 The ``name:`` field is the label identifier. The ``file:`` field is the
5818 :ref:`DIFile` the label is present in. The ``line:`` field is the source line
5819 within the file where the label is declared.
5821 .. code-block:: text
5823 !2 = !DILabel(scope: !0, name: "foo", file: !1, line: 7)
5828 In LLVM IR, memory does not have types, so LLVM's own type system is not
5829 suitable for doing type based alias analysis (TBAA). Instead, metadata is
5830 added to the IR to describe a type system of a higher level language. This
5831 can be used to implement C/C++ strict type aliasing rules, but it can also
5832 be used to implement custom alias analysis behavior for other languages.
5834 This description of LLVM's TBAA system is broken into two parts:
5835 :ref:`Semantics<tbaa_node_semantics>` talks about high level issues, and
5836 :ref:`Representation<tbaa_node_representation>` talks about the metadata
5837 encoding of various entities.
5839 It is always possible to trace any TBAA node to a "root" TBAA node (details
5840 in the :ref:`Representation<tbaa_node_representation>` section). TBAA
5841 nodes with different roots have an unknown aliasing relationship, and LLVM
5842 conservatively infers ``MayAlias`` between them. The rules mentioned in
5843 this section only pertain to TBAA nodes living under the same root.
5845 .. _tbaa_node_semantics:
5850 The TBAA metadata system, referred to as "struct path TBAA" (not to be
5851 confused with ``tbaa.struct``), consists of the following high level
5852 concepts: *Type Descriptors*, further subdivided into scalar type
5853 descriptors and struct type descriptors; and *Access Tags*.
5855 **Type descriptors** describe the type system of the higher level language
5856 being compiled. **Scalar type descriptors** describe types that do not
5857 contain other types. Each scalar type has a parent type, which must also
5858 be a scalar type or the TBAA root. Via this parent relation, scalar types
5859 within a TBAA root form a tree. **Struct type descriptors** denote types
5860 that contain a sequence of other type descriptors, at known offsets. These
5861 contained type descriptors can either be struct type descriptors themselves
5862 or scalar type descriptors.
5864 **Access tags** are metadata nodes attached to load and store instructions.
5865 Access tags use type descriptors to describe the *location* being accessed
5866 in terms of the type system of the higher level language. Access tags are
5867 tuples consisting of a base type, an access type and an offset. The base
5868 type is a scalar type descriptor or a struct type descriptor, the access
5869 type is a scalar type descriptor, and the offset is a constant integer.
The access tag ``(BaseTy, AccessTy, Offset)`` can describe one of two
things:
5874 * If ``BaseTy`` is a struct type, the tag describes a memory access (load
5875 or store) of a value of type ``AccessTy`` contained in the struct type
5876 ``BaseTy`` at offset ``Offset``.
5878 * If ``BaseTy`` is a scalar type, ``Offset`` must be 0 and ``BaseTy`` and
5879 ``AccessTy`` must be the same; and the access tag describes a scalar
5880 access with scalar type ``AccessTy``.
We first define an ``ImmediateParent`` relation on ``(BaseTy, Offset)``
pairs as follows:
5885 * If ``BaseTy`` is a scalar type then ``ImmediateParent(BaseTy, 0)`` is
5886 ``(ParentTy, 0)`` where ``ParentTy`` is the parent of the scalar type as
5887 described in the TBAA metadata. ``ImmediateParent(BaseTy, Offset)`` is
5888 undefined if ``Offset`` is non-zero.
5890 * If ``BaseTy`` is a struct type then ``ImmediateParent(BaseTy, Offset)``
5891 is ``(NewTy, NewOffset)`` where ``NewTy`` is the type contained in
5892 ``BaseTy`` at offset ``Offset`` and ``NewOffset`` is ``Offset`` adjusted
5893 to be relative within that inner type.
A memory access with an access tag ``(BaseTy1, AccessTy1, Offset1)``
aliases a memory access with an access tag ``(BaseTy2, AccessTy2,
Offset2)`` if either ``(BaseTy1, Offset1)`` is reachable from ``(BaseTy2,
Offset2)`` via the ``ImmediateParent`` relation or vice versa.
As a concrete example, the type descriptor graph for the following program

.. code-block:: c

    struct Inner {
      int i;    // offset 0
      float f;  // offset 4
    };

    struct Outer {
      float f;  // offset 0
      double d; // offset 4
      struct Inner inner_a;  // offset 12
    };

    void f(struct Outer* outer, struct Inner* inner, float* f, int* i, char* c) {
      outer->f = 0;            // tag0: (OuterStructTy, FloatScalarTy, 0)
      outer->inner_a.i = 0;    // tag1: (OuterStructTy, IntScalarTy, 12)
      outer->inner_a.f = 0.0;  // tag2: (OuterStructTy, FloatScalarTy, 16)
      *f = 0.0;                // tag3: (FloatScalarTy, FloatScalarTy, 0)
    }

is (note that in C and C++, ``char`` can be used to access any arbitrary
type):

.. code-block:: text

    Root = "TBAA Root"
    CharScalarTy = ("char", Root, 0)
    FloatScalarTy = ("float", CharScalarTy, 0)
    DoubleScalarTy = ("double", CharScalarTy, 0)
    IntScalarTy = ("int", CharScalarTy, 0)
    InnerStructTy = {"Inner", (IntScalarTy, 0), (FloatScalarTy, 4)}
    OuterStructTy = {"Outer", (FloatScalarTy, 0), (DoubleScalarTy, 4),
                     (InnerStructTy, 12)}

5937 with (e.g.) ``ImmediateParent(OuterStructTy, 12)`` = ``(InnerStructTy,
5938 0)``, ``ImmediateParent(InnerStructTy, 0)`` = ``(IntScalarTy, 0)``, and
5939 ``ImmediateParent(IntScalarTy, 0)`` = ``(CharScalarTy, 0)``.
5941 .. _tbaa_node_representation:
5946 The root node of a TBAA type hierarchy is an ``MDNode`` with 0 operands or
5947 with exactly one ``MDString`` operand.
Scalar type descriptors are represented as ``MDNode`` s with two
operands. The first operand is an ``MDString`` denoting the name of the
scalar type. LLVM does not assign meaning to the value of this operand; it
only cares about it being an ``MDString``. The second operand is an
``MDNode`` which points to the parent for said scalar type descriptor,
which is either another scalar type descriptor or the TBAA root. Scalar
type descriptors can have an optional third operand, but that must be the
constant integer zero.
5958 Struct type descriptors are represented as ``MDNode`` s with an odd number
5959 of operands greater than 1. The first operand is an ``MDString`` denoting
5960 the name of the struct type. Like in scalar type descriptors the actual
5961 value of this name operand is irrelevant to LLVM. After the name operand,
5962 the struct type descriptors have a sequence of alternating ``MDNode`` and
5963 ``ConstantInt`` operands. With N starting from 1, the 2N - 1 th operand,
5964 an ``MDNode``, denotes a contained field, and the 2N th operand, a
5965 ``ConstantInt``, is the offset of the said contained field. The offsets
5966 must be in non-decreasing order.
5968 Access tags are represented as ``MDNode`` s with either 3 or 4 operands.
5969 The first operand is an ``MDNode`` pointing to the node representing the
5970 base type. The second operand is an ``MDNode`` pointing to the node
5971 representing the access type. The third operand is a ``ConstantInt`` that
5972 states the offset of the access. If a fourth field is present, it must be
5973 a ``ConstantInt`` valued at 0 or 1. If it is 1 then the access tag states
5974 that the location being accessed is "constant" (meaning
5975 ``pointsToConstantMemory`` should return true; see `other useful
5976 AliasAnalysis methods <AliasAnalysis.html#OtherItfs>`_). The TBAA root of
5977 the access type and the base type of an access tag must be the same, and
5978 that is the TBAA root of the access tag.
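
As an illustrative sketch (not the output of any particular frontend), the
type descriptors and access tags from the C example in the Semantics section
could be encoded as shown below. The metadata numbering, the root name
``"TBAA Root"``, and the reduced function signature are arbitrary choices made
for this example.

.. code-block:: llvm

    define void @f(float* %f) {
      ; *f = 0.0, annotated with tag3
      store float 0.000000e+00, float* %f, align 4, !tbaa !10
      ret void
    }

    !0 = !{!"TBAA Root"}                                 ; root (one MDString operand)
    !1 = !{!"char", !0, i64 0}                           ; CharScalarTy
    !2 = !{!"float", !1, i64 0}                          ; FloatScalarTy
    !3 = !{!"double", !1, i64 0}                         ; DoubleScalarTy
    !4 = !{!"int", !1, i64 0}                            ; IntScalarTy
    !5 = !{!"Inner", !4, i64 0, !2, i64 4}               ; InnerStructTy
    !6 = !{!"Outer", !2, i64 0, !3, i64 4, !5, i64 12}   ; OuterStructTy

    ; Access tags: (base type, access type, offset)
    !7 = !{!6, !2, i64 0}    ; tag0
    !8 = !{!6, !4, i64 12}   ; tag1
    !9 = !{!6, !2, i64 16}   ; tag2
    !10 = !{!2, !2, i64 0}   ; tag3

Under the reachability rule from the Semantics section, ``tag0`` and ``tag2``
both reach ``(FloatScalarTy, 0)`` and therefore alias ``tag3``, while the
chain starting at ``tag1`` passes through ``(IntScalarTy, 0)`` and never
reaches ``(FloatScalarTy, 0)``.
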
5980 '``tbaa.struct``' Metadata
5981 ^^^^^^^^^^^^^^^^^^^^^^^^^^
The :ref:`llvm.memcpy <int_memcpy>` intrinsic is often used to implement
aggregate assignment operations in C and similar languages. However, it
is defined to copy a contiguous region of memory, which is more than
strictly necessary for aggregate types which contain holes due to
padding. Also, it doesn't contain any TBAA information about the fields
of the aggregate.

``!tbaa.struct`` metadata can describe which memory subregions in a
memcpy are padding and what the TBAA tags of the struct are.

The current metadata format is very simple. ``!tbaa.struct`` metadata
nodes are a list of operands which are in conceptual groups of three.
For each group of three, the first operand gives the offset of a
field in bytes, the second gives its size in bytes, and the third gives
its TBAA tag. For example:

5999 .. code-block:: llvm
6001 !4 = !{ i64 0, i64 4, !1, i64 8, i64 4, !2 }
This describes a struct with two fields. The first is at offset 0 bytes,
has size 4 bytes, and has TBAA tag ``!1``. The second is at offset 8 bytes,
has size 4 bytes, and has TBAA tag ``!2``.
6007 Note that the fields need not be contiguous. In this example, there is a
6008 4 byte gap between the two fields. This gap represents padding which
6009 does not carry useful data and need not be preserved.
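
A minimal sketch of how such a node might be attached to a copy is shown
below. The function ``@copy``, the TBAA root name, and the ``int`` and
``float`` type descriptors are hypothetical and only serve to make the
fragment self-contained; the copied size of 12 bytes follows from the two
4-byte fields at offsets 0 and 8.

.. code-block:: llvm

    declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1)

    define void @copy(i8* %dst, i8* %src) {
      ; Aggregate assignment lowered to a 12-byte memcpy.
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %dst, i8* align 4 %src, i64 12, i1 false), !tbaa.struct !4
      ret void
    }

    !0 = !{!"hypothetical TBAA root"}
    !1 = !{!3, !3, i64 0}                          ; access tag of the first field
    !2 = !{!5, !5, i64 0}                          ; access tag of the second field
    !3 = !{!"int", !0, i64 0}
    !4 = !{ i64 0, i64 4, !1, i64 8, i64 4, !2 }   ; bytes 4..7 are padding
    !5 = !{!"float", !0, i64 0}
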
6011 '``noalias``' and '``alias.scope``' Metadata
6012 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``noalias`` and ``alias.scope`` metadata provide the ability to specify generic
noalias memory-access sets. This means that some collection of memory access
instructions (loads, stores, memory-accessing calls, etc.) that carry
``noalias`` metadata can be declared not to alias with some other
collection of memory access instructions that carry ``alias.scope`` metadata.
Each type of metadata specifies a list of scopes where each scope has an id and
a domain.

When evaluating an aliasing query, if for some domain, the set
of scopes with that domain in one instruction's ``alias.scope`` list is a
subset of (or equal to) the set of scopes for that domain in another
instruction's ``noalias`` list, then the two memory accesses are assumed not to
alias.

6028 Because scopes in one domain don't affect scopes in other domains, separate
6029 domains can be used to compose multiple independent noalias sets. This is
6030 used for example during inlining. As the noalias function parameters are
6031 turned into noalias scope metadata, a new domain is used every time the
6032 function is inlined.
6034 The metadata identifying each domain is itself a list containing one or two
6035 entries. The first entry is the name of the domain. Note that if the name is a
6036 string then it can be combined across functions and translation units. A
6037 self-reference can be used to create globally unique domain names. A
6038 descriptive string may optionally be provided as a second list entry.
6040 The metadata identifying each scope is also itself a list containing two or
6041 three entries. The first entry is the name of the scope. Note that if the name
6042 is a string then it can be combined across functions and translation units. A
6043 self-reference can be used to create globally unique scope names. A metadata
6044 reference to the scope's domain is the second entry. A descriptive string may
6045 optionally be provided as a third list entry.
.. code-block:: llvm

    ; Two scope domains:
    !0 = !{!0}
    !1 = !{!1}

    ; Some scopes in these domains:
    !2 = !{!2, !0}
    !3 = !{!3, !0}
    !4 = !{!4, !1}

    ; Some scope lists:
    !5 = !{!4} ; A list containing only scope !4
    !6 = !{!4, !3, !2}
    !7 = !{!3}

    ; These two instructions don't alias:
    %0 = load float, float* %c, align 4, !alias.scope !5
    store float %0, float* %arrayidx.i, align 4, !noalias !5

    ; These two instructions also don't alias (for domain !1, the set of scopes
    ; in the !alias.scope equals that in the !noalias list):
    %1 = load float, float* %c, align 4, !alias.scope !5
    store float %1, float* %arrayidx.i2, align 4, !noalias !6

    ; These two instructions may alias (for domain !0, the set of scopes in
    ; the !noalias list is not a superset of, or equal to, the scopes in the
    ; !alias.scope list):
    %2 = load float, float* %c, align 4, !alias.scope !6
    store float %2, float* %arrayidx.i, align 4, !noalias !7

6080 '``fpmath``' Metadata
6081 ^^^^^^^^^^^^^^^^^^^^^
6083 ``fpmath`` metadata may be attached to any instruction of floating-point
6084 type. It can be used to express the maximum acceptable error in the
6085 result of that instruction, in ULPs, thus potentially allowing the
6086 compiler to use a more efficient but less accurate method of computing
6087 it. ULP is defined as follows:
6089 If ``x`` is a real number that lies between two finite consecutive
6090 floating-point numbers ``a`` and ``b``, without being equal to one
6091 of them, then ``ulp(x) = |b - a|``, otherwise ``ulp(x)`` is the
6092 distance between the two non-equal finite floating-point numbers
6093 nearest ``x``. Moreover, ``ulp(NaN)`` is ``NaN``.
6095 The metadata node shall consist of a single positive float type number
6096 representing the maximum relative error, for example:
6098 .. code-block:: llvm
6100 !0 = !{ float 2.5 } ; maximum acceptable inaccuracy is 2.5 ULPs
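
As a small sketch, the node can be attached to a floating-point instruction
such as ``fdiv``. The function name ``@approx_div`` and its parameters are
hypothetical.

.. code-block:: llvm

    define float @approx_div(float %a, float %b) {
      ; The result may be off by up to 2.5 ULPs.
      %r = fdiv float %a, %b, !fpmath !0
      ret float %r
    }

    !0 = !{ float 2.5 }
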
6104 '``range``' Metadata
6105 ^^^^^^^^^^^^^^^^^^^^
6107 ``range`` metadata may be attached only to ``load``, ``call`` and ``invoke`` of
6108 integer types. It expresses the possible ranges the loaded value or the value
6109 returned by the called function at this call site is in. If the loaded or
6110 returned value is not in the specified range, the behavior is undefined. The
6111 ranges are represented with a flattened list of integers. The loaded value or
6112 the value returned is known to be in the union of the ranges defined by each
6113 consecutive pair. Each pair has the following properties:
6115 - The type must match the type loaded by the instruction.
6116 - The pair ``a,b`` represents the range ``[a,b)``.
6117 - Both ``a`` and ``b`` are constants.
6118 - The range is allowed to wrap.
- The range should not represent the full or empty set. That is,
  ``a!=b``.
6122 In addition, the pairs must be in signed order of the lower bound and
6123 they must be non-contiguous.
6127 .. code-block:: llvm
6129 %a = load i8, i8* %x, align 1, !range !0 ; Can only be 0 or 1
6130 %b = load i8, i8* %y, align 1, !range !1 ; Can only be 255 (-1), 0 or 1
6131 %c = call i8 @foo(), !range !2 ; Can only be 0, 1, 3, 4 or 5
6132 %d = invoke i8 @bar() to label %cont
6133 unwind label %lpad, !range !3 ; Can only be -2, -1, 3, 4 or 5
6135 !0 = !{ i8 0, i8 2 }
6136 !1 = !{ i8 255, i8 2 }
6137 !2 = !{ i8 0, i8 2, i8 3, i8 6 }
6138 !3 = !{ i8 -2, i8 0, i8 3, i8 6 }
6140 '``absolute_symbol``' Metadata
6141 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6143 ``absolute_symbol`` metadata may be attached to a global variable
6144 declaration. It marks the declaration as a reference to an absolute symbol,
6145 which causes the backend to use absolute relocations for the symbol even
6146 in position independent code, and expresses the possible ranges that the
6147 global variable's *address* (not its value) is in, in the same format as
6148 ``range`` metadata, with the extension that the pair ``all-ones,all-ones``
6149 may be used to represent the full set.
6151 Example (assuming 64-bit pointers):
6153 .. code-block:: llvm
6155 @a = external global i8, !absolute_symbol !0 ; Absolute symbol in range [0,256)
6156 @b = external global i8, !absolute_symbol !1 ; Absolute symbol in range [0,2^64)
6159 !0 = !{ i64 0, i64 256 }
6160 !1 = !{ i64 -1, i64 -1 }
6162 '``callees``' Metadata
6163 ^^^^^^^^^^^^^^^^^^^^^^
6165 ``callees`` metadata may be attached to indirect call sites. If ``callees``
6166 metadata is attached to a call site, and any callee is not among the set of
6167 functions provided by the metadata, the behavior is undefined. The intent of
6168 this metadata is to facilitate optimizations such as indirect-call promotion.
6169 For example, in the code below, the call instruction may only target the
6170 ``add`` or ``sub`` functions:
6172 .. code-block:: llvm
6174 %result = call i64 %binop(i64 %x, i64 %y), !callees !0
6177 !0 = !{i64 (i64, i64)* @add, i64 (i64, i64)* @sub}
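
To illustrate the kind of transformation this enables, the following sketch
shows one possible result of promoting the indirect call above into direct
calls. The wrapper function ``@apply`` and the block names are hypothetical,
and an actual optimizer may produce different code.

.. code-block:: llvm

    declare i64 @add(i64, i64)
    declare i64 @sub(i64, i64)

    define i64 @apply(i64 (i64, i64)* %binop, i64 %x, i64 %y) {
    entry:
      ; Since %binop can only be @add or @sub, one comparison is sufficient.
      %is_add = icmp eq i64 (i64, i64)* %binop, @add
      br i1 %is_add, label %call_add, label %call_sub

    call_add:
      %r0 = call i64 @add(i64 %x, i64 %y)
      br label %done

    call_sub:
      %r1 = call i64 @sub(i64 %x, i64 %y)
      br label %done

    done:
      %result = phi i64 [ %r0, %call_add ], [ %r1, %call_sub ]
      ret i64 %result
    }
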
6179 '``callback``' Metadata
6180 ^^^^^^^^^^^^^^^^^^^^^^^
``callback`` metadata may be attached to a function declaration or definition.
(Call sites are excluded only due to the lack of a use case.) For ease of
exposition, we'll refer to the function annotated with metadata as a broker
function. The metadata describes how the arguments of a call to the broker are
in turn passed to the callback function specified by the metadata. Thus, the
``callback`` metadata provides a partial description of a call site inside the
broker function with regard to the arguments of a call to the broker. The only
semantic restriction on the broker function itself is that it is not allowed to
inspect or modify arguments referenced in the ``callback`` metadata as
pass-through to the callback function.
6193 The broker is not required to actually invoke the callback function at runtime.
6194 However, the assumptions about not inspecting or modifying arguments that would
6195 be passed to the specified callback function still hold, even if the callback
6196 function is not dynamically invoked. The broker is allowed to invoke the
6197 callback function more than once per invocation of the broker. The broker is
6198 also allowed to invoke (directly or indirectly) the function passed as a
6199 callback through another use. Finally, the broker is also allowed to relay the
6200 callback callee invocation to a different thread.
6202 The metadata is structured as follows: At the outer level, ``callback``
6203 metadata is a list of ``callback`` encodings. Each encoding starts with a
6204 constant ``i64`` which describes the argument position of the callback function
6205 in the call to the broker. The following elements, except the last, describe
6206 what arguments are passed to the callback function. Each element is again an
6207 ``i64`` constant identifying the argument of the broker that is passed through,
6208 or ``i64 -1`` to indicate an unknown or inspected argument. The order in which
6209 they are listed has to be the same in which they are passed to the callback
6210 callee. The last element of the encoding is a boolean which specifies how
6211 variadic arguments of the broker are handled. If it is true, all variadic
6212 arguments of the broker are passed through to the callback function *after* the
6213 arguments encoded explicitly before.
In the code below, the ``pthread_create`` function is marked as a broker
through the ``!callback !1`` metadata. In the example, there is only one
callback encoding, namely ``!2``, associated with the broker. This encoding
identifies the callback function as the broker argument at position 2 (``i64
2``) and the sole argument of the callback function as the broker argument at
position 3 (``i64 3``). Argument positions are counted from zero.
6222 .. FIXME why does the llvm-sphinx-docs builder give a highlighting
6223 error if the below is set to highlight as 'llvm', despite that we
6224 have misc.highlighting_failure set?
6226 .. code-block:: text
    declare !callback !1 dso_local i32 @pthread_create(i64*, %union.pthread_attr_t*, i8* (i8*)*, i8*)

    !2 = !{i64 2, i64 3, i1 false}
    !1 = !{!2}

Another example is shown below. The callback callee is the argument at
position 2, counting from zero, of the ``__kmpc_fork_call`` function (``i64
2``). The callee is given two unknown values (each identified by a ``i64 -1``)
and afterwards all variadic arguments that are passed to the
``__kmpc_fork_call`` call (due to the final ``i1 true``).

6240 .. FIXME why does the llvm-sphinx-docs builder give a highlighting
6241 error if the below is set to highlight as 'llvm', despite that we
6242 have misc.highlighting_failure set?
6244 .. code-block:: text
    declare !callback !0 dso_local void @__kmpc_fork_call(%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...)

    !1 = !{i64 2, i64 -1, i64 -1, i1 true}
    !0 = !{!1}

6253 '``unpredictable``' Metadata
6254 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6256 ``unpredictable`` metadata may be attached to any branch or switch
6257 instruction. It can be used to express the unpredictability of control
6258 flow. Similar to the llvm.expect intrinsic, it may be used to alter
6259 optimizations related to compare and branch instructions. The metadata
6260 is treated as a boolean value; if it exists, it signals that the branch
6261 or switch that it is attached to is completely unpredictable.
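
A minimal sketch follows. Because only the presence of the metadata matters,
the node attached here is empty. The function ``@select_path`` is
hypothetical.

.. code-block:: llvm

    define i32 @select_path(i1 %cond) {
    entry:
      ; The outcome of this branch is declared to be unpredictable.
      br i1 %cond, label %then, label %else, !unpredictable !0

    then:
      ret i32 1

    else:
      ret i32 0
    }

    !0 = !{}
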
6263 .. _md_dereferenceable:
6265 '``dereferenceable``' Metadata
6266 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The existence of the ``!dereferenceable`` metadata on the instruction
tells the optimizer that the value loaded is known to be dereferenceable.
The number of bytes known to be dereferenceable is specified by the integer
value in the metadata node. This is analogous to the ``dereferenceable``
attribute on parameters and return values.

6274 .. _md_dereferenceable_or_null:
6276 '``dereferenceable_or_null``' Metadata
6277 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The existence of the ``!dereferenceable_or_null`` metadata on the
instruction tells the optimizer that the value loaded is known to be either
dereferenceable or null.
The number of bytes known to be dereferenceable is specified by the integer
value in the metadata node. This is analogous to the ``dereferenceable_or_null``
attribute on parameters and return values.
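
The following sketch shows both kinds of metadata on loads of pointer
values. The function ``@read`` and its parameters are hypothetical, and the
byte count of 4 is chosen to match the size of the ``i32`` pointee.

.. code-block:: llvm

    define i32 @read(i32** %pp, i32** %qq) {
    entry:
      ; %p is known to point to at least 4 dereferenceable bytes.
      %p = load i32*, i32** %pp, align 8, !dereferenceable !0
      ; %q is either null or points to at least 4 dereferenceable bytes.
      %q = load i32*, i32** %qq, align 8, !dereferenceable_or_null !0
      %v = load i32, i32* %p, align 4
      ret i32 %v
    }

    !0 = !{i64 4}
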
6291 It is sometimes useful to attach information to loop constructs. Currently,
6292 loop metadata is implemented as metadata attached to the branch instruction
6293 in the loop latch block. The loop metadata node is a list of
6294 other metadata nodes, each representing a property of the loop. Usually,
6295 the first item of the property node is a string. For example, the
6296 ``llvm.loop.unroll.count`` suggests an unroll factor to the loop
6299 .. code-block:: llvm
    br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll.enable"}
    !2 = !{!"llvm.loop.unroll.count", i32 4}

6307 For legacy reasons, the first item of a loop metadata node must be a
6308 reference to itself. Before the advent of the 'distinct' keyword, this
6309 forced the preservation of otherwise identical metadata nodes. Since
6310 the loop-metadata node can be attached to multiple nodes, the 'distinct'
6311 keyword has become unnecessary.
6313 Prior to the property nodes, one or two ``DILocation`` (debug location)
6314 nodes can be present in the list. The first, if present, identifies the
6315 source-code location where the loop begins. The second, if present,
6316 identifies the source-code location where the loop ends.
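
As a sketch of this layout, the fragment below places two debug locations
between the self-reference and a single property node. The line and column
values are made up, and ``!4`` is assumed to be a ``DISubprogram`` defined
elsewhere in the module, so the fragment is not complete on its own.

.. code-block:: llvm

    !0 = !{!0, !1, !2, !3}
    !1 = !DILocation(line: 10, column: 3, scope: !4)   ; where the loop begins
    !2 = !DILocation(line: 14, column: 3, scope: !4)   ; where the loop ends
    !3 = !{!"llvm.loop.unroll.count", i32 4}           ; first property node
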
6318 Loop metadata nodes cannot be used as unique identifiers. They are
6319 neither persistent for the same loop through transformations nor
6320 necessarily unique to just one loop.
6322 '``llvm.loop.disable_nonforced``'
6323 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6325 This metadata disables all optional loop transformations unless
6326 explicitly instructed using other transformation metadata such as
6327 ``llvm.loop.unroll.enable``. That is, no heuristic will try to determine
whether a transformation is profitable. The purpose is to prevent the
loop from being transformed into a different loop before an explicitly
requested (forced) transformation is applied. For instance, loop fusion can make
6331 other transformations impossible. Mandatory loop canonicalizations such
6332 as loop rotation are still applied.
6334 It is recommended to use this metadata in addition to any llvm.loop.*
6335 transformation directive. Also, any loop should have at most one
6336 directive applied to it (and a sequence of transformations built using
6337 followup-attributes). Otherwise, which transformation will be applied
6338 depends on implementation details such as the pass pipeline order.
6340 See :ref:`transformation-metadata` for details.
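
As a sketch, the following loop metadata combines ``llvm.loop.disable_nonforced``
with a single forced transformation, here unrolling. The branch operands are
placeholders.

.. code-block:: llvm

    br i1 %exitcond, label %exit, label %loop, !llvm.loop !0

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.disable_nonforced"}   ; no heuristic-driven transformations
    !2 = !{!"llvm.loop.unroll.enable"}       ; the one explicitly requested transformation
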
6342 '``llvm.loop.vectorize``' and '``llvm.loop.interleave``'
6343 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6345 Metadata prefixed with ``llvm.loop.vectorize`` or ``llvm.loop.interleave`` are
6346 used to control per-loop vectorization and interleaving parameters such as
6347 vectorization width and interleave count. These metadata should be used in
6348 conjunction with ``llvm.loop`` loop identification metadata. The
6349 ``llvm.loop.vectorize`` and ``llvm.loop.interleave`` metadata are only
6350 optimization hints and the optimizer will only interleave and vectorize loops if
6351 it believes it is safe to do so. The ``llvm.loop.parallel_accesses`` metadata
6352 which contains information about loop-carried memory dependencies can be helpful
6353 in determining the safety of these transformations.
6355 '``llvm.loop.interleave.count``' Metadata
6356 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6358 This metadata suggests an interleave count to the loop interleaver.
6359 The first operand is the string ``llvm.loop.interleave.count`` and the
second operand is an integer specifying the interleave count. For
example:

6363 .. code-block:: llvm
6365 !0 = !{!"llvm.loop.interleave.count", i32 4}
6367 Note that setting ``llvm.loop.interleave.count`` to 1 disables interleaving
6368 multiple iterations of the loop. If ``llvm.loop.interleave.count`` is set to 0
6369 then the interleave count will be determined automatically.
6371 '``llvm.loop.vectorize.enable``' Metadata
6372 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6374 This metadata selectively enables or disables vectorization for the loop. The
6375 first operand is the string ``llvm.loop.vectorize.enable`` and the second operand
6376 is a bit. If the bit operand value is 1 vectorization is enabled. A value of
6377 0 disables vectorization:
6379 .. code-block:: llvm
6381 !0 = !{!"llvm.loop.vectorize.enable", i1 0}
6382 !1 = !{!"llvm.loop.vectorize.enable", i1 1}
6384 '``llvm.loop.vectorize.predicate.enable``' Metadata
6385 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata selectively enables or disables creating predicated instructions
for the loop, which can enable folding of the scalar epilogue loop into the
main loop. The first operand is the string
``llvm.loop.vectorize.predicate.enable`` and the second operand is a bit. If
the bit operand value is 1, predication is enabled. A value of 0 disables
predication:

6394 .. code-block:: llvm
6396 !0 = !{!"llvm.loop.vectorize.predicate.enable", i1 0}
6397 !1 = !{!"llvm.loop.vectorize.predicate.enable", i1 1}
6399 '``llvm.loop.vectorize.scalable.enable``' Metadata
6400 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6402 This metadata selectively enables or disables scalable vectorization for the
6403 loop, and only has any effect if vectorization for the loop is already enabled.
6404 The first operand is the string ``llvm.loop.vectorize.scalable.enable``
6405 and the second operand is a bit. If the bit operand value is 1 scalable
6406 vectorization is enabled, whereas a value of 0 reverts to the default fixed
6407 width vectorization:
6409 .. code-block:: llvm
6411 !0 = !{!"llvm.loop.vectorize.scalable.enable", i1 0}
6412 !1 = !{!"llvm.loop.vectorize.scalable.enable", i1 1}
6414 '``llvm.loop.vectorize.width``' Metadata
6415 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6417 This metadata sets the target width of the vectorizer. The first
6418 operand is the string ``llvm.loop.vectorize.width`` and the second
6419 operand is an integer specifying the width. For example:
6421 .. code-block:: llvm
6423 !0 = !{!"llvm.loop.vectorize.width", i32 4}
6425 Note that setting ``llvm.loop.vectorize.width`` to 1 disables
6426 vectorization of the loop. If ``llvm.loop.vectorize.width`` is set to
6427 0 or if the loop does not have this metadata the width will be
6428 determined automatically.
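
The vectorization and interleaving hints above are typically combined in one
loop metadata node. The sketch below assumes a loop whose latch branch
carries ``!llvm.loop !0``; the branch operands and the chosen width and count
are placeholders.

.. code-block:: llvm

    br i1 %exitcond, label %exit, label %loop, !llvm.loop !0

    !0 = !{!0, !1, !2, !3}
    !1 = !{!"llvm.loop.vectorize.enable", i1 1}    ; request vectorization
    !2 = !{!"llvm.loop.vectorize.width", i32 4}    ; with a target width of 4
    !3 = !{!"llvm.loop.interleave.count", i32 2}   ; and an interleave count of 2
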
6430 '``llvm.loop.vectorize.followup_vectorized``' Metadata
6431 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6433 This metadata defines which loop attributes the vectorized loop will
6434 have. See :ref:`transformation-metadata` for details.
6436 '``llvm.loop.vectorize.followup_epilogue``' Metadata
6437 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6439 This metadata defines which loop attributes the epilogue will have. The
6440 epilogue is not vectorized and is executed when either the vectorized
6441 loop is not known to preserve semantics (because e.g., it processes two
6442 arrays that are found to alias by a runtime check) or for the last
6443 iterations that do not fill a complete set of vector lanes. See
6444 :ref:`Transformation Metadata <transformation-metadata>` for details.
6446 '``llvm.loop.vectorize.followup_all``' Metadata
6447 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Attributes in the metadata will be added to both the vectorized and
epilogue loop.
See :ref:`Transformation Metadata <transformation-metadata>` for details.
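
As a rough sketch of how a followup attribute can be written, the fragment
below asks that the vectorized loop receive an unroll count hint. It assumes
that the operands following the attribute name are the property nodes to
apply to the resulting loop; the authoritative rules are in the
Transformation Metadata document.

.. code-block:: llvm

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.vectorize.enable", i1 1}
    !2 = !{!"llvm.loop.vectorize.followup_vectorized", !3}
    !3 = !{!"llvm.loop.unroll.count", i32 2}   ; attribute for the vectorized loop
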
6453 '``llvm.loop.unroll``'
6454 ^^^^^^^^^^^^^^^^^^^^^^
6456 Metadata prefixed with ``llvm.loop.unroll`` are loop unrolling
6457 optimization hints such as the unroll factor. ``llvm.loop.unroll``
6458 metadata should be used in conjunction with ``llvm.loop`` loop
6459 identification metadata. The ``llvm.loop.unroll`` metadata are only
6460 optimization hints and the unrolling will only be performed if the
6461 optimizer believes it is safe to do so.
6463 '``llvm.loop.unroll.count``' Metadata
6464 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata suggests an unroll factor to the loop unroller. The
first operand is the string ``llvm.loop.unroll.count`` and the second
operand is a positive integer specifying the unroll factor. For
example:

6471 .. code-block:: llvm
6473 !0 = !{!"llvm.loop.unroll.count", i32 4}
6475 If the trip count of the loop is less than the unroll count the loop
6476 will be partially unrolled.
6478 '``llvm.loop.unroll.disable``' Metadata
6479 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6481 This metadata disables loop unrolling. The metadata has a single operand
6482 which is the string ``llvm.loop.unroll.disable``. For example:
6484 .. code-block:: llvm
6486 !0 = !{!"llvm.loop.unroll.disable"}
6488 '``llvm.loop.unroll.runtime.disable``' Metadata
6489 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6491 This metadata disables runtime loop unrolling. The metadata has a single
6492 operand which is the string ``llvm.loop.unroll.runtime.disable``. For example:
6494 .. code-block:: llvm
6496 !0 = !{!"llvm.loop.unroll.runtime.disable"}
6498 '``llvm.loop.unroll.enable``' Metadata
6499 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6501 This metadata suggests that the loop should be fully unrolled if the trip count
6502 is known at compile time and partially unrolled if the trip count is not known
6503 at compile time. The metadata has a single operand which is the string
6504 ``llvm.loop.unroll.enable``. For example:
6506 .. code-block:: llvm
6508 !0 = !{!"llvm.loop.unroll.enable"}
6510 '``llvm.loop.unroll.full``' Metadata
6511 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata suggests that the loop should be unrolled fully. The
metadata has a single operand which is the string ``llvm.loop.unroll.full``.
For example:

6517 .. code-block:: llvm
6519 !0 = !{!"llvm.loop.unroll.full"}
6521 '``llvm.loop.unroll.followup``' Metadata
6522 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6524 This metadata defines which loop attributes the unrolled loop will have.
6525 See :ref:`Transformation Metadata <transformation-metadata>` for details.
6527 '``llvm.loop.unroll.followup_remainder``' Metadata
6528 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6530 This metadata defines which loop attributes the remainder loop after
6531 partial/runtime unrolling will have. See
6532 :ref:`Transformation Metadata <transformation-metadata>` for details.
6534 '``llvm.loop.unroll_and_jam``'
6535 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata is treated very similarly to the ``llvm.loop.unroll`` metadata
above, but affects the unroll and jam pass. In addition, any loop with
``llvm.loop.unroll`` metadata but no ``llvm.loop.unroll_and_jam`` metadata will
disable unroll and jam (so ``llvm.loop.unroll`` metadata will be left to the
unroller, plus ``llvm.loop.unroll.disable`` metadata will disable unroll and jam
too).

6544 The metadata for unroll and jam otherwise is the same as for ``unroll``.
6545 ``llvm.loop.unroll_and_jam.enable``, ``llvm.loop.unroll_and_jam.disable`` and
6546 ``llvm.loop.unroll_and_jam.count`` do the same as for unroll.
6547 ``llvm.loop.unroll_and_jam.full`` is not supported. Again these are only hints
6548 and the normal safety checks will still be performed.
6550 '``llvm.loop.unroll_and_jam.count``' Metadata
6551 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6553 This metadata suggests an unroll and jam factor to use, similarly to
6554 ``llvm.loop.unroll.count``. The first operand is the string
6555 ``llvm.loop.unroll_and_jam.count`` and the second operand is a positive integer
6556 specifying the unroll factor. For example:
6558 .. code-block:: llvm
6560 !0 = !{!"llvm.loop.unroll_and_jam.count", i32 4}
If the trip count of the loop is less than the unroll count, the loop
will be partially unrolled and jammed.
6565 '``llvm.loop.unroll_and_jam.disable``' Metadata
6566 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6568 This metadata disables loop unroll and jamming. The metadata has a single
6569 operand which is the string ``llvm.loop.unroll_and_jam.disable``. For example:
6571 .. code-block:: llvm
6573 !0 = !{!"llvm.loop.unroll_and_jam.disable"}
6575 '``llvm.loop.unroll_and_jam.enable``' Metadata
6576 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata suggests that the loop should be fully unrolled and jammed if the
trip count is known at compile time and partially unrolled and jammed if the
trip count is not known at compile time. The metadata has a single operand which is the
6581 string ``llvm.loop.unroll_and_jam.enable``. For example:
6583 .. code-block:: llvm
6585 !0 = !{!"llvm.loop.unroll_and_jam.enable"}
6587 '``llvm.loop.unroll_and_jam.followup_outer``' Metadata
6588 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata defines which loop attributes the outer unrolled loop will
have. See :ref:`Transformation Metadata <transformation-metadata>` for
details.

6594 '``llvm.loop.unroll_and_jam.followup_inner``' Metadata
6595 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This metadata defines which loop attributes the inner jammed loop will
have. See :ref:`Transformation Metadata <transformation-metadata>` for
details.

6601 '``llvm.loop.unroll_and_jam.followup_remainder_outer``' Metadata
6602 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6604 This metadata defines which attributes the epilogue of the outer loop
6605 will have. This loop is usually unrolled, meaning there is no such
6606 loop. This attribute will be ignored in this case. See
6607 :ref:`Transformation Metadata <transformation-metadata>` for details.
6609 '``llvm.loop.unroll_and_jam.followup_remainder_inner``' Metadata
6610 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6612 This metadata defines which attributes the inner loop of the epilogue
6613 will have. The outer epilogue will usually be unrolled, meaning there
6614 can be multiple inner remainder loops. See
6615 :ref:`Transformation Metadata <transformation-metadata>` for details.
6617 '``llvm.loop.unroll_and_jam.followup_all``' Metadata
6618 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Attributes specified in the metadata are added to all
6621 ``llvm.loop.unroll_and_jam.*`` loops. See
6622 :ref:`Transformation Metadata <transformation-metadata>` for details.
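The followup attributes above can be sketched as follows. This is only an
illustration: it assumes that the attribute nodes to be applied are listed as
operands after the name string, and the choice of
``llvm.loop.unroll.disable`` as the attribute for the inner jammed loop is
hypothetical. The authoritative operand layout is given in
:ref:`Transformation Metadata <transformation-metadata>`.

.. code-block:: llvm

   !0 = distinct !{!0, !1, !2}
   !1 = !{!"llvm.loop.unroll_and_jam.enable"}
   ; Assumed layout: name string followed by the attribute nodes to apply.
   !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
   !3 = !{!"llvm.loop.unroll.disable"}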
6624 '``llvm.loop.licm_versioning.disable``' Metadata
6625 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6627 This metadata indicates that the loop should not be versioned for the purpose
6628 of enabling loop-invariant code motion (LICM). The metadata has a single operand
6629 which is the string ``llvm.loop.licm_versioning.disable``. For example:
6631 .. code-block:: llvm
6633 !0 = !{!"llvm.loop.licm_versioning.disable"}
6635 '``llvm.loop.distribute.enable``' Metadata
6636 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6638 Loop distribution allows splitting a loop into multiple loops. Currently,
6639 this is only performed if the entire loop cannot be vectorized due to unsafe
6640 memory dependencies. The transformation will attempt to isolate the unsafe
6641 dependencies into their own loop.
6643 This metadata can be used to selectively enable or disable distribution of the
6644 loop. The first operand is the string ``llvm.loop.distribute.enable`` and the
6645 second operand is a bit. If the bit operand value is 1 distribution is
6646 enabled. A value of 0 disables distribution:
6648 .. code-block:: llvm
6650 !0 = !{!"llvm.loop.distribute.enable", i1 0}
6651 !1 = !{!"llvm.loop.distribute.enable", i1 1}
6653 This metadata should be used in conjunction with ``llvm.loop`` loop
6654 identification metadata.
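A minimal sketch of that combination, assuming a hypothetical loop whose
back edge is the branch to ``%for.body`` (block and value names are
illustrative only):

.. code-block:: llvm

   for.body:
     ...
     br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
   ...
   !0 = distinct !{!0, !1}
   !1 = !{!"llvm.loop.distribute.enable", i1 1}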
6656 '``llvm.loop.distribute.followup_coincident``' Metadata
6657 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6659 This metadata defines which attributes extracted loops with no cyclic
6660 dependencies will have (i.e. can be vectorized). See
6661 :ref:`Transformation Metadata <transformation-metadata>` for details.
6663 '``llvm.loop.distribute.followup_sequential``' Metadata
6664 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6666 This metadata defines which attributes the isolated loops with unsafe
6667 memory dependencies will have. See
6668 :ref:`Transformation Metadata <transformation-metadata>` for details.
6670 '``llvm.loop.distribute.followup_fallback``' Metadata
6671 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If loop versioning is necessary, this metadata defines the attributes
6674 the non-distributed fallback version will have. See
6675 :ref:`Transformation Metadata <transformation-metadata>` for details.
6677 '``llvm.loop.distribute.followup_all``' Metadata
6678 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The attributes in this metadata are added to all followup loops of the
6681 loop distribution pass. See
6682 :ref:`Transformation Metadata <transformation-metadata>` for details.
6684 '``llvm.licm.disable``' Metadata
6685 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6687 This metadata indicates that loop-invariant code motion (LICM) should not be
6688 performed on this loop. The metadata has a single operand which is the string
6689 ``llvm.licm.disable``. For example:
6691 .. code-block:: llvm
6693 !0 = !{!"llvm.licm.disable"}
Note that although it operates per loop, it isn't given the ``llvm.loop`` prefix
as it is not affected by the ``llvm.loop.disable_nonforced`` metadata.
6698 '``llvm.access.group``' Metadata
6699 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``llvm.access.group`` metadata can be attached to any instruction that
potentially accesses memory. It can point to a single distinct metadata
node, which we call an access group. This node represents all memory access
instructions referring to it via ``llvm.access.group``. When an
instruction belongs to multiple access groups, it can also point to a
list of access groups, as illustrated by the following example.
6708 .. code-block:: llvm
   %val = load i32, i32* %arrayidx, !llvm.access.group !0
   ...
   !0 = !{!1, !2}
   !1 = distinct !{}
   !2 = distinct !{}
6716 It is illegal for the list node to be empty since it might be confused
6717 with an access group.
The access group metadata node must be 'distinct' to avoid collapsing
multiple access groups by content. An access group metadata node must
always be empty, which can be used to distinguish an access group
metadata node from a list of access groups. Being empty avoids the
situation that the content must be updated which, because metadata is
immutable by design, would require finding and updating all references
to the access group node.
6727 The access group can be used to refer to a memory access instruction
6728 without pointing to it directly (which is not possible in global
6729 metadata). Currently, the only metadata making use of it is
6730 ``llvm.loop.parallel_accesses``.
6732 '``llvm.loop.parallel_accesses``' Metadata
6733 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``llvm.loop.parallel_accesses`` metadata refers to one or more
access group metadata nodes (see ``llvm.access.group``). It denotes that
no loop-carried memory dependence exists between it and other instructions
in the loop with this metadata.
6740 Let ``m1`` and ``m2`` be two instructions that both have the
6741 ``llvm.access.group`` metadata to the access group ``g1``, respectively
6742 ``g2`` (which might be identical). If a loop contains both access groups
6743 in its ``llvm.loop.parallel_accesses`` metadata, then the compiler can
6744 assume that there is no dependency between ``m1`` and ``m2`` carried by
6745 this loop. Instructions that belong to multiple access groups are
6746 considered having this property if at least one of the access groups
6747 matches the ``llvm.loop.parallel_accesses`` list.
6749 If all memory-accessing instructions in a loop have
6750 ``llvm.access.group`` metadata that each refer to one of the access
6751 groups of a loop's ``llvm.loop.parallel_accesses`` metadata, then the
loop has no loop-carried memory dependences and is considered to be a
parallel loop.
6755 Note that if not all memory access instructions belong to an access
6756 group referred to by ``llvm.loop.parallel_accesses``, then the loop must
6757 not be considered trivially parallel. Additional
6758 memory dependence analysis is required to make that determination. As a fail
6759 safe mechanism, this causes loops that were originally parallel to be considered
6760 sequential (if optimization passes that are unaware of the parallel semantics
6761 insert new memory instructions into the loop body).
Example of a loop that is considered parallel due to its correct use of
both ``llvm.access.group`` and ``llvm.loop.parallel_accesses`` metadata:
6767 .. code-block:: llvm
6771 %val0 = load i32, i32* %arrayidx, !llvm.access.group !1
6773 store i32 %val0, i32* %arrayidx1, !llvm.access.group !1
6775 br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
6779 !0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !1}}
6782 It is also possible to have nested parallel loops:
6784 .. code-block:: llvm
6788 %val1 = load i32, i32* %arrayidx3, !llvm.access.group !4
6790 br label %inner.for.body
6794 %val0 = load i32, i32* %arrayidx1, !llvm.access.group !3
6796 store i32 %val0, i32* %arrayidx2, !llvm.access.group !3
6798 br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1
6802 store i32 %val1, i32* %arrayidx4, !llvm.access.group !4
6804 br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2
6806 outer.for.end: ; preds = %for.body
6808 !1 = distinct !{!1, !{!"llvm.loop.parallel_accesses", !3}} ; metadata for the inner loop
6809 !2 = distinct !{!2, !{!"llvm.loop.parallel_accesses", !3, !4}} ; metadata for the outer loop
6810 !3 = distinct !{} ; access group for instructions in the inner loop (which are implicitly contained in outer loop as well)
6811 !4 = distinct !{} ; access group for instructions in the outer, but not the inner loop
6813 '``llvm.loop.mustprogress``' Metadata
6814 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6816 The ``llvm.loop.mustprogress`` metadata indicates that this loop is required to
6817 terminate, unwind, or interact with the environment in an observable way e.g.
6818 via a volatile memory access, I/O, or other synchronization. If such a loop is
6819 not found to interact with the environment in an observable way, the loop may
6820 be removed. This corresponds to the ``mustprogress`` function attribute.
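As a brief illustration, the metadata is a single-operand node listed in the
loop's ``llvm.loop`` identifier. The block and value names below are
hypothetical:

.. code-block:: llvm

   loop:
     ...
     br i1 %cond, label %loop, label %exit, !llvm.loop !0
   ...
   !0 = distinct !{!0, !1}
   !1 = !{!"llvm.loop.mustprogress"}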
6822 '``irr_loop``' Metadata
6823 ^^^^^^^^^^^^^^^^^^^^^^^
``irr_loop`` metadata may be attached to the terminator instruction of a basic
block that's an irreducible loop header (note that an irreducible loop has more
than one header basic block). If ``irr_loop`` metadata is attached to the
6828 terminator instruction of a basic block that is not really an irreducible loop
6829 header, the behavior is undefined. The intent of this metadata is to improve the
6830 accuracy of the block frequency propagation. For example, in the code below, the
6831 block ``header0`` may have a loop header weight (relative to the other headers of
6832 the irreducible loop) of 100:
6834 .. code-block:: llvm
6838 br i1 %cmp, label %t1, label %t2, !irr_loop !0
   !0 = !{!"loop_header_weight", i64 100}
6843 Irreducible loop header weights are typically based on profile data.
6845 .. _md_invariant.group:
6847 '``invariant.group``' Metadata
6848 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6850 The experimental ``invariant.group`` metadata may be attached to
6851 ``load``/``store`` instructions referencing a single metadata with no entries.
6852 The existence of the ``invariant.group`` metadata on the instruction tells
6853 the optimizer that every ``load`` and ``store`` to the same pointer operand
6854 can be assumed to load or store the same
6855 value (but see the ``llvm.launder.invariant.group`` intrinsic which affects
6856 when two pointers are considered the same). Pointers returned by bitcast or
6857 getelementptr with only zero indices are considered the same.
6861 .. code-block:: llvm
6863 @unknownPtr = external global i8
6866 store i8 42, i8* %ptr, !invariant.group !0
6867 call void @foo(i8* %ptr)
6869 %a = load i8, i8* %ptr, !invariant.group !0 ; Can assume that value under %ptr didn't change
6870 call void @foo(i8* %ptr)
6872 %newPtr = call i8* @getPointer(i8* %ptr)
6873 %c = load i8, i8* %newPtr, !invariant.group !0 ; Can't assume anything, because we only have information about %ptr
6875 %unknownValue = load i8, i8* @unknownPtr
6876 store i8 %unknownValue, i8* %ptr, !invariant.group !0 ; Can assume that %unknownValue == 42
6878 call void @foo(i8* %ptr)
6879 %newPtr2 = call i8* @llvm.launder.invariant.group(i8* %ptr)
6880 %d = load i8, i8* %newPtr2, !invariant.group !0 ; Can't step through launder.invariant.group to get value of %ptr
6883 declare void @foo(i8*)
6884 declare i8* @getPointer(i8*)
6885 declare i8* @llvm.launder.invariant.group(i8*)
6889 The invariant.group metadata must be dropped when replacing one pointer by
6890 another based on aliasing information. This is because invariant.group is tied
6891 to the SSA value of the pointer operand.
6893 .. code-block:: llvm
6895 %v = load i8, i8* %x, !invariant.group !0
6896 ; if %x mustalias %y then we can replace the above instruction with
6897 %v = load i8, i8* %y
6899 Note that this is an experimental feature, which means that its semantics might
6900 change in the future.
6905 See :doc:`TypeMetadata`.
6907 '``associated``' Metadata
6908 ^^^^^^^^^^^^^^^^^^^^^^^^^
6910 The ``associated`` metadata may be attached to a global variable definition with
6911 a single argument that references a global object (optionally through an alias).
6913 This metadata lowers to the ELF section flag ``SHF_LINK_ORDER`` which prevents
6914 discarding of the global variable in linker GC unless the referenced object is
6915 also discarded. The linker support for this feature is spotty. For best
6916 compatibility, globals carrying this metadata should:
6918 - Be in ``@llvm.compiler.used``.
6919 - If the referenced global variable is in a comdat, be in the same comdat.
``!associated`` cannot express a many-to-one relationship. A global variable with
6922 the metadata should generally not be referenced by a function: the function may
6923 be inlined into other functions, leading to more references to the metadata.
6924 Ideally we would want to keep metadata alive as long as any inline location is
6925 alive, but this many-to-one relationship is not representable. Moreover, if the
6926 metadata is retained while the function is discarded, the linker will report an
6927 error of a relocation referencing a discarded section.
6929 The metadata is often used with an explicit section consisting of valid C
6930 identifiers so that the runtime can find the metadata section with
6931 linker-defined encapsulation symbols ``__start_<section_name>`` and
6932 ``__stop_<section_name>``.
6934 It does not have any effect on non-ELF targets.
6938 .. code-block:: text
    $a = comdat any
    @a = global i32 1, comdat $a
    @b = internal global i32 2, comdat $a, section "abc", !associated !0
    !0 = !{i32* @a}
6949 The ``prof`` metadata is used to record profile data in the IR.
6950 The first operand of the metadata node indicates the profile metadata
6951 type. There are currently 3 types:
6952 :ref:`branch_weights<prof_node_branch_weights>`,
6953 :ref:`function_entry_count<prof_node_function_entry_count>`, and
6954 :ref:`VP<prof_node_VP>`.
6956 .. _prof_node_branch_weights:
Branch weight metadata attached to a branch, select, switch or call instruction
represents the likelihood of the associated branch being taken.
6963 For more information, see :doc:`BranchWeightMetadata`.
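A minimal sketch for a conditional branch, with one weight per successor in
successor order. The block names and weight values are made up for
illustration:

.. code-block:: llvm

   br i1 %cond, label %likely, label %unlikely, !prof !0
   ...
   !0 = !{!"branch_weights", i32 2000, i32 1}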
6965 .. _prof_node_function_entry_count:
6967 function_entry_count
6968 """"""""""""""""""""
6970 Function entry count metadata can be attached to function definitions
6971 to record the number of times the function is called. Used with BFI
6972 information, it is also used to derive the basic block profile count.
6973 For more information, see :doc:`BranchWeightMetadata`.
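For illustration, a function definition carrying an entry count might look
like the following. The function ``@f`` and the count of 1000 are
hypothetical:

.. code-block:: llvm

   define void @f() !prof !0 {
     ret void
   }

   !0 = !{!"function_entry_count", i64 1000}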
6980 VP (value profile) metadata can be attached to instructions that have
6981 value profile information. Currently this is indirect calls (where it
6982 records the hottest callees) and calls to memory intrinsics such as memcpy,
6983 memmove, and memset (where it records the hottest byte lengths).
Each VP metadata node contains the "VP" string, then a uint32_t value for the value
profiling kind, a uint64_t value for the total number of times the instruction
is executed, followed by pairs of a uint64_t value and its execution count.
The value profiling kind is 0 for indirect call targets and 1 for memory
operations. For indirect call targets, each profile value is a hash
of the callee function name, and for memory operations each value is the
byte length.
6993 Note that the value counts do not need to add up to the total count
6994 listed in the third operand (in practice only the top hottest values
6995 are tracked and reported).
6997 Indirect call example:
6999 .. code-block:: llvm
7001 call void %f(), !prof !1
7002 !1 = !{!"VP", i32 0, i64 1600, i64 7651369219802541373, i64 1030, i64 -4377547752858689819, i64 410}
Note that the VP type is 0 (the second operand), which indicates that this is
indirect call value profile data. The third operand indicates that the
indirect call executed 1600 times. The 4th and 6th operands give the
hashes of the 2 hottest target functions' names (this is the same hash used
to represent function names in the profile database), and the 5th and 7th
operands give the number of times each of the respective preceding target
functions was called.
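A memory operation profile follows the same layout with a value profiling
kind of 1. The sketch below is illustrative only: the ``memcpy`` operands and
all counts are invented, and it reads as "executed 1000 times, 600 of them
with a length of 8 bytes and 300 with a length of 16 bytes". As noted above,
the per-value counts need not add up to the total.

.. code-block:: llvm

   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 %n, i1 false), !prof !2
   ...
   !2 = !{!"VP", i32 1, i64 1000, i64 8, i64 600, i64 16, i64 300}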
7014 '``annotation``' Metadata
7015 ^^^^^^^^^^^^^^^^^^^^^^^^^
7017 The ``annotation`` metadata can be used to attach a tuple of annotation strings
7018 to any instruction. This metadata does not impact the semantics of the program
7019 and may only be used to provide additional insight about the program and
7020 transformations to users.
7024 .. code-block:: text
7026 %a.addr = alloca float*, align 8, !annotation !0
7027 !0 = !{!"auto-init"}
7029 Module Flags Metadata
7030 =====================
Information about the module as a whole is difficult to convey to LLVM's
subsystems. The LLVM IR isn't sufficient to transmit this information.
The ``llvm.module.flags`` named metadata exists in order to facilitate
this. These flags are in the form of key / value pairs --- much like a
dictionary --- making it easy for any subsystem that cares about a flag to
look it up.
7039 The ``llvm.module.flags`` metadata contains a list of metadata triplets.
7040 Each triplet has the following form:
- The first element is a *behavior* flag, which specifies the behavior
  when two (or more) modules are merged together, and it encounters two
  (or more) metadata with the same ID. The supported behaviors are
  described below.
7046 - The second element is a metadata string that is a unique ID for the
7047 metadata. Each module may only have one flag entry for each unique ID (not
7048 including entries with the **Require** behavior).
7049 - The third element is the value of the flag.
7051 When two (or more) modules are merged together, the resulting
7052 ``llvm.module.flags`` metadata is the union of the modules' flags. That is, for
7053 each unique metadata ID string, there will be exactly one entry in the merged
module's ``llvm.module.flags`` metadata table, and the value for that entry will
7055 be determined by the merge behavior flag, as described below. The only exception
7056 is that entries with the *Require* behavior are always preserved.
7058 The following behaviors are supported:
.. list-table::
   :header-rows: 1
   :widths: 10 90

   * - Value
     - Behavior

   * - 1
     - **Error**
           Emits an error if two values disagree, otherwise the resulting value
           is that of the operands.

   * - 2
     - **Warning**
           Emits a warning if two values disagree. The result value will be the
           operand for the flag from the first module being linked, or the max
           if the other module uses **Max** (in which case the resulting flag
           will be **Max**).

   * - 3
     - **Require**
           Adds a requirement that another module flag be present and have a
           specified value after linking is performed. The value must be a
           metadata pair, where the first element of the pair is the ID of the
           module flag to be restricted, and the second element of the pair is
           the value the module flag should be restricted to. This behavior can
           be used to restrict the allowable results (via triggering of an
           error) of linking IDs with the **Override** behavior.

   * - 4
     - **Override**
           Uses the specified value, regardless of the behavior or value of the
           other module. If both modules specify **Override**, but the values
           differ, an error will be emitted.

   * - 5
     - **Append**
           Appends the two values, which are required to be metadata nodes.

   * - 6
     - **AppendUnique**
           Appends the two values, which are required to be metadata
           nodes. However, duplicate entries in the second list are dropped
           during the append operation.

   * - 7
     - **Max**
           Takes the max of the two values, which are required to be integers.
7109 It is an error for a particular unique flag ID to have multiple behaviors,
7110 except in the case of **Require** (which adds restrictions on another metadata
7111 value) or **Override**.
7113 An example of module flags:
7115 .. code-block:: llvm
7117 !0 = !{ i32 1, !"foo", i32 1 }
7118 !1 = !{ i32 4, !"bar", i32 37 }
7119 !2 = !{ i32 2, !"qux", i32 42 }
    !3 = !{ i32 3, !"qux",
      !{
        !"foo", i32 1
      }
    }
7125 !llvm.module.flags = !{ !0, !1, !2, !3 }
7127 - Metadata ``!0`` has the ID ``!"foo"`` and the value '1'. The behavior
7128 if two or more ``!"foo"`` flags are seen is to emit an error if their
7129 values are not equal.
7131 - Metadata ``!1`` has the ID ``!"bar"`` and the value '37'. The
7132 behavior if two or more ``!"bar"`` flags are seen is to use the value
7135 - Metadata ``!2`` has the ID ``!"qux"`` and the value '42'. The
7136 behavior if two or more ``!"qux"`` flags are seen is to emit a
7137 warning if their values are not equal.
- Metadata ``!3`` has the ID ``!"qux"`` and the value
  ``!{ !"foo", i32 1 }``.
  The behavior is to emit an error if the ``llvm.module.flags`` does not
  contain a flag with the ID ``!"foo"`` that has the value '1' after linking is
  performed.
7150 -------------------------------------------
7152 These metadata specify the default attributes synthesized functions should have.
7153 These metadata are currently respected by a few instrumentation passes, such as
7156 These metadata correspond to a few function attributes with significant code
7157 generation behaviors. Function attributes with just optimization purposes
7158 should not be listed because the performance impact of these synthesized
7161 - "frame-pointer": **Max**. The value can be 0, 1, or 2. A synthesized function
7162 will get the "frame-pointer" function attribute, with value being "none",
7163 "non-leaf", or "all", respectively.
7164 - "uwtable": **Max**. The value can be 0 or 1. If the value is 1, a synthesized
7165 function will get the ``uwtable`` function attribute.
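As an illustrative sketch, a module requesting that synthesized functions keep
all frame pointers and get unwind tables could carry the flags below. Both
use the behavior value 7 (**Max**), so linking against a module with smaller
values keeps the larger one. The specific values shown are an example, not a
recommendation.

.. code-block:: llvm

   !llvm.module.flags = !{!0, !1}
   !0 = !{i32 7, !"frame-pointer", i32 2}   ; "all"
   !1 = !{i32 7, !"uwtable", i32 1}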
7167 Objective-C Garbage Collection Module Flags Metadata
7168 ----------------------------------------------------
7170 On the Mach-O platform, Objective-C stores metadata about garbage
7171 collection in a special section called "image info". The metadata
7172 consists of a version number and a bitmask specifying what types of
7173 garbage collection are supported (if any) by the file. If two or more
7174 modules are linked together their garbage collection metadata needs to
7175 be merged rather than appended together.
7177 The Objective-C garbage collection module flags metadata consists of the
7178 following key-value pairs:
7187 * - ``Objective-C Version``
7188 - **[Required]** --- The Objective-C ABI version. Valid values are 1 and 2.
7190 * - ``Objective-C Image Info Version``
7191 - **[Required]** --- The version of the image info section. Currently
7194 * - ``Objective-C Image Info Section``
7195 - **[Required]** --- The section to place the metadata. Valid values are
7196 ``"__OBJC, __image_info, regular"`` for Objective-C ABI version 1, and
7197 ``"__DATA,__objc_imageinfo, regular, no_dead_strip"`` for
7198 Objective-C ABI version 2.
7200 * - ``Objective-C Garbage Collection``
7201 - **[Required]** --- Specifies whether garbage collection is supported or
7202 not. Valid values are 0, for no garbage collection, and 2, for garbage
7203 collection supported.
7205 * - ``Objective-C GC Only``
7206 - **[Optional]** --- Specifies that only garbage collection is supported.
7207 If present, its value must be 6. This flag requires that the
7208 ``Objective-C Garbage Collection`` flag have the value 2.
7210 Some important flag interactions:
7212 - If a module with ``Objective-C Garbage Collection`` set to 0 is
7213 merged with a module with ``Objective-C Garbage Collection`` set to
7214 2, then the resulting module has the
7215 ``Objective-C Garbage Collection`` flag set to 0.
7216 - A module with ``Objective-C Garbage Collection`` set to 0 cannot be
7217 merged with a module with ``Objective-C GC Only`` set to 6.
7219 C type width Module Flags Metadata
7220 ----------------------------------
7222 The ARM backend emits a section into each generated object file describing the
7223 options that it was compiled with (in a compiler-independent way) to prevent
7224 linking incompatible objects, and to allow automatic library selection. Some
of these options are not visible at the IR level, namely ``wchar_t`` width and enum
width.
7228 To pass this information to the backend, these options are encoded in module
7229 flags metadata, using the following key-value pairs:
.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Key
     - Value

   * - short_wchar
     - * 0 --- sizeof(wchar_t) == 4
       * 1 --- sizeof(wchar_t) == 2

   * - short_enum
     - * 0 --- Enums are at least as large as an ``int``.
       * 1 --- Enums are stored in the smallest integer type which can
         represent all of its values.
7247 For example, the following metadata section specifies that the module was
7248 compiled with a ``wchar_t`` width of 4 bytes, and the underlying type of an
7249 enum is the smallest type which can represent all of its values::
7251 !llvm.module.flags = !{!0, !1}
    !0 = !{i32 1, !"short_wchar", i32 0}
    !1 = !{i32 1, !"short_enum", i32 1}
7255 LTO Post-Link Module Flags Metadata
7256 -----------------------------------
Some optimisations are only possible when the entire LTO unit is present in the current
7259 module. This is represented by the ``LTOPostLink`` module flags metadata, which
7260 will be created with a value of ``1`` when LTO linking occurs.
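A sketch of what such a flag entry might look like in a post-link module. The
behavior value of 1 (**Error**) is an assumption made for illustration; only
the flag name and the value ``1`` are stated above.

.. code-block:: llvm

   !llvm.module.flags = !{!0}
   !0 = !{i32 1, !"LTOPostLink", i32 1}   ; behavior value assumed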
7262 Automatic Linker Flags Named Metadata
7263 =====================================
7265 Some targets support embedding of flags to the linker inside individual object
7266 files. Typically this is used in conjunction with language extensions which
7267 allow source files to contain linker command line options, and have these
7268 automatically be transmitted to the linker via object files.
7270 These flags are encoded in the IR using named metadata with the name
7271 ``!llvm.linker.options``. Each operand is expected to be a metadata node
7272 which should be a list of other metadata nodes, each of which should be a
7273 list of metadata strings defining linker options.
7275 For example, the following metadata section specifies two separate sets of
linker options, presumably to link against ``libz`` and the ``Cocoa``
framework::

    !0 = !{ !"-lz" }
    !1 = !{ !"-framework", !"Cocoa" }
7281 !llvm.linker.options = !{ !0, !1 }
7283 The metadata encoding as lists of lists of options, as opposed to a collapsed
7284 list of options, is chosen so that the IR encoding can use multiple option
7285 strings to specify e.g., a single library, while still having that specifier be
7286 preserved as an atomic element that can be recognized by a target specific
7287 assembly writer or object file emitter.
7289 Each individual option is required to be either a valid option for the target's
7290 linker, or an option that is reserved by the target specific assembly writer or
7291 object file emitter. No other aspect of these options is defined by the IR.
7293 Dependent Libs Named Metadata
7294 =============================
7296 Some targets support embedding of strings into object files to indicate
7297 a set of libraries to add to the link. Typically this is used in conjunction
7298 with language extensions which allow source files to explicitly declare the
7299 libraries they depend on, and have these automatically be transmitted to the
7300 linker via object files.
7302 The list is encoded in the IR using named metadata with the name
7303 ``!llvm.dependent-libraries``. Each operand is expected to be a metadata node
7304 which should contain a single string operand.
7306 For example, the following metadata section contains two library specifiers::
7308 !0 = !{!"a library specifier"}
7309 !1 = !{!"another library specifier"}
7310 !llvm.dependent-libraries = !{ !0, !1 }
7312 Each library specifier will be handled independently by the consuming linker.
The effect of the library specifiers is defined by the consuming linker.
7320 Compiling with `ThinLTO <https://clang.llvm.org/docs/ThinLTO.html>`_
7321 causes the building of a compact summary of the module that is emitted into
7322 the bitcode. The summary is emitted into the LLVM assembly and identified
7323 in syntax by a caret ('``^``').
7325 The summary is parsed into a bitcode output, along with the Module
7326 IR, via the "``llvm-as``" tool. Tools that parse the Module IR for the purposes
7327 of optimization (e.g. "``clang -x ir``" and "``opt``"), will ignore the
7328 summary entries (just as they currently ignore summary entries in a bitcode
7331 Eventually, the summary will be parsed into a ModuleSummaryIndex object under
7332 the same conditions where summary index is currently built from bitcode.
Specifically, this applies to tools that test the Thin Link portion of a ThinLTO compile
(i.e. llvm-lto and llvm-lto2), and to parsing a combined index
for a distributed ThinLTO backend via clang's "``-fthinlto-index=<>``" flag
(this part is not yet implemented; for now, use llvm-as to create a bitcode object
before feeding it into thin link tools).
7339 There are currently 3 types of summary entries in the LLVM assembly:
7340 :ref:`module paths<module_path_summary>`,
7341 :ref:`global values<gv_summary>`, and
7342 :ref:`type identifiers<typeid_summary>`.
7344 .. _module_path_summary:
7346 Module Path Summary Entry
7347 -------------------------
7349 Each module path summary entry lists a module containing global values included
7350 in the summary. For a single IR module there will be one such entry, but
7351 in a combined summary index produced during the thin link, there will be
7352 one module path entry per linked module with summary.
7356 .. code-block:: text
7358 ^0 = module: (path: "/path/to/file.o", hash: (2468601609, 1329373163, 1565878005, 638838075, 3148790418))
7360 The ``path`` field is a string path to the bitcode file, and the ``hash``
7361 field is the 160-bit SHA-1 hash of the IR bitcode contents, used for
7362 incremental builds and caching.
7366 Global Value Summary Entry
7367 --------------------------
7369 Each global value summary entry corresponds to a global value defined or
7370 referenced by a summarized module.
7374 .. code-block:: text
7376 ^4 = gv: (name: "f"[, summaries: (Summary)[, (Summary)]*]?) ; guid = 14740650423002898831
7378 For declarations, there will not be a summary list. For definitions, a
7379 global value will contain a list of summaries, one per module containing
7380 a definition. There can be multiple entries in a combined summary index
7381 for symbols with weak linkage.
7383 Each ``Summary`` format will depend on whether the global value is a
7384 :ref:`function<function_summary>`, :ref:`variable<variable_summary>`, or
7385 :ref:`alias<alias_summary>`.
7387 .. _function_summary:
7392 If the global value is a function, the ``Summary`` entry will look like:
7394 .. code-block:: text
    function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 2[, FuncFlags]?[, Calls]?[, TypeIdInfo]?[, Params]?[, Refs]?)
7398 The ``module`` field includes the summary entry id for the module containing
7399 this definition, and the ``flags`` field contains information such as
7400 the linkage type, a flag indicating whether it is legal to import the
7401 definition, whether it is globally live and whether the linker resolved it
7402 to a local definition (the latter two are populated during the thin link).
7403 The ``insts`` field contains the number of IR instructions in the function.
7404 Finally, there are several optional fields: :ref:`FuncFlags<funcflags_summary>`,
7405 :ref:`Calls<calls_summary>`, :ref:`TypeIdInfo<typeidinfo_summary>`,
7406 :ref:`Params<params_summary>`, :ref:`Refs<refs_summary>`.
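Putting the pieces together, a complete set of entries for a function ``f``
that calls a declared function ``g`` might look like the sketch below. The
path, hash values, and entry ids are invented for illustration, and only the
optional ``calls`` field is shown.

.. code-block:: text

    ^0 = module: (path: "a.o", hash: (0, 0, 0, 0, 0))
    ^1 = gv: (name: "f", summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 2, calls: ((callee: ^2)))))
    ^2 = gv: (name: "g")

Here ``^2`` has no summary list because ``g`` is only declared in this
module.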
7408 .. _variable_summary:
7410 Global Variable Summary
7411 ^^^^^^^^^^^^^^^^^^^^^^^
7413 If the global value is a variable, the ``Summary`` entry will look like:
7415 .. code-block:: text
    variable: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0)[, Refs]?)
7419 The variable entry contains a subset of the fields in a
7420 :ref:`function summary <function_summary>`, see the descriptions there.
7427 If the global value is an alias, the ``Summary`` entry will look like:
7429 .. code-block:: text
7431 alias: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), aliasee: ^2)
7433 The ``module`` and ``flags`` fields are as described for a
7434 :ref:`function summary <function_summary>`. The ``aliasee`` field
7435 contains a reference to the global value summary entry of the aliasee.
7437 .. _funcflags_summary:
7442 The optional ``FuncFlags`` field looks like:
7444 .. code-block:: text
7446 funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0)
7448 If unspecified, flags are assumed to hold the conservative ``false`` value of
7456 The optional ``Calls`` field looks like:
7458 .. code-block:: text
7460 calls: ((Callee)[, (Callee)]*)
7462 where each ``Callee`` looks like:
7464 .. code-block:: text
7466 callee: ^1[, hotness: None]?[, relbf: 0]?
7468 The ``callee`` refers to the summary entry id of the callee. At most one
7469 of ``hotness`` (which can take the values ``Unknown``, ``Cold``, ``None``,
7470 ``Hot``, and ``Critical``), and ``relbf`` (which holds the integer
7471 branch frequency relative to the entry frequency, scaled down by 2^8)
7472 may be specified. The defaults are ``Unknown`` and ``0``, respectively.
The optional ``Params`` field is used by ``StackSafety`` and looks like:
7481 .. code-block:: text
    params: ((Param)[, (Param)]*)
7485 where each ``Param`` describes pointer parameter access inside of the
7486 function and looks like:
7488 .. code-block:: text
7490 param: 4, offset: [0, 5][, calls: ((Callee)[, (Callee)]*)]?
where ``param`` is the number of the parameter it describes and
``offset`` is the inclusive range of byte offsets from the pointer parameter
that the function itself can access. This range does not include accesses made
by function calls in the ``calls`` list.

Each ``Callee`` describes how the parameter is forwarded into another
function and looks like:
7500 .. code-block:: text
7502 callee: ^3, param: 5, offset: [-3, 3]
The ``callee`` refers to the summary entry id of the callee, and ``param`` is
the number of the callee parameter that points into the caller's parameter
with an offset known to be inside of the ``offset`` range. ``calls`` will be
consumed and removed by the thin link stage to update ``Param::offset`` so it
covers all accesses possible by ``calls``.

A pointer parameter without a corresponding ``Param`` is considered unsafe, and
we assume that access with any offset is possible.
7515 If we have the following function:
7517 .. code-block:: text
    define i64 @foo(i64* %0, i32* %1, i8* %2, i8 %3) {
      store i32* %1, i32** @x
      %5 = getelementptr inbounds i8, i8* %2, i64 5
      %6 = load i8, i8* %5
      %7 = getelementptr inbounds i8, i8* %2, i8 %3
      tail call void @bar(i8 %3, i8* %7)
      %8 = load i64, i64* %0
      ret i64 %8
    }
7529 We can expect the record like this:
7531 .. code-block:: text
7533 params: ((param: 0, offset: [0, 7]),(param: 2, offset: [5, 5], calls: ((callee: ^3, param: 1, offset: [-128, 127]))))
The function may access just 8 bytes of the parameter ``%0``. ``calls`` is empty,
so the parameter is either not used for function calls or ``offset`` already
covers all accesses from nested function calls.
Parameter ``%1`` escapes, so access is unknown.
The function itself can access just a single byte of the parameter ``%2``.
Additional access is possible inside ``@bar`` (``^3``). The function adds a signed
offset to the pointer and passes the result as argument ``%1`` into ``^3``.
This record itself does not tell us how ``^3`` will access the parameter.
Parameter ``%3`` is not a pointer.
7550 The optional ``Refs`` field looks like:
7552 .. code-block:: text
7554 refs: ((Ref)[, (Ref)]*)
7556 where each ``Ref`` contains a reference to the summary id of the referenced
7557 value (e.g. ``^1``).
7559 .. _typeidinfo_summary:
7564 The optional ``TypeIdInfo`` field, used for
7565 `Control Flow Integrity <https://clang.llvm.org/docs/ControlFlowIntegrity.html>`_,
7568 .. code-block:: text
7570 typeIdInfo: [(TypeTests)]?[, (TypeTestAssumeVCalls)]?[, (TypeCheckedLoadVCalls)]?[, (TypeTestAssumeConstVCalls)]?[, (TypeCheckedLoadConstVCalls)]?
7572 These optional fields have the following forms:
7577 .. code-block:: text
7579 typeTests: (TypeIdRef[, TypeIdRef]*)
7581 Where each ``TypeIdRef`` refers to a :ref:`type id<typeid_summary>`
7582 by summary id or ``GUID``.
7584 TypeTestAssumeVCalls
7585 """"""""""""""""""""
7587 .. code-block:: text
7589 typeTestAssumeVCalls: (VFuncId[, VFuncId]*)
7591 Where each VFuncId has the format:
7593 .. code-block:: text
7595 vFuncId: (TypeIdRef, offset: 16)
7597 Where each ``TypeIdRef`` refers to a :ref:`type id<typeid_summary>`
7598 by summary id or ``GUID`` preceded by a ``guid:`` tag.
7600 TypeCheckedLoadVCalls
7601 """""""""""""""""""""
7603 .. code-block:: text
7605 typeCheckedLoadVCalls: (VFuncId[, VFuncId]*)
7607 Where each VFuncId has the format described for ``TypeTestAssumeVCalls``.
7609 TypeTestAssumeConstVCalls
7610 """""""""""""""""""""""""
7612 .. code-block:: text
7614 typeTestAssumeConstVCalls: (ConstVCall[, ConstVCall]*)
7616 Where each ConstVCall has the format:
7618 .. code-block:: text
7620 (VFuncId, args: (Arg[, Arg]*))
7622 and where each VFuncId has the format described for ``TypeTestAssumeVCalls``,
7623 and each Arg is an integer argument number.
7625 TypeCheckedLoadConstVCalls
7626 """"""""""""""""""""""""""
7628 .. code-block:: text
7630 typeCheckedLoadConstVCalls: (ConstVCall[, ConstVCall]*)
7632 Where each ConstVCall has the format described for
7633 ``TypeTestAssumeConstVCalls``.
7637 Type ID Summary Entry
7638 ---------------------
7640 Each type id summary entry corresponds to a type identifier resolution
7641 which is generated during the LTO link portion of the compile when building
7642 with `Control Flow Integrity <https://clang.llvm.org/docs/ControlFlowIntegrity.html>`_,
7643 so these are only present in a combined summary index.
7647 .. code-block:: text
7649 ^4 = typeid: (name: "_ZTS1A", summary: (typeTestRes: (kind: allOnes, sizeM1BitWidth: 7[, alignLog2: 0]?[, sizeM1: 0]?[, bitMask: 0]?[, inlineBits: 0]?)[, WpdResolutions]?)) ; guid = 7004155349499253778
7651 The ``typeTestRes`` gives the type test resolution ``kind`` (which may
7652 be ``unsat``, ``byteArray``, ``inline``, ``single``, or ``allOnes``), and
7653 the ``size-1`` bit width. It is followed by optional flags, which default to 0,
7654 and an optional WpdResolutions (whole program devirtualization resolution)
7655 field that looks like:
7657 .. code-block:: text
    wpdResolutions: ((offset: 0, WpdRes)[, (offset: 1, WpdRes)]*)

where each entry is a mapping from the given byte offset to the whole-program
devirtualization resolution WpdRes, which has one of the following formats:
7664 .. code-block:: text
7666 wpdRes: (kind: branchFunnel)
7667 wpdRes: (kind: singleImpl, singleImplName: "_ZN1A1nEi")
7668 wpdRes: (kind: indir)
7670 Additionally, each wpdRes has an optional ``resByArg`` field, which
7671 describes the resolutions for calls with all constant integer arguments:
7673 .. code-block:: text
7675 resByArg: (ResByArg[, ResByArg]*)
7679 .. code-block:: text
    args: (Arg[, Arg]*), byArg: (kind: UniformRetVal[, info: 0]?[, byte: 0]?[, bit: 0]?)
7683 Where the ``kind`` can be ``Indir``, ``UniformRetVal``, ``UniqueRetVal``
7684 or ``VirtualConstProp``. The ``info`` field is only used if the kind
7685 is ``UniformRetVal`` (indicates the uniform return value), or
7686 ``UniqueRetVal`` (holds the return value associated with the unique vtable
7687 (0 or 1)). The ``byte`` and ``bit`` fields are only used if the target does
7688 not support the use of absolute symbols to store constants.
7690 .. _intrinsicglobalvariables:
7692 Intrinsic Global Variables
7693 ==========================
7695 LLVM has a number of "magic" global variables that contain data that
7696 affect code generation or other IR semantics. These are documented here.
7697 All globals of this sort should have a section specified as
7698 "``llvm.metadata``". This section and all globals that start with
7699 "``llvm.``" are reserved for use by LLVM.
7703 The '``llvm.used``' Global Variable
7704 -----------------------------------
7706 The ``@llvm.used`` global is an array which has
7707 :ref:`appending linkage <linkage_appending>`. This array contains a list of
7708 pointers to named global variables, functions and aliases which may optionally
7709 have a pointer cast formed of bitcast or getelementptr. For example, a legal
7712 .. code-block:: llvm
7717 @llvm.used = appending global [2 x i8*] [
7719 i8* bitcast (i32* @Y to i8*)
7720 ], section "llvm.metadata"
If a symbol appears in the ``@llvm.used`` list, then the compiler, assembler,
and linker are required to treat the symbol as if there is a reference to the
symbol that they cannot see (which is why the symbols have to be named). For
example, if a variable has internal linkage and no references other than that
from the ``@llvm.used`` list, it cannot be deleted. This is commonly used to
represent references from inline asms and other things the compiler cannot
"see", and corresponds to "``__attribute__((used))``" in GNU C.
7730 On some targets, the code generator must emit a directive to the
7731 assembler or object file to prevent the assembler and linker from
7732 removing the symbol.
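
As a minimal sketch of the internal-linkage case described above, the module
below keeps an otherwise unreferenced variable alive. The name ``@keep`` is an
assumption chosen for illustration:

.. code-block:: llvm

    ; @keep has internal linkage and no other uses, so it would normally be
    ; eligible for deletion.
    @keep = internal global i32 42

    ; Listing it in @llvm.used tells the compiler, assembler, and linker to
    ; treat it as referenced.
    @llvm.used = appending global [1 x i8*] [
      i8* bitcast (i32* @keep to i8*)
    ], section "llvm.metadata"
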
7734 .. _gv_llvmcompilerused:
7736 The '``llvm.compiler.used``' Global Variable
7737 --------------------------------------------
7739 The ``@llvm.compiler.used`` directive is the same as the ``@llvm.used``
7740 directive, except that it only prevents the compiler from touching the
7741 symbol. On targets that support it, this allows an intelligent linker to
7742 optimize references to the symbol without being impeded as it would be
This construct should be used only in rare circumstances, and should not be
exposed to source languages.
7748 .. _gv_llvmglobalctors:
7750 The '``llvm.global_ctors``' Global Variable
7751 -------------------------------------------
7753 .. code-block:: llvm
7755 %0 = type { i32, void ()*, i8* }
7756 @llvm.global_ctors = appending global [1 x %0] [%0 { i32 65535, void ()* @ctor, i8* @data }]
7758 The ``@llvm.global_ctors`` array contains a list of constructor
7759 functions, priorities, and an associated global or function.
7760 The functions referenced by this array will be called in ascending order
7761 of priority (i.e. lowest first) when the module is loaded. The order of
7762 functions with the same priority is not defined.
7764 If the third field is non-null, and points to a global variable
7765 or function, the initializer function will only run if the associated
7766 data from the current module is not discarded.
7767 On ELF the referenced global variable or function must be in a comdat.
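
The ordering rule can be sketched with two constructors. The function names
``@init_early`` and ``@init_late`` and the priority ``101`` are hypothetical;
with a null third field, neither constructor is tied to associated data:

.. code-block:: llvm

    %0 = type { i32, void ()*, i8* }

    ; Priority 101 is lower than 65535, so @init_early runs before @init_late.
    @llvm.global_ctors = appending global [2 x %0] [
      %0 { i32 101,   void ()* @init_early, i8* null },
      %0 { i32 65535, void ()* @init_late,  i8* null }
    ]

    define internal void @init_early() {
      ret void
    }

    define internal void @init_late() {
      ret void
    }
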
7769 .. _llvmglobaldtors:
7771 The '``llvm.global_dtors``' Global Variable
7772 -------------------------------------------
7774 .. code-block:: llvm
7776 %0 = type { i32, void ()*, i8* }
7777 @llvm.global_dtors = appending global [1 x %0] [%0 { i32 65535, void ()* @dtor, i8* @data }]
7779 The ``@llvm.global_dtors`` array contains a list of destructor
7780 functions, priorities, and an associated global or function.
7781 The functions referenced by this array will be called in descending
7782 order of priority (i.e. highest first) when the module is unloaded. The
7783 order of functions with the same priority is not defined.
7785 If the third field is non-null, and points to a global variable
7786 or function, the destructor function will only run if the associated
7787 data from the current module is not discarded.
7788 On ELF the referenced global variable or function must be in a comdat.
7790 Instruction Reference
7791 =====================
7793 The LLVM instruction set consists of several different classifications
7794 of instructions: :ref:`terminator instructions <terminators>`, :ref:`binary
7795 instructions <binaryops>`, :ref:`bitwise binary
7796 instructions <bitwiseops>`, :ref:`memory instructions <memoryops>`, and
7797 :ref:`other instructions <otherops>`.
7801 Terminator Instructions
7802 -----------------------
7804 As mentioned :ref:`previously <functionstructure>`, every basic block in a
7805 program ends with a "Terminator" instruction, which indicates which
7806 block should be executed after the current block is finished. These
7807 terminator instructions typically yield a '``void``' value: they produce
7808 control flow, not values (the one exception being the
7809 ':ref:`invoke <i_invoke>`' instruction).
The terminator instructions are: ':ref:`ret <i_ret>`',
':ref:`br <i_br>`', ':ref:`switch <i_switch>`',
':ref:`indirectbr <i_indirectbr>`', ':ref:`invoke <i_invoke>`',
':ref:`callbr <i_callbr>`',
':ref:`resume <i_resume>`', ':ref:`catchswitch <i_catchswitch>`',
':ref:`catchret <i_catchret>`',
':ref:`cleanupret <i_cleanupret>`',
and ':ref:`unreachable <i_unreachable>`'.
7822 '``ret``' Instruction
7823 ^^^^^^^^^^^^^^^^^^^^^
7830 ret <type> <value> ; Return a value from a non-void function
7831 ret void ; Return from void function
7836 The '``ret``' instruction is used to return control flow (and optionally
7837 a value) from a function back to the caller.
7839 There are two forms of the '``ret``' instruction: one that returns a
7840 value and then causes control flow, and one that just causes control
7846 The '``ret``' instruction optionally accepts a single argument, the
7847 return value. The type of the return value must be a ':ref:`first
7848 class <t_firstclass>`' type.
7850 A function is not :ref:`well formed <wellformed>` if it has a non-void
7851 return type and contains a '``ret``' instruction with no return value or
7852 a return value with a type that does not match its type, or if it has a
7853 void return type and contains a '``ret``' instruction with a return
7859 When the '``ret``' instruction is executed, control flow returns back to
7860 the calling function's context. If the caller is a
7861 ":ref:`call <i_call>`" instruction, execution continues at the
7862 instruction after the call. If the caller was an
7863 ":ref:`invoke <i_invoke>`" instruction, execution continues at the
7864 beginning of the "normal" destination block. If the instruction returns
7865 a value, that value shall set the call or invoke instruction's return
7871 .. code-block:: llvm
7873 ret i32 5 ; Return an integer value of 5
7874 ret void ; Return from a void function
7875 ret { i32, i8 } { i32 4, i8 2 } ; Return a struct of values 4 and 2
7879 '``br``' Instruction
7880 ^^^^^^^^^^^^^^^^^^^^
7887 br i1 <cond>, label <iftrue>, label <iffalse>
7888 br label <dest> ; Unconditional branch
7893 The '``br``' instruction is used to cause control flow to transfer to a
7894 different basic block in the current function. There are two forms of
7895 this instruction, corresponding to a conditional branch and an
7896 unconditional branch.
7901 The conditional branch form of the '``br``' instruction takes a single
7902 '``i1``' value and two '``label``' values. The unconditional form of the
7903 '``br``' instruction takes a single '``label``' value as a target.
Upon execution of a conditional '``br``' instruction, the '``i1``'
argument '``cond``' is evaluated. If the value is ``true``, control flows to
the '``iftrue``' ``label`` argument. If the value is ``false``, control flows
to the '``iffalse``' ``label`` argument.
7912 If '``cond``' is ``poison`` or ``undef``, this instruction has undefined
7918 .. code-block:: llvm
7921 %cond = icmp eq i32 %a, %b
7922 br i1 %cond, label %IfEqual, label %IfUnequal
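
Both forms of '``br``' can be seen together in a small, self-contained
function. This is an illustrative sketch; the function name ``@max`` and the
block names are assumptions:

.. code-block:: llvm

    define i32 @max(i32 %a, i32 %b) {
    entry:
      %cond = icmp sgt i32 %a, %b
      ; Conditional form: choose a successor based on the i1 value.
      br i1 %cond, label %TakeA, label %TakeB

    TakeA:
      ; Unconditional form: a single label operand.
      br label %Done

    TakeB:
      br label %Done

    Done:
      %result = phi i32 [ %a, %TakeA ], [ %b, %TakeB ]
      ret i32 %result
    }
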
7930 '``switch``' Instruction
7931 ^^^^^^^^^^^^^^^^^^^^^^^^
7938 switch <intty> <value>, label <defaultdest> [ <intty> <val>, label <dest> ... ]
7943 The '``switch``' instruction is used to transfer control flow to one of
7944 several different places. It is a generalization of the '``br``'
7945 instruction, allowing a branch to occur to one of many possible
7951 The '``switch``' instruction uses three parameters: an integer
7952 comparison value '``value``', a default '``label``' destination, and an
7953 array of pairs of comparison value constants and '``label``'s. The table
7954 is not allowed to contain duplicate constant entries.
7959 The ``switch`` instruction specifies a table of values and destinations.
7960 When the '``switch``' instruction is executed, this table is searched
7961 for the given value. If the value is found, control flow is transferred
7962 to the corresponding destination; otherwise, control flow is transferred
7963 to the default destination.
7964 If '``value``' is ``poison`` or ``undef``, this instruction has undefined
7970 Depending on properties of the target machine and the particular
7971 ``switch`` instruction, this instruction may be code generated in
7972 different ways. For example, it could be generated as a series of
7973 chained conditional branches or with a lookup table.
7978 .. code-block:: llvm
7980 ; Emulate a conditional br instruction
7981 %Val = zext i1 %value to i32
7982 switch i32 %Val, label %truedest [ i32 0, label %falsedest ]
7984 ; Emulate an unconditional br instruction
7985 switch i32 0, label %dest [ ]
7987 ; Implement a jump table:
7988 switch i32 %val, label %otherwise [ i32 0, label %onzero
7990 i32 2, label %ontwo ]
7994 '``indirectbr``' Instruction
7995 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8002 indirectbr <somety>* <address>, [ label <dest1>, label <dest2>, ... ]
8007 The '``indirectbr``' instruction implements an indirect branch to a
label within the current function, whose address is specified by
"``address``". The address must be derived from a
:ref:`blockaddress <blockaddress>` constant.
8015 The '``address``' argument is the address of the label to jump to. The
8016 rest of the arguments indicate the full set of possible destinations
8017 that the address may point to. Blocks are allowed to occur multiple
8018 times in the destination list, though this isn't particularly useful.
8020 This destination list is required so that dataflow analysis has an
8021 accurate understanding of the CFG.
8026 Control transfers to the block specified in the address argument. All
8027 possible destination blocks must be listed in the label list, otherwise
8028 this instruction has undefined behavior. This implies that jumps to
8029 labels defined in other functions have undefined behavior as well.
8030 If '``address``' is ``poison`` or ``undef``, this instruction has undefined
8036 This is typically implemented with a jump through a register.
8041 .. code-block:: llvm
8043 indirectbr i8* %Addr, [ label %bb1, label %bb2, label %bb3 ]
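
A fuller sketch shows where the address operand can come from. The function
name ``@dispatch`` and its blocks are hypothetical; the point is that every
block whose address may reach the ``indirectbr`` appears in its destination
list:

.. code-block:: llvm

    define i32 @dispatch(i1 %c) {
    entry:
      ; Pick one of two block addresses taken inside this function.
      %addr = select i1 %c, i8* blockaddress(@dispatch, %bb1),
                            i8* blockaddress(@dispatch, %bb2)
      indirectbr i8* %addr, [ label %bb1, label %bb2 ]

    bb1:
      ret i32 1

    bb2:
      ret i32 2
    }
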
8047 '``invoke``' Instruction
8048 ^^^^^^^^^^^^^^^^^^^^^^^^
8055 <result> = invoke [cconv] [ret attrs] [addrspace(<num>)] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs]
8056 [operand bundles] to label <normal label> unwind label <exception label>
8061 The '``invoke``' instruction causes control to transfer to a specified
8062 function, with the possibility of control flow transfer to either the
8063 '``normal``' label or the '``exception``' label. If the callee function
8064 returns with the "``ret``" instruction, control flow will return to the
8065 "normal" label. If the callee (or any indirect callees) returns via the
8066 ":ref:`resume <i_resume>`" instruction or other exception handling
8067 mechanism, control is interrupted and continued at the dynamically
8068 nearest "exception" label.
The '``exception``' label is a `landing
pad <ExceptionHandling.html#overview>`_ for the exception. As such, the
'``exception``' label is required to have the
":ref:`landingpad <i_landingpad>`" instruction, which contains the
information about the behavior of the program after unwinding happens,
as its first non-PHI instruction. The restrictions on the
"``landingpad``" instruction tightly couple it to the "``invoke``"
instruction, so that the important information contained within the
"``landingpad``" instruction can't be lost through normal code motion.
8083 This instruction requires several arguments:
8085 #. The optional "cconv" marker indicates which :ref:`calling
8086 convention <callingconv>` the call should use. If none is
8087 specified, the call defaults to using C calling conventions.
8088 #. The optional :ref:`Parameter Attributes <paramattrs>` list for return
8089 values. Only '``zeroext``', '``signext``', and '``inreg``' attributes
8091 #. The optional addrspace attribute can be used to indicate the address space
8092 of the called function. If it is not specified, the program address space
8093 from the :ref:`datalayout string<langref_datalayout>` will be used.
8094 #. '``ty``': the type of the call instruction itself which is also the
8095 type of the return value. Functions that return no value are marked
8097 #. '``fnty``': shall be the signature of the function being invoked. The
8098 argument types must match the types implied by this signature. This
8099 type can be omitted if the function is not varargs.
8100 #. '``fnptrval``': An LLVM value containing a pointer to a function to
8101 be invoked. In most cases, this is a direct function invocation, but
8102 indirect ``invoke``'s are just as possible, calling an arbitrary pointer
8104 #. '``function args``': argument list whose types match the function
8105 signature argument types and parameter attributes. All arguments must
8106 be of :ref:`first class <t_firstclass>` type. If the function signature
8107 indicates the function accepts a variable number of arguments, the
8108 extra arguments can be specified.
8109 #. '``normal label``': the label reached when the called function
8110 executes a '``ret``' instruction.
8111 #. '``exception label``': the label reached when a callee returns via
8112 the :ref:`resume <i_resume>` instruction or other exception handling
8114 #. The optional :ref:`function attributes <fnattrs>` list.
8115 #. The optional :ref:`operand bundles <opbundles>` list.
8120 This instruction is designed to operate as a standard '``call``'
8121 instruction in most regards. The primary difference is that it
8122 establishes an association with a label, which is used by the runtime
8123 library to unwind the stack.
8125 This instruction is used in languages with destructors to ensure that
8126 proper cleanup is performed in the case of either a ``longjmp`` or a
8127 thrown exception. Additionally, this is important for implementation of
8128 '``catch``' clauses in high-level languages that support them.
8130 For the purposes of the SSA form, the definition of the value returned
8131 by the '``invoke``' instruction is deemed to occur on the edge from the
8132 current block to the "normal" label. If the callee unwinds then no
8133 return value is available.
8138 .. code-block:: llvm
8140 %retval = invoke i32 @Test(i32 15) to label %Continue
8141 unwind label %TestCleanup ; i32:retval set
8142 %retval = invoke coldcc i32 %Testfnptr(i32 15) to label %Continue
8143 unwind label %TestCleanup ; i32:retval set
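
The fragments above can be placed in context with a complete function. This is
a sketch that assumes the Itanium C++ personality routine
``@__gxx_personality_v0``; the caller name ``@caller`` is hypothetical:

.. code-block:: llvm

    declare i32 @Test(i32)
    declare i32 @__gxx_personality_v0(...)

    define i32 @caller() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
    entry:
      %retval = invoke i32 @Test(i32 15)
                  to label %Continue unwind label %TestCleanup

    Continue:
      ; %retval is defined on the edge from entry to this block.
      ret i32 %retval

    TestCleanup:
      ; The landingpad must be the first non-PHI instruction of the block.
      %exn = landingpad { i8*, i32 }
               cleanup
      ; No return value is available here; continue unwinding.
      resume { i8*, i32 } %exn
    }
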
8147 '``callbr``' Instruction
8148 ^^^^^^^^^^^^^^^^^^^^^^^^
8155 <result> = callbr [cconv] [ret attrs] [addrspace(<num>)] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs]
8156 [operand bundles] to label <fallthrough label> [indirect labels]
8161 The '``callbr``' instruction causes control to transfer to a specified
8162 function, with the possibility of control flow transfer to either the
8163 '``fallthrough``' label or one of the '``indirect``' labels.
8165 This instruction should only be used to implement the "goto" feature of gcc
8166 style inline assembly. Any other usage is an error in the IR verifier.
8171 This instruction requires several arguments:
8173 #. The optional "cconv" marker indicates which :ref:`calling
8174 convention <callingconv>` the call should use. If none is
8175 specified, the call defaults to using C calling conventions.
8176 #. The optional :ref:`Parameter Attributes <paramattrs>` list for return
8177 values. Only '``zeroext``', '``signext``', and '``inreg``' attributes
8179 #. The optional addrspace attribute can be used to indicate the address space
8180 of the called function. If it is not specified, the program address space
8181 from the :ref:`datalayout string<langref_datalayout>` will be used.
8182 #. '``ty``': the type of the call instruction itself which is also the
8183 type of the return value. Functions that return no value are marked
8185 #. '``fnty``': shall be the signature of the function being called. The
8186 argument types must match the types implied by this signature. This
8187 type can be omitted if the function is not varargs.
8188 #. '``fnptrval``': An LLVM value containing a pointer to a function to
8189 be called. In most cases, this is a direct function call, but
8190 other ``callbr``'s are just as possible, calling an arbitrary pointer
8192 #. '``function args``': argument list whose types match the function
8193 signature argument types and parameter attributes. All arguments must
8194 be of :ref:`first class <t_firstclass>` type. If the function signature
8195 indicates the function accepts a variable number of arguments, the
8196 extra arguments can be specified.
8197 #. '``fallthrough label``': the label reached when the inline assembly's
8198 execution exits the bottom.
8199 #. '``indirect labels``': the labels reached when a callee transfers control
8200 to a location other than the '``fallthrough label``'. The blockaddress
8201 constant for these should also be in the list of '``function args``'.
8202 #. The optional :ref:`function attributes <fnattrs>` list.
8203 #. The optional :ref:`operand bundles <opbundles>` list.
8208 This instruction is designed to operate as a standard '``call``'
8209 instruction in most regards. The primary difference is that it
8210 establishes an association with additional labels to define where control
8211 flow goes after the call.
The output values of a '``callbr``' instruction are available only to
the '``fallthrough``' block, not to any '``indirect``' block(s).
8216 The only use of this today is to implement the "goto" feature of gcc inline
8217 assembly where additional labels can be provided as locations for the inline
8218 assembly to jump to.
8223 .. code-block:: llvm
8225 ; "asm goto" without output constraints.
8226 callbr void asm "", "r,X"(i32 %x, i8 *blockaddress(@foo, %indirect))
8227 to label %fallthrough [label %indirect]
8229 ; "asm goto" with output constraints.
8230 <result> = callbr i32 asm "", "=r,r,X"(i32 %x, i8 *blockaddress(@foo, %indirect))
8231 to label %fallthrough [label %indirect]
8235 '``resume``' Instruction
8236 ^^^^^^^^^^^^^^^^^^^^^^^^
8243 resume <type> <value>
8248 The '``resume``' instruction is a terminator instruction that has no
8254 The '``resume``' instruction requires one argument, which must have the
8255 same type as the result of any '``landingpad``' instruction in the same
8261 The '``resume``' instruction resumes propagation of an existing
8262 (in-flight) exception whose unwinding was interrupted with a
8263 :ref:`landingpad <i_landingpad>` instruction.
8268 .. code-block:: llvm
8270 resume { i8*, i32 } %exn
8274 '``catchswitch``' Instruction
8275 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8282 <resultval> = catchswitch within <parent> [ label <handler1>, label <handler2>, ... ] unwind to caller
8283 <resultval> = catchswitch within <parent> [ label <handler1>, label <handler2>, ... ] unwind label <default>
8288 The '``catchswitch``' instruction is used by `LLVM's exception handling system
8289 <ExceptionHandling.html#overview>`_ to describe the set of possible catch handlers
8290 that may be executed by the :ref:`EH personality routine <personalityfn>`.
8295 The ``parent`` argument is the token of the funclet that contains the
8296 ``catchswitch`` instruction. If the ``catchswitch`` is not inside a funclet,
8297 this operand may be the token ``none``.
8299 The ``default`` argument is the label of another basic block beginning with
8300 either a ``cleanuppad`` or ``catchswitch`` instruction. This unwind destination
8301 must be a legal target with respect to the ``parent`` links, as described in
8302 the `exception handling documentation\ <ExceptionHandling.html#wineh-constraints>`_.
8304 The ``handlers`` are a nonempty list of successor blocks that each begin with a
8305 :ref:`catchpad <i_catchpad>` instruction.
8310 Executing this instruction transfers control to one of the successors in
8311 ``handlers``, if appropriate, or continues to unwind via the unwind label if
8314 The ``catchswitch`` is both a terminator and a "pad" instruction, meaning that
8315 it must be both the first non-phi instruction and last instruction in the basic
8316 block. Therefore, it must be the only non-phi instruction in the block.
8321 .. code-block:: text
8324 %cs1 = catchswitch within none [label %handler0, label %handler1] unwind to caller
8326 %cs2 = catchswitch within %parenthandler [label %handler0] unwind label %cleanup
8330 '``catchret``' Instruction
8331 ^^^^^^^^^^^^^^^^^^^^^^^^^^
8338 catchret from <token> to label <normal>
8343 The '``catchret``' instruction is a terminator instruction that has a
8350 The first argument to a '``catchret``' indicates which ``catchpad`` it
8351 exits. It must be a :ref:`catchpad <i_catchpad>`.
8352 The second argument to a '``catchret``' specifies where control will
8358 The '``catchret``' instruction ends an existing (in-flight) exception whose
8359 unwinding was interrupted with a :ref:`catchpad <i_catchpad>` instruction. The
8360 :ref:`personality function <personalityfn>` gets a chance to execute arbitrary
8361 code to, for example, destroy the active exception. Control then transfers to
8364 The ``token`` argument must be a token produced by a ``catchpad`` instruction.
8365 If the specified ``catchpad`` is not the most-recently-entered not-yet-exited
8366 funclet pad (as described in the `EH documentation\ <ExceptionHandling.html#wineh-constraints>`_),
8367 the ``catchret``'s behavior is undefined.
8372 .. code-block:: text
    catchret from %catch to label %continue
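
To show how ``catchswitch``, ``catchpad``, and ``catchret`` fit together, here
is a sketch of a catch-all handler. It assumes the MSVC C++ personality
``@__CxxFrameHandler3``; the ``catchpad`` arguments are personality-specific,
and the function names are hypothetical:

.. code-block:: llvm

    declare void @may_throw()
    declare i32 @__CxxFrameHandler3(...)

    define void @f() personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
    entry:
      invoke void @may_throw()
              to label %continue unwind label %dispatch

    dispatch:
      ; The catchswitch is the only non-phi instruction in its block.
      %cs = catchswitch within none [label %handler] unwind to caller

    handler:
      %catch = catchpad within %cs [i8* null, i32 64, i8* null]
      ; Leave the handler and resume normal execution.
      catchret from %catch to label %continue

    continue:
      ret void
    }
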
8378 '``cleanupret``' Instruction
8379 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8386 cleanupret from <value> unwind label <continue>
8387 cleanupret from <value> unwind to caller
8392 The '``cleanupret``' instruction is a terminator instruction that has
8393 an optional successor.
8399 The '``cleanupret``' instruction requires one argument, which indicates
8400 which ``cleanuppad`` it exits, and must be a :ref:`cleanuppad <i_cleanuppad>`.
8401 If the specified ``cleanuppad`` is not the most-recently-entered not-yet-exited
8402 funclet pad (as described in the `EH documentation\ <ExceptionHandling.html#wineh-constraints>`_),
8403 the ``cleanupret``'s behavior is undefined.
8405 The '``cleanupret``' instruction also has an optional successor, ``continue``,
8406 which must be the label of another basic block beginning with either a
8407 ``cleanuppad`` or ``catchswitch`` instruction. This unwind destination must
8408 be a legal target with respect to the ``parent`` links, as described in the
8409 `exception handling documentation\ <ExceptionHandling.html#wineh-constraints>`_.
8414 The '``cleanupret``' instruction indicates to the
8415 :ref:`personality function <personalityfn>` that one
8416 :ref:`cleanuppad <i_cleanuppad>` it transferred control to has ended.
8417 It transfers control to ``continue`` or unwinds out of the function.
8422 .. code-block:: text
8424 cleanupret from %cleanup unwind to caller
8425 cleanupret from %cleanup unwind label %continue
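
A complete cleanup funclet might look like the following sketch. It assumes
the MSVC C++ personality ``@__CxxFrameHandler3``; ``@may_throw`` and
``@release`` are hypothetical helper functions:

.. code-block:: llvm

    declare void @may_throw()
    declare void @release()
    declare i32 @__CxxFrameHandler3(...)

    define void @g() personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
    entry:
      invoke void @may_throw()
              to label %done unwind label %cleanup

    cleanup:
      %cp = cleanuppad within none []
      ; Calls inside the funclet carry a "funclet" operand bundle.
      call void @release() [ "funclet"(token %cp) ]
      ; Exit the cleanuppad and keep unwinding out of the function.
      cleanupret from %cp unwind to caller

    done:
      ret void
    }
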
8429 '``unreachable``' Instruction
8430 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8442 The '``unreachable``' instruction has no defined semantics. This
8443 instruction is used to inform the optimizer that a particular portion of
8444 the code is not reachable. This can be used to indicate that the code
8445 after a no-return function cannot be reached, and other facts.
8450 The '``unreachable``' instruction has no defined semantics.
8457 Unary operators require a single operand, execute an operation on
8458 it, and produce a single value. The operand might represent multiple
8459 data, as is the case with the :ref:`vector <t_vector>` data type. The
8460 result value has the same type as its operand.
8464 '``fneg``' Instruction
8465 ^^^^^^^^^^^^^^^^^^^^^^
8472 <result> = fneg [fast-math flags]* <ty> <op1> ; yields ty:result
8477 The '``fneg``' instruction returns the negation of its operand.
8482 The argument to the '``fneg``' instruction must be a
8483 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
8484 floating-point values.
8489 The value produced is a copy of the operand with its sign bit flipped.
8490 This instruction can also take any number of :ref:`fast-math
8491 flags <fastmath>`, which are optimization hints to enable otherwise
8492 unsafe floating-point optimizations:
8497 .. code-block:: text
8499 <result> = fneg float %val ; yields float:result = -%val
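
A minimal sketch of ``fneg`` on a scalar and on a vector operand. The function
names are illustrative only, and the ``nnan ninf`` flags are chosen simply to
show where fast-math flags appear in the syntax.

.. code-block:: llvm

      ; Scalar negation: flips the sign bit of %x.
      define float @negate(float %x) {
        %r = fneg float %x
        ret float %r
      }

      ; Vector negation with fast-math flags; each lane is negated independently.
      define <4 x float> @negate_vec(<4 x float> %v) {
        %r = fneg nnan ninf <4 x float> %v
        ret <4 x float> %r
      }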
8506 Binary operators are used to do most of the computation in a program.
8507 They require two operands of the same type, execute an operation on
8508 them, and produce a single value. The operands might represent multiple
8509 data, as is the case with the :ref:`vector <t_vector>` data type. The
8510 result value has the same type as its operands.
8512 There are several different binary operators:
8516 '``add``' Instruction
8517 ^^^^^^^^^^^^^^^^^^^^^
8524 <result> = add <ty> <op1>, <op2> ; yields ty:result
8525 <result> = add nuw <ty> <op1>, <op2> ; yields ty:result
8526 <result> = add nsw <ty> <op1>, <op2> ; yields ty:result
8527 <result> = add nuw nsw <ty> <op1>, <op2> ; yields ty:result
8532 The '``add``' instruction returns the sum of its two operands.
8537 The two arguments to the '``add``' instruction must be
8538 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
8539 arguments must have identical types.
8544 The value produced is the integer sum of the two operands.
8546 If the sum has unsigned overflow, the result returned is the
8547 mathematical result modulo 2\ :sup:`n`\ , where n is the bit width of the result.
8550 Because LLVM integers use a two's complement representation, this
8551 instruction is appropriate for both signed and unsigned integers.
8553 ``nuw`` and ``nsw`` stand for "No Unsigned Wrap" and "No Signed Wrap",
8554 respectively. If the ``nuw`` and/or ``nsw`` keywords are present, the
8555 result value of the ``add`` is a :ref:`poison value <poisonvalues>` if
8556 unsigned and/or signed overflow, respectively, occurs.
8561 .. code-block:: text
8563 <result> = add i32 4, %var ; yields i32:result = 4 + %var
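
The effect of the wrap flags can be sketched with three otherwise identical
functions on ``i8``. The function names are hypothetical; the behavior
described in the comments follows from the semantics above.

.. code-block:: llvm

      ; Plain add wraps: for %a == 255 (i.e. -1) the result is 0.
      define i8 @add_wrapping(i8 %a) {
        %r = add i8 %a, 1
        ret i8 %r
      }

      ; nuw: the result is poison when %a == 255 (unsigned overflow).
      define i8 @add_nuw(i8 %a) {
        %r = add nuw i8 %a, 1
        ret i8 %r
      }

      ; nsw: the result is poison when %a == 127 (signed overflow).
      define i8 @add_nsw(i8 %a) {
        %r = add nsw i8 %a, 1
        ret i8 %r
      }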
8567 '``fadd``' Instruction
8568 ^^^^^^^^^^^^^^^^^^^^^^
8575 <result> = fadd [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
8580 The '``fadd``' instruction returns the sum of its two operands.
8585 The two arguments to the '``fadd``' instruction must be
8586 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
8587 floating-point values. Both arguments must have identical types.
8592 The value produced is the floating-point sum of the two operands.
8593 This instruction is assumed to execute in the default :ref:`floating-point
8594 environment <floatenv>`.
8595 This instruction can also take any number of :ref:`fast-math
8596 flags <fastmath>`, which are optimization hints to enable otherwise
8597 unsafe floating-point optimizations:
8602 .. code-block:: text
8604 <result> = fadd float 4.0, %var ; yields float:result = 4.0 + %var
8608 '``sub``' Instruction
8609 ^^^^^^^^^^^^^^^^^^^^^
8616 <result> = sub <ty> <op1>, <op2> ; yields ty:result
8617 <result> = sub nuw <ty> <op1>, <op2> ; yields ty:result
8618 <result> = sub nsw <ty> <op1>, <op2> ; yields ty:result
8619 <result> = sub nuw nsw <ty> <op1>, <op2> ; yields ty:result
8624 The '``sub``' instruction returns the difference of its two operands.
8626 Note that the '``sub``' instruction is used to represent the '``neg``'
8627 instruction present in most other intermediate representations.
8632 The two arguments to the '``sub``' instruction must be
8633 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
8634 arguments must have identical types.
8639 The value produced is the integer difference of the two operands.
8641 If the difference has unsigned overflow, the result returned is the
8642 mathematical result modulo 2\ :sup:`n`\ , where n is the bit width of the result.
8645 Because LLVM integers use a two's complement representation, this
8646 instruction is appropriate for both signed and unsigned integers.
8648 ``nuw`` and ``nsw`` stand for "No Unsigned Wrap" and "No Signed Wrap",
8649 respectively. If the ``nuw`` and/or ``nsw`` keywords are present, the
8650 result value of the ``sub`` is a :ref:`poison value <poisonvalues>` if
8651 unsigned and/or signed overflow, respectively, occurs.
8656 .. code-block:: text
8658 <result> = sub i32 4, %var ; yields i32:result = 4 - %var
8659 <result> = sub i32 0, %val ; yields i32:result = -%val
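
Integer negation written as ``sub`` from zero might look like the sketch
below (function names are illustrative). The ``nsw`` variant shows the one
input for which the flag changes the result.

.. code-block:: llvm

      ; Wrapping negation: negating -2147483648 yields -2147483648 again.
      define i32 @neg(i32 %x) {
        %r = sub i32 0, %x
        ret i32 %r
      }

      ; With nsw, the result is poison when %x == -2147483648.
      define i32 @neg_nsw(i32 %x) {
        %r = sub nsw i32 0, %x
        ret i32 %r
      }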
8663 '``fsub``' Instruction
8664 ^^^^^^^^^^^^^^^^^^^^^^
8671 <result> = fsub [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
8676 The '``fsub``' instruction returns the difference of its two operands.
8681 The two arguments to the '``fsub``' instruction must be
8682 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
8683 floating-point values. Both arguments must have identical types.
8688 The value produced is the floating-point difference of the two operands.
8689 This instruction is assumed to execute in the default :ref:`floating-point
8690 environment <floatenv>`.
8691 This instruction can also take any number of :ref:`fast-math
8692 flags <fastmath>`, which are optimization hints to enable otherwise
8693 unsafe floating-point optimizations:
8698 .. code-block:: text
8700 <result> = fsub float 4.0, %var ; yields float:result = 4.0 - %var
8701 <result> = fsub float -0.0, %val ; yields float:result = -%val
8705 '``mul``' Instruction
8706 ^^^^^^^^^^^^^^^^^^^^^
8713 <result> = mul <ty> <op1>, <op2> ; yields ty:result
8714 <result> = mul nuw <ty> <op1>, <op2> ; yields ty:result
8715 <result> = mul nsw <ty> <op1>, <op2> ; yields ty:result
8716 <result> = mul nuw nsw <ty> <op1>, <op2> ; yields ty:result
8721 The '``mul``' instruction returns the product of its two operands.
8726 The two arguments to the '``mul``' instruction must be
8727 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
8728 arguments must have identical types.
8733 The value produced is the integer product of the two operands.
8735 If the result of the multiplication has unsigned overflow, the result
8736 returned is the mathematical result modulo 2\ :sup:`n`\ , where n is the
8737 bit width of the result.
8739 Because LLVM integers use a two's complement representation, and the
8740 result is the same width as the operands, this instruction returns the
8741 correct result for both signed and unsigned integers. If a full product
8742 (e.g. ``i32`` * ``i32`` -> ``i64``) is needed, the operands should be
8743 sign-extended or zero-extended as appropriate to the width of the full product.
8746 ``nuw`` and ``nsw`` stand for "No Unsigned Wrap" and "No Signed Wrap",
8747 respectively. If the ``nuw`` and/or ``nsw`` keywords are present, the
8748 result value of the ``mul`` is a :ref:`poison value <poisonvalues>` if
8749 unsigned and/or signed overflow, respectively, occurs.
8754 .. code-block:: text
8756 <result> = mul i32 4, %var ; yields i32:result = 4 * %var
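
One way to obtain the full ``i32`` * ``i32`` -> ``i64`` product described
above is to extend both operands first. This is a sketch with hypothetical
function names. The ``nsw`` and ``nuw`` flags are optional; they are valid
here because the product of two extended 32-bit values always fits in 64
bits.

.. code-block:: llvm

      ; Signed full product: sign-extend, then multiply at 64 bits.
      define i64 @full_smul(i32 %a, i32 %b) {
        %a.ext = sext i32 %a to i64
        %b.ext = sext i32 %b to i64
        %p = mul nsw i64 %a.ext, %b.ext
        ret i64 %p
      }

      ; Unsigned full product: zero-extend, then multiply at 64 bits.
      define i64 @full_umul(i32 %a, i32 %b) {
        %a.ext = zext i32 %a to i64
        %b.ext = zext i32 %b to i64
        %p = mul nuw i64 %a.ext, %b.ext
        ret i64 %p
      }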
8760 '``fmul``' Instruction
8761 ^^^^^^^^^^^^^^^^^^^^^^
8768 <result> = fmul [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
8773 The '``fmul``' instruction returns the product of its two operands.
8778 The two arguments to the '``fmul``' instruction must be
8779 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
8780 floating-point values. Both arguments must have identical types.
8785 The value produced is the floating-point product of the two operands.
8786 This instruction is assumed to execute in the default :ref:`floating-point
8787 environment <floatenv>`.
8788 This instruction can also take any number of :ref:`fast-math
8789 flags <fastmath>`, which are optimization hints to enable otherwise
8790 unsafe floating-point optimizations:
8795 .. code-block:: text
8797 <result> = fmul float 4.0, %var ; yields float:result = 4.0 * %var
8801 '``udiv``' Instruction
8802 ^^^^^^^^^^^^^^^^^^^^^^
8809 <result> = udiv <ty> <op1>, <op2> ; yields ty:result
8810 <result> = udiv exact <ty> <op1>, <op2> ; yields ty:result
8815 The '``udiv``' instruction returns the quotient of its two operands.
8820 The two arguments to the '``udiv``' instruction must be
8821 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
8822 arguments must have identical types.
8827 The value produced is the unsigned integer quotient of the two operands.
8829 Note that unsigned integer division and signed integer division are
8830 distinct operations; for signed integer division, use '``sdiv``'.
8832 Division by zero is undefined behavior. For vectors, if any element
8833 of the divisor is zero, the operation has undefined behavior.
8836 If the ``exact`` keyword is present, the result value of the ``udiv`` is
8837 a :ref:`poison value <poisonvalues>` if %op1 is not a multiple of %op2 (as
8838 such, "((a udiv exact b) mul b) == a").
8843 .. code-block:: text
8845 <result> = udiv i32 4, %var ; yields i32:result = 4 / %var
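
Because division by zero is undefined behavior, a front end that cannot prove
the divisor non-zero typically branches around the ``udiv``. A minimal sketch
follows; returning 0 on the fallback path is an arbitrary choice made for
illustration.

.. code-block:: llvm

      define i32 @udiv_or_zero(i32 %n, i32 %d) {
      entry:
        %d.zero = icmp eq i32 %d, 0
        br i1 %d.zero, label %fallback, label %divide

      divide:
        ; Only reached when %d != 0, so the division is well defined.
        %q = udiv i32 %n, %d
        ret i32 %q

      fallback:
        ret i32 0
      }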
8849 '``sdiv``' Instruction
8850 ^^^^^^^^^^^^^^^^^^^^^^
8857 <result> = sdiv <ty> <op1>, <op2> ; yields ty:result
8858 <result> = sdiv exact <ty> <op1>, <op2> ; yields ty:result
8863 The '``sdiv``' instruction returns the quotient of its two operands.
8868 The two arguments to the '``sdiv``' instruction must be
8869 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
8870 arguments must have identical types.
8875 The value produced is the signed integer quotient of the two operands
8876 rounded towards zero.
8878 Note that signed integer division and unsigned integer division are
8879 distinct operations; for unsigned integer division, use '``udiv``'.
8881 Division by zero is undefined behavior. For vectors, if any element
8882 of the divisor is zero, the operation has undefined behavior.
8883 Overflow also leads to undefined behavior; this is a rare case, but can
8884 occur, for example, by doing a 32-bit division of -2147483648 by -1.
8886 If the ``exact`` keyword is present, the result value of the ``sdiv`` is
8887 a :ref:`poison value <poisonvalues>` if the result would be rounded.
8892 .. code-block:: text
8894 <result> = sdiv i32 4, %var ; yields i32:result = 4 / %var
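
For ``sdiv`` there are two inputs to exclude: a zero divisor and the
overflowing case -2147483648 / -1. The sketch below checks both before
dividing; the function name and the fallback value of 0 are assumptions made
for the example. For inputs that reach the division, the quotient rounds
toward zero, so -7 divided by 2 gives -3.

.. code-block:: llvm

      define i32 @checked_sdiv(i32 %n, i32 %d) {
      entry:
        %d.zero   = icmp eq i32 %d, 0
        %n.min    = icmp eq i32 %n, -2147483648
        %d.neg1   = icmp eq i32 %d, -1
        %overflow = and i1 %n.min, %d.neg1
        %bad      = or i1 %d.zero, %overflow
        br i1 %bad, label %fallback, label %divide

      divide:
        ; Neither undefined case can occur on this path.
        %q = sdiv i32 %n, %d
        ret i32 %q

      fallback:
        ret i32 0
      }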
8898 '``fdiv``' Instruction
8899 ^^^^^^^^^^^^^^^^^^^^^^
8906 <result> = fdiv [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
8911 The '``fdiv``' instruction returns the quotient of its two operands.
8916 The two arguments to the '``fdiv``' instruction must be
8917 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
8918 floating-point values. Both arguments must have identical types.
8923 The value produced is the floating-point quotient of the two operands.
8924 This instruction is assumed to execute in the default :ref:`floating-point
8925 environment <floatenv>`.
8926 This instruction can also take any number of :ref:`fast-math
8927 flags <fastmath>`, which are optimization hints to enable otherwise
8928 unsafe floating-point optimizations:
8933 .. code-block:: text
8935 <result> = fdiv float 4.0, %var ; yields float:result = 4.0 / %var
8939 '``urem``' Instruction
8940 ^^^^^^^^^^^^^^^^^^^^^^
8947 <result> = urem <ty> <op1>, <op2> ; yields ty:result
8952 The '``urem``' instruction returns the remainder from the unsigned
8953 division of its two arguments.
8958 The two arguments to the '``urem``' instruction must be
8959 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
8960 arguments must have identical types.
8965 This instruction returns the unsigned integer *remainder* of a division.
8966 This instruction always performs an unsigned division to get the remainder.
8969 Note that unsigned integer remainder and signed integer remainder are
8970 distinct operations; for signed integer remainder, use '``srem``'.
8972 Taking the remainder of a division by zero is undefined behavior.
8973 For vectors, if any element of the divisor is zero, the operation has undefined behavior.
8979 .. code-block:: text
8981 <result> = urem i32 4, %var ; yields i32:result = 4 % %var
8985 '``srem``' Instruction
8986 ^^^^^^^^^^^^^^^^^^^^^^
8993 <result> = srem <ty> <op1>, <op2> ; yields ty:result
8998 The '``srem``' instruction returns the remainder from the signed
8999 division of its two operands. This instruction can also take
9000 :ref:`vector <t_vector>` versions of the values, in which case the elements must be integers.
9006 The two arguments to the '``srem``' instruction must be
9007 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
9008 arguments must have identical types.
9013 This instruction returns the *remainder* of a division (where the result
9014 is either zero or has the same sign as the dividend, ``op1``), not the
9015 *modulo* operator (where the result is either zero or has the same sign
9016 as the divisor, ``op2``) of a value. For more information about the
9017 difference, see `The Math
9018 Forum <http://mathforum.org/dr.math/problems/anne.4.28.99.html>`_. For a
9019 table of how this is implemented in various languages, please see
9021 `Wikipedia: modulo operation <http://en.wikipedia.org/wiki/Modulo_operation>`_.
9023 Note that signed integer remainder and unsigned integer remainder are
9024 distinct operations; for unsigned integer remainder, use '``urem``'.
9026 Taking the remainder of a division by zero is undefined behavior.
9027 For vectors, if any element of the divisor is zero, the operation has undefined behavior.
9029 Overflow also leads to undefined behavior; this is a rare case, but can
9030 occur, for example, by taking the remainder of a 32-bit division of
9031 -2147483648 by -1. (The remainder doesn't actually overflow, but this
9032 rule lets srem be implemented using instructions that return both the
9033 result of the division and the remainder.)
9038 .. code-block:: text
9040 <result> = srem i32 4, %var ; yields i32:result = 4 % %var
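
To make the remainder/modulo distinction concrete: ``srem i32 -7, 2`` yields
-1 (the sign of the dividend), whereas a floored modulo would yield 1. A
floored modulo can be derived from ``srem`` roughly as sketched below. The
function name is hypothetical, and the caller is assumed to have already
excluded a zero divisor and the -2147483648 / -1 case.

.. code-block:: llvm

      define i32 @floor_mod(i32 %a, i32 %b) {
        %r = srem i32 %a, %b
        ; Adjust only when the remainder is non-zero and its sign differs
        ; from the sign of the divisor.
        %r.nonzero    = icmp ne i32 %r, 0
        %x            = xor i32 %r, %b
        %signs.differ = icmp slt i32 %x, 0
        %adjust       = and i1 %r.nonzero, %signs.differ
        ; %r and %b have opposite signs when this sum is used, so it cannot overflow.
        %r.plus.b     = add i32 %r, %b
        %m = select i1 %adjust, i32 %r.plus.b, i32 %r
        ret i32 %m
      }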
9044 '``frem``' Instruction
9045 ^^^^^^^^^^^^^^^^^^^^^^
9052 <result> = frem [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
9057 The '``frem``' instruction returns the remainder from the division of its two operands.
9063 The two arguments to the '``frem``' instruction must be
9064 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
9065 floating-point values. Both arguments must have identical types.
9070 The value produced is the floating-point remainder of the two operands.
9071 This is the same output as a libm '``fmod``' function, but without any
9072 possibility of setting ``errno``. The remainder has the same sign as the dividend.
9074 This instruction is assumed to execute in the default :ref:`floating-point
9075 environment <floatenv>`.
9076 This instruction can also take any number of :ref:`fast-math
9077 flags <fastmath>`, which are optimization hints to enable otherwise
9078 unsafe floating-point optimizations:
9083 .. code-block:: text
9085 <result> = frem float 4.0, %var ; yields float:result = 4.0 % %var
9089 Bitwise Binary Operations
9090 -------------------------
9092 Bitwise binary operators are used to do various forms of bit-twiddling
9093 in a program. They are generally very efficient instructions and can
9094 commonly be strength reduced from other instructions. They require two
9095 operands of the same type, execute an operation on them, and produce a
9096 single value. The resulting value is the same type as its operands.
9100 '``shl``' Instruction
9101 ^^^^^^^^^^^^^^^^^^^^^
9108 <result> = shl <ty> <op1>, <op2> ; yields ty:result
9109 <result> = shl nuw <ty> <op1>, <op2> ; yields ty:result
9110 <result> = shl nsw <ty> <op1>, <op2> ; yields ty:result
9111 <result> = shl nuw nsw <ty> <op1>, <op2> ; yields ty:result
9116 The '``shl``' instruction returns the first operand shifted to the left
9117 a specified number of bits.
9122 Both arguments to the '``shl``' instruction must be the same
9123 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer type.
9124 '``op2``' is treated as an unsigned value.
9129 The value produced is ``op1`` \* 2\ :sup:`op2` mod 2\ :sup:`n`,
9130 where ``n`` is the width of the result. If ``op2`` is (statically or
9131 dynamically) equal to or larger than the number of bits in
9132 ``op1``, this instruction returns a :ref:`poison value <poisonvalues>`.
9133 If the arguments are vectors, each vector element of ``op1`` is shifted
9134 by the corresponding shift amount in ``op2``.
9136 If the ``nuw`` keyword is present, then the shift produces a poison
9137 value if it shifts out any non-zero bits.
9138 If the ``nsw`` keyword is present, then the shift produces a poison
9139 value if it shifts out any bits that disagree with the resultant sign bit.
9144 .. code-block:: text
9146 <result> = shl i32 4, %var ; yields i32: 4 << %var
9147 <result> = shl i32 4, 2 ; yields i32: 16
9148 <result> = shl i32 1, 10 ; yields i32: 1024
9149 <result> = shl i32 1, 32 ; poison value
9150 <result> = shl <2 x i32> < i32 1, i32 1>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 2, i32 4>
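
When the shift amount is not known to be smaller than the bit width, the
poison result can be kept out of the final value by selecting around it. The
sketch below is one possible pattern, not a required idiom; the function name
and the choice of 0 for oversized shifts are assumptions. It relies on
``select`` returning only the chosen operand.

.. code-block:: llvm

      define i32 @shl_or_zero(i32 %x, i32 %amt) {
        %in.range = icmp ult i32 %amt, 32
        ; May be poison when %amt >= 32, but is then not selected.
        %shifted  = shl i32 %x, %amt
        %r = select i1 %in.range, i32 %shifted, i32 0
        ret i32 %r
      }

      ; nuw: poison if any set bit is shifted out, e.g. %x == 128 here.
      define i8 @double_nuw(i8 %x) {
        %r = shl nuw i8 %x, 1
        ret i8 %r
      }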
9155 '``lshr``' Instruction
9156 ^^^^^^^^^^^^^^^^^^^^^^
9163 <result> = lshr <ty> <op1>, <op2> ; yields ty:result
9164 <result> = lshr exact <ty> <op1>, <op2> ; yields ty:result
9169 The '``lshr``' instruction (logical shift right) returns the first
9170 operand shifted to the right a specified number of bits with zero fill.
9175 Both arguments to the '``lshr``' instruction must be the same
9176 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer type.
9177 '``op2``' is treated as an unsigned value.
9182 This instruction always performs a logical shift right operation. The
9183 most significant bits of the result will be filled with zero bits after
9184 the shift. If ``op2`` is (statically or dynamically) equal to or larger
9185 than the number of bits in ``op1``, this instruction returns a :ref:`poison
9186 value <poisonvalues>`. If the arguments are vectors, each vector element
9187 of ``op1`` is shifted by the corresponding shift amount in ``op2``.
9189 If the ``exact`` keyword is present, the result value of the ``lshr`` is
9190 a poison value if any of the bits shifted out are non-zero.
9195 .. code-block:: text
9197 <result> = lshr i32 4, 1 ; yields i32:result = 2
9198 <result> = lshr i32 4, 2 ; yields i32:result = 1
9199 <result> = lshr i8 4, 3 ; yields i8:result = 0
9200 <result> = lshr i8 -2, 1 ; yields i8:result = 0x7F
9201 <result> = lshr i32 1, 32 ; poison value
9202 <result> = lshr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 0x7FFFFFFF, i32 1>
9206 '``ashr``' Instruction
9207 ^^^^^^^^^^^^^^^^^^^^^^
9214 <result> = ashr <ty> <op1>, <op2> ; yields ty:result
9215 <result> = ashr exact <ty> <op1>, <op2> ; yields ty:result
9220 The '``ashr``' instruction (arithmetic shift right) returns the first
9221 operand shifted to the right a specified number of bits with sign extension.
9227 Both arguments to the '``ashr``' instruction must be the same
9228 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer type.
9229 '``op2``' is treated as an unsigned value.
9234 This instruction always performs an arithmetic shift right operation.
9235 The most significant bits of the result will be filled with the sign bit
9236 of ``op1``. If ``op2`` is (statically or dynamically) equal to or larger
9237 than the number of bits in ``op1``, this instruction returns a :ref:`poison
9238 value <poisonvalues>`. If the arguments are vectors, each vector element
9239 of ``op1`` is shifted by the corresponding shift amount in ``op2``.
9241 If the ``exact`` keyword is present, the result value of the ``ashr`` is
9242 a poison value if any of the bits shifted out are non-zero.
9247 .. code-block:: text
9249 <result> = ashr i32 4, 1 ; yields i32:result = 2
9250 <result> = ashr i32 4, 2 ; yields i32:result = 1
9251 <result> = ashr i8 4, 3 ; yields i8:result = 0
9252 <result> = ashr i8 -2, 1 ; yields i8:result = -1
9253 <result> = ashr i32 1, 32 ; poison value
9254 <result> = ashr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 3> ; yields: result=<2 x i32> < i32 -1, i32 0>
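
The difference between the two right shifts shows up only for operands with
the sign bit set. The following sketch (illustrative function names) applies
both to the same ``i8`` input and adds an ``exact`` variant.

.. code-block:: llvm

      ; Logical shift: for %x == -8 (0xF8) the result is 0x7C (124).
      define i8 @shift_logical(i8 %x) {
        %r = lshr i8 %x, 1
        ret i8 %r
      }

      ; Arithmetic shift: for %x == -8 (0xF8) the result is 0xFC (-4).
      define i8 @shift_arithmetic(i8 %x) {
        %r = ashr i8 %x, 1
        ret i8 %r
      }

      ; exact: poison unless the two low bits of %x are zero.
      define i8 @div4_exact(i8 %x) {
        %r = ashr exact i8 %x, 2
        ret i8 %r
      }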
9258 '``and``' Instruction
9259 ^^^^^^^^^^^^^^^^^^^^^
9266 <result> = and <ty> <op1>, <op2> ; yields ty:result
9271 The '``and``' instruction returns the bitwise logical and of its two operands.
9277 The two arguments to the '``and``' instruction must be
9278 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
9279 arguments must have identical types.
9284 The truth table used for the '``and``' instruction is:
9301 .. code-block:: text
9303 <result> = and i32 4, %var ; yields i32:result = 4 & %var
9304 <result> = and i32 15, 40 ; yields i32:result = 8
9305 <result> = and i32 4, 8 ; yields i32:result = 0
9309 '``or``' Instruction
9310 ^^^^^^^^^^^^^^^^^^^^
9317 <result> = or <ty> <op1>, <op2> ; yields ty:result
9322 The '``or``' instruction returns the bitwise logical inclusive or of its two operands.
9328 The two arguments to the '``or``' instruction must be
9329 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
9330 arguments must have identical types.
9335 The truth table used for the '``or``' instruction is:
9354 <result> = or i32 4, %var ; yields i32:result = 4 | %var
9355 <result> = or i32 15, 40 ; yields i32:result = 47
9356 <result> = or i32 4, 8 ; yields i32:result = 12
9360 '``xor``' Instruction
9361 ^^^^^^^^^^^^^^^^^^^^^
9368 <result> = xor <ty> <op1>, <op2> ; yields ty:result
9373 The '``xor``' instruction returns the bitwise logical exclusive or of
9374 its two operands. The ``xor`` is used to implement the "one's
9375 complement" operation, which is the "~" operator in C.
9380 The two arguments to the '``xor``' instruction must be
9381 :ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both
9382 arguments must have identical types.
9387 The truth table used for the '``xor``' instruction is:
9404 .. code-block:: text
9406 <result> = xor i32 4, %var ; yields i32:result = 4 ^ %var
9407 <result> = xor i32 15, 40 ; yields i32:result = 39
9408 <result> = xor i32 4, 8 ; yields i32:result = 12
9409 <result> = xor i32 %V, -1 ; yields i32:result = ~%V
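
Taken together, ``and``, ``or`` and ``xor`` cover the usual flag-manipulation
idioms. The sketch below uses hypothetical function names and treats
``%mask`` as the set of bits to operate on.

.. code-block:: llvm

      ; Set the bits in %mask: %flags | %mask
      define i32 @set_bits(i32 %flags, i32 %mask) {
        %r = or i32 %flags, %mask
        ret i32 %r
      }

      ; Clear the bits in %mask: %flags & ~%mask
      define i32 @clear_bits(i32 %flags, i32 %mask) {
        %inv = xor i32 %mask, -1
        %r = and i32 %flags, %inv
        ret i32 %r
      }

      ; Toggle the bits in %mask: %flags ^ %mask
      define i32 @toggle_bits(i32 %flags, i32 %mask) {
        %r = xor i32 %flags, %mask
        ret i32 %r
      }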
9414 LLVM supports several instructions to represent vector operations in a
9415 target-independent manner. These instructions cover the element-access
9416 and vector-specific operations needed to process vectors effectively.
9417 While LLVM does directly support these vector operations, many
9418 sophisticated algorithms will want to use target-specific intrinsics to
9419 take full advantage of a specific target.
9421 .. _i_extractelement:
9423 '``extractelement``' Instruction
9424 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
9431 <result> = extractelement <n x <ty>> <val>, <ty2> <idx> ; yields <ty>
9432 <result> = extractelement <vscale x n x <ty>> <val>, <ty2> <idx> ; yields <ty>
9437 The '``extractelement``' instruction extracts a single scalar element
9438 from a vector at a specified index.
9443 The first operand of an '``extractelement``' instruction is a value of
9444 :ref:`vector <t_vector>` type. The second operand is an index indicating
9445 the position from which to extract the element. The index may be a
9446 variable of any integer type.
9451 The result is a scalar of the same type as the element type of ``val``.
9452 Its value is the value at position ``idx`` of ``val``. If ``idx``
9453 exceeds the length of ``val`` for a fixed-length vector, the result is a
9454 :ref:`poison value <poisonvalues>`. For a scalable vector, if the value
9455 of ``idx`` exceeds the runtime length of the vector, the result is a
9456 :ref:`poison value <poisonvalues>`.
9461 .. code-block:: text
9463 <result> = extractelement <4 x i32> %vec, i32 0 ; yields i32
9465 .. _i_insertelement:
9467 '``insertelement``' Instruction
9468 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
9475 <result> = insertelement <n x <ty>> <val>, <ty> <elt>, <ty2> <idx> ; yields <n x <ty>>
9476 <result> = insertelement <vscale x n x <ty>> <val>, <ty> <elt>, <ty2> <idx> ; yields <vscale x n x <ty>>
9481 The '``insertelement``' instruction inserts a scalar element into a
9482 vector at a specified index.
9487 The first operand of an '``insertelement``' instruction is a value of
9488 :ref:`vector <t_vector>` type. The second operand is a scalar value whose
9489 type must equal the element type of the first operand. The third operand
9490 is an index indicating the position at which to insert the value. The
9491 index may be a variable of any integer type.
9496 The result is a vector of the same type as ``val``. Its element values
9497 are those of ``val`` except at position ``idx``, where it gets the value
9498 ``elt``. If ``idx`` exceeds the length of ``val`` for a fixed-length vector,
9499 the result is a :ref:`poison value <poisonvalues>`. For a scalable vector,
9500 if the value of ``idx`` exceeds the runtime length of the vector, the result
9501 is a :ref:`poison value <poisonvalues>`.
9506 .. code-block:: text
9508 <result> = insertelement <4 x i32> %vec, i32 1, i32 0 ; yields <4 x i32>
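
A short sketch of ``insertelement`` and ``extractelement`` used together:
a vector is built one lane at a time, then a lane is read back. The function
names are illustrative.

.. code-block:: llvm

      ; Builds <%a, %b> and returns lane 1, which is %b.
      define i32 @second_of_pair(i32 %a, i32 %b) {
        %v0 = insertelement <2 x i32> undef, i32 %a, i32 0
        %v1 = insertelement <2 x i32> %v0, i32 %b, i32 1
        %e  = extractelement <2 x i32> %v1, i32 1
        ret i32 %e
      }

      ; Variable index of a different integer type; the result is poison
      ; when %i is out of range for the four-element vector.
      define i32 @lane(<4 x i32> %v, i64 %i) {
        %e = extractelement <4 x i32> %v, i64 %i
        ret i32 %e
      }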
9510 .. _i_shufflevector:
9512 '``shufflevector``' Instruction
9513 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
9520 <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> <mask> ; yields <m x <ty>>
9521 <result> = shufflevector <vscale x n x <ty>> <v1>, <vscale x n x <ty>> <v2>, <vscale x m x i32> <mask> ; yields <vscale x m x <ty>>
9526 The '``shufflevector``' instruction constructs a permutation of elements
9527 from two input vectors, returning a vector with the same element type as
9528 the input and length that is the same as the shuffle mask.
9533 The first two operands of a '``shufflevector``' instruction are vectors
9534 with the same type. The third argument is a shuffle mask vector constant
9535 whose element type is ``i32``. The mask vector elements must be constant
9536 integers or ``undef`` values. The result of the instruction is a vector
9537 whose length is the same as the shuffle mask and whose element type is the
9538 same as the element type of the first two operands.
9543 The elements of the two input vectors are numbered from left to right
9544 across both of the vectors. For each element of the result vector, the
9545 shuffle mask selects an element from one of the input vectors to copy
9546 to the result. Non-negative elements in the mask represent an index
9547 into the concatenated pair of input vectors.
9549 If the shuffle mask is undefined, the result vector is undefined. If
9550 the shuffle mask selects an undefined element from one of the input
9551 vectors, the resulting element is undefined. An undefined element
9552 in the mask vector specifies that the resulting element is undefined.
9553 An undefined element in the mask vector prevents a poisoned vector
9554 element from propagating.
9556 For scalable vectors, the only valid mask values at present are
9557 ``zeroinitializer`` and ``undef``, since we cannot write all indices as
9558 literals for a vector with a length unknown at compile time.
9563 .. code-block:: text
9565 <result> = shufflevector <4 x i32> %v1, <4 x i32> %v2,
9566 <4 x i32> <i32 0, i32 4, i32 1, i32 5> ; yields <4 x i32>
9567 <result> = shufflevector <4 x i32> %v1, <4 x i32> undef,
9568 <4 x i32> <i32 0, i32 1, i32 2, i32 3> ; yields <4 x i32> - Identity shuffle.
9569 <result> = shufflevector <8 x i32> %v1, <8 x i32> undef,
9570 <4 x i32> <i32 0, i32 1, i32 2, i32 3> ; yields <4 x i32>
9571 <result> = shufflevector <4 x i32> %v1, <4 x i32> %v2,
9572 <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7 > ; yields <8 x i32>
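
A common use of a ``zeroinitializer`` mask is broadcasting ("splatting") one
scalar into every lane: the scalar is inserted at index 0 and the shuffle
then selects element 0 for each result lane. The sketch below shows this for
a fixed-length and for a scalable vector; the function names are
hypothetical.

.. code-block:: llvm

      define <4 x i32> @splat(i32 %x) {
        %ins = insertelement <4 x i32> undef, i32 %x, i32 0
        %r = shufflevector <4 x i32> %ins, <4 x i32> undef,
                           <4 x i32> zeroinitializer
        ret <4 x i32> %r
      }

      define <vscale x 4 x i32> @splat_scalable(i32 %x) {
        %ins = insertelement <vscale x 4 x i32> undef, i32 %x, i32 0
        %r = shufflevector <vscale x 4 x i32> %ins, <vscale x 4 x i32> undef,
                           <vscale x 4 x i32> zeroinitializer
        ret <vscale x 4 x i32> %r
      }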
9574 Aggregate Operations
9575 --------------------
9577 LLVM supports several instructions for working with
9578 :ref:`aggregate <t_aggregate>` values.
9582 '``extractvalue``' Instruction
9583 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
9590 <result> = extractvalue <aggregate type> <val>, <idx>{, <idx>}*
9595 The '``extractvalue``' instruction extracts the value of a member field
9596 from an :ref:`aggregate <t_aggregate>` value.
9601 The first operand of an '``extractvalue``' instruction is a value of
9602 :ref:`struct <t_struct>` or :ref:`array <t_array>` type. The other operands are
9603 constant indices to specify which value to extract in a similar manner
9604 as indices in a '``getelementptr``' instruction.
9606 The major differences to ``getelementptr`` indexing are:
9608 - Since the value being indexed is not a pointer, the first index is
9609 omitted and assumed to be zero.
9610 - At least one index must be specified.
9611 - Not only struct indices but also array indices must be in bounds.
9616 The result is the value at the position in the aggregate specified by the index operands.
9622 .. code-block:: text
9624 <result> = extractvalue {i32, float} %agg, 0 ; yields i32
9628 '``insertvalue``' Instruction
9629 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
9636 <result> = insertvalue <aggregate type> <val>, <ty> <elt>, <idx>{, <idx>}* ; yields <aggregate type>
9641 The '``insertvalue``' instruction inserts a value into a member field in
9642 an :ref:`aggregate <t_aggregate>` value.
9647 The first operand of an '``insertvalue``' instruction is a value of
9648 :ref:`struct <t_struct>` or :ref:`array <t_array>` type. The second operand is
9649 a first-class value to insert. The following operands are constant
9650 indices indicating the position at which to insert the value in a
9651 similar manner as indices in an '``extractvalue``' instruction. The value
9652 to insert must have the same type as the value identified by the indices.
9658 The result is an aggregate of the same type as ``val``. Its value is
9659 that of ``val`` except that the value at the position specified by the
9660 indices is that of ``elt``.
9665 .. code-block:: llvm
9667 %agg1 = insertvalue {i32, float} undef, i32 1, 0 ; yields {i32 1, float undef}
9668 %agg2 = insertvalue {i32, float} %agg1, float %val, 1 ; yields {i32 1, float %val}
9669 %agg3 = insertvalue {i32, {float}} undef, float %val, 1, 0 ; yields {i32 undef, {float %val}}
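
Multiple indices walk into nested aggregates. The sketch below, with
illustrative function names and an arbitrary aggregate type, fills a struct
that contains an array and then reads one array element back.

.. code-block:: llvm

      define { i32, [2 x float] } @make(i32 %n, float %f) {
        %a0 = insertvalue { i32, [2 x float] } undef, i32 %n, 0
        ; Indices "1, 0": field 1 of the struct, then element 0 of the array.
        %a1 = insertvalue { i32, [2 x float] } %a0, float %f, 1, 0
        %a2 = insertvalue { i32, [2 x float] } %a1, float %f, 1, 1
        ret { i32, [2 x float] } %a2
      }

      define float @second_float({ i32, [2 x float] } %agg) {
        %f = extractvalue { i32, [2 x float] } %agg, 1, 1
        ret float %f
      }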
9673 Memory Access and Addressing Operations
9674 ---------------------------------------
9676 A key design point of an SSA-based representation is how it represents
9677 memory. In LLVM, no memory locations are in SSA form, which makes things
9678 very simple. This section describes how to read, write, and allocate memory in LLVM.
9683 '``alloca``' Instruction
9684 ^^^^^^^^^^^^^^^^^^^^^^^^
9691 <result> = alloca [inalloca] <type> [, <ty> <NumElements>] [, align <alignment>] [, addrspace(<num>)] ; yields type addrspace(num)*:result
9696 The '``alloca``' instruction allocates memory on the stack frame of the
9697 currently executing function, to be automatically released when this
9698 function returns to its caller. If the address space is not explicitly
9699 specified, the object is allocated in the alloca address space from the
9700 :ref:`datalayout string<langref_datalayout>`.
9705 The '``alloca``' instruction allocates ``sizeof(<type>)*NumElements``
9706 bytes of memory on the runtime stack, returning a pointer of the
9707 appropriate type to the program. If "NumElements" is specified, it is
9708 the number of elements allocated, otherwise "NumElements" is defaulted
9709 to be one. If a constant alignment is specified, the value result of the
9710 allocation is guaranteed to be aligned to at least that boundary. The
9711 alignment may not be greater than ``1 << 29``. If not specified, or if
9712 zero, the target can choose to align the allocation on any convenient
9713 boundary compatible with the type.
9715 '``type``' may be any sized type.
Memory is allocated; a pointer is returned. The allocated memory is
uninitialized, and loading from uninitialized memory produces an undefined
value. The operation itself is undefined if there is insufficient stack
space for the allocation. The '``alloca``' instruction is commonly used
to represent automatic variables that must have an address available.
'``alloca``'d memory is automatically released when the function returns
(either with the ``ret`` or ``resume`` instructions). Allocating zero
bytes is legal, but the returned pointer may not be unique. The order in
which memory is allocated (i.e., which way the stack grows) is not
specified.
9731 Note that '``alloca``' outside of the alloca address space from the
9732 :ref:`datalayout string<langref_datalayout>` is meaningful only if the
9733 target has assigned it a semantics.
9735 If the returned pointer is used by :ref:`llvm.lifetime.start <int_lifestart>`,
9736 the returned object is initially dead.
9737 See :ref:`llvm.lifetime.start <int_lifestart>` and
9738 :ref:`llvm.lifetime.end <int_lifeend>` for the precise semantics of
9739 lifetime-manipulating intrinsics.
9744 .. code-block:: llvm
9746 %ptr = alloca i32 ; yields i32*:ptr
9747 %ptr = alloca i32, i32 4 ; yields i32*:ptr
9748 %ptr = alloca i32, i32 4, align 1024 ; yields i32*:ptr
9749 %ptr = alloca i32, align 1024 ; yields i32*:ptr
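As an illustrative sketch (the function name ``@first_of_n`` and its
parameter are hypothetical, not part of this reference), a run-time
``NumElements`` operand sizes the allocation dynamically. The sketch assumes
``%n`` is at least 1, because it stores to the first element:

.. code-block:: llvm

    define i32 @first_of_n(i32 %n) {
    entry:
      ; Reserves sizeof(i32) * %n bytes in the current stack frame.
      %buf = alloca i32, i32 %n, align 4
      ; The memory starts out uninitialized, so write before reading.
      store i32 7, i32* %buf, align 4
      %v = load i32, i32* %buf, align 4
      ; %buf is released automatically when the function returns.
      ret i32 %v
    }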
9753 '``load``' Instruction
9754 ^^^^^^^^^^^^^^^^^^^^^^
9761 <result> = load [volatile] <ty>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<nontemp_node>][, !invariant.load !<empty_node>][, !invariant.group !<empty_node>][, !nonnull !<empty_node>][, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>][, !align !<align_node>][, !noundef !<empty_node>]
9762 <result> = load atomic [volatile] <ty>, <ty>* <pointer> [syncscope("<target-scope>")] <ordering>, align <alignment> [, !invariant.group !<empty_node>]
!<nontemp_node> = !{ i32 1 }
!<empty_node> = !{}
9765 !<deref_bytes_node> = !{ i64 <dereferenceable_bytes> }
9766 !<align_node> = !{ i64 <value_alignment> }
9771 The '``load``' instruction is used to read from memory.
9776 The argument to the ``load`` instruction specifies the memory address from which
9777 to load. The type specified must be a :ref:`first class <t_firstclass>` type of
9778 known size (i.e. not containing an :ref:`opaque structural type <t_opaque>`). If
9779 the ``load`` is marked as ``volatile``, then the optimizer is not allowed to
9780 modify the number or order of execution of this ``load`` with other
9781 :ref:`volatile operations <volatile>`.
9783 If the ``load`` is marked as ``atomic``, it takes an extra :ref:`ordering
9784 <ordering>` and optional ``syncscope("<target-scope>")`` argument. The
9785 ``release`` and ``acq_rel`` orderings are not valid on ``load`` instructions.
9786 Atomic loads produce :ref:`defined <memmodel>` results when they may see
9787 multiple atomic stores. The type of the pointee must be an integer, pointer, or
9788 floating-point type whose bit width is a power of two greater than or equal to
9789 eight and less than or equal to a target-specific size limit. ``align`` must be
9790 explicitly specified on atomic loads, and the load has undefined behavior if the
9791 alignment is not set to a value which is at least the size in bytes of the
9792 pointee. ``!nontemporal`` does not have any defined semantics for atomic loads.
9794 The optional constant ``align`` argument specifies the alignment of the
9795 operation (that is, the alignment of the memory address). A value of 0
9796 or an omitted ``align`` argument means that the operation has the ABI
9797 alignment for the target. It is the responsibility of the code emitter
9798 to ensure that the alignment information is correct. Overestimating the
9799 alignment results in undefined behavior. Underestimating the alignment
9800 may produce less efficient code. An alignment of 1 is always safe. The
9801 maximum possible alignment is ``1 << 29``. An alignment value higher
9802 than the size of the loaded type implies memory up to the alignment
9803 value bytes can be safely loaded without trapping in the default
9804 address space. Access of the high bytes can interfere with debugging
9805 tools, so should not be accessed if the function has the
9806 ``sanitize_thread`` or ``sanitize_address`` attributes.
9808 The optional ``!nontemporal`` metadata must reference a single
9809 metadata name ``<nontemp_node>`` corresponding to a metadata node with one
9810 ``i32`` entry of value 1. The existence of the ``!nontemporal``
9811 metadata on the instruction tells the optimizer and code generator
9812 that this load is not expected to be reused in the cache. The code
9813 generator may select special instructions to save cache bandwidth, such
9814 as the ``MOVNT`` instruction on x86.
9816 The optional ``!invariant.load`` metadata must reference a single
9817 metadata name ``<empty_node>`` corresponding to a metadata node with no
9818 entries. If a load instruction tagged with the ``!invariant.load``
9819 metadata is executed, the memory location referenced by the load has
9820 to contain the same value at all points in the program where the
memory location is dereferenceable; otherwise, the behavior is
undefined.
9824 The optional ``!invariant.group`` metadata must reference a single metadata name
9825 ``<empty_node>`` corresponding to a metadata node with no entries.
9826 See ``invariant.group`` metadata :ref:`invariant.group <md_invariant.group>`.
9828 The optional ``!nonnull`` metadata must reference a single
9829 metadata name ``<empty_node>`` corresponding to a metadata node with no
9830 entries. The existence of the ``!nonnull`` metadata on the
9831 instruction tells the optimizer that the value loaded is known to
9832 never be null. If the value is null at runtime, the behavior is undefined.
9833 This is analogous to the ``nonnull`` attribute on parameters and return
9834 values. This metadata can only be applied to loads of a pointer type.
The optional ``!dereferenceable`` metadata must reference a single metadata
name ``<deref_bytes_node>`` corresponding to a metadata node with one ``i64``
entry.
See ``dereferenceable`` metadata :ref:`dereferenceable <md_dereferenceable>`.

The optional ``!dereferenceable_or_null`` metadata must reference a single
metadata name ``<deref_bytes_node>`` corresponding to a metadata node with one
``i64`` entry.
See ``dereferenceable_or_null`` metadata :ref:`dereferenceable_or_null
<md_dereferenceable_or_null>`.
9847 The optional ``!align`` metadata must reference a single metadata name
9848 ``<align_node>`` corresponding to a metadata node with one ``i64`` entry.
9849 The existence of the ``!align`` metadata on the instruction tells the
9850 optimizer that the value loaded is known to be aligned to a boundary specified
9851 by the integer value in the metadata node. The alignment must be a power of 2.
This is analogous to the ``align`` attribute on parameters and return values.
9853 This metadata can only be applied to loads of a pointer type. If the returned
9854 value is not appropriately aligned at runtime, the behavior is undefined.
9856 The optional ``!noundef`` metadata must reference a single metadata name
9857 ``<empty_node>`` corresponding to a node with no entries. The existence of
9858 ``!noundef`` metadata on the instruction tells the optimizer that the value
9859 loaded is known to be :ref:`well defined <welldefinedvalues>`.
9860 If the value isn't well defined, the behavior is undefined.
9865 The location of memory pointed to is loaded. If the value being loaded
9866 is of scalar type then the number of bytes read does not exceed the
9867 minimum number of bytes needed to hold all bits of the type. For
9868 example, loading an ``i24`` reads at most three bytes. When loading a
9869 value of a type like ``i20`` with a size that is not an integral number
9870 of bytes, the result is undefined if the value was not originally
9871 written using a store of the same type.
9872 If the value being loaded is of aggregate type, the bytes that correspond to
9873 padding may be accessed but are ignored, because it is impossible to observe
9874 padding from the loaded aggregate value.
9875 If ``<pointer>`` is not a well-defined value, the behavior is undefined.
9880 .. code-block:: llvm
9882 %ptr = alloca i32 ; yields i32*:ptr
9883 store i32 3, i32* %ptr ; yields void
9884 %val = load i32, i32* %ptr ; yields i32:val = i32 3
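The atomic form and the metadata operands described above can be sketched as
follows. This is a minimal illustration, not a canonical pattern: the function
and value names are hypothetical, and the ``align 8`` on the pointer load
assumes a target with 8-byte-aligned pointer slots.

.. code-block:: llvm

    define i32* @read_slot(i32** %slot, i32* %flag) {
    entry:
      ; Atomic load: an ordering and an explicit alignment are required.
      %f = load atomic i32, i32* %flag acquire, align 4
      ; Non-atomic load of a pointer that is asserted to be non-null and
      ; 4-byte aligned. If either assertion is false at run time, the
      ; behavior is undefined.
      %p = load i32*, i32** %slot, align 8, !nonnull !0, !align !1
      ret i32* %p
    }

    !0 = !{}
    !1 = !{ i64 4 }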
9888 '``store``' Instruction
9889 ^^^^^^^^^^^^^^^^^^^^^^^
9896 store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<nontemp_node>][, !invariant.group !<empty_node>] ; yields void
9897 store atomic [volatile] <ty> <value>, <ty>* <pointer> [syncscope("<target-scope>")] <ordering>, align <alignment> [, !invariant.group !<empty_node>] ; yields void
!<nontemp_node> = !{ i32 1 }
!<empty_node> = !{}
9904 The '``store``' instruction is used to write to memory.
9909 There are two arguments to the ``store`` instruction: a value to store and an
9910 address at which to store it. The type of the ``<pointer>`` operand must be a
9911 pointer to the :ref:`first class <t_firstclass>` type of the ``<value>``
9912 operand. If the ``store`` is marked as ``volatile``, then the optimizer is not
9913 allowed to modify the number or order of execution of this ``store`` with other
9914 :ref:`volatile operations <volatile>`. Only values of :ref:`first class
9915 <t_firstclass>` types of known size (i.e. not containing an :ref:`opaque
9916 structural type <t_opaque>`) can be stored.
9918 If the ``store`` is marked as ``atomic``, it takes an extra :ref:`ordering
9919 <ordering>` and optional ``syncscope("<target-scope>")`` argument. The
9920 ``acquire`` and ``acq_rel`` orderings aren't valid on ``store`` instructions.
9921 Atomic loads produce :ref:`defined <memmodel>` results when they may see
9922 multiple atomic stores. The type of the pointee must be an integer, pointer, or
9923 floating-point type whose bit width is a power of two greater than or equal to
9924 eight and less than or equal to a target-specific size limit. ``align`` must be
9925 explicitly specified on atomic stores, and the store has undefined behavior if
9926 the alignment is not set to a value which is at least the size in bytes of the
9927 pointee. ``!nontemporal`` does not have any defined semantics for atomic stores.
9929 The optional constant ``align`` argument specifies the alignment of the
9930 operation (that is, the alignment of the memory address). A value of 0
9931 or an omitted ``align`` argument means that the operation has the ABI
9932 alignment for the target. It is the responsibility of the code emitter
9933 to ensure that the alignment information is correct. Overestimating the
9934 alignment results in undefined behavior. Underestimating the
9935 alignment may produce less efficient code. An alignment of 1 is always
9936 safe. The maximum possible alignment is ``1 << 29``. An alignment
9937 value higher than the size of the stored type implies memory up to the
9938 alignment value bytes can be stored to without trapping in the default
9939 address space. Storing to the higher bytes however may result in data
9940 races if another thread can access the same address. Introducing a
9941 data race is not allowed. Storing to the extra bytes is not allowed
9942 even in situations where a data race is known to not exist if the
9943 function has the ``sanitize_address`` attribute.
The optional ``!nontemporal`` metadata must reference a single metadata
name ``<nontemp_node>`` corresponding to a metadata node with one ``i32`` entry
of value 1. The existence of the ``!nontemporal`` metadata on the instruction
tells the optimizer and code generator that this store is not expected to
be reused in the cache. The code generator may select special
instructions to save cache bandwidth, such as the ``MOVNT`` instruction on
x86.
9953 The optional ``!invariant.group`` metadata must reference a
9954 single metadata name ``<empty_node>``. See ``invariant.group`` metadata.
9959 The contents of memory are updated to contain ``<value>`` at the
9960 location specified by the ``<pointer>`` operand. If ``<value>`` is
9961 of scalar type then the number of bytes written does not exceed the
9962 minimum number of bytes needed to hold all bits of the type. For
9963 example, storing an ``i24`` writes at most three bytes. When writing a
9964 value of a type like ``i20`` with a size that is not an integral number
9965 of bytes, it is unspecified what happens to the extra bits that do not
9966 belong to the type, but they will typically be overwritten.
9967 If ``<value>`` is of aggregate type, padding is filled with
9968 :ref:`undef <undefvalues>`.
9969 If ``<pointer>`` is not a well-defined value, the behavior is undefined.
9974 .. code-block:: llvm
9976 %ptr = alloca i32 ; yields i32*:ptr
9977 store i32 3, i32* %ptr ; yields void
9978 %val = load i32, i32* %ptr ; yields i32:val = i32 3
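A short sketch of the other store forms described above, using hypothetical
names (``@publish``, ``%data``, ``%ready``, ``%mmio``). It shows a plain
store, an atomic store with ``release`` ordering and the mandatory explicit
alignment, and a ``volatile`` store:

.. code-block:: llvm

    define void @publish(i32* %data, i32* %ready, i32* %mmio) {
    entry:
      ; Ordinary store with an explicit alignment.
      store i32 42, i32* %data, align 4
      ; Atomic store: ``acquire`` and ``acq_rel`` would be rejected here.
      store atomic i32 1, i32* %ready release, align 4
      ; Volatile store: not reordered with other volatile operations.
      store volatile i32 0, i32* %mmio, align 4
      ret void
    }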
9982 '``fence``' Instruction
9983 ^^^^^^^^^^^^^^^^^^^^^^^
9990 fence [syncscope("<target-scope>")] <ordering> ; yields void
The '``fence``' instruction is used to introduce happens-before edges
between operations.
10001 '``fence``' instructions take an :ref:`ordering <ordering>` argument which
10002 defines what *synchronizes-with* edges they add. They can only be given
10003 ``acquire``, ``release``, ``acq_rel``, and ``seq_cst`` orderings.
10008 A fence A which has (at least) ``release`` ordering semantics
10009 *synchronizes with* a fence B with (at least) ``acquire`` ordering
10010 semantics if and only if there exist atomic operations X and Y, both
10011 operating on some atomic object M, such that A is sequenced before X, X
10012 modifies M (either directly or through some side effect of a sequence
10013 headed by X), Y is sequenced before B, and Y observes M. This provides a
10014 *happens-before* dependency between A and B. Rather than an explicit
10015 ``fence``, one (but not both) of the atomic operations X or Y might
10016 provide a ``release`` or ``acquire`` (resp.) ordering constraint and
10017 still *synchronize-with* the explicit ``fence`` and establish the
10018 *happens-before* edge.
10020 A ``fence`` which has ``seq_cst`` ordering, in addition to having both
10021 ``acquire`` and ``release`` semantics specified above, participates in
10022 the global program order of other ``seq_cst`` operations and/or fences.
10024 A ``fence`` instruction can also take an optional
10025 ":ref:`syncscope <syncscope>`" argument.
10030 .. code-block:: text
10032 fence acquire ; yields void
10033 fence syncscope("singlethread") seq_cst ; yields void
10034 fence syncscope("agent") seq_cst ; yields void
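The fence-to-fence rule above can be illustrated with a producer/consumer
pair. This is only a sketch under assumed names (``@producer``,
``@consumer``, ``%data``, ``%flag``); the comments map the instructions to
the A, B, X, Y and M of the description:

.. code-block:: llvm

    define void @producer(i32* %data, i32* %flag) {
    entry:
      store i32 42, i32* %data, align 4
      fence release                                        ; fence A
      store atomic i32 1, i32* %flag monotonic, align 4    ; X modifies M
      ret void
    }

    define i32 @consumer(i32* %data, i32* %flag) {
    entry:
      br label %spin

    spin:
      %f = load atomic i32, i32* %flag monotonic, align 4  ; Y observes M
      %ready = icmp eq i32 %f, 1
      br i1 %ready, label %done, label %spin

    done:
      fence acquire                                        ; fence B
      ; A synchronizes with B, so the store of 42 happens before this load.
      %v = load i32, i32* %data, align 4
      ret i32 %v
    }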
10038 '``cmpxchg``' Instruction
10039 ^^^^^^^^^^^^^^^^^^^^^^^^^
10046 cmpxchg [weak] [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [syncscope("<target-scope>")] <success ordering> <failure ordering>[, align <alignment>] ; yields { ty, i1 }
10051 The '``cmpxchg``' instruction is used to atomically modify memory. It
10052 loads a value in memory and compares it to a given value. If they are
10053 equal, it tries to store a new value into the memory.
10058 There are three arguments to the '``cmpxchg``' instruction: an address
to operate on, a value to compare to the value currently at that
10060 address, and a new value to place at that address if the compared values
10061 are equal. The type of '<cmp>' must be an integer or pointer type whose
10062 bit width is a power of two greater than or equal to eight and less
10063 than or equal to a target-specific size limit. '<cmp>' and '<new>' must
10064 have the same type, and the type of '<pointer>' must be a pointer to
10065 that type. If the ``cmpxchg`` is marked as ``volatile``, then the
10066 optimizer is not allowed to modify the number or order of execution of
10067 this ``cmpxchg`` with other :ref:`volatile operations <volatile>`.
10069 The success and failure :ref:`ordering <ordering>` arguments specify how this
10070 ``cmpxchg`` synchronizes with other atomic operations. Both ordering parameters
10071 must be at least ``monotonic``, the failure ordering cannot be either
10072 ``release`` or ``acq_rel``.
10074 A ``cmpxchg`` instruction can also take an optional
10075 ":ref:`syncscope <syncscope>`" argument.
The instruction can take an optional ``align`` attribute.
The alignment must be a power of two greater than or equal to the size of the
``<value>`` type. If unspecified, the alignment is assumed to be equal to the
size of the ``<value>`` type. Note that this default alignment assumption is
different from the alignment used for the load/store instructions when align
isn't specified.
10084 The pointer passed into cmpxchg must have alignment greater than or
10085 equal to the size in memory of the operand.
10090 The contents of memory at the location specified by the '``<pointer>``' operand
10091 is read and compared to '``<cmp>``'; if the values are equal, '``<new>``' is
10092 written to the location. The original value at the location is returned,
10093 together with a flag indicating success (true) or failure (false).
If the cmpxchg operation is marked as ``weak`` then a spurious failure is
permitted: the operation may not write ``<new>`` even if the comparison
matched.
10099 If the cmpxchg operation is strong (the default), the i1 value is 1 if and only
10100 if the value loaded equals ``cmp``.
10102 A successful ``cmpxchg`` is a read-modify-write instruction for the purpose of
10103 identifying release sequences. A failed ``cmpxchg`` is equivalent to an atomic
load with an ordering parameter determined by the second ordering parameter.
.. code-block:: llvm

    entry:
      %orig = load atomic i32, i32* %ptr unordered, align 4                      ; yields i32
      br label %loop

    loop:
      %cmp = phi i32 [ %orig, %entry ], [ %value_loaded, %loop ]
      %squared = mul i32 %cmp, %cmp
      %val_success = cmpxchg i32* %ptr, i32 %cmp, i32 %squared acq_rel monotonic ; yields { i32, i1 }
      %value_loaded = extractvalue { i32, i1 } %val_success, 0
      %success = extractvalue { i32, i1 } %val_success, 1
      br i1 %success, label %done, label %loop

    done:
      ...
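Because a ``weak`` ``cmpxchg`` may fail spuriously, it is normally placed in
a retry loop. The following sketch uses a hypothetical helper,
``@atomic_max``, that raises the stored value to at least ``%v``; the same
effect is available directly through ``atomicrmw max``, so this only
illustrates the loop shape:

.. code-block:: llvm

    define void @atomic_max(i32* %ptr, i32 %v) {
    entry:
      %init = load atomic i32, i32* %ptr monotonic, align 4
      br label %loop

    loop:
      %old = phi i32 [ %init, %entry ], [ %seen, %try ]
      ; Nothing to do once the stored value is already >= %v (signed).
      %big_enough = icmp sge i32 %old, %v
      br i1 %big_enough, label %done, label %try

    try:
      ; On failure, spurious or not, retry with the value that was observed.
      %pair = cmpxchg weak i32* %ptr, i32 %old, i32 %v seq_cst monotonic
      %seen = extractvalue { i32, i1 } %pair, 0
      %ok = extractvalue { i32, i1 } %pair, 1
      br i1 %ok, label %done, label %loop

    done:
      ret void
    }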
10128 '``atomicrmw``' Instruction
10129 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
10136 atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [syncscope("<target-scope>")] <ordering>[, align <alignment>] ; yields ty
10141 The '``atomicrmw``' instruction is used to atomically modify memory.
There are three arguments to the '``atomicrmw``' instruction: an
operation to apply, an address whose value to modify, and an argument to the
operation. The operation must be one of the following keywords:

- xchg
- add
- sub
- and
- nand
- or
- xor
- max
- min
- umax
- umin
- fadd
- fsub
10164 For most of these operations, the type of '<value>' must be an integer
10165 type whose bit width is a power of two greater than or equal to eight
10166 and less than or equal to a target-specific size limit. For xchg, this
10167 may also be a floating point type with the same size constraints as
10168 integers. For fadd/fsub, this must be a floating point type. The
10169 type of the '``<pointer>``' operand must be a pointer to that type. If
10170 the ``atomicrmw`` is marked as ``volatile``, then the optimizer is not
10171 allowed to modify the number or order of execution of this
10172 ``atomicrmw`` with other :ref:`volatile operations <volatile>`.
The instruction can take an optional ``align`` attribute.
The alignment must be a power of two greater than or equal to the size of the
``<value>`` type. If unspecified, the alignment is assumed to be equal to the
size of the ``<value>`` type. Note that this default alignment assumption is
different from the alignment used for the load/store instructions when align
isn't specified.
An ``atomicrmw`` instruction can also take an optional
10182 ":ref:`syncscope <syncscope>`" argument.
10187 The contents of memory at the location specified by the '``<pointer>``'
10188 operand are atomically read, modified, and written back. The original
10189 value at the location is returned. The modification is specified by the
10190 operation argument:
10192 - xchg: ``*ptr = val``
10193 - add: ``*ptr = *ptr + val``
10194 - sub: ``*ptr = *ptr - val``
10195 - and: ``*ptr = *ptr & val``
10196 - nand: ``*ptr = ~(*ptr & val)``
10197 - or: ``*ptr = *ptr | val``
10198 - xor: ``*ptr = *ptr ^ val``
10199 - max: ``*ptr = *ptr > val ? *ptr : val`` (using a signed comparison)
10200 - min: ``*ptr = *ptr < val ? *ptr : val`` (using a signed comparison)
10201 - umax: ``*ptr = *ptr > val ? *ptr : val`` (using an unsigned comparison)
10202 - umin: ``*ptr = *ptr < val ? *ptr : val`` (using an unsigned comparison)
10203 - fadd: ``*ptr = *ptr + val`` (using floating point arithmetic)
10204 - fsub: ``*ptr = *ptr - val`` (using floating point arithmetic)
10209 .. code-block:: llvm
10211 %old = atomicrmw add i32* %ptr, i32 1 acquire ; yields i32
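A few more forms, sketched with hypothetical names (``@rmw_examples``,
``%counter``, ``%high_water``, ``%total``). Each instruction returns the value
that was in memory before the operation was applied:

.. code-block:: llvm

    define i32 @rmw_examples(i32* %counter, i32* %high_water, float* %total) {
    entry:
      ; Fetch-and-add: %ticket is the value before the increment.
      %ticket = atomicrmw add i32* %counter, i32 1 seq_cst
      ; Unsigned maximum: *%high_water = umax(*%high_water, %ticket).
      %prev_high = atomicrmw umax i32* %high_water, i32 %ticket monotonic
      ; Floating-point addition requires a floating-point operand type.
      %prev_total = atomicrmw fadd float* %total, float 1.0 acq_rel
      ; Volatile exchange with an explicit alignment.
      %prev_count = atomicrmw volatile xchg i32* %counter, i32 0 release, align 4
      ret i32 %prev_count
    }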
10213 .. _i_getelementptr:
10215 '``getelementptr``' Instruction
10216 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10223 <result> = getelementptr <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*
10224 <result> = getelementptr inbounds <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*
10225 <result> = getelementptr <ty>, <ptr vector> <ptrval>, [inrange] <vector index type> <idx>
10230 The '``getelementptr``' instruction is used to get the address of a
10231 subelement of an :ref:`aggregate <t_aggregate>` data structure. It performs
10232 address calculation only and does not access memory. The instruction can also
10233 be used to calculate a vector of such addresses.
10238 The first argument is always a type used as the basis for the calculations.
10239 The second argument is always a pointer or a vector of pointers, and is the
10240 base address to start from. The remaining arguments are indices
10241 that indicate which of the elements of the aggregate object are indexed.
10242 The interpretation of each index is dependent on the type being indexed
10243 into. The first index always indexes the pointer value given as the
10244 second argument, the second index indexes a value of the type pointed to
10245 (not necessarily the value directly pointed to, since the first index
10246 can be non-zero), etc. The first type indexed into must be a pointer
10247 value, subsequent types can be arrays, vectors, and structs. Note that
10248 subsequent types being indexed into can never be pointers, since that
10249 would require loading the pointer before continuing calculation.
10251 The type of each index argument depends on the type it is indexing into.
10252 When indexing into a (optionally packed) structure, only ``i32`` integer
10253 **constants** are allowed (when using a vector of indices they must all
10254 be the **same** ``i32`` integer constant). When indexing into an array,
10255 pointer or vector, integers of any width are allowed, and they are not
required to be constant. These integers are treated as signed values
where relevant.

For example, let's consider a C code fragment and how it gets compiled
to LLVM:

.. code-block:: c

    /* struct ST and struct RT correspond to the LLVM types shown below. */
    int *foo(struct ST *s) {
      return &s[1].Z.B[5][13];
    }
10279 The LLVM code generated by Clang is:
.. code-block:: llvm

    %struct.RT = type { i8, [10 x [20 x i32]], i8 }
    %struct.ST = type { i32, double, %struct.RT }

    define i32* @foo(%struct.ST* %s) nounwind uwtable readnone optsize ssp {
    entry:
      %arrayidx = getelementptr inbounds %struct.ST, %struct.ST* %s, i64 1, i32 2, i32 1, i64 5, i64 13
      ret i32* %arrayidx
    }
10295 In the example above, the first index is indexing into the
10296 '``%struct.ST*``' type, which is a pointer, yielding a '``%struct.ST``'
10297 = '``{ i32, double, %struct.RT }``' type, a structure. The second index
10298 indexes into the third element of the structure, yielding a
10299 '``%struct.RT``' = '``{ i8 , [10 x [20 x i32]], i8 }``' type, another
10300 structure. The third index indexes into the second element of the
10301 structure, yielding a '``[10 x [20 x i32]]``' type, an array. The two
10302 dimensions of the array are subscripted into, yielding an '``i32``'
10303 type. The '``getelementptr``' instruction returns a pointer to this
10304 element, thus computing a value of '``i32*``' type.
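To make the address arithmetic concrete, the walk above can be worked out as
a byte offset. The numbers below are an assumption, not part of this
reference: they hold for a typical x86-64 data layout (4-byte ``i32``,
8-byte-aligned ``double``) and differ on other targets. The function name
``@foo_bytes`` is hypothetical.

.. code-block:: llvm

    %struct.RT = type { i8, [10 x [20 x i32]], i8 }
    %struct.ST = type { i32, double, %struct.RT }

    ; Assumed layout:
    ;   %struct.RT: i8 at 0, [10 x [20 x i32]] at 4, i8 at 804, size 808
    ;   %struct.ST: i32 at 0, double at 8, %struct.RT at 16, size 824
    ;
    ;   i64 1   ->  1 * 824 = 824   (step over one %struct.ST)
    ;   i32 2   ->  16              (third field of %struct.ST)
    ;   i32 1   ->  4               (second field of %struct.RT)
    ;   i64 5   ->  5 * 80  = 400   (one [20 x i32] row is 80 bytes)
    ;   i64 13  ->  13 * 4  = 52
    ;   total   =   1296 bytes
    define i32* @foo_bytes(%struct.ST* %s) {
    entry:
      %raw = bitcast %struct.ST* %s to i8*
      %addr = getelementptr inbounds i8, i8* %raw, i64 1296
      %res = bitcast i8* %addr to i32*
      ret i32* %res
    }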
10306 Note that it is perfectly legal to index partially through a structure,
10307 returning a pointer to an inner element. Because of this, the LLVM code
10308 for the given testcase is equivalent to:
.. code-block:: llvm

    define i32* @foo(%struct.ST* %s) {
      %t1 = getelementptr %struct.ST, %struct.ST* %s, i32 1                        ; yields %struct.ST*:%t1
      %t2 = getelementptr %struct.ST, %struct.ST* %t1, i32 0, i32 2                ; yields %struct.RT*:%t2
      %t3 = getelementptr %struct.RT, %struct.RT* %t2, i32 0, i32 1                ; yields [10 x [20 x i32]]*:%t3
      %t4 = getelementptr [10 x [20 x i32]], [10 x [20 x i32]]* %t3, i32 0, i32 5  ; yields [20 x i32]*:%t4
      %t5 = getelementptr [20 x i32], [20 x i32]* %t4, i32 0, i32 13               ; yields i32*:%t5
      ret i32* %t5
    }
10321 If the ``inbounds`` keyword is present, the result value of the
10322 ``getelementptr`` is a :ref:`poison value <poisonvalues>` if one of the
10323 following rules is violated:
10325 * The base pointer has an *in bounds* address of an allocated object, which
10326 means that it points into an allocated object, or to its end. The only
10327 *in bounds* address for a null pointer in the default address-space is the
10328 null pointer itself.
10329 * If the type of an index is larger than the pointer index type, the
10330 truncation to the pointer index type preserves the signed value.
10331 * The multiplication of an index by the type size does not wrap the pointer
10332 index type in a signed sense (``nsw``).
10333 * The successive addition of offsets (without adding the base address) does
10334 not wrap the pointer index type in a signed sense (``nsw``).
10335 * The successive addition of the current address, interpreted as an unsigned
10336 number, and an offset, interpreted as a signed number, does not wrap the
10337 unsigned address space and remains *in bounds* of the allocated object.
10338 As a corollary, if the added offset is non-negative, the addition does not
10339 wrap in an unsigned sense (``nuw``).
10340 * In cases where the base is a vector of pointers, the ``inbounds`` keyword
10341 applies to each of the computations element-wise.
10343 These rules are based on the assumption that no allocated object may cross
10344 the unsigned address space boundary, and no allocated object may be larger
10345 than half the pointer index type space.
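As a small sketch of the *in bounds* rules above (the global ``@arr`` and
the function names are hypothetical), an address one past the end of an
object is still in bounds, while anything further is not:

.. code-block:: llvm

    @arr = global [4 x i32] zeroinitializer

    define i32* @one_past_end() {
    entry:
      ; Index 4 names the end of the 4-element array: in bounds, not poison.
      %end = getelementptr inbounds [4 x i32], [4 x i32]* @arr, i64 0, i64 4
      ret i32* %end
    }

    define i32* @beyond_end() {
    entry:
      ; Index 5 lies past the end of the object, so the result is poison.
      %bad = getelementptr inbounds [4 x i32], [4 x i32]* @arr, i64 0, i64 5
      ret i32* %bad
    }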
10347 If the ``inbounds`` keyword is not present, the offsets are added to the
10348 base address with silently-wrapping two's complement arithmetic. If the
10349 offsets have a different width from the pointer, they are sign-extended
10350 or truncated to the width of the pointer. The result value of the
10351 ``getelementptr`` may be outside the object pointed to by the base
10352 pointer. The result value may not necessarily be used to access memory
10353 though, even if it happens to point into allocated storage. See the
:ref:`Pointer Aliasing Rules <pointeraliasing>` section for more
information.
10357 If the ``inrange`` keyword is present before any index, loading from or
10358 storing to any pointer derived from the ``getelementptr`` has undefined
10359 behavior if the load or store would access memory outside of the bounds of
10360 the element selected by the index marked as ``inrange``. The result of a
10361 pointer comparison or ``ptrtoint`` (including ``ptrtoint``-like operations
10362 involving memory) involving a pointer derived from a ``getelementptr`` with
10363 the ``inrange`` keyword is undefined, with the exception of comparisons
10364 in the case where both operands are in the range of the element selected
10365 by the ``inrange`` keyword, inclusive of the address one past the end of
10366 that element. Note that the ``inrange`` keyword is currently only allowed
10367 in constant ``getelementptr`` expressions.
10369 The getelementptr instruction is often confusing. For some more insight
10370 into how it works, see :doc:`the getelementptr FAQ <GetElementPtr>`.
.. code-block:: llvm

    ; yields [12 x i8]*:aptr
    %aptr = getelementptr {i32, [12 x i8]}, {i32, [12 x i8]}* %saptr, i64 0, i32 1
    ; yields i8*:vptr
    %vptr = getelementptr {i32, <2 x i8>}, {i32, <2 x i8>}* %svptr, i64 0, i32 1, i32 1
    ; yields i8*:eptr
    %eptr = getelementptr [12 x i8], [12 x i8]* %aptr, i64 0, i32 1
    ; yields i32*:iptr
    %iptr = getelementptr [10 x i32], [10 x i32]* @arr, i16 0, i16 0
10386 Vector of pointers:
10387 """""""""""""""""""
10389 The ``getelementptr`` returns a vector of pointers, instead of a single address,
10390 when one or more of its arguments is a vector. In such cases, all vector
10391 arguments should have the same number of elements, and every scalar argument
10392 will be effectively broadcast into a vector during address calculation.
10394 .. code-block:: llvm
10396 ; All arguments are vectors:
10397 ; A[i] = ptrs[i] + offsets[i]*sizeof(i8)
10398 %A = getelementptr i8, <4 x i8*> %ptrs, <4 x i64> %offsets
10400 ; Add the same scalar offset to each pointer of a vector:
10401 ; A[i] = ptrs[i] + offset*sizeof(i8)
10402 %A = getelementptr i8, <4 x i8*> %ptrs, i64 %offset
10404 ; Add distinct offsets to the same pointer:
10405 ; A[i] = ptr + offsets[i]*sizeof(i8)
10406 %A = getelementptr i8, i8* %ptr, <4 x i64> %offsets
10408 ; In all cases described above the type of the result is <4 x i8*>
10410 The two following instructions are equivalent:
.. code-block:: llvm

    getelementptr %struct.ST, <4 x %struct.ST*> %s, <4 x i64> %ind1,
      <4 x i32> <i32 2, i32 2, i32 2, i32 2>,
      <4 x i32> <i32 1, i32 1, i32 1, i32 1>,
      <4 x i32> %ind4,
      <4 x i64> <i64 13, i64 13, i64 13, i64 13>

    getelementptr %struct.ST, <4 x %struct.ST*> %s, <4 x i64> %ind1,
      i32 2, i32 1, <4 x i32> %ind4, i64 13
Let's look at the C code, where the vector version of ``getelementptr``
makes sense:

.. code-block:: c

    // Let's assume that we vectorize the following loop:
    double *A, *B; int *C;
    for (int i = 0; i < size; ++i) {
      A[i] = B[C[i]];
    }
10434 .. code-block:: llvm
10436 ; get pointers for 8 elements from array B
10437 %ptrs = getelementptr double, double* %B, <8 x i32> %C
10438 ; load 8 elements from array B into A
10439 %A = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x double*> %ptrs,
10440 i32 8, <8 x i1> %mask, <8 x double> %passthru)
10442 Conversion Operations
10443 ---------------------
10445 The instructions in this category are the conversion instructions
10446 (casting) which all take a single operand and a type. They perform
10447 various bit conversions on the operand.
10451 '``trunc .. to``' Instruction
10452 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10459 <result> = trunc <ty> <value> to <ty2> ; yields ty2
10464 The '``trunc``' instruction truncates its operand to the type ``ty2``.
10469 The '``trunc``' instruction takes a value to trunc, and a type to trunc
10470 it to. Both types must be of :ref:`integer <t_integer>` types, or vectors
10471 of the same number of integers. The bit size of the ``value`` must be
10472 larger than the bit size of the destination type, ``ty2``. Equal sized
10473 types are not allowed.
10478 The '``trunc``' instruction truncates the high order bits in ``value``
10479 and converts the remaining bits to ``ty2``. Since the source size must
10480 be larger than the destination size, ``trunc`` cannot be a *no-op cast*.
10481 It will always truncate bits.
10486 .. code-block:: llvm
10488 %X = trunc i32 257 to i8 ; yields i8:1
10489 %Y = trunc i32 123 to i1 ; yields i1:true
10490 %Z = trunc i32 122 to i1 ; yields i1:false
10491 %W = trunc <2 x i16> <i16 8, i16 7> to <2 x i8> ; yields <i8 8, i8 7>
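Since ``trunc`` simply keeps the low-order bits, truncating and then zero
extending back is the same as masking. A minimal sketch with a hypothetical
function name:

.. code-block:: llvm

    define i1 @trunc_keeps_low_bits(i32 %x) {
    entry:
      %lo = trunc i32 %x to i8        ; low 8 bits of %x
      %back = zext i8 %lo to i32      ; widen again with zero high bits
      %masked = and i32 %x, 255       ; same low 8 bits, selected by a mask
      %same = icmp eq i32 %back, %masked
      ret i1 %same                    ; true for every %x
    }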
10495 '``zext .. to``' Instruction
10496 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10503 <result> = zext <ty> <value> to <ty2> ; yields ty2
10508 The '``zext``' instruction zero extends its operand to type ``ty2``.
The '``zext``' instruction takes a value to cast, and a type to cast it
to. Both types must be :ref:`integer <t_integer>` types, or vectors of
10515 the same number of integers. The bit size of the ``value`` must be
10516 smaller than the bit size of the destination type, ``ty2``.
The '``zext``' instruction fills the high order bits of the ``value`` with
zero bits until it reaches the size of the destination type, ``ty2``.
10524 When zero extending from i1, the result will always be either 0 or 1.
10529 .. code-block:: llvm
10531 %X = zext i32 257 to i64 ; yields i64:257
10532 %Y = zext i1 true to i32 ; yields i32:1
10533 %Z = zext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7>
10537 '``sext .. to``' Instruction
10538 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10545 <result> = sext <ty> <value> to <ty2> ; yields ty2
The '``sext``' instruction sign extends ``value`` to the type ``ty2``.
The '``sext``' instruction takes a value to cast, and a type to cast it
to. Both types must be :ref:`integer <t_integer>` types, or vectors of
10557 the same number of integers. The bit size of the ``value`` must be
10558 smaller than the bit size of the destination type, ``ty2``.
10563 The '``sext``' instruction performs a sign extension by copying the sign
10564 bit (highest order bit) of the ``value`` until it reaches the bit size
10565 of the type ``ty2``.
10567 When sign extending from i1, the extension always results in -1 or 0.
10572 .. code-block:: llvm
%X = sext i8 -1 to i16 ; yields i16:-1 (bit pattern 0xFFFF, 65535 if read as unsigned)
%Y = sext i1 true to i32 ; yields i32:-1
%Z = sext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7>
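The difference between zero and sign extension is easiest to see on raw bit patterns. The following Python sketch models both; the function names and the unsigned-integer representation of bit patterns are assumptions made for illustration.

```python
def zext(value, src_bits):
    """Zero extension: source bits are kept, all new high bits are zero."""
    return value & ((1 << src_bits) - 1)


def sext(value, src_bits, dst_bits):
    """Sign extension: copy the source sign bit into every new high bit."""
    value &= (1 << src_bits) - 1
    if value >> (src_bits - 1):  # sign bit is set
        value |= ((1 << dst_bits) - 1) & ~((1 << src_bits) - 1)
    return value


def as_signed(value, bits):
    """Read a bit pattern as a two's complement signed integer."""
    return value - (1 << bits) if value >> (bits - 1) else value
```

Here ``sext(0xFF, 8, 16)`` produces the pattern ``0xFFFF``, which reads as -1 when interpreted as a signed ``i16``, while ``zext(0xFF, 8)`` stays 255.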
10578 '``fptrunc .. to``' Instruction
10579 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10586 <result> = fptrunc <ty> <value> to <ty2> ; yields ty2
10591 The '``fptrunc``' instruction truncates ``value`` to type ``ty2``.
10596 The '``fptrunc``' instruction takes a :ref:`floating-point <t_floating>`
10597 value to cast and a :ref:`floating-point <t_floating>` type to cast it to.
10598 The size of ``value`` must be larger than the size of ``ty2``. This
10599 implies that ``fptrunc`` cannot be used to make a *no-op cast*.
10604 The '``fptrunc``' instruction casts a ``value`` from a larger
10605 :ref:`floating-point <t_floating>` type to a smaller :ref:`floating-point
10606 <t_floating>` type.
10607 This instruction is assumed to execute in the default :ref:`floating-point
10608 environment <floatenv>`.
10613 .. code-block:: llvm
10615 %X = fptrunc double 16777217.0 to float ; yields float:16777216.0
10616 %Y = fptrunc double 1.0E+300 to half ; yields half:+infinity
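The rounding in the first example can be reproduced in Python by packing a double into IEEE single format and reading it back. This is a sketch that assumes round-to-nearest-even, the default environment; ``fptrunc_to_float`` is a hypothetical helper name. Python's ``struct`` raises ``OverflowError`` for out-of-range inputs instead of producing infinity, so the second example is not modeled here.

```python
import struct


def fptrunc_to_float(value):
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("<f", struct.pack("<f", value))[0]
```

The value 16777217 is 2^24 + 1, which needs 25 significant bits, so it cannot be held in the 24-bit significand of ``float``.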
10618 '``fpext .. to``' Instruction
10619 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10626 <result> = fpext <ty> <value> to <ty2> ; yields ty2
The '``fpext``' instruction extends a floating-point ``value`` to a larger
floating-point value.
10637 The '``fpext``' instruction takes a :ref:`floating-point <t_floating>`
10638 ``value`` to cast, and a :ref:`floating-point <t_floating>` type to cast it
10639 to. The source type must be smaller than the destination type.
10644 The '``fpext``' instruction extends the ``value`` from a smaller
10645 :ref:`floating-point <t_floating>` type to a larger :ref:`floating-point
10646 <t_floating>` type. The ``fpext`` cannot be used to make a
10647 *no-op cast* because it always changes bits. Use ``bitcast`` to make a
10648 *no-op cast* for a floating-point cast.
10653 .. code-block:: llvm
10655 %X = fpext float 3.125 to double ; yields double:3.125000e+00
10656 %Y = fpext double %X to fp128 ; yields fp128:0xL00000000000000004000900000000000
10658 '``fptoui .. to``' Instruction
10659 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10666 <result> = fptoui <ty> <value> to <ty2> ; yields ty2
10671 The '``fptoui``' converts a floating-point ``value`` to its unsigned
10672 integer equivalent of type ``ty2``.
The '``fptoui``' instruction takes a value to cast, which must be a
scalar or vector :ref:`floating-point <t_floating>` value, and a type to
cast it to, ``ty2``, which must be an :ref:`integer <t_integer>` type. If
``ty`` is a vector floating-point type, ``ty2`` must be a vector integer
type with the same number of elements as ``ty``.
10686 The '``fptoui``' instruction converts its :ref:`floating-point
10687 <t_floating>` operand into the nearest (rounding towards zero)
10688 unsigned integer value. If the value cannot fit in ``ty2``, the result
10689 is a :ref:`poison value <poisonvalues>`.
10694 .. code-block:: llvm
%X = fptoui double 123.0 to i32 ; yields i32:123
%Y = fptoui float 1.0E+300 to i1 ; yields poison: the value does not fit in i1
%Z = fptoui float 1.04E+17 to i8 ; yields poison: the value does not fit in i8
10700 '``fptosi .. to``' Instruction
10701 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10708 <result> = fptosi <ty> <value> to <ty2> ; yields ty2
10713 The '``fptosi``' instruction converts :ref:`floating-point <t_floating>`
10714 ``value`` to type ``ty2``.
The '``fptosi``' instruction takes a value to cast, which must be a
scalar or vector :ref:`floating-point <t_floating>` value, and a type to
cast it to, ``ty2``, which must be an :ref:`integer <t_integer>` type. If
``ty`` is a vector floating-point type, ``ty2`` must be a vector integer
type with the same number of elements as ``ty``.
10728 The '``fptosi``' instruction converts its :ref:`floating-point
10729 <t_floating>` operand into the nearest (rounding towards zero)
10730 signed integer value. If the value cannot fit in ``ty2``, the result
10731 is a :ref:`poison value <poisonvalues>`.
10736 .. code-block:: llvm
%X = fptosi double -123.0 to i32 ; yields i32:-123
%Y = fptosi float 1.0E-247 to i1 ; yields i1:0 (rounds towards zero)
%Z = fptosi float 1.04E+17 to i8 ; yields poison: the value does not fit in i8
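The rule shared by '``fptoui``' and '``fptosi``' (round towards zero, then check that the result fits) can be sketched in Python. This is a model, not an implementation: ``POISON`` is a hypothetical stand-in for a poison value, and only finite inputs are handled.

```python
import math

POISON = None  # hypothetical stand-in for an LLVM poison value


def fptosi(value, dst_bits):
    """Round towards zero; poison if outside the signed range of the type."""
    result = math.trunc(value)
    lo, hi = -(1 << (dst_bits - 1)), (1 << (dst_bits - 1)) - 1
    return result if lo <= result <= hi else POISON


def fptoui(value, dst_bits):
    """Round towards zero; poison if outside the unsigned range of the type."""
    result = math.trunc(value)
    return result if 0 <= result < (1 << dst_bits) else POISON
```

Because rounding happens before the range check, a small fraction such as ``-0.5`` converts to 0 under ``fptoui`` in this model rather than to poison.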
10742 '``uitofp .. to``' Instruction
10743 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10750 <result> = uitofp <ty> <value> to <ty2> ; yields ty2
10755 The '``uitofp``' instruction regards ``value`` as an unsigned integer
10756 and converts that value to the ``ty2`` type.
The '``uitofp``' instruction takes a value to cast, which must be a
scalar or vector :ref:`integer <t_integer>` value, and a type to cast it to,
``ty2``, which must be a :ref:`floating-point <t_floating>` type. If
``ty`` is a vector integer type, ``ty2`` must be a vector floating-point
type with the same number of elements as ``ty``.
10770 The '``uitofp``' instruction interprets its operand as an unsigned
10771 integer quantity and converts it to the corresponding floating-point
10772 value. If the value cannot be exactly represented, it is rounded using
10773 the default rounding mode.
10779 .. code-block:: llvm
10781 %X = uitofp i32 257 to float ; yields float:257.0
10782 %Y = uitofp i8 -1 to double ; yields double:255.0
10784 '``sitofp .. to``' Instruction
10785 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10792 <result> = sitofp <ty> <value> to <ty2> ; yields ty2
10797 The '``sitofp``' instruction regards ``value`` as a signed integer and
10798 converts that value to the ``ty2`` type.
The '``sitofp``' instruction takes a value to cast, which must be a
scalar or vector :ref:`integer <t_integer>` value, and a type to cast it to,
``ty2``, which must be a :ref:`floating-point <t_floating>` type. If
``ty`` is a vector integer type, ``ty2`` must be a vector floating-point
type with the same number of elements as ``ty``.
The '``sitofp``' instruction interprets its operand as a signed integer
quantity and converts it to the corresponding floating-point value. If the
value cannot be exactly represented, it is rounded using the default rounding
mode.
10820 .. code-block:: llvm
10822 %X = sitofp i32 257 to float ; yields float:257.0
10823 %Y = sitofp i8 -1 to double ; yields double:-1.0
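The only difference between '``uitofp``' and '``sitofp``' is how the same bit pattern is read before conversion. The Python sketch below assumes a ``double`` destination and round-to-nearest-even; the function names are hypothetical.

```python
def uitofp(bits, src_bits):
    """Read the pattern as unsigned, then convert to a double."""
    return float(bits & ((1 << src_bits) - 1))


def sitofp(bits, src_bits):
    """Read the pattern as two's complement, then convert to a double."""
    bits &= (1 << src_bits) - 1
    if bits >> (src_bits - 1):  # sign bit is set
        bits -= 1 << src_bits
    return float(bits)
```

The ``i8`` pattern ``0xFF`` becomes 255.0 under ``uitofp`` and -1.0 under ``sitofp``. An integer such as 2^53 + 1, which a double cannot hold exactly, is rounded.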
10827 '``ptrtoint .. to``' Instruction
10828 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10835 <result> = ptrtoint <ty> <value> to <ty2> ; yields ty2
10840 The '``ptrtoint``' instruction converts the pointer or a vector of
10841 pointers ``value`` to the integer (or vector of integers) type ``ty2``.
10846 The '``ptrtoint``' instruction takes a ``value`` to cast, which must be
10847 a value of type :ref:`pointer <t_pointer>` or a vector of pointers, and a
10848 type to cast it to ``ty2``, which must be an :ref:`integer <t_integer>` or
10849 a vector of integers type.
10854 The '``ptrtoint``' instruction converts ``value`` to integer type
10855 ``ty2`` by interpreting the pointer value as an integer and either
10856 truncating or zero extending that value to the size of the integer type.
10857 If ``value`` is smaller than ``ty2`` then a zero extension is done. If
``value`` is larger than ``ty2`` then a truncation is done. If they are
the same size, then nothing is done (*no-op cast*) other than a type
change.
10865 .. code-block:: llvm
10867 %X = ptrtoint i32* %P to i8 ; yields truncation on 32-bit architecture
10868 %Y = ptrtoint i32* %P to i64 ; yields zero extension on 32-bit architecture
10869 %Z = ptrtoint <4 x i32*> %P to <4 x i64>; yields vector zero extension for a vector of addresses on 32-bit architecture
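A minimal Python model of the size adjustment, assuming addresses are plain unsigned integers and the pointer width ``ptr_bits`` is supplied by the caller (in LLVM it comes from the target's data layout):

```python
def ptrtoint(address, ptr_bits, dst_bits):
    """Truncate or zero extend an address to an integer of dst_bits bits."""
    address &= (1 << ptr_bits) - 1          # the pointer's own bit pattern
    return address & ((1 << dst_bits) - 1)  # high bits dropped, or left zero
```

Widening never introduces set bits: a 32-bit address of ``0xFFFFFFFF`` stays ``0xFFFFFFFF`` as an ``i64``, because the extension is a zero extension and not a sign extension.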
10873 '``inttoptr .. to``' Instruction
10874 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10881 <result> = inttoptr <ty> <value> to <ty2>[, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>] ; yields ty2
10886 The '``inttoptr``' instruction converts an integer ``value`` to a
10887 pointer type, ``ty2``.
The '``inttoptr``' instruction takes an :ref:`integer <t_integer>` value to
cast, and a type to cast it to, which must be a :ref:`pointer <t_pointer>`
type.
The optional ``!dereferenceable`` metadata must reference a single metadata
name ``<deref_bytes_node>`` corresponding to a metadata node with one ``i64``
entry.
See ``dereferenceable`` metadata.
The optional ``!dereferenceable_or_null`` metadata must reference a single
metadata name ``<deref_bytes_node>`` corresponding to a metadata node with one
``i64`` entry.
See ``dereferenceable_or_null`` metadata.
10909 The '``inttoptr``' instruction converts ``value`` to type ``ty2`` by
10910 applying either a zero extension or a truncation depending on the size
10911 of the integer ``value``. If ``value`` is larger than the size of a
10912 pointer then a truncation is done. If ``value`` is smaller than the size
10913 of a pointer then a zero extension is done. If they are the same size,
10914 nothing is done (*no-op cast*).
10919 .. code-block:: llvm
%X = inttoptr i32 255 to i32* ; yields zero extension on 64-bit architecture
%Y = inttoptr i32 255 to i32* ; yields no-op on 32-bit architecture
%Z = inttoptr i64 0 to i32* ; yields truncation on 32-bit architecture
%W = inttoptr <4 x i32> %G to <4 x i8*> ; converts each element of vector G to a pointer
10928 '``bitcast .. to``' Instruction
10929 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10936 <result> = bitcast <ty> <value> to <ty2> ; yields ty2
The '``bitcast``' instruction converts ``value`` to type ``ty2`` without
changing any bits.
10947 The '``bitcast``' instruction takes a value to cast, which must be a
10948 non-aggregate first class value, and a type to cast it to, which must
10949 also be a non-aggregate :ref:`first class <t_firstclass>` type. The
10950 bit sizes of ``value`` and the destination type, ``ty2``, must be
10951 identical. If the source type is a pointer, the destination type must
10952 also be a pointer of the same size. This instruction supports bitwise
10953 conversion of vectors to integers and to vectors of other types (as
10954 long as they have the same size).
10959 The '``bitcast``' instruction converts ``value`` to type ``ty2``. It
10960 is always a *no-op cast* because no bits change with this
10961 conversion. The conversion is done as if the ``value`` had been stored
10962 to memory and read back as type ``ty2``. Pointer (or vector of
10963 pointers) types may only be converted to other pointer (or vector of
10964 pointers) types with the same address space through this instruction.
10965 To convert pointers to other types, use the :ref:`inttoptr <i_inttoptr>`
10966 or :ref:`ptrtoint <i_ptrtoint>` instructions first.
There is a caveat for bitcasts involving vector types in relation to
endianness. For example, ``bitcast <2 x i8> <value> to i16`` puts element zero
of the vector in the least significant bits of the i16 on little-endian
targets, while element zero ends up in the most significant bits on
big-endian targets.
10976 .. code-block:: text
%X = bitcast i8 255 to i8 ; yields i8:-1
%Y = bitcast i32* %x to i16* ; yields i16*:%x
%Z = bitcast <2 x i32> %V to i64 ; yields i64: %V (depends on endianness)
%W = bitcast <2 x i32*> %P to <2 x i64*> ; yields <2 x i64*>
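The endianness caveat follows directly from the "stored to memory and read back" rule. The Python sketch below models a vector-to-integer bitcast that way. It assumes element widths that are a multiple of 8 bits, and ``bitcast_vector_to_int`` is a hypothetical helper name.

```python
def bitcast_vector_to_int(elements, elem_bits, byteorder):
    """Store the lanes to memory in order, then reload them as one integer."""
    elem_bytes = elem_bits // 8
    raw = b"".join(e.to_bytes(elem_bytes, byteorder) for e in elements)
    return int.from_bytes(raw, byteorder)


little = bitcast_vector_to_int([0x01, 0x02], 8, "little")
big = bitcast_vector_to_int([0x01, 0x02], 8, "big")
```

On the little-endian model, element zero (``0x01``) lands in the least significant byte, giving ``0x0201``; on the big-endian model it lands in the most significant byte, giving ``0x0102``.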
10983 .. _i_addrspacecast:
10985 '``addrspacecast .. to``' Instruction
10986 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10993 <result> = addrspacecast <pty> <ptrval> to <pty2> ; yields pty2
10998 The '``addrspacecast``' instruction converts ``ptrval`` from ``pty`` in
10999 address space ``n`` to type ``pty2`` in address space ``m``.
The '``addrspacecast``' instruction takes a pointer or vector of pointers value
to cast and a pointer type to cast it to, which must have a different
address space.
11011 The '``addrspacecast``' instruction converts the pointer value
11012 ``ptrval`` to type ``pty2``. It can be a *no-op cast* or a complex
11013 value modification, depending on the target and the address space
11014 pair. Pointer conversions within the same address space must be
performed with the ``bitcast`` instruction. Note that if the address space
conversion is legal then both result and operand refer to the same memory
location.
11022 .. code-block:: llvm
11024 %X = addrspacecast i32* %x to i32 addrspace(1)* ; yields i32 addrspace(1)*:%x
11025 %Y = addrspacecast i32 addrspace(1)* %y to i64 addrspace(2)* ; yields i64 addrspace(2)*:%y
11026 %Z = addrspacecast <4 x i32*> %z to <4 x float addrspace(3)*> ; yields <4 x float addrspace(3)*>:%z
11033 The instructions in this category are the "miscellaneous" instructions,
11034 which defy better classification.
11038 '``icmp``' Instruction
11039 ^^^^^^^^^^^^^^^^^^^^^^
11046 <result> = icmp <cond> <ty> <op1>, <op2> ; yields i1 or <N x i1>:result
11051 The '``icmp``' instruction returns a boolean value or a vector of
11052 boolean values based on comparison of its two integer, integer vector,
11053 pointer, or pointer vector operands.
11058 The '``icmp``' instruction takes three operands. The first operand is
11059 the condition code indicating the kind of comparison to perform. It is
11060 not a value, just a keyword. The possible condition codes are:
#. ``eq``: equal
#. ``ne``: not equal
11064 #. ``ugt``: unsigned greater than
11065 #. ``uge``: unsigned greater or equal
11066 #. ``ult``: unsigned less than
11067 #. ``ule``: unsigned less or equal
11068 #. ``sgt``: signed greater than
11069 #. ``sge``: signed greater or equal
11070 #. ``slt``: signed less than
11071 #. ``sle``: signed less or equal
11073 The remaining two arguments must be :ref:`integer <t_integer>` or
11074 :ref:`pointer <t_pointer>` or integer :ref:`vector <t_vector>` typed. They
11075 must also be identical types.
11080 The '``icmp``' compares ``op1`` and ``op2`` according to the condition
11081 code given as ``cond``. The comparison performed always yields either an
11082 :ref:`i1 <t_integer>` or vector of ``i1`` result, as follows:
11084 #. ``eq``: yields ``true`` if the operands are equal, ``false``
11085 otherwise. No sign interpretation is necessary or performed.
11086 #. ``ne``: yields ``true`` if the operands are unequal, ``false``
11087 otherwise. No sign interpretation is necessary or performed.
11088 #. ``ugt``: interprets the operands as unsigned values and yields
11089 ``true`` if ``op1`` is greater than ``op2``.
11090 #. ``uge``: interprets the operands as unsigned values and yields
11091 ``true`` if ``op1`` is greater than or equal to ``op2``.
11092 #. ``ult``: interprets the operands as unsigned values and yields
11093 ``true`` if ``op1`` is less than ``op2``.
11094 #. ``ule``: interprets the operands as unsigned values and yields
11095 ``true`` if ``op1`` is less than or equal to ``op2``.
11096 #. ``sgt``: interprets the operands as signed values and yields ``true``
11097 if ``op1`` is greater than ``op2``.
11098 #. ``sge``: interprets the operands as signed values and yields ``true``
11099 if ``op1`` is greater than or equal to ``op2``.
11100 #. ``slt``: interprets the operands as signed values and yields ``true``
11101 if ``op1`` is less than ``op2``.
11102 #. ``sle``: interprets the operands as signed values and yields ``true``
11103 if ``op1`` is less than or equal to ``op2``.
11105 If the operands are :ref:`pointer <t_pointer>` typed, the pointer values
11106 are compared as if they were integers.
11108 If the operands are integer vectors, then they are compared element by
11109 element. The result is an ``i1`` vector with the same number of elements
11110 as the values being compared. Otherwise, the result is an ``i1``.
11115 .. code-block:: text
11117 <result> = icmp eq i32 4, 5 ; yields: result=false
11118 <result> = icmp ne float* %X, %X ; yields: result=false
11119 <result> = icmp ult i16 4, 5 ; yields: result=true
11120 <result> = icmp sgt i16 4, 5 ; yields: result=false
11121 <result> = icmp ule i16 -4, 5 ; yields: result=false
11122 <result> = icmp sge i16 4, 5 ; yields: result=false
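The signed and unsigned predicates differ only in how the operand bits are interpreted, which a small Python model makes concrete. The ``icmp`` function below is a hypothetical sketch for scalar integers of a given bit width, not an LLVM API.

```python
import operator

_COMPARE = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt,
            "ge": operator.ge, "lt": operator.lt, "le": operator.le}


def to_signed(value, bits):
    """Read a bit pattern as a two's complement signed integer."""
    return value - (1 << bits) if value >> (bits - 1) else value


def icmp(cond, op1, op2, bits):
    """Evaluate an icmp condition code on two integers of width bits."""
    mask = (1 << bits) - 1
    a, b = op1 & mask, op2 & mask  # unsigned view of both bit patterns
    if cond in ("eq", "ne"):       # no sign interpretation needed
        return _COMPARE[cond](a, b)
    if cond[0] == "s":             # signed predicates reinterpret the bits
        a, b = to_signed(a, bits), to_signed(b, bits)
    return _COMPARE[cond[1:]](a, b)
```

As an ``i16``, -4 has the bit pattern 65532, so ``icmp("ule", -4, 5, 16)`` is false while ``icmp("sle", -4, 5, 16)`` is true.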
11126 '``fcmp``' Instruction
11127 ^^^^^^^^^^^^^^^^^^^^^^
11134 <result> = fcmp [fast-math flags]* <cond> <ty> <op1>, <op2> ; yields i1 or <N x i1>:result
11139 The '``fcmp``' instruction returns a boolean value or vector of boolean
11140 values based on comparison of its operands.
11142 If the operands are floating-point scalars, then the result type is a
11143 boolean (:ref:`i1 <t_integer>`).
If the operands are floating-point vectors, then the result type is a
vector of boolean with the same number of elements as the operands being
compared.
11152 The '``fcmp``' instruction takes three operands. The first operand is
11153 the condition code indicating the kind of comparison to perform. It is
11154 not a value, just a keyword. The possible condition codes are:
11156 #. ``false``: no comparison, always returns false
11157 #. ``oeq``: ordered and equal
11158 #. ``ogt``: ordered and greater than
11159 #. ``oge``: ordered and greater than or equal
11160 #. ``olt``: ordered and less than
11161 #. ``ole``: ordered and less than or equal
11162 #. ``one``: ordered and not equal
#. ``ord``: ordered (no NaNs)
11164 #. ``ueq``: unordered or equal
11165 #. ``ugt``: unordered or greater than
11166 #. ``uge``: unordered or greater than or equal
11167 #. ``ult``: unordered or less than
11168 #. ``ule``: unordered or less than or equal
11169 #. ``une``: unordered or not equal
#. ``uno``: unordered (either operand is a NaN)
11171 #. ``true``: no comparison, always returns true
11173 *Ordered* means that neither operand is a QNAN while *unordered* means
11174 that either operand may be a QNAN.
Each of the ``op1`` and ``op2`` arguments must be either a :ref:`floating-point
<t_floating>` type or a :ref:`vector <t_vector>` of floating-point type.
They must have identical types.
11183 The '``fcmp``' instruction compares ``op1`` and ``op2`` according to the
11184 condition code given as ``cond``. If the operands are vectors, then the
11185 vectors are compared element by element. Each comparison performed
11186 always yields an :ref:`i1 <t_integer>` result, as follows:
11188 #. ``false``: always yields ``false``, regardless of operands.
11189 #. ``oeq``: yields ``true`` if both operands are not a QNAN and ``op1``
11190 is equal to ``op2``.
11191 #. ``ogt``: yields ``true`` if both operands are not a QNAN and ``op1``
11192 is greater than ``op2``.
11193 #. ``oge``: yields ``true`` if both operands are not a QNAN and ``op1``
11194 is greater than or equal to ``op2``.
11195 #. ``olt``: yields ``true`` if both operands are not a QNAN and ``op1``
11196 is less than ``op2``.
11197 #. ``ole``: yields ``true`` if both operands are not a QNAN and ``op1``
11198 is less than or equal to ``op2``.
11199 #. ``one``: yields ``true`` if both operands are not a QNAN and ``op1``
11200 is not equal to ``op2``.
11201 #. ``ord``: yields ``true`` if both operands are not a QNAN.
#. ``ueq``: yields ``true`` if either operand is a QNAN or ``op1`` is
equal to ``op2``.
11204 #. ``ugt``: yields ``true`` if either operand is a QNAN or ``op1`` is
11205 greater than ``op2``.
11206 #. ``uge``: yields ``true`` if either operand is a QNAN or ``op1`` is
11207 greater than or equal to ``op2``.
#. ``ult``: yields ``true`` if either operand is a QNAN or ``op1`` is
less than ``op2``.
11210 #. ``ule``: yields ``true`` if either operand is a QNAN or ``op1`` is
11211 less than or equal to ``op2``.
11212 #. ``une``: yields ``true`` if either operand is a QNAN or ``op1`` is
11213 not equal to ``op2``.
11214 #. ``uno``: yields ``true`` if either operand is a QNAN.
11215 #. ``true``: always yields ``true``, regardless of operands.
11217 The ``fcmp`` instruction can also optionally take any number of
11218 :ref:`fast-math flags <fastmath>`, which are optimization hints to enable
11219 otherwise unsafe floating-point optimizations.
Any set of fast-math flags is legal on an ``fcmp`` instruction, but the
11222 only flags that have any effect on its semantics are those that allow
11223 assumptions to be made about the values of input arguments; namely
11224 ``nnan``, ``ninf``, and ``reassoc``. See :ref:`fastmath` for more information.
11229 .. code-block:: text
11231 <result> = fcmp oeq float 4.0, 5.0 ; yields: result=false
11232 <result> = fcmp one float 4.0, 5.0 ; yields: result=true
11233 <result> = fcmp olt float 4.0, 5.0 ; yields: result=true
11234 <result> = fcmp ueq double 1.0, 2.0 ; yields: result=false
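The ordered and unordered families only disagree when a NaN is involved. The following Python sketch models the condition codes for scalar operands; it treats any NaN as unordered and ignores fast-math flags, and the ``fcmp`` function name is hypothetical.

```python
import math
import operator

_COMPARE = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt,
            "ge": operator.ge, "lt": operator.lt, "le": operator.le}


def fcmp(cond, op1, op2):
    """Evaluate an fcmp condition code on two Python floats."""
    if cond in ("false", "true"):
        return cond == "true"
    unordered = math.isnan(op1) or math.isnan(op2)
    if cond == "ord":
        return not unordered
    if cond == "uno":
        return unordered
    if unordered:
        return cond[0] == "u"  # u* codes are true on NaN, o* codes are false
    return _COMPARE[cond[1:]](op1, op2)
```

With a NaN operand every ``o``-prefixed code yields false and every ``u``-prefixed code yields true, so ``fcmp("ueq", nan, 1.0)`` is true even though the operands are not equal.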
11238 '``phi``' Instruction
11239 ^^^^^^^^^^^^^^^^^^^^^
11246 <result> = phi [fast-math-flags] <ty> [ <val0>, <label0>], ...
11251 The '``phi``' instruction is used to implement the φ node in the SSA
11252 graph representing the function.
11257 The type of the incoming values is specified with the first type field.
11258 After this, the '``phi``' instruction takes a list of pairs as
11259 arguments, with one pair for each predecessor basic block of the current
11260 block. Only values of :ref:`first class <t_firstclass>` type may be used as
the value arguments to the PHI node. Only labels may be used as the
label arguments.
There must be no non-phi instructions between the start of a basic block
and the PHI instructions: i.e. PHI instructions must be first in a basic
block.
11268 For the purposes of the SSA form, the use of each incoming value is
11269 deemed to occur on the edge from the corresponding predecessor block to
11270 the current block (but after any definition of an '``invoke``'
11271 instruction's return value on the same edge).
11273 The optional ``fast-math-flags`` marker indicates that the phi has one
11274 or more :ref:`fast-math-flags <fastmath>`. These are optimization hints
11275 to enable otherwise unsafe floating-point optimizations. Fast-math-flags
are only valid for phis that return a floating-point scalar or vector
type, or an array (nested to any depth) of floating-point scalar or vector
types.
11283 At runtime, the '``phi``' instruction logically takes on the value
11284 specified by the pair corresponding to the predecessor basic block that
11285 executed just prior to the current block.
11290 .. code-block:: llvm
11292 Loop: ; Infinite loop that counts from 0 on up...
11293 %indvar = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ]
11294 %nextindvar = add i32 %indvar, 1
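The way a phi picks its value from the predecessor that just executed can be simulated in Python. The sketch below assumes that ``%Loop`` is entered once from ``%LoopHeader`` and then branches back to itself, as the "infinite loop" comment suggests. The ``run_loop`` function and its iteration limit are hypothetical additions, needed only to make the simulation terminate.

```python
def run_loop(iterations):
    """Simulate the %indvar phi for a fixed number of trips through %Loop."""
    trace = []
    predecessor = "LoopHeader"  # block that executed just before %Loop
    nextindvar = None           # not defined until %Loop has run once
    for _ in range(iterations):
        # The phi selects the incoming value paired with the predecessor.
        incoming = {"LoopHeader": 0, "Loop": nextindvar}
        indvar = incoming[predecessor]
        nextindvar = (indvar + 1) & 0xFFFFFFFF  # i32 add wraps around
        trace.append(indvar)
        predecessor = "Loop"    # the back edge is taken
    return trace
```

On the first trip the phi reads 0 from the ``%LoopHeader`` edge; on every later trip it reads the ``%nextindvar`` computed by the previous trip.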
11299 '``select``' Instruction
11300 ^^^^^^^^^^^^^^^^^^^^^^^^
11307 <result> = select [fast-math flags] selty <cond>, <ty> <val1>, <ty> <val2> ; yields ty
11309 selty is either i1 or {<N x i1>}
11314 The '``select``' instruction is used to choose one value based on a
11315 condition, without IR-level branching.
11320 The '``select``' instruction requires an 'i1' value or a vector of 'i1'
11321 values indicating the condition, and two values of the same :ref:`first
11322 class <t_firstclass>` type.
11324 #. The optional ``fast-math flags`` marker indicates that the select has one or more
11325 :ref:`fast-math flags <fastmath>`. These are optimization hints to enable
11326 otherwise unsafe floating-point optimizations. Fast-math flags are only valid
11327 for selects that return a floating-point scalar or vector type, or an array
11328 (nested to any depth) of floating-point scalar or vector types.
If the condition is an i1 and it evaluates to 1, the instruction returns
the first value argument; otherwise, it returns the second value
argument.
11337 If the condition is a vector of i1, then the value arguments must be
11338 vectors of the same size, and the selection is done element by element.
11340 If the condition is an i1 and the value arguments are vectors of the
11341 same size, then an entire vector is selected.
11346 .. code-block:: llvm
11348 %X = select i1 true, i8 17, i8 42 ; yields i8:17
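The three cases described above (scalar condition with scalar values, vector condition, and scalar condition with vector values) can be sketched in Python, using lists to stand in for vectors. The ``select`` function here is a hypothetical model, not an LLVM API.

```python
def select(cond, val1, val2):
    """Pick val1 or val2: per lane for a vector condition, whole otherwise."""
    if isinstance(cond, (list, tuple)):
        # Vector of i1: the choice is made element by element.
        return [a if c else b for c, a, b in zip(cond, val1, val2)]
    # Single i1: one operand is returned whole, even if it is a vector.
    return val1 if cond else val2
```

For example, ``select([True, False], [1, 2], [3, 4])`` mixes lanes from both operands, while ``select(False, [1, 2], [3, 4])`` returns the second vector unchanged.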
11353 '``freeze``' Instruction
11354 ^^^^^^^^^^^^^^^^^^^^^^^^
11361 <result> = freeze ty <val> ; yields ty:result
11366 The '``freeze``' instruction is used to stop propagation of
11367 :ref:`undef <undefvalues>` and :ref:`poison <poisonvalues>` values.
11372 The '``freeze``' instruction takes a single argument.
11377 If the argument is ``undef`` or ``poison``, '``freeze``' returns an
11378 arbitrary, but fixed, value of type '``ty``'.
11379 Otherwise, this instruction is a no-op and returns the input argument.
11380 All uses of a value returned by the same '``freeze``' instruction are
11381 guaranteed to always observe the same value, while different '``freeze``'
11382 instructions may yield different values.
11384 While ``undef`` and ``poison`` pointers can be frozen, the result is a
11385 non-dereferenceable pointer. See the
11386 :ref:`Pointer Aliasing Rules <pointeraliasing>` section for more information.
11387 If an aggregate value or vector is frozen, the operand is frozen element-wise.
11388 The padding of an aggregate isn't considered, since it isn't visible
11389 without storing it into memory and loading it with a different type.
11395 .. code-block:: text
11399 %y = add i32 %w, %w ; undef
%z = add i32 %x, %x ; even number because all uses of %x observe the same value
11402 %x2 = freeze i32 %w
11403 %cmp = icmp eq i32 %x, %x2 ; can be true or false
11405 ; example with vectors
11406 %v = <2 x i32> <i32 undef, i32 poison>
11407 %a = extractelement <2 x i32> %v, i32 0 ; undef
11408 %b = extractelement <2 x i32> %v, i32 1 ; poison
11409 %add = add i32 %a, %a ; undef
11411 %v.fr = freeze <2 x i32> %v ; element-wise freeze
11412 %d = extractelement <2 x i32> %v.fr, i32 0 ; not undef
11413 %add.f = add i32 %d, %d ; even number
11415 ; branching on frozen value
11416 %poison = add nsw i1 %k, undef ; poison
11417 %c = freeze i1 %poison
11418 br i1 %c, label %foo, label %bar ; non-deterministic branch to %foo or %bar
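The "arbitrary, but fixed" behaviour can be illustrated with a small Python model. It is only a sketch: the ``Undef`` class is a hypothetical stand-in in which every read may return a different value, and random numbers are used merely to make the arbitrariness visible. Poison is not modeled.

```python
import random


class Undef:
    """Hypothetical model of an undef value: each use may see other bits."""

    def __init__(self, bits):
        self.bits = bits

    def read(self):
        return random.getrandbits(self.bits)


def freeze(value):
    """Pick one concrete value for undef; pass anything else through."""
    return value.read() if isinstance(value, Undef) else value


w = Undef(32)
x = freeze(w)   # chosen once; every later use of x sees this value
z = x + x       # therefore always an even number
x2 = freeze(w)  # a separate freeze may choose a different value
```

Adding two reads of ``w`` directly could give an odd number in this model, because the two reads are independent.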
11423 '``call``' Instruction
11424 ^^^^^^^^^^^^^^^^^^^^^^
11431 <result> = [tail | musttail | notail ] call [fast-math flags] [cconv] [ret attrs] [addrspace(<num>)]
11432 <ty>|<fnty> <fnptrval>(<function args>) [fn attrs] [ operand bundles ]
11437 The '``call``' instruction represents a simple function call.
11442 This instruction requires several arguments:
11444 #. The optional ``tail`` and ``musttail`` markers indicate that the optimizers
11445 should perform tail call optimization. The ``tail`` marker is a hint that
11446 `can be ignored <CodeGenerator.html#sibcallopt>`_. The ``musttail`` marker
11447 means that the call must be tail call optimized in order for the program to
11448 be correct. The ``musttail`` marker provides these guarantees:
11450 #. The call will not cause unbounded stack growth if it is part of a
11451 recursive cycle in the call graph.
11452 #. Arguments with the :ref:`inalloca <attr_inalloca>` or
11453 :ref:`preallocated <attr_preallocated>` attribute are forwarded in place.
11454 #. If the musttail call appears in a function with the ``"thunk"`` attribute
and the caller and callee both have varargs, then any unprototyped
11456 arguments in register or memory are forwarded to the callee. Similarly,
11457 the return value of the callee is returned to the caller's caller, even
11458 if a void return type is in use.
11460 Both markers imply that the callee does not access allocas from the caller.
11461 The ``tail`` marker additionally implies that the callee does not access
varargs from the caller. Calls marked ``musttail`` must obey the following
additional rules:
11465 - The call must immediately precede a :ref:`ret <i_ret>` instruction,
11466 or a pointer bitcast followed by a ret instruction.
11467 - The ret instruction must return the (possibly bitcasted) value
11468 produced by the call, undef, or void.
11469 - The calling conventions of the caller and callee must match.
11470 - The callee must be varargs iff the caller is varargs. Bitcasting a
11471 non-varargs function to the appropriate varargs type is legal so
11472 long as the non-varargs prefixes obey the other rules.
11473 - The return type must not undergo automatic conversion to an `sret` pointer.
11475 In addition, if the calling convention is not `swifttailcc` or `tailcc`:
11477 - All ABI-impacting function attributes, such as sret, byval, inreg,
11478 returned, and inalloca, must match.
11479 - The caller and callee prototypes must match. Pointer types of parameters
11480 or return types may differ in pointee type, but not in address space.
On the other hand, if the calling convention is `swifttailcc` or `tailcc`:
- Only these ABI-impacting attributes are allowed: sret, byval,
11485 swiftself, and swiftasync.
11486 - Prototypes are not required to match.
11488 Tail call optimization for calls marked ``tail`` is guaranteed to occur if
11489 the following conditions are met:
11491 - Caller and callee both have the calling convention ``fastcc`` or ``tailcc``.
11492 - The call is in tail position (ret immediately follows call and ret
11493 uses value of call or is void).
- Option ``-tailcallopt`` is enabled,
``llvm::GuaranteedTailCallOpt`` is ``true``, or the calling convention
is ``tailcc``.
11497 - `Platform-specific constraints are
11498 met. <CodeGenerator.html#tailcallopt>`_
11500 #. The optional ``notail`` marker indicates that the optimizers should not add
11501 ``tail`` or ``musttail`` markers to the call. It is used to prevent tail
11502 call optimization from being performed on the call.
11504 #. The optional ``fast-math flags`` marker indicates that the call has one or more
11505 :ref:`fast-math flags <fastmath>`, which are optimization hints to enable
11506 otherwise unsafe floating-point optimizations. Fast-math flags are only valid
11507 for calls that return a floating-point scalar or vector type, or an array
11508 (nested to any depth) of floating-point scalar or vector types.
11510 #. The optional "cconv" marker indicates which :ref:`calling
11511 convention <callingconv>` the call should use. If none is
11512 specified, the call defaults to using C calling conventions. The
11513 calling convention of the call must match the calling convention of
11514 the target function, or else the behavior is undefined.
#. The optional :ref:`Parameter Attributes <paramattrs>` list for return
values. Only '``zeroext``', '``signext``', and '``inreg``' attributes
are valid here.
11518 #. The optional addrspace attribute can be used to indicate the address space
11519 of the called function. If it is not specified, the program address space
11520 from the :ref:`datalayout string<langref_datalayout>` will be used.
#. '``ty``': the type of the call instruction itself which is also the
type of the return value. Functions that return no value are marked
``void``.
11524 #. '``fnty``': shall be the signature of the function being called. The
11525 argument types must match the types implied by this signature. This
11526 type can be omitted if the function is not varargs.
#. '``fnptrval``': An LLVM value containing a pointer to a function to
   be called. In most cases, this is a direct function call, but
   indirect calls are just as possible, calling an arbitrary pointer
   to function value.
11531 #. '``function args``': argument list whose types match the function
11532 signature argument types and parameter attributes. All arguments must
11533 be of :ref:`first class <t_firstclass>` type. If the function signature
11534 indicates the function accepts a variable number of arguments, the
11535 extra arguments can be specified.
11536 #. The optional :ref:`function attributes <fnattrs>` list.
11537 #. The optional :ref:`operand bundles <opbundles>` list.
11542 The '``call``' instruction is used to cause control flow to transfer to
11543 a specified function, with its incoming arguments bound to the specified
11544 values. Upon a '``ret``' instruction in the called function, control
11545 flow continues with the instruction after the function call, and the
11546 return value of the function is bound to the result argument.
.. code-block:: llvm

      %retval = call i32 @test(i32 %argc)
      call i32 (i8*, ...) @printf(i8* %msg, i32 12, i8 42)   ; yields i32
      %X = tail call i32 @foo()                               ; yields i32
      %Y = tail call fastcc i32 @foo()                        ; yields i32
      call void %foo(i8 signext 97)

      %struct.A = type { i32, i8 }
      %r = call %struct.A @foo()                  ; yields { i32, i8 }
      %gr = extractvalue %struct.A %r, 0          ; yields i32
      %gr1 = extractvalue %struct.A %r, 1         ; yields i8
      call void @foo() noreturn                   ; indicates that @foo never returns normally
      %ZZ = call zeroext i32 @bar()               ; return value is zero extended

LLVM treats calls to some functions with names and arguments that match
the standard C99 library as being the C99 library functions, and may
perform optimizations or generate code for them under that assumption.
This is something we'd like to change in the future to provide better
support for freestanding environments and non-C-based languages.
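
As an illustrative sketch, the following module combines several of the
optional markers described above: a ``fastcc`` call in tail position,
fast-math flags on a call that returns ``float``, and a ``notail`` call.
The functions ``@callee``, ``@caller``, ``@scaled``, ``@no_tail``, and
``@log_value`` are hypothetical names chosen for this example only.

.. code-block:: llvm

      declare fastcc i32 @callee(i32)
      declare float @llvm.sqrt.f32(float)
      declare void @log_value(i32)          ; hypothetical external function

      ; Caller and callee both use fastcc, and the ret immediately follows
      ; the call and returns its value, so the call is in tail position.
      define fastcc i32 @caller(i32 %x) {
      entry:
        %r = tail call fastcc i32 @callee(i32 %x)
        ret i32 %r
      }

      ; Fast-math flags are permitted because the call returns a
      ; floating-point scalar.
      define float @scaled(float %a) {
      entry:
        %s = call nnan ninf float @llvm.sqrt.f32(float %a)
        ret float %s
      }

      ; notail asks the optimizers not to add tail or musttail here.
      define void @no_tail(i32 %x) {
      entry:
        notail call void @log_value(i32 %x)
        ret void
      }

Whether the ``tail`` call is actually lowered as a tail call still depends
on the remaining conditions listed above, including the platform-specific
constraints.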
11574 '``va_arg``' Instruction
11575 ^^^^^^^^^^^^^^^^^^^^^^^^
11582 <resultval> = va_arg <va_list*> <arglist>, <argty>
11587 The '``va_arg``' instruction is used to access arguments passed through
11588 the "variable argument" area of a function call. It is used to implement
11589 the ``va_arg`` macro in C.
11594 This instruction takes a ``va_list*`` value and the type of the
11595 argument. It returns a value of the specified argument type and
11596 increments the ``va_list`` to point to the next argument. The actual
11597 type of ``va_list`` is target specific.
11602 The '``va_arg``' instruction loads an argument of the specified type
11603 from the specified ``va_list`` and causes the ``va_list`` to point to
11604 the next argument. For more information, see the variable argument
11605 handling :ref:`Intrinsic Functions <int_varargs>`.
It is legal for this instruction to be called in a function which does
not take a variable number of arguments, for example, the ``vfprintf``
function.
11611 ``va_arg`` is an LLVM instruction instead of an :ref:`intrinsic
11612 function <intrinsics>` because it takes a type as an argument.
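
A minimal sketch of the non-variadic case follows. The helper
``@next_i32`` is hypothetical; it assumes its caller has already
initialized the ``va_list`` that ``%ap`` points to, and that the target's
code generator supports ``va_arg`` for ``i32``.

.. code-block:: llvm

      ; Hypothetical helper: %ap points to a va_list owned by the caller.
      define i32 @next_i32(i8* %ap) {
      entry:
        ; Load one i32 and advance the va_list to the next argument.
        %v = va_arg i8* %ap, i32
        ret i32 %v
      }
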
11617 See the :ref:`variable argument processing <int_varargs>` section.
11619 Note that the code generator does not yet fully support va\_arg on many
11620 targets. Also, it does not currently support va\_arg with aggregate
11621 types on any target.
11625 '``landingpad``' Instruction
11626 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11633 <resultval> = landingpad <resultty> <clause>+
11634 <resultval> = landingpad <resultty> cleanup <clause>*
11636 <clause> := catch <type> <value>
11637 <clause> := filter <array constant type> <array constant>
11642 The '``landingpad``' instruction is used by `LLVM's exception handling
11643 system <ExceptionHandling.html#overview>`_ to specify that a basic block
11644 is a landing pad --- one where the exception lands, and corresponds to the
11645 code found in the ``catch`` portion of a ``try``/``catch`` sequence. It
11646 defines values supplied by the :ref:`personality function <personalityfn>` upon
11647 re-entry to the function. The ``resultval`` has the type ``resultty``.
The ``cleanup`` flag indicates that the landing pad block is a cleanup.
11655 A ``clause`` begins with the clause type --- ``catch`` or ``filter`` --- and
11656 contains the global variable representing the "type" that may be caught
11657 or filtered respectively. Unlike the ``catch`` clause, the ``filter``
11658 clause takes an array constant as its argument. Use
11659 "``[0 x i8**] undef``" for a filter which cannot throw. The
11660 '``landingpad``' instruction must contain *at least* one ``clause`` or
11661 the ``cleanup`` flag.
11666 The '``landingpad``' instruction defines the values which are set by the
11667 :ref:`personality function <personalityfn>` upon re-entry to the function, and
11668 therefore the "result type" of the ``landingpad`` instruction. As with
11669 calling conventions, how the personality function results are
11670 represented in LLVM IR is target specific.
11672 The clauses are applied in order from top to bottom. If two
11673 ``landingpad`` instructions are merged together through inlining, the
11674 clauses from the calling function are appended to the list of clauses.
11675 When the call stack is being unwound due to an exception being thrown,
11676 the exception is compared against each ``clause`` in turn. If it doesn't
11677 match any of the clauses, and the ``cleanup`` flag is not set, then
11678 unwinding continues further up the call stack.
11680 The ``landingpad`` instruction has several restrictions:
11682 - A landing pad block is a basic block which is the unwind destination
11683 of an '``invoke``' instruction.
11684 - A landing pad block must have a '``landingpad``' instruction as its
11685 first non-PHI instruction.
- There can be only one '``landingpad``' instruction within the landing
  pad block.
11688 - A basic block that is not a landing pad block may not include a
11689 '``landingpad``' instruction.
.. code-block:: llvm

      ;; A landing pad which can catch an integer.
      %res = landingpad { i8*, i32 }
               catch i8** @_ZTIi
      ;; A landing pad that is a cleanup.
      %res = landingpad { i8*, i32 }
               cleanup
      ;; A landing pad which can catch an integer and can only throw a double.
      %res = landingpad { i8*, i32 }
               catch i8** @_ZTIi
               filter [1 x i8**] [i8** @_ZTId]

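
To show where a landing pad sits in a complete function, here is a small
sketch. It assumes the Itanium C++ personality ``@__gxx_personality_v0``
and the ``int`` type-info symbol ``@_ZTIi``; ``@may_throw`` and
``@try_call`` are hypothetical names. The handler simply returns the
selector value rather than running real catch logic.

.. code-block:: llvm

      @_ZTIi = external constant i8*
      declare i32 @__gxx_personality_v0(...)
      declare void @may_throw()             ; hypothetical callee

      define i32 @try_call() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
      entry:
        invoke void @may_throw()
                to label %cont unwind label %lpad

      cont:
        ret i32 0

      lpad:                                 ; unwind destination of the invoke
        ; The landingpad is the first non-PHI instruction of the block.
        %exn = landingpad { i8*, i32 }
                 catch i8** @_ZTIi
        %sel = extractvalue { i8*, i32 } %exn, 1
        ret i32 %sel
      }
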
11709 '``catchpad``' Instruction
11710 ^^^^^^^^^^^^^^^^^^^^^^^^^^
11717 <resultval> = catchpad within <catchswitch> [<args>*]
11722 The '``catchpad``' instruction is used by `LLVM's exception handling
11723 system <ExceptionHandling.html#overview>`_ to specify that a basic block
11724 begins a catch handler --- one where a personality routine attempts to transfer
11725 control to catch an exception.
11730 The ``catchswitch`` operand must always be a token produced by a
11731 :ref:`catchswitch <i_catchswitch>` instruction in a predecessor block. This
11732 ensures that each ``catchpad`` has exactly one predecessor block, and it always
11733 terminates in a ``catchswitch``.
The ``args`` correspond to whatever information the personality routine
requires to know if this is an appropriate handler for the exception. Control
will transfer to the ``catchpad`` if this is the first appropriate handler for
the exception.

The ``resultval`` has the type :ref:`token <t_token>` and is used to match the
``catchpad`` to corresponding :ref:`catchrets <i_catchret>` and other nested EH
pads.
11747 When the call stack is being unwound due to an exception being thrown, the
11748 exception is compared against the ``args``. If it doesn't match, control will
11749 not reach the ``catchpad`` instruction. The representation of ``args`` is
11750 entirely target and personality function-specific.
11752 Like the :ref:`landingpad <i_landingpad>` instruction, the ``catchpad``
11753 instruction must be the first non-phi of its parent basic block.
11755 The meaning of the tokens produced and consumed by ``catchpad`` and other "pad"
11756 instructions is described in the
11757 `Windows exception handling documentation\ <ExceptionHandling.html#wineh>`_.
11759 When a ``catchpad`` has been "entered" but not yet "exited" (as
11760 described in the `EH documentation\ <ExceptionHandling.html#wineh-constraints>`_),
11761 it is undefined behavior to execute a :ref:`call <i_call>` or :ref:`invoke <i_invoke>`
11762 that does not carry an appropriate :ref:`"funclet" bundle <ob_funclet>`.
.. code-block:: text

    dispatch:
      %cs = catchswitch within none [label %handler0] unwind to caller
      ;; A catch block which can catch an integer.
    handler0:
      %tok = catchpad within %cs [i8** @_ZTIi]

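
The fragment above can be placed in a full function as sketched below.
This example assumes the MSVC C++ personality ``@__CxxFrameHandler3`` and
assumes that ``[i8* null, i32 64, i8* null]`` is the catch-all argument
form for that personality; ``@may_throw``, ``@handle``, and ``@f`` are
hypothetical names. Note the ``"funclet"`` bundle on the call made while
the ``catchpad`` is entered.

.. code-block:: llvm

      declare i32 @__CxxFrameHandler3(...)
      declare void @may_throw()             ; hypothetical callee
      declare void @handle()                ; hypothetical handler body

      define void @f() personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
      entry:
        invoke void @may_throw()
                to label %done unwind label %dispatch

      dispatch:
        %cs = catchswitch within none [label %handler] unwind to caller

      handler:
        ; catchpad is the first non-PHI instruction; its only predecessor
        ; is the block ending in the catchswitch.
        %tok = catchpad within %cs [i8* null, i32 64, i8* null]
        call void @handle() [ "funclet"(token %tok) ]
        catchret from %tok to label %done

      done:
        ret void
      }
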
11777 '``cleanuppad``' Instruction
11778 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11785 <resultval> = cleanuppad within <parent> [<args>*]
11790 The '``cleanuppad``' instruction is used by `LLVM's exception handling
11791 system <ExceptionHandling.html#overview>`_ to specify that a basic block
11792 is a cleanup block --- one where a personality routine attempts to
11793 transfer control to run cleanup actions.
11794 The ``args`` correspond to whatever additional
11795 information the :ref:`personality function <personalityfn>` requires to
11796 execute the cleanup.
11797 The ``resultval`` has the type :ref:`token <t_token>` and is used to
11798 match the ``cleanuppad`` to corresponding :ref:`cleanuprets <i_cleanupret>`.
11799 The ``parent`` argument is the token of the funclet that contains the
11800 ``cleanuppad`` instruction. If the ``cleanuppad`` is not inside a funclet,
11801 this operand may be the token ``none``.
11806 The instruction takes a list of arbitrary values which are interpreted
11807 by the :ref:`personality function <personalityfn>`.
11812 When the call stack is being unwound due to an exception being thrown,
11813 the :ref:`personality function <personalityfn>` transfers control to the
11814 ``cleanuppad`` with the aid of the personality-specific arguments.
11815 As with calling conventions, how the personality function results are
11816 represented in LLVM IR is target specific.
11818 The ``cleanuppad`` instruction has several restrictions:
11820 - A cleanup block is a basic block which is the unwind destination of
11821 an exceptional instruction.
11822 - A cleanup block must have a '``cleanuppad``' instruction as its
11823 first non-PHI instruction.
- There can be only one '``cleanuppad``' instruction within the
  cleanup block.
11826 - A basic block that is not a cleanup block may not include a
11827 '``cleanuppad``' instruction.
11829 When a ``cleanuppad`` has been "entered" but not yet "exited" (as
11830 described in the `EH documentation\ <ExceptionHandling.html#wineh-constraints>`_),
11831 it is undefined behavior to execute a :ref:`call <i_call>` or :ref:`invoke <i_invoke>`
11832 that does not carry an appropriate :ref:`"funclet" bundle <ob_funclet>`.
11837 .. code-block:: text
11839 %tok = cleanuppad within %cs []
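
As a hedged sketch of a cleanup in context, the function below runs a
cleanup action when ``@may_throw`` unwinds and then continues unwinding
to the caller. The personality ``@__CxxFrameHandler3`` is an assumption,
and ``@may_throw``, ``@release``, and ``@g`` are hypothetical names.

.. code-block:: llvm

      declare i32 @__CxxFrameHandler3(...)
      declare void @may_throw()             ; hypothetical callee
      declare void @release()               ; hypothetical cleanup action

      define void @g() personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
      entry:
        invoke void @may_throw()
                to label %done unwind label %cleanup

      cleanup:
        ; Not nested in another funclet, so the parent token is none.
        %tok = cleanuppad within none []
        call void @release() [ "funclet"(token %tok) ]
        cleanupret from %tok unwind to caller

      done:
        ret void
      }
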
11843 Intrinsic Functions
11844 ===================
11846 LLVM supports the notion of an "intrinsic function". These functions
11847 have well known names and semantics and are required to follow certain
11848 restrictions. Overall, these intrinsics represent an extension mechanism
11849 for the LLVM language that does not require changing all of the
11850 transformations in LLVM when adding to the language (or the bitcode
11851 reader/writer, the parser, etc...).
11853 Intrinsic function names must all start with an "``llvm.``" prefix. This
11854 prefix is reserved in LLVM for intrinsic names; thus, function names may
11855 not begin with this prefix. Intrinsic functions must always be external
11856 functions: you cannot define the body of intrinsic functions. Intrinsic
11857 functions may only be used in call or invoke instructions: it is illegal
11858 to take the address of an intrinsic function. Additionally, because
11859 intrinsic functions are part of the LLVM language, it is required if any
11860 are added that they be documented here.
11862 Some intrinsic functions can be overloaded, i.e., the intrinsic
11863 represents a family of functions that perform the same operation but on
11864 different data types. Because LLVM can represent over 8 million
11865 different integer types, overloading is used commonly to allow an
11866 intrinsic function to operate on any integer type. One or more of the
11867 argument types or the result type can be overloaded to accept any
11868 integer type. Argument types may also be defined as exactly matching a
11869 previous argument's type or the result type. This allows an intrinsic
11870 function which accepts multiple arguments, but needs all of them to be
11871 of the same type, to only be overloaded with respect to a single
11872 argument or the result.
An overloaded intrinsic has the names of its overloaded argument
types encoded into its function name, each preceded by a period. Only
those types which are overloaded result in a name suffix. Arguments
whose type is matched against another type do not. For example, the
``llvm.ctpop`` function can take an integer of any width and returns an
integer of exactly the same integer width. This leads to a family of
functions such as ``i8 @llvm.ctpop.i8(i8 %val)`` and
``i29 @llvm.ctpop.i29(i29 %val)``. Only one type, the return type, is
overloaded, and only one type suffix is required. Because the argument's
type is matched against the return type, it does not require its own
name suffix.
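
For illustration, the declarations below spell out three members of the
``llvm.ctpop`` family and one call. The wrapper ``@popcount8`` is a
hypothetical name used only for this sketch.

.. code-block:: llvm

      ; One suffix per overloaded type; the argument type is matched
      ; against the return type and adds no suffix of its own.
      declare i8 @llvm.ctpop.i8(i8)
      declare i29 @llvm.ctpop.i29(i29)
      declare <4 x i32> @llvm.ctpop.v4i32(<4 x i32>)

      define i8 @popcount8(i8 %val) {
      entry:
        %n = call i8 @llvm.ctpop.i8(i8 %val)
        ret i8 %n
      }
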
:ref:`Unnamed types <t_opaque>` are encoded as ``s_s``. Overloaded intrinsics
that depend on an unnamed type in one of their overloaded argument types get an
additional ``.<number>`` suffix. This allows differentiating intrinsics with
different unnamed types as arguments (for example,
``llvm.ssa.copy.p0s_s.2(%42*)``). The number is tracked in the LLVM module and
ensures unique names within the module. When linking two modules together, it is
still possible to get a name clash. In that case, one of the names will be
changed by assigning it a new number.
For target developers who are defining intrinsics for back-end code
generation, any intrinsic overloads based solely on the distinction between
integer and floating-point types should not be relied upon for correct
code generation. In such cases, the recommended approach for target
maintainers when defining intrinsics is to create separate integer and
FP intrinsics rather than rely on overloading. For example, if different
codegen is required for ``llvm.target.foo(<4 x i32>)`` and
``llvm.target.foo(<4 x float>)`` then these should be split into
different intrinsics.
11905 To learn how to add an intrinsic function, please see the `Extending
11906 LLVM Guide <ExtendingLLVM.html>`_.
11910 Variable Argument Handling Intrinsics
11911 -------------------------------------
11913 Variable argument support is defined in LLVM with the
11914 :ref:`va_arg <i_va_arg>` instruction and these three intrinsic
11915 functions. These functions are related to the similarly named macros
11916 defined in the ``<stdarg.h>`` header file.
11918 All of these functions operate on arguments that use a target-specific
11919 value type "``va_list``". The LLVM assembly language reference manual
11920 does not define what this type is, so all transformations should be
11921 prepared to handle these functions regardless of the type used.
11923 This example shows how the :ref:`va_arg <i_va_arg>` instruction and the
11924 variable argument handling intrinsic functions are used.
.. code-block:: llvm

    ; This struct is different for every platform. For most platforms,
    ; it is merely an i8*.
    %struct.va_list = type { i8* }

    ; For Unix x86_64 platforms, va_list is the following struct:
    ; %struct.va_list = type { i32, i32, i8*, i8* }

    define i32 @test(i32 %X, ...) {
      ; Initialize variable argument processing
      %ap = alloca %struct.va_list
      %ap2 = bitcast %struct.va_list* %ap to i8*
      call void @llvm.va_start(i8* %ap2)

      ; Read a single integer argument
      %tmp = va_arg i8* %ap2, i32

      ; Demonstrate usage of llvm.va_copy and llvm.va_end
      %aq = alloca i8*
      %aq2 = bitcast i8** %aq to i8*
      call void @llvm.va_copy(i8* %aq2, i8* %ap2)
      call void @llvm.va_end(i8* %aq2)

      ; Stop processing of arguments.
      call void @llvm.va_end(i8* %ap2)
      ret i32 %tmp
    }

    declare void @llvm.va_start(i8*)
    declare void @llvm.va_copy(i8*, i8*)
    declare void @llvm.va_end(i8*)

11961 '``llvm.va_start``' Intrinsic
11962 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
11969 declare void @llvm.va_start(i8* <arglist>)
11974 The '``llvm.va_start``' intrinsic initializes ``*<arglist>`` for
11975 subsequent use by ``va_arg``.
11980 The argument is a pointer to a ``va_list`` element to initialize.
The '``llvm.va_start``' intrinsic works just like the ``va_start`` macro
available in C. In a target-dependent way, it initializes the
``va_list`` element to which the argument points, so that the next call
to ``va_arg`` will produce the first variable argument passed to the
function. Unlike the C ``va_start`` macro, this intrinsic does not need
to know the last argument of the function as the compiler can figure
that out.
11993 '``llvm.va_end``' Intrinsic
11994 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
12001 declare void @llvm.va_end(i8* <arglist>)
12006 The '``llvm.va_end``' intrinsic destroys ``*<arglist>``, which has been
12007 initialized previously with ``llvm.va_start`` or ``llvm.va_copy``.
12012 The argument is a pointer to a ``va_list`` to destroy.
The '``llvm.va_end``' intrinsic works just like the ``va_end`` macro
available in C. In a target-dependent way, it destroys the ``va_list``
element to which the argument points. Calls to
:ref:`llvm.va_start <int_va_start>` and
:ref:`llvm.va_copy <int_va_copy>` must be matched exactly with calls to
``llvm.va_end``.
12026 '``llvm.va_copy``' Intrinsic
12027 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12034 declare void @llvm.va_copy(i8* <destarglist>, i8* <srcarglist>)
12039 The '``llvm.va_copy``' intrinsic copies the current argument position
12040 from the source argument list to the destination argument list.
12045 The first argument is a pointer to a ``va_list`` element to initialize.
12046 The second argument is a pointer to a ``va_list`` element to copy from.
The '``llvm.va_copy``' intrinsic works just like the ``va_copy`` macro
available in C. In a target-dependent way, it copies the source
``va_list`` element into the destination ``va_list`` element. This
intrinsic is necessary because the ``llvm.va_start`` intrinsic may be
arbitrarily complex and require, for example, memory allocation.
12057 Accurate Garbage Collection Intrinsics
12058 --------------------------------------
12060 LLVM's support for `Accurate Garbage Collection <GarbageCollection.html>`_
12061 (GC) requires the frontend to generate code containing appropriate intrinsic
12062 calls and select an appropriate GC strategy which knows how to lower these
12063 intrinsics in a manner which is appropriate for the target collector.
12065 These intrinsics allow identification of :ref:`GC roots on the
12066 stack <int_gcroot>`, as well as garbage collector implementations that
12067 require :ref:`read <int_gcread>` and :ref:`write <int_gcwrite>` barriers.
12068 Frontends for type-safe garbage collected languages should generate
12069 these intrinsics to make use of the LLVM garbage collectors. For more
12070 details, see `Garbage Collection with LLVM <GarbageCollection.html>`_.
LLVM provides a second, experimental set of intrinsics for describing garbage
collection safepoints in compiled code. These intrinsics are an alternative
to the ``llvm.gcroot`` intrinsics, but are compatible with the ones for
:ref:`read <int_gcread>` and :ref:`write <int_gcwrite>` barriers. The
differences in approach are covered in the `Garbage Collection with LLVM
<GarbageCollection.html>`_ documentation. The intrinsics themselves are
described in :doc:`Statepoints`.
12082 '``llvm.gcroot``' Intrinsic
12083 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
12090 declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)
12095 The '``llvm.gcroot``' intrinsic declares the existence of a GC root to
12096 the code generator, and allows some metadata to be associated with it.
The first argument specifies the address of a stack object that contains
the root pointer. The second pointer (which must be either a constant or
a global value address) contains the meta-data to be associated with the
root.
12109 At runtime, a call to this intrinsic stores a null pointer into the
12110 "ptrloc" location. At compile-time, the code generator generates
12111 information to allow the runtime to find the pointer at GC safe points.
12112 The '``llvm.gcroot``' intrinsic may only be used in a function which
12113 :ref:`specifies a GC algorithm <gc>`.
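
A minimal sketch of registering a root follows. It assumes the built-in
``"shadow-stack"`` GC strategy and passes ``null`` as the metadata;
``@allocate`` and ``@keep_alive`` are hypothetical names standing in for a
runtime allocator and a user function.

.. code-block:: llvm

      declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)
      declare i8* @allocate(i64)            ; hypothetical runtime allocator

      define void @keep_alive() gc "shadow-stack" {
      entry:
        ; Stack slot that holds the root pointer.
        %root = alloca i8*
        ; Declare the slot as a GC root with no associated metadata.
        call void @llvm.gcroot(i8** %root, i8* null)
        %obj = call i8* @allocate(i64 16)
        ; Keep the new object reachable through the root slot.
        store i8* %obj, i8** %root
        ret void
      }
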
12117 '``llvm.gcread``' Intrinsic
12118 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
12125 declare i8* @llvm.gcread(i8* %ObjPtr, i8** %Ptr)
The '``llvm.gcread``' intrinsic identifies reads of references from heap
locations, allowing garbage collector implementations that require read
barriers.
The second argument is the address to read from, which should be an
address allocated from the garbage collector. The first argument is a
pointer to the start of the referenced object, if needed by the language
runtime (otherwise null).
The '``llvm.gcread``' intrinsic has the same semantics as a load
instruction, but may be replaced with substantially more complex code by
the garbage collector runtime, as needed. The '``llvm.gcread``'
intrinsic may only be used in a function which :ref:`specifies a GC
algorithm <gc>`.
12153 '``llvm.gcwrite``' Intrinsic
12154 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12161 declare void @llvm.gcwrite(i8* %P1, i8* %Obj, i8** %P2)
12166 The '``llvm.gcwrite``' intrinsic identifies writes of references to heap
12167 locations, allowing garbage collector implementations that require write
12168 barriers (such as generational or reference counting collectors).
12173 The first argument is the reference to store, the second is the start of
12174 the object to store it to, and the third is the address of the field of
12175 Obj to store to. If the runtime does not require a pointer to the
12176 object, Obj may be null.
The '``llvm.gcwrite``' intrinsic has the same semantics as a store
instruction, but may be replaced with substantially more complex code by
the garbage collector runtime, as needed. The '``llvm.gcwrite``'
intrinsic may only be used in a function which :ref:`specifies a GC
algorithm <gc>`.
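
The read and write barrier intrinsics can be combined as in the sketch
below, which copies a reference from a field of one heap object to a field
of another. The function ``@copy_field`` and its parameter names are
hypothetical, and the ``"shadow-stack"`` strategy is assumed only so that
the function specifies a GC algorithm.

.. code-block:: llvm

      declare i8* @llvm.gcread(i8* %ObjPtr, i8** %Ptr)
      declare void @llvm.gcwrite(i8* %P1, i8* %Obj, i8** %P2)

      define void @copy_field(i8* %src_obj, i8** %src_field,
                              i8* %dst_obj, i8** %dst_field) gc "shadow-stack" {
      entry:
        ; Behaves like a load of *%src_field, subject to a read barrier.
        %ref = call i8* @llvm.gcread(i8* %src_obj, i8** %src_field)
        ; Behaves like a store of %ref to *%dst_field, subject to a write barrier.
        call void @llvm.gcwrite(i8* %ref, i8* %dst_obj, i8** %dst_field)
        ret void
      }
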
12190 'llvm.experimental.gc.statepoint' Intrinsic
12191 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12199 @llvm.experimental.gc.statepoint(i64 <id>, i32 <num patch bytes>,
12200 func_type <target>,
12201 i64 <#call args>, i64 <flags>,
12202 ... (call parameters),
12208 The statepoint intrinsic represents a call which is parse-able by the
12214 The 'id' operand is a constant integer that is reported as the ID
12215 field in the generated stackmap. LLVM does not interpret this
12216 parameter in any way and its meaning is up to the statepoint user to
12217 decide. Note that LLVM is free to duplicate code containing
12218 statepoint calls, and this may transform IR that had a unique 'id' per
12219 lexical call to statepoint to IR that does not.
If 'num patch bytes' is non-zero then the call instruction
corresponding to the statepoint is not emitted and LLVM emits 'num
patch bytes' bytes of nops in its place. LLVM will emit code to
prepare the function arguments and retrieve the function return value
in accordance with the calling convention; the former before the nop
sequence and the latter after the nop sequence. It is expected that
the user will patch over the 'num patch bytes' bytes of nops with a
calling sequence specific to their runtime before executing the
generated machine code. There are no guarantees with respect to the
alignment of the nop sequence. Unlike :doc:`StackMaps`, statepoints do
not have a concept of shadow bytes. Note that semantically the
statepoint still represents a call or invoke to 'target', and the nop
sequence after patching is expected to represent an operation
equivalent to a call or invoke to 'target'.
12236 The 'target' operand is the function actually being called. The
12237 target can be specified as either a symbolic LLVM function, or as an
12238 arbitrary Value of appropriate function type. Note that the function
12239 type must match the signature of the callee and the types of the 'call
12240 parameters' arguments.
12242 The '#call args' operand is the number of arguments to the actual
12243 call. It must exactly match the number of arguments passed in the
12244 'call parameters' variable length section.
12246 The 'flags' operand is used to specify extra information about the
12247 statepoint. This is currently only used to mark certain statepoints
12248 as GC transitions. This operand is a 64-bit integer with the following
12249 layout, where bit 0 is the least significant bit:
+-------+---------------------------------------------------+
| Bit # | Usage                                             |
+=======+===================================================+
|     0 | Set if the statepoint is a GC transition, cleared |
|       | otherwise.                                        |
+-------+---------------------------------------------------+
|  1-63 | Reserved for future use; must be cleared.         |
+-------+---------------------------------------------------+
The 'call parameters' arguments are simply the arguments which need to
be passed to the call target. They will be lowered according to the
specified calling convention and otherwise handled like a normal call
instruction. The number of arguments must exactly match what is
specified in '#call args'. The types must match the signature of
'target'.
The 'call parameters' arguments must be followed by two 'i64 0' constants.
These were originally the length prefixes for 'gc transition parameter' and
'deopt parameter' arguments, but the role of these parameter sets has been
entirely replaced by the corresponding operand bundles. In a future
revision, these now redundant arguments will be removed.
A statepoint is assumed to read and write all memory. As a result,
memory operations cannot be reordered past a statepoint. It is
illegal to mark a statepoint as being either 'readonly' or 'readnone'.

Note that legal IR cannot perform any memory operation on a 'gc
pointer' argument of the statepoint in a location statically reachable
from the statepoint. Instead, the explicitly relocated value (from a
``gc.relocate``) must be used.
12285 'llvm.experimental.gc.result' Intrinsic
12286 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12294 @llvm.experimental.gc.result(token %statepoint_token)
12299 ``gc.result`` extracts the result of the original call instruction
12300 which was replaced by the ``gc.statepoint``. The ``gc.result``
12301 intrinsic is actually a family of three intrinsics due to an
12302 implementation limitation. Other than the type of the return value,
12303 the semantics are the same.
12308 The first and only argument is the ``gc.statepoint`` which starts
12309 the safepoint sequence of which this ``gc.result`` is a part.
12310 Despite the typing of this as a generic token, *only* the value defined
12311 by a ``gc.statepoint`` is legal here.
12316 The ``gc.result`` represents the return value of the call target of
12317 the ``statepoint``. The type of the ``gc.result`` must exactly match
12318 the type of the target. If the call target returns void, there will
12319 be no ``gc.result``.
12321 A ``gc.result`` is modeled as a 'readnone' pure function. It has no
12322 side effects since it is just a projection of the return value of the
12323 previous call represented by the ``gc.statepoint``.
12325 'llvm.experimental.gc.relocate' Intrinsic
12326 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12333 declare <pointer type>
12334 @llvm.experimental.gc.relocate(token %statepoint_token,
12336 i32 %pointer_offset)
A ``gc.relocate`` returns the potentially relocated value of a pointer
at the safepoint.
The first argument is the ``gc.statepoint`` which starts the
safepoint sequence of which this ``gc.relocate`` is a part.
Despite the typing of this as a generic token, *only* the value defined
by a ``gc.statepoint`` is legal here.
12352 The second and third arguments are both indices into operands of the
12353 corresponding statepoint's :ref:`gc-live <ob_gc_live>` operand bundle.
The second argument is an index which specifies the allocation for the pointer
being relocated. The associated value must be within the object with which the
pointer being relocated is associated. The optimizer is free to change *which*
interior derived pointer is reported, provided that it does not replace an
actual base pointer with another interior derived pointer. Collectors are
allowed to rely on the base pointer operand remaining an actual base pointer if
so constructed.
The third argument is an index which specifies the (potentially) derived pointer
being relocated. It is legal for this index to be the same as the second
argument if and only if a base pointer is being relocated.
12370 The return value of ``gc.relocate`` is the potentially relocated value
12371 of the pointer specified by its arguments. It is unspecified how the
12372 value of the returned pointer relates to the argument to the
12373 ``gc.statepoint`` other than that a) it points to the same source
12374 language object with the same offset, and b) the 'based-on'
12375 relationship of the newly relocated pointers is a projection of the
12376 unrelocated pointers. In particular, the integer value of the pointer
12377 returned is unspecified.
12379 A ``gc.relocate`` is modeled as a ``readnone`` pure function. It has no
12380 side effects since it is just a way to extract information about work
12381 done during the actual call modeled by the ``gc.statepoint``.
12383 .. _gc.get.pointer.base:
12385 'llvm.experimental.gc.get.pointer.base' Intrinsic
12386 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12393 declare <pointer type>
12394 @llvm.experimental.gc.get.pointer.base(
12395 <pointer type> readnone nocapture %derived_ptr)
12396 nounwind readnone willreturn
12401 ``gc.get.pointer.base`` for a derived pointer returns its base pointer.
12406 The only argument is a pointer which is based on some object with
12407 an unknown offset from the base of said object.
12412 This intrinsic is used in the abstract machine model for GC to represent
12413 the base pointer for an arbitrary derived pointer.
This intrinsic is inlined by the :ref:`RewriteStatepointsForGC` pass by
replacing all uses of this callsite with the base pointer of the derived
pointer. The replacement is done as part of the lowering to the
explicit statepoint model.
12420 The return pointer type must be the same as the type of the parameter.
12423 'llvm.experimental.gc.get.pointer.offset' Intrinsic
12424 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12432 @llvm.experimental.gc.get.pointer.offset(
12433 <pointer type> readnone nocapture %derived_ptr)
12434 nounwind readnone willreturn
``gc.get.pointer.offset`` for a derived pointer returns the offset from its
base pointer.
12445 The only argument is a pointer which is based on some object with
12446 an unknown offset from the base of said object.
12451 This intrinsic is used in the abstract machine model for GC to represent
12452 the offset of an arbitrary derived pointer from its base pointer.
12454 This intrinsic is inlined by the :ref:`RewriteStatepointsForGC` pass by
12455 replacing all uses of this callsite with the offset of a derived pointer from
12456 its base pointer value. The replacement is done as part of the lowering to the
12457 explicit statepoint model.
In effect, this call computes the difference between the derived pointer and
its base pointer (see :ref:`gc.get.pointer.base`) after both have been cast
with ``ptrtoint``. However, performing this cast outside the
:ref:`RewriteStatepointsForGC` pass could cause the pointers to be lost during
further lowering from the abstract model to the explicit physical one.
12465 Code Generator Intrinsics
12466 -------------------------
12468 These intrinsics are provided by LLVM to expose special features that
12469 may only be implemented with code generator support.
12471 '``llvm.returnaddress``' Intrinsic
12472 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12479 declare i8* @llvm.returnaddress(i32 <level>)
12484 The '``llvm.returnaddress``' intrinsic attempts to compute a
12485 target-specific value indicating the return address of the current
12486 function or one of its callers.
12491 The argument to this intrinsic indicates which function to return the
12492 address for. Zero indicates the calling function, one indicates its
12493 caller, etc. The argument is **required** to be a constant integer
12499 The '``llvm.returnaddress``' intrinsic either returns a pointer
12500 indicating the return address of the specified call frame, or zero if it
12501 cannot be identified. The value returned by this intrinsic is likely to
12502 be incorrect or 0 for arguments other than zero, so it should only be
12503 used for debugging purposes.
12505 Note that calling this intrinsic does not prevent function inlining or
12506 other aggressive transformations, so the value returned may not be that
12507 of the obvious source-language caller.
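As a minimal sketch of the debugging use described above, the following
function queries level zero, the one level for which the result is not
described as likely to be incorrect. The function name
``@current_return_address`` is hypothetical, and ``noinline`` is added only
so that the result is not affected by inlining into a caller.

.. code-block:: llvm

      declare i8* @llvm.returnaddress(i32)

      ; Hypothetical helper: returns the address this function will return to.
      define i8* @current_return_address() noinline {
      entry:
        ; The level must be a constant integer; 0 selects the function
        ; that contains the call to the intrinsic.
        %ra = call i8* @llvm.returnaddress(i32 0)
        ret i8* %ra
      }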
12509 '``llvm.addressofreturnaddress``' Intrinsic
12510 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12517 declare i8* @llvm.addressofreturnaddress()
12522 The '``llvm.addressofreturnaddress``' intrinsic returns a target-specific
12523 pointer to the place in the stack frame where the return address of the
12524 current function is stored.
12529 Note that calling this intrinsic does not prevent function inlining or
12530 other aggressive transformations, so the value returned may not be that
12531 of the obvious source-language caller.
12533 This intrinsic is only implemented for x86 and aarch64.
12535 '``llvm.sponentry``' Intrinsic
12536 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12543 declare i8* @llvm.sponentry()
12548 The '``llvm.sponentry``' intrinsic returns the stack pointer value at
12549 the entry of the current function calling this intrinsic.
12554 Note this intrinsic is only verified on AArch64.
12556 '``llvm.frameaddress``' Intrinsic
12557 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12564 declare i8* @llvm.frameaddress(i32 <level>)
12569 The '``llvm.frameaddress``' intrinsic attempts to return the
12570 target-specific frame pointer value for the specified stack frame.
12575 The argument to this intrinsic indicates which function to return the
12576 frame pointer for. Zero indicates the calling function, one indicates
12577 its caller, etc. The argument is **required** to be a constant integer
12583 The '``llvm.frameaddress``' intrinsic either returns a pointer
12584 indicating the frame address of the specified call frame, or zero if it
12585 cannot be identified. The value returned by this intrinsic is likely to
12586 be incorrect or 0 for arguments other than zero, so it should only be
12587 used for debugging purposes.
12589 Note that calling this intrinsic does not prevent function inlining or
12590 other aggressive transformations, so the value returned may not be that
12591 of the obvious source-language caller.
12593 '``llvm.swift.async.context.addr``' Intrinsic
12594 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12601 declare i8** @llvm.swift.async.context.addr()
12606 The '``llvm.swift.async.context.addr``' intrinsic returns a pointer to
12607 the part of the extended frame record containing the asynchronous
12608 context of a Swift execution.
12613 If the caller has a ``swiftasync`` parameter, that argument will initially
12614 be stored at the returned address. If not, it will be initialized to null.
12616 '``llvm.localescape``' and '``llvm.localrecover``' Intrinsics
12617 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12624 declare void @llvm.localescape(...)
12625 declare i8* @llvm.localrecover(i8* %func, i8* %fp, i32 %idx)
12630 The '``llvm.localescape``' intrinsic escapes offsets of a collection of static
12631 allocas, and the '``llvm.localrecover``' intrinsic applies those offsets to a
12632 live frame pointer to recover the address of the allocation. The offset is
12633 computed during frame layout of the caller of ``llvm.localescape``.
12638 All arguments to '``llvm.localescape``' must be pointers to static allocas or
12639 casts of static allocas. Each function can only call '``llvm.localescape``'
12640 once, and it can only do so from the entry block.
12642 The ``func`` argument to '``llvm.localrecover``' must be a constant
12643 bitcasted pointer to a function defined in the current module. The code
12644 generator cannot determine the frame allocation offset of functions defined in
12647 The ``fp`` argument to '``llvm.localrecover``' must be a frame pointer of a
12648 call frame that is currently live. The return value of '``llvm.localaddress``'
12649 is one way to produce such a value, but various runtimes also expose a suitable
12650 pointer in platform-specific ways.
12652 The ``idx`` argument to '``llvm.localrecover``' indicates which alloca passed to
12653 '``llvm.localescape``' to recover. It is zero-indexed.
12658 These intrinsics allow a group of functions to share access to a set of local
stack allocations of one parent function. The parent function may call the
12660 '``llvm.localescape``' intrinsic once from the function entry block, and the
12661 child functions can use '``llvm.localrecover``' to access the escaped allocas.
12662 The '``llvm.localescape``' intrinsic blocks inlining, as inlining changes where
12663 the escaped allocas are allocated, which would break attempts to use
12664 '``llvm.localrecover``'.
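The parent/child arrangement described above can be sketched as follows. The
function names ``@parent`` and ``@child`` are hypothetical, and the sketch
assumes the frame pointer is obtained with ``llvm.localaddress`` and passed
to the child explicitly; a runtime could supply it in other ways.

.. code-block:: llvm

      declare void @llvm.localescape(...)
      declare i8* @llvm.localrecover(i8*, i8*, i32)
      declare i8* @llvm.localaddress()

      ; Hypothetical child: writes to the parent's escaped alloca number 0.
      define internal void @child(i8* %fp) {
      entry:
        %raw = call i8* @llvm.localrecover(i8* bitcast (void ()* @parent to i8*),
                                           i8* %fp, i32 0)
        %x = bitcast i8* %raw to i32*
        store i32 42, i32* %x
        ret void
      }

      ; Hypothetical parent: escapes one static alloca from its entry block.
      define void @parent() {
      entry:
        %x = alloca i32
        call void (...) @llvm.localescape(i32* %x)
        %fp = call i8* @llvm.localaddress()
        call void @child(i8* %fp)
        ret void
      }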
12666 '``llvm.seh.try.begin``' and '``llvm.seh.try.end``' Intrinsics
12667 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12674 declare void @llvm.seh.try.begin()
12675 declare void @llvm.seh.try.end()
12680 The '``llvm.seh.try.begin``' and '``llvm.seh.try.end``' intrinsics mark
the boundary of a _try region for Windows SEH Asynchronous Exception Handling.
When a C function is compiled with the Windows SEH Asynchronous Exception option,
-feh_asynch (aka MSVC -EHa), these two intrinsics are injected to mark the _try
boundary and to prevent potential exceptions from being moved across that boundary.
12689 Any set of operations can then be confined to the region by reading their leaf
12690 inputs via volatile loads and writing their root outputs via volatile stores.
12692 '``llvm.seh.scope.begin``' and '``llvm.seh.scope.end``' Intrinsics
12693 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12700 declare void @llvm.seh.scope.begin()
12701 declare void @llvm.seh.scope.end()
12706 The '``llvm.seh.scope.begin``' and '``llvm.seh.scope.end``' intrinsics mark
the boundary of a C++ object lifetime for Windows SEH Asynchronous Exception
12708 Handling (MSVC option -EHa).
12713 LLVM's ordinary exception-handling representation associates EH cleanups and
handlers only with ``invoke`` instructions, which normally correspond only to call sites. To
12715 support arbitrary faulting instructions, it must be possible to recover the current
12716 EH scope for any instruction. Turning every operation in LLVM that could fault
12717 into an ``invoke`` of a new, potentially-throwing intrinsic would require adding a
12718 large number of intrinsics, impede optimization of those operations, and make
12719 compilation slower by introducing many extra basic blocks. These intrinsics can
12720 be used instead to mark the region protected by a cleanup, such as for a local
12721 C++ object with a non-trivial destructor. ``llvm.seh.scope.begin`` is used to mark
12722 the start of the region; it is always called with ``invoke``, with the unwind block
12723 being the desired unwind destination for any potentially-throwing instructions
within the region. ``llvm.seh.scope.end`` is used to mark when the scope ends
12725 and the EH cleanup is no longer required (e.g. because the destructor is being
12728 .. _int_read_register:
12729 .. _int_read_volatile_register:
12730 .. _int_write_register:
12732 '``llvm.read_register``', '``llvm.read_volatile_register``', and '``llvm.write_register``' Intrinsics
12733 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12740 declare i32 @llvm.read_register.i32(metadata)
12741 declare i64 @llvm.read_register.i64(metadata)
12742 declare i32 @llvm.read_volatile_register.i32(metadata)
12743 declare i64 @llvm.read_volatile_register.i64(metadata)
declare void @llvm.write_register.i32(metadata, i32 %value)
declare void @llvm.write_register.i64(metadata, i64 %value)
12751 The '``llvm.read_register``', '``llvm.read_volatile_register``', and
12752 '``llvm.write_register``' intrinsics provide access to the named register.
12753 The register must be valid on the architecture being compiled to. The type
12754 needs to be compatible with the register being read.
12759 The '``llvm.read_register``' and '``llvm.read_volatile_register``' intrinsics
12760 return the current value of the register, where possible. The
12761 '``llvm.write_register``' intrinsic sets the current value of the register,
12764 A call to '``llvm.read_volatile_register``' is assumed to have side-effects
12765 and possibly return a different value each time (e.g. for a timer register).
12767 This is useful to implement named register global variables that need
12768 to always be mapped to a specific register, as is common practice on
12769 bare-metal programs including OS kernels.
The compiler doesn't check for register availability or for other uses of the
named register in surrounding code, including inline assembly. Because of that,
12773 allocatable registers are not supported.
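A minimal sketch of reading the stack pointer, assuming a 64-bit target on
which the register is named ``sp`` (for example AArch64). The register is
identified by a metadata string, and the function name
``@get_stack_pointer`` is hypothetical.

.. code-block:: llvm

      declare i64 @llvm.read_register.i64(metadata)

      ; Hypothetical helper: returns the current value of the stack pointer.
      define i64 @get_stack_pointer() {
      entry:
        %sp = call i64 @llvm.read_register.i64(metadata !0)
        ret i64 %sp
      }

      ; Names the register to access; "sp" is an assumption about the target.
      !0 = !{!"sp\00"}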
12775 Warning: So far it only works with the stack pointer on selected
architectures (ARM, AArch64, PowerPC and x86_64). A significant amount of
12777 work is needed to support other registers and even more so, allocatable
12782 '``llvm.stacksave``' Intrinsic
12783 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12790 declare i8* @llvm.stacksave()
12795 The '``llvm.stacksave``' intrinsic is used to remember the current state
12796 of the function stack, for use with
12797 :ref:`llvm.stackrestore <int_stackrestore>`. This is useful for
12798 implementing language features like scoped automatic variable sized
12804 This intrinsic returns an opaque pointer value that can be passed to
12805 :ref:`llvm.stackrestore <int_stackrestore>`. When an
12806 ``llvm.stackrestore`` intrinsic is executed with a value saved from
12807 ``llvm.stacksave``, it effectively restores the state of the stack to
12808 the state it was in when the ``llvm.stacksave`` intrinsic executed. In
12809 practice, this pops any :ref:`alloca <i_alloca>` blocks from the stack that
12810 were allocated after the ``llvm.stacksave`` was executed.
12812 .. _int_stackrestore:
12814 '``llvm.stackrestore``' Intrinsic
12815 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12822 declare void @llvm.stackrestore(i8* %ptr)
12827 The '``llvm.stackrestore``' intrinsic is used to restore the state of
12828 the function stack to the state it was in when the corresponding
12829 :ref:`llvm.stacksave <int_stacksave>` intrinsic executed. This is
12830 useful for implementing language features like scoped automatic variable
12831 sized arrays in C99.
12836 See the description for :ref:`llvm.stacksave <int_stacksave>`.
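The pairing of the two intrinsics around a scoped variable-sized allocation
can be sketched as follows. The names ``@scoped_buffer`` and ``@use`` are
hypothetical and stand in for frontend-generated code.

.. code-block:: llvm

      declare i8* @llvm.stacksave()
      declare void @llvm.stackrestore(i8*)
      declare void @use(i8*)   ; hypothetical consumer of the buffer

      define void @scoped_buffer(i32 %n) {
      entry:
        ; Remember the stack state before the dynamic allocation.
        %saved = call i8* @llvm.stacksave()
        %buf = alloca i8, i32 %n
        call void @use(i8* %buf)
        ; Pop %buf (and anything else allocated since the save).
        call void @llvm.stackrestore(i8* %saved)
        ret void
      }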
12838 .. _int_get_dynamic_area_offset:
12840 '``llvm.get.dynamic.area.offset``' Intrinsic
12841 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12848 declare i32 @llvm.get.dynamic.area.offset.i32()
12849 declare i64 @llvm.get.dynamic.area.offset.i64()
12854 The '``llvm.get.dynamic.area.offset.*``' intrinsic family is used to
12855 get the offset from native stack pointer to the address of the most
12856 recent dynamic alloca on the caller's stack. These intrinsics are
12857 intended for use in combination with
12858 :ref:`llvm.stacksave <int_stacksave>` to get a
12859 pointer to the most recent dynamic alloca. This is useful, for example,
12860 for AddressSanitizer's stack unpoisoning routines.
These intrinsics return a non-negative integer value that can be used to
get the address of the most recent dynamic alloca, allocated by :ref:`alloca <i_alloca>`
on the caller's stack. In particular, for targets where the stack grows downwards,
adding this offset to the native stack pointer yields the address of the most
recent dynamic alloca. For targets where the stack grows upwards, the situation is a bit more
complicated, because subtracting this value from the stack pointer yields the address
one past the end of the most recent dynamic alloca.
Although for most targets :ref:`llvm.get.dynamic.area.offset <int_get_dynamic_area_offset>`
returns just a zero, for others, such as PowerPC and PowerPC64, it returns a
compile-time-known constant value.
12877 The return value type of :ref:`llvm.get.dynamic.area.offset <int_get_dynamic_area_offset>`
12878 must match the target's default address space's (address space 0) pointer type.
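A rough sketch of the combination with ``llvm.stacksave`` described above,
assuming a target with 64-bit pointers in address space 0 and a stack that
grows downwards. The names ``@locate_dynamic_alloca`` and ``@use`` are
hypothetical.

.. code-block:: llvm

      declare i8* @llvm.stacksave()
      declare i64 @llvm.get.dynamic.area.offset.i64()
      declare void @use(i8*)   ; hypothetical consumer

      define void @locate_dynamic_alloca(i64 %n) {
      entry:
        %buf = alloca i8, i64 %n
        ; Native stack pointer after the dynamic allocation.
        %sp = call i8* @llvm.stacksave()
        ; Offset from the stack pointer to the most recent dynamic alloca.
        %off = call i64 @llvm.get.dynamic.area.offset.i64()
        %recent = getelementptr i8, i8* %sp, i64 %off
        call void @use(i8* %recent)
        ret void
      }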
12880 '``llvm.prefetch``' Intrinsic
12881 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12888 declare void @llvm.prefetch(i8* <address>, i32 <rw>, i32 <locality>, i32 <cache type>)
12893 The '``llvm.prefetch``' intrinsic is a hint to the code generator to
12894 insert a prefetch instruction if supported; otherwise, it is a noop.
12895 Prefetches have no effect on the behavior of the program but can change
12896 its performance characteristics.
12901 ``address`` is the address to be prefetched, ``rw`` is the specifier
12902 determining if the fetch should be for a read (0) or write (1), and
``locality`` is a temporal locality specifier ranging from (0), no
locality, to (3), extremely local (keep in cache). The ``cache type``
12905 specifies whether the prefetch is performed on the data (1) or
12906 instruction (0) cache. The ``rw``, ``locality`` and ``cache type``
12907 arguments must be constant integers.
12912 This intrinsic does not modify the behavior of the program. In
12913 particular, prefetches cannot trap and do not produce a value. On
12914 targets that support this intrinsic, the prefetch can provide hints to
12915 the processor cache for better performance.
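As an illustration of the argument encoding, the following sketch requests a
read prefetch into the data cache with maximum temporal locality. It uses the
declaration exactly as shown in this section; the function name
``@prefetch_for_read`` is hypothetical.

.. code-block:: llvm

      declare void @llvm.prefetch(i8*, i32, i32, i32)

      define void @prefetch_for_read(i8* %p) {
      entry:
        ; rw = 0 (read), locality = 3 (keep in cache), cache type = 1 (data).
        ; All three must be constant integers.
        call void @llvm.prefetch(i8* %p, i32 0, i32 3, i32 1)
        ret void
      }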
12917 '``llvm.pcmarker``' Intrinsic
12918 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12925 declare void @llvm.pcmarker(i32 <id>)
12930 The '``llvm.pcmarker``' intrinsic is a method to export a Program
12931 Counter (PC) in a region of code to simulators and other tools. The
12932 method is target specific, but it is expected that the marker will use
12933 exported symbols to transmit the PC of the marker. The marker makes no
12934 guarantees that it will remain with any specific instruction after
12935 optimizations. It is possible that the presence of a marker will inhibit
12936 optimizations. The intended use is to be inserted after optimizations to
12937 allow correlations of simulation runs.
12942 ``id`` is a numerical id identifying the marker.
12947 This intrinsic does not modify the behavior of the program. Backends
12948 that do not support this intrinsic may ignore it.
12950 '``llvm.readcyclecounter``' Intrinsic
12951 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12958 declare i64 @llvm.readcyclecounter()
12963 The '``llvm.readcyclecounter``' intrinsic provides access to the cycle
12964 counter register (or similar low latency, high accuracy clocks) on those
12965 targets that support it. On X86, it should map to RDTSC. On Alpha, it
12966 should map to RPCC. As the backing counters overflow quickly (on the
12967 order of 9 seconds on alpha), this should only be used for small
12973 When directly supported, reading the cycle counter should not modify any
12974 memory. Implementations are allowed to either return an application
12975 specific value or a system wide value. On backends without support, this
12976 is lowered to a constant 0.
Note that runtime support may be conditional on the privilege level at which
the code is running and on the host platform.
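A minimal sketch of timing a short region with the cycle counter. The names
``@time_work`` and ``@work`` are hypothetical; on backends without support
both reads are lowered to 0, so the result is 0.

.. code-block:: llvm

      declare i64 @llvm.readcyclecounter()
      declare void @work()   ; hypothetical code being measured

      define i64 @time_work() {
      entry:
        %start = call i64 @llvm.readcyclecounter()
        call void @work()
        %end = call i64 @llvm.readcyclecounter()
        ; Elapsed counter ticks; only meaningful for short measurements.
        %elapsed = sub i64 %end, %start
        ret i64 %elapsed
      }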
12981 '``llvm.clear_cache``' Intrinsic
12982 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12989 declare void @llvm.clear_cache(i8*, i8*)
12994 The '``llvm.clear_cache``' intrinsic ensures visibility of modifications
12995 in the specified range to the execution unit of the processor. On
12996 targets with non-unified instruction and data cache, the implementation
12997 flushes the instruction cache.
13002 On platforms with coherent instruction and data caches (e.g. x86), this
13003 intrinsic is a nop. On platforms with non-coherent instruction and data
13004 cache (e.g. ARM, MIPS), the intrinsic is lowered either to appropriate
13005 instructions or a system call, if cache flushing requires special
13008 The default behavior is to emit a call to ``__clear_cache`` from the run
13011 This intrinsic does *not* empty the instruction pipeline. Modifications
13012 of the current function are outside the scope of the intrinsic.
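One way this might be used is after writing freshly generated code into a
buffer, before that code is executed. The sketch below assumes the caller
passes the start and end addresses of the modified range; the function name
``@publish_code`` is hypothetical.

.. code-block:: llvm

      declare void @llvm.clear_cache(i8*, i8*)

      define void @publish_code(i8* %begin, i8* %end) {
      entry:
        ; Make modifications in the range visible to instruction fetch.
        call void @llvm.clear_cache(i8* %begin, i8* %end)
        ret void
      }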
13014 '``llvm.instrprof.increment``' Intrinsic
13015 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13022 declare void @llvm.instrprof.increment(i8* <name>, i64 <hash>,
13023 i32 <num-counters>, i32 <index>)
13028 The '``llvm.instrprof.increment``' intrinsic can be emitted by a
13029 frontend for use with instrumentation based profiling. These will be
13030 lowered by the ``-instrprof`` pass to generate execution counts of a
13031 program at runtime.
13036 The first argument is a pointer to a global variable containing the
13037 name of the entity being instrumented. This should generally be the
13038 (mangled) function name for a set of counters.
13040 The second argument is a hash value that can be used by the consumer
13041 of the profile data to detect changes to the instrumented source, and
13042 the third is the number of counters associated with ``name``. It is an
13043 error if ``hash`` or ``num-counters`` differ between two instances of
13044 ``instrprof.increment`` that refer to the same name.
13046 The last argument refers to which of the counters for ``name`` should
be incremented. It should be at least 0 and less than ``num-counters``.
13052 This intrinsic represents an increment of a profiling counter. It will
13053 cause the ``-instrprof`` pass to generate the appropriate data
13054 structures and the code to increment the appropriate value, in a
13055 format that can be written out by a compiler runtime and consumed via
13056 the ``llvm-profdata`` tool.
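A sketch of what a frontend might emit for a function with a single counter.
The global name ``@__profn_foo`` and the hash value ``0`` are placeholders
chosen for illustration, not values required by the intrinsic.

.. code-block:: llvm

      ; Name of the instrumented entity (placeholder global).
      @__profn_foo = private constant [3 x i8] c"foo"

      declare void @llvm.instrprof.increment(i8*, i64, i32, i32)

      define void @foo() {
      entry:
        ; name = "foo", hash = 0 (placeholder), num-counters = 1, index = 0.
        call void @llvm.instrprof.increment(
            i8* getelementptr inbounds ([3 x i8], [3 x i8]* @__profn_foo, i32 0, i32 0),
            i64 0, i32 1, i32 0)
        ret void
      }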
13058 '``llvm.instrprof.increment.step``' Intrinsic
13059 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13066 declare void @llvm.instrprof.increment.step(i8* <name>, i64 <hash>,
13067 i32 <num-counters>,
13068 i32 <index>, i64 <step>)
13073 The '``llvm.instrprof.increment.step``' intrinsic is an extension to
13074 the '``llvm.instrprof.increment``' intrinsic with an additional fifth
13075 argument to specify the step of the increment.
13079 The first four arguments are the same as '``llvm.instrprof.increment``'
13082 The last argument specifies the value of the increment of the counter variable.
13086 See description of '``llvm.instrprof.increment``' intrinsic.
13089 '``llvm.instrprof.value.profile``' Intrinsic
13090 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13097 declare void @llvm.instrprof.value.profile(i8* <name>, i64 <hash>,
13098 i64 <value>, i32 <value_kind>,
13104 The '``llvm.instrprof.value.profile``' intrinsic can be emitted by a
13105 frontend for use with instrumentation based profiling. This will be
lowered by the ``-instrprof`` pass to find out the target values that
instrumented expressions take in a program at runtime.
13112 The first argument is a pointer to a global variable containing the
13113 name of the entity being instrumented. ``name`` should generally be the
13114 (mangled) function name for a set of counters.
13116 The second argument is a hash value that can be used by the consumer
13117 of the profile data to detect changes to the instrumented source. It
13118 is an error if ``hash`` differs between two instances of
13119 ``llvm.instrprof.*`` that refer to the same name.
13121 The third argument is the value of the expression being profiled. The profiled
13122 expression's value should be representable as an unsigned 64-bit value. The
13123 fourth argument represents the kind of value profiling that is being done. The
13124 supported value profiling kinds are enumerated through the
13125 ``InstrProfValueKind`` type declared in the
13126 ``<include/llvm/ProfileData/InstrProf.h>`` header file. The last argument is the
13127 index of the instrumented expression within ``name``. It should be >= 0.
13132 This intrinsic represents the point where a call to a runtime routine
should be inserted for value profiling of target expressions. The ``-instrprof``
13134 pass will generate the appropriate data structures and replace the
13135 ``llvm.instrprof.value.profile`` intrinsic with the call to the profile
13136 runtime library with proper arguments.
13138 '``llvm.thread.pointer``' Intrinsic
13139 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13146 declare i8* @llvm.thread.pointer()
13151 The '``llvm.thread.pointer``' intrinsic returns the value of the thread
13157 The '``llvm.thread.pointer``' intrinsic returns a pointer to the TLS area
13158 for the current thread. The exact semantics of this value are target
13159 specific: it may point to the start of TLS area, to the end, or somewhere
13160 in the middle. Depending on the target, this intrinsic may read a register,
13161 call a helper function, read from an alternate memory space, or perform
13162 other operations necessary to locate the TLS area. Not all targets support
13165 '``llvm.call.preallocated.setup``' Intrinsic
13166 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13173 declare token @llvm.call.preallocated.setup(i32 %num_args)
13178 The '``llvm.call.preallocated.setup``' intrinsic returns a token which can
13179 be used with a call's ``"preallocated"`` operand bundle to indicate that
13180 certain arguments are allocated and initialized before the call.
13185 The '``llvm.call.preallocated.setup``' intrinsic returns a token which is
13186 associated with at most one call. The token can be passed to
'``@llvm.call.preallocated.arg``' to get a pointer to the
13188 corresponding argument. The token must be the parameter to a
13189 ``"preallocated"`` operand bundle for the corresponding call.
13191 Nested calls to '``llvm.call.preallocated.setup``' are allowed, but must
13192 be properly nested. e.g.
.. code-block:: llvm

      %t1 = call token @llvm.call.preallocated.setup(i32 0)
      %t2 = call token @llvm.call.preallocated.setup(i32 0)
      call void @foo() ["preallocated"(token %t2)]
      call void @foo() ["preallocated"(token %t1)]

13201 is allowed, but not
.. code-block:: llvm

      %t1 = call token @llvm.call.preallocated.setup(i32 0)
      %t2 = call token @llvm.call.preallocated.setup(i32 0)
      call void @foo() ["preallocated"(token %t1)]
      call void @foo() ["preallocated"(token %t2)]

13210 .. _int_call_preallocated_arg:
13212 '``llvm.call.preallocated.arg``' Intrinsic
13213 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13220 declare i8* @llvm.call.preallocated.arg(token %setup_token, i32 %arg_index)
13225 The '``llvm.call.preallocated.arg``' intrinsic returns a pointer to the
13226 corresponding preallocated argument for the preallocated call.
13231 The '``llvm.call.preallocated.arg``' intrinsic returns a pointer to the
``%arg_index``\ th argument with the ``preallocated`` attribute for
13233 the call associated with the ``%setup_token``, which must be from
13234 '``llvm.call.preallocated.setup``'.
13236 A call to '``llvm.call.preallocated.arg``' must have a call site
13237 ``preallocated`` attribute. The type of the ``preallocated`` attribute must
13238 match the type used by the ``preallocated`` attribute of the corresponding
13239 argument at the preallocated call. The type is used in the case that an
13240 ``llvm.call.preallocated.setup`` does not have a corresponding call (e.g. due
13241 to DCE), where otherwise we cannot know how large the arguments are.
13243 It is undefined behavior if this is called with a token from an
13244 '``llvm.call.preallocated.setup``' if another
13245 '``llvm.call.preallocated.setup``' has already been called or if the
13246 preallocated call corresponding to the '``llvm.call.preallocated.setup``'
13247 has already been called.
13249 .. _int_call_preallocated_teardown:
13251 '``llvm.call.preallocated.teardown``' Intrinsic
13252 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
declare void @llvm.call.preallocated.teardown(token %setup_token)
13264 The '``llvm.call.preallocated.teardown``' intrinsic cleans up the stack
13265 created by a '``llvm.call.preallocated.setup``'.
The token argument must be the result of a call to '``llvm.call.preallocated.setup``'.
13272 The '``llvm.call.preallocated.teardown``' intrinsic cleans up the stack
13273 allocated by the corresponding '``llvm.call.preallocated.setup``'. Exactly
13274 one of this or the preallocated call must be called to prevent stack leaks.
13275 It is undefined behavior to call both a '``llvm.call.preallocated.teardown``'
13276 and the preallocated call for a given '``llvm.call.preallocated.setup``'.
13278 For example, if the stack is allocated for a preallocated call by a
13279 '``llvm.call.preallocated.setup``', then an initializer function called on an
13280 allocated argument throws an exception, there should be a
13281 '``llvm.call.preallocated.teardown``' in the exception handler to prevent
13284 Following the nesting rules in '``llvm.call.preallocated.setup``', nested
13285 calls to '``llvm.call.preallocated.setup``' and
13286 '``llvm.call.preallocated.teardown``' are allowed but must be properly
13292 .. code-block:: llvm
13294 %cs = call token @llvm.call.preallocated.setup(i32 1)
13295 %x = call i8* @llvm.call.preallocated.arg(token %cs, i32 0) preallocated(i32)
13296 %y = bitcast i8* %x to i32*
13297 invoke void @constructor(i32* %y) to label %conta unwind label %contb
13299 call void @foo1(i32* preallocated(i32) %y) ["preallocated"(token %cs)]
13302 %s = catchswitch within none [label %catch] unwind to caller
13304 %p = catchpad within %s []
13305 call void @llvm.call.preallocated.teardown(token %cs)
13308 Standard C/C++ Library Intrinsics
13309 ---------------------------------
13311 LLVM provides intrinsics for a few important standard C/C++ library
13312 functions. These intrinsics allow source-language front-ends to pass
13313 information about the alignment of the pointer arguments to the code
13314 generator, providing opportunity for more efficient code generation.
13317 '``llvm.abs.*``' Intrinsic
13318 ^^^^^^^^^^^^^^^^^^^^^^^^^^
13323 This is an overloaded intrinsic. You can use ``llvm.abs`` on any
13324 integer bit width or any vector of integer elements.
13328 declare i32 @llvm.abs.i32(i32 <src>, i1 <is_int_min_poison>)
13329 declare <4 x i32> @llvm.abs.v4i32(<4 x i32> <src>, i1 <is_int_min_poison>)
13334 The '``llvm.abs``' family of intrinsic functions returns the absolute value
13340 The first argument is the value for which the absolute value is to be returned.
13341 This argument may be of any integer type or a vector with integer element type.
13342 The return type must match the first argument type.
13344 The second argument must be a constant and is a flag to indicate whether the
13345 result value of the '``llvm.abs``' intrinsic is a
13346 :ref:`poison value <poisonvalues>` if the argument is statically or dynamically
13347 an ``INT_MIN`` value.
The '``llvm.abs``' intrinsic returns the magnitude (non-negative) of the
argument or each element of a vector argument. If the argument is ``INT_MIN``,
13354 then the result is also ``INT_MIN`` if ``is_int_min_poison == 0`` and
13355 ``poison`` otherwise.
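The effect of the flag can be sketched with two wrappers (both function names
are hypothetical). They differ only in what happens when ``%x`` is
``INT_MIN``.

.. code-block:: llvm

      declare i32 @llvm.abs.i32(i32, i1)

      ; is_int_min_poison = false: abs(INT_MIN) yields INT_MIN.
      define i32 @abs_wrapping(i32 %x) {
      entry:
        %r = call i32 @llvm.abs.i32(i32 %x, i1 false)
        ret i32 %r
      }

      ; is_int_min_poison = true: abs(INT_MIN) yields poison.
      define i32 @abs_poison_on_min(i32 %x) {
      entry:
        %r = call i32 @llvm.abs.i32(i32 %x, i1 true)
        ret i32 %r
      }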
13358 '``llvm.smax.*``' Intrinsic
13359 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13364 This is an overloaded intrinsic. You can use ``@llvm.smax`` on any
13365 integer bit width or any vector of integer elements.
13369 declare i32 @llvm.smax.i32(i32 %a, i32 %b)
13370 declare <4 x i32> @llvm.smax.v4i32(<4 x i32> %a, <4 x i32> %b)
13375 Return the larger of ``%a`` and ``%b`` comparing the values as signed integers.
13376 Vector intrinsics operate on a per-element basis. The larger element of ``%a``
13377 and ``%b`` at a given index is returned for that index.
13382 The arguments (``%a`` and ``%b``) may be of any integer type or a vector with
13383 integer element type. The argument types must match each other, and the return
13384 type must match the argument type.
13387 '``llvm.smin.*``' Intrinsic
13388 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13393 This is an overloaded intrinsic. You can use ``@llvm.smin`` on any
13394 integer bit width or any vector of integer elements.
13398 declare i32 @llvm.smin.i32(i32 %a, i32 %b)
13399 declare <4 x i32> @llvm.smin.v4i32(<4 x i32> %a, <4 x i32> %b)
13404 Return the smaller of ``%a`` and ``%b`` comparing the values as signed integers.
13405 Vector intrinsics operate on a per-element basis. The smaller element of ``%a``
13406 and ``%b`` at a given index is returned for that index.
13411 The arguments (``%a`` and ``%b``) may be of any integer type or a vector with
13412 integer element type. The argument types must match each other, and the return
13413 type must match the argument type.
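As a small illustration, the two signed intrinsics can be combined to clamp a
value to a range. The function name ``@clamp_to_byte`` and the bounds 0 and
255 are chosen only for this sketch.

.. code-block:: llvm

      declare i32 @llvm.smax.i32(i32, i32)
      declare i32 @llvm.smin.i32(i32, i32)

      ; Clamps %x to the range [0, 255] using signed comparisons.
      define i32 @clamp_to_byte(i32 %x) {
      entry:
        %lo = call i32 @llvm.smax.i32(i32 %x, i32 0)     ; raise negatives to 0
        %r = call i32 @llvm.smin.i32(i32 %lo, i32 255)   ; cap at 255
        ret i32 %r
      }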
13416 '``llvm.umax.*``' Intrinsic
13417 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13422 This is an overloaded intrinsic. You can use ``@llvm.umax`` on any
13423 integer bit width or any vector of integer elements.
13427 declare i32 @llvm.umax.i32(i32 %a, i32 %b)
13428 declare <4 x i32> @llvm.umax.v4i32(<4 x i32> %a, <4 x i32> %b)
13433 Return the larger of ``%a`` and ``%b`` comparing the values as unsigned
13434 integers. Vector intrinsics operate on a per-element basis. The larger element
13435 of ``%a`` and ``%b`` at a given index is returned for that index.
13440 The arguments (``%a`` and ``%b``) may be of any integer type or a vector with
13441 integer element type. The argument types must match each other, and the return
13442 type must match the argument type.
13445 '``llvm.umin.*``' Intrinsic
13446 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13451 This is an overloaded intrinsic. You can use ``@llvm.umin`` on any
13452 integer bit width or any vector of integer elements.
13456 declare i32 @llvm.umin.i32(i32 %a, i32 %b)
13457 declare <4 x i32> @llvm.umin.v4i32(<4 x i32> %a, <4 x i32> %b)
13462 Return the smaller of ``%a`` and ``%b`` comparing the values as unsigned
13463 integers. Vector intrinsics operate on a per-element basis. The smaller element
13464 of ``%a`` and ``%b`` at a given index is returned for that index.
13469 The arguments (``%a`` and ``%b``) may be of any integer type or a vector with
13470 integer element type. The argument types must match each other, and the return
13471 type must match the argument type.
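The difference between the signed and unsigned variants shows up when the
sign bit is set. In this sketch (function names are hypothetical) the same
operands give different results because ``i32 -1`` is the largest value when
interpreted as unsigned.

.. code-block:: llvm

      declare i32 @llvm.smax.i32(i32, i32)
      declare i32 @llvm.umax.i32(i32, i32)

      ; Signed comparison: -1 < 1, so the result is 1.
      define i32 @signed_max() {
      entry:
        %r = call i32 @llvm.smax.i32(i32 -1, i32 1)
        ret i32 %r
      }

      ; Unsigned comparison: -1 is 0xFFFFFFFF, so the result is -1.
      define i32 @unsigned_max() {
      entry:
        %r = call i32 @llvm.umax.i32(i32 -1, i32 1)
        ret i32 %r
      }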
13476 '``llvm.memcpy``' Intrinsic
13477 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
13482 This is an overloaded intrinsic. You can use ``llvm.memcpy`` on any
13483 integer bit width and for different address spaces. Not all targets
13484 support all bit widths however.
13488 declare void @llvm.memcpy.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
13489 i32 <len>, i1 <isvolatile>)
13490 declare void @llvm.memcpy.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
13491 i64 <len>, i1 <isvolatile>)
13496 The '``llvm.memcpy.*``' intrinsics copy a block of memory from the
13497 source location to the destination location.
Note that, unlike the standard libc function, the ``llvm.memcpy.*``
intrinsics do not return a value, take an extra ``isvolatile``
argument, and the pointers can be in specified address spaces.
13506 The first argument is a pointer to the destination, the second is a
13507 pointer to the source. The third argument is an integer argument
13508 specifying the number of bytes to copy, and the fourth is a
13509 boolean indicating a volatile access.
13511 The :ref:`align <attr_align>` parameter attribute can be provided
13512 for the first and second arguments.
13514 If the ``isvolatile`` parameter is ``true``, the ``llvm.memcpy`` call is
13515 a :ref:`volatile operation <volatile>`. The detailed access behavior is not
13516 very cleanly specified and it is unwise to depend on it.
The '``llvm.memcpy.*``' intrinsics copy a block of memory from the source
location to the destination location, which must either be equal or
non-overlapping. They copy ``<len>`` bytes of memory. If an argument is known
to be aligned to some boundary, this can be specified as an attribute on the
argument.
If ``<len>`` is 0, the call is a no-op modulo the behavior of attributes
attached to the arguments.
If ``<len>`` is not a well-defined value, the behavior is undefined.
If ``<len>`` is not zero, both ``<dest>`` and ``<src>`` should be well-defined,
otherwise the behavior is undefined.
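A minimal sketch of a call, assuming a hypothetical function ``@copy16`` whose
``%dst`` and ``%src`` pointers are known to be 4-byte aligned and to refer to
non-overlapping regions of at least 16 bytes:

.. code-block:: llvm

   declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)

   define void @copy16(i8* %dst, i8* %src) {
     ; Copy 16 bytes. The alignment is given as a parameter attribute on each
     ; pointer argument, and the final i1 false marks the copy as non-volatile.
     call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %dst, i8* align 4 %src,
                                          i64 16, i1 false)
     ret void
   }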
13533 .. _int_memcpy_inline:
13535 '``llvm.memcpy.inline``' Intrinsic
13536 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13541 This is an overloaded intrinsic. You can use ``llvm.memcpy.inline`` on any
13542 integer bit width and for different address spaces. Not all targets
13543 support all bit widths however.
13547 declare void @llvm.memcpy.inline.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
13548 i32 <len>, i1 <isvolatile>)
13549 declare void @llvm.memcpy.inline.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
13550 i64 <len>, i1 <isvolatile>)
The '``llvm.memcpy.inline.*``' intrinsics copy a block of memory from the
source location to the destination location and guarantee that no external
functions are called.
Note that, unlike the standard libc function, the ``llvm.memcpy.inline.*``
intrinsics do not return a value, take an extra ``isvolatile``
argument, and the pointers can be in specified address spaces.
13566 The first argument is a pointer to the destination, the second is a
13567 pointer to the source. The third argument is a constant integer argument
13568 specifying the number of bytes to copy, and the fourth is a
13569 boolean indicating a volatile access.
13571 The :ref:`align <attr_align>` parameter attribute can be provided
13572 for the first and second arguments.
13574 If the ``isvolatile`` parameter is ``true``, the ``llvm.memcpy.inline`` call is
13575 a :ref:`volatile operation <volatile>`. The detailed access behavior is not
13576 very cleanly specified and it is unwise to depend on it.
The '``llvm.memcpy.inline.*``' intrinsics copy a block of memory from the
source location to the destination location, which are not allowed to
overlap. They copy ``<len>`` bytes of memory. If an argument is known
to be aligned to some boundary, this can be specified as an attribute on
the argument.
13586 The behavior of '``llvm.memcpy.inline.*``' is equivalent to the behavior of
13587 '``llvm.memcpy.*``', but the generated code is guaranteed not to call any
13588 external functions.
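For illustration, a call might look like the sketch below. The function name
``@copy32_inline`` is hypothetical, and the two regions are assumed not to
overlap. Note that the length operand is a constant:

.. code-block:: llvm

   declare void @llvm.memcpy.inline.p0i8.p0i8.i64(i8*, i8*, i64, i1)

   define void @copy32_inline(i8* %dst, i8* %src) {
     ; The third argument must be a constant integer (here 32 bytes).
     call void @llvm.memcpy.inline.p0i8.p0i8.i64(i8* %dst, i8* %src,
                                                 i64 32, i1 false)
     ret void
   }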
13592 '``llvm.memmove``' Intrinsic
13593 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.memmove`` on any integer
bit width and for different address spaces. Not all targets support all
bit widths however.
13604 declare void @llvm.memmove.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
13605 i32 <len>, i1 <isvolatile>)
13606 declare void @llvm.memmove.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
13607 i64 <len>, i1 <isvolatile>)
The '``llvm.memmove.*``' intrinsics move a block of memory from the
source location to the destination location. They are similar to the
'``llvm.memcpy``' intrinsic but allow the two memory locations to
overlap.
Note that, unlike the standard libc function, the ``llvm.memmove.*``
intrinsics do not return a value, take an extra ``isvolatile``
argument, and the pointers can be in specified address spaces.
13624 The first argument is a pointer to the destination, the second is a
13625 pointer to the source. The third argument is an integer argument
13626 specifying the number of bytes to copy, and the fourth is a
13627 boolean indicating a volatile access.
13629 The :ref:`align <attr_align>` parameter attribute can be provided
13630 for the first and second arguments.
13632 If the ``isvolatile`` parameter is ``true``, the ``llvm.memmove`` call
13633 is a :ref:`volatile operation <volatile>`. The detailed access behavior is
13634 not very cleanly specified and it is unwise to depend on it.
The '``llvm.memmove.*``' intrinsics copy a block of memory from the
source location to the destination location, which may overlap. They
copy ``<len>`` bytes of memory. If an argument is known to be
aligned to some boundary, this can be specified as an attribute on
the argument.
If ``<len>`` is 0, the call is a no-op modulo the behavior of attributes
attached to the arguments.
If ``<len>`` is not a well-defined value, the behavior is undefined.
If ``<len>`` is not zero, both ``<dest>`` and ``<src>`` should be well-defined,
otherwise the behavior is undefined.
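Because the regions may overlap, ``llvm.memmove`` can shift data within a
single buffer. The sketch below assumes a hypothetical function
``@shift_right_by_one`` operating on an 8-byte array:

.. code-block:: llvm

   declare void @llvm.memmove.p0i8.p0i8.i64(i8*, i8*, i64, i1)

   define void @shift_right_by_one([8 x i8]* %buf) {
     ; %src points at element 0 and %dst at element 1, so the 7-byte source
     ; and destination ranges overlap in 6 bytes.
     %src = getelementptr inbounds [8 x i8], [8 x i8]* %buf, i64 0, i64 0
     %dst = getelementptr inbounds [8 x i8], [8 x i8]* %buf, i64 0, i64 1
     call void @llvm.memmove.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 7, i1 false)
     ret void
   }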
13653 '``llvm.memset.*``' Intrinsics
13654 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.memset`` on any integer
bit width and for different address spaces. However, not all targets
support all bit widths.
13665 declare void @llvm.memset.p0i8.i32(i8* <dest>, i8 <val>,
13666 i32 <len>, i1 <isvolatile>)
13667 declare void @llvm.memset.p0i8.i64(i8* <dest>, i8 <val>,
13668 i64 <len>, i1 <isvolatile>)
13673 The '``llvm.memset.*``' intrinsics fill a block of memory with a
13674 particular byte value.
13676 Note that, unlike the standard libc function, the ``llvm.memset``
13677 intrinsic does not return a value and takes an extra volatile
13678 argument. Also, the destination can be in an arbitrary address space.
13683 The first argument is a pointer to the destination to fill, the second
13684 is the byte value with which to fill it, the third argument is an
13685 integer argument specifying the number of bytes to fill, and the fourth
13686 is a boolean indicating a volatile access.
The :ref:`align <attr_align>` parameter attribute can be provided
for the first argument.
13691 If the ``isvolatile`` parameter is ``true``, the ``llvm.memset`` call is
13692 a :ref:`volatile operation <volatile>`. The detailed access behavior is not
13693 very cleanly specified and it is unwise to depend on it.
The '``llvm.memset.*``' intrinsics fill ``<len>`` bytes of memory starting
at the destination location. If the argument is known to be
aligned to some boundary, this can be specified as an attribute on
the argument.
If ``<len>`` is 0, the call is a no-op modulo the behavior of attributes
attached to the arguments.
If ``<len>`` is not a well-defined value, the behavior is undefined.
If ``<len>`` is not zero, ``<dest>`` should be well-defined,
otherwise the behavior is undefined.
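As a small sketch, the following zero-fills 64 bytes. The function name
``@zero64`` and the 8-byte alignment of ``%p`` are assumptions made for the
example:

.. code-block:: llvm

   declare void @llvm.memset.p0i8.i64(i8*, i8, i64, i1)

   define void @zero64(i8* %p) {
     ; Fill 64 bytes starting at %p with the byte value 0, non-volatile.
     call void @llvm.memset.p0i8.i64(i8* align 8 %p, i8 0, i64 64, i1 false)
     ret void
   }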
13709 '``llvm.sqrt.*``' Intrinsic
13710 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.sqrt`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
13721 declare float @llvm.sqrt.f32(float %Val)
13722 declare double @llvm.sqrt.f64(double %Val)
13723 declare x86_fp80 @llvm.sqrt.f80(x86_fp80 %Val)
13724 declare fp128 @llvm.sqrt.f128(fp128 %Val)
13725 declare ppc_fp128 @llvm.sqrt.ppcf128(ppc_fp128 %Val)
13730 The '``llvm.sqrt``' intrinsics return the square root of the specified value.
13735 The argument and return value are floating-point numbers of the same type.
13740 Return the same value as a corresponding libm '``sqrt``' function but without
13741 trapping or setting ``errno``. For types specified by IEEE-754, the result
13742 matches a conforming libm implementation.
13744 When specified with the fast-math-flag 'afn', the result may be approximated
13745 using a less accurate calculation.
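The effect of the ``afn`` flag is per call site, as in this sketch (the
function name ``@sqrt_demo`` is hypothetical):

.. code-block:: llvm

   declare float @llvm.sqrt.f32(float)

   define float @sqrt_demo(float %x) {
     ; Without flags, the result matches a conforming libm sqrt for IEEE-754 types.
     %exact = call float @llvm.sqrt.f32(float %x)
     ; With afn, the result may be computed by a less accurate approximation.
     %approx = call afn float @llvm.sqrt.f32(float %x)
     %sum = fadd float %exact, %approx
     ret float %sum
   }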
13747 '``llvm.powi.*``' Intrinsic
13748 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.powi`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
13757 Generally, the only supported type for the exponent is the one matching
13758 with the C type ``int``.
13762 declare float @llvm.powi.f32.i32(float %Val, i32 %power)
13763 declare double @llvm.powi.f64.i16(double %Val, i16 %power)
13764 declare x86_fp80 @llvm.powi.f80.i32(x86_fp80 %Val, i32 %power)
13765 declare fp128 @llvm.powi.f128.i32(fp128 %Val, i32 %power)
13766 declare ppc_fp128 @llvm.powi.ppcf128.i32(ppc_fp128 %Val, i32 %power)
13771 The '``llvm.powi.*``' intrinsics return the first operand raised to the
13772 specified (positive or negative) power. The order of evaluation of
13773 multiplications is not defined. When a vector of floating-point type is
13774 used, the second argument remains a scalar integer value.
13779 The second argument is an integer power, and the first is a value to
13780 raise to that power.
13785 This function returns the first value raised to the second power with an
13786 unspecified sequence of rounding operations.
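A brief sketch of both the scalar and the vector form follows. The function
names are hypothetical, and the vector intrinsic name
``@llvm.powi.v4f32.i32`` is an assumption that the name mangling follows the
same pattern as the scalar declarations above:

.. code-block:: llvm

   declare double @llvm.powi.f64.i32(double, i32)
   declare <4 x float> @llvm.powi.v4f32.i32(<4 x float>, i32)

   define double @cube(double %x) {
     ; Raises %x to the third power; the multiplication order is unspecified.
     %r = call double @llvm.powi.f64.i32(double %x, i32 3)
     ret double %r
   }

   define <4 x float> @square_lanes(<4 x float> %v) {
     ; The exponent remains a scalar integer even though the base is a vector.
     %r = call <4 x float> @llvm.powi.v4f32.i32(<4 x float> %v, i32 2)
     ret <4 x float> %r
   }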
13788 '``llvm.sin.*``' Intrinsic
13789 ^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.sin`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
13800 declare float @llvm.sin.f32(float %Val)
13801 declare double @llvm.sin.f64(double %Val)
13802 declare x86_fp80 @llvm.sin.f80(x86_fp80 %Val)
13803 declare fp128 @llvm.sin.f128(fp128 %Val)
13804 declare ppc_fp128 @llvm.sin.ppcf128(ppc_fp128 %Val)
13809 The '``llvm.sin.*``' intrinsics return the sine of the operand.
13814 The argument and return value are floating-point numbers of the same type.
13819 Return the same value as a corresponding libm '``sin``' function but without
13820 trapping or setting ``errno``.
13822 When specified with the fast-math-flag 'afn', the result may be approximated
13823 using a less accurate calculation.
13825 '``llvm.cos.*``' Intrinsic
13826 ^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.cos`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
13837 declare float @llvm.cos.f32(float %Val)
13838 declare double @llvm.cos.f64(double %Val)
13839 declare x86_fp80 @llvm.cos.f80(x86_fp80 %Val)
13840 declare fp128 @llvm.cos.f128(fp128 %Val)
13841 declare ppc_fp128 @llvm.cos.ppcf128(ppc_fp128 %Val)
13846 The '``llvm.cos.*``' intrinsics return the cosine of the operand.
13851 The argument and return value are floating-point numbers of the same type.
13856 Return the same value as a corresponding libm '``cos``' function but without
13857 trapping or setting ``errno``.
13859 When specified with the fast-math-flag 'afn', the result may be approximated
13860 using a less accurate calculation.
13862 '``llvm.pow.*``' Intrinsic
13863 ^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.pow`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
13874 declare float @llvm.pow.f32(float %Val, float %Power)
13875 declare double @llvm.pow.f64(double %Val, double %Power)
13876 declare x86_fp80 @llvm.pow.f80(x86_fp80 %Val, x86_fp80 %Power)
13877 declare fp128 @llvm.pow.f128(fp128 %Val, fp128 %Power)
declare ppc_fp128 @llvm.pow.ppcf128(ppc_fp128 %Val, ppc_fp128 %Power)
13883 The '``llvm.pow.*``' intrinsics return the first operand raised to the
13884 specified (positive or negative) power.
13889 The arguments and return value are floating-point numbers of the same type.
13894 Return the same value as a corresponding libm '``pow``' function but without
13895 trapping or setting ``errno``.
13897 When specified with the fast-math-flag 'afn', the result may be approximated
13898 using a less accurate calculation.
13900 '``llvm.exp.*``' Intrinsic
13901 ^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.exp`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
13912 declare float @llvm.exp.f32(float %Val)
13913 declare double @llvm.exp.f64(double %Val)
13914 declare x86_fp80 @llvm.exp.f80(x86_fp80 %Val)
13915 declare fp128 @llvm.exp.f128(fp128 %Val)
13916 declare ppc_fp128 @llvm.exp.ppcf128(ppc_fp128 %Val)
The '``llvm.exp.*``' intrinsics compute the base-e exponential of the specified
value.
13927 The argument and return value are floating-point numbers of the same type.
13932 Return the same value as a corresponding libm '``exp``' function but without
13933 trapping or setting ``errno``.
13935 When specified with the fast-math-flag 'afn', the result may be approximated
13936 using a less accurate calculation.
13938 '``llvm.exp2.*``' Intrinsic
13939 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.exp2`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
13950 declare float @llvm.exp2.f32(float %Val)
13951 declare double @llvm.exp2.f64(double %Val)
13952 declare x86_fp80 @llvm.exp2.f80(x86_fp80 %Val)
13953 declare fp128 @llvm.exp2.f128(fp128 %Val)
13954 declare ppc_fp128 @llvm.exp2.ppcf128(ppc_fp128 %Val)
The '``llvm.exp2.*``' intrinsics compute the base-2 exponential of the
specified value.
13965 The argument and return value are floating-point numbers of the same type.
13970 Return the same value as a corresponding libm '``exp2``' function but without
13971 trapping or setting ``errno``.
13973 When specified with the fast-math-flag 'afn', the result may be approximated
13974 using a less accurate calculation.
13976 '``llvm.log.*``' Intrinsic
13977 ^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.log`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
13988 declare float @llvm.log.f32(float %Val)
13989 declare double @llvm.log.f64(double %Val)
13990 declare x86_fp80 @llvm.log.f80(x86_fp80 %Val)
13991 declare fp128 @llvm.log.f128(fp128 %Val)
13992 declare ppc_fp128 @llvm.log.ppcf128(ppc_fp128 %Val)
The '``llvm.log.*``' intrinsics compute the base-e logarithm of the specified
value.
14003 The argument and return value are floating-point numbers of the same type.
14008 Return the same value as a corresponding libm '``log``' function but without
14009 trapping or setting ``errno``.
14011 When specified with the fast-math-flag 'afn', the result may be approximated
14012 using a less accurate calculation.
14014 '``llvm.log10.*``' Intrinsic
14015 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.log10`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14026 declare float @llvm.log10.f32(float %Val)
14027 declare double @llvm.log10.f64(double %Val)
14028 declare x86_fp80 @llvm.log10.f80(x86_fp80 %Val)
14029 declare fp128 @llvm.log10.f128(fp128 %Val)
14030 declare ppc_fp128 @llvm.log10.ppcf128(ppc_fp128 %Val)
The '``llvm.log10.*``' intrinsics compute the base-10 logarithm of the
specified value.
14041 The argument and return value are floating-point numbers of the same type.
14046 Return the same value as a corresponding libm '``log10``' function but without
14047 trapping or setting ``errno``.
14049 When specified with the fast-math-flag 'afn', the result may be approximated
14050 using a less accurate calculation.
14052 '``llvm.log2.*``' Intrinsic
14053 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.log2`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14064 declare float @llvm.log2.f32(float %Val)
14065 declare double @llvm.log2.f64(double %Val)
14066 declare x86_fp80 @llvm.log2.f80(x86_fp80 %Val)
14067 declare fp128 @llvm.log2.f128(fp128 %Val)
14068 declare ppc_fp128 @llvm.log2.ppcf128(ppc_fp128 %Val)
The '``llvm.log2.*``' intrinsics compute the base-2 logarithm of the specified
value.
14079 The argument and return value are floating-point numbers of the same type.
14084 Return the same value as a corresponding libm '``log2``' function but without
14085 trapping or setting ``errno``.
14087 When specified with the fast-math-flag 'afn', the result may be approximated
14088 using a less accurate calculation.
14092 '``llvm.fma.*``' Intrinsic
14093 ^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.fma`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14104 declare float @llvm.fma.f32(float %a, float %b, float %c)
14105 declare double @llvm.fma.f64(double %a, double %b, double %c)
14106 declare x86_fp80 @llvm.fma.f80(x86_fp80 %a, x86_fp80 %b, x86_fp80 %c)
14107 declare fp128 @llvm.fma.f128(fp128 %a, fp128 %b, fp128 %c)
14108 declare ppc_fp128 @llvm.fma.ppcf128(ppc_fp128 %a, ppc_fp128 %b, ppc_fp128 %c)
14113 The '``llvm.fma.*``' intrinsics perform the fused multiply-add operation.
14118 The arguments and return value are floating-point numbers of the same type.
14123 Return the same value as a corresponding libm '``fma``' function but without
14124 trapping or setting ``errno``.
14126 When specified with the fast-math-flag 'afn', the result may be approximated
14127 using a less accurate calculation.
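A minimal usage sketch, with the hypothetical function name ``@fma_demo``:

.. code-block:: llvm

   declare float @llvm.fma.f32(float, float, float)

   define float @fma_demo(float %a, float %b, float %c) {
     ; Computes (%a * %b) + %c as one fused operation, as libm's fmaf would.
     %r = call float @llvm.fma.f32(float %a, float %b, float %c)
     ret float %r
   }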
14129 '``llvm.fabs.*``' Intrinsic
14130 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.fabs`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14141 declare float @llvm.fabs.f32(float %Val)
14142 declare double @llvm.fabs.f64(double %Val)
14143 declare x86_fp80 @llvm.fabs.f80(x86_fp80 %Val)
14144 declare fp128 @llvm.fabs.f128(fp128 %Val)
14145 declare ppc_fp128 @llvm.fabs.ppcf128(ppc_fp128 %Val)
The '``llvm.fabs.*``' intrinsics return the absolute value of the operand.
The argument and return value are floating-point numbers of the same type.
14162 This function returns the same values as the libm ``fabs`` functions
14163 would, and handles error conditions in the same way.
14165 '``llvm.minnum.*``' Intrinsic
14166 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.minnum`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14177 declare float @llvm.minnum.f32(float %Val0, float %Val1)
14178 declare double @llvm.minnum.f64(double %Val0, double %Val1)
14179 declare x86_fp80 @llvm.minnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
14180 declare fp128 @llvm.minnum.f128(fp128 %Val0, fp128 %Val1)
14181 declare ppc_fp128 @llvm.minnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)
The '``llvm.minnum.*``' intrinsics return the minimum of the two arguments.
The arguments and return value are floating-point numbers of the same type.
Follows the IEEE-754 semantics for minNum, except for the handling of
signaling NaNs. This matches the behavior of libm's fmin.
14202 If either operand is a NaN, returns the other non-NaN operand. Returns
14203 NaN only if both operands are NaN. The returned NaN is always
14204 quiet. If the operands compare equal, returns a value that compares
14205 equal to both operands. This means that fmin(+/-0.0, +/-0.0) could
14206 return either -0.0 or 0.0.
14208 Unlike the IEEE-754 2008 behavior, this does not distinguish between
14209 signaling and quiet NaN inputs. If a target's implementation follows
14210 the standard and returns a quiet NaN if either input is a signaling
14211 NaN, the intrinsic lowering is responsible for quieting the inputs to
14212 correctly return the non-NaN input (e.g. by using the equivalent of
14213 ``llvm.canonicalize``).
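The NaN handling described above can be sketched as follows. The function
name ``@minnum_nan_demo`` is hypothetical:

.. code-block:: llvm

   declare float @llvm.minnum.f32(float, float)

   define float @minnum_nan_demo() {
     ; 0x7FF8000000000000 is a quiet NaN. Because the other operand is not a
     ; NaN, the call returns that operand, 1.0.
     %r = call float @llvm.minnum.f32(float 0x7FF8000000000000, float 1.0)
     ret float %r
   }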
14216 '``llvm.maxnum.*``' Intrinsic
14217 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.maxnum`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14228 declare float @llvm.maxnum.f32(float %Val0, float %Val1)
14229 declare double @llvm.maxnum.f64(double %Val0, double %Val1)
14230 declare x86_fp80 @llvm.maxnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
14231 declare fp128 @llvm.maxnum.f128(fp128 %Val0, fp128 %Val1)
14232 declare ppc_fp128 @llvm.maxnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)
The '``llvm.maxnum.*``' intrinsics return the maximum of the two arguments.
The arguments and return value are floating-point numbers of the same type.
14249 Follows the IEEE-754 semantics for maxNum except for the handling of
14250 signaling NaNs. This matches the behavior of libm's fmax.
14252 If either operand is a NaN, returns the other non-NaN operand. Returns
14253 NaN only if both operands are NaN. The returned NaN is always
14254 quiet. If the operands compare equal, returns a value that compares
14255 equal to both operands. This means that fmax(+/-0.0, +/-0.0) could
14256 return either -0.0 or 0.0.
14258 Unlike the IEEE-754 2008 behavior, this does not distinguish between
14259 signaling and quiet NaN inputs. If a target's implementation follows
14260 the standard and returns a quiet NaN if either input is a signaling
14261 NaN, the intrinsic lowering is responsible for quieting the inputs to
14262 correctly return the non-NaN input (e.g. by using the equivalent of
14263 ``llvm.canonicalize``).
14265 '``llvm.minimum.*``' Intrinsic
14266 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.minimum`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14277 declare float @llvm.minimum.f32(float %Val0, float %Val1)
14278 declare double @llvm.minimum.f64(double %Val0, double %Val1)
14279 declare x86_fp80 @llvm.minimum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
14280 declare fp128 @llvm.minimum.f128(fp128 %Val0, fp128 %Val1)
14281 declare ppc_fp128 @llvm.minimum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)
14286 The '``llvm.minimum.*``' intrinsics return the minimum of the two
14287 arguments, propagating NaNs and treating -0.0 as less than +0.0.
The arguments and return value are floating-point numbers of the same type.
14298 If either operand is a NaN, returns NaN. Otherwise returns the lesser
14299 of the two arguments. -0.0 is considered to be less than +0.0 for this
14300 intrinsic. Note that these are the semantics specified in the draft of
14303 '``llvm.maximum.*``' Intrinsic
14304 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.maximum`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14315 declare float @llvm.maximum.f32(float %Val0, float %Val1)
14316 declare double @llvm.maximum.f64(double %Val0, double %Val1)
14317 declare x86_fp80 @llvm.maximum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
14318 declare fp128 @llvm.maximum.f128(fp128 %Val0, fp128 %Val1)
14319 declare ppc_fp128 @llvm.maximum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)
14324 The '``llvm.maximum.*``' intrinsics return the maximum of the two
14325 arguments, propagating NaNs and treating -0.0 as less than +0.0.
The arguments and return value are floating-point numbers of the same type.
14336 If either operand is a NaN, returns NaN. Otherwise returns the greater
14337 of the two arguments. -0.0 is considered to be less than +0.0 for this
14338 intrinsic. Note that these are the semantics specified in the draft of
14341 '``llvm.copysign.*``' Intrinsic
14342 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.copysign`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14353 declare float @llvm.copysign.f32(float %Mag, float %Sgn)
14354 declare double @llvm.copysign.f64(double %Mag, double %Sgn)
14355 declare x86_fp80 @llvm.copysign.f80(x86_fp80 %Mag, x86_fp80 %Sgn)
14356 declare fp128 @llvm.copysign.f128(fp128 %Mag, fp128 %Sgn)
14357 declare ppc_fp128 @llvm.copysign.ppcf128(ppc_fp128 %Mag, ppc_fp128 %Sgn)
14362 The '``llvm.copysign.*``' intrinsics return a value with the magnitude of the
14363 first operand and the sign of the second operand.
The arguments and return value are floating-point numbers of the same type.
14374 This function returns the same values as the libm ``copysign``
14375 functions would, and handles error conditions in the same way.
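For example, in the sketch below (the function name ``@copysign_demo`` and
the constants are chosen for illustration), the result takes its magnitude
from the first operand and its sign from the second:

.. code-block:: llvm

   declare float @llvm.copysign.f32(float, float)

   define float @copysign_demo() {
     ; Magnitude 3.0 combined with the negative sign of -1.0 yields -3.0.
     %r = call float @llvm.copysign.f32(float 3.0, float -1.0)
     ret float %r
   }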
14377 '``llvm.floor.*``' Intrinsic
14378 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.floor`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14389 declare float @llvm.floor.f32(float %Val)
14390 declare double @llvm.floor.f64(double %Val)
14391 declare x86_fp80 @llvm.floor.f80(x86_fp80 %Val)
14392 declare fp128 @llvm.floor.f128(fp128 %Val)
14393 declare ppc_fp128 @llvm.floor.ppcf128(ppc_fp128 %Val)
14398 The '``llvm.floor.*``' intrinsics return the floor of the operand.
The argument and return value are floating-point numbers of the same type.
14409 This function returns the same values as the libm ``floor`` functions
14410 would, and handles error conditions in the same way.
14412 '``llvm.ceil.*``' Intrinsic
14413 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.ceil`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14424 declare float @llvm.ceil.f32(float %Val)
14425 declare double @llvm.ceil.f64(double %Val)
14426 declare x86_fp80 @llvm.ceil.f80(x86_fp80 %Val)
14427 declare fp128 @llvm.ceil.f128(fp128 %Val)
14428 declare ppc_fp128 @llvm.ceil.ppcf128(ppc_fp128 %Val)
14433 The '``llvm.ceil.*``' intrinsics return the ceiling of the operand.
The argument and return value are floating-point numbers of the same type.
14444 This function returns the same values as the libm ``ceil`` functions
14445 would, and handles error conditions in the same way.
14447 '``llvm.trunc.*``' Intrinsic
14448 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.trunc`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14459 declare float @llvm.trunc.f32(float %Val)
14460 declare double @llvm.trunc.f64(double %Val)
14461 declare x86_fp80 @llvm.trunc.f80(x86_fp80 %Val)
14462 declare fp128 @llvm.trunc.f128(fp128 %Val)
14463 declare ppc_fp128 @llvm.trunc.ppcf128(ppc_fp128 %Val)
The '``llvm.trunc.*``' intrinsics return the operand rounded to the
nearest integer not larger in magnitude than the operand.
The argument and return value are floating-point numbers of the same type.
14480 This function returns the same values as the libm ``trunc`` functions
14481 would, and handles error conditions in the same way.
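The difference between ``llvm.floor``, ``llvm.ceil`` and ``llvm.trunc`` shows
up on negative, non-integral inputs. The sketch below uses a hypothetical
function ``@rounding_demo`` and the constant ``-1.5``:

.. code-block:: llvm

   declare float @llvm.floor.f32(float)
   declare float @llvm.ceil.f32(float)
   declare float @llvm.trunc.f32(float)

   define void @rounding_demo() {
     %f = call float @llvm.floor.f32(float -1.5) ; -2.0 (toward negative infinity)
     %c = call float @llvm.ceil.f32(float -1.5)  ; -1.0 (toward positive infinity)
     %t = call float @llvm.trunc.f32(float -1.5) ; -1.0 (toward zero)
     ret void
   }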
14483 '``llvm.rint.*``' Intrinsic
14484 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.rint`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14495 declare float @llvm.rint.f32(float %Val)
14496 declare double @llvm.rint.f64(double %Val)
14497 declare x86_fp80 @llvm.rint.f80(x86_fp80 %Val)
14498 declare fp128 @llvm.rint.f128(fp128 %Val)
14499 declare ppc_fp128 @llvm.rint.ppcf128(ppc_fp128 %Val)
The '``llvm.rint.*``' intrinsics return the operand rounded to the
nearest integer. They may raise an inexact floating-point exception if the
operand isn't an integer.
The argument and return value are floating-point numbers of the same type.
14517 This function returns the same values as the libm ``rint`` functions
14518 would, and handles error conditions in the same way.
14520 '``llvm.nearbyint.*``' Intrinsic
14521 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.nearbyint`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14532 declare float @llvm.nearbyint.f32(float %Val)
14533 declare double @llvm.nearbyint.f64(double %Val)
14534 declare x86_fp80 @llvm.nearbyint.f80(x86_fp80 %Val)
14535 declare fp128 @llvm.nearbyint.f128(fp128 %Val)
14536 declare ppc_fp128 @llvm.nearbyint.ppcf128(ppc_fp128 %Val)
The '``llvm.nearbyint.*``' intrinsics return the operand rounded to the
nearest integer.
The argument and return value are floating-point numbers of the same type.
14553 This function returns the same values as the libm ``nearbyint``
14554 functions would, and handles error conditions in the same way.
14556 '``llvm.round.*``' Intrinsic
14557 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.round`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14568 declare float @llvm.round.f32(float %Val)
14569 declare double @llvm.round.f64(double %Val)
14570 declare x86_fp80 @llvm.round.f80(x86_fp80 %Val)
14571 declare fp128 @llvm.round.f128(fp128 %Val)
14572 declare ppc_fp128 @llvm.round.ppcf128(ppc_fp128 %Val)
The '``llvm.round.*``' intrinsics return the operand rounded to the
nearest integer.
The argument and return value are floating-point numbers of the same type.
14589 This function returns the same values as the libm ``round``
14590 functions would, and handles error conditions in the same way.
14592 '``llvm.roundeven.*``' Intrinsic
14593 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.roundeven`` on any
floating-point or vector of floating-point type. Not all targets support
all types however.
14604 declare float @llvm.roundeven.f32(float %Val)
14605 declare double @llvm.roundeven.f64(double %Val)
14606 declare x86_fp80 @llvm.roundeven.f80(x86_fp80 %Val)
14607 declare fp128 @llvm.roundeven.f128(fp128 %Val)
14608 declare ppc_fp128 @llvm.roundeven.ppcf128(ppc_fp128 %Val)
The '``llvm.roundeven.*``' intrinsics return the operand rounded to the nearest
integer in floating-point format, rounding halfway cases to even (that is, to the
nearest value that is an even integer).
14620 The argument and return value are floating-point numbers of the same type.
14625 This function implements IEEE-754 operation ``roundToIntegralTiesToEven``. It
14626 also behaves in the same way as C standard function ``roundeven``, except that
14627 it does not raise floating point exceptions.
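The tie-breaking rule can be sketched by comparing ``llvm.roundeven`` with
``llvm.round`` on halfway cases. The function name ``@roundeven_demo`` and the
constants are assumptions for this example, and the ``llvm.round`` result
relies on libm ``round`` rounding halfway cases away from zero:

.. code-block:: llvm

   declare float @llvm.roundeven.f32(float)
   declare float @llvm.round.f32(float)

   define void @roundeven_demo() {
     %a = call float @llvm.roundeven.f32(float 2.5) ; 2.0 (tie goes to the even integer)
     %b = call float @llvm.roundeven.f32(float 3.5) ; 4.0 (tie goes to the even integer)
     %c = call float @llvm.round.f32(float 2.5)     ; 3.0 (tie goes away from zero)
     ret void
   }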
14630 '``llvm.lround.*``' Intrinsic
14631 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14636 This is an overloaded intrinsic. You can use ``llvm.lround`` on any
14637 floating-point type. Not all targets support all types however.
declare i32 @llvm.lround.i32.f32(float %Val)
declare i32 @llvm.lround.i32.f64(double %Val)
declare i32 @llvm.lround.i32.f80(x86_fp80 %Val)
declare i32 @llvm.lround.i32.f128(fp128 %Val)
declare i32 @llvm.lround.i32.ppcf128(ppc_fp128 %Val)
declare i64 @llvm.lround.i64.f32(float %Val)
declare i64 @llvm.lround.i64.f64(double %Val)
declare i64 @llvm.lround.i64.f80(x86_fp80 %Val)
declare i64 @llvm.lround.i64.f128(fp128 %Val)
declare i64 @llvm.lround.i64.ppcf128(ppc_fp128 %Val)
14656 The '``llvm.lround.*``' intrinsics return the operand rounded to the nearest
14657 integer with ties away from zero.
The argument is a floating-point number and the return value is an integer
type.
14669 This function returns the same values as the libm ``lround``
14670 functions would, but without setting errno.
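As a short sketch (the function name ``@lround_demo`` and the constants are
hypothetical), ties are rounded away from zero in both directions:

.. code-block:: llvm

   declare i64 @llvm.lround.i64.f64(double)

   define void @lround_demo() {
     %a = call i64 @llvm.lround.i64.f64(double 2.5)  ; 3
     %b = call i64 @llvm.lround.i64.f64(double -2.5) ; -3
     ret void
   }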
14672 '``llvm.llround.*``' Intrinsic
14673 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14678 This is an overloaded intrinsic. You can use ``llvm.llround`` on any
14679 floating-point type. Not all targets support all types however.
      declare i64 @llvm.llround.i64.f32(float %Val)
      declare i64 @llvm.llround.i64.f64(double %Val)
      declare i64 @llvm.llround.i64.f80(x86_fp80 %Val)
      declare i64 @llvm.llround.i64.f128(fp128 %Val)
      declare i64 @llvm.llround.i64.ppcf128(ppc_fp128 %Val)
14692 The '``llvm.llround.*``' intrinsics return the operand rounded to the nearest
14693 integer with ties away from zero.
The argument is a floating-point number and the return value is an integer
type.
14704 This function returns the same values as the libm ``llround``
14705 functions would, but without setting errno.
14707 '``llvm.lrint.*``' Intrinsic
14708 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14713 This is an overloaded intrinsic. You can use ``llvm.lrint`` on any
14714 floating-point type. Not all targets support all types however.
      declare i32 @llvm.lrint.i32.f32(float %Val)
      declare i32 @llvm.lrint.i32.f64(double %Val)
      declare i32 @llvm.lrint.i32.f80(x86_fp80 %Val)
      declare i32 @llvm.lrint.i32.f128(fp128 %Val)
      declare i32 @llvm.lrint.i32.ppcf128(ppc_fp128 %Val)

      declare i64 @llvm.lrint.i64.f32(float %Val)
      declare i64 @llvm.lrint.i64.f64(double %Val)
      declare i64 @llvm.lrint.i64.f80(x86_fp80 %Val)
      declare i64 @llvm.lrint.i64.f128(fp128 %Val)
      declare i64 @llvm.lrint.i64.ppcf128(ppc_fp128 %Val)
The '``llvm.lrint.*``' intrinsics return the operand rounded to the nearest
integer.
The argument is a floating-point number and the return value is an integer
type.
14746 This function returns the same values as the libm ``lrint``
14747 functions would, but without setting errno.
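The C library ``lrint`` rounds according to the current rounding mode. Assuming
the default mode (round to nearest, ties to even), its results differ from the
ties-away-from-zero rounding of ``lround`` only on halfway cases. The Python
sketch below models that default-mode behavior for finite inputs; the helper
name ``lrint_default_mode`` is hypothetical and other rounding modes are not
modeled.

.. code-block:: python

    def lrint_default_mode(x: float) -> int:
        """Model lrint under round-to-nearest, ties-to-even (an assumption)."""
        # Python's round() with one argument rounds halfway cases to even.
        return round(x)

Under this assumption ``lrint_default_mode(2.5)`` is ``2``, whereas rounding
ties away from zero would give ``3``.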
14749 '``llvm.llrint.*``' Intrinsic
14750 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14755 This is an overloaded intrinsic. You can use ``llvm.llrint`` on any
14756 floating-point type. Not all targets support all types however.
      declare i64 @llvm.llrint.i64.f32(float %Val)
      declare i64 @llvm.llrint.i64.f64(double %Val)
      declare i64 @llvm.llrint.i64.f80(x86_fp80 %Val)
      declare i64 @llvm.llrint.i64.f128(fp128 %Val)
      declare i64 @llvm.llrint.i64.ppcf128(ppc_fp128 %Val)
The '``llvm.llrint.*``' intrinsics return the operand rounded to the nearest
integer.
The argument is a floating-point number and the return value is an integer
type.
14781 This function returns the same values as the libm ``llrint``
14782 functions would, but without setting errno.
14784 Bit Manipulation Intrinsics
14785 ---------------------------
14787 LLVM provides intrinsics for a few important bit manipulation
14788 operations. These allow efficient code generation for some algorithms.
14790 '``llvm.bitreverse.*``' Intrinsics
14791 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic function. You can use bitreverse on any
integer type.
14801 declare i16 @llvm.bitreverse.i16(i16 <id>)
14802 declare i32 @llvm.bitreverse.i32(i32 <id>)
14803 declare i64 @llvm.bitreverse.i64(i64 <id>)
14804 declare <4 x i32> @llvm.bitreverse.v4i32(<4 x i32> <id>)
14809 The '``llvm.bitreverse``' family of intrinsics is used to reverse the
14810 bitpattern of an integer value or vector of integer values; for example
14811 ``0b10110110`` becomes ``0b01101101``.
The ``llvm.bitreverse.iN`` intrinsic returns an ``iN`` value that has bit
``M`` in the input moved to bit ``N-M-1`` in the output, with bits numbered
from 0. The vector intrinsics, such as ``llvm.bitreverse.v4i32``, operate on a
per-element basis and the element order is not affected.
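The bit movement can be modeled with a short Python sketch. The helper name
``bitreverse`` and its explicit ``bits`` parameter are illustrative assumptions;
bits are numbered from 0 at the least significant end, so input bit ``m`` lands
at position ``bits - 1 - m``.

.. code-block:: python

    def bitreverse(value: int, bits: int) -> int:
        """Reverse the low `bits` bits of `value` (model of llvm.bitreverse.iN)."""
        result = 0
        for m in range(bits):
            if (value >> m) & 1:
                result |= 1 << (bits - 1 - m)  # bit m moves to bit bits-1-m
        return result

    def bitreverse_vector(elements, bits):
        """Vector form: each element is reversed, element order is unchanged."""
        return [bitreverse(e, bits) for e in elements]

For the 8-bit pattern used in the overview, ``bitreverse(0b10110110, 8)``
evaluates to ``0b01101101``.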
14821 '``llvm.bswap.*``' Intrinsics
14822 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14827 This is an overloaded intrinsic function. You can use bswap on any
14828 integer type that is an even number of bytes (i.e. BitWidth % 16 == 0).
14832 declare i16 @llvm.bswap.i16(i16 <id>)
14833 declare i32 @llvm.bswap.i32(i32 <id>)
14834 declare i64 @llvm.bswap.i64(i64 <id>)
14835 declare <4 x i32> @llvm.bswap.v4i32(<4 x i32> <id>)
14840 The '``llvm.bswap``' family of intrinsics is used to byte swap an integer
14841 value or vector of integer values with an even number of bytes (positive
14842 multiple of 16 bits).
14847 The ``llvm.bswap.i16`` intrinsic returns an i16 value that has the high
14848 and low byte of the input i16 swapped. Similarly, the ``llvm.bswap.i32``
14849 intrinsic returns an i32 value that has the four bytes of the input i32
14850 swapped, so that if the input bytes are numbered 0, 1, 2, 3 then the
14851 returned i32 will have its bytes in 3, 2, 1, 0 order. The
14852 ``llvm.bswap.i48``, ``llvm.bswap.i64`` and other intrinsics extend this
14853 concept to additional even-byte lengths (6 bytes, 8 bytes and more,
14854 respectively). The vector intrinsics, such as ``llvm.bswap.v4i32``,
14855 operate on a per-element basis and the element order is not affected.
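A byte swap amounts to reading the same bytes in the opposite byte order, which
the following Python sketch makes explicit. The helper name ``bswap`` and its
``bits`` parameter are assumptions made for illustration only.

.. code-block:: python

    def bswap(value: int, bits: int) -> int:
        """Reverse the byte order of a `bits`-wide integer (model of llvm.bswap)."""
        if bits <= 0 or bits % 16 != 0:
            raise ValueError("bswap requires a positive multiple of 16 bits")
        mask = (1 << bits) - 1
        raw = (value & mask).to_bytes(bits // 8, "little")
        return int.from_bytes(raw, "big")  # same bytes, opposite order

For instance, ``bswap(0x01020304, 32)`` yields ``0x04030201``, and the 6-byte
case ``bswap(0x010203040506, 48)`` yields ``0x060504030201``.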
14857 '``llvm.ctpop.*``' Intrinsic
14858 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14863 This is an overloaded intrinsic. You can use llvm.ctpop on any integer
14864 bit width, or on any vector with integer elements. Not all targets
14865 support all bit widths or vector types, however.
14869 declare i8 @llvm.ctpop.i8(i8 <src>)
14870 declare i16 @llvm.ctpop.i16(i16 <src>)
14871 declare i32 @llvm.ctpop.i32(i32 <src>)
14872 declare i64 @llvm.ctpop.i64(i64 <src>)
14873 declare i256 @llvm.ctpop.i256(i256 <src>)
14874 declare <2 x i32> @llvm.ctpop.v2i32(<2 x i32> <src>)
The '``llvm.ctpop``' family of intrinsics counts the number of bits set
in a value.
14885 The only argument is the value to be counted. The argument may be of any
14886 integer type, or a vector with integer elements. The return type must
14887 match the argument type.
14892 The '``llvm.ctpop``' intrinsic counts the 1's in a variable, or within
14893 each element of a vector.
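A minimal Python model of the population count is shown below. The helper names
``ctpop`` and ``ctpop_vector`` and the explicit ``bits`` parameter are
illustrative assumptions; masking to ``bits`` makes negative Python integers
behave like their two's complement bit patterns.

.. code-block:: python

    def ctpop(value: int, bits: int) -> int:
        """Count the 1 bits in the low `bits` bits of `value`."""
        return bin(value & ((1 << bits) - 1)).count("1")

    def ctpop_vector(elements, bits):
        """Vector form: the count is taken within each element separately."""
        return [ctpop(e, bits) for e in elements]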
14895 '``llvm.ctlz.*``' Intrinsic
14896 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
14901 This is an overloaded intrinsic. You can use ``llvm.ctlz`` on any
14902 integer bit width, or any vector whose elements are integers. Not all
14903 targets support all bit widths or vector types, however.
14907 declare i8 @llvm.ctlz.i8 (i8 <src>, i1 <is_zero_undef>)
14908 declare i16 @llvm.ctlz.i16 (i16 <src>, i1 <is_zero_undef>)
14909 declare i32 @llvm.ctlz.i32 (i32 <src>, i1 <is_zero_undef>)
14910 declare i64 @llvm.ctlz.i64 (i64 <src>, i1 <is_zero_undef>)
14911 declare i256 @llvm.ctlz.i256(i256 <src>, i1 <is_zero_undef>)
14912 declare <2 x i32> @llvm.ctlz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>)
14917 The '``llvm.ctlz``' family of intrinsic functions counts the number of
14918 leading zeros in a variable.
14923 The first argument is the value to be counted. This argument may be of
14924 any integer type, or a vector with integer element type. The return
14925 type must match the first argument type.
14927 The second argument must be a constant and is a flag to indicate whether
14928 the intrinsic should ensure that a zero as the first argument produces a
14929 defined result. Historically some architectures did not provide a
14930 defined result for zero values as efficiently, and many algorithms are
14931 now predicated on avoiding zero-value inputs.
14936 The '``llvm.ctlz``' intrinsic counts the leading (most significant)
14937 zeros in a variable, or within each element of the vector. If
14938 ``src == 0`` then the result is the size in bits of the type of ``src``
14939 if ``is_zero_undef == 0`` and ``undef`` otherwise. For example,
14940 ``llvm.ctlz(i32 2) = 30``.
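The zero-input rule can be illustrated with a small Python model. The helper
name ``ctlz`` is hypothetical, and returning ``None`` is merely this sketch's
stand-in for the ``undef`` result; it is not how LLVM represents it.

.. code-block:: python

    from typing import Optional

    def ctlz(src: int, bits: int, is_zero_undef: bool = False) -> Optional[int]:
        """Count leading zeros of a `bits`-wide value; None stands in for undef."""
        src &= (1 << bits) - 1
        if src == 0:
            return None if is_zero_undef else bits
        # bit_length() is the position of the highest set bit, plus one.
        return bits - src.bit_length()

This reproduces the example above: ``ctlz(2, 32)`` is ``30``.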
14942 '``llvm.cttz.*``' Intrinsic
14943 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
14948 This is an overloaded intrinsic. You can use ``llvm.cttz`` on any
14949 integer bit width, or any vector of integer elements. Not all targets
14950 support all bit widths or vector types, however.
14954 declare i8 @llvm.cttz.i8 (i8 <src>, i1 <is_zero_undef>)
14955 declare i16 @llvm.cttz.i16 (i16 <src>, i1 <is_zero_undef>)
14956 declare i32 @llvm.cttz.i32 (i32 <src>, i1 <is_zero_undef>)
14957 declare i64 @llvm.cttz.i64 (i64 <src>, i1 <is_zero_undef>)
14958 declare i256 @llvm.cttz.i256(i256 <src>, i1 <is_zero_undef>)
14959 declare <2 x i32> @llvm.cttz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>)
The '``llvm.cttz``' family of intrinsic functions counts the number of
trailing zeros.
14970 The first argument is the value to be counted. This argument may be of
14971 any integer type, or a vector with integer element type. The return
14972 type must match the first argument type.
14974 The second argument must be a constant and is a flag to indicate whether
14975 the intrinsic should ensure that a zero as the first argument produces a
14976 defined result. Historically some architectures did not provide a
14977 defined result for zero values as efficiently, and many algorithms are
14978 now predicated on avoiding zero-value inputs.
14983 The '``llvm.cttz``' intrinsic counts the trailing (least significant)
14984 zeros in a variable, or within each element of a vector. If ``src == 0``
14985 then the result is the size in bits of the type of ``src`` if
14986 ``is_zero_undef == 0`` and ``undef`` otherwise. For example,
14987 ``llvm.cttz(2) = 1``.
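A Python model of the trailing-zero count can isolate the lowest set bit with
``src & -src``. As before, the helper name ``cttz`` is hypothetical and ``None``
is only this sketch's placeholder for an ``undef`` result.

.. code-block:: python

    from typing import Optional

    def cttz(src: int, bits: int, is_zero_undef: bool = False) -> Optional[int]:
        """Count trailing zeros of a `bits`-wide value; None stands in for undef."""
        src &= (1 << bits) - 1
        if src == 0:
            return None if is_zero_undef else bits
        lowest = src & -src             # keeps only the lowest set bit
        return lowest.bit_length() - 1  # index of that bit

This reproduces the example above: ``cttz(2, 32)`` is ``1``.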
14991 '``llvm.fshl.*``' Intrinsic
14992 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
14997 This is an overloaded intrinsic. You can use ``llvm.fshl`` on any
14998 integer bit width or any vector of integer elements. Not all targets
14999 support all bit widths or vector types, however.
15003 declare i8 @llvm.fshl.i8 (i8 %a, i8 %b, i8 %c)
15004 declare i67 @llvm.fshl.i67(i67 %a, i67 %b, i67 %c)
15005 declare <2 x i32> @llvm.fshl.v2i32(<2 x i32> %a, <2 x i32> %b, <2 x i32> %c)
15010 The '``llvm.fshl``' family of intrinsic functions performs a funnel shift left:
15011 the first two values are concatenated as { %a : %b } (%a is the most significant
15012 bits of the wide value), the combined value is shifted left, and the most
15013 significant bits are extracted to produce a result that is the same size as the
15014 original arguments. If the first 2 arguments are identical, this is equivalent
15015 to a rotate left operation. For vector types, the operation occurs for each
15016 element of the vector. The shift argument is treated as an unsigned amount
15017 modulo the element size of the arguments.
15022 The first two arguments are the values to be concatenated. The third
15023 argument is the shift amount. The arguments may be any integer type or a
15024 vector with integer element type. All arguments and the return value must
15025 have the same type.
15030 .. code-block:: text
15032 %r = call i8 @llvm.fshl.i8(i8 %x, i8 %y, i8 %z) ; %r = i8: msb_extract((concat(x, y) << (z % 8)), 8)
15033 %r = call i8 @llvm.fshl.i8(i8 255, i8 0, i8 15) ; %r = i8: 128 (0b10000000)
15034 %r = call i8 @llvm.fshl.i8(i8 15, i8 15, i8 11) ; %r = i8: 120 (0b01111000)
15035 %r = call i8 @llvm.fshl.i8(i8 0, i8 255, i8 8) ; %r = i8: 0 (0b00000000)
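The concatenate, shift, and extract steps can be written out as a Python model.
The helper name ``fshl`` and the explicit ``bits`` parameter are illustrative
assumptions, not LLVM API.

.. code-block:: python

    def fshl(a: int, b: int, c: int, bits: int) -> int:
        """Funnel shift left on `bits`-wide values (model of llvm.fshl)."""
        mask = (1 << bits) - 1
        shift = (c & mask) % bits                 # unsigned amount, modulo width
        wide = ((a & mask) << bits) | (b & mask)  # concat(a, b), 2*bits wide
        shifted = (wide << shift) & ((1 << (2 * bits)) - 1)
        return shifted >> bits                    # most significant `bits` bits

The model agrees with the examples above: ``fshl(255, 0, 15, 8)`` is ``128``,
``fshl(15, 15, 11, 8)`` is ``120``, and ``fshl(0, 255, 8, 8)`` is ``0``. Passing
the same value for ``a`` and ``b`` gives a rotate left.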
15037 '``llvm.fshr.*``' Intrinsic
15038 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
15043 This is an overloaded intrinsic. You can use ``llvm.fshr`` on any
15044 integer bit width or any vector of integer elements. Not all targets
15045 support all bit widths or vector types, however.
15049 declare i8 @llvm.fshr.i8 (i8 %a, i8 %b, i8 %c)
15050 declare i67 @llvm.fshr.i67(i67 %a, i67 %b, i67 %c)
15051 declare <2 x i32> @llvm.fshr.v2i32(<2 x i32> %a, <2 x i32> %b, <2 x i32> %c)
15056 The '``llvm.fshr``' family of intrinsic functions performs a funnel shift right:
15057 the first two values are concatenated as { %a : %b } (%a is the most significant
15058 bits of the wide value), the combined value is shifted right, and the least
15059 significant bits are extracted to produce a result that is the same size as the
15060 original arguments. If the first 2 arguments are identical, this is equivalent
15061 to a rotate right operation. For vector types, the operation occurs for each
15062 element of the vector. The shift argument is treated as an unsigned amount
15063 modulo the element size of the arguments.
15068 The first two arguments are the values to be concatenated. The third
15069 argument is the shift amount. The arguments may be any integer type or a
15070 vector with integer element type. All arguments and the return value must
15071 have the same type.
15076 .. code-block:: text
15078 %r = call i8 @llvm.fshr.i8(i8 %x, i8 %y, i8 %z) ; %r = i8: lsb_extract((concat(x, y) >> (z % 8)), 8)
15079 %r = call i8 @llvm.fshr.i8(i8 255, i8 0, i8 15) ; %r = i8: 254 (0b11111110)
15080 %r = call i8 @llvm.fshr.i8(i8 15, i8 15, i8 11) ; %r = i8: 225 (0b11100001)
15081 %r = call i8 @llvm.fshr.i8(i8 0, i8 255, i8 8) ; %r = i8: 255 (0b11111111)
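The right funnel shift differs from the left one only in the shift direction
and in which half of the wide value is kept. The Python model below uses the
hypothetical helper name ``fshr`` with an explicit ``bits`` parameter.

.. code-block:: python

    def fshr(a: int, b: int, c: int, bits: int) -> int:
        """Funnel shift right on `bits`-wide values (model of llvm.fshr)."""
        mask = (1 << bits) - 1
        shift = (c & mask) % bits                 # unsigned amount, modulo width
        wide = ((a & mask) << bits) | (b & mask)  # concat(a, b), 2*bits wide
        return (wide >> shift) & mask             # least significant `bits` bits

The model agrees with the examples above: ``fshr(255, 0, 15, 8)`` is ``254``,
``fshr(15, 15, 11, 8)`` is ``225``, and ``fshr(0, 255, 8, 8)`` is ``255``.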
15083 Arithmetic with Overflow Intrinsics
15084 -----------------------------------
15086 LLVM provides intrinsics for fast arithmetic overflow checking.
15088 Each of these intrinsics returns a two-element struct. The first
15089 element of this struct contains the result of the corresponding
15090 arithmetic operation modulo 2\ :sup:`n`\ , where n is the bit width of
15091 the result. Therefore, for example, the first element of the struct
15092 returned by ``llvm.sadd.with.overflow.i32`` is always the same as the
15093 result of a 32-bit ``add`` instruction with the same operands, where
15094 the ``add`` is *not* modified by an ``nsw`` or ``nuw`` flag.
15096 The second element of the result is an ``i1`` that is 1 if the
15097 arithmetic operation overflowed and 0 otherwise. An operation
15098 overflows if, for any values of its operands ``A`` and ``B`` and for
15099 any ``N`` larger than the operands' width, ``ext(A op B) to iN`` is
15100 not equal to ``(ext(A) to iN) op (ext(B) to iN)`` where ``ext`` is
15101 ``sext`` for signed overflow and ``zext`` for unsigned overflow, and
15102 ``op`` is the underlying arithmetic operation.
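This definition can be checked with a Python model in which unbounded integers
play the role of the wider ``iN`` type: the operation overflows exactly when
wrapping the exact result to the operand width changes its value. The helper
names ``to_signed`` and ``with_overflow`` are hypothetical and not part of LLVM.

.. code-block:: python

    import operator

    def to_signed(value: int, bits: int) -> int:
        """Interpret the low `bits` bits of `value` as two's complement."""
        value &= (1 << bits) - 1
        return value - (1 << bits) if value >> (bits - 1) else value

    def with_overflow(op, a: int, b: int, bits: int, signed: bool):
        """Return (wrapped result, overflow bit) for `op` on `bits`-wide operands."""
        mask = (1 << bits) - 1
        if signed:
            a, b = to_signed(a, bits), to_signed(b, bits)  # sext
        else:
            a, b = a & mask, b & mask                      # zext
        exact = op(a, b)        # unbounded arithmetic, as in the wide type
        wrapped = exact & mask  # result modulo 2**bits
        result = to_signed(wrapped, bits) if signed else wrapped
        return result, result != exact

For example, ``with_overflow(operator.add, 2147483647, 1, 32, True)`` returns
``(-2147483648, True)``, while the same call with ``signed=False`` returns
``(2147483648, False)``.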
The behavior of these intrinsics is well-defined for all argument
values.
15107 '``llvm.sadd.with.overflow.*``' Intrinsics
15108 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15113 This is an overloaded intrinsic. You can use ``llvm.sadd.with.overflow``
15114 on any integer bit width or vectors of integers.
15118 declare {i16, i1} @llvm.sadd.with.overflow.i16(i16 %a, i16 %b)
15119 declare {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
15120 declare {i64, i1} @llvm.sadd.with.overflow.i64(i64 %a, i64 %b)
15121 declare {<4 x i32>, <4 x i1>} @llvm.sadd.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)
15126 The '``llvm.sadd.with.overflow``' family of intrinsic functions perform
15127 a signed addition of the two arguments, and indicate whether an overflow
15128 occurred during the signed summation.
15133 The arguments (%a and %b) and the first element of the result structure
15134 may be of integer types of any bit width, but they must have the same
15135 bit width. The second element of the result structure must be of type
``i1``. ``%a`` and ``%b`` are the two values that will undergo signed
addition.
The '``llvm.sadd.with.overflow``' family of intrinsic functions perform
a signed addition of the two arguments. They return a structure whose first
element is the signed sum and whose second element is a bit specifying
whether the signed summation resulted in an overflow.
15151 .. code-block:: llvm
15153 %res = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
15154 %sum = extractvalue {i32, i1} %res, 0
15155 %obit = extractvalue {i32, i1} %res, 1
15156 br i1 %obit, label %overflow, label %normal
15158 '``llvm.uadd.with.overflow.*``' Intrinsics
15159 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15164 This is an overloaded intrinsic. You can use ``llvm.uadd.with.overflow``
15165 on any integer bit width or vectors of integers.
15169 declare {i16, i1} @llvm.uadd.with.overflow.i16(i16 %a, i16 %b)
15170 declare {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
15171 declare {i64, i1} @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
15172 declare {<4 x i32>, <4 x i1>} @llvm.uadd.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)
15177 The '``llvm.uadd.with.overflow``' family of intrinsic functions perform
15178 an unsigned addition of the two arguments, and indicate whether a carry
15179 occurred during the unsigned summation.
15184 The arguments (%a and %b) and the first element of the result structure
15185 may be of integer types of any bit width, but they must have the same
15186 bit width. The second element of the result structure must be of type
``i1``. ``%a`` and ``%b`` are the two values that will undergo unsigned
addition.
15193 The '``llvm.uadd.with.overflow``' family of intrinsic functions perform
15194 an unsigned addition of the two arguments. They return a structure --- the
15195 first element of which is the sum, and the second element of which is a
15196 bit specifying if the unsigned summation resulted in a carry.
15201 .. code-block:: llvm
15203 %res = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
15204 %sum = extractvalue {i32, i1} %res, 0
15205 %obit = extractvalue {i32, i1} %res, 1
15206 br i1 %obit, label %carry, label %normal
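One common use of the carry bit is chaining additions across machine words. The
Python sketch below models a 128-bit addition built from two 64-bit limbs. All
names here (``uadd_with_overflow``, ``add_u128``, ``MASK64``) are hypothetical,
and the sketch illustrates the idea rather than describing how LLVM lowers wide
additions.

.. code-block:: python

    MASK64 = (1 << 64) - 1

    def uadd_with_overflow(a: int, b: int, bits: int = 64):
        """Return (sum modulo 2**bits, carry bit) for unsigned operands."""
        mask = (1 << bits) - 1
        total = (a & mask) + (b & mask)
        return total & mask, total > mask

    def add_u128(a_lo: int, a_hi: int, b_lo: int, b_hi: int):
        """Add two 128-bit values given as (low, high) 64-bit limbs."""
        lo, carry = uadd_with_overflow(a_lo, b_lo)
        hi, c1 = uadd_with_overflow(a_hi, b_hi)
        hi, c2 = uadd_with_overflow(hi, int(carry))  # propagate the low carry
        return lo, hi, c1 or c2                      # at most one of c1, c2 is set

Here ``add_u128(MASK64, 0, 1, 0)`` returns ``(0, 1, False)``: the carry out of
the low limb is added into the high limb.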
15208 '``llvm.ssub.with.overflow.*``' Intrinsics
15209 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15214 This is an overloaded intrinsic. You can use ``llvm.ssub.with.overflow``
15215 on any integer bit width or vectors of integers.
15219 declare {i16, i1} @llvm.ssub.with.overflow.i16(i16 %a, i16 %b)
15220 declare {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
15221 declare {i64, i1} @llvm.ssub.with.overflow.i64(i64 %a, i64 %b)
15222 declare {<4 x i32>, <4 x i1>} @llvm.ssub.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)
15227 The '``llvm.ssub.with.overflow``' family of intrinsic functions perform
15228 a signed subtraction of the two arguments, and indicate whether an
15229 overflow occurred during the signed subtraction.
15234 The arguments (%a and %b) and the first element of the result structure
15235 may be of integer types of any bit width, but they must have the same
15236 bit width. The second element of the result structure must be of type
``i1``. ``%a`` and ``%b`` are the two values that will undergo signed
subtraction.
The '``llvm.ssub.with.overflow``' family of intrinsic functions perform
a signed subtraction of the two arguments. They return a structure whose
first element is the difference and whose second element is a bit
specifying whether the signed subtraction resulted in an overflow.
15252 .. code-block:: llvm
15254 %res = call {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
15255 %sum = extractvalue {i32, i1} %res, 0
15256 %obit = extractvalue {i32, i1} %res, 1
15257 br i1 %obit, label %overflow, label %normal
15259 '``llvm.usub.with.overflow.*``' Intrinsics
15260 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15265 This is an overloaded intrinsic. You can use ``llvm.usub.with.overflow``
15266 on any integer bit width or vectors of integers.
15270 declare {i16, i1} @llvm.usub.with.overflow.i16(i16 %a, i16 %b)
15271 declare {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b)
15272 declare {i64, i1} @llvm.usub.with.overflow.i64(i64 %a, i64 %b)
15273 declare {<4 x i32>, <4 x i1>} @llvm.usub.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)
15278 The '``llvm.usub.with.overflow``' family of intrinsic functions perform
15279 an unsigned subtraction of the two arguments, and indicate whether an
15280 overflow occurred during the unsigned subtraction.
15285 The arguments (%a and %b) and the first element of the result structure
15286 may be of integer types of any bit width, but they must have the same
15287 bit width. The second element of the result structure must be of type
``i1``. ``%a`` and ``%b`` are the two values that will undergo unsigned
subtraction.
The '``llvm.usub.with.overflow``' family of intrinsic functions perform
an unsigned subtraction of the two arguments. They return a structure whose
first element is the difference and whose second element is a bit
specifying whether the unsigned subtraction resulted in an overflow.
15303 .. code-block:: llvm
15305 %res = call {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b)
15306 %sum = extractvalue {i32, i1} %res, 0
15307 %obit = extractvalue {i32, i1} %res, 1
15308 br i1 %obit, label %overflow, label %normal
15310 '``llvm.smul.with.overflow.*``' Intrinsics
15311 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15316 This is an overloaded intrinsic. You can use ``llvm.smul.with.overflow``
15317 on any integer bit width or vectors of integers.
15321 declare {i16, i1} @llvm.smul.with.overflow.i16(i16 %a, i16 %b)
15322 declare {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b)
15323 declare {i64, i1} @llvm.smul.with.overflow.i64(i64 %a, i64 %b)
15324 declare {<4 x i32>, <4 x i1>} @llvm.smul.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)
15329 The '``llvm.smul.with.overflow``' family of intrinsic functions perform
15330 a signed multiplication of the two arguments, and indicate whether an
15331 overflow occurred during the signed multiplication.
15336 The arguments (%a and %b) and the first element of the result structure
15337 may be of integer types of any bit width, but they must have the same
15338 bit width. The second element of the result structure must be of type
``i1``. ``%a`` and ``%b`` are the two values that will undergo signed
multiplication.
The '``llvm.smul.with.overflow``' family of intrinsic functions perform
a signed multiplication of the two arguments. They return a structure whose
first element is the product and whose second element is a bit specifying
whether the signed multiplication resulted in an overflow.
15354 .. code-block:: llvm
15356 %res = call {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b)
15357 %sum = extractvalue {i32, i1} %res, 0
15358 %obit = extractvalue {i32, i1} %res, 1
15359 br i1 %obit, label %overflow, label %normal
15361 '``llvm.umul.with.overflow.*``' Intrinsics
15362 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15367 This is an overloaded intrinsic. You can use ``llvm.umul.with.overflow``
15368 on any integer bit width or vectors of integers.
15372 declare {i16, i1} @llvm.umul.with.overflow.i16(i16 %a, i16 %b)
15373 declare {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
15374 declare {i64, i1} @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
15375 declare {<4 x i32>, <4 x i1>} @llvm.umul.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)
15380 The '``llvm.umul.with.overflow``' family of intrinsic functions perform
an unsigned multiplication of the two arguments, and indicate whether an
15382 overflow occurred during the unsigned multiplication.
15387 The arguments (%a and %b) and the first element of the result structure
15388 may be of integer types of any bit width, but they must have the same
15389 bit width. The second element of the result structure must be of type
``i1``. ``%a`` and ``%b`` are the two values that will undergo unsigned
multiplication.
The '``llvm.umul.with.overflow``' family of intrinsic functions perform
an unsigned multiplication of the two arguments. They return a structure
whose first element is the product and whose second element is a bit
specifying whether the unsigned multiplication resulted in an overflow.
15405 .. code-block:: llvm
15407 %res = call {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
15408 %sum = extractvalue {i32, i1} %res, 0
15409 %obit = extractvalue {i32, i1} %res, 1
15410 br i1 %obit, label %overflow, label %normal
15412 Saturation Arithmetic Intrinsics
15413 ---------------------------------
15415 Saturation arithmetic is a version of arithmetic in which operations are
15416 limited to a fixed range between a minimum and maximum value. If the result of
15417 an operation is greater than the maximum value, the result is set (or
15418 "clamped") to this maximum. If it is below the minimum, it is clamped to this
15422 '``llvm.sadd.sat.*``' Intrinsics
15423 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15428 This is an overloaded intrinsic. You can use ``llvm.sadd.sat``
15429 on any integer bit width or vectors of integers.
15433 declare i16 @llvm.sadd.sat.i16(i16 %a, i16 %b)
15434 declare i32 @llvm.sadd.sat.i32(i32 %a, i32 %b)
15435 declare i64 @llvm.sadd.sat.i64(i64 %a, i64 %b)
15436 declare <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
15441 The '``llvm.sadd.sat``' family of intrinsic functions perform signed
15442 saturating addition on the 2 arguments.
15447 The arguments (%a and %b) and the result may be of integer types of any bit
15448 width, but they must have the same bit width. ``%a`` and ``%b`` are the two
15449 values that will undergo signed addition.
15454 The maximum value this operation can clamp to is the largest signed value
15455 representable by the bit width of the arguments. The minimum value is the
15456 smallest signed value representable by this bit width.
15462 .. code-block:: llvm
15464 %res = call i4 @llvm.sadd.sat.i4(i4 1, i4 2) ; %res = 3
15465 %res = call i4 @llvm.sadd.sat.i4(i4 5, i4 6) ; %res = 7
15466 %res = call i4 @llvm.sadd.sat.i4(i4 -4, i4 2) ; %res = -2
15467 %res = call i4 @llvm.sadd.sat.i4(i4 -4, i4 -5) ; %res = -8
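Signed saturation can be modeled in Python by computing the exact sum and
clamping it to the representable range. The helper name ``sadd_sat`` is
hypothetical; the sketch assumes ``a`` and ``b`` are already given as signed
values within the range of a ``bits``-wide integer.

.. code-block:: python

    def sadd_sat(a: int, b: int, bits: int) -> int:
        """Signed saturating addition on `bits`-wide values."""
        lo = -(1 << (bits - 1))     # smallest signed value, e.g. -8 for i4
        hi = (1 << (bits - 1)) - 1  # largest signed value, e.g. 7 for i4
        return max(lo, min(hi, a + b))

The model reproduces the ``i4`` examples above: the four calls yield ``3``,
``7``, ``-2`` and ``-8``.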
15470 '``llvm.uadd.sat.*``' Intrinsics
15471 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15476 This is an overloaded intrinsic. You can use ``llvm.uadd.sat``
15477 on any integer bit width or vectors of integers.
15481 declare i16 @llvm.uadd.sat.i16(i16 %a, i16 %b)
15482 declare i32 @llvm.uadd.sat.i32(i32 %a, i32 %b)
15483 declare i64 @llvm.uadd.sat.i64(i64 %a, i64 %b)
15484 declare <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
15489 The '``llvm.uadd.sat``' family of intrinsic functions perform unsigned
15490 saturating addition on the 2 arguments.
15495 The arguments (%a and %b) and the result may be of integer types of any bit
15496 width, but they must have the same bit width. ``%a`` and ``%b`` are the two
15497 values that will undergo unsigned addition.
15502 The maximum value this operation can clamp to is the largest unsigned value
15503 representable by the bit width of the arguments. Because this is an unsigned
15504 operation, the result will never saturate towards zero.
15510 .. code-block:: llvm
15512 %res = call i4 @llvm.uadd.sat.i4(i4 1, i4 2) ; %res = 3
15513 %res = call i4 @llvm.uadd.sat.i4(i4 5, i4 6) ; %res = 11
15514 %res = call i4 @llvm.uadd.sat.i4(i4 8, i4 8) ; %res = 15
15517 '``llvm.ssub.sat.*``' Intrinsics
15518 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15523 This is an overloaded intrinsic. You can use ``llvm.ssub.sat``
15524 on any integer bit width or vectors of integers.
15528 declare i16 @llvm.ssub.sat.i16(i16 %a, i16 %b)
15529 declare i32 @llvm.ssub.sat.i32(i32 %a, i32 %b)
15530 declare i64 @llvm.ssub.sat.i64(i64 %a, i64 %b)
15531 declare <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
15536 The '``llvm.ssub.sat``' family of intrinsic functions perform signed
15537 saturating subtraction on the 2 arguments.
15542 The arguments (%a and %b) and the result may be of integer types of any bit
15543 width, but they must have the same bit width. ``%a`` and ``%b`` are the two
15544 values that will undergo signed subtraction.
15549 The maximum value this operation can clamp to is the largest signed value
15550 representable by the bit width of the arguments. The minimum value is the
15551 smallest signed value representable by this bit width.
15557 .. code-block:: llvm
15559 %res = call i4 @llvm.ssub.sat.i4(i4 2, i4 1) ; %res = 1
15560 %res = call i4 @llvm.ssub.sat.i4(i4 2, i4 6) ; %res = -4
15561 %res = call i4 @llvm.ssub.sat.i4(i4 -4, i4 5) ; %res = -8
15562 %res = call i4 @llvm.ssub.sat.i4(i4 4, i4 -5) ; %res = 7
15565 '``llvm.usub.sat.*``' Intrinsics
15566 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15571 This is an overloaded intrinsic. You can use ``llvm.usub.sat``
15572 on any integer bit width or vectors of integers.
15576 declare i16 @llvm.usub.sat.i16(i16 %a, i16 %b)
15577 declare i32 @llvm.usub.sat.i32(i32 %a, i32 %b)
15578 declare i64 @llvm.usub.sat.i64(i64 %a, i64 %b)
15579 declare <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
15584 The '``llvm.usub.sat``' family of intrinsic functions perform unsigned
15585 saturating subtraction on the 2 arguments.
15590 The arguments (%a and %b) and the result may be of integer types of any bit
15591 width, but they must have the same bit width. ``%a`` and ``%b`` are the two
15592 values that will undergo unsigned subtraction.
15597 The minimum value this operation can clamp to is 0, which is the smallest
15598 unsigned value representable by the bit width of the unsigned arguments.
15599 Because this is an unsigned operation, the result will never saturate towards
15600 the largest possible value representable by this bit width.
15606 .. code-block:: llvm
15608 %res = call i4 @llvm.usub.sat.i4(i4 2, i4 1) ; %res = 1
15609 %res = call i4 @llvm.usub.sat.i4(i4 2, i4 6) ; %res = 0
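For the unsigned operations only one bound can ever be reached: subtraction
clamps at 0 and addition clamps at the largest unsigned value. The Python sketch
below shows both side by side for contrast; the helper names ``usub_sat`` and
``uadd_sat`` are hypothetical.

.. code-block:: python

    def usub_sat(a: int, b: int, bits: int) -> int:
        """Unsigned saturating subtraction: the result never drops below 0."""
        mask = (1 << bits) - 1
        return max(0, (a & mask) - (b & mask))

    def uadd_sat(a: int, b: int, bits: int) -> int:
        """Unsigned saturating addition: the result never exceeds 2**bits - 1."""
        mask = (1 << bits) - 1
        return min(mask, (a & mask) + (b & mask))

With ``bits = 4``, ``usub_sat(2, 6, 4)`` is ``0`` and ``uadd_sat(8, 8, 4)`` is
``15``.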
15612 '``llvm.sshl.sat.*``' Intrinsics
15613 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15618 This is an overloaded intrinsic. You can use ``llvm.sshl.sat``
15619 on integers or vectors of integers of any bit width.
15623 declare i16 @llvm.sshl.sat.i16(i16 %a, i16 %b)
15624 declare i32 @llvm.sshl.sat.i32(i32 %a, i32 %b)
15625 declare i64 @llvm.sshl.sat.i64(i64 %a, i64 %b)
15626 declare <4 x i32> @llvm.sshl.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
15631 The '``llvm.sshl.sat``' family of intrinsic functions perform signed
15632 saturating left shift on the first argument.
15637 The arguments (``%a`` and ``%b``) and the result may be of integer types of any
15638 bit width, but they must have the same bit width. ``%a`` is the value to be
shifted, and ``%b`` is the amount to shift by. If ``%b`` is (statically or
dynamically) equal to or larger than the integer bit width of the arguments,
the result is a :ref:`poison value <poisonvalues>`. If the arguments are
vectors, each vector element of ``%a`` is shifted by the corresponding shift
amount in ``%b``.
15649 The maximum value this operation can clamp to is the largest signed value
15650 representable by the bit width of the arguments. The minimum value is the
15651 smallest signed value representable by this bit width.
15657 .. code-block:: llvm
15659 %res = call i4 @llvm.sshl.sat.i4(i4 2, i4 1) ; %res = 4
15660 %res = call i4 @llvm.sshl.sat.i4(i4 2, i4 2) ; %res = 7
15661 %res = call i4 @llvm.sshl.sat.i4(i4 -5, i4 1) ; %res = -8
15662 %res = call i4 @llvm.sshl.sat.i4(i4 -1, i4 1) ; %res = -2
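A Python model of the signed saturating shift computes the exact shifted value
and clamps it. The helper name ``sshl_sat`` is hypothetical, and raising
``ValueError`` for an out-of-range shift amount is only this sketch's way of
flagging the case in which the intrinsic yields a poison value.

.. code-block:: python

    def sshl_sat(a: int, b: int, bits: int) -> int:
        """Signed saturating left shift of `a` by `b` on `bits`-wide values."""
        if not 0 <= b < bits:
            raise ValueError("shift amount >= bit width: result is poison")
        lo = -(1 << (bits - 1))
        hi = (1 << (bits - 1)) - 1
        return max(lo, min(hi, a << b))  # a << b is exact on Python integers

The model reproduces the ``i4`` examples above: the four calls yield ``4``,
``7``, ``-8`` and ``-2``.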
15665 '``llvm.ushl.sat.*``' Intrinsics
15666 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15671 This is an overloaded intrinsic. You can use ``llvm.ushl.sat``
15672 on integers or vectors of integers of any bit width.
15676 declare i16 @llvm.ushl.sat.i16(i16 %a, i16 %b)
15677 declare i32 @llvm.ushl.sat.i32(i32 %a, i32 %b)
15678 declare i64 @llvm.ushl.sat.i64(i64 %a, i64 %b)
15679 declare <4 x i32> @llvm.ushl.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
15684 The '``llvm.ushl.sat``' family of intrinsic functions perform unsigned
15685 saturating left shift on the first argument.
15690 The arguments (``%a`` and ``%b``) and the result may be of integer types of any
15691 bit width, but they must have the same bit width. ``%a`` is the value to be
shifted, and ``%b`` is the amount to shift by. If ``%b`` is (statically or
dynamically) equal to or larger than the integer bit width of the arguments,
the result is a :ref:`poison value <poisonvalues>`. If the arguments are
vectors, each vector element of ``%a`` is shifted by the corresponding shift
amount in ``%b``.
15701 The maximum value this operation can clamp to is the largest unsigned value
15702 representable by the bit width of the arguments.
15708 .. code-block:: llvm
15710 %res = call i4 @llvm.ushl.sat.i4(i4 2, i4 1) ; %res = 4
15711 %res = call i4 @llvm.ushl.sat.i4(i4 3, i4 3) ; %res = 15
15714 Fixed Point Arithmetic Intrinsics
15715 ---------------------------------
15717 A fixed point number represents a real data type for a number that has a fixed
15718 number of digits after a radix point (equivalent to the decimal point '.').
The number of digits after the radix point is referred to as the `scale`. These
15720 are useful for representing fractional values to a specific precision. The
15721 following intrinsics perform fixed point arithmetic operations on 2 operands
15722 of the same scale, specified as the third argument.
15724 The ``llvm.*mul.fix`` family of intrinsic functions represents a multiplication
15725 of fixed point numbers through scaled integers. Therefore, fixed point
15726 multiplication can be represented as
.. code-block:: llvm

      %result = call i4 @llvm.smul.fix.i4(i4 %a, i4 %b, i32 %scale)

      ; Expands to:
      %a2 = sext i4 %a to i8
      %b2 = sext i4 %b to i8
      %mul = mul nsw i8 %a2, %b2
      %scale2 = trunc i32 %scale to i8
      %r = ashr i8 %mul, %scale2 ; this is for a target rounding down towards negative infinity
      %result = trunc i8 %r to i4
The ``llvm.*div.fix`` family of intrinsic functions represents a division of
fixed point numbers through scaled integers. Fixed point division can be
represented as:
.. code-block:: llvm

      %result = call i4 @llvm.sdiv.fix.i4(i4 %a, i4 %b, i32 %scale)

      ; Expands to:
      %a2 = sext i4 %a to i8
      %b2 = sext i4 %b to i8
      %scale2 = trunc i32 %scale to i8
      %a3 = shl i8 %a2, %scale2
      %r = sdiv i8 %a3, %b2 ; this is for a target rounding towards zero
      %result = trunc i8 %r to i4
For each of these functions, if the result cannot be represented exactly with
the provided scale, the result is rounded. The rounding direction is not fixed
by the IR, since the preferred rounding may vary between targets; instead, it
is specified through a target hook. Different pipelines should legalize or optimize this
15760 using the rounding specified by this hook if it is provided. Operations like
15761 constant folding, instruction combining, KnownBits, and ValueTracking should
15762 also use this hook, if provided, and not assume the direction of rounding. A
15763 rounded result must always be within one unit of precision from the true
15764 result. That is, the error between the returned result and the true result must
15765 be less than 1/2^(scale).
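To make the bound concrete, write ``r`` for the returned scaled integer and
``t`` for the exact real-valued result (both symbols are introduced here for
illustration only). The requirement can then be stated as:

.. math::

   \left| \frac{r}{2^{\mathit{scale}}} - t \right| < \frac{1}{2^{\mathit{scale}}}

For example, with a scale of 1 and an exact result of -2.25, both -2.5
(``r = -5``) and -2 (``r = -4``) are acceptable, since each differs from -2.25
by 0.25, which is less than 1/2.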
15768 '``llvm.smul.fix.*``' Intrinsics
15769 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15774 This is an overloaded intrinsic. You can use ``llvm.smul.fix``
15775 on any integer bit width or vectors of integers.
15779 declare i16 @llvm.smul.fix.i16(i16 %a, i16 %b, i32 %scale)
15780 declare i32 @llvm.smul.fix.i32(i32 %a, i32 %b, i32 %scale)
15781 declare i64 @llvm.smul.fix.i64(i64 %a, i64 %b, i32 %scale)
15782 declare <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
The '``llvm.smul.fix``' family of intrinsic functions performs signed
fixed point multiplication on 2 arguments of the same scale.

The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must have the same bit width. The arguments may also be
integer vectors of the same length and element size. ``%a`` and ``%b`` are the
two values that will undergo signed fixed point multiplication. The argument
``%scale`` represents the scale of both operands, and must be a constant
integer.
15803 This operation performs fixed point multiplication on the 2 arguments of a
15804 specified scale. The result will also be returned in the same scale specified
15805 in the third argument.
15807 If the result value cannot be precisely represented in the given scale, the
15808 value is rounded up or down to the closest representable value. The rounding
15809 direction is unspecified.
15811 It is undefined behavior if the result value does not fit within the range of
15812 the fixed point type.
15818 .. code-block:: llvm
15820 %res = call i4 @llvm.smul.fix.i4(i4 3, i4 2, i32 0) ; %res = 6 (2 x 3 = 6)
15821 %res = call i4 @llvm.smul.fix.i4(i4 3, i4 2, i32 1) ; %res = 3 (1.5 x 1 = 1.5)
15822 %res = call i4 @llvm.smul.fix.i4(i4 3, i4 -2, i32 1) ; %res = -3 (1.5 x -1 = -1.5)
15824 ; The result in the following could be rounded up to -2 or down to -2.5
15825 %res = call i4 @llvm.smul.fix.i4(i4 3, i4 -3, i32 1) ; %res = -5 (or -4) (1.5 x -1.5 = -2.25)
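The vector form operates element-wise. As a usage sketch, the hypothetical
function ``@scale_by_1_5`` below multiplies four values by 1.5, assuming that
all operands use 16 fractional bits:

.. code-block:: llvm

      declare <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)

      ; Hypothetical example: 1.5 with 16 fractional bits is 1.5 * 65536 = 98304.
      define <4 x i32> @scale_by_1_5(<4 x i32> %v) {
        %r = call <4 x i32> @llvm.smul.fix.v4i32(
                 <4 x i32> %v,
                 <4 x i32> <i32 98304, i32 98304, i32 98304, i32 98304>,
                 i32 16)
        ret <4 x i32> %r
      }

The scale is written as a literal because it must be a constant. If any
element of the product does not fit in 32 bits, the call has undefined
behavior, so callers that cannot rule this out may prefer the saturating form.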
15828 '``llvm.umul.fix.*``' Intrinsics
15829 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15834 This is an overloaded intrinsic. You can use ``llvm.umul.fix``
15835 on any integer bit width or vectors of integers.
15839 declare i16 @llvm.umul.fix.i16(i16 %a, i16 %b, i32 %scale)
15840 declare i32 @llvm.umul.fix.i32(i32 %a, i32 %b, i32 %scale)
15841 declare i64 @llvm.umul.fix.i64(i64 %a, i64 %b, i32 %scale)
15842 declare <4 x i32> @llvm.umul.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
The '``llvm.umul.fix``' family of intrinsic functions performs unsigned
fixed point multiplication on 2 arguments of the same scale.

The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must have the same bit width. The arguments may also be
integer vectors of the same length and element size. ``%a`` and ``%b`` are the
two values that will undergo unsigned fixed point multiplication. The argument
``%scale`` represents the scale of both operands, and must be a constant
integer.
15863 This operation performs unsigned fixed point multiplication on the 2 arguments of a
15864 specified scale. The result will also be returned in the same scale specified
15865 in the third argument.
15867 If the result value cannot be precisely represented in the given scale, the
15868 value is rounded up or down to the closest representable value. The rounding
15869 direction is unspecified.
15871 It is undefined behavior if the result value does not fit within the range of
15872 the fixed point type.
15878 .. code-block:: llvm
15880 %res = call i4 @llvm.umul.fix.i4(i4 3, i4 2, i32 0) ; %res = 6 (2 x 3 = 6)
15881 %res = call i4 @llvm.umul.fix.i4(i4 3, i4 2, i32 1) ; %res = 3 (1.5 x 1 = 1.5)
15883 ; The result in the following could be rounded down to 3.5 or up to 4
15884 %res = call i4 @llvm.umul.fix.i4(i4 15, i4 1, i32 1) ; %res = 7 (or 8) (7.5 x 0.5 = 3.75)
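By analogy with the signed expansion shown in the introduction to this
section, an unsigned multiplication could be expanded with zero extension and
a logical shift. The sketch below assumes ``i16`` operands with 8 fractional
bits and a target that rounds down; the function name is hypothetical.

.. code-block:: llvm

      ; Hypothetical helper: one possible expansion of
      ; llvm.umul.fix.i16(%a, %b, 8) for a target that rounds down.
      define i16 @umul_fix_q8_sketch(i16 %a, i16 %b) {
        %a2 = zext i16 %a to i32
        %b2 = zext i16 %b to i32
        %mul = mul nuw i32 %a2, %b2   ; 65535 * 65535 still fits in 32 bits
        %r = lshr i32 %mul, 8         ; drop the extra 8 fractional bits
        %res = trunc i32 %r to i16    ; wraps here; the intrinsic is UB instead
        ret i16 %res
      }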
15887 '``llvm.smul.fix.sat.*``' Intrinsics
15888 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15893 This is an overloaded intrinsic. You can use ``llvm.smul.fix.sat``
15894 on any integer bit width or vectors of integers.
15898 declare i16 @llvm.smul.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
15899 declare i32 @llvm.smul.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
15900 declare i64 @llvm.smul.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
15901 declare <4 x i32> @llvm.smul.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
The '``llvm.smul.fix.sat``' family of intrinsic functions performs signed
fixed point saturating multiplication on 2 arguments of the same scale.

The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must have the same bit width. ``%a`` and ``%b`` are the two
values that will undergo signed fixed point multiplication. The argument
``%scale`` represents the scale of both operands, and must be a constant
integer.
15921 This operation performs fixed point multiplication on the 2 arguments of a
15922 specified scale. The result will also be returned in the same scale specified
15923 in the third argument.
15925 If the result value cannot be precisely represented in the given scale, the
15926 value is rounded up or down to the closest representable value. The rounding
15927 direction is unspecified.
15929 The maximum value this operation can clamp to is the largest signed value
15930 representable by the bit width of the first 2 arguments. The minimum value is the
15931 smallest signed value representable by this bit width.
15937 .. code-block:: llvm
15939 %res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 2, i32 0) ; %res = 6 (2 x 3 = 6)
15940 %res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 2, i32 1) ; %res = 3 (1.5 x 1 = 1.5)
15941 %res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 -2, i32 1) ; %res = -3 (1.5 x -1 = -1.5)
15943 ; The result in the following could be rounded up to -2 or down to -2.5
15944 %res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 -3, i32 1) ; %res = -5 (or -4) (1.5 x -1.5 = -2.25)
15947 %res = call i4 @llvm.smul.fix.sat.i4(i4 7, i4 2, i32 0) ; %res = 7
15948 %res = call i4 @llvm.smul.fix.sat.i4(i4 7, i4 4, i32 2) ; %res = 7
15949 %res = call i4 @llvm.smul.fix.sat.i4(i4 -8, i4 5, i32 2) ; %res = -8
15950 %res = call i4 @llvm.smul.fix.sat.i4(i4 -8, i4 -2, i32 1) ; %res = 7
15952 ; Scale can affect the saturation result
15953 %res = call i4 @llvm.smul.fix.sat.i4(i4 2, i4 4, i32 0) ; %res = 7 (2 x 4 -> clamped to 7)
15954 %res = call i4 @llvm.smul.fix.sat.i4(i4 2, i4 4, i32 1) ; %res = 4 (1 x 2 = 2)
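The clamping step can be made explicit by computing the product in a wider
type and limiting it before truncation. The following is only a sketch,
assuming ``i8`` operands with 4 fractional bits and rounding towards negative
infinity; the function name is hypothetical.

.. code-block:: llvm

      declare i16 @llvm.smax.i16(i16, i16)
      declare i16 @llvm.smin.i16(i16, i16)

      ; Hypothetical helper: one possible expansion of
      ; llvm.smul.fix.sat.i8(%a, %b, 4).
      define i8 @smul_fix_sat_q4_sketch(i8 %a, i8 %b) {
        %a2 = sext i8 %a to i16
        %b2 = sext i8 %b to i16
        %mul = mul nsw i16 %a2, %b2                        ; |product| <= 16384
        %r = ashr i16 %mul, 4                              ; rescale, rounding down
        %lo = call i16 @llvm.smax.i16(i16 %r, i16 -128)    ; clamp to the i8 minimum
        %hi = call i16 @llvm.smin.i16(i16 %lo, i16 127)    ; clamp to the i8 maximum
        %res = trunc i16 %hi to i8
        ret i8 %res
      }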
15957 '``llvm.umul.fix.sat.*``' Intrinsics
15958 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
15963 This is an overloaded intrinsic. You can use ``llvm.umul.fix.sat``
15964 on any integer bit width or vectors of integers.
15968 declare i16 @llvm.umul.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
15969 declare i32 @llvm.umul.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
15970 declare i64 @llvm.umul.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
15971 declare <4 x i32> @llvm.umul.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
The '``llvm.umul.fix.sat``' family of intrinsic functions performs unsigned
fixed point saturating multiplication on 2 arguments of the same scale.

The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must have the same bit width. ``%a`` and ``%b`` are the two
values that will undergo unsigned fixed point multiplication. The argument
``%scale`` represents the scale of both operands, and must be a constant
integer.
15991 This operation performs fixed point multiplication on the 2 arguments of a
15992 specified scale. The result will also be returned in the same scale specified
15993 in the third argument.
15995 If the result value cannot be precisely represented in the given scale, the
15996 value is rounded up or down to the closest representable value. The rounding
15997 direction is unspecified.
15999 The maximum value this operation can clamp to is the largest unsigned value
16000 representable by the bit width of the first 2 arguments. The minimum value is the
16001 smallest unsigned value representable by this bit width (zero).
16007 .. code-block:: llvm
16009 %res = call i4 @llvm.umul.fix.sat.i4(i4 3, i4 2, i32 0) ; %res = 6 (2 x 3 = 6)
16010 %res = call i4 @llvm.umul.fix.sat.i4(i4 3, i4 2, i32 1) ; %res = 3 (1.5 x 1 = 1.5)
16012 ; The result in the following could be rounded down to 2 or up to 2.5
16013 %res = call i4 @llvm.umul.fix.sat.i4(i4 3, i4 3, i32 1) ; %res = 4 (or 5) (1.5 x 1.5 = 2.25)
16016 %res = call i4 @llvm.umul.fix.sat.i4(i4 8, i4 2, i32 0) ; %res = 15 (8 x 2 -> clamped to 15)
16017 %res = call i4 @llvm.umul.fix.sat.i4(i4 8, i4 8, i32 2) ; %res = 15 (2 x 2 -> clamped to 3.75)
      ; Scale can affect the saturation result
      %res = call i4 @llvm.umul.fix.sat.i4(i4 4, i4 4, i32 0) ; %res = 15 (4 x 4 -> clamped to 15)
      %res = call i4 @llvm.umul.fix.sat.i4(i4 4, i4 4, i32 1) ; %res = 8 (2 x 2 = 4)
16024 '``llvm.sdiv.fix.*``' Intrinsics
16025 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16030 This is an overloaded intrinsic. You can use ``llvm.sdiv.fix``
16031 on any integer bit width or vectors of integers.
16035 declare i16 @llvm.sdiv.fix.i16(i16 %a, i16 %b, i32 %scale)
16036 declare i32 @llvm.sdiv.fix.i32(i32 %a, i32 %b, i32 %scale)
16037 declare i64 @llvm.sdiv.fix.i64(i64 %a, i64 %b, i32 %scale)
16038 declare <4 x i32> @llvm.sdiv.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
The '``llvm.sdiv.fix``' family of intrinsic functions performs signed
fixed point division on 2 arguments of the same scale.

The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must have the same bit width. The arguments may also be
integer vectors of the same length and element size. ``%a`` and ``%b`` are the
two values that will undergo signed fixed point division. The argument
``%scale`` represents the scale of both operands, and must be a constant
integer.
16059 This operation performs fixed point division on the 2 arguments of a
16060 specified scale. The result will also be returned in the same scale specified
16061 in the third argument.
16063 If the result value cannot be precisely represented in the given scale, the
16064 value is rounded up or down to the closest representable value. The rounding
16065 direction is unspecified.
16067 It is undefined behavior if the result value does not fit within the range of
16068 the fixed point type, or if the second argument is zero.
16074 .. code-block:: llvm
16076 %res = call i4 @llvm.sdiv.fix.i4(i4 6, i4 2, i32 0) ; %res = 3 (6 / 2 = 3)
16077 %res = call i4 @llvm.sdiv.fix.i4(i4 6, i4 4, i32 1) ; %res = 3 (3 / 2 = 1.5)
16078 %res = call i4 @llvm.sdiv.fix.i4(i4 3, i4 -2, i32 1) ; %res = -3 (1.5 / -1 = -1.5)
16080 ; The result in the following could be rounded up to 1 or down to 0.5
16081 %res = call i4 @llvm.sdiv.fix.i4(i4 3, i4 4, i32 1) ; %res = 2 (or 1) (1.5 / 2 = 0.75)
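For a more realistic width, the generic expansion from the introduction to
this section can be instantiated for ``i32`` operands. The sketch below
assumes 16 fractional bits and rounding towards zero; the function name is
hypothetical.

.. code-block:: llvm

      ; Hypothetical helper: one possible expansion of
      ; llvm.sdiv.fix.i32(%a, %b, 16) for a target that rounds towards zero.
      define i32 @sdiv_fix_q16_sketch(i32 %a, i32 %b) {
        %a2 = sext i32 %a to i64
        %b2 = sext i32 %b to i64
        %a3 = shl nsw i64 %a2, 16     ; pre-scale the dividend; cannot overflow i64
        %r = sdiv i64 %a3, %b2        ; undefined behavior if %b is zero
        %res = trunc i64 %r to i32    ; wraps here; the intrinsic is UB instead
        ret i32 %res
      }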
16084 '``llvm.udiv.fix.*``' Intrinsics
16085 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16090 This is an overloaded intrinsic. You can use ``llvm.udiv.fix``
16091 on any integer bit width or vectors of integers.
16095 declare i16 @llvm.udiv.fix.i16(i16 %a, i16 %b, i32 %scale)
16096 declare i32 @llvm.udiv.fix.i32(i32 %a, i32 %b, i32 %scale)
16097 declare i64 @llvm.udiv.fix.i64(i64 %a, i64 %b, i32 %scale)
16098 declare <4 x i32> @llvm.udiv.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
The '``llvm.udiv.fix``' family of intrinsic functions performs unsigned
fixed point division on 2 arguments of the same scale.

The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must have the same bit width. The arguments may also be
integer vectors of the same length and element size. ``%a`` and ``%b`` are the
two values that will undergo unsigned fixed point division. The argument
``%scale`` represents the scale of both operands, and must be a constant
integer.
16119 This operation performs fixed point division on the 2 arguments of a
16120 specified scale. The result will also be returned in the same scale specified
16121 in the third argument.
16123 If the result value cannot be precisely represented in the given scale, the
16124 value is rounded up or down to the closest representable value. The rounding
16125 direction is unspecified.
16127 It is undefined behavior if the result value does not fit within the range of
16128 the fixed point type, or if the second argument is zero.
16134 .. code-block:: llvm
16136 %res = call i4 @llvm.udiv.fix.i4(i4 6, i4 2, i32 0) ; %res = 3 (6 / 2 = 3)
16137 %res = call i4 @llvm.udiv.fix.i4(i4 6, i4 4, i32 1) ; %res = 3 (3 / 2 = 1.5)
16138 %res = call i4 @llvm.udiv.fix.i4(i4 1, i4 -8, i32 4) ; %res = 2 (0.0625 / 0.5 = 0.125)
16140 ; The result in the following could be rounded up to 1 or down to 0.5
16141 %res = call i4 @llvm.udiv.fix.i4(i4 3, i4 4, i32 1) ; %res = 2 (or 1) (1.5 / 2 = 0.75)
16144 '``llvm.sdiv.fix.sat.*``' Intrinsics
16145 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16150 This is an overloaded intrinsic. You can use ``llvm.sdiv.fix.sat``
16151 on any integer bit width or vectors of integers.
16155 declare i16 @llvm.sdiv.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
16156 declare i32 @llvm.sdiv.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
16157 declare i64 @llvm.sdiv.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
16158 declare <4 x i32> @llvm.sdiv.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
The '``llvm.sdiv.fix.sat``' family of intrinsic functions performs signed
fixed point saturating division on 2 arguments of the same scale.

The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must have the same bit width. ``%a`` and ``%b`` are the two
values that will undergo signed fixed point division. The argument
``%scale`` represents the scale of both operands, and must be a constant
integer.
16178 This operation performs fixed point division on the 2 arguments of a
16179 specified scale. The result will also be returned in the same scale specified
16180 in the third argument.
16182 If the result value cannot be precisely represented in the given scale, the
16183 value is rounded up or down to the closest representable value. The rounding
16184 direction is unspecified.
16186 The maximum value this operation can clamp to is the largest signed value
16187 representable by the bit width of the first 2 arguments. The minimum value is the
16188 smallest signed value representable by this bit width.
16190 It is undefined behavior if the second argument is zero.
16196 .. code-block:: llvm
16198 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 6, i4 2, i32 0) ; %res = 3 (6 / 2 = 3)
16199 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 6, i4 4, i32 1) ; %res = 3 (3 / 2 = 1.5)
16200 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 3, i4 -2, i32 1) ; %res = -3 (1.5 / -1 = -1.5)
16202 ; The result in the following could be rounded up to 1 or down to 0.5
16203 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 3, i4 4, i32 1) ; %res = 2 (or 1) (1.5 / 2 = 0.75)
16206 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 -8, i4 -1, i32 0) ; %res = 7 (-8 / -1 = 8 => 7)
16207 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 4, i4 2, i32 2) ; %res = 7 (1 / 0.5 = 2 => 1.75)
16208 %res = call i4 @llvm.sdiv.fix.sat.i4(i4 -4, i4 1, i32 2) ; %res = -8 (-1 / 0.25 = -4 => -2)
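Because a zero divisor is undefined behavior even for the saturating form, a
caller that cannot rule it out has to test for it first. A minimal sketch,
assuming 16 fractional bits and a made-up policy of returning 0 for a zero
divisor (the function name ``@safe_sdiv_q16`` is also hypothetical):

.. code-block:: llvm

      declare i32 @llvm.sdiv.fix.sat.i32(i32 %a, i32 %b, i32 %scale)

      ; Hypothetical wrapper: guards the divisor before dividing.
      define i32 @safe_sdiv_q16(i32 %a, i32 %b) {
      entry:
        %is_zero = icmp eq i32 %b, 0
        br i1 %is_zero, label %fallback, label %divide

      divide:
        ; Out-of-range quotients are clamped rather than being undefined.
        %q = call i32 @llvm.sdiv.fix.sat.i32(i32 %a, i32 %b, i32 16)
        ret i32 %q

      fallback:
        ; Assumed policy for this example only.
        ret i32 0
      }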
16211 '``llvm.udiv.fix.sat.*``' Intrinsics
16212 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16217 This is an overloaded intrinsic. You can use ``llvm.udiv.fix.sat``
16218 on any integer bit width or vectors of integers.
16222 declare i16 @llvm.udiv.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
16223 declare i32 @llvm.udiv.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
16224 declare i64 @llvm.udiv.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
16225 declare <4 x i32> @llvm.udiv.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
The '``llvm.udiv.fix.sat``' family of intrinsic functions performs unsigned
fixed point saturating division on 2 arguments of the same scale.

The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must have the same bit width. ``%a`` and ``%b`` are the two
values that will undergo unsigned fixed point division. The argument
``%scale`` represents the scale of both operands, and must be a constant
integer.
16245 This operation performs fixed point division on the 2 arguments of a
16246 specified scale. The result will also be returned in the same scale specified
16247 in the third argument.
16249 If the result value cannot be precisely represented in the given scale, the
16250 value is rounded up or down to the closest representable value. The rounding
16251 direction is unspecified.
16253 The maximum value this operation can clamp to is the largest unsigned value
16254 representable by the bit width of the first 2 arguments. The minimum value is the
16255 smallest unsigned value representable by this bit width (zero).
16257 It is undefined behavior if the second argument is zero.
16262 .. code-block:: llvm
16264 %res = call i4 @llvm.udiv.fix.sat.i4(i4 6, i4 2, i32 0) ; %res = 3 (6 / 2 = 3)
16265 %res = call i4 @llvm.udiv.fix.sat.i4(i4 6, i4 4, i32 1) ; %res = 3 (3 / 2 = 1.5)
16267 ; The result in the following could be rounded down to 0.5 or up to 1
16268 %res = call i4 @llvm.udiv.fix.sat.i4(i4 3, i4 4, i32 1) ; %res = 1 (or 2) (1.5 / 2 = 0.75)
16271 %res = call i4 @llvm.udiv.fix.sat.i4(i4 8, i4 2, i32 2) ; %res = 15 (2 / 0.5 = 4 => 3.75)
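For the unsigned case only the upper bound needs clamping, since an unsigned
quotient cannot go below zero. The following sketch assumes ``i8`` operands
with 4 fractional bits and rounding down; the function name is hypothetical.

.. code-block:: llvm

      declare i16 @llvm.umin.i16(i16, i16)

      ; Hypothetical helper: one possible expansion of
      ; llvm.udiv.fix.sat.i8(%a, %b, 4).
      define i8 @udiv_fix_sat_q4_sketch(i8 %a, i8 %b) {
        %a2 = zext i8 %a to i16
        %b2 = zext i8 %b to i16
        %a3 = shl nuw i16 %a2, 4                            ; 255 << 4 fits in i16
        %q = udiv i16 %a3, %b2                              ; UB if %b is zero
        %clamped = call i16 @llvm.umin.i16(i16 %q, i16 255) ; clamp to the i8 maximum
        %res = trunc i16 %clamped to i8
        ret i8 %res
      }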
16274 Specialised Arithmetic Intrinsics
16275 ---------------------------------
16277 .. _i_intr_llvm_canonicalize:
16279 '``llvm.canonicalize.*``' Intrinsic
16280 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16287 declare float @llvm.canonicalize.f32(float %a)
16288 declare double @llvm.canonicalize.f64(double %b)
The '``llvm.canonicalize.*``' intrinsic returns the platform-specific canonical
16294 encoding of a floating-point number. This canonicalization is useful for
16295 implementing certain numeric primitives such as frexp. The canonical encoding is
16296 defined by IEEE-754-2008 to be:
16300 2.1.8 canonical encoding: The preferred encoding of a floating-point
16301 representation in a format. Applied to declets, significands of finite
16302 numbers, infinities, and NaNs, especially in decimal formats.
16304 This operation can also be considered equivalent to the IEEE-754-2008
16305 conversion of a floating-point value to the same format. NaNs are handled
16306 according to section 6.2.
16308 Examples of non-canonical encodings:
16310 - x87 pseudo denormals, pseudo NaNs, pseudo Infinity, Unnormals. These are
16311 converted to a canonical representation per hardware-specific protocol.
- Many normal decimal floating-point numbers have non-canonical alternative
  encodings.
16314 - Some machines, like GPUs or ARMv7 NEON, do not support subnormal values.
16315 These are treated as non-canonical encodings of zero and will be flushed to
16316 a zero of the same sign by this operation.
Note that per IEEE-754-2008 6.2, systems that support signaling NaNs with
default exception handling must signal an invalid exception, and produce a
quiet NaN result.
16322 This function should always be implementable as multiplication by 1.0, provided
16323 that the compiler does not constant fold the operation. Likewise, division by
16324 1.0 and ``llvm.minnum(x, x)`` are possible implementations. Addition with
16325 -0.0 is also sufficient provided that the rounding mode is not -Infinity.
16327 ``@llvm.canonicalize`` must preserve the equality relation. That is:
16329 - ``(@llvm.canonicalize(x) == x)`` is equivalent to ``(x == x)``
- ``(@llvm.canonicalize(x) == @llvm.canonicalize(y))`` is equivalent to
  ``(x == y)``
16333 Additionally, the sign of zero must be conserved:
16334 ``@llvm.canonicalize(-0.0) = -0.0`` and ``@llvm.canonicalize(+0.0) = +0.0``
16336 The payload bits of a NaN must be conserved, with two exceptions.
16337 First, environments which use only a single canonical representation of NaN
16338 must perform said canonicalization. Second, SNaNs must be quieted per the
16341 The canonicalization operation may be optimized away if:
16343 - The input is known to be canonical. For example, it was produced by a
16344 floating-point operation that is required by the standard to be canonical.
16345 - The result is consumed only by (or fused with) other floating-point
16346 operations. That is, the bits of the floating-point value are not examined.
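As an illustration of a case where the second condition does not apply, the
hypothetical function below examines the bits of the canonicalized value
through a ``bitcast``, so the call cannot be dropped on those grounds:

.. code-block:: llvm

      declare float @llvm.canonicalize.f32(float %a)

      ; Hypothetical example: the result is reinterpreted as an integer, so
      ; its encoding is observable.
      define i32 @canonical_bits(float %x) {
        %c = call float @llvm.canonicalize.f32(float %x)
        %bits = bitcast float %c to i32
        ret i32 %bits
      }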
16348 '``llvm.fmuladd.*``' Intrinsic
16349 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16356 declare float @llvm.fmuladd.f32(float %a, float %b, float %c)
16357 declare double @llvm.fmuladd.f64(double %a, double %b, double %c)
16362 The '``llvm.fmuladd.*``' intrinsic functions represent multiply-add
16363 expressions that can be fused if the code generator determines that (a) the
16364 target instruction set has support for a fused operation, and (b) that the
16365 fused operation is more efficient than the equivalent, separate pair of mul
16366 and add instructions.
16371 The '``llvm.fmuladd.*``' intrinsics each take three arguments: two
16372 multiplicands, a and b, and an addend c.
      %0 = call float @llvm.fmuladd.f32(float %a, float %b, float %c)
16383 is equivalent to the expression a \* b + c, except that it is unspecified
16384 whether rounding will be performed between the multiplication and addition
16385 steps. Fusion is not guaranteed, even if the target platform supports it.
16386 If a fused multiply-add is required, the corresponding
16387 :ref:`llvm.fma <int_fma>` intrinsic function should be used instead.
Like '``llvm.fma.*``', this intrinsic never sets errno.
16393 .. code-block:: llvm
16395 %r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c) ; yields float:r2 = (a * b) + c
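The two forms between which '``llvm.fmuladd.*``' leaves the choice open can be
written out explicitly. The function names in this sketch are hypothetical.

.. code-block:: llvm

      declare float @llvm.fma.f32(float, float, float)

      ; Unfused form: the product is rounded before the addition.
      define float @muladd_unfused(float %a, float %b, float %c) {
        %mul = fmul float %a, %b
        %sum = fadd float %mul, %c
        ret float %sum
      }

      ; Fused form: a * b + c is computed with a single rounding.
      define float @muladd_fused(float %a, float %b, float %c) {
        %r = call float @llvm.fma.f32(float %a, float %b, float %c)
        ret float %r
      }

A call to ``llvm.fmuladd.f32`` may behave like either function, so code that
depends on one particular rounding behavior should use that form directly.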
16398 Hardware-Loop Intrinsics
16399 ------------------------
LLVM supports several intrinsics to mark a loop as a hardware-loop. They are
hints to the backend, which is required to lower these intrinsics further to
target-specific instructions, or to revert the hardware-loop to a normal loop if
target-specific restrictions are not met and a hardware-loop can't be generated.
16406 These intrinsics may be modified in the future and are not intended to be used
16407 outside the backend. Thus, front-end and mid-level optimizations should not be
16408 generating these intrinsics.
16411 '``llvm.set.loop.iterations.*``' Intrinsic
16412 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16417 This is an overloaded intrinsic.
16421 declare void @llvm.set.loop.iterations.i32(i32)
16422 declare void @llvm.set.loop.iterations.i64(i64)
The '``llvm.set.loop.iterations.*``' intrinsics are used to specify the
hardware-loop trip count. They are placed in the loop preheader basic block and
are marked as ``IntrNoDuplicate`` to avoid optimizers duplicating these
instructions.
16435 The integer operand is the loop trip count of the hardware-loop, and thus
16436 not e.g. the loop back-edge taken count.
The '``llvm.set.loop.iterations.*``' intrinsics do not perform any arithmetic
on their operand. They are a hint to the backend, which can use the operand to
set up the hardware-loop count with a target-specific instruction, usually a
move of this value to a special register or a hardware-loop instruction.
16447 '``llvm.start.loop.iterations.*``' Intrinsic
16448 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16453 This is an overloaded intrinsic.
16457 declare i32 @llvm.start.loop.iterations.i32(i32)
16458 declare i64 @llvm.start.loop.iterations.i64(i64)
16463 The '``llvm.start.loop.iterations.*``' intrinsics are similar to the
16464 '``llvm.set.loop.iterations.*``' intrinsics, used to specify the
16465 hardware-loop trip count but also produce a value identical to the input
16466 that can be used as the input to the loop. They are placed in the loop
16467 preheader basic block and the output is expected to be the input to the
16468 phi for the induction variable of the loop, decremented by the
16469 '``llvm.loop.decrement.reg.*``'.
16474 The integer operand is the loop trip count of the hardware-loop, and thus
16475 not e.g. the loop back-edge taken count.
The '``llvm.start.loop.iterations.*``' intrinsics do not perform any arithmetic
on their operand. They are a hint to the backend, which can use the operand to
set up the hardware-loop count with a target-specific instruction, usually a
move of this value to a special register or a hardware-loop instruction.
16485 '``llvm.test.set.loop.iterations.*``' Intrinsic
16486 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16491 This is an overloaded intrinsic.
16495 declare i1 @llvm.test.set.loop.iterations.i32(i32)
16496 declare i1 @llvm.test.set.loop.iterations.i64(i64)
The '``llvm.test.set.loop.iterations.*``' intrinsics are used to specify the
loop trip count, and also test that the given count is not zero, allowing
16503 it to control entry to a while-loop. They are placed in the loop preheader's
16504 predecessor basic block, and are marked as ``IntrNoDuplicate`` to avoid
16505 optimizers duplicating these instructions.
16510 The integer operand is the loop trip count of the hardware-loop, and thus
16511 not e.g. the loop back-edge taken count.
The '``llvm.test.set.loop.iterations.*``' intrinsics do not perform any
arithmetic on their operand. They are a hint to the backend, which can use the
operand to set up the hardware-loop count with a target-specific instruction,
usually a move of this value to a special register or a hardware-loop instruction.
16520 The result is the conditional value of whether the given count is not zero.
16523 '``llvm.test.start.loop.iterations.*``' Intrinsic
16524 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16529 This is an overloaded intrinsic.
16533 declare {i32, i1} @llvm.test.start.loop.iterations.i32(i32)
16534 declare {i64, i1} @llvm.test.start.loop.iterations.i64(i64)
16539 The '``llvm.test.start.loop.iterations.*``' intrinsics are similar to the
16540 '``llvm.test.set.loop.iterations.*``' and '``llvm.start.loop.iterations.*``'
16541 intrinsics, used to specify the hardware-loop trip count, but also produce a
16542 value identical to the input that can be used as the input to the loop. The
16543 second i1 output controls entry to a while-loop.
16548 The integer operand is the loop trip count of the hardware-loop, and thus
16549 not e.g. the loop back-edge taken count.
The '``llvm.test.start.loop.iterations.*``' intrinsics do not perform any
arithmetic on their operand. They are a hint to the backend, which can use the
operand to set up the hardware-loop count with a target-specific instruction,
usually a move of this value to a special register or a hardware-loop instruction.
16558 The result is a pair of the input and a conditional value of whether the
16559 given count is not zero.
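The returned pair is typically split with ``extractvalue``. The sketch below
only illustrates the shape of such a guarded loop; the function and block
names are hypothetical, the placement of the call is assumed to mirror
'``llvm.test.set.loop.iterations.*``', and these intrinsics are normally
emitted by the backend rather than written by hand.

.. code-block:: llvm

      declare { i32, i1 } @llvm.test.start.loop.iterations.i32(i32)
      declare i32 @llvm.loop.decrement.reg.i32(i32, i32)

      ; Hypothetical example: a while-loop that is skipped when %n is zero.
      define void @guarded_hwloop_sketch(i32 %n) {
      entry:
        %pair = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %n)
        %start = extractvalue { i32, i1 } %pair, 0    ; same value as %n
        %nonzero = extractvalue { i32, i1 } %pair, 1  ; true if %n is not zero
        br i1 %nonzero, label %preheader, label %exit

      preheader:
        br label %loop

      loop:
        %count = phi i32 [ %start, %preheader ], [ %next, %loop ]
        ; loop body elided
        %next = call i32 @llvm.loop.decrement.reg.i32(i32 %count, i32 1)
        %more = icmp ne i32 %next, 0
        br i1 %more, label %loop, label %exit

      exit:
        ret void
      }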
16562 '``llvm.loop.decrement.reg.*``' Intrinsic
16563 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16568 This is an overloaded intrinsic.
16572 declare i32 @llvm.loop.decrement.reg.i32(i32, i32)
16573 declare i64 @llvm.loop.decrement.reg.i64(i64, i64)
16578 The '``llvm.loop.decrement.reg.*``' intrinsics are used to lower the loop
16579 iteration counter and return an updated value that will be used in the next
16585 Both arguments must have identical integer types. The first operand is the
16586 loop iteration counter. The second operand is the maximum number of elements
16587 processed in an iteration.
The '``llvm.loop.decrement.reg.*``' intrinsics do an integer ``SUB`` of their
two operands, which is not allowed to wrap. They return the remaining number of
iterations still to be executed, and can be used together with a ``PHI``,
``ICMP`` and ``BR`` to control the number of loop iterations executed. Any
optimisations are allowed to treat it as a ``SUB``, and it is supported by
SCEV, so it's the backend's responsibility to handle cases where it may be
optimised. These intrinsics are marked as ``IntrNoDuplicate`` to avoid
16599 optimizers duplicating these instructions.
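The combination with ``PHI``, ``ICMP`` and ``BR`` described above can be
sketched as follows. The function is hypothetical, it assumes that ``%n`` is
non-zero, and it is meant only to show the shape of such a loop, since these
intrinsics are not intended to be written by hand.

.. code-block:: llvm

      declare i32 @llvm.start.loop.iterations.i32(i32)
      declare i32 @llvm.loop.decrement.reg.i32(i32, i32)

      ; Hypothetical example: stores zero to %n consecutive i32 elements.
      ; Assumes %n is the (non-zero) trip count.
      define void @hwloop_sketch(ptr %p, i32 %n) {
      entry:
        %start = call i32 @llvm.start.loop.iterations.i32(i32 %n)
        br label %loop

      loop:
        %count = phi i32 [ %start, %entry ], [ %next, %loop ]
        %idx = sub i32 %n, %count                 ; 0, 1, ..., %n - 1
        %addr = getelementptr inbounds i32, ptr %p, i32 %idx
        store i32 0, ptr %addr
        %next = call i32 @llvm.loop.decrement.reg.i32(i32 %count, i32 1)
        %more = icmp ne i32 %next, 0              ; iterations remaining?
        br i1 %more, label %loop, label %exit

      exit:
        ret void
      }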
16602 '``llvm.loop.decrement.*``' Intrinsic
16603 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16608 This is an overloaded intrinsic.
16612 declare i1 @llvm.loop.decrement.i32(i32)
16613 declare i1 @llvm.loop.decrement.i64(i64)
16618 The HardwareLoops pass allows the loop decrement value to be specified with an
16619 option. It defaults to a loop decrement value of 1, but it can be an unsigned
16620 integer value provided by this option. The '``llvm.loop.decrement.*``'
16621 intrinsics decrement the loop iteration counter with this value, and return a
16622 false predicate if the loop should exit, and true otherwise.
16623 This is emitted if the loop counter is not updated via a ``PHI`` node, which
16624 can also be controlled with an option.
The integer argument is the loop decrement value used to decrement the loop
iteration counter.
The '``llvm.loop.decrement.*``' intrinsics do a ``SUB`` of the loop iteration
counter with the given loop decrement value, and return false if the loop
should exit. This ``SUB`` is not allowed to wrap. The result is a condition
16638 that is used by the conditional branch controlling the loop.
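When the counter is not carried through a ``PHI`` node, the loop might look
roughly like the following sketch. The function is hypothetical and assumes a
non-zero trip count ``%n`` and a decrement value of 1.

.. code-block:: llvm

      declare void @llvm.set.loop.iterations.i32(i32)
      declare i1 @llvm.loop.decrement.i32(i32)

      ; Hypothetical example: the counter lives entirely in the hardware-loop
      ; state, so no PHI node is needed. Assumes %n is non-zero.
      define void @hwloop_no_phi_sketch(i32 %n) {
      entry:
        call void @llvm.set.loop.iterations.i32(i32 %n)
        br label %loop

      loop:
        ; loop body elided
        %continue = call i1 @llvm.loop.decrement.i32(i32 1)
        br i1 %continue, label %loop, label %exit

      exit:
        ret void
      }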
16641 Vector Reduction Intrinsics
16642 ---------------------------
16644 Horizontal reductions of vectors can be expressed using the following
16645 intrinsics. Each one takes a vector operand as an input and applies its
16646 respective operation across all elements of the vector, returning a single
16647 scalar result of the same element type.
16649 .. _int_vector_reduce_add:
16651 '``llvm.vector.reduce.add.*``' Intrinsic
16652 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16659 declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %a)
16660 declare i64 @llvm.vector.reduce.add.v2i64(<2 x i64> %a)
16665 The '``llvm.vector.reduce.add.*``' intrinsics do an integer ``ADD``
16666 reduction of a vector, returning the result as a scalar. The return type matches
16667 the element-type of the vector input.
16671 The argument to this intrinsic must be a vector of integer values.
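As an illustration, the first hypothetical function below calls the intrinsic
on a four-element vector, and the second computes the same value with scalar
instructions. Because integer addition is associative, the order in which the
elements are combined does not affect the result.

.. code-block:: llvm

      declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %a)

      define i32 @sum4(<4 x i32> %v) {
        %sum = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %v)
        ret i32 %sum
      }

      ; Scalarized equivalent, shown for comparison only.
      define i32 @sum4_scalarized(<4 x i32> %v) {
        %e0 = extractelement <4 x i32> %v, i32 0
        %e1 = extractelement <4 x i32> %v, i32 1
        %e2 = extractelement <4 x i32> %v, i32 2
        %e3 = extractelement <4 x i32> %v, i32 3
        %s01 = add i32 %e0, %e1
        %s23 = add i32 %e2, %e3
        %sum = add i32 %s01, %s23
        ret i32 %sum
      }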
16673 .. _int_vector_reduce_fadd:
16675 '``llvm.vector.reduce.fadd.*``' Intrinsic
16676 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16683 declare float @llvm.vector.reduce.fadd.v4f32(float %start_value, <4 x float> %a)
16684 declare double @llvm.vector.reduce.fadd.v2f64(double %start_value, <2 x double> %a)
16689 The '``llvm.vector.reduce.fadd.*``' intrinsics do a floating-point
16690 ``ADD`` reduction of a vector, returning the result as a scalar. The return type
16691 matches the element-type of the vector input.
16693 If the intrinsic call has the 'reassoc' flag set, then the reduction will not
16694 preserve the associativity of an equivalent scalarized counterpart. Otherwise
16695 the reduction will be *sequential*, thus implying that the operation respects
16696 the associativity of a scalarized reduction. That is, the reduction begins with
16697 the start value and performs an fadd operation with consecutively increasing
16698 vector element indices. See the following pseudocode:
    float sequential_fadd(start_value, input_vector)
      result = start_value
      for i = 0 to length(input_vector)
        result = result + input_vector[i]
      return result
16711 The first argument to this intrinsic is a scalar start value for the reduction.
16712 The type of the start value matches the element-type of the vector input.
16713 The second argument must be a vector of floating-point values.
16715 To ignore the start value, negative zero (``-0.0``) can be used, as it is
16716 the neutral value of floating point addition.
16723 %unord = call reassoc float @llvm.vector.reduce.fadd.v4f32(float -0.0, <4 x float> %input) ; relaxed reduction
16724 %ord = call float @llvm.vector.reduce.fadd.v4f32(float %start_value, <4 x float> %input) ; sequential reduction
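
The sequential form can be read as the following scalar chain for a
four-element vector. This is only an illustrative expansion, not a required
lowering; the value names ``%e0`` to ``%e3`` and ``%r0`` to ``%r3`` are
hypothetical.

.. code-block:: llvm

      %e0 = extractelement <4 x float> %input, i32 0
      %e1 = extractelement <4 x float> %input, i32 1
      %e2 = extractelement <4 x float> %input, i32 2
      %e3 = extractelement <4 x float> %input, i32 3
      ; Start with %start_value and add the elements in increasing index order.
      %r0 = fadd float %start_value, %e0
      %r1 = fadd float %r0, %e1
      %r2 = fadd float %r1, %e2
      %r3 = fadd float %r2, %e3
      ; %r3 corresponds to the result of the sequential reduction.
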
16727 .. _int_vector_reduce_mul:
16729 '``llvm.vector.reduce.mul.*``' Intrinsic
16730 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16737 declare i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %a)
16738 declare i64 @llvm.vector.reduce.mul.v2i64(<2 x i64> %a)
16743 The '``llvm.vector.reduce.mul.*``' intrinsics do an integer ``MUL``
16744 reduction of a vector, returning the result as a scalar. The return type matches
16745 the element-type of the vector input.
16749 The argument to this intrinsic must be a vector of integer values.
16751 .. _int_vector_reduce_fmul:
16753 '``llvm.vector.reduce.fmul.*``' Intrinsic
16754 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16761 declare float @llvm.vector.reduce.fmul.v4f32(float %start_value, <4 x float> %a)
16762 declare double @llvm.vector.reduce.fmul.v2f64(double %start_value, <2 x double> %a)
16767 The '``llvm.vector.reduce.fmul.*``' intrinsics do a floating-point
16768 ``MUL`` reduction of a vector, returning the result as a scalar. The return type
16769 matches the element-type of the vector input.
16771 If the intrinsic call has the 'reassoc' flag set, then the reduction will not
16772 preserve the associativity of an equivalent scalarized counterpart. Otherwise
16773 the reduction will be *sequential*, thus implying that the operation respects
16774 the associativity of a scalarized reduction. That is, the reduction begins with
16775 the start value and performs an fmul operation with consecutively increasing
16776 vector element indices. See the following pseudocode:
    float sequential_fmul(start_value, input_vector)
      result = start_value
      for i = 0 to length(input_vector)
        result = result * input_vector[i]
      return result
16789 The first argument to this intrinsic is a scalar start value for the reduction.
16790 The type of the start value matches the element-type of the vector input.
16791 The second argument must be a vector of floating-point values.
16793 To ignore the start value, one (``1.0``) can be used, as it is the neutral
16794 value of floating point multiplication.
16801 %unord = call reassoc float @llvm.vector.reduce.fmul.v4f32(float 1.0, <4 x float> %input) ; relaxed reduction
16802 %ord = call float @llvm.vector.reduce.fmul.v4f32(float %start_value, <4 x float> %input) ; sequential reduction
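
With the ``reassoc`` flag the order of the multiplications is not fixed. As a
sketch, one of several evaluation orders that the flag permits for a
four-element vector and a start value of ``1.0`` is a pairwise tree. The value
names are hypothetical, and an implementation is free to choose a different
order.

.. code-block:: llvm

      %e0 = extractelement <4 x float> %input, i32 0
      %e1 = extractelement <4 x float> %input, i32 1
      %e2 = extractelement <4 x float> %input, i32 2
      %e3 = extractelement <4 x float> %input, i32 3
      ; Multiply neighbouring pairs first, then combine the partial products.
      %p01 = fmul reassoc float %e0, %e1
      %p23 = fmul reassoc float %e2, %e3
      %res = fmul reassoc float %p01, %p23
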
16804 .. _int_vector_reduce_and:
16806 '``llvm.vector.reduce.and.*``' Intrinsic
16807 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16814 declare i32 @llvm.vector.reduce.and.v4i32(<4 x i32> %a)
16819 The '``llvm.vector.reduce.and.*``' intrinsics do a bitwise ``AND``
16820 reduction of a vector, returning the result as a scalar. The return type matches
16821 the element-type of the vector input.
16825 The argument to this intrinsic must be a vector of integer values.
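
One possible use, sketched here with hypothetical value names, is to reduce a
vector of ``i1`` comparison results in order to test whether a condition holds
for every lane:

.. code-block:: llvm

      ; %cmp has a true lane wherever the corresponding lane of %v is positive.
      %cmp = icmp sgt <4 x i32> %v, zeroinitializer
      ; %all is true only if every lane of %cmp is true.
      %all = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> %cmp)
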
16827 .. _int_vector_reduce_or:
16829 '``llvm.vector.reduce.or.*``' Intrinsic
16830 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16837 declare i32 @llvm.vector.reduce.or.v4i32(<4 x i32> %a)
16842 The '``llvm.vector.reduce.or.*``' intrinsics do a bitwise ``OR`` reduction
16843 of a vector, returning the result as a scalar. The return type matches the
16844 element-type of the vector input.
16848 The argument to this intrinsic must be a vector of integer values.
16850 .. _int_vector_reduce_xor:
16852 '``llvm.vector.reduce.xor.*``' Intrinsic
16853 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16860 declare i32 @llvm.vector.reduce.xor.v4i32(<4 x i32> %a)
16865 The '``llvm.vector.reduce.xor.*``' intrinsics do a bitwise ``XOR``
16866 reduction of a vector, returning the result as a scalar. The return type matches
16867 the element-type of the vector input.
16871 The argument to this intrinsic must be a vector of integer values.
16873 .. _int_vector_reduce_smax:
16875 '``llvm.vector.reduce.smax.*``' Intrinsic
16876 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16883 declare i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> %a)
16888 The '``llvm.vector.reduce.smax.*``' intrinsics do a signed integer
16889 ``MAX`` reduction of a vector, returning the result as a scalar. The return type
16890 matches the element-type of the vector input.
16894 The argument to this intrinsic must be a vector of integer values.
16896 .. _int_vector_reduce_smin:
16898 '``llvm.vector.reduce.smin.*``' Intrinsic
16899 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16906 declare i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> %a)
16911 The '``llvm.vector.reduce.smin.*``' intrinsics do a signed integer
16912 ``MIN`` reduction of a vector, returning the result as a scalar. The return type
16913 matches the element-type of the vector input.
16917 The argument to this intrinsic must be a vector of integer values.
16919 .. _int_vector_reduce_umax:
16921 '``llvm.vector.reduce.umax.*``' Intrinsic
16922 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16929 declare i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> %a)
16934 The '``llvm.vector.reduce.umax.*``' intrinsics do an unsigned
16935 integer ``MAX`` reduction of a vector, returning the result as a scalar. The
16936 return type matches the element-type of the vector input.
16940 The argument to this intrinsic must be a vector of integer values.
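
The difference between the signed and unsigned ``MAX`` reductions shows up
when the same bit patterns are reduced both ways. In this sketch the value
names are hypothetical:

.. code-block:: llvm

      %s = call i8 @llvm.vector.reduce.smax.v4i8(<4 x i8> <i8 -1, i8 0, i8 1, i8 2>) ; yields i8: 2
      %u = call i8 @llvm.vector.reduce.umax.v4i8(<4 x i8> <i8 -1, i8 0, i8 1, i8 2>) ; yields i8: -1 (255 when read as unsigned)
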
16942 .. _int_vector_reduce_umin:
16944 '``llvm.vector.reduce.umin.*``' Intrinsic
16945 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16952 declare i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> %a)
16957 The '``llvm.vector.reduce.umin.*``' intrinsics do an unsigned
16958 integer ``MIN`` reduction of a vector, returning the result as a scalar. The
16959 return type matches the element-type of the vector input.
16963 The argument to this intrinsic must be a vector of integer values.
16965 .. _int_vector_reduce_fmax:
16967 '``llvm.vector.reduce.fmax.*``' Intrinsic
16968 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
16975 declare float @llvm.vector.reduce.fmax.v4f32(<4 x float> %a)
16976 declare double @llvm.vector.reduce.fmax.v2f64(<2 x double> %a)
16981 The '``llvm.vector.reduce.fmax.*``' intrinsics do a floating-point
16982 ``MAX`` reduction of a vector, returning the result as a scalar. The return type
16983 matches the element-type of the vector input.
This intrinsic has the same comparison semantics as the '``llvm.maxnum.*``'
16986 intrinsic. That is, the result will always be a number unless all elements of
16987 the vector are NaN. For a vector with maximum element magnitude 0.0 and
16988 containing both +0.0 and -0.0 elements, the sign of the result is unspecified.
16990 If the intrinsic call has the ``nnan`` fast-math flag, then the operation can
16991 assume that NaNs are not present in the input vector.
16995 The argument to this intrinsic must be a vector of floating-point values.
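
As an illustration of the NaN handling described above, a vector that contains
a quiet NaN next to ordinary numbers still reduces to a number (the value name
``%m`` is hypothetical):

.. code-block:: llvm

      ; 0x7FF8000000000000 is a quiet NaN; it is ignored because other lanes hold numbers.
      %m = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> <float 1.0, float 0x7FF8000000000000, float 3.0, float -2.0>) ; yields float: 3.0
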
16997 .. _int_vector_reduce_fmin:
16999 '``llvm.vector.reduce.fmin.*``' Intrinsic
17000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17004 This is an overloaded intrinsic.
17008 declare float @llvm.vector.reduce.fmin.v4f32(<4 x float> %a)
17009 declare double @llvm.vector.reduce.fmin.v2f64(<2 x double> %a)
17014 The '``llvm.vector.reduce.fmin.*``' intrinsics do a floating-point
17015 ``MIN`` reduction of a vector, returning the result as a scalar. The return type
17016 matches the element-type of the vector input.
This intrinsic has the same comparison semantics as the '``llvm.minnum.*``'
17019 intrinsic. That is, the result will always be a number unless all elements of
17020 the vector are NaN. For a vector with minimum element magnitude 0.0 and
17021 containing both +0.0 and -0.0 elements, the sign of the result is unspecified.
17023 If the intrinsic call has the ``nnan`` fast-math flag, then the operation can
17024 assume that NaNs are not present in the input vector.
17028 The argument to this intrinsic must be a vector of floating-point values.
17030 '``llvm.experimental.vector.insert``' Intrinsic
17031 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.experimental.vector.insert``
to insert a fixed-width vector into a scalable vector, but not the other way
around.
17041 declare <vscale x 4 x float> @llvm.experimental.vector.insert.v4f32(<vscale x 4 x float> %vec, <4 x float> %subvec, i64 %idx)
17042 declare <vscale x 2 x double> @llvm.experimental.vector.insert.v2f64(<vscale x 2 x double> %vec, <2 x double> %subvec, i64 %idx)
17047 The '``llvm.experimental.vector.insert.*``' intrinsics insert a vector into another vector
17048 starting from a given index. The return type matches the type of the vector we
17049 insert into. Conceptually, this can be used to build a scalable vector out of
17050 non-scalable vectors.
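
A minimal sketch of a call, using the intrinsic name exactly as it is declared
above (the name mangling may differ between LLVM versions) and hypothetical
value names:

.. code-block:: llvm

      ; Overwrite the first four lanes of %vec with %subvec. Index 0 is a
      ; multiple of the subvector length and is valid for any vscale.
      %res = call <vscale x 4 x float> @llvm.experimental.vector.insert.v4f32(<vscale x 4 x float> %vec, <4 x float> %subvec, i64 0)
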
17055 The ``vec`` is the vector which ``subvec`` will be inserted into.
17056 The ``subvec`` is the vector that will be inserted.
17058 ``idx`` represents the starting element number at which ``subvec`` will be
17059 inserted. ``idx`` must be a constant multiple of ``subvec``'s known minimum
17060 vector length. If ``subvec`` is a scalable vector, ``idx`` is first scaled by
17061 the runtime scaling factor of ``subvec``. The elements of ``vec`` starting at
17062 ``idx`` are overwritten with ``subvec``. Elements ``idx`` through (``idx`` +
17063 num_elements(``subvec``) - 1) must be valid ``vec`` indices. If this condition
cannot be determined statically but is false at runtime, then the result vector
is undefined.
17068 '``llvm.experimental.vector.extract``' Intrinsic
17069 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17073 This is an overloaded intrinsic. You can use
17074 ``llvm.experimental.vector.extract`` to extract a fixed-width vector from a
17075 scalable vector, but not the other way around.
17079 declare <4 x float> @llvm.experimental.vector.extract.v4f32(<vscale x 4 x float> %vec, i64 %idx)
17080 declare <2 x double> @llvm.experimental.vector.extract.v2f64(<vscale x 2 x double> %vec, i64 %idx)
17085 The '``llvm.experimental.vector.extract.*``' intrinsics extract a vector from
17086 within another vector starting from a given index. The return type must be
17087 explicitly specified. Conceptually, this can be used to decompose a scalable
17088 vector into non-scalable parts.
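
A minimal sketch of a call, using the intrinsic name exactly as it is declared
above (the name mangling may differ between LLVM versions) and hypothetical
value names:

.. code-block:: llvm

      ; Read the first four lanes of the scalable vector %vec as a fixed-width vector.
      %lo = call <4 x float> @llvm.experimental.vector.extract.v4f32(<vscale x 4 x float> %vec, i64 0)
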
17093 The ``vec`` is the vector from which we will extract a subvector.
17095 The ``idx`` specifies the starting element number within ``vec`` from which a
17096 subvector is extracted. ``idx`` must be a constant multiple of the known-minimum
17097 vector length of the result type. If the result type is a scalable vector,
17098 ``idx`` is first scaled by the result type's runtime scaling factor. Elements
17099 ``idx`` through (``idx`` + num_elements(result_type) - 1) must be valid vector
17100 indices. If this condition cannot be determined statically but is false at
17101 runtime, then the result vector is undefined. The ``idx`` parameter must be a
vector index constant type (for most targets this will be an integer pointer
type).
17105 '``llvm.experimental.vector.reverse``' Intrinsic
17106 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17110 This is an overloaded intrinsic.
17114 declare <2 x i8> @llvm.experimental.vector.reverse.v2i8(<2 x i8> %a)
17115 declare <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> %a)
17120 The '``llvm.experimental.vector.reverse.*``' intrinsics reverse a vector.
17121 The intrinsic takes a single vector and returns a vector of matching type but
17122 with the original lane order reversed. These intrinsics work for both fixed
and scalable vectors. While this intrinsic is marked as experimental, the
17124 recommended way to express reverse operations for fixed-width vectors is still
17125 to use a shufflevector, as that may allow for more optimization opportunities.
17130 The argument to this intrinsic must be a vector.
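
For a fixed-width vector, the intrinsic call and the recommended
``shufflevector`` form describe the same lane permutation. A sketch for a
four-element vector, with hypothetical value names:

.. code-block:: llvm

      %rev.intr = call <4 x i32> @llvm.experimental.vector.reverse.v4i32(<4 x i32> %a)
      ; Equivalent shuffle: lane i of the result takes lane (3 - i) of %a.
      %rev.shuf = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
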
17132 '``llvm.experimental.vector.splice``' Intrinsic
17133 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17137 This is an overloaded intrinsic.
17141 declare <2 x double> @llvm.experimental.vector.splice.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
17142 declare <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
17147 The '``llvm.experimental.vector.splice.*``' intrinsics construct a vector by
17148 concatenating elements from the first input vector with elements of the second
17149 input vector, returning a vector of the same type as the input vectors. The
17150 signed immediate, modulo the number of elements in the vector, is the index
17151 into the first vector from which to extract the result value. This means
17152 conceptually that for a positive immediate, a vector is extracted from
17153 ``concat(%vec1, %vec2)`` starting at index ``imm``, whereas for a negative
17154 immediate, it extracts ``-imm`` trailing elements from the first vector, and
17155 the remaining elements from ``%vec2``.
17157 These intrinsics work for both fixed and scalable vectors. While this intrinsic
17158 is marked as experimental, the recommended way to express this operation for
17159 fixed-width vectors is still to use a shufflevector, as that may allow for more
17160 optimization opportunities.
17164 .. code-block:: text
17166 llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> ; index
17167 llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing elements
17173 The first two operands are vectors with the same type. The third argument
17174 ``imm`` is the start index, modulo VL, where VL is the runtime vector length of
17175 the source/result vector. The ``imm`` is a signed integer constant in the range
17176 ``-VL <= imm < VL``. For values outside of this range the result is poison.
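
For fixed-width vectors the same selection can be written as a
``shufflevector`` whose mask indexes into ``concat(%vec1, %vec2)``. The
following sketch uses four-element vectors and hypothetical value names:

.. code-block:: llvm

      %r.intr = call <4 x i32> @llvm.experimental.vector.splice.v4i32(<4 x i32> %vec1, <4 x i32> %vec2, i32 1)
      ; Mask indices 0-3 select from %vec1 and 4-7 from %vec2, so starting at
      ; index 1 yields lanes 1, 2, 3 of %vec1 followed by lane 0 of %vec2.
      %r.shuf = shufflevector <4 x i32> %vec1, <4 x i32> %vec2, <4 x i32> <i32 1, i32 2, i32 3, i32 4>
      ; For four-element vectors an immediate of -3 selects the same lanes.
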
17178 '``llvm.experimental.stepvector``' Intrinsic
17179 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17181 This is an overloaded intrinsic. You can use ``llvm.experimental.stepvector``
17182 to generate a vector whose lane values comprise the linear sequence
17183 <0, 1, 2, ...>. It is primarily intended for scalable vectors.
17187 declare <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()
17188 declare <vscale x 8 x i16> @llvm.experimental.stepvector.nxv8i16()
17190 The '``llvm.experimental.stepvector``' intrinsics are used to create vectors
17191 of integers whose elements contain a linear sequence of values starting from 0
17192 with a step of 1. This experimental intrinsic can only be used for vectors
17193 with integer elements that are at least 8 bits in size. If the sequence value
17194 exceeds the allowed limit for the element type then the result for that lane is
17197 These intrinsics work for both fixed and scalable vectors. While this intrinsic
17198 is marked as experimental, the recommended way to express this operation for
17199 fixed-width vectors is still to generate a constant vector instead.
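
A common pattern, sketched here under the assumption of a scalar ``i32 %base``
and with hypothetical value names, is to add the step vector to a splat in
order to obtain consecutive per-lane indices:

.. code-block:: llvm

      ; <0, 1, 2, ...> for however many lanes the scalable vector has at runtime.
      %step = call <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()
      ; Broadcast %base to every lane.
      %base.ins = insertelement <vscale x 4 x i32> undef, i32 %base, i32 0
      %base.splat = shufflevector <vscale x 4 x i32> %base.ins, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
      ; Lane i holds %base + i.
      %idx = add <vscale x 4 x i32> %base.splat, %step
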
17211 Operations on matrixes requiring shape information (like number of rows/columns
17212 or the memory layout) can be expressed using the matrix intrinsics. These
17213 intrinsics require matrix dimensions to be passed as immediate arguments, and
17214 matrixes are passed and returned as vectors. This means that for a ``R`` x
17215 ``C`` matrix, element ``i`` of column ``j`` is at index ``j * R + i`` in the
17216 corresponding vector, with indices starting at 0. Currently column-major layout
17217 is assumed. The intrinsics support both integer and floating point matrixes.
17220 '``llvm.matrix.transpose.*``' Intrinsic
17221 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17225 This is an overloaded intrinsic.
17229 declare vectorty @llvm.matrix.transpose.*(vectorty %In, i32 <Rows>, i32 <Cols>)
17234 The '``llvm.matrix.transpose.*``' intrinsics treat ``%In`` as a ``<Rows> x
17235 <Cols>`` matrix and return the transposed matrix in the result vector.
17240 The first argument ``%In`` is a vector that corresponds to a ``<Rows> x
17241 <Cols>`` matrix. Thus, arguments ``<Rows>`` and ``<Cols>`` correspond to the
17242 number of rows and columns, respectively, and must be positive, constant
17243 integers. The returned vector must have ``<Rows> * <Cols>`` elements, and have
17244 the same float or integer element type as ``%In``.
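
As a worked sketch of the column-major layout, consider a 2 x 3 matrix passed
as a constant vector (the value name ``%t`` is hypothetical):

.. code-block:: llvm

      ; 2 x 3 input in column-major order: the columns are (1, 2), (3, 4), (5, 6).
      ; Element i of column j is at index j * 2 + i.
      %t = call <6 x float> @llvm.matrix.transpose.v6f32(<6 x float> <float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0>, i32 2, i32 3)
      ; %t is the 3 x 2 transpose, whose columns are (1, 3, 5) and (2, 4, 6):
      ; <1.0, 3.0, 5.0, 2.0, 4.0, 6.0>
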
17246 '``llvm.matrix.multiply.*``' Intrinsic
17247 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17251 This is an overloaded intrinsic.
17255 declare vectorty @llvm.matrix.multiply.*(vectorty %A, vectorty %B, i32 <OuterRows>, i32 <Inner>, i32 <OuterColumns>)
17260 The '``llvm.matrix.multiply.*``' intrinsics treat ``%A`` as a ``<OuterRows> x
17261 <Inner>`` matrix, ``%B`` as a ``<Inner> x <OuterColumns>`` matrix, and
multiply them. The result matrix is returned in the result vector.
17267 The first vector argument ``%A`` corresponds to a matrix with ``<OuterRows> *
17268 <Inner>`` elements, and the second argument ``%B`` to a matrix with
17269 ``<Inner> * <OuterColumns>`` elements. Arguments ``<OuterRows>``,
17270 ``<Inner>`` and ``<OuterColumns>`` must be positive, constant integers. The
17271 returned vector must have ``<OuterRows> * <OuterColumns>`` elements.
17272 Vectors ``%A``, ``%B``, and the returned vector all have the same float or
17273 integer element type.
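
The following sketch multiplies a 2 x 3 matrix by a 3 x 2 matrix, both given
as column-major constant vectors. The overloaded name suffix shown
(result type, then the types of ``%A`` and ``%B``) and the value name ``%c``
are assumptions made for illustration.

.. code-block:: llvm

      ; %A is 2 x 3 with columns (1, 2), (3, 4), (5, 6).
      ; %B is 3 x 2 with columns (1, 0, 0) and (0, 1, 0), so it selects the
      ; first two columns of %A.
      %c = call <4 x float> @llvm.matrix.multiply.v4f32.v6f32.v6f32(
               <6 x float> <float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0>,
               <6 x float> <float 1.0, float 0.0, float 0.0, float 0.0, float 1.0, float 0.0>,
               i32 2, i32 3, i32 2)
      ; %c is the 2 x 2 result with columns (1, 2) and (3, 4): <1.0, 2.0, 3.0, 4.0>
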
17276 '``llvm.matrix.column.major.load.*``' Intrinsic
17277 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17281 This is an overloaded intrinsic.
17285 declare vectorty @llvm.matrix.column.major.load.*(
17286 ptrty %Ptr, i64 %Stride, i1 <IsVolatile>, i32 <Rows>, i32 <Cols>)
17291 The '``llvm.matrix.column.major.load.*``' intrinsics load a ``<Rows> x <Cols>``
17292 matrix using a stride of ``%Stride`` to compute the start address of the
17293 different columns. The offset is computed using ``%Stride``'s bitwidth. This
17294 allows for convenient loading of sub matrixes. If ``<IsVolatile>`` is true, the
17295 intrinsic is considered a :ref:`volatile memory access <volatile>`. The result
17296 matrix is returned in the result vector. If the ``%Ptr`` argument is known to
be aligned to some boundary, this can be specified as an attribute on the
argument.
17303 The first argument ``%Ptr`` is a pointer type to the returned vector type, and
17304 corresponds to the start address to load from. The second argument ``%Stride``
17305 is a positive, constant integer with ``%Stride >= <Rows>``. ``%Stride`` is used
17306 to compute the column memory addresses. I.e., for a column ``C``, its start
memory address is calculated with ``%Ptr + C * %Stride``. The third argument
17308 ``<IsVolatile>`` is a boolean value. The fourth and fifth arguments,
17309 ``<Rows>`` and ``<Cols>``, correspond to the number of rows and columns,
17310 respectively, and must be positive, constant integers. The returned vector must
17311 have ``<Rows> * <Cols>`` elements.
17313 The :ref:`align <attr_align>` parameter attribute can be provided for the
``%Ptr`` argument.
17317 '``llvm.matrix.column.major.store.*``' Intrinsic
17318 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17325 declare void @llvm.matrix.column.major.store.*(
17326 vectorty %In, ptrty %Ptr, i64 %Stride, i1 <IsVolatile>, i32 <Rows>, i32 <Cols>)
17331 The '``llvm.matrix.column.major.store.*``' intrinsics store the ``<Rows> x
17332 <Cols>`` matrix in ``%In`` to memory using a stride of ``%Stride`` between
17333 columns. The offset is computed using ``%Stride``'s bitwidth. If
17334 ``<IsVolatile>`` is true, the intrinsic is considered a
17335 :ref:`volatile memory access <volatile>`.
17337 If the ``%Ptr`` argument is known to be aligned to some boundary, this can be
17338 specified as an attribute on the argument.
17343 The first argument ``%In`` is a vector that corresponds to a ``<Rows> x
17344 <Cols>`` matrix to be stored to memory. The second argument ``%Ptr`` is a
17345 pointer to the vector type of ``%In``, and is the start address of the matrix
17346 in memory. The third argument ``%Stride`` is a positive, constant integer with
17347 ``%Stride >= <Rows>``. ``%Stride`` is used to compute the column memory
addresses. I.e., for a column ``C``, its start memory address is calculated
17349 with ``%Ptr + C * %Stride``. The fourth argument ``<IsVolatile>`` is a boolean
17350 value. The arguments ``<Rows>`` and ``<Cols>`` correspond to the number of rows
17351 and columns, respectively, and must be positive, constant integers.
17353 The :ref:`align <attr_align>` parameter attribute can be provided
for the ``%Ptr`` argument.
17357 Half Precision Floating-Point Intrinsics
17358 ----------------------------------------
17360 For most target platforms, half precision floating-point is a
17361 storage-only format. This means that it is a dense encoding (in memory)
17362 but does not support computation in the format.
17364 This means that code must first load the half-precision floating-point
17365 value as an i16, then convert it to float with
17366 :ref:`llvm.convert.from.fp16 <int_convert_from_fp16>`. Computation can
17367 then be performed on the float value (including extending to double
17368 etc). To store the value back to memory, it is first converted to float
17369 if needed, then converted to i16 with
:ref:`llvm.convert.to.fp16 <int_convert_to_fp16>`, then stored as an
i16 value.
17373 .. _int_convert_to_fp16:
17375 '``llvm.convert.to.fp16``' Intrinsic
17376 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17383 declare i16 @llvm.convert.to.fp16.f32(float %a)
17384 declare i16 @llvm.convert.to.fp16.f64(double %a)
17389 The '``llvm.convert.to.fp16``' intrinsic function performs a conversion from a
17390 conventional floating-point type to half precision floating-point format.
The intrinsic function takes a single argument: the value to be
converted.
17401 The '``llvm.convert.to.fp16``' intrinsic function performs a conversion from a
17402 conventional floating-point format to half precision floating-point format. The
17403 return value is an ``i16`` which contains the converted number.
17408 .. code-block:: llvm
17410 %res = call i16 @llvm.convert.to.fp16.f32(float %a)
17411 store i16 %res, i16* @x, align 2
17413 .. _int_convert_from_fp16:
17415 '``llvm.convert.from.fp16``' Intrinsic
17416 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17423 declare float @llvm.convert.from.fp16.f32(i16 %a)
17424 declare double @llvm.convert.from.fp16.f64(i16 %a)
17429 The '``llvm.convert.from.fp16``' intrinsic function performs a
17430 conversion from half precision floating-point format to single precision
17431 floating-point format.
The intrinsic function takes a single argument: the value to be
converted.
17442 The '``llvm.convert.from.fp16``' intrinsic function performs a
conversion from half precision floating-point format to single
17444 precision floating-point format. The input half-float value is
17445 represented by an ``i16`` value.
17450 .. code-block:: llvm
17452 %a = load i16, i16* @x, align 2
      %res = call float @llvm.convert.from.fp16.f32(i16 %a)
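
Putting both conversions together, a load, compute, and store round trip on a
storage-only half value might be sketched as follows. It assumes ``@x`` is an
``i16`` global as in the examples above; the remaining value names are
hypothetical.

.. code-block:: llvm

      ; Load the half-precision bit pattern and widen it to float.
      %h = load i16, i16* @x, align 2
      %f = call float @llvm.convert.from.fp16.f32(i16 %h)
      ; Compute in float.
      %sq = fmul float %f, %f
      ; Narrow the result back to a half-precision bit pattern and store it.
      %r = call i16 @llvm.convert.to.fp16.f32(float %sq)
      store i16 %r, i16* @x, align 2
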
17455 Saturating floating-point to integer conversions
17456 ------------------------------------------------
17458 The ``fptoui`` and ``fptosi`` instructions return a
17459 :ref:`poison value <poisonvalues>` if the rounded-towards-zero value is not
17460 representable by the result type. These intrinsics provide an alternative
17461 conversion, which will saturate towards the smallest and largest representable
17462 integer values instead.
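
The contrast can be sketched with an out-of-range input (the value names are
hypothetical):

.. code-block:: llvm

      %p = fptoui float 300.0 to i8                     ; poison: 300 is not representable in i8
      %s = call i8 @llvm.fptoui.sat.i8.f32(float 300.0) ; yields i8: 255
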
17464 '``llvm.fptoui.sat.*``' Intrinsic
17465 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17470 This is an overloaded intrinsic. You can use ``llvm.fptoui.sat`` on any
17471 floating-point argument type and any integer result type, or vectors thereof.
17472 Not all targets may support all types, however.
17476 declare i32 @llvm.fptoui.sat.i32.f32(float %f)
17477 declare i19 @llvm.fptoui.sat.i19.f64(double %f)
17478 declare <4 x i100> @llvm.fptoui.sat.v4i100.v4f128(<4 x fp128> %f)
17483 This intrinsic converts the argument into an unsigned integer using saturating
17489 The argument may be any floating-point or vector of floating-point type. The
17490 return value may be any integer or vector of integer type. The number of vector
17491 elements in argument and return must be the same.
17496 The conversion to integer is performed subject to the following rules:
17498 - If the argument is any NaN, zero is returned.
- If the argument is smaller than zero (this includes negative infinity),
  zero is returned.
17501 - If the argument is larger than the largest representable unsigned integer of
17502 the result type (this includes positive infinity), the largest representable
17503 unsigned integer is returned.
17504 - Otherwise, the result of rounding the argument towards zero is returned.
17509 .. code-block:: text
17511 %a = call i8 @llvm.fptoui.sat.i8.f32(float 123.9) ; yields i8: 123
17512 %b = call i8 @llvm.fptoui.sat.i8.f32(float -5.7) ; yields i8: 0
17513 %c = call i8 @llvm.fptoui.sat.i8.f32(float 377.0) ; yields i8: 255
17514 %d = call i8 @llvm.fptoui.sat.i8.f32(float 0xFFF8000000000000) ; yields i8: 0
17516 '``llvm.fptosi.sat.*``' Intrinsic
17517 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17522 This is an overloaded intrinsic. You can use ``llvm.fptosi.sat`` on any
17523 floating-point argument type and any integer result type, or vectors thereof.
17524 Not all targets may support all types, however.
17528 declare i32 @llvm.fptosi.sat.i32.f32(float %f)
17529 declare i19 @llvm.fptosi.sat.i19.f64(double %f)
17530 declare <4 x i100> @llvm.fptosi.sat.v4i100.v4f128(<4 x fp128> %f)
17535 This intrinsic converts the argument into a signed integer using saturating
17541 The argument may be any floating-point or vector of floating-point type. The
17542 return value may be any integer or vector of integer type. The number of vector
17543 elements in argument and return must be the same.
17548 The conversion to integer is performed subject to the following rules:
17550 - If the argument is any NaN, zero is returned.
17551 - If the argument is smaller than the smallest representable signed integer of
17552 the result type (this includes negative infinity), the smallest
17553 representable signed integer is returned.
17554 - If the argument is larger than the largest representable signed integer of
17555 the result type (this includes positive infinity), the largest representable
17556 signed integer is returned.
17557 - Otherwise, the result of rounding the argument towards zero is returned.
17562 .. code-block:: text
17564 %a = call i8 @llvm.fptosi.sat.i8.f32(float 23.9) ; yields i8: 23
17565 %b = call i8 @llvm.fptosi.sat.i8.f32(float -130.8) ; yields i8: -128
17566 %c = call i8 @llvm.fptosi.sat.i8.f32(float 999.0) ; yields i8: 127
17567 %d = call i8 @llvm.fptosi.sat.i8.f32(float 0xFFF8000000000000) ; yields i8: 0
17569 .. _dbg_intrinsics:
17571 Debugger Intrinsics
17572 -------------------
The LLVM debugger intrinsics (which all start with the ``llvm.dbg.``
prefix) are described in the `LLVM Source Level
Debugging <SourceLevelDebugging.html#format-common-intrinsics>`_
document.
17579 Exception Handling Intrinsics
17580 -----------------------------
The LLVM exception handling intrinsics (which all start with the
``llvm.eh.`` prefix) are described in the `LLVM Exception
Handling <ExceptionHandling.html#format-common-intrinsics>`_ document.
17586 .. _int_trampoline:
17588 Trampoline Intrinsics
17589 ---------------------
17591 These intrinsics make it possible to excise one parameter, marked with
17592 the :ref:`nest <nest>` attribute, from a function. The result is a
17593 callable function pointer lacking the nest parameter - the caller does
17594 not need to provide a value for it. Instead, the value to use is stored
17595 in advance in a "trampoline", a block of memory usually allocated on the
17596 stack, which also contains code to splice the nest value into the
17597 argument list. This is used to implement the GCC nested function address
17600 For example, if the function is ``i32 f(i8* nest %c, i32 %x, i32 %y)``
17601 then the resulting function pointer has signature ``i32 (i32, i32)*``.
17602 It can be created as follows:
17604 .. code-block:: llvm
17606 %tramp = alloca [10 x i8], align 4 ; size and alignment only correct for X86
17607 %tramp1 = getelementptr [10 x i8], [10 x i8]* %tramp, i32 0, i32 0
      call void @llvm.init.trampoline(i8* %tramp1, i8* bitcast (i32 (i8*, i32, i32)* @f to i8*), i8* %nval)
17609 %p = call i8* @llvm.adjust.trampoline(i8* %tramp1)
17610 %fp = bitcast i8* %p to i32 (i32, i32)*
The call ``%val = call i32 %fp(i32 %x, i32 %y)`` is then equivalent to
``%val = call i32 @f(i8* %nval, i32 %x, i32 %y)``.
17617 '``llvm.init.trampoline``' Intrinsic
17618 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17625 declare void @llvm.init.trampoline(i8* <tramp>, i8* <func>, i8* <nval>)
17630 This fills the memory pointed to by ``tramp`` with executable code,
17631 turning it into a trampoline.
17636 The ``llvm.init.trampoline`` intrinsic takes three arguments, all
17637 pointers. The ``tramp`` argument must point to a sufficiently large and
17638 sufficiently aligned block of memory; this memory is written to by the
17639 intrinsic. Note that the size and the alignment are target-specific -
17640 LLVM currently provides no portable way of determining them, so a
17641 front-end that generates this intrinsic needs to have some
17642 target-specific knowledge. The ``func`` argument must hold a function
17643 bitcast to an ``i8*``.
17648 The block of memory pointed to by ``tramp`` is filled with target
17649 dependent code, turning it into a function. Then ``tramp`` needs to be
17650 passed to :ref:`llvm.adjust.trampoline <int_at>` to get a pointer which can
17651 be :ref:`bitcast (to a new function) and called <int_trampoline>`. The new
17652 function's signature is the same as that of ``func`` with any arguments
17653 marked with the ``nest`` attribute removed. At most one such ``nest``
17654 argument is allowed, and it must be of pointer type. Calling the new
17655 function is equivalent to calling ``func`` with the same argument list,
17656 but with ``nval`` used for the missing ``nest`` argument. If, after
17657 calling ``llvm.init.trampoline``, the memory pointed to by ``tramp`` is
17658 modified, then the effect of any later call to the returned function
17659 pointer is undefined.
17663 '``llvm.adjust.trampoline``' Intrinsic
17664 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17671 declare i8* @llvm.adjust.trampoline(i8* <tramp>)
17676 This performs any required machine-specific adjustment to the address of
17677 a trampoline (passed as ``tramp``).
17682 ``tramp`` must point to a block of memory which already has trampoline
17683 code filled in by a previous call to
17684 :ref:`llvm.init.trampoline <int_it>`.
On some architectures the address of the code to be executed needs to be
different from the address where the trampoline is actually stored. This
intrinsic returns the executable address corresponding to ``tramp``
after performing the required machine-specific adjustments. The pointer
returned can then be :ref:`bitcast and executed <int_trampoline>`.
17698 Vector Predication Intrinsics
17699 -----------------------------
17700 VP intrinsics are intended for predicated SIMD/vector code. A typical VP
17701 operation takes a vector mask and an explicit vector length parameter as in:
17705 <W x T> llvm.vp.<opcode>.*(<W x T> %x, <W x T> %y, <W x i1> %mask, i32 %evl)
The vector mask parameter (``%mask``) always has a vector of ``i1`` type, for example
``<32 x i1>``. The explicit vector length parameter (``%evl``) always has the type
``i32`` and is an unsigned integer value in the range:
17714 0 <= %evl <= W, where W is the number of vector elements
17716 Note that for :ref:`scalable vector types <t_vector>` ``W`` is the runtime
17717 length of the vector.
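
As a sketch of how the upper bound ``W`` can be obtained for a scalable vector,
the fragment below computes the runtime element count of ``<vscale x 4 x i32>``
with ``llvm.vscale`` and passes it as the explicit vector length, so that only
the mask decides which lanes are enabled. The function name ``@add_all_lanes``
and the value names are illustrative assumptions, not part of the
specification.

.. code-block:: llvm

      declare i32 @llvm.vscale.i32()
      declare <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)

      define <vscale x 4 x i32> @add_all_lanes(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i1> %mask) {
        ; W = vscale * 4 is the runtime number of elements of <vscale x 4 x i32>.
        %vscale = call i32 @llvm.vscale.i32()
        %w = mul i32 %vscale, 4
        ; With %evl = W, no lane is disabled by the explicit vector length.
        %r = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i1> %mask, i32 %w)
        ret <vscale x 4 x i32> %r
      }
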
17719 The VP intrinsic has undefined behavior if ``%evl > W``. The explicit vector
17720 length (%evl) creates a mask, %EVLmask, with all elements ``0 <= i < %evl`` set
17721 to True, and all other lanes ``%evl <= i < W`` to False. A new mask %M is
17722 calculated with an element-wise AND from %mask and %EVLmask:
17726 M = %mask AND %EVLmask
17728 A vector operation ``<opcode>`` on vectors ``A`` and ``B`` calculates:
       A <opcode> B =  {  A[i] <opcode> B[i]   if M[i] = True, and
                       {  undef otherwise
17738 Some targets, such as AVX512, do not support the %evl parameter in hardware.
17739 The use of an effective %evl is discouraged for those targets. The function
17740 ``TargetTransformInfo::hasActiveVectorLength()`` returns true when the target
17741 has native support for %evl.
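
The effective mask ``M = %mask AND %EVLmask`` can also be written out in
ordinary IR. The following is a minimal sketch for a fixed width of four lanes;
the function name ``@effective_mask`` is hypothetical, and an actual lowering
may compute the mask differently.

.. code-block:: llvm

      define <4 x i1> @effective_mask(<4 x i1> %mask, i32 %evl) {
        ; Broadcast %evl to every lane.
        %evl.ins = insertelement <4 x i32> undef, i32 %evl, i32 0
        %evl.splat = shufflevector <4 x i32> %evl.ins, <4 x i32> undef, <4 x i32> zeroinitializer
        ; %EVLmask[i] is true exactly when i < %evl (unsigned comparison).
        %evlmask = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, %evl.splat
        ; M = %mask AND %EVLmask
        %m = and <4 x i1> %mask, %evlmask
        ret <4 x i1> %m
      }

A lane is enabled exactly when the corresponding element of ``%m`` is true.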
17746 '``llvm.vp.add.*``' Intrinsics
17747 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17751 This is an overloaded intrinsic.
17755 declare <16 x i32> @llvm.vp.add.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
17756 declare <vscale x 4 x i32> @llvm.vp.add.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
17757 declare <256 x i64> @llvm.vp.add.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
17762 Predicated integer addition of two vectors of integers.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
17776 The '``llvm.vp.add``' intrinsic performs integer addition (:ref:`add <i_add>`)
17777 of the first and second vector operand on each enabled lane. The result on
17778 disabled lanes is undefined.
17783 .. code-block:: llvm
17785 %r = call <4 x i32> @llvm.vp.add.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
17786 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
17788 %t = add <4 x i32> %a, %b
17789 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
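
To make the interaction of ``%mask`` and ``%evl`` concrete, here is a small
worked example with constant operands. The function name ``@vp_add_example``
and the chosen values are illustrative only.

.. code-block:: llvm

      declare <4 x i32> @llvm.vp.add.v4i32(<4 x i32>, <4 x i32>, <4 x i1>, i32)

      define <4 x i32> @vp_add_example() {
        ; mask = <1, 0, 1, 1> and %evl = 3, so only lanes 0 and 2 are enabled:
        ; lane 1 is masked off, and lane 3 is not below %evl.
        %r = call <4 x i32> @llvm.vp.add.v4i32(
                 <4 x i32> <i32 1, i32 2, i32 3, i32 4>,
                 <4 x i32> <i32 10, i32 20, i32 30, i32 40>,
                 <4 x i1> <i1 true, i1 false, i1 true, i1 true>,
                 i32 3)
        ; %r = <11, undef, 33, undef>
        ret <4 x i32> %r
      }
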
17793 '``llvm.vp.sub.*``' Intrinsics
17794 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17798 This is an overloaded intrinsic.
17802 declare <16 x i32> @llvm.vp.sub.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
17803 declare <vscale x 4 x i32> @llvm.vp.sub.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
17804 declare <256 x i64> @llvm.vp.sub.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
17809 Predicated integer subtraction of two vectors of integers.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
17823 The '``llvm.vp.sub``' intrinsic performs integer subtraction
17824 (:ref:`sub <i_sub>`) of the first and second vector operand on each enabled
17825 lane. The result on disabled lanes is undefined.
17830 .. code-block:: llvm
17832 %r = call <4 x i32> @llvm.vp.sub.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
17833 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
17835 %t = sub <4 x i32> %a, %b
17836 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
17842 '``llvm.vp.mul.*``' Intrinsics
17843 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17847 This is an overloaded intrinsic.
      declare <16 x i32>  @llvm.vp.mul.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
      declare <vscale x 4 x i32>  @llvm.vp.mul.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
      declare <256 x i64>  @llvm.vp.mul.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
17858 Predicated integer multiplication of two vectors of integers.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
17871 The '``llvm.vp.mul``' intrinsic performs integer multiplication
17872 (:ref:`mul <i_mul>`) of the first and second vector operand on each enabled
17873 lane. The result on disabled lanes is undefined.
17878 .. code-block:: llvm
17880 %r = call <4 x i32> @llvm.vp.mul.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
17881 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
17883 %t = mul <4 x i32> %a, %b
17884 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
17889 '``llvm.vp.sdiv.*``' Intrinsics
17890 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17894 This is an overloaded intrinsic.
17898 declare <16 x i32> @llvm.vp.sdiv.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
17899 declare <vscale x 4 x i32> @llvm.vp.sdiv.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
17900 declare <256 x i64> @llvm.vp.sdiv.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
17905 Predicated, signed division of two vectors of integers.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
17919 The '``llvm.vp.sdiv``' intrinsic performs signed division (:ref:`sdiv <i_sdiv>`)
17920 of the first and second vector operand on each enabled lane. The result on
17921 disabled lanes is undefined.
17926 .. code-block:: llvm
17928 %r = call <4 x i32> @llvm.vp.sdiv.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
17929 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
17931 %t = sdiv <4 x i32> %a, %b
17932 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
17937 '``llvm.vp.udiv.*``' Intrinsics
17938 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17942 This is an overloaded intrinsic.
17946 declare <16 x i32> @llvm.vp.udiv.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
17947 declare <vscale x 4 x i32> @llvm.vp.udiv.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
17948 declare <256 x i64> @llvm.vp.udiv.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
17953 Predicated, unsigned division of two vectors of integers.
17959 The first two operands and the result have the same vector of integer type. The third operand is the vector mask and has the same number of elements as the result vector type. The fourth operand is the explicit vector length of the operation.
17964 The '``llvm.vp.udiv``' intrinsic performs unsigned division
17965 (:ref:`udiv <i_udiv>`) of the first and second vector operand on each enabled
17966 lane. The result on disabled lanes is undefined.
17971 .. code-block:: llvm
17973 %r = call <4 x i32> @llvm.vp.udiv.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
17974 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
17976 %t = udiv <4 x i32> %a, %b
17977 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
17983 '``llvm.vp.srem.*``' Intrinsics
17984 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17988 This is an overloaded intrinsic.
17992 declare <16 x i32> @llvm.vp.srem.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
17993 declare <vscale x 4 x i32> @llvm.vp.srem.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
17994 declare <256 x i64> @llvm.vp.srem.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
Predicated computation of the signed remainder of two integer vectors.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18013 The '``llvm.vp.srem``' intrinsic computes the remainder of the signed division
18014 (:ref:`srem <i_srem>`) of the first and second vector operand on each enabled
18015 lane. The result on disabled lanes is undefined.
18020 .. code-block:: llvm
18022 %r = call <4 x i32> @llvm.vp.srem.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18023 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18025 %t = srem <4 x i32> %a, %b
18026 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18032 '``llvm.vp.urem.*``' Intrinsics
18033 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18037 This is an overloaded intrinsic.
18041 declare <16 x i32> @llvm.vp.urem.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18042 declare <vscale x 4 x i32> @llvm.vp.urem.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18043 declare <256 x i64> @llvm.vp.urem.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18048 Predicated computation of the unsigned remainder of two integer vectors.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18062 The '``llvm.vp.urem``' intrinsic computes the remainder of the unsigned division
18063 (:ref:`urem <i_urem>`) of the first and second vector operand on each enabled
18064 lane. The result on disabled lanes is undefined.
18069 .. code-block:: llvm
18071 %r = call <4 x i32> @llvm.vp.urem.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18072 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18074 %t = urem <4 x i32> %a, %b
18075 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18080 '``llvm.vp.ashr.*``' Intrinsics
18081 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18085 This is an overloaded intrinsic.
18089 declare <16 x i32> @llvm.vp.ashr.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18090 declare <vscale x 4 x i32> @llvm.vp.ashr.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18091 declare <256 x i64> @llvm.vp.ashr.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18096 Vector-predicated arithmetic right-shift.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18110 The '``llvm.vp.ashr``' intrinsic computes the arithmetic right shift
18111 (:ref:`ashr <i_ashr>`) of the first operand by the second operand on each
18112 enabled lane. The result on disabled lanes is undefined.
18117 .. code-block:: llvm
18119 %r = call <4 x i32> @llvm.vp.ashr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18120 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18122 %t = ashr <4 x i32> %a, %b
18123 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18129 '``llvm.vp.lshr.*``' Intrinsics
18130 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18134 This is an overloaded intrinsic.
18138 declare <16 x i32> @llvm.vp.lshr.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18139 declare <vscale x 4 x i32> @llvm.vp.lshr.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18140 declare <256 x i64> @llvm.vp.lshr.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18145 Vector-predicated logical right-shift.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18159 The '``llvm.vp.lshr``' intrinsic computes the logical right shift
18160 (:ref:`lshr <i_lshr>`) of the first operand by the second operand on each
18161 enabled lane. The result on disabled lanes is undefined.
18166 .. code-block:: llvm
18168 %r = call <4 x i32> @llvm.vp.lshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18169 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18171 %t = lshr <4 x i32> %a, %b
18172 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18177 '``llvm.vp.shl.*``' Intrinsics
18178 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18182 This is an overloaded intrinsic.
18186 declare <16 x i32> @llvm.vp.shl.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18187 declare <vscale x 4 x i32> @llvm.vp.shl.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18188 declare <256 x i64> @llvm.vp.shl.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18193 Vector-predicated left shift.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18207 The '``llvm.vp.shl``' intrinsic computes the left shift (:ref:`shl <i_shl>`) of
18208 the first operand by the second operand on each enabled lane. The result on
18209 disabled lanes is undefined.
18214 .. code-block:: llvm
18216 %r = call <4 x i32> @llvm.vp.shl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18217 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18219 %t = shl <4 x i32> %a, %b
18220 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18225 '``llvm.vp.or.*``' Intrinsics
18226 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18230 This is an overloaded intrinsic.
18234 declare <16 x i32> @llvm.vp.or.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18235 declare <vscale x 4 x i32> @llvm.vp.or.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18236 declare <256 x i64> @llvm.vp.or.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
Vector-predicated, bitwise or.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
The '``llvm.vp.or``' intrinsic performs a bitwise or (:ref:`or <i_or>`) of the
first two operands on each enabled lane. The result on disabled lanes is
undefined.
18262 .. code-block:: llvm
18264 %r = call <4 x i32> @llvm.vp.or.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18265 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18267 %t = or <4 x i32> %a, %b
18268 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18273 '``llvm.vp.and.*``' Intrinsics
18274 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18278 This is an overloaded intrinsic.
18282 declare <16 x i32> @llvm.vp.and.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18283 declare <vscale x 4 x i32> @llvm.vp.and.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18284 declare <256 x i64> @llvm.vp.and.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
Vector-predicated, bitwise and.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
The '``llvm.vp.and``' intrinsic performs a bitwise and (:ref:`and <i_and>`) of
the first two operands on each enabled lane. The result on disabled lanes is
undefined.
18310 .. code-block:: llvm
18312 %r = call <4 x i32> @llvm.vp.and.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18313 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18315 %t = and <4 x i32> %a, %b
18316 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18321 '``llvm.vp.xor.*``' Intrinsics
18322 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18326 This is an overloaded intrinsic.
18330 declare <16 x i32> @llvm.vp.xor.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18331 declare <vscale x 4 x i32> @llvm.vp.xor.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18332 declare <256 x i64> @llvm.vp.xor.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18337 Vector-predicated, bitwise xor.
The first two operands and the result have the same vector of integer type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
18351 The '``llvm.vp.xor``' intrinsic performs a bitwise xor (:ref:`xor <i_xor>`) of
18352 the first two operands on each enabled lane.
18353 The result on disabled lanes is undefined.
18358 .. code-block:: llvm
18360 %r = call <4 x i32> @llvm.vp.xor.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
18361 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18363 %t = xor <4 x i32> %a, %b
18364 %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef
18369 '``llvm.vp.fadd.*``' Intrinsics
18370 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18374 This is an overloaded intrinsic.
18378 declare <16 x float> @llvm.vp.fadd.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18379 declare <vscale x 4 x float> @llvm.vp.fadd.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18380 declare <256 x double> @llvm.vp.fadd.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18385 Predicated floating-point addition of two vectors of floating-point values.
The first two operands and the result have the same vector of floating-point type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
The '``llvm.vp.fadd``' intrinsic performs floating-point addition
(:ref:`fadd <i_fadd>`) of the first and second vector operand on each enabled
lane. The result on disabled lanes is undefined. The operation is performed in
the default floating-point environment.
18407 .. code-block:: llvm
18409 %r = call <4 x float> @llvm.vp.fadd.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
18410 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18412 %t = fadd <4 x float> %a, %b
18413 %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef
18418 '``llvm.vp.fsub.*``' Intrinsics
18419 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18423 This is an overloaded intrinsic.
18427 declare <16 x float> @llvm.vp.fsub.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18428 declare <vscale x 4 x float> @llvm.vp.fsub.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18429 declare <256 x double> @llvm.vp.fsub.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18434 Predicated floating-point subtraction of two vectors of floating-point values.
The first two operands and the result have the same vector of floating-point type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
The '``llvm.vp.fsub``' intrinsic performs floating-point subtraction
(:ref:`fsub <i_fsub>`) of the first and second vector operand on each enabled
lane. The result on disabled lanes is undefined. The operation is performed in
the default floating-point environment.
18456 .. code-block:: llvm
18458 %r = call <4 x float> @llvm.vp.fsub.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
18459 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18461 %t = fsub <4 x float> %a, %b
18462 %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef
18467 '``llvm.vp.fmul.*``' Intrinsics
18468 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18472 This is an overloaded intrinsic.
18476 declare <16 x float> @llvm.vp.fmul.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18477 declare <vscale x 4 x float> @llvm.vp.fmul.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18478 declare <256 x double> @llvm.vp.fmul.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18483 Predicated floating-point multiplication of two vectors of floating-point values.
The first two operands and the result have the same vector of floating-point type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
The '``llvm.vp.fmul``' intrinsic performs floating-point multiplication
(:ref:`fmul <i_fmul>`) of the first and second vector operand on each enabled
lane. The result on disabled lanes is undefined. The operation is performed in
the default floating-point environment.
18505 .. code-block:: llvm
18507 %r = call <4 x float> @llvm.vp.fmul.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
18508 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18510 %t = fmul <4 x float> %a, %b
18511 %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef
18516 '``llvm.vp.fdiv.*``' Intrinsics
18517 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18521 This is an overloaded intrinsic.
18525 declare <16 x float> @llvm.vp.fdiv.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18526 declare <vscale x 4 x float> @llvm.vp.fdiv.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18527 declare <256 x double> @llvm.vp.fdiv.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18532 Predicated floating-point division of two vectors of floating-point values.
The first two operands and the result have the same vector of floating-point type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
The '``llvm.vp.fdiv``' intrinsic performs floating-point division
(:ref:`fdiv <i_fdiv>`) of the first and second vector operand on each enabled
lane. The result on disabled lanes is undefined. The operation is performed in
the default floating-point environment.
18554 .. code-block:: llvm
18556 %r = call <4 x float> @llvm.vp.fdiv.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
18557 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18559 %t = fdiv <4 x float> %a, %b
18560 %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef
18565 '``llvm.vp.frem.*``' Intrinsics
18566 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18570 This is an overloaded intrinsic.
18574 declare <16 x float> @llvm.vp.frem.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
18575 declare <vscale x 4 x float> @llvm.vp.frem.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
18576 declare <256 x double> @llvm.vp.frem.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
18581 Predicated floating-point remainder of two vectors of floating-point values.
The first two operands and the result have the same vector of floating-point type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.
The '``llvm.vp.frem``' intrinsic computes the floating-point remainder
(:ref:`frem <i_frem>`) of the first and second vector operand on each enabled
lane. The result on disabled lanes is undefined. The operation is performed in
the default floating-point environment.
18603 .. code-block:: llvm
18605 %r = call <4 x float> @llvm.vp.frem.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
18606 ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
18608 %t = frem <4 x float> %a, %b
18609 %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef
18613 .. _int_vp_reduce_add:
18615 '``llvm.vp.reduce.add.*``' Intrinsics
18616 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18620 This is an overloaded intrinsic.
18624 declare i32 @llvm.vp.reduce.add.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
18625 declare i16 @llvm.vp.reduce.add.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
18630 Predicated integer ``ADD`` reduction of a vector and a scalar starting value,
18631 returning the result as a scalar.
18636 The first operand is the start value of the reduction, which must be a scalar
18637 integer type equal to the result type. The second operand is the vector on
18638 which the reduction is performed and must be a vector of integer values whose
18639 element type is the result/start type. The third operand is the vector mask and
18640 is a vector of boolean values with the same number of elements as the vector
18641 operand. The fourth operand is the explicit vector length of the operation.
18646 The '``llvm.vp.reduce.add``' intrinsic performs the integer ``ADD`` reduction
18647 (:ref:`llvm.vector.reduce.add <int_vector_reduce_add>`) of the vector operand
18648 ``val`` on each enabled lane, adding it to the scalar ``start_value``. Disabled
18649 lanes are treated as containing the neutral value ``0`` (i.e. having no effect
18650 on the reduction operation). If the vector length is zero, the result is equal
18651 to ``start_value``.
18653 To ignore the start value, the neutral value can be used.
18658 .. code-block:: llvm
18660 %r = call i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
18661 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
18662 ; are treated as though %mask were false for those lanes.
18664 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> zeroinitializer
18665 %reduction = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %masked.a)
18666 %also.r = add i32 %reduction, %start
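
A small worked example with constant operands may help to see how the start
value, the mask and the explicit vector length combine. The function name
``@vp_reduce_add_example`` and the chosen values are illustrative only.

.. code-block:: llvm

      declare i32 @llvm.vp.reduce.add.v4i32(i32, <4 x i32>, <4 x i1>, i32)

      define i32 @vp_reduce_add_example() {
        ; mask = <1, 1, 0, 1> and %evl = 3, so only lanes 0 and 1 are enabled:
        ; lane 2 is masked off, and lane 3 is not below %evl.
        %r = call i32 @llvm.vp.reduce.add.v4i32(
                 i32 100,
                 <4 x i32> <i32 1, i32 2, i32 3, i32 4>,
                 <4 x i1> <i1 true, i1 true, i1 false, i1 true>,
                 i32 3)
        ; %r = 100 + 1 + 2 = 103
        ret i32 %r
      }
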
18669 .. _int_vp_reduce_fadd:
18671 '``llvm.vp.reduce.fadd.*``' Intrinsics
18672 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18676 This is an overloaded intrinsic.
18680 declare float @llvm.vp.reduce.fadd.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
18681 declare double @llvm.vp.reduce.fadd.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
18686 Predicated floating-point ``ADD`` reduction of a vector and a scalar starting
18687 value, returning the result as a scalar.
The first operand is the start value of the reduction, which must be a scalar
floating-point type equal to the result type. The second operand is the vector
on which the reduction is performed and must be a vector of floating-point
values whose element type is the result/start type. The third operand is the
vector mask and is a vector of boolean values with the same number of elements
as the vector operand. The fourth operand is the explicit vector length of the
operation.
18703 The '``llvm.vp.reduce.fadd``' intrinsic performs the floating-point ``ADD``
18704 reduction (:ref:`llvm.vector.reduce.fadd <int_vector_reduce_fadd>`) of the
18705 vector operand ``val`` on each enabled lane, adding it to the scalar
18706 ``start_value``. Disabled lanes are treated as containing the neutral value
18707 ``-0.0`` (i.e. having no effect on the reduction operation). If no lanes are
18708 enabled, the resulting value will be equal to ``start_value``.
18710 To ignore the start value, the neutral value can be used.
18712 See the unpredicated version (:ref:`llvm.vector.reduce.fadd
18713 <int_vector_reduce_fadd>`) for more detail on the semantics of the reduction.
18718 .. code-block:: llvm
18720 %r = call float @llvm.vp.reduce.fadd.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
18721 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
18722 ; are treated as though %mask were false for those lanes.
18724 %masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>
18725 %also.r = call float @llvm.vector.reduce.fadd.v4f32(float %start, <4 x float> %masked.a)
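
As a sketch of ignoring the start value, the call below passes the neutral
value ``-0.0`` as ``start_value``, so that the result depends only on the
enabled lanes of ``%a``. The function name ``@sum_enabled_lanes`` is a
hypothetical name chosen for this example.

.. code-block:: llvm

      declare float @llvm.vp.reduce.fadd.v4f32(float, <4 x float>, <4 x i1>, i32)

      define float @sum_enabled_lanes(<4 x float> %a, <4 x i1> %mask, i32 %evl) {
        ; -0.0 is the neutral value of the reduction, so the start value
        ; does not contribute to the result.
        %r = call float @llvm.vp.reduce.fadd.v4f32(float -0.0, <4 x float> %a, <4 x i1> %mask, i32 %evl)
        ret float %r
      }
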
18728 .. _int_vp_reduce_mul:
18730 '``llvm.vp.reduce.mul.*``' Intrinsics
18731 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18735 This is an overloaded intrinsic.
18739 declare i32 @llvm.vp.reduce.mul.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
18740 declare i16 @llvm.vp.reduce.mul.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
18745 Predicated integer ``MUL`` reduction of a vector and a scalar starting value,
18746 returning the result as a scalar.
18752 The first operand is the start value of the reduction, which must be a scalar
18753 integer type equal to the result type. The second operand is the vector on
18754 which the reduction is performed and must be a vector of integer values whose
18755 element type is the result/start type. The third operand is the vector mask and
18756 is a vector of boolean values with the same number of elements as the vector
18757 operand. The fourth operand is the explicit vector length of the operation.
18762 The '``llvm.vp.reduce.mul``' intrinsic performs the integer ``MUL`` reduction
18763 (:ref:`llvm.vector.reduce.mul <int_vector_reduce_mul>`) of the vector operand ``val``
18764 on each enabled lane, multiplying it by the scalar ``start_value``. Disabled
18765 lanes are treated as containing the neutral value ``1`` (i.e. having no effect
18766 on the reduction operation). If the vector length is zero, the result is the start value.
18769 To ignore the start value, the neutral value can be used.
18774 .. code-block:: llvm
18776 %r = call i32 @llvm.vp.reduce.mul.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
18777 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
18778 ; are treated as though %mask were false for those lanes.
18780 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
18781 %reduction = call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %masked.a)
18782 %also.r = mul i32 %reduction, %start
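
The interaction of the mask and the explicit vector length can be illustrated
with constant operands. The sketch below uses a hypothetical function
``@product_example``. Lane 1 is disabled by the mask and lane 3 is disabled
because its index is not less than the vector length ``3``, so only lanes 0
and 2 contribute.

.. code-block:: llvm

      declare i32 @llvm.vp.reduce.mul.v4i32(i32, <4 x i32>, <4 x i1>, i32)

      ; Hypothetical example function.
      define i32 @product_example() {
        ; Enabled lanes: 0 (value 3) and 2 (value 5).
        ; %r = 2 * 3 * 5 = 30
        %r = call i32 @llvm.vp.reduce.mul.v4i32(i32 2, <4 x i32> <i32 3, i32 4, i32 5, i32 6>, <4 x i1> <i1 true, i1 false, i1 true, i1 true>, i32 3)
        ret i32 %r
      }
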
18784 .. _int_vp_reduce_fmul:
18786 '``llvm.vp.reduce.fmul.*``' Intrinsics
18787 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18791 This is an overloaded intrinsic.
18795 declare float @llvm.vp.reduce.fmul.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
18796 declare double @llvm.vp.reduce.fmul.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
18801 Predicated floating-point ``MUL`` reduction of a vector and a scalar starting
18802 value, returning the result as a scalar.
18808 The first operand is the start value of the reduction, which must be a scalar
18809 floating-point type equal to the result type. The second operand is the vector
18810 on which the reduction is performed and must be a vector of floating-point
18811 values whose element type is the result/start type. The third operand is the
18812 vector mask and is a vector of boolean values with the same number of elements
18813 as the vector operand. The fourth operand is the explicit vector length of the operation.
18819 The '``llvm.vp.reduce.fmul``' intrinsic performs the floating-point ``MUL``
18820 reduction (:ref:`llvm.vector.reduce.fmul <int_vector_reduce_fmul>`) of the
18821 vector operand ``val`` on each enabled lane, multiplying it by the scalar
18822 ``start_value``. Disabled lanes are treated as containing the neutral value
18823 ``1.0`` (i.e. having no effect on the reduction operation). If no lanes are
18824 enabled, the resulting value will be equal to the starting value.
18826 To ignore the start value, the neutral value can be used.
18828 See the unpredicated version (:ref:`llvm.vector.reduce.fmul
18829 <int_vector_reduce_fmul>`) for more detail on the semantics.
18834 .. code-block:: llvm
18836 %r = call float @llvm.vp.reduce.fmul.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
18837 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
18838 ; are treated as though %mask were false for those lanes.
18840 %masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>
18841 %also.r = call float @llvm.vector.reduce.fmul.v4f32(float %start, <4 x float> %masked.a)
18844 .. _int_vp_reduce_and:
18846 '``llvm.vp.reduce.and.*``' Intrinsics
18847 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18851 This is an overloaded intrinsic.
18855 declare i32 @llvm.vp.reduce.and.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
18856 declare i16 @llvm.vp.reduce.and.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
18861 Predicated integer ``AND`` reduction of a vector and a scalar starting value,
18862 returning the result as a scalar.
18868 The first operand is the start value of the reduction, which must be a scalar
18869 integer type equal to the result type. The second operand is the vector on
18870 which the reduction is performed and must be a vector of integer values whose
18871 element type is the result/start type. The third operand is the vector mask and
18872 is a vector of boolean values with the same number of elements as the vector
18873 operand. The fourth operand is the explicit vector length of the operation.
18878 The '``llvm.vp.reduce.and``' intrinsic performs the integer ``AND`` reduction
18879 (:ref:`llvm.vector.reduce.and <int_vector_reduce_and>`) of the vector operand
18880 ``val`` on each enabled lane, performing an '``and``' of that with the
18881 scalar ``start_value``. Disabled lanes are treated as containing the neutral
18882 value ``UINT_MAX``, or ``-1`` (i.e. having no effect on the reduction
18883 operation). If the vector length is zero, the result is the start value.
18885 To ignore the start value, the neutral value can be used.
18890 .. code-block:: llvm
18892 %r = call i32 @llvm.vp.reduce.and.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
18893 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
18894 ; are treated as though %mask were false for those lanes.
18896 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
18897 %reduction = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> %masked.a)
18898 %also.r = and i32 %reduction, %start
18901 .. _int_vp_reduce_or:
18903 '``llvm.vp.reduce.or.*``' Intrinsics
18904 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18908 This is an overloaded intrinsic.
18912 declare i32 @llvm.vp.reduce.or.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
18913 declare i16 @llvm.vp.reduce.or.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
18918 Predicated integer ``OR`` reduction of a vector and a scalar starting value,
18919 returning the result as a scalar.
18925 The first operand is the start value of the reduction, which must be a scalar
18926 integer type equal to the result type. The second operand is the vector on
18927 which the reduction is performed and must be a vector of integer values whose
18928 element type is the result/start type. The third operand is the vector mask and
18929 is a vector of boolean values with the same number of elements as the vector
18930 operand. The fourth operand is the explicit vector length of the operation.
18935 The '``llvm.vp.reduce.or``' intrinsic performs the integer ``OR`` reduction
18936 (:ref:`llvm.vector.reduce.or <int_vector_reduce_or>`) of the vector operand
18937 ``val`` on each enabled lane, performing an '``or``' of that with the scalar
18938 ``start_value``. Disabled lanes are treated as containing the neutral value
18939 ``0`` (i.e. having no effect on the reduction operation). If the vector length
18940 is zero, the result is the start value.
18942 To ignore the start value, the neutral value can be used.
18947 .. code-block:: llvm
18949 %r = call i32 @llvm.vp.reduce.or.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
18950 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
18951 ; are treated as though %mask were false for those lanes.
18953 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
18954 %reduction = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> %masked.a)
18955 %also.r = or i32 %reduction, %start
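
One possible use, sketched below under the assumption that the caller only
needs to know whether any enabled lane is non-zero, is to reduce with the
neutral start value ``0`` and compare the scalar result against zero. The
function name ``@any_enabled_lane_nonzero`` is hypothetical.

.. code-block:: llvm

      declare i32 @llvm.vp.reduce.or.v4i32(i32, <4 x i32>, <4 x i1>, i32)

      ; Hypothetical helper: returns true if at least one enabled lane of %a
      ; has a bit set. Disabled lanes behave as 0 and cannot set any bit.
      define i1 @any_enabled_lane_nonzero(<4 x i32> %a, <4 x i1> %mask, i32 %evl) {
        %bits = call i32 @llvm.vp.reduce.or.v4i32(i32 0, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
        %any = icmp ne i32 %bits, 0
        ret i1 %any
      }
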
18957 .. _int_vp_reduce_xor:
18959 '``llvm.vp.reduce.xor.*``' Intrinsics
18960 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
18964 This is an overloaded intrinsic.
18968 declare i32 @llvm.vp.reduce.xor.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
18969 declare i16 @llvm.vp.reduce.xor.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
18974 Predicated integer ``XOR`` reduction of a vector and a scalar starting value,
18975 returning the result as a scalar.
18981 The first operand is the start value of the reduction, which must be a scalar
18982 integer type equal to the result type. The second operand is the vector on
18983 which the reduction is performed and must be a vector of integer values whose
18984 element type is the result/start type. The third operand is the vector mask and
18985 is a vector of boolean values with the same number of elements as the vector
18986 operand. The fourth operand is the explicit vector length of the operation.
18991 The '``llvm.vp.reduce.xor``' intrinsic performs the integer ``XOR`` reduction
18992 (:ref:`llvm.vector.reduce.xor <int_vector_reduce_xor>`) of the vector operand
18993 ``val`` on each enabled lane, performing an '``xor``' of that with the scalar
18994 ``start_value``. Disabled lanes are treated as containing the neutral value
18995 ``0`` (i.e. having no effect on the reduction operation). If the vector length
18996 is zero, the result is the start value.
18998 To ignore the start value, the neutral value can be used.
19003 .. code-block:: llvm
19005 %r = call i32 @llvm.vp.reduce.xor.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
19006 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19007 ; are treated as though %mask were false for those lanes.
19009 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
19010 %reduction = call i32 @llvm.vector.reduce.xor.v4i32(<4 x i32> %masked.a)
19011 %also.r = xor i32 %reduction, %start
19014 .. _int_vp_reduce_smax:
19016 '``llvm.vp.reduce.smax.*``' Intrinsics
19017 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19021 This is an overloaded intrinsic.
19025 declare i32 @llvm.vp.reduce.smax.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
19026 declare i16 @llvm.vp.reduce.smax.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19031 Predicated signed-integer ``MAX`` reduction of a vector and a scalar starting
19032 value, returning the result as a scalar.
19038 The first operand is the start value of the reduction, which must be a scalar
19039 integer type equal to the result type. The second operand is the vector on
19040 which the reduction is performed and must be a vector of integer values whose
19041 element type is the result/start type. The third operand is the vector mask and
19042 is a vector of boolean values with the same number of elements as the vector
19043 operand. The fourth operand is the explicit vector length of the operation.
19048 The '``llvm.vp.reduce.smax``' intrinsic performs the signed-integer ``MAX``
19049 reduction (:ref:`llvm.vector.reduce.smax <int_vector_reduce_smax>`) of the
19050 vector operand ``val`` on each enabled lane, taking the maximum of that and
19051 the scalar ``start_value``. Disabled lanes are treated as containing the
19052 neutral value ``INT_MIN`` (i.e. having no effect on the reduction operation).
19053 If the vector length is zero, the result is the start value.
19055 To ignore the start value, the neutral value can be used.
19060 .. code-block:: llvm
19062 %r = call i8 @llvm.vp.reduce.smax.v4i8(i8 %start, <4 x i8> %a, <4 x i1> %mask, i32 %evl)
19063 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19064 ; are treated as though %mask were false for those lanes.
19066 %masked.a = select <4 x i1> %mask, <4 x i8> %a, <4 x i8> <i8 -128, i8 -128, i8 -128, i8 -128>
19067 %reduction = call i8 @llvm.vector.reduce.smax.v4i8(<4 x i8> %masked.a)
19068 %also.r = call i8 @llvm.smax.i8(i8 %reduction, i8 %start)
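
Because the start value is folded into the result, reductions over several
vector chunks can be chained by feeding one result into the next call. The
following is a sketch only; the function ``@max_of_two_chunks`` and its
parameter names are hypothetical.

.. code-block:: llvm

      declare i32 @llvm.vp.reduce.smax.v4i32(i32, <4 x i32>, <4 x i1>, i32)

      ; Hypothetical helper: signed maximum of %init and the enabled lanes of
      ; two chunks. Passing INT_MIN (-2147483648) as %init would ignore it.
      define i32 @max_of_two_chunks(i32 %init, <4 x i32> %chunk0, <4 x i32> %chunk1, <4 x i1> %m0, <4 x i1> %m1, i32 %evl0, i32 %evl1) {
        %max0 = call i32 @llvm.vp.reduce.smax.v4i32(i32 %init, <4 x i32> %chunk0, <4 x i1> %m0, i32 %evl0)
        ; The running maximum becomes the start value of the second reduction.
        %max1 = call i32 @llvm.vp.reduce.smax.v4i32(i32 %max0, <4 x i32> %chunk1, <4 x i1> %m1, i32 %evl1)
        ret i32 %max1
      }
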
19071 .. _int_vp_reduce_smin:
19073 '``llvm.vp.reduce.smin.*``' Intrinsics
19074 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19078 This is an overloaded intrinsic.
19082 declare i32 @llvm.vp.reduce.smin.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
19083 declare i16 @llvm.vp.reduce.smin.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19088 Predicated signed-integer ``MIN`` reduction of a vector and a scalar starting
19089 value, returning the result as a scalar.
19095 The first operand is the start value of the reduction, which must be a scalar
19096 integer type equal to the result type. The second operand is the vector on
19097 which the reduction is performed and must be a vector of integer values whose
19098 element type is the result/start type. The third operand is the vector mask and
19099 is a vector of boolean values with the same number of elements as the vector
19100 operand. The fourth operand is the explicit vector length of the operation.
19105 The '``llvm.vp.reduce.smin``' intrinsic performs the signed-integer ``MIN``
19106 reduction (:ref:`llvm.vector.reduce.smin <int_vector_reduce_smin>`) of the
19107 vector operand ``val`` on each enabled lane, taking the minimum of that and
19108 the scalar ``start_value``. Disabled lanes are treated as containing the
19109 neutral value ``INT_MAX`` (i.e. having no effect on the reduction operation).
19110 If the vector length is zero, the result is the start value.
19112 To ignore the start value, the neutral value can be used.
19117 .. code-block:: llvm
19119 %r = call i8 @llvm.vp.reduce.smin.v4i8(i8 %start, <4 x i8> %a, <4 x i1> %mask, i32 %evl)
19120 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19121 ; are treated as though %mask were false for those lanes.
19123 %masked.a = select <4 x i1> %mask, <4 x i8> %a, <4 x i8> <i8 127, i8 127, i8 127, i8 127>
19124 %reduction = call i8 @llvm.vector.reduce.smin.v4i8(<4 x i8> %masked.a)
19125 %also.r = call i8 @llvm.smin.i8(i8 %reduction, i8 %start)
19128 .. _int_vp_reduce_umax:
19130 '``llvm.vp.reduce.umax.*``' Intrinsics
19131 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19135 This is an overloaded intrinsic.
19139 declare i32 @llvm.vp.reduce.umax.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
19140 declare i16 @llvm.vp.reduce.umax.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19145 Predicated unsigned-integer ``MAX`` reduction of a vector and a scalar starting
19146 value, returning the result as a scalar.
19152 The first operand is the start value of the reduction, which must be a scalar
19153 integer type equal to the result type. The second operand is the vector on
19154 which the reduction is performed and must be a vector of integer values whose
19155 element type is the result/start type. The third operand is the vector mask and
19156 is a vector of boolean values with the same number of elements as the vector
19157 operand. The fourth operand is the explicit vector length of the operation.
19162 The '``llvm.vp.reduce.umax``' intrinsic performs the unsigned-integer ``MAX``
19163 reduction (:ref:`llvm.vector.reduce.umax <int_vector_reduce_umax>`) of the
19164 vector operand ``val`` on each enabled lane, taking the maximum of that and
19165 the scalar ``start_value``. Disabled lanes are treated as containing the
19166 neutral value ``0`` (i.e. having no effect on the reduction operation). If the
19167 vector length is zero, the result is the start value.
19169 To ignore the start value, the neutral value can be used.
19174 .. code-block:: llvm
19176 %r = call i32 @llvm.vp.reduce.umax.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
19177 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19178 ; are treated as though %mask were false for those lanes.
19180 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
19181 %reduction = call i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> %masked.a)
19182 %also.r = call i32 @llvm.umax.i32(i32 %reduction, i32 %start)
19185 .. _int_vp_reduce_umin:
19187 '``llvm.vp.reduce.umin.*``' Intrinsics
19188 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19192 This is an overloaded intrinsic.
19196 declare i32 @llvm.vp.reduce.umin.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
19197 declare i16 @llvm.vp.reduce.umin.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19202 Predicated unsigned-integer ``MIN`` reduction of a vector and a scalar starting
19203 value, returning the result as a scalar.
19209 The first operand is the start value of the reduction, which must be a scalar
19210 integer type equal to the result type. The second operand is the vector on
19211 which the reduction is performed and must be a vector of integer values whose
19212 element type is the result/start type. The third operand is the vector mask and
19213 is a vector of boolean values with the same number of elements as the vector
19214 operand. The fourth operand is the explicit vector length of the operation.
19219 The '``llvm.vp.reduce.umin``' intrinsic performs the unsigned-integer ``MIN``
19220 reduction (:ref:`llvm.vector.reduce.umin <int_vector_reduce_umin>`) of the
19221 vector operand ``val`` on each enabled lane, taking the minimum of that and the
19222 scalar ``start_value``. Disabled lanes are treated as containing the neutral
19223 value ``UINT_MAX``, or ``-1`` (i.e. having no effect on the reduction
19224 operation). If the vector length is zero, the result is the start value.
19226 To ignore the start value, the neutral value can be used.
19231 .. code-block:: llvm
19233 %r = call i32 @llvm.vp.reduce.umin.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
19234 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19235 ; are treated as though %mask were false for those lanes.
19237 %masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
19238 %reduction = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> %masked.a)
19239 %also.r = call i32 @llvm.umin.i32(i32 %reduction, i32 %start)
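
The neutral value is written as ``i32 -1`` in the IR, which has the same bit
pattern as ``UINT_MAX``. The sketch below, using a hypothetical function
``@umin_example``, passes it as the start value so that only the enabled lanes
determine the result.

.. code-block:: llvm

      declare i32 @llvm.vp.reduce.umin.v4i32(i32, <4 x i32>, <4 x i1>, i32)

      ; Hypothetical example function.
      define i32 @umin_example() {
        ; Lane 3 (value 1) is masked off, so the enabled values are 7, 3 and 9.
        ; %r = umin(UINT_MAX, 7, 3, 9) = 3
        %r = call i32 @llvm.vp.reduce.umin.v4i32(i32 -1, <4 x i32> <i32 7, i32 3, i32 9, i32 1>, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, i32 4)
        ret i32 %r
      }
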
19242 .. _int_vp_reduce_fmax:
19244 '``llvm.vp.reduce.fmax.*``' Intrinsics
19245 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19249 This is an overloaded intrinsic.
19253 declare float @llvm.vp.reduce.fmax.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
19254 declare double @llvm.vp.reduce.fmax.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19259 Predicated floating-point ``MAX`` reduction of a vector and a scalar starting
19260 value, returning the result as a scalar.
19266 The first operand is the start value of the reduction, which must be a scalar
19267 floating-point type equal to the result type. The second operand is the vector
19268 on which the reduction is performed and must be a vector of floating-point
19269 values whose element type is the result/start type. The third operand is the
19270 vector mask and is a vector of boolean values with the same number of elements
19271 as the vector operand. The fourth operand is the explicit vector length of the operation.
19277 The '``llvm.vp.reduce.fmax``' intrinsic performs the floating-point ``MAX``
19278 reduction (:ref:`llvm.vector.reduce.fmax <int_vector_reduce_fmax>`) of the
19279 vector operand ``val`` on each enabled lane, taking the maximum of that and the
19280 scalar ``start_value``. Disabled lanes are treated as containing the neutral
19281 value (i.e. having no effect on the reduction operation). If the vector length
19282 is zero, the result is the start value.
19284 The neutral value is dependent on the :ref:`fast-math flags <fastmath>`. If no
19285 flags are set, the neutral value is ``-QNAN``. If ``nnan`` and ``ninf`` are
19286 both set, then the neutral value is the smallest floating-point value for the
19287 result type. If only ``nnan`` is set then the neutral value is ``-Infinity``.
19289 This instruction has the same comparison semantics as the
19290 :ref:`llvm.vector.reduce.fmax <int_vector_reduce_fmax>` intrinsic (and thus the
19291 '``llvm.maxnum.*``' intrinsic). That is, the result will always be a number
19292 unless all elements of the vector and the starting value are ``NaN``. For a
19293 vector with maximum element magnitude ``0.0`` and containing both ``+0.0`` and
19294 ``-0.0`` elements, the sign of the result is unspecified.
19296 To ignore the start value, the neutral value can be used.
19301 .. code-block:: llvm
19303 %r = call float @llvm.vp.reduce.fmax.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
19304 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19305 ; are treated as though %mask were false for those lanes.
19307 %masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float QNAN, float QNAN, float QNAN, float QNAN>
19308 %reduction = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> %masked.a)
19309 %also.r = call float @llvm.maxnum.f32(float %reduction, float %start)
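
As a sketch of how the fast-math flags affect the choice of neutral value,
the call below carries only the ``nnan`` flag, for which the text above gives
``-Infinity`` as the neutral value. ``-Infinity`` is written
``0xFFF0000000000000`` in LLVM's hexadecimal floating-point notation. The
function name ``@fmax_enabled_lanes`` is hypothetical.

.. code-block:: llvm

      declare float @llvm.vp.reduce.fmax.v4f32(float, <4 x float>, <4 x i1>, i32)

      ; Hypothetical helper: with nnan set, a start value of -Infinity has no
      ; effect, so the result is the maximum over the enabled lanes of %a.
      define float @fmax_enabled_lanes(<4 x float> %a, <4 x i1> %mask, i32 %evl) {
        %r = call nnan float @llvm.vp.reduce.fmax.v4f32(float 0xFFF0000000000000, <4 x float> %a, <4 x i1> %mask, i32 %evl)
        ret float %r
      }
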
19312 .. _int_vp_reduce_fmin:
19314 '``llvm.vp.reduce.fmin.*``' Intrinsics
19315 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19319 This is an overloaded intrinsic.
19323 declare float @llvm.vp.reduce.fmin.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
19324 declare double @llvm.vp.reduce.fmin.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)
19329 Predicated floating-point ``MIN`` reduction of a vector and a scalar starting
19330 value, returning the result as a scalar.
19336 The first operand is the start value of the reduction, which must be a scalar
19337 floating-point type equal to the result type. The second operand is the vector
19338 on which the reduction is performed and must be a vector of floating-point
19339 values whose element type is the result/start type. The third operand is the
19340 vector mask and is a vector of boolean values with the same number of elements
19341 as the vector operand. The fourth operand is the explicit vector length of the operation.
19347 The '``llvm.vp.reduce.fmin``' intrinsic performs the floating-point ``MIN``
19348 reduction (:ref:`llvm.vector.reduce.fmin <int_vector_reduce_fmin>`) of the
19349 vector operand ``val`` on each enabled lane, taking the minimum of that and the
19350 scalar ``start_value``. Disabled lanes are treated as containing the neutral
19351 value (i.e. having no effect on the reduction operation). If the vector length
19352 is zero, the result is the start value.
19354 The neutral value is dependent on the :ref:`fast-math flags <fastmath>`. If no
19355 flags are set, the neutral value is ``+QNAN``. If ``nnan`` and ``ninf`` are
19356 both set, then the neutral value is the largest floating-point value for the
19357 result type. If only ``nnan`` is set then the neutral value is ``+Infinity``.
19359 This instruction has the same comparison semantics as the
19360 :ref:`llvm.vector.reduce.fmin <int_vector_reduce_fmin>` intrinsic (and thus the
19361 '``llvm.minnum.*``' intrinsic). That is, the result will always be a number
19362 unless all elements of the vector and the starting value are ``NaN``. For a
19363 vector with maximum element magnitude ``0.0`` and containing both ``+0.0`` and
19364 ``-0.0`` elements, the sign of the result is unspecified.
19366 To ignore the start value, the neutral value can be used.
19371 .. code-block:: llvm
19373 %r = call float @llvm.vp.reduce.fmin.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
19374 ; %r is equivalent to %also.r, where lanes greater than or equal to %evl
19375 ; are treated as though %mask were false for those lanes.
19377 %masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float QNAN, float QNAN, float QNAN, float QNAN>
19378 %reduction = call float @llvm.vector.reduce.fmin.v4f32(<4 x float> %masked.a)
19379 %also.r = call float @llvm.minnum.f32(float %reduction, float %start)
19382 .. _int_get_active_lane_mask:
19384 '``llvm.get.active.lane.mask.*``' Intrinsics
19385 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19389 This is an overloaded intrinsic.
19393 declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %base, i32 %n)
19394 declare <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 %base, i64 %n)
19395 declare <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 %base, i64 %n)
19396 declare <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %base, i64 %n)
19402 Create a mask representing active and inactive vector lanes.
19408 Both operands have the same scalar integer type. The result is a vector with
19409 the i1 element type.
19414 The '``llvm.get.active.lane.mask.*``' intrinsics are semantically equivalent to:
19419 %m[i] = icmp ult (%base + i), %n
19421 where ``%m`` is a vector (mask) of active/inactive lanes with its elements
19422 indexed by ``i``, and ``%base``, ``%n`` are the two arguments to
19423 ``llvm.get.active.lane.mask.*``, ``icmp`` is an integer compare and ``ult``
19424 the unsigned less-than comparison operator. Overflow cannot occur in
19425 ``(%base + i)`` and its comparison against ``%n`` as it is performed in integer
19426 numbers and not in machine numbers. If ``%n`` is ``0``, then the result is a
19427 poison value. The above is equivalent to:
19431 %m = @llvm.get.active.lane.mask(%base, %n)
19433 This can, for example, be emitted by the loop vectorizer in which case
19434 ``%base`` is the first element of the vector induction variable (VIV) and
19435 ``%n`` is the loop tripcount. Thus, these intrinsics perform an element-wise
19436 less than comparison of VIV with the loop tripcount, producing a mask of
19437 true/false values representing active/inactive vector lanes, except if the VIV
19438 overflows in which case they return false in the lanes where the VIV overflows.
19439 The arguments are scalar types to accommodate scalable vector types, for which
19440 the type of a step vector that enumerates the lanes without overflow is not
19441 known.
19443 This mask ``%m`` can e.g. be used in masked load/store instructions. These
19444 intrinsics provide a hint to the backend. I.e., for a vector loop, the
19445 back-edge taken count of the original scalar loop is explicit as the second argument.
19452 .. code-block:: llvm
19454 %active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %elem0, i64 429)
19455 %wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> undef)
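
The per-lane comparison can be worked through with constant operands. In the
sketch below, which uses a hypothetical function ``@lane_mask_example``,
``%base`` is ``6`` and ``%n`` is ``8``, so the four lanes compare ``6``, ``7``,
``8`` and ``9`` against ``8``.

.. code-block:: llvm

      declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32, i32)

      ; Hypothetical example function.
      define <4 x i1> @lane_mask_example() {
        ; Lane i is active if (6 + i) is unsigned-less-than 8:
        ; %m = <i1 true, i1 true, i1 false, i1 false>
        %m = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 6, i32 8)
        ret <4 x i1> %m
      }
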
19458 .. _int_mload_mstore:
19460 Masked Vector Load and Store Intrinsics
19461 ---------------------------------------
19463 LLVM provides intrinsics for predicated vector load and store operations. The predicate is specified by a mask operand, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the "off" lanes are not accessed. When all bits of the mask are on, the intrinsic is identical to a regular vector load or store. When all bits are off, no memory is accessed.
19467 '``llvm.masked.load.*``' Intrinsics
19468 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19472 This is an overloaded intrinsic. The loaded data is a vector of any integer, floating-point or pointer data type.
19476 declare <16 x float> @llvm.masked.load.v16f32.p0v16f32 (<16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>)
19477 declare <2 x double> @llvm.masked.load.v2f64.p0v2f64 (<2 x double>* <ptr>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>)
19478 ;; The data is a vector of pointers to double
19479 declare <8 x double*> @llvm.masked.load.v8p0f64.p0v8p0f64 (<8 x double*>* <ptr>, i32 <alignment>, <8 x i1> <mask>, <8 x double*> <passthru>)
19480 ;; The data is a vector of function pointers
19481 declare <8 x i32 ()*> @llvm.masked.load.v8p0f_i32f.p0v8p0f_i32f (<8 x i32 ()*>* <ptr>, i32 <alignment>, <8 x i1> <mask>, <8 x i32 ()*> <passthru>)
19486 Reads a vector from memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the '``passthru``' operand.
19492 The first operand is the base pointer for the load. The second operand is the alignment of the source location. It must be a power of two constant integer value. The third operand, mask, is a vector of boolean values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the base pointer and the type of the '``passthru``' operand are the same vector types.
19497 The '``llvm.masked.load``' intrinsic is designed for conditional reading of selected vector elements in a single IR operation. It is useful for targets that support vector masked loads and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar load operations.
19498 The result of this operation is equivalent to a regular vector load instruction followed by a 'select' between the loaded and the passthru values, predicated on the same mask. However, using this intrinsic prevents exceptions on memory access to masked-off lanes.
19503 %res = call <16 x float> @llvm.masked.load.v16f32.p0v16f32 (<16 x float>* %ptr, i32 4, <16 x i1> %mask, <16 x float> %passthru)
19505 ;; The result of the two following instructions is identical aside from potential memory access exception
19506 %loadlal = load <16 x float>, <16 x float>* %ptr, align 4
19507 %res = select <16 x i1> %mask, <16 x float> %loadlal, <16 x float> %passthru
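
As a further sketch, a constant mask can be used to read a partial vector.
The hypothetical function ``@load_first_three`` below reads three ``i32``
elements starting at ``%ptr``. The fourth lane is masked off, so its memory
location is not accessed and the lane is filled with ``0`` from the
``zeroinitializer`` pass-through operand.

.. code-block:: llvm

      declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

      ; Hypothetical helper: lanes 0..2 are loaded, lane 3 comes from passthru.
      define <4 x i32> @load_first_three(<4 x i32>* %ptr) {
        %v = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptr, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x i32> zeroinitializer)
        ret <4 x i32> %v
      }
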
19511 '``llvm.masked.store.*``' Intrinsics
19512 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19516 This is an overloaded intrinsic. The data stored in memory is a vector of any integer, floating-point or pointer data type.
19520 declare void @llvm.masked.store.v8i32.p0v8i32 (<8 x i32> <value>, <8 x i32>* <ptr>, i32 <alignment>, <8 x i1> <mask>)
19521 declare void @llvm.masked.store.v16f32.p0v16f32 (<16 x float> <value>, <16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>)
19522 ;; The data is a vector of pointers to double
19523 declare void @llvm.masked.store.v8p0f64.p0v8p0f64 (<8 x double*> <value>, <8 x double*>* <ptr>, i32 <alignment>, <8 x i1> <mask>)
19524 ;; The data is a vector of function pointers
19525 declare void @llvm.masked.store.v4p0f_i32f.p0v4p0f_i32f (<4 x i32 ()*> <value>, <4 x i32 ()*>* <ptr>, i32 <alignment>, <4 x i1> <mask>)
19530 Writes a vector to memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes.
19535 The first operand is the vector value to be written to memory. The second operand is the base pointer for the store; it has the same underlying type as the value operand. The third operand is the alignment of the destination location. It must be a power of two constant integer value. The fourth operand, mask, is a vector of boolean values. The types of the mask and the value operand must have the same number of vector elements.
19541 The '``llvm.masked.store``' intrinsic is designed for conditional writing of selected vector elements in a single IR operation. It is useful for targets that support vector masked stores and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations.
19542 The result of this operation is equivalent to a load-modify-store sequence. However, using this intrinsic prevents exceptions and data races on memory access to masked-off lanes.
19546 call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> %value, <16 x float>* %ptr, i32 4, <16 x i1> %mask)
19548 ;; The result of the following instructions is identical aside from potential data races and memory access exceptions
19549 %oldval = load <16 x float>, <16 x float>* %ptr, align 4
19550 %res = select <16 x i1> %mask, <16 x float> %value, <16 x float> %oldval
19551 store <16 x float> %res, <16 x float>* %ptr, align 4
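The lane-by-lane behavior described above can be sketched as a small reference model. This is only an illustration in Python, not LLVM API: the function name ``masked_store``, the list-based ``memory`` and the integer ``base`` index are assumptions made for the example, and alignment is not modeled.

```python
def masked_store(memory, base, value, mask):
    """Hypothetical model: write value[i] to memory[base + i] for enabled lanes only."""
    if len(value) != len(mask):
        raise ValueError("value and mask must have the same number of lanes")
    for lane, (element, enabled) in enumerate(zip(value, mask)):
        if enabled:
            # Masked-off lanes are never read or written.
            memory[base + lane] = element
    return memory


memory = [0.0] * 6
masked_store(memory, 1, [1.0, 2.0, 3.0, 4.0], [True, False, False, True])
print(memory)  # [0.0, 1.0, 0.0, 0.0, 4.0, 0.0]
```

Unlike the load-select-store sequence, the model never touches the locations of masked-off lanes, which is the property the intrinsic relies on to avoid faults and data races.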
19554 Masked Vector Gather and Scatter Intrinsics
19555 -------------------------------------------
19557 LLVM provides intrinsics for vector gather and scatter operations. They are similar to :ref:`Masked Vector Load and Store <int_mload_mstore>`, except they are designed for arbitrary memory accesses, rather than sequential memory accesses. Gather and scatter also employ a mask operand, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the "off" lanes are not accessed. When all bits are off, no memory is accessed.
19561 '``llvm.masked.gather.*``' Intrinsics
19562 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19566 This is an overloaded intrinsic. The loaded data are multiple scalar values of any integer, floating-point or pointer data type gathered together into one vector.
19570 declare <16 x float> @llvm.masked.gather.v16f32.v16p0f32 (<16 x float*> <ptrs>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>)
19571 declare <2 x double> @llvm.masked.gather.v2f64.v2p1f64 (<2 x double addrspace(1)*> <ptrs>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>)
19572 declare <8 x float*> @llvm.masked.gather.v8p0f32.v8p0p0f32 (<8 x float**> <ptrs>, i32 <alignment>, <8 x i1> <mask>, <8 x float*> <passthru>)
19577 Reads scalar values from arbitrary memory locations and gathers them into one vector. The memory locations are provided in the vector of pointers '``ptrs``'. The memory is accessed according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the '``passthru``' operand.
19583 The first operand is a vector of pointers which holds all memory addresses to read. The second operand is an alignment of the source addresses. It must be 0 or a power of two constant integer value. The third operand, mask, is a vector of boolean values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the vector of pointers and the type of the '``passthru``' operand are the same vector types.
19588 The '``llvm.masked.gather``' intrinsic is designed for conditional reading of multiple scalar values from arbitrary memory locations in a single IR operation. It is useful for targets that support vector masked gathers and allows vectorizing basic blocks with data and control divergence. Other targets may support this intrinsic differently, for example by lowering it into a sequence of scalar load operations.
The semantics of this operation are equivalent to a sequence of conditional scalar loads, followed by gathering all loaded values into a single vector. The mask restricts memory access to certain lanes and facilitates vectorization of predicated basic blocks.
19594 %res = call <4 x double> @llvm.masked.gather.v4f64.v4p0f64 (<4 x double*> %ptrs, i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x double> undef)
19596 ;; The gather with all-true mask is equivalent to the following instruction sequence
19597 %ptr0 = extractelement <4 x double*> %ptrs, i32 0
19598 %ptr1 = extractelement <4 x double*> %ptrs, i32 1
19599 %ptr2 = extractelement <4 x double*> %ptrs, i32 2
19600 %ptr3 = extractelement <4 x double*> %ptrs, i32 3
19602 %val0 = load double, double* %ptr0, align 8
19603 %val1 = load double, double* %ptr1, align 8
19604 %val2 = load double, double* %ptr2, align 8
19605 %val3 = load double, double* %ptr3, align 8
%vec0 = insertelement <4 x double> undef, double %val0, i32 0
%vec01 = insertelement <4 x double> %vec0, double %val1, i32 1
%vec012 = insertelement <4 x double> %vec01, double %val2, i32 2
%vec0123 = insertelement <4 x double> %vec012, double %val3, i32 3
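For masks that are not all-true, the gather semantics can be sketched as the following reference model. It is an illustrative Python sketch under stated assumptions: ``masked_gather`` is a hypothetical name, and ``memory`` is modeled as a dictionary from address to value so that an unmapped address stands in for inaccessible memory.

```python
def masked_gather(memory, ptrs, mask, passthru):
    """Hypothetical model: load memory[ptrs[i]] for enabled lanes, else take passthru[i]."""
    result = []
    for address, enabled, fallback in zip(ptrs, mask, passthru):
        # The address of a masked-off lane is never dereferenced.
        result.append(memory[address] if enabled else fallback)
    return result


memory = {100: 1.5, 108: 2.5}
# Lane 1 holds an unmapped address, but it is masked off, so it is not accessed.
print(masked_gather(memory, [108, 999, 100], [True, False, True], [0.0, -1.0, 0.0]))
# [2.5, -1.0, 1.5]
```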
19614 '``llvm.masked.scatter.*``' Intrinsics
19615 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. The data stored in memory is a vector of any integer, floating-point or pointer data type. Each vector element is stored at an arbitrary memory address. Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.
19623 declare void @llvm.masked.scatter.v8i32.v8p0i32 (<8 x i32> <value>, <8 x i32*> <ptrs>, i32 <alignment>, <8 x i1> <mask>)
19624 declare void @llvm.masked.scatter.v16f32.v16p1f32 (<16 x float> <value>, <16 x float addrspace(1)*> <ptrs>, i32 <alignment>, <16 x i1> <mask>)
19625 declare void @llvm.masked.scatter.v4p0f64.v4p0p0f64 (<4 x double*> <value>, <4 x double**> <ptrs>, i32 <alignment>, <4 x i1> <mask>)
19630 Writes each element from the value vector to the corresponding memory address. The memory addresses are represented as a vector of pointers. Writing is done according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes.
19635 The first operand is a vector value to be written to memory. The second operand is a vector of pointers, pointing to where the value elements should be stored. It has the same underlying type as the value operand. The third operand is an alignment of the destination addresses. It must be 0 or a power of two constant integer value. The fourth operand, mask, is a vector of boolean values. The types of the mask and the value operand must have the same number of vector elements.
The '``llvm.masked.scatter``' intrinsic is designed for writing selected vector elements to arbitrary memory addresses in a single IR operation. The operation is conditional when not all bits in the mask are switched on. It is useful for targets that support vector masked scatter and allows vectorizing basic blocks with data and control divergence. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations.
19644 ;; This instruction unconditionally stores data vector in multiple addresses
call void @llvm.masked.scatter.v8i32.v8p0i32 (<8 x i32> %value, <8 x i32*> %ptrs, i32 4, <8 x i1> <true, true, .. true>)
19647 ;; It is equivalent to a list of scalar stores
19648 %val0 = extractelement <8 x i32> %value, i32 0
19649 %val1 = extractelement <8 x i32> %value, i32 1
19651 %val7 = extractelement <8 x i32> %value, i32 7
19652 %ptr0 = extractelement <8 x i32*> %ptrs, i32 0
19653 %ptr1 = extractelement <8 x i32*> %ptrs, i32 1
19655 %ptr7 = extractelement <8 x i32*> %ptrs, i32 7
19656 ;; Note: the order of the following stores is important when they overlap:
19657 store i32 %val0, i32* %ptr0, align 4
19658 store i32 %val1, i32* %ptr1, align 4
19660 store i32 %val7, i32* %ptr7, align 4
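The ordering guarantee for overlapping addresses can be illustrated with a short reference model. This is a sketch in Python, not LLVM API; ``masked_scatter`` and the dictionary-based ``memory`` are assumptions made for the example.

```python
def masked_scatter(memory, value, ptrs, mask):
    """Hypothetical model: store value[i] to memory[ptrs[i]] for enabled lanes, lowest lane first."""
    for element, address, enabled in zip(value, ptrs, mask):
        if enabled:
            # Later (higher) lanes overwrite earlier ones at the same address.
            memory[address] = element
    return memory


memory = {}
# Lanes 0 and 2 both target address 0; lane 3 is masked off.
masked_scatter(memory, [10, 20, 30, 40], [0, 4, 0, 8], [True, True, True, False])
print(memory)  # {0: 30, 4: 20}
```

Because stores are issued from the lowest lane to the highest, the value from lane 2 is the one that remains at the shared address.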
19663 Masked Vector Expanding Load and Compressing Store Intrinsics
19664 -------------------------------------------------------------
LLVM provides intrinsics for expanding load and compressing store operations. Data selected from a vector according to a mask is stored in consecutive memory addresses (compressed store), and vice-versa (expanding load). These operations effectively map to "if (cond.i) a[j++] = v.i" and "if (cond.i) v.i = a[j++]" patterns, respectively. Note that when the mask starts with '1' bits followed by '0' bits, these operations are identical to :ref:`llvm.masked.store <int_mstore>` and :ref:`llvm.masked.load <int_mload>`.
19668 .. _int_expandload:
19670 '``llvm.masked.expandload.*``' Intrinsics
19671 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19675 This is an overloaded intrinsic. Several values of integer, floating point or pointer data type are loaded from consecutive memory addresses and stored into the elements of a vector according to the mask.
19679 declare <16 x float> @llvm.masked.expandload.v16f32 (float* <ptr>, <16 x i1> <mask>, <16 x float> <passthru>)
19680 declare <2 x i64> @llvm.masked.expandload.v2i64 (i64* <ptr>, <2 x i1> <mask>, <2 x i64> <passthru>)
Reads a number of scalar values sequentially from the memory location provided in '``ptr``' and spreads them in a vector. The '``mask``' holds a bit for each vector lane. The number of elements read from memory is equal to the number of '1' bits in the mask. The loaded elements are positioned in the destination vector according to the sequence of '1' and '0' bits in the mask. For example, if the mask vector is '10010001', "expandload" reads 3 values from memory addresses ptr, ptr+1, ptr+2 and places them in lanes 0, 3 and 7 respectively. The masked-off lanes are filled by elements from the corresponding lanes of the '``passthru``' operand.
19691 The first operand is the base pointer for the load. It has the same underlying type as the element of the returned vector. The second operand, mask, is a vector of boolean values with the same number of elements as the return type. The third is a pass-through value that is used to fill the masked-off lanes of the result. The return type and the type of the '``passthru``' operand have the same vector type.
The '``llvm.masked.expandload``' intrinsic is designed for reading multiple scalar values from adjacent memory addresses into possibly non-adjacent vector lanes. It is useful for targets that support vector expanding loads and allows vectorizing loops with cross-iteration dependencies, as in the following example:
19700 // In this loop we load from B and spread the elements into array A.
double *A, *B; int *C;
19702 for (int i = 0; i < size; ++i) {
19708 .. code-block:: llvm
19710 ; Load several elements from array B and expand them in a vector.
19711 ; The number of loaded elements is equal to the number of '1' elements in the Mask.
19712 %Tmp = call <8 x double> @llvm.masked.expandload.v8f64(double* %Bptr, <8 x i1> %Mask, <8 x double> undef)
19713 ; Store the result in A
19714 call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> %Tmp, <8 x double>* %Aptr, i32 8, <8 x i1> %Mask)
19716 ; %Bptr should be increased on each iteration according to the number of '1' elements in the Mask.
19717 %MaskI = bitcast <8 x i1> %Mask to i8
19718 %MaskIPopcnt = call i8 @llvm.ctpop.i8(i8 %MaskI)
19719 %MaskI64 = zext i8 %MaskIPopcnt to i64
19720 %BNextInd = add i64 %BInd, %MaskI64
19723 Other targets may support this intrinsic differently, for example, by lowering it into a sequence of conditional scalar load operations and shuffles.
19724 If all mask elements are '1', the intrinsic behavior is equivalent to the regular unmasked vector load.
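The '10010001' case described above can be reproduced with a small reference model. This is an illustrative Python sketch, not LLVM API: the name ``expandload``, the list-based ``memory`` and the integer ``base`` index are assumptions for the example.

```python
def expandload(memory, base, mask, passthru):
    """Hypothetical model: consecutive loads from memory[base...] placed into enabled lanes."""
    result = []
    offset = 0  # counts elements consumed from memory so far
    for enabled, fallback in zip(mask, passthru):
        if enabled:
            result.append(memory[base + offset])
            offset += 1
        else:
            result.append(fallback)
    return result


memory = [10, 11, 12, 13]
mask = [1, 0, 0, 1, 0, 0, 0, 1]  # lanes 0, 3 and 7 are enabled
print(expandload(memory, 0, mask, [-1] * 8))
# [10, -1, -1, 11, -1, -1, -1, 12]
```

Only three elements are read, one per enabled lane, so the fourth element of ``memory`` is never accessed.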
19726 .. _int_compressstore:
19728 '``llvm.masked.compressstore.*``' Intrinsics
19729 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19733 This is an overloaded intrinsic. A number of scalar values of integer, floating point or pointer data type are collected from an input vector and stored into adjacent memory addresses. A mask defines which elements to collect from the vector.
19737 declare void @llvm.masked.compressstore.v8i32 (<8 x i32> <value>, i32* <ptr>, <8 x i1> <mask>)
19738 declare void @llvm.masked.compressstore.v16f32 (<16 x float> <value>, float* <ptr>, <16 x i1> <mask>)
Selects elements from input vector '``value``' according to the '``mask``'. All selected elements are written into adjacent memory addresses starting at address '``ptr``', from lower to higher. The mask holds a bit for each vector lane, and is used to select elements to be stored. The number of elements to be stored is equal to the number of active bits in the mask.
The first operand is the input vector, from which elements are collected and written to memory. The second operand is the base pointer for the store; it has the same underlying type as the element of the input vector operand. The third operand is the mask, a vector of boolean values. The mask and the input vector must have the same number of vector elements.
The '``llvm.masked.compressstore``' intrinsic is designed for compressing data in memory. It allows collecting elements from possibly non-adjacent lanes of a vector and storing them contiguously in memory in one IR operation. It is useful for targets that support compressing store operations and allows vectorizing loops with cross-iteration dependences, as in the following example:
19758 // In this loop we load elements from A and store them consecutively in B
double *A, *B; int *C;
19760 for (int i = 0; i < size; ++i) {
19766 .. code-block:: llvm
19768 ; Load elements from A.
19769 %Tmp = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* %Aptr, i32 8, <8 x i1> %Mask, <8 x double> undef)
19770 ; Store all selected elements consecutively in array B
call void @llvm.masked.compressstore.v8f64(<8 x double> %Tmp, double* %Bptr, <8 x i1> %Mask)
19773 ; %Bptr should be increased on each iteration according to the number of '1' elements in the Mask.
19774 %MaskI = bitcast <8 x i1> %Mask to i8
19775 %MaskIPopcnt = call i8 @llvm.ctpop.i8(i8 %MaskI)
19776 %MaskI64 = zext i8 %MaskIPopcnt to i64
19777 %BNextInd = add i64 %BInd, %MaskI64
19780 Other targets may support this intrinsic differently, for example, by lowering it into a sequence of branches that guard scalar store operations.
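The compressing store, together with the pointer advance computed from the mask population count, can be sketched as a reference model. This is an illustrative Python sketch, not LLVM API: ``compressstore`` is a hypothetical name, and returning the number of stored elements is an assumption added for convenience (the intrinsic itself returns ``void``).

```python
def compressstore(memory, base, value, mask):
    """Hypothetical model: store enabled lanes of value contiguously at memory[base...]."""
    offset = 0
    for element, enabled in zip(value, mask):
        if enabled:
            memory[base + offset] = element
            offset += 1
    # Equals the population count of the mask; a caller advances its pointer by this amount.
    return offset


memory = [0] * 8
stored = compressstore(memory, 2, [5, 6, 7, 8], [False, True, True, False])
print(stored, memory)  # 2 [0, 0, 6, 7, 0, 0, 0, 0]
```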
19786 This class of intrinsics provides information about the
19787 :ref:`lifetime of memory objects <objectlifetime>` and ranges where variables
19792 '``llvm.lifetime.start``' Intrinsic
19793 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19800 declare void @llvm.lifetime.start(i64 <size>, i8* nocapture <ptr>)
19805 The '``llvm.lifetime.start``' intrinsic specifies the start of a memory
19811 The first argument is a constant integer representing the size of the
19812 object, or -1 if it is variable sized. The second argument is a pointer
19818 If ``ptr`` is a stack-allocated object and it points to the first byte of
19819 the object, the object is initially marked as dead.
19820 ``ptr`` is conservatively considered as a non-stack-allocated object if
19821 the stack coloring algorithm that is used in the optimization pipeline cannot
19822 conclude that ``ptr`` is a stack-allocated object.
After '``llvm.lifetime.start``', the stack object that ``ptr`` points to is marked
19825 as alive and has an uninitialized value.
19826 The stack object is marked as dead when either
19827 :ref:`llvm.lifetime.end <int_lifeend>` to the alloca is executed or the
19830 After :ref:`llvm.lifetime.end <int_lifeend>` is called,
19831 '``llvm.lifetime.start``' on the stack object can be called again.
19832 The second '``llvm.lifetime.start``' call marks the object as alive, but it
19833 does not change the address of the object.
If ``ptr`` is a non-stack-allocated object, does not point to the first
byte of the object, or is a stack object that is already alive, the intrinsic
simply fills all bytes of the object with ``poison``.
19842 '``llvm.lifetime.end``' Intrinsic
19843 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19850 declare void @llvm.lifetime.end(i64 <size>, i8* nocapture <ptr>)
19855 The '``llvm.lifetime.end``' intrinsic specifies the end of a memory object's
19861 The first argument is a constant integer representing the size of the
19862 object, or -1 if it is variable sized. The second argument is a pointer
19868 If ``ptr`` is a stack-allocated object and it points to the first byte of the
19869 object, the object is dead.
19870 ``ptr`` is conservatively considered as a non-stack-allocated object if
19871 the stack coloring algorithm that is used in the optimization pipeline cannot
19872 conclude that ``ptr`` is a stack-allocated object.
Calling ``llvm.lifetime.end`` on an already dead alloca is a no-op.
19876 If ``ptr`` is a non-stack-allocated object or it does not point to the first
19877 byte of the object, it is equivalent to simply filling all bytes of the object
19881 '``llvm.invariant.start``' Intrinsic
19882 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19886 This is an overloaded intrinsic. The memory object can belong to any address space.
19890 declare {}* @llvm.invariant.start.p0i8(i64 <size>, i8* nocapture <ptr>)
19895 The '``llvm.invariant.start``' intrinsic specifies that the contents of
19896 a memory object will not change.
19901 The first argument is a constant integer representing the size of the
19902 object, or -1 if it is variable sized. The second argument is a pointer
19908 This intrinsic indicates that until an ``llvm.invariant.end`` that uses
19909 the return value, the referenced memory location is constant and
19912 '``llvm.invariant.end``' Intrinsic
19913 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19917 This is an overloaded intrinsic. The memory object can belong to any address space.
19921 declare void @llvm.invariant.end.p0i8({}* <start>, i64 <size>, i8* nocapture <ptr>)
19926 The '``llvm.invariant.end``' intrinsic specifies that the contents of a
19927 memory object are mutable.
19932 The first argument is the matching ``llvm.invariant.start`` intrinsic.
19933 The second argument is a constant integer representing the size of the
object, or -1 if it is variable sized, and the third argument is a
19935 pointer to the object.
19940 This intrinsic indicates that the memory is mutable again.
19942 '``llvm.launder.invariant.group``' Intrinsic
19943 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19947 This is an overloaded intrinsic. The memory object can belong to any address
19948 space. The returned pointer must belong to the same address space as the
19953 declare i8* @llvm.launder.invariant.group.p0i8(i8* <ptr>)
19958 The '``llvm.launder.invariant.group``' intrinsic can be used when an invariant
19959 established by ``invariant.group`` metadata no longer holds, to obtain a new
19960 pointer value that carries fresh invariant group information. It is an
19961 experimental intrinsic, which means that its semantics might change in the
19968 The ``llvm.launder.invariant.group`` takes only one argument, which is a pointer
19974 Returns another pointer that aliases its argument but which is considered different
19975 for the purposes of ``load``/``store`` ``invariant.group`` metadata.
19976 It does not read any accessible memory and the execution can be speculated.
19978 '``llvm.strip.invariant.group``' Intrinsic
19979 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19983 This is an overloaded intrinsic. The memory object can belong to any address
19984 space. The returned pointer must belong to the same address space as the
19989 declare i8* @llvm.strip.invariant.group.p0i8(i8* <ptr>)
19994 The '``llvm.strip.invariant.group``' intrinsic can be used when an invariant
19995 established by ``invariant.group`` metadata no longer holds, to obtain a new pointer
19996 value that does not carry the invariant information. It is an experimental
19997 intrinsic, which means that its semantics might change in the future.
20003 The ``llvm.strip.invariant.group`` takes only one argument, which is a pointer
20009 Returns another pointer that aliases its argument but which has no associated
20010 ``invariant.group`` metadata.
20011 It does not read any memory and can be speculated.
20017 Constrained Floating-Point Intrinsics
20018 -------------------------------------
20020 These intrinsics are used to provide special handling of floating-point
20021 operations when specific rounding mode or floating-point exception behavior is
20022 required. By default, LLVM optimization passes assume that the rounding mode is
20023 round-to-nearest and that floating-point exceptions will not be monitored.
20024 Constrained FP intrinsics are used to support non-default rounding modes and
20025 accurately preserve exception behavior without compromising LLVM's ability to
20026 optimize FP code when the default behavior is used.
20028 If any FP operation in a function is constrained then they all must be
constrained. This is required for correct LLVM IR. Optimizations that
move code around can create miscompiles if constrained and normal
operations are mixed. The correct way to mix constrained and less constrained
20032 operations is to use the rounding mode and exception handling metadata to
20033 mark constrained intrinsics as having LLVM's default behavior.
20035 Each of these intrinsics corresponds to a normal floating-point operation. The
20036 data arguments and the return value are the same as the corresponding FP
20039 The rounding mode argument is a metadata string specifying what
20040 assumptions, if any, the optimizer can make when transforming constant
20041 values. Some constrained FP intrinsics omit this argument. If required
20042 by the intrinsic, this argument must be one of the following strings:
20051 "round.tonearestaway"
If this argument is "round.dynamic", optimization passes must assume that the
20054 rounding mode is unknown and may change at runtime. No transformations that
20055 depend on rounding mode may be performed in this case.
20057 The other possible values for the rounding mode argument correspond to the
20058 similarly named IEEE rounding modes. If the argument is any of these values
20059 optimization passes may perform transformations as long as they are consistent
20060 with the specified rounding mode.
20062 For example, 'x-0'->'x' is not a valid transformation if the rounding mode is
20063 "round.downward" or "round.dynamic" because if the value of 'x' is +0 then
20064 'x-0' should evaluate to '-0' when rounding downward. However, this
20065 transformation is legal for all other rounding modes.
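The signed-zero effect behind this example can be observed with Python's ``decimal`` module, which also produces a negative zero for ``(+0) - (+0)`` when rounding toward negative infinity. This is only an analogy: ``decimal`` implements decimal rather than binary floating point and has no connection to LLVM, and the helper name ``x_minus_zero`` is hypothetical.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_EVEN, localcontext


def x_minus_zero(x, rounding):
    """Evaluate x - 0 under the given decimal rounding mode."""
    with localcontext() as ctx:
        ctx.rounding = rounding
        return x - Decimal("0")


# Rounding downward: +0 - 0 yields -0, so folding 'x-0' to 'x' would change the result.
print(x_minus_zero(Decimal("0"), ROUND_FLOOR))      # -0
# Rounding to nearest: +0 - 0 yields +0, so the fold is safe.
print(x_minus_zero(Decimal("0"), ROUND_HALF_EVEN))  # 0
```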
20067 For values other than "round.dynamic" optimization passes may assume that the
20068 actual runtime rounding mode (as defined in a target-specific manner) matches
20069 the specified rounding mode, but this is not guaranteed. Using a specific
20070 non-dynamic rounding mode which does not match the actual rounding mode at
20071 runtime results in undefined behavior.
The exception behavior argument is a metadata string describing the floating
point exception semantics that are required for the intrinsic. This argument
20075 must be one of the following strings:
If this argument is "fpexcept.ignore", optimization passes may assume that the
20084 exception status flags will not be read and that floating-point exceptions will
20085 be masked. This allows transformations to be performed that may change the
20086 exception semantics of the original code. For example, FP operations may be
20087 speculatively executed in this case whereas they must not be for either of the
20088 other possible values of this argument.
20090 If the exception behavior argument is "fpexcept.maytrap" optimization passes
20091 must avoid transformations that may raise exceptions that would not have been
20092 raised by the original code (such as speculatively executing FP operations), but
20093 passes are not required to preserve all exceptions that are implied by the
20094 original code. For example, exceptions may be potentially hidden by constant
20097 If the exception behavior argument is "fpexcept.strict" all transformations must
20098 strictly preserve the floating-point exception semantics of the original code.
20099 Any FP exception that would have been raised by the original code must be raised
20100 by the transformed code, and the transformed code must not raise any FP
20101 exceptions that would not have been raised by the original code. This is the
20102 exception behavior argument that will be used if the code being compiled reads
20103 the FP exception status flags, but this mode can also be used with code that
20104 unmasks FP exceptions.
20106 The number and order of floating-point exceptions is NOT guaranteed. For
20107 example, a series of FP operations that each may raise exceptions may be
20108 vectorized into a single instruction that raises each unique exception a single
20111 Proper :ref:`function attributes <fnattrs>` usage is required for the
20112 constrained intrinsics to function correctly.
20114 All function *calls* done in a function that uses constrained floating
20115 point intrinsics must have the ``strictfp`` attribute.
20117 All function *definitions* that use constrained floating point intrinsics
20118 must have the ``strictfp`` attribute.
20120 '``llvm.experimental.constrained.fadd``' Intrinsic
20121 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20129 @llvm.experimental.constrained.fadd(<type> <op1>, <type> <op2>,
20130 metadata <rounding mode>,
20131 metadata <exception behavior>)
The '``llvm.experimental.constrained.fadd``' intrinsic returns the sum of its
two operands.
20143 The first two arguments to the '``llvm.experimental.constrained.fadd``'
20144 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
20145 of floating-point values. Both arguments must have identical types.
20147 The third and fourth arguments specify the rounding mode and exception
20148 behavior as described above.
20153 The value produced is the floating-point sum of the two value operands and has
20154 the same type as the operands.
20157 '``llvm.experimental.constrained.fsub``' Intrinsic
20158 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20166 @llvm.experimental.constrained.fsub(<type> <op1>, <type> <op2>,
20167 metadata <rounding mode>,
20168 metadata <exception behavior>)
20173 The '``llvm.experimental.constrained.fsub``' intrinsic returns the difference
20174 of its two operands.
20180 The first two arguments to the '``llvm.experimental.constrained.fsub``'
20181 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
20182 of floating-point values. Both arguments must have identical types.
20184 The third and fourth arguments specify the rounding mode and exception
20185 behavior as described above.
20190 The value produced is the floating-point difference of the two value operands
20191 and has the same type as the operands.
20194 '``llvm.experimental.constrained.fmul``' Intrinsic
20195 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20203 @llvm.experimental.constrained.fmul(<type> <op1>, <type> <op2>,
20204 metadata <rounding mode>,
20205 metadata <exception behavior>)
The '``llvm.experimental.constrained.fmul``' intrinsic returns the product of
its two operands.
20217 The first two arguments to the '``llvm.experimental.constrained.fmul``'
20218 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
20219 of floating-point values. Both arguments must have identical types.
20221 The third and fourth arguments specify the rounding mode and exception
20222 behavior as described above.
20227 The value produced is the floating-point product of the two value operands and
20228 has the same type as the operands.
20231 '``llvm.experimental.constrained.fdiv``' Intrinsic
20232 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20240 @llvm.experimental.constrained.fdiv(<type> <op1>, <type> <op2>,
20241 metadata <rounding mode>,
20242 metadata <exception behavior>)
The '``llvm.experimental.constrained.fdiv``' intrinsic returns the quotient of
its two operands.
20254 The first two arguments to the '``llvm.experimental.constrained.fdiv``'
20255 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
20256 of floating-point values. Both arguments must have identical types.
20258 The third and fourth arguments specify the rounding mode and exception
20259 behavior as described above.
20264 The value produced is the floating-point quotient of the two value operands and
20265 has the same type as the operands.
20268 '``llvm.experimental.constrained.frem``' Intrinsic
20269 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20277 @llvm.experimental.constrained.frem(<type> <op1>, <type> <op2>,
20278 metadata <rounding mode>,
20279 metadata <exception behavior>)
20284 The '``llvm.experimental.constrained.frem``' intrinsic returns the remainder
20285 from the division of its two operands.
20291 The first two arguments to the '``llvm.experimental.constrained.frem``'
20292 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
20293 of floating-point values. Both arguments must have identical types.
20295 The third and fourth arguments specify the rounding mode and exception
20296 behavior as described above. The rounding mode argument has no effect, since
20297 the result of frem is never rounded, but the argument is included for
20298 consistency with the other constrained floating-point intrinsics.
20303 The value produced is the floating-point remainder from the division of the two
20304 value operands and has the same type as the operands. The remainder has the
20305 same sign as the dividend.
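The sign convention can be checked against Python's ``math.fmod``, which follows the same rule (the result takes the sign of the dividend) and computes the remainder exactly. This is an illustrative sketch; ``frem_model`` is a hypothetical name and the code does not model exception behavior.

```python
import math


def frem_model(dividend, divisor):
    """Remainder with the sign of the dividend, as described for frem."""
    return math.fmod(dividend, divisor)


print(frem_model(-7.0, 3.0))  # -1.0
print(frem_model(7.0, -3.0))  # 1.0
# Python's % operator uses the sign of the divisor instead, so it differs:
print(-7.0 % 3.0)             # 2.0
```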
20307 '``llvm.experimental.constrained.fma``' Intrinsic
20308 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20316 @llvm.experimental.constrained.fma(<type> <op1>, <type> <op2>, <type> <op3>,
20317 metadata <rounding mode>,
20318 metadata <exception behavior>)
20323 The '``llvm.experimental.constrained.fma``' intrinsic returns the result of a
20324 fused-multiply-add operation on its operands.
20329 The first three arguments to the '``llvm.experimental.constrained.fma``'
20330 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector
20331 <t_vector>` of floating-point values. All arguments must have identical types.
20333 The fourth and fifth arguments specify the rounding mode and exception behavior
20334 as described above.
The result produced is the product of the first two operands added to the third
operand computed with infinite precision, and then rounded to the target
precision.
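The difference between a single rounding and two separate roundings can be demonstrated with exact rational arithmetic. This is a Python sketch under stated assumptions: ``fma_model`` is a hypothetical helper, it models only double precision with round-to-nearest, and it ignores exception behavior.

```python
from fractions import Fraction


def fma_model(a, b, c):
    """Compute a*b + c exactly, then round once to the nearest double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the unfused expression loses the small term entirely.
print(a * b + c)           # 0.0
print(fma_model(a, b, c))  # -8.673617379884035e-19, i.e. -2**-60
```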
20343 '``llvm.experimental.constrained.fptoui``' Intrinsic
20344 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20352 @llvm.experimental.constrained.fptoui(<type> <value>,
20353 metadata <exception behavior>)
20358 The '``llvm.experimental.constrained.fptoui``' intrinsic converts a
20359 floating-point ``value`` to its unsigned integer equivalent of type ``ty2``.
20364 The first argument to the '``llvm.experimental.constrained.fptoui``'
20365 intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
20366 <t_vector>` of floating point values.
20368 The second argument specifies the exception behavior as described above.
20373 The result produced is an unsigned integer converted from the floating
20374 point operand. The value is truncated, so it is rounded towards zero.
20376 '``llvm.experimental.constrained.fptosi``' Intrinsic
20377 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20385 @llvm.experimental.constrained.fptosi(<type> <value>,
20386 metadata <exception behavior>)
20391 The '``llvm.experimental.constrained.fptosi``' intrinsic converts
20392 :ref:`floating-point <t_floating>` ``value`` to type ``ty2``.
20397 The first argument to the '``llvm.experimental.constrained.fptosi``'
20398 intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
20399 <t_vector>` of floating point values.
20401 The second argument specifies the exception behavior as described above.
20406 The result produced is a signed integer converted from the floating
20407 point operand. The value is truncated, so it is rounded towards zero.
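Rounding toward zero differs from rounding toward negative infinity only for negative inputs, as the following sketch shows. It uses Python's ``math.trunc`` as a stand-in; ``fptosi_model`` is a hypothetical name, and values that do not fit the destination integer type are not modeled because Python integers are unbounded.

```python
import math


def fptosi_model(value):
    """Convert a float to an integer by discarding the fractional part."""
    return math.trunc(value)


print(fptosi_model(2.7))   # 2
print(fptosi_model(-2.7))  # -2 (toward zero)
print(math.floor(-2.7))    # -3 (toward negative infinity, for comparison)
```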
20409 '``llvm.experimental.constrained.uitofp``' Intrinsic
20410 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20418 @llvm.experimental.constrained.uitofp(<type> <value>,
20419 metadata <rounding mode>,
20420 metadata <exception behavior>)
20425 The '``llvm.experimental.constrained.uitofp``' intrinsic converts an
20426 unsigned integer ``value`` to a floating-point of type ``ty2``.
20431 The first argument to the '``llvm.experimental.constrained.uitofp``'
20432 intrinsic must be an :ref:`integer <t_integer>` or :ref:`vector
20433 <t_vector>` of integer values.
20435 The second and third arguments specify the rounding mode and exception
20436 behavior as described above.
20441 An inexact floating-point exception will be raised if rounding is required.
Any result produced is a floating point value converted from the input
integer operand.
20445 '``llvm.experimental.constrained.sitofp``' Intrinsic
20446 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20454 @llvm.experimental.constrained.sitofp(<type> <value>,
20455 metadata <rounding mode>,
20456 metadata <exception behavior>)
20461 The '``llvm.experimental.constrained.sitofp``' intrinsic converts a
signed integer ``value`` to a floating-point value of type ``ty2``.
20467 The first argument to the '``llvm.experimental.constrained.sitofp``'
20468 intrinsic must be an :ref:`integer <t_integer>` or :ref:`vector
20469 <t_vector>` of integer values.
20471 The second and third arguments specify the rounding mode and exception
20472 behavior as described above.
20477 An inexact floating-point exception will be raised if rounding is required.
Any result produced is a floating point value converted from the input
integer operand.
20481 '``llvm.experimental.constrained.fptrunc``' Intrinsic
20482 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20490 @llvm.experimental.constrained.fptrunc(<type> <value>,
20491 metadata <rounding mode>,
20492 metadata <exception behavior>)
The '``llvm.experimental.constrained.fptrunc``' intrinsic truncates ``value``
to type ``ty2``.
20503 The first argument to the '``llvm.experimental.constrained.fptrunc``'
20504 intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
<t_vector>` of floating point values. This argument must be larger in size
than the result.
20508 The second and third arguments specify the rounding mode and exception
20509 behavior as described above.
The result produced is a floating point value truncated to be smaller in size
than the operand.
20517 '``llvm.experimental.constrained.fpext``' Intrinsic
20518 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20526 @llvm.experimental.constrained.fpext(<type> <value>,
20527 metadata <exception behavior>)
20532 The '``llvm.experimental.constrained.fpext``' intrinsic extends a
20533 floating-point ``value`` to a larger floating-point value.
20538 The first argument to the '``llvm.experimental.constrained.fpext``'
20539 intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
<t_vector>` of floating point values. This argument must be smaller in size
than the result.
20543 The second argument specifies the exception behavior as described above.
20548 The result produced is a floating point value extended to be larger in size
20549 than the operand. All restrictions that apply to the fpext instruction also
20550 apply to this intrinsic.
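A minimal sketch of a ``float`` to ``double`` extension follows. The function
name ``@widen`` and the concrete types are assumptions for illustration. Note
that there is no rounding mode argument, only the exception behavior.

.. code-block:: llvm

    declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)

    ; Hypothetical helper: extend a float to a double.
    define double @widen(float %x) #0 {
      %r = call double @llvm.experimental.constrained.fpext.f64.f32(
                         float %x,
                         metadata !"fpexcept.strict") #0
      ret double %r
    }

    attributes #0 = { strictfp }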
20552 '``llvm.experimental.constrained.fcmp``' and '``llvm.experimental.constrained.fcmps``' Intrinsics
20553 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20561 @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
20562 metadata <condition code>,
20563 metadata <exception behavior>)
20565 @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
20566 metadata <condition code>,
20567 metadata <exception behavior>)
20572 The '``llvm.experimental.constrained.fcmp``' and
20573 '``llvm.experimental.constrained.fcmps``' intrinsics return a boolean
value or vector of boolean values based on the comparison of their operands.
20576 If the operands are floating-point scalars, then the result type is a
20577 boolean (:ref:`i1 <t_integer>`).
20579 If the operands are floating-point vectors, then the result type is a
vector of boolean with the same number of elements as the operands being
compared.
20583 The '``llvm.experimental.constrained.fcmp``' intrinsic performs a quiet
20584 comparison operation while the '``llvm.experimental.constrained.fcmps``'
20585 intrinsic performs a signaling comparison operation.
20590 The first two arguments to the '``llvm.experimental.constrained.fcmp``'
20591 and '``llvm.experimental.constrained.fcmps``' intrinsics must be
20592 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
20593 of floating-point values. Both arguments must have identical types.
20595 The third argument is the condition code indicating the kind of comparison
20596 to perform. It must be a metadata string with one of the following values:
20598 - "``oeq``": ordered and equal
20599 - "``ogt``": ordered and greater than
20600 - "``oge``": ordered and greater than or equal
20601 - "``olt``": ordered and less than
20602 - "``ole``": ordered and less than or equal
20603 - "``one``": ordered and not equal
20604 - "``ord``": ordered (no nans)
20605 - "``ueq``": unordered or equal
20606 - "``ugt``": unordered or greater than
20607 - "``uge``": unordered or greater than or equal
20608 - "``ult``": unordered or less than
20609 - "``ule``": unordered or less than or equal
20610 - "``une``": unordered or not equal
20611 - "``uno``": unordered (either nans)
20613 *Ordered* means that neither operand is a NAN while *unordered* means
20614 that either operand may be a NAN.
20616 The fourth argument specifies the exception behavior as described above.
20621 ``op1`` and ``op2`` are compared according to the condition code given
20622 as the third argument. If the operands are vectors, then the
20623 vectors are compared element by element. Each comparison performed
20624 always yields an :ref:`i1 <t_integer>` result, as follows:
20626 - "``oeq``": yields ``true`` if both operands are not a NAN and ``op1``
20627 is equal to ``op2``.
20628 - "``ogt``": yields ``true`` if both operands are not a NAN and ``op1``
20629 is greater than ``op2``.
20630 - "``oge``": yields ``true`` if both operands are not a NAN and ``op1``
20631 is greater than or equal to ``op2``.
20632 - "``olt``": yields ``true`` if both operands are not a NAN and ``op1``
20633 is less than ``op2``.
20634 - "``ole``": yields ``true`` if both operands are not a NAN and ``op1``
20635 is less than or equal to ``op2``.
20636 - "``one``": yields ``true`` if both operands are not a NAN and ``op1``
20637 is not equal to ``op2``.
20638 - "``ord``": yields ``true`` if both operands are not a NAN.
- "``ueq``": yields ``true`` if either operand is a NAN or ``op1`` is
  equal to ``op2``.
20641 - "``ugt``": yields ``true`` if either operand is a NAN or ``op1`` is
20642 greater than ``op2``.
20643 - "``uge``": yields ``true`` if either operand is a NAN or ``op1`` is
20644 greater than or equal to ``op2``.
- "``ult``": yields ``true`` if either operand is a NAN or ``op1`` is
  less than ``op2``.
20647 - "``ule``": yields ``true`` if either operand is a NAN or ``op1`` is
20648 less than or equal to ``op2``.
20649 - "``une``": yields ``true`` if either operand is a NAN or ``op1`` is
20650 not equal to ``op2``.
20651 - "``uno``": yields ``true`` if either operand is a NAN.
20653 The quiet comparison operation performed by
20654 '``llvm.experimental.constrained.fcmp``' will only raise an exception
20655 if either operand is a SNAN. The signaling comparison operation
20656 performed by '``llvm.experimental.constrained.fcmps``' will raise an
20657 exception if either operand is a NAN (QNAN or SNAN). Such an exception
does not preclude a result being produced (e.g. the exception might only
20659 set a flag), therefore the distinction between ordered and unordered
20660 comparisons is also relevant for the
20661 '``llvm.experimental.constrained.fcmps``' intrinsic.
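The difference between the quiet and signaling forms can be sketched as
follows. The function names ``@less_quiet`` and ``@less_signaling`` and the
``double`` operand type are assumptions for this example. Both calls use the
same ``olt`` condition code and therefore yield the same ``i1`` result; they
differ only in which NAN operands raise an exception.

.. code-block:: llvm

    declare i1 @llvm.experimental.constrained.fcmp.f64(double, double, metadata, metadata)
    declare i1 @llvm.experimental.constrained.fcmps.f64(double, double, metadata, metadata)

    ; Quiet comparison: raises an exception only if an operand is a SNAN.
    define i1 @less_quiet(double %a, double %b) #0 {
      %c = call i1 @llvm.experimental.constrained.fcmp.f64(
                     double %a, double %b,
                     metadata !"olt",
                     metadata !"fpexcept.strict") #0
      ret i1 %c
    }

    ; Signaling comparison: raises an exception if an operand is any NAN.
    define i1 @less_signaling(double %a, double %b) #0 {
      %c = call i1 @llvm.experimental.constrained.fcmps.f64(
                     double %a, double %b,
                     metadata !"olt",
                     metadata !"fpexcept.strict") #0
      ret i1 %c
    }

    attributes #0 = { strictfp }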
20663 '``llvm.experimental.constrained.fmuladd``' Intrinsic
20664 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20672 @llvm.experimental.constrained.fmuladd(<type> <op1>, <type> <op2>,
20674 metadata <rounding mode>,
20675 metadata <exception behavior>)
20680 The '``llvm.experimental.constrained.fmuladd``' intrinsic represents
20681 multiply-add expressions that can be fused if the code generator determines
20682 that (a) the target instruction set has support for a fused operation,
20683 and (b) that the fused operation is more efficient than the equivalent,
20684 separate pair of mul and add instructions.
20689 The first three arguments to the '``llvm.experimental.constrained.fmuladd``'
20690 intrinsic must be floating-point or vector of floating-point values.
20691 All three arguments must have identical types.
20693 The fourth and fifth arguments specify the rounding mode and exception behavior
20694 as described above.
20703 %0 = call float @llvm.experimental.constrained.fmuladd.f32(%a, %b, %c,
20704 metadata <rounding mode>,
20705 metadata <exception behavior>)
20707 is equivalent to the expression:
20711 %0 = call float @llvm.experimental.constrained.fmul.f32(%a, %b,
20712 metadata <rounding mode>,
20713 metadata <exception behavior>)
20714 %1 = call float @llvm.experimental.constrained.fadd.f32(%0, %c,
20715 metadata <rounding mode>,
20716 metadata <exception behavior>)
20718 except that it is unspecified whether rounding will be performed between the
20719 multiplication and addition steps. Fusion is not guaranteed, even if the target
20720 platform supports it.
If a fused multiply-add is required, the corresponding
:ref:`llvm.experimental.constrained.fma <int_fma>` intrinsic function should be
used instead.
Like '``llvm.experimental.constrained.fma.*``', this intrinsic never sets ``errno``.
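For concreteness, a fully typed call might look like the sketch below. The
function name ``@mul_add`` and the choice of ``round.dynamic`` and
``fpexcept.strict`` are assumptions for this example.

.. code-block:: llvm

    declare float @llvm.experimental.constrained.fmuladd.f32(float, float, float, metadata, metadata)

    ; Hypothetical helper computing a*b + c. Whether the multiply and add are
    ; fused is left to the code generator.
    define float @mul_add(float %a, float %b, float %c) #0 {
      %r = call float @llvm.experimental.constrained.fmuladd.f32(
                        float %a, float %b, float %c,
                        metadata !"round.dynamic",
                        metadata !"fpexcept.strict") #0
      ret float %r
    }

    attributes #0 = { strictfp }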
20726 Constrained libm-equivalent Intrinsics
20727 --------------------------------------
20729 In addition to the basic floating-point operations for which constrained
20730 intrinsics are described above, there are constrained versions of various
20731 operations which provide equivalent behavior to a corresponding libm function.
20732 These intrinsics allow the precise behavior of these operations with respect to
20733 rounding mode and exception behavior to be controlled.
20735 As with the basic constrained floating-point intrinsics, the rounding mode
20736 and exception behavior arguments only control the behavior of the optimizer.
20737 They do not change the runtime floating-point environment.
20740 '``llvm.experimental.constrained.sqrt``' Intrinsic
20741 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20749 @llvm.experimental.constrained.sqrt(<type> <op1>,
20750 metadata <rounding mode>,
20751 metadata <exception behavior>)
20756 The '``llvm.experimental.constrained.sqrt``' intrinsic returns the square root
20757 of the specified value, returning the same value as the libm '``sqrt``'
20758 functions would, but without setting ``errno``.
The first argument and the return type are floating-point numbers of the same
type.
20766 The second and third arguments specify the rounding mode and exception
20767 behavior as described above.
20772 This function returns the nonnegative square root of the specified value.
20773 If the value is less than negative zero, a floating-point exception occurs
20774 and the return value is architecture specific.
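A minimal sketch of a call on a ``double`` operand is shown below. The function
name ``@root`` and the metadata values chosen are assumptions for illustration.

.. code-block:: llvm

    declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)

    ; Hypothetical helper: square root with an unknown (dynamic) rounding mode
    ; and strict exception semantics.
    define double @root(double %x) #0 {
      %r = call double @llvm.experimental.constrained.sqrt.f64(
                         double %x,
                         metadata !"round.dynamic",
                         metadata !"fpexcept.strict") #0
      ret double %r
    }

    attributes #0 = { strictfp }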
20777 '``llvm.experimental.constrained.pow``' Intrinsic
20778 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20786 @llvm.experimental.constrained.pow(<type> <op1>, <type> <op2>,
20787 metadata <rounding mode>,
20788 metadata <exception behavior>)
20793 The '``llvm.experimental.constrained.pow``' intrinsic returns the first operand
20794 raised to the (positive or negative) power specified by the second operand.
20799 The first two arguments and the return value are floating-point numbers of the
same type. The second argument specifies the power to which the first argument
should be raised.
20803 The third and fourth arguments specify the rounding mode and exception
20804 behavior as described above.
20809 This function returns the first value raised to the second power,
20810 returning the same values as the libm ``pow`` functions would, and
20811 handles error conditions in the same way.
20814 '``llvm.experimental.constrained.powi``' Intrinsic
20815 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20823 @llvm.experimental.constrained.powi(<type> <op1>, i32 <op2>,
20824 metadata <rounding mode>,
20825 metadata <exception behavior>)
20830 The '``llvm.experimental.constrained.powi``' intrinsic returns the first operand
20831 raised to the (positive or negative) power specified by the second operand. The
20832 order of evaluation of multiplications is not defined. When a vector of
20833 floating-point type is used, the second argument remains a scalar integer value.
20839 The first argument and the return value are floating-point numbers of the same
20840 type. The second argument is a 32-bit signed integer specifying the power to
20841 which the first argument should be raised.
20843 The third and fourth arguments specify the rounding mode and exception
20844 behavior as described above.
20849 This function returns the first value raised to the second power with an
20850 unspecified sequence of rounding operations.
20853 '``llvm.experimental.constrained.sin``' Intrinsic
20854 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20862 @llvm.experimental.constrained.sin(<type> <op1>,
20863 metadata <rounding mode>,
20864 metadata <exception behavior>)
The '``llvm.experimental.constrained.sin``' intrinsic returns the sine of the
first operand.
The first argument and the return type are floating-point numbers of the same
type.
20878 The second and third arguments specify the rounding mode and exception
20879 behavior as described above.
20884 This function returns the sine of the specified operand, returning the
20885 same values as the libm ``sin`` functions would, and handles error
20886 conditions in the same way.
20889 '``llvm.experimental.constrained.cos``' Intrinsic
20890 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20898 @llvm.experimental.constrained.cos(<type> <op1>,
20899 metadata <rounding mode>,
20900 metadata <exception behavior>)
The '``llvm.experimental.constrained.cos``' intrinsic returns the cosine of the
first operand.
The first argument and the return type are floating-point numbers of the same
type.
20914 The second and third arguments specify the rounding mode and exception
20915 behavior as described above.
20920 This function returns the cosine of the specified operand, returning the
20921 same values as the libm ``cos`` functions would, and handles error
20922 conditions in the same way.
20925 '``llvm.experimental.constrained.exp``' Intrinsic
20926 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20934 @llvm.experimental.constrained.exp(<type> <op1>,
20935 metadata <rounding mode>,
20936 metadata <exception behavior>)
20941 The '``llvm.experimental.constrained.exp``' intrinsic computes the base-e
20942 exponential of the specified value.
The first argument and the return value are floating-point numbers of the same
type.
20950 The second and third arguments specify the rounding mode and exception
20951 behavior as described above.
20956 This function returns the same values as the libm ``exp`` functions
20957 would, and handles error conditions in the same way.
20960 '``llvm.experimental.constrained.exp2``' Intrinsic
20961 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20969 @llvm.experimental.constrained.exp2(<type> <op1>,
20970 metadata <rounding mode>,
20971 metadata <exception behavior>)
20976 The '``llvm.experimental.constrained.exp2``' intrinsic computes the base-2
20977 exponential of the specified value.
The first argument and the return value are floating-point numbers of the same
type.
20986 The second and third arguments specify the rounding mode and exception
20987 behavior as described above.
20992 This function returns the same values as the libm ``exp2`` functions
20993 would, and handles error conditions in the same way.
20996 '``llvm.experimental.constrained.log``' Intrinsic
20997 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21005 @llvm.experimental.constrained.log(<type> <op1>,
21006 metadata <rounding mode>,
21007 metadata <exception behavior>)
21012 The '``llvm.experimental.constrained.log``' intrinsic computes the base-e
21013 logarithm of the specified value.
The first argument and the return value are floating-point numbers of the same
type.
21021 The second and third arguments specify the rounding mode and exception
21022 behavior as described above.
21028 This function returns the same values as the libm ``log`` functions
21029 would, and handles error conditions in the same way.
21032 '``llvm.experimental.constrained.log10``' Intrinsic
21033 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21041 @llvm.experimental.constrained.log10(<type> <op1>,
21042 metadata <rounding mode>,
21043 metadata <exception behavior>)
21048 The '``llvm.experimental.constrained.log10``' intrinsic computes the base-10
21049 logarithm of the specified value.
The first argument and the return value are floating-point numbers of the same
type.
21057 The second and third arguments specify the rounding mode and exception
21058 behavior as described above.
21063 This function returns the same values as the libm ``log10`` functions
21064 would, and handles error conditions in the same way.
21067 '``llvm.experimental.constrained.log2``' Intrinsic
21068 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21076 @llvm.experimental.constrained.log2(<type> <op1>,
21077 metadata <rounding mode>,
21078 metadata <exception behavior>)
21083 The '``llvm.experimental.constrained.log2``' intrinsic computes the base-2
21084 logarithm of the specified value.
The first argument and the return value are floating-point numbers of the same
type.
21092 The second and third arguments specify the rounding mode and exception
21093 behavior as described above.
21098 This function returns the same values as the libm ``log2`` functions
21099 would, and handles error conditions in the same way.
21102 '``llvm.experimental.constrained.rint``' Intrinsic
21103 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21111 @llvm.experimental.constrained.rint(<type> <op1>,
21112 metadata <rounding mode>,
21113 metadata <exception behavior>)
21118 The '``llvm.experimental.constrained.rint``' intrinsic returns the first
21119 operand rounded to the nearest integer. It may raise an inexact floating-point
21120 exception if the operand is not an integer.
The first argument and the return value are floating-point numbers of the same
type.
21128 The second and third arguments specify the rounding mode and exception
21129 behavior as described above.
21134 This function returns the same values as the libm ``rint`` functions
21135 would, and handles error conditions in the same way. The rounding mode is
21136 described, not determined, by the rounding mode argument. The actual rounding
21137 mode is determined by the runtime floating-point environment. The rounding
21138 mode argument is only intended as information to the compiler.
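The sketch below illustrates this point. The function name ``@round_current``
is hypothetical. Passing ``round.dynamic`` tells the optimizer that the
rounding mode is not known at compile time; it does not set the mode.

.. code-block:: llvm

    declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)

    ; Hypothetical helper: round %x to an integral value using whatever
    ; rounding mode the runtime floating-point environment currently holds.
    define double @round_current(double %x) #0 {
      %r = call double @llvm.experimental.constrained.rint.f64(
                         double %x,
                         metadata !"round.dynamic",
                         metadata !"fpexcept.strict") #0
      ret double %r
    }

    attributes #0 = { strictfp }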
21141 '``llvm.experimental.constrained.lrint``' Intrinsic
21142 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21150 @llvm.experimental.constrained.lrint(<fptype> <op1>,
21151 metadata <rounding mode>,
21152 metadata <exception behavior>)
21157 The '``llvm.experimental.constrained.lrint``' intrinsic returns the first
21158 operand rounded to the nearest integer. An inexact floating-point exception
21159 will be raised if the operand is not an integer. An invalid exception is
21160 raised if the result is too large to fit into a supported integer type,
21161 and in this case the result is undefined.
21166 The first argument is a floating-point number. The return value is an
21167 integer type. Not all types are supported on all targets. The supported
types are the same as the ``llvm.lrint`` intrinsic and the ``lrint``
libm functions.
21171 The second and third arguments specify the rounding mode and exception
21172 behavior as described above.
21177 This function returns the same values as the libm ``lrint`` functions
21178 would, and handles error conditions in the same way.
21180 The rounding mode is described, not determined, by the rounding mode
21181 argument. The actual rounding mode is determined by the runtime floating-point
environment. The rounding mode argument is only intended as information
to the compiler.
21185 If the runtime floating-point environment is using the default rounding mode
then the results will be the same as those of the ``llvm.lrint`` intrinsic.
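A minimal sketch follows, assuming a ``double`` operand and an ``i64`` result
on a target that supports that combination. The function name ``@to_long`` is
hypothetical.

.. code-block:: llvm

    declare i64 @llvm.experimental.constrained.lrint.i64.f64(double, metadata, metadata)

    ; Hypothetical helper: round %x to an integer using the runtime rounding
    ; mode and return it as an i64.
    define i64 @to_long(double %x) #0 {
      %r = call i64 @llvm.experimental.constrained.lrint.i64.f64(
                      double %x,
                      metadata !"round.dynamic",
                      metadata !"fpexcept.strict") #0
      ret i64 %r
    }

    attributes #0 = { strictfp }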
21189 '``llvm.experimental.constrained.llrint``' Intrinsic
21190 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21198 @llvm.experimental.constrained.llrint(<fptype> <op1>,
21199 metadata <rounding mode>,
21200 metadata <exception behavior>)
21205 The '``llvm.experimental.constrained.llrint``' intrinsic returns the first
21206 operand rounded to the nearest integer. An inexact floating-point exception
21207 will be raised if the operand is not an integer. An invalid exception is
21208 raised if the result is too large to fit into a supported integer type,
21209 and in this case the result is undefined.
21214 The first argument is a floating-point number. The return value is an
21215 integer type. Not all types are supported on all targets. The supported
types are the same as the ``llvm.llrint`` intrinsic and the ``llrint``
libm functions.
21219 The second and third arguments specify the rounding mode and exception
21220 behavior as described above.
21225 This function returns the same values as the libm ``llrint`` functions
21226 would, and handles error conditions in the same way.
21228 The rounding mode is described, not determined, by the rounding mode
21229 argument. The actual rounding mode is determined by the runtime floating-point
environment. The rounding mode argument is only intended as information
to the compiler.
21233 If the runtime floating-point environment is using the default rounding mode
then the results will be the same as those of the ``llvm.llrint`` intrinsic.
21237 '``llvm.experimental.constrained.nearbyint``' Intrinsic
21238 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21246 @llvm.experimental.constrained.nearbyint(<type> <op1>,
21247 metadata <rounding mode>,
21248 metadata <exception behavior>)
21253 The '``llvm.experimental.constrained.nearbyint``' intrinsic returns the first
21254 operand rounded to the nearest integer. It will not raise an inexact
21255 floating-point exception if the operand is not an integer.
The first argument and the return value are floating-point numbers of the same
type.
21264 The second and third arguments specify the rounding mode and exception
21265 behavior as described above.
21270 This function returns the same values as the libm ``nearbyint`` functions
21271 would, and handles error conditions in the same way. The rounding mode is
21272 described, not determined, by the rounding mode argument. The actual rounding
21273 mode is determined by the runtime floating-point environment. The rounding
21274 mode argument is only intended as information to the compiler.
21277 '``llvm.experimental.constrained.maxnum``' Intrinsic
21278 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@llvm.experimental.constrained.maxnum(<type> <op1>, <type> <op2>,
21287 metadata <exception behavior>)
21292 The '``llvm.experimental.constrained.maxnum``' intrinsic returns the maximum
21293 of the two arguments.
The first two arguments and the return value are floating-point numbers
of the same type.
21301 The third argument specifies the exception behavior as described above.
21306 This function follows the IEEE-754 semantics for maxNum.
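As a minimal sketch, a call on ``float`` operands could be written as shown
below. The function name ``@larger`` is hypothetical. Unlike the arithmetic
intrinsics, ``maxnum`` takes no rounding mode argument.

.. code-block:: llvm

    declare float @llvm.experimental.constrained.maxnum.f32(float, float, metadata)

    ; Hypothetical helper: maximum of two floats with maxNum semantics.
    define float @larger(float %a, float %b) #0 {
      %r = call float @llvm.experimental.constrained.maxnum.f32(
                        float %a, float %b,
                        metadata !"fpexcept.strict") #0
      ret float %r
    }

    attributes #0 = { strictfp }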
21309 '``llvm.experimental.constrained.minnum``' Intrinsic
21310 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@llvm.experimental.constrained.minnum(<type> <op1>, <type> <op2>,
21319 metadata <exception behavior>)
21324 The '``llvm.experimental.constrained.minnum``' intrinsic returns the minimum
21325 of the two arguments.
The first two arguments and the return value are floating-point numbers
of the same type.
21333 The third argument specifies the exception behavior as described above.
21338 This function follows the IEEE-754 semantics for minNum.
21341 '``llvm.experimental.constrained.maximum``' Intrinsic
21342 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@llvm.experimental.constrained.maximum(<type> <op1>, <type> <op2>,
21351 metadata <exception behavior>)
21356 The '``llvm.experimental.constrained.maximum``' intrinsic returns the maximum
21357 of the two arguments, propagating NaNs and treating -0.0 as less than +0.0.
The first two arguments and the return value are floating-point numbers
of the same type.
21365 The third argument specifies the exception behavior as described above.
21370 This function follows semantics specified in the draft of IEEE 754-2018.
21373 '``llvm.experimental.constrained.minimum``' Intrinsic
21374 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@llvm.experimental.constrained.minimum(<type> <op1>, <type> <op2>,
21383 metadata <exception behavior>)
21388 The '``llvm.experimental.constrained.minimum``' intrinsic returns the minimum
21389 of the two arguments, propagating NaNs and treating -0.0 as less than +0.0.
The first two arguments and the return value are floating-point numbers
of the same type.
21397 The third argument specifies the exception behavior as described above.
21402 This function follows semantics specified in the draft of IEEE 754-2018.
21405 '``llvm.experimental.constrained.ceil``' Intrinsic
21406 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21414 @llvm.experimental.constrained.ceil(<type> <op1>,
21415 metadata <exception behavior>)
The '``llvm.experimental.constrained.ceil``' intrinsic returns the ceiling of the
first operand.
The first argument and the return value are floating-point numbers of the same
type.
21429 The second argument specifies the exception behavior as described above.
21434 This function returns the same values as the libm ``ceil`` functions
21435 would and handles error conditions in the same way.
21438 '``llvm.experimental.constrained.floor``' Intrinsic
21439 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21447 @llvm.experimental.constrained.floor(<type> <op1>,
21448 metadata <exception behavior>)
The '``llvm.experimental.constrained.floor``' intrinsic returns the floor of the
first operand.
The first argument and the return value are floating-point numbers of the same
type.
21462 The second argument specifies the exception behavior as described above.
21467 This function returns the same values as the libm ``floor`` functions
21468 would and handles error conditions in the same way.
21471 '``llvm.experimental.constrained.round``' Intrinsic
21472 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21480 @llvm.experimental.constrained.round(<type> <op1>,
21481 metadata <exception behavior>)
21486 The '``llvm.experimental.constrained.round``' intrinsic returns the first
21487 operand rounded to the nearest integer.
The first argument and the return value are floating-point numbers of the same
type.
21495 The second argument specifies the exception behavior as described above.
21500 This function returns the same values as the libm ``round`` functions
21501 would and handles error conditions in the same way.
21504 '``llvm.experimental.constrained.roundeven``' Intrinsic
21505 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21513 @llvm.experimental.constrained.roundeven(<type> <op1>,
21514 metadata <exception behavior>)
21519 The '``llvm.experimental.constrained.roundeven``' intrinsic returns the first
21520 operand rounded to the nearest integer in floating-point format, rounding
21521 halfway cases to even (that is, to the nearest value that is an even integer),
21522 regardless of the current rounding direction.
The first argument and the return value are floating-point numbers of the same
type.
21530 The second argument specifies the exception behavior as described above.
21535 This function implements IEEE-754 operation ``roundToIntegralTiesToEven``. It
21536 also behaves in the same way as C standard function ``roundeven`` and can signal
21537 the invalid operation exception for a SNAN operand.
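A minimal sketch of a call on a ``double`` operand is given below. The function
name ``@ties_to_even`` is hypothetical. Because the rounding direction is fixed
by the operation itself, only the exception behavior is passed.

.. code-block:: llvm

    declare double @llvm.experimental.constrained.roundeven.f64(double, metadata)

    ; Hypothetical helper: round to the nearest integral value, ties to even,
    ; independent of the current rounding direction.
    define double @ties_to_even(double %x) #0 {
      %r = call double @llvm.experimental.constrained.roundeven.f64(
                         double %x,
                         metadata !"fpexcept.strict") #0
      ret double %r
    }

    attributes #0 = { strictfp }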
21540 '``llvm.experimental.constrained.lround``' Intrinsic
21541 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21549 @llvm.experimental.constrained.lround(<fptype> <op1>,
21550 metadata <exception behavior>)
21555 The '``llvm.experimental.constrained.lround``' intrinsic returns the first
21556 operand rounded to the nearest integer with ties away from zero. It will
21557 raise an inexact floating-point exception if the operand is not an integer.
21558 An invalid exception is raised if the result is too large to fit into a
21559 supported integer type, and in this case the result is undefined.
21564 The first argument is a floating-point number. The return value is an
21565 integer type. Not all types are supported on all targets. The supported
types are the same as the ``llvm.lround`` intrinsic and the ``lround``
libm functions.
21569 The second argument specifies the exception behavior as described above.
21574 This function returns the same values as the libm ``lround`` functions
21575 would and handles error conditions in the same way.
21578 '``llvm.experimental.constrained.llround``' Intrinsic
21579 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21587 @llvm.experimental.constrained.llround(<fptype> <op1>,
21588 metadata <exception behavior>)
21593 The '``llvm.experimental.constrained.llround``' intrinsic returns the first
21594 operand rounded to the nearest integer with ties away from zero. It will
21595 raise an inexact floating-point exception if the operand is not an integer.
21596 An invalid exception is raised if the result is too large to fit into a
21597 supported integer type, and in this case the result is undefined.
21602 The first argument is a floating-point number. The return value is an
21603 integer type. Not all types are supported on all targets. The supported
types are the same as the ``llvm.llround`` intrinsic and the ``llround``
libm functions.
21607 The second argument specifies the exception behavior as described above.
21612 This function returns the same values as the libm ``llround`` functions
21613 would and handles error conditions in the same way.
21616 '``llvm.experimental.constrained.trunc``' Intrinsic
21617 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21625 @llvm.experimental.constrained.trunc(<type> <op1>,
21626 metadata <exception behavior>)
21631 The '``llvm.experimental.constrained.trunc``' intrinsic returns the first
operand rounded to the nearest integer not larger in magnitude than the
operand.
The first argument and the return value are floating-point numbers of the same
type.
21641 The second argument specifies the exception behavior as described above.
21646 This function returns the same values as the libm ``trunc`` functions
21647 would and handles error conditions in the same way.
21649 .. _int_experimental_noalias_scope_decl:
21651 '``llvm.experimental.noalias.scope.decl``' Intrinsic
21652 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21660 declare void @llvm.experimental.noalias.scope.decl(metadata !id.scope.list)
21665 The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a
21666 noalias scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason for the duplication,
21668 the scope might need to be duplicated as well.
21674 The ``!id.scope.list`` argument is metadata that is a list of ``noalias``
21675 metadata references. The format is identical to that required for ``noalias``
21676 metadata. This list must have exactly one element.
21681 The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a
21682 noalias scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason for the duplication,
21684 the scope might need to be duplicated as well.
21686 For example, when the intrinsic is used inside a loop body, and that loop is
21687 unrolled, the associated noalias scope must also be duplicated. Otherwise, the
21688 noalias property it signifies would spill across loop iterations, whereas it
21689 was only valid within a single iteration.
.. code-block:: llvm

    ; This example shows two possible positions for noalias.decl and how they impact the semantics:
    ; If it is outside the loop (Version 1), then %a and %b are noalias across *all* iterations.
    ; If it is inside the loop (Version 2), then %a and %b are noalias only within *one* iteration.
    declare i1 @cond() ; external loop condition, assumed to be defined elsewhere

    define void @decl_in_loop(i8* %a.base, i8* %b.base) {
    entry:
      ; call void @llvm.experimental.noalias.scope.decl(metadata !2) ; Version 1: noalias decl outside loop
      br label %loop

    loop:
      %a = phi i8* [ %a.base, %entry ], [ %a.inc, %loop ]
      %b = phi i8* [ %b.base, %entry ], [ %b.inc, %loop ]
      ; call void @llvm.experimental.noalias.scope.decl(metadata !2) ; Version 2: noalias decl inside loop
      %val = load i8, i8* %a, !alias.scope !2
      store i8 %val, i8* %b, !noalias !2
      %a.inc = getelementptr inbounds i8, i8* %a, i64 1
      %b.inc = getelementptr inbounds i8, i8* %b, i64 1
      %cond = call i1 @cond()
      br i1 %cond, label %loop, label %exit

    exit:
      ret void
    }

    !0 = !{!0} ; domain
    !1 = !{!1, !0} ; scope
    !2 = !{!1} ; scope list

Multiple calls to ``@llvm.experimental.noalias.scope.decl`` for the same scope
are possible, but one should never dominate another. Violations are pointed out
by the verifier as they indicate a problem in either a transformation pass or
the input.
21726 Floating Point Environment Manipulation intrinsics
21727 --------------------------------------------------
These functions read or write the floating-point environment, such as the
rounding mode or the state of floating-point exceptions. Altering the
floating-point environment requires special care. See :ref:`Floating Point Environment <floatenv>`.
21733 '``llvm.flt.rounds``' Intrinsic
21734 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21741 declare i32 @llvm.flt.rounds()
21746 The '``llvm.flt.rounds``' intrinsic reads the current rounding mode.
21751 The '``llvm.flt.rounds``' intrinsic returns the current rounding mode.
The encoding of the returned values is the same as the result of ``FLT_ROUNDS``,
as specified by the C standard:
0 - toward zero
1 - to nearest, ties to even
21759 2 - toward positive infinity
21760 3 - toward negative infinity
21761 4 - to nearest, ties away from zero
Other values may be used to represent additional rounding modes supported by a
target. These values are target-specific.
21767 '``llvm.set.rounding``' Intrinsic
21768 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21775 declare void @llvm.set.rounding(i32 <val>)
The '``llvm.set.rounding``' intrinsic sets the current rounding mode.
The argument is the required rounding mode. The encoding of the rounding mode
is the same as that used by '``llvm.flt.rounds``'.
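
Because both intrinsics share one encoding, a value read with
``llvm.flt.rounds`` can be passed back to ``llvm.set.rounding`` unchanged. The
following is a minimal sketch, not taken from any frontend, that switches to
rounding toward zero around a call and then restores the previous mode. The
names ``@with_round_toward_zero`` and ``@do_work`` are hypothetical, and the
value ``0`` for "toward zero" is assumed from the ``FLT_ROUNDS`` convention.

.. code-block:: llvm

      declare i32 @llvm.flt.rounds()
      declare void @llvm.set.rounding(i32)
      declare void @do_work() ; hypothetical callee that runs under the changed mode

      define void @with_round_toward_zero() {
      entry:
        ; Remember the mode that is active on entry.
        %saved = call i32 @llvm.flt.rounds()
        ; 0 = toward zero in the FLT_ROUNDS encoding.
        call void @llvm.set.rounding(i32 0)
        call void @do_work()
        ; Restore the original mode before returning.
        call void @llvm.set.rounding(i32 %saved)
        ret void
      }

Floating-point code that depends on the changed mode is still subject to the
rules in :ref:`Floating Point Environment <floatenv>`.
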
The '``llvm.set.rounding``' intrinsic sets the current rounding mode. It is
similar to the C library function ``fesetround``; however, this intrinsic does
not return any value and uses a platform-independent representation of IEEE
rounding modes.
21797 Floating Point Test Intrinsics
21798 ------------------------------
21800 These functions get properties of floating point values.
21803 '``llvm.isnan``' Intrinsic
21804 ^^^^^^^^^^^^^^^^^^^^^^^^^^
21811 declare i1 @llvm.isnan(<fptype> <op>)
21812 declare <N x i1> @llvm.isnan(<vector-fptype> <op>)
21817 The '``llvm.isnan``' intrinsic returns a boolean value or vector of boolean
21818 values depending on whether the value is NaN.
21820 If the operand is a floating-point scalar, then the result type is a
21821 boolean (:ref:`i1 <t_integer>`).
If the operand is a floating-point vector, then the result type is a
vector of booleans with the same number of elements as the operand.
21829 The argument to the '``llvm.isnan``' intrinsic must be
21830 :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
21831 of floating-point values.
21837 The function tests if ``op`` is NaN. If ``op`` is a vector, then the
check is made element by element. Each test yields an :ref:`i1 <t_integer>`
result, which is ``true`` if the value is NaN. The function never raises
21840 floating point exceptions.
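
As an illustration, the sketch below replaces a NaN input with zero. The
``.f32`` suffix on the intrinsic name is an assumption about how the overload
is mangled for a ``float`` operand; the syntax above lists the name without a
suffix. The function name ``@nan_to_zero`` is hypothetical.

.. code-block:: llvm

      declare i1 @llvm.isnan.f32(float)

      define float @nan_to_zero(float %x) {
      entry:
        ; true only when %x is a NaN; no floating-point exception is raised.
        %is.nan = call i1 @llvm.isnan.f32(float %x)
        %r = select i1 %is.nan, float 0.000000e+00, float %x
        ret float %r
      }
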
This class of intrinsics is designed to be generic and has no specific
purpose.
21849 '``llvm.var.annotation``' Intrinsic
21850 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21857 declare void @llvm.var.annotation(i8* <val>, i8* <str>, i8* <str>, i32 <int>)
21862 The '``llvm.var.annotation``' intrinsic.
21867 The first argument is a pointer to a value, the second is a pointer to a
21868 global string, the third is a pointer to a global string which is the
21869 source file name, and the last argument is the line number.
21874 This intrinsic allows annotation of local variables with arbitrary
21875 strings. This can be useful for special purpose optimizations that want
21876 to look for these annotations. These have no other defined use; they are
21877 ignored by code generation and optimization.
21879 '``llvm.ptr.annotation.*``' Intrinsic
21880 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use '``llvm.ptr.annotation``' on a
pointer to an integer of any width. *NOTE* you must specify an address space for
the pointer. The identifier for the default address space is the integer
'``0``'.
21892 declare i8* @llvm.ptr.annotation.p<address space>i8(i8* <val>, i8* <str>, i8* <str>, i32 <int>)
21893 declare i16* @llvm.ptr.annotation.p<address space>i16(i16* <val>, i8* <str>, i8* <str>, i32 <int>)
21894 declare i32* @llvm.ptr.annotation.p<address space>i32(i32* <val>, i8* <str>, i8* <str>, i32 <int>)
21895 declare i64* @llvm.ptr.annotation.p<address space>i64(i64* <val>, i8* <str>, i8* <str>, i32 <int>)
21896 declare i256* @llvm.ptr.annotation.p<address space>i256(i256* <val>, i8* <str>, i8* <str>, i32 <int>)
21901 The '``llvm.ptr.annotation``' intrinsic.
21906 The first argument is a pointer to an integer value of arbitrary bitwidth
21907 (result of some expression), the second is a pointer to a global string, the
21908 third is a pointer to a global string which is the source file name, and the
21909 last argument is the line number. It returns the value of the first argument.
21914 This intrinsic allows annotation of a pointer to an integer with arbitrary
21915 strings. This can be useful for special purpose optimizations that want to look
21916 for these annotations. These have no other defined use; they are ignored by code
21917 generation and optimization.
21919 '``llvm.annotation.*``' Intrinsic
21920 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21925 This is an overloaded intrinsic. You can use '``llvm.annotation``' on
21926 any integer bit width.
21930 declare i8 @llvm.annotation.i8(i8 <val>, i8* <str>, i8* <str>, i32 <int>)
21931 declare i16 @llvm.annotation.i16(i16 <val>, i8* <str>, i8* <str>, i32 <int>)
21932 declare i32 @llvm.annotation.i32(i32 <val>, i8* <str>, i8* <str>, i32 <int>)
21933 declare i64 @llvm.annotation.i64(i64 <val>, i8* <str>, i8* <str>, i32 <int>)
21934 declare i256 @llvm.annotation.i256(i256 <val>, i8* <str>, i8* <str>, i32 <int>)
21939 The '``llvm.annotation``' intrinsic.
21944 The first argument is an integer value (result of some expression), the
21945 second is a pointer to a global string, the third is a pointer to a
21946 global string which is the source file name, and the last argument is
21947 the line number. It returns the value of the first argument.
21952 This intrinsic allows annotations to be put on arbitrary expressions
21953 with arbitrary strings. This can be useful for special purpose
21954 optimizations that want to look for these annotations. These have no
21955 other defined use; they are ignored by code generation and optimization.
21957 '``llvm.codeview.annotation``' Intrinsic
21958 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21963 This annotation emits a label at its program point and an associated
21964 ``S_ANNOTATION`` codeview record with some additional string metadata. This is
21965 used to implement MSVC's ``__annotation`` intrinsic. It is marked
21966 ``noduplicate``, so calls to this intrinsic prevent inlining and should be
21967 considered expensive.
21971 declare void @llvm.codeview.annotation(metadata)
21976 The argument should be an MDTuple containing any number of MDStrings.
21978 '``llvm.trap``' Intrinsic
21979 ^^^^^^^^^^^^^^^^^^^^^^^^^
21986 declare void @llvm.trap() cold noreturn nounwind
21991 The '``llvm.trap``' intrinsic.
22001 This intrinsic is lowered to the target dependent trap instruction. If
22002 the target does not have a trap instruction, this intrinsic will be
22003 lowered to a call of the ``abort()`` function.
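
Since ``llvm.trap`` is declared ``noreturn``, a call to it is normally followed
by ``unreachable``. A minimal sketch, using the hypothetical function
``@checked_udiv``, that traps instead of dividing by zero:

.. code-block:: llvm

      declare void @llvm.trap() cold noreturn nounwind

      define i32 @checked_udiv(i32 %a, i32 %b) {
      entry:
        %is.zero = icmp eq i32 %b, 0
        br i1 %is.zero, label %fail, label %ok

      fail:
        ; Control never continues past the trap.
        call void @llvm.trap()
        unreachable

      ok:
        %q = udiv i32 %a, %b
        ret i32 %q
      }
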
22005 '``llvm.debugtrap``' Intrinsic
22006 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22013 declare void @llvm.debugtrap() nounwind
22018 The '``llvm.debugtrap``' intrinsic.
This intrinsic is lowered to code which is intended to cause an
execution trap with the intention of requesting the attention of a
debugger.
22032 '``llvm.ubsantrap``' Intrinsic
22033 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22040 declare void @llvm.ubsantrap(i8 immarg) cold noreturn nounwind
22045 The '``llvm.ubsantrap``' intrinsic.
22050 An integer describing the kind of failure detected.
This intrinsic is lowered to code which is intended to cause an execution trap.
Where the target allows it, the argument is embedded into the encoding of that
trap so that different kinds of crashes can be told apart.
22059 Equivalent to ``@llvm.trap`` for targets that do not support this behaviour.
22061 '``llvm.stackprotector``' Intrinsic
22062 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22069 declare void @llvm.stackprotector(i8* <guard>, i8** <slot>)
22074 The ``llvm.stackprotector`` intrinsic takes the ``guard`` and stores it
22075 onto the stack at ``slot``. The stack slot is adjusted to ensure that it
22076 is placed on the stack before local variables.
22081 The ``llvm.stackprotector`` intrinsic requires two pointer arguments.
22082 The first argument is the value loaded from the stack guard
``@__stack_chk_guard``. The second argument is an ``alloca`` that has
enough space to hold the value of the guard.
22089 This intrinsic causes the prologue/epilogue inserter to force the position of
22090 the ``AllocaInst`` stack slot to be before local variables on the stack. This is
22091 to ensure that if a local variable on the stack is overwritten, it will destroy
22092 the value of the guard. When the function exits, the guard on the stack is
22093 checked against the original guard by ``llvm.stackprotectorcheck``. If they are
22094 different, then ``llvm.stackprotectorcheck`` causes the program to abort by
22095 calling the ``__stack_chk_fail()`` function.
22097 '``llvm.stackguard``' Intrinsic
22098 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22105 declare i8* @llvm.stackguard()
22110 The ``llvm.stackguard`` intrinsic returns the system stack guard value.
It should not be generated by frontends, since it is only for internal usage.
This intrinsic exists because the IR form of the stack protector is still
supported in FastISel.
22124 On some platforms, the value returned by this intrinsic remains unchanged
22125 between loads in the same thread. On other platforms, it returns the same
22126 global variable value, if any, e.g. ``@__stack_chk_guard``.
Currently some platforms have IR-level customized stack guard loading (e.g.
X86 Linux) that is not handled by ``llvm.stackguard()``, while they should be
in the future.
22132 '``llvm.objectsize``' Intrinsic
22133 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22140 declare i32 @llvm.objectsize.i32(i8* <object>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
22141 declare i64 @llvm.objectsize.i64(i8* <object>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
22146 The ``llvm.objectsize`` intrinsic is designed to provide information to the
22147 optimizer to determine whether a) an operation (like memcpy) will overflow a
22148 buffer that corresponds to an object, or b) that a runtime check for overflow
22149 isn't necessary. An object in this context means an allocation of a specific
22150 class, structure, array, or other object.
22155 The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
22156 pointer to or into the ``object``. The second argument determines whether
22157 ``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size is
22158 unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
22159 in address space 0 is used as its pointer argument. If it's ``false``,
22160 ``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, if
22161 the ``null`` is in a non-zero address space or if ``true`` is given for the
22162 third argument of ``llvm.objectsize``, we assume its size is unknown. The fourth
argument to ``llvm.objectsize`` determines if the value should be evaluated at
runtime.
22166 The second, third, and fourth arguments only accept constants.
22171 The ``llvm.objectsize`` intrinsic is lowered to a value representing the size of
22172 the object concerned. If the size cannot be determined, ``llvm.objectsize``
22173 returns ``i32/i64 -1 or 0`` (depending on the ``min`` argument).
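
The sketch below queries the size remaining behind a pointer into a local
array. The intrinsic name carries a ``.p0i8`` suffix, which is assumed here to
be the mangling for an ``i8*`` operand in address space 0; the syntax above
lists the shorter name. The function name ``@bytes_left`` is hypothetical.

.. code-block:: llvm

      declare i64 @llvm.objectsize.i64.p0i8(i8*, i1, i1, i1)

      define i64 @bytes_left() {
      entry:
        %buf = alloca [16 x i8]
        ; Point 4 bytes into the 16-byte object.
        %p = getelementptr inbounds [16 x i8], [16 x i8]* %buf, i64 0, i64 4
        ; min = false: an unknown size would be reported as -1.
        ; nullunknown = true, dynamic = false.
        %n = call i64 @llvm.objectsize.i64.p0i8(i8* %p, i1 false, i1 true, i1 false)
        ret i64 %n
      }

With the object fully visible as it is here, one would expect the call to
resolve to the 12 bytes that remain after ``%p``.
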
22175 '``llvm.expect``' Intrinsic
22176 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.expect`` on any
integer bit width.
22186 declare i1 @llvm.expect.i1(i1 <val>, i1 <expected_val>)
22187 declare i32 @llvm.expect.i32(i32 <val>, i32 <expected_val>)
22188 declare i64 @llvm.expect.i64(i64 <val>, i64 <expected_val>)
22193 The ``llvm.expect`` intrinsic provides information about expected (the
22194 most probable) value of ``val``, which can be used by optimizers.
22199 The ``llvm.expect`` intrinsic takes two arguments. The first argument is
22200 a value. The second argument is an expected value.
22205 This intrinsic is lowered to the ``val``.
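
A typical use, sketched below, is to mark an error branch as unlikely. The
names ``@process`` and ``@handle_error`` are hypothetical.

.. code-block:: llvm

      declare i1 @llvm.expect.i1(i1, i1)
      declare i32 @handle_error(i32) ; hypothetical error handler

      define i32 @process(i32 %x) {
      entry:
        %is.err = icmp slt i32 %x, 0
        ; Hint that %is.err is most probably false; the result equals %is.err.
        %is.err.hint = call i1 @llvm.expect.i1(i1 %is.err, i1 false)
        br i1 %is.err.hint, label %error, label %ok

      error:
        %e = call i32 @handle_error(i32 %x)
        ret i32 %e

      ok:
        ret i32 %x
      }
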
22207 '``llvm.expect.with.probability``' Intrinsic
22208 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22213 This intrinsic is similar to ``llvm.expect``. This is an overloaded intrinsic.
22214 You can use ``llvm.expect.with.probability`` on any integer bit width.
22218 declare i1 @llvm.expect.with.probability.i1(i1 <val>, i1 <expected_val>, double <prob>)
22219 declare i32 @llvm.expect.with.probability.i32(i32 <val>, i32 <expected_val>, double <prob>)
22220 declare i64 @llvm.expect.with.probability.i64(i64 <val>, i64 <expected_val>, double <prob>)
The ``llvm.expect.with.probability`` intrinsic provides information about the
expected value of ``val`` with probability (or confidence) ``prob``, which can
be used by optimizers.
22232 The ``llvm.expect.with.probability`` intrinsic takes three arguments. The first
22233 argument is a value. The second argument is an expected value. The third
22234 argument is a probability.
22239 This intrinsic is lowered to the ``val``.
22243 '``llvm.assume``' Intrinsic
22244 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22251 declare void @llvm.assume(i1 %cond)
The ``llvm.assume`` intrinsic allows the optimizer to assume that the provided
condition is true. This information can then be used in simplifying other parts
of the code.
22260 More complex assumptions can be encoded as
22261 :ref:`assume operand bundles <assume_opbundles>`.
The argument of the call is the condition which the optimizer may assume is
true.
22272 The intrinsic allows the optimizer to assume that the provided condition is
22273 always true whenever the control flow reaches the intrinsic call. No code is
22274 generated for this intrinsic, and instructions that contribute only to the
22275 provided condition are not used for code generation. If the condition is
22276 violated during execution, the behavior is undefined.
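
A minimal sketch of how a frontend might state such a fact; the function name
``@div_by_positive`` is hypothetical:

.. code-block:: llvm

      declare void @llvm.assume(i1)

      define i32 @div_by_positive(i32 %x, i32 %n) {
      entry:
        ; The caller promises %n > 0; behavior is undefined if it is not.
        %positive = icmp sgt i32 %n, 0
        call void @llvm.assume(i1 %positive)
        %q = sdiv i32 %x, %n
        ret i32 %q
      }

Here the ``icmp`` feeds only the assumption, so no code is expected to be
emitted for it.
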
22278 Note that the optimizer might limit the transformations performed on values
22279 used by the ``llvm.assume`` intrinsic in order to preserve the instructions
22280 only used to form the intrinsic's input argument. This might prove undesirable
22281 if the extra information provided by the ``llvm.assume`` intrinsic does not cause
22282 sufficient overall improvement in code quality. For this reason,
22283 ``llvm.assume`` should not be used to document basic mathematical invariants
that the optimizer can otherwise deduce or facts that are of little use to the
optimizer.
22289 '``llvm.ssa.copy``' Intrinsic
22290 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22297 declare type @llvm.ssa.copy(type %operand) returned(1) readnone
22302 The first argument is an operand which is used as the returned value.
22307 The ``llvm.ssa.copy`` intrinsic can be used to attach information to
22308 operations by copying them and giving them new names. For example,
22309 the PredicateInfo utility uses it to build Extended SSA form, and
22310 attach various forms of information to operands that dominate specific
22311 uses. It is not meant for general use, only for building temporary
22312 renaming forms that require value splits at certain points.
22316 '``llvm.type.test``' Intrinsic
22317 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22324 declare i1 @llvm.type.test(i8* %ptr, metadata %type) nounwind readnone
22330 The first argument is a pointer to be tested. The second argument is a
22331 metadata object representing a :doc:`type identifier <TypeMetadata>`.
22336 The ``llvm.type.test`` intrinsic tests whether the given pointer is associated
22337 with the given type identifier.
22339 .. _type.checked.load:
22341 '``llvm.type.checked.load``' Intrinsic
22342 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22349 declare {i8*, i1} @llvm.type.checked.load(i8* %ptr, i32 %offset, metadata %type) argmemonly nounwind readonly
22355 The first argument is a pointer from which to load a function pointer. The
22356 second argument is the byte offset from which to load the function pointer. The
third argument is a metadata object representing a :doc:`type identifier
<TypeMetadata>`.
22363 The ``llvm.type.checked.load`` intrinsic safely loads a function pointer from a
22364 virtual table pointer using type metadata. This intrinsic is used to implement
22365 control flow integrity in conjunction with virtual call optimization. The
22366 virtual call optimization pass will optimize away ``llvm.type.checked.load``
22367 intrinsics associated with devirtualized calls, thereby removing the type
check in cases where it is not needed to enforce the control flow integrity
constraint.
22371 If the given pointer is associated with a type metadata identifier, this
22372 function returns true as the second element of its return value. (Note that
22373 the function may also return true if the given pointer is not associated
22374 with a type metadata identifier.) If the function's return value's second
22375 element is true, the following rules apply to the first element:
22377 - If the given pointer is associated with the given type metadata identifier,
it is the function pointer loaded from the given byte offset from the given
pointer.
22381 - If the given pointer is not associated with the given type metadata
22382 identifier, it is one of the following (the choice of which is unspecified):
22384 1. The function pointer that would have been loaded from an arbitrarily chosen
(through an unspecified mechanism) pointer associated with the type
metadata.
22388 2. If the function has a non-void return type, a pointer to a function that
22389 returns an unspecified value without causing side effects.
22391 If the function's return value's second element is false, the value of the
22392 first element is undefined.
22395 '``llvm.arithmetic.fence``' Intrinsic
22396 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22404 @llvm.arithmetic.fence(<type> <op>)
The purpose of the ``llvm.arithmetic.fence`` intrinsic is to prevent the
optimizer from performing fast-math optimizations, particularly reassociation,
between the argument and the expression that contains the argument.
It can be used to preserve the parentheses in the source language.
22418 The ``llvm.arithmetic.fence`` intrinsic takes only one argument.
22419 The argument and the return value are floating-point numbers,
22420 or vector floating-point numbers, of the same type.
22425 This intrinsic returns the value of its operand. The optimizer can optimize
22426 the argument, but the optimizer cannot hoist any component of the operand
22427 to the containing context, and the optimizer cannot move the calculation of
22428 any expression in the containing context into the operand.
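
As a sketch of the parenthesis-preserving use, the function below computes
``(a + b) + c`` with reassociation enabled on both additions while keeping the
inner sum intact. The ``.f32`` suffix is assumed to be the overload mangling for
``float``, and ``@sum_keep_parens`` is a hypothetical name.

.. code-block:: llvm

      declare float @llvm.arithmetic.fence.f32(float)

      define float @sum_keep_parens(float %a, float %b, float %c) {
      entry:
        %ab = fadd reassoc float %a, %b
        ; The fence returns %ab unchanged but keeps the optimizer from
        ; regrouping it with the addition of %c below.
        %ab.fenced = call float @llvm.arithmetic.fence.f32(float %ab)
        %r = fadd reassoc float %ab.fenced, %c
        ret float %r
      }
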
22431 '``llvm.donothing``' Intrinsic
22432 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22439 declare void @llvm.donothing() nounwind readnone
The ``llvm.donothing`` intrinsic doesn't perform any operation. It's one of only
three intrinsics (besides ``llvm.experimental.patchpoint`` and
``llvm.experimental.gc.statepoint``) that can be called with an invoke
instruction.
This intrinsic does nothing, and it's removed by optimizers and ignored
by codegen.
22460 '``llvm.experimental.deoptimize``' Intrinsic
22461 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22468 declare type @llvm.experimental.deoptimize(...) [ "deopt"(...) ]
This intrinsic, together with :ref:`deoptimization operand bundles
<deopt_opbundles>`, allows frontends to express transfer of control and
frame-local state from the currently executing (typically more specialized,
hence faster) version of a function into another (typically more generic, hence
slower) version.
22479 In languages with a fully integrated managed runtime like Java and JavaScript
22480 this intrinsic can be used to implement "uncommon trap" or "side exit" like
22481 functionality. In unmanaged languages like C and C++, this intrinsic can be
22482 used to represent the slow paths of specialized functions.
22488 The intrinsic takes an arbitrary number of arguments, whose meaning is
22489 decided by the :ref:`lowering strategy<deoptimize_lowering>`.
22494 The ``@llvm.experimental.deoptimize`` intrinsic executes an attached
22495 deoptimization continuation (denoted using a :ref:`deoptimization
22496 operand bundle <deopt_opbundles>`) and returns the value returned by
22497 the deoptimization continuation. Defining the semantic properties of
22498 the continuation itself is out of scope of the language reference --
22499 as far as LLVM is concerned, the deoptimization continuation can
invoke arbitrary side effects, including reading from and writing to
the entire heap.
22503 Deoptimization continuations expressed using ``"deopt"`` operand bundles always
22504 continue execution to the end of the physical frame containing them, so all
22505 calls to ``@llvm.experimental.deoptimize`` must be in "tail position":
22507 - ``@llvm.experimental.deoptimize`` cannot be invoked.
22508 - The call must immediately precede a :ref:`ret <i_ret>` instruction.
22509 - The ``ret`` instruction must return the value produced by the
22510 ``@llvm.experimental.deoptimize`` call if there is one, or void.
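
The tail-position rules above can be sketched as follows. The ``.i32`` suffix
is assumed to be the mangling for an ``i32`` return type, the function name
``@fast_path`` is hypothetical, and the contents of the ``"deopt"`` bundle are
placeholders whose meaning would be defined by the runtime.

.. code-block:: llvm

      declare i32 @llvm.experimental.deoptimize.i32(...)

      define i32 @fast_path(i32 %x) {
      entry:
        %is.common = icmp sge i32 %x, 0
        br i1 %is.common, label %fast, label %slow

      fast:
        %r = add i32 %x, 1
        ret i32 %r

      slow:
        ; A plain call (not an invoke), immediately followed by a ret that
        ; returns the value produced by the call.
        %v = call i32 (...) @llvm.experimental.deoptimize.i32(i32 %x) [ "deopt"(i32 %x) ]
        ret i32 %v
      }
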
Note that the above restrictions imply that the return type for a call to
``@llvm.experimental.deoptimize`` will match the return type of its immediate
caller.
22516 The inliner composes the ``"deopt"`` continuations of the caller into the
22517 ``"deopt"`` continuations present in the inlinee, and also updates calls to this
22518 intrinsic to return directly from the frame of the function it inlined into.
22520 All declarations of ``@llvm.experimental.deoptimize`` must share the
22521 same calling convention.
22523 .. _deoptimize_lowering:
22528 Calls to ``@llvm.experimental.deoptimize`` are lowered to calls to the
22529 symbol ``__llvm_deoptimize`` (it is the frontend's responsibility to
22530 ensure that this symbol is defined). The call arguments to
22531 ``@llvm.experimental.deoptimize`` are lowered as if they were formal
22532 arguments of the specified types, and not as varargs.
22535 '``llvm.experimental.guard``' Intrinsic
22536 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22543 declare void @llvm.experimental.guard(i1, ...) [ "deopt"(...) ]
22548 This intrinsic, together with :ref:`deoptimization operand bundles
22549 <deopt_opbundles>`, allows frontends to express guards or checks on
22550 optimistic assumptions made during compilation. The semantics of
22551 ``@llvm.experimental.guard`` is defined in terms of
``@llvm.experimental.deoptimize`` -- its body is defined to be
equivalent to:
22555 .. code-block:: text
22557 define void @llvm.experimental.guard(i1 %pred, <args...>) {
22558 %realPred = and i1 %pred, undef
22559 br i1 %realPred, label %continue, label %leave [, !make.implicit !{}]
22562 call void @llvm.experimental.deoptimize(<args...>) [ "deopt"() ]
22570 with the optional ``[, !make.implicit !{}]`` present if and only if it
22571 is present on the call site. For more details on ``!make.implicit``,
22572 see :doc:`FaultMaps`.
22574 In words, ``@llvm.experimental.guard`` executes the attached
22575 ``"deopt"`` continuation if (but **not** only if) its first argument
22576 is ``false``. Since the optimizer is allowed to replace the ``undef``
with an arbitrary value, it can optimize the guard to fail "spuriously",
22578 i.e. without the original condition being false (hence the "not only
22579 if"); and this allows for "check widening" type optimizations.
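
A sketch of a call site, guarding a load with a bounds check. The function name
``@load_checked`` is hypothetical and the values placed in the ``"deopt"``
bundle are placeholders.

.. code-block:: llvm

      declare void @llvm.experimental.guard(i1, ...)

      define i32 @load_checked(i32* %base, i32 %i, i32 %len) {
      entry:
        %in.bounds = icmp ult i32 %i, %len
        ; Deoptimizes if %in.bounds is false, and possibly spuriously otherwise.
        call void (i1, ...) @llvm.experimental.guard(i1 %in.bounds) [ "deopt"(i32 %i, i32 %len) ]
        ; Code after the guard may rely on %i < %len.
        %idx = zext i32 %i to i64
        %addr = getelementptr inbounds i32, i32* %base, i64 %idx
        %v = load i32, i32* %addr
        ret i32 %v
      }
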
22581 ``@llvm.experimental.guard`` cannot be invoked.
22583 After ``@llvm.experimental.guard`` was first added, a more general
22584 formulation was found in ``@llvm.experimental.widenable.condition``.
Support for ``@llvm.experimental.guard`` is slowly being rephrased in
terms of this alternative.
22588 '``llvm.experimental.widenable.condition``' Intrinsic
22589 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22596 declare i1 @llvm.experimental.widenable.condition()
This intrinsic represents a "widenable condition", which is a
boolean expression with the following property: whether this
expression is `true` or `false`, the program is correct and
well-defined.
22606 Together with :ref:`deoptimization operand bundles <deopt_opbundles>`,
22607 ``@llvm.experimental.widenable.condition`` allows frontends to
22608 express guards or checks on optimistic assumptions made during
compilation and represent them as branch instructions on special
conditions.
22612 While this may appear similar in semantics to `undef`, it is very
22613 different in that an invocation produces a particular, singular
22614 value. It is also intended to be lowered late, and remain available
22615 for specific optimizations and transforms that can benefit from its
22616 special properties.
22626 The intrinsic ``@llvm.experimental.widenable.condition()``
22627 returns either `true` or `false`. For each evaluation of a call
22628 to this intrinsic, the program must be valid and correct both if
22629 it returns `true` and if it returns `false`. This allows
22630 transformation passes to replace evaluations of this intrinsic
22631 with either value whenever one is beneficial.
When used in a branch condition, it allows us to choose between
two alternative correct solutions for the same problem, as in
the following example:
22637 .. code-block:: text
22639 %cond = call i1 @llvm.experimental.widenable.condition()
22640 br i1 %cond, label %solution_1, label %solution_2
22643 ; Apply memory-consuming but fast solution for a task.
22646 ; Cheap in memory but slow solution.
Whether the result of the intrinsic's call is `true` or `false`,
it should be correct to pick either solution. We can switch
between them by replacing the result of
``@llvm.experimental.widenable.condition`` with different
`i1` expressions.
22654 This is how it can be used to represent guards as widenable branches:
22656 .. code-block:: text
22659 ; Unguarded instructions
22660 call void @llvm.experimental.guard(i1 %cond, <args...>) ["deopt"(<deopt_args...>)]
22661 ; Guarded instructions
22663 Can be expressed in an alternative equivalent form of explicit branch using
22664 ``@llvm.experimental.widenable.condition``:
22666 .. code-block:: text
22669 ; Unguarded instructions
22670 %widenable_condition = call i1 @llvm.experimental.widenable.condition()
22671 %guard_condition = and i1 %cond, %widenable_condition
22672 br i1 %guard_condition, label %guarded, label %deopt
22675 ; Guarded instructions
22678 call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]
22680 So the block `guarded` is only reachable when `%cond` is `true`,
22681 and it should be valid to go to the block `deopt` whenever `%cond`
22682 is `true` or `false`.
22684 ``@llvm.experimental.widenable.condition`` will never throw, thus
22685 it cannot be invoked.
When ``@llvm.experimental.widenable.condition()`` is used in the
condition of a guard represented as an explicit branch, it is
legal to widen the guard's condition with any additional
conditions.
22695 Guard widening looks like replacement of
22697 .. code-block:: text
22699 %widenable_cond = call i1 @llvm.experimental.widenable.condition()
22700 %guard_cond = and i1 %cond, %widenable_cond
22701 br i1 %guard_cond, label %guarded, label %deopt
with

.. code-block:: text
22707 %widenable_cond = call i1 @llvm.experimental.widenable.condition()
22708 %new_cond = and i1 %any_other_cond, %widenable_cond
22709 %new_guard_cond = and i1 %cond, %new_cond
22710 br i1 %new_guard_cond, label %guarded, label %deopt
for this branch. Here `%any_other_cond` is an arbitrarily chosen
well-defined `i1` value. By widening the guard, we may
impose stricter conditions on the `guarded` block and bail out to
the `deopt` block when the new condition is not met.
The default lowering strategy replaces the result of a
call to ``@llvm.experimental.widenable.condition`` with the
constant `true`. However, it is always correct to replace
it with any other `i1` value. Any pass can
freely do so if it can benefit from non-default lowering.
22727 '``llvm.load.relative``' Intrinsic
22728 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22735 declare i8* @llvm.load.relative.iN(i8* %ptr, iN %offset) argmemonly nounwind readonly
22740 This intrinsic loads a 32-bit value from the address ``%ptr + %offset``,
22741 adds ``%ptr`` to that value and returns it. The constant folder specifically
22742 recognizes the form of this intrinsic and the constant initializers it may
22743 load from; if a loaded constant initializer is known to have the form
22744 ``i32 trunc(x - %ptr)``, the intrinsic call is folded to ``x``.
22746 LLVM provides that the calculation of such a constant initializer will
22747 not overflow at link time under the medium code model if ``x`` is an
22748 ``unnamed_addr`` function. However, it does not provide this guarantee for
22749 a constant initializer folded into a function body. This intrinsic can be
22750 used to avoid the possibility of overflows when loading from such a constant.
22752 .. _llvm_sideeffect:
22754 '``llvm.sideeffect``' Intrinsic
22755 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22762 declare void @llvm.sideeffect() inaccessiblememonly nounwind
22767 The ``llvm.sideeffect`` intrinsic doesn't perform any operation. Optimizers
22768 treat it as having side effects, so it can be inserted into a loop to
22769 indicate that the loop shouldn't be assumed to terminate (which could
22770 potentially lead to the loop being optimized away entirely), even if it's
22771 an infinite loop with no other side effects.
22781 This intrinsic actually does nothing, but optimizers must assume that it
22782 has externally observable side effects.
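
For instance, a frontend for a language in which infinite loops are well
defined might emit something like the following sketch (the function name
``@spin`` is hypothetical):

.. code-block:: llvm

      declare void @llvm.sideeffect() inaccessiblememonly nounwind

      define void @spin() {
      entry:
        br label %loop

      loop:
        ; Without this call the loop body would have no side effects at all.
        call void @llvm.sideeffect()
        br label %loop
      }
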
22784 '``llvm.is.constant.*``' Intrinsic
22785 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.is.constant`` with any argument type.
22794 declare i1 @llvm.is.constant.i32(i32 %operand) nounwind readnone
22795 declare i1 @llvm.is.constant.f32(float %operand) nounwind readnone
22796 declare i1 @llvm.is.constant.TYPENAME(TYPE %operand) nounwind readnone
22801 The '``llvm.is.constant``' intrinsic will return true if the argument
22802 is known to be a manifest compile-time constant. It is guaranteed to
22803 fold to either true or false before generating machine code.
22808 This intrinsic generates no code. If its argument is known to be a
22809 manifest compile-time constant value, then the intrinsic will be
22810 converted to a constant true value. Otherwise, it will be converted to
22811 a constant false value.
In particular, note that if the argument is a constant expression
which refers to a global (the address of which *is* a constant, but
not manifest during the compile), then the intrinsic evaluates to
false.
The result also intentionally depends on the result of optimization
passes -- e.g., the result can change depending on whether a
function gets inlined or not. A function's parameters are
obviously not constant. However, a call like
``llvm.is.constant.i32(i32 %param)`` *can* return true after the
function is inlined, if the value passed to the function parameter was
constant.
22826 On the other hand, if constant folding is not run, it will never
22827 evaluate to true, even in simple cases.
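
The dependence on inlining can be sketched as follows. The function names
``@pick`` and ``@caller`` are hypothetical. Taken on its own, ``@pick`` sees
only a parameter, so the call would be expected to fold to false. If ``@pick``
is inlined into ``@caller`` and the constant ``42`` is propagated before the
intrinsic is folded, the same call may fold to true instead.

.. code-block:: llvm

    declare i1 @llvm.is.constant.i32(i32)

    define i32 @pick(i32 %x) {
      ; %x is a function parameter here, so it is not a manifest constant.
      %c = call i1 @llvm.is.constant.i32(i32 %x)
      %r = select i1 %c, i32 1, i32 0
      ret i32 %r
    }

    define i32 @caller() {
      ; After inlining, the argument of the intrinsic becomes the literal 42.
      %v = call i32 @pick(i32 42)
      ret i32 %v
    }
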
22831 '``llvm.ptrmask``' Intrinsic
22832 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
declare ptrty @llvm.ptrmask(ptrty %ptr, intty %mask) readnone speculatable
22844 The first argument is a pointer. The second argument is an integer.
22849 The ``llvm.ptrmask`` intrinsic masks out bits of the pointer according to a mask.
22850 This allows stripping data from tagged pointers without converting them to an
22851 integer (ptrtoint/inttoptr). As a consequence, we can preserve more information
22852 to facilitate alias analysis and underlying-object detection.
22857 The result of ``ptrmask(ptr, mask)`` is equivalent to
22858 ``getelementptr ptr, (ptrtoint(ptr) & mask) - ptrtoint(ptr)``. Both the returned
22859 pointer and the first argument are based on the same underlying object (for more
22860 information on the *based on* terminology see
22861 :ref:`the pointer aliasing rules <pointeraliasing>`). If the bitwidth of the
22862 mask argument does not match the pointer size of the target, the mask is
22863 zero-extended or truncated accordingly.
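
A minimal sketch of stripping a tag from a pointer. It assumes a target with
64-bit pointers and a hypothetical tagging scheme that stores data in the low
four bits. The function name ``@strip_tag`` is illustrative only.

.. code-block:: llvm

    declare i8* @llvm.ptrmask.p0i8.i64(i8*, i64)

    define i8* @strip_tag(i8* %tagged) {
      ; -16 is 0xFFFFFFFFFFFFFFF0: keep every bit except the low four.
      %p = call i8* @llvm.ptrmask.p0i8.i64(i8* %tagged, i64 -16)
      ret i8* %p
    }

The result stays a pointer based on ``%tagged``, with no ``ptrtoint`` or
``inttoptr`` round trip.
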
22867 '``llvm.vscale``' Intrinsic
22868 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
declare i32 @llvm.vscale.i32()
declare i64 @llvm.vscale.i64()
22881 The ``llvm.vscale`` intrinsic returns the value for ``vscale`` in scalable
22882 vectors such as ``<vscale x 16 x i8>``.
22887 ``vscale`` is a positive value that is constant throughout program
22888 execution, but is unknown at compile time.
22889 If the result value does not fit in the result type, then the result is
22890 a :ref:`poison value <poisonvalues>`.
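
As an illustrative sketch, the runtime element count of a scalable vector such
as ``<vscale x 4 x i32>`` can be computed by scaling the minimum element count
by ``vscale``. The function name ``@num_elements`` is an assumption.

.. code-block:: llvm

    declare i64 @llvm.vscale.i64()

    define i64 @num_elements() {
      %vs = call i64 @llvm.vscale.i64()
      ; <vscale x 4 x i32> holds 4 * vscale elements at run time.
      %n = mul i64 %vs, 4
      ret i64 %n
    }
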
22893 Stack Map Intrinsics
22894 --------------------
22896 LLVM provides experimental intrinsics to support runtime patching
22897 mechanisms commonly desired in dynamic language JITs. These intrinsics
22898 are described in :doc:`StackMaps`.
22900 Element Wise Atomic Memory Intrinsics
22901 -------------------------------------
22903 These intrinsics are similar to the standard library memory intrinsics except
22904 that they perform memory transfer as a sequence of atomic memory accesses.
22906 .. _int_memcpy_element_unordered_atomic:
22908 '``llvm.memcpy.element.unordered.atomic``' Intrinsic
22909 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.memcpy.element.unordered.atomic`` on
any integer bit width and for different address spaces. Not all targets
support all bit widths, however.
declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* <dest>,
                                                                 i8* <src>,
                                                                 i32 <len>,
                                                                 i32 <element_size>)
declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* <dest>,
                                                                 i8* <src>,
                                                                 i64 <len>,
                                                                 i32 <element_size>)
22932 The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic is a specialization of the
22933 '``llvm.memcpy.*``' intrinsic. It differs in that the ``dest`` and ``src`` are treated
22934 as arrays with elements that are exactly ``element_size`` bytes, and the copy between
22935 buffers uses a sequence of :ref:`unordered atomic <ordering>` load/store operations
22936 that are a positive integer multiple of the ``element_size`` in size.
22941 The first three arguments are the same as they are in the :ref:`@llvm.memcpy <int_memcpy>`
22942 intrinsic, with the added constraint that ``len`` is required to be a positive integer
22943 multiple of the ``element_size``. If ``len`` is not a positive integer multiple of
22944 ``element_size``, then the behaviour of the intrinsic is undefined.
``element_size`` must be a compile-time constant positive power of two no greater than
a target-specific atomic access size limit.

For each of the input pointers, the ``align`` parameter attribute must be specified. It
must be a power of two no less than the ``element_size``. The caller guarantees that
both the source and destination pointers are aligned to that boundary.
22956 The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of
22957 memory from the source location to the destination location. These locations are not
22958 allowed to overlap. The memory copy is performed as a sequence of load/store operations
22959 where each access is guaranteed to be a multiple of ``element_size`` bytes wide and
22960 aligned at an ``element_size`` boundary.
22962 The order of the copy is unspecified. The same value may be read from the source
22963 buffer many times, but only one write is issued to the destination buffer per
22964 element. It is well defined to have concurrent reads and writes to both source and
22965 destination provided those reads and writes are unordered atomic when specified.
This intrinsic does not provide any additional ordering guarantees over those
provided by a set of unordered loads from the source location and stores to the
destination.
In the most general case, a call to '``llvm.memcpy.element.unordered.atomic.*``' is
lowered to a call to the symbol ``__llvm_memcpy_element_unordered_atomic_*``, where '*'
is replaced with the actual element size. See :ref:`RewriteStatepointsForGC intrinsic
lowering <RewriteStatepointsForGC_intrinsic_lowering>` for details on GC specific
lowering.

The optimizer is allowed to inline the memory copy when it's profitable to do so.
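
A minimal sketch of a call that copies 16 bytes as four 4-byte elements. It
assumes that both buffers are 4-byte aligned, that they do not overlap, and
that the target supports 4-byte unordered atomic accesses. The function name
``@copy_four_words`` is hypothetical.

.. code-block:: llvm

    declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8*, i8*, i32, i32)

    define void @copy_four_words(i8* %dst, i8* %src) {
      ; len = 16 is a positive multiple of element_size = 4, and both
      ; pointers carry an align attribute no smaller than element_size.
      call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(
          i8* align 4 %dst, i8* align 4 %src, i32 16, i32 4)
      ret void
    }
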
22982 '``llvm.memmove.element.unordered.atomic``' Intrinsic
22983 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use
``llvm.memmove.element.unordered.atomic`` on any integer bit width and for
different address spaces. Not all targets support all bit widths, however.
declare void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i32(i8* <dest>,
                                                                  i8* <src>,
                                                                  i32 <len>,
                                                                  i32 <element_size>)
declare void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i64(i8* <dest>,
                                                                  i8* <src>,
                                                                  i64 <len>,
                                                                  i32 <element_size>)
23006 The '``llvm.memmove.element.unordered.atomic.*``' intrinsic is a specialization
23007 of the '``llvm.memmove.*``' intrinsic. It differs in that the ``dest`` and
23008 ``src`` are treated as arrays with elements that are exactly ``element_size``
23009 bytes, and the copy between buffers uses a sequence of
23010 :ref:`unordered atomic <ordering>` load/store operations that are a positive
23011 integer multiple of the ``element_size`` in size.
23016 The first three arguments are the same as they are in the
23017 :ref:`@llvm.memmove <int_memmove>` intrinsic, with the added constraint that
23018 ``len`` is required to be a positive integer multiple of the ``element_size``.
23019 If ``len`` is not a positive integer multiple of ``element_size``, then the
23020 behaviour of the intrinsic is undefined.
23022 ``element_size`` must be a compile-time constant positive power of two no
23023 greater than a target-specific atomic access size limit.
For each of the input pointers, the ``align`` parameter attribute must be
specified. It must be a power of two no less than the ``element_size``. The caller
guarantees that both the source and destination pointers are aligned to that
boundary.
23033 The '``llvm.memmove.element.unordered.atomic.*``' intrinsic copies ``len`` bytes
23034 of memory from the source location to the destination location. These locations
23035 are allowed to overlap. The memory copy is performed as a sequence of load/store
23036 operations where each access is guaranteed to be a multiple of ``element_size``
23037 bytes wide and aligned at an ``element_size`` boundary.
The order of the copy is unspecified. The same value may be read from the source
buffer many times, but only one write is issued to the destination buffer per
element. It is well defined to have concurrent reads and writes to both source
and destination provided those reads and writes are unordered atomic when
specified.
This intrinsic does not provide any additional ordering guarantees over those
provided by a set of unordered loads from the source location and stores to the
destination.
In the most general case, a call to
'``llvm.memmove.element.unordered.atomic.*``' is lowered to a call to the symbol
``__llvm_memmove_element_unordered_atomic_*``, where '*' is replaced with the
actual element size. See :ref:`RewriteStatepointsForGC intrinsic lowering
<RewriteStatepointsForGC_intrinsic_lowering>` for details on GC specific
lowering.
23059 The optimizer is allowed to inline the memory copy when it's profitable to do so.
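
A minimal sketch of an overlapping move, which this intrinsic permits. It
shifts 32 bytes up by one 8-byte element within the same buffer. It assumes
that ``%buf`` is 8-byte aligned and at least 40 bytes long, and that the target
supports 8-byte unordered atomic accesses. The function name ``@shift_up`` is
hypothetical.

.. code-block:: llvm

    declare void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i32(i8*, i8*, i32, i32)

    define void @shift_up(i8* %buf) {
      ; The destination starts 8 bytes into the buffer, so the source range
      ; [buf, buf+32) and the destination range [buf+8, buf+40) overlap.
      %dst = getelementptr inbounds i8, i8* %buf, i64 8
      call void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i32(
          i8* align 8 %dst, i8* align 8 %buf, i32 32, i32 8)
      ret void
    }
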
23061 .. _int_memset_element_unordered_atomic:
23063 '``llvm.memset.element.unordered.atomic``' Intrinsic
23064 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an overloaded intrinsic. You can use ``llvm.memset.element.unordered.atomic`` on
any integer bit width and for different address spaces. Not all targets
support all bit widths, however.
declare void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* <dest>,
                                                            i8 <value>,
                                                            i32 <len>,
                                                            i32 <element_size>)
declare void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* <dest>,
                                                            i8 <value>,
                                                            i64 <len>,
                                                            i32 <element_size>)
The '``llvm.memset.element.unordered.atomic.*``' intrinsic is a specialization of the
'``llvm.memset.*``' intrinsic. It differs in that the ``dest`` is treated as an array
with elements that are exactly ``element_size`` bytes, and the assignment to that array
uses a sequence of :ref:`unordered atomic <ordering>` store operations
that are a positive integer multiple of the ``element_size`` in size.
23096 The first three arguments are the same as they are in the :ref:`@llvm.memset <int_memset>`
23097 intrinsic, with the added constraint that ``len`` is required to be a positive integer
23098 multiple of the ``element_size``. If ``len`` is not a positive integer multiple of
23099 ``element_size``, then the behaviour of the intrinsic is undefined.
``element_size`` must be a compile-time constant positive power of two no greater than
a target-specific atomic access size limit.

The ``dest`` input pointer must have the ``align`` parameter attribute specified. It
must be a power of two no less than the ``element_size``. The caller guarantees that
the destination pointer is aligned to that boundary.
23111 The '``llvm.memset.element.unordered.atomic.*``' intrinsic sets the ``len`` bytes of
23112 memory starting at the destination location to the given ``value``. The memory is
23113 set with a sequence of store operations where each access is guaranteed to be a
23114 multiple of ``element_size`` bytes wide and aligned at an ``element_size`` boundary.
The order of the assignment is unspecified. Only one write is issued to the
destination buffer per element. It is well defined to have concurrent reads and
writes to the destination provided those reads and writes are unordered atomic
when specified.
23121 This intrinsic does not provide any additional ordering guarantees over those
23122 provided by a set of unordered stores to the destination.
In the most general case, a call to '``llvm.memset.element.unordered.atomic.*``' is
lowered to a call to the symbol ``__llvm_memset_element_unordered_atomic_*``, where '*'
is replaced with the actual element size.
23131 The optimizer is allowed to inline the memory assignment when it's profitable to do so.
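
A minimal sketch that zeroes 64 bytes as eight 8-byte elements. It assumes
that ``%dst`` is 8-byte aligned and that the target supports 8-byte unordered
atomic stores. The function name ``@zero_eight_words`` is hypothetical.

.. code-block:: llvm

    declare void @llvm.memset.element.unordered.atomic.p0i8.i32(i8*, i8, i32, i32)

    define void @zero_eight_words(i8* %dst) {
      ; value = 0, len = 64 bytes, element_size = 8 bytes.
      call void @llvm.memset.element.unordered.atomic.p0i8.i32(
          i8* align 8 %dst, i8 0, i32 64, i32 8)
      ret void
    }
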
23133 Objective-C ARC Runtime Intrinsics
23134 ----------------------------------
23136 LLVM provides intrinsics that lower to Objective-C ARC runtime entry points.
23137 LLVM is aware of the semantics of these functions, and optimizes based on that
23138 knowledge. You can read more about the details of Objective-C ARC `here
23139 <https://clang.llvm.org/docs/AutomaticReferenceCounting.html>`_.
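
As an illustrative sketch, a balanced retain/release pair around a use of an
object might look like the following in IR. ``@use`` is a hypothetical external
function and ``@with_object`` is an assumed name. Whether the optimizer can
remove such a pair depends on what it can prove about the code in between.

.. code-block:: llvm

    declare i8* @llvm.objc.retain(i8*)
    declare void @llvm.objc.release(i8*)
    declare void @use(i8*)              ; hypothetical consumer of the object

    define void @with_object(i8* %obj) {
      ; Take a +1 reference for the duration of the use.
      %r = call i8* @llvm.objc.retain(i8* %obj)
      call void @use(i8* %r)
      ; Balance the retain above.
      call void @llvm.objc.release(i8* %r)
      ret void
    }
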
23141 '``llvm.objc.autorelease``' Intrinsic
23142 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23148 declare i8* @llvm.objc.autorelease(i8*)
23153 Lowers to a call to `objc_autorelease <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autorelease>`_.
23155 '``llvm.objc.autoreleasePoolPop``' Intrinsic
23156 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23162 declare void @llvm.objc.autoreleasePoolPop(i8*)
23167 Lowers to a call to `objc_autoreleasePoolPop <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-autoreleasepoolpop-void-pool>`_.
23169 '``llvm.objc.autoreleasePoolPush``' Intrinsic
23170 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23176 declare i8* @llvm.objc.autoreleasePoolPush()
23181 Lowers to a call to `objc_autoreleasePoolPush <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-autoreleasepoolpush-void>`_.
23183 '``llvm.objc.autoreleaseReturnValue``' Intrinsic
23184 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23190 declare i8* @llvm.objc.autoreleaseReturnValue(i8*)
23195 Lowers to a call to `objc_autoreleaseReturnValue <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue>`_.
23197 '``llvm.objc.copyWeak``' Intrinsic
23198 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23204 declare void @llvm.objc.copyWeak(i8**, i8**)
23209 Lowers to a call to `objc_copyWeak <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-copyweak-id-dest-id-src>`_.
23211 '``llvm.objc.destroyWeak``' Intrinsic
23212 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23218 declare void @llvm.objc.destroyWeak(i8**)
23223 Lowers to a call to `objc_destroyWeak <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-destroyweak-id-object>`_.
23225 '``llvm.objc.initWeak``' Intrinsic
23226 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23232 declare i8* @llvm.objc.initWeak(i8**, i8*)
23237 Lowers to a call to `objc_initWeak <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-initweak>`_.
23239 '``llvm.objc.loadWeak``' Intrinsic
23240 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23246 declare i8* @llvm.objc.loadWeak(i8**)
23251 Lowers to a call to `objc_loadWeak <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-loadweak>`_.
23253 '``llvm.objc.loadWeakRetained``' Intrinsic
23254 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23260 declare i8* @llvm.objc.loadWeakRetained(i8**)
23265 Lowers to a call to `objc_loadWeakRetained <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-loadweakretained>`_.
23267 '``llvm.objc.moveWeak``' Intrinsic
23268 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23274 declare void @llvm.objc.moveWeak(i8**, i8**)
23279 Lowers to a call to `objc_moveWeak <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-moveweak-id-dest-id-src>`_.
23281 '``llvm.objc.release``' Intrinsic
23282 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23288 declare void @llvm.objc.release(i8*)
23293 Lowers to a call to `objc_release <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-release-id-value>`_.
23295 '``llvm.objc.retain``' Intrinsic
23296 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23302 declare i8* @llvm.objc.retain(i8*)
23307 Lowers to a call to `objc_retain <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retain>`_.
23309 '``llvm.objc.retainAutorelease``' Intrinsic
23310 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23316 declare i8* @llvm.objc.retainAutorelease(i8*)
23321 Lowers to a call to `objc_retainAutorelease <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retainautorelease>`_.
23323 '``llvm.objc.retainAutoreleaseReturnValue``' Intrinsic
23324 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23330 declare i8* @llvm.objc.retainAutoreleaseReturnValue(i8*)
23335 Lowers to a call to `objc_retainAutoreleaseReturnValue <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retainautoreleasereturnvalue>`_.
23337 '``llvm.objc.retainAutoreleasedReturnValue``' Intrinsic
23338 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23344 declare i8* @llvm.objc.retainAutoreleasedReturnValue(i8*)
23349 Lowers to a call to `objc_retainAutoreleasedReturnValue <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retainautoreleasedreturnvalue>`_.
23351 '``llvm.objc.retainBlock``' Intrinsic
23352 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23358 declare i8* @llvm.objc.retainBlock(i8*)
23363 Lowers to a call to `objc_retainBlock <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retainblock>`_.
23365 '``llvm.objc.storeStrong``' Intrinsic
23366 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23372 declare void @llvm.objc.storeStrong(i8**, i8*)
23377 Lowers to a call to `objc_storeStrong <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#void-objc-storestrong-id-object-id-value>`_.
23379 '``llvm.objc.storeWeak``' Intrinsic
23380 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
23386 declare i8* @llvm.objc.storeWeak(i8**, i8*)
23391 Lowers to a call to `objc_storeWeak <https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-storeweak>`_.
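
The weak-reference intrinsics above are typically used together. The following
is a sketch only, with the hypothetical function name ``@weak_roundtrip``: it
initializes a weak slot, loads a retained strong reference from it, releases
that reference, and then destroys the slot.

.. code-block:: llvm

    declare i8* @llvm.objc.initWeak(i8**, i8*)
    declare i8* @llvm.objc.loadWeakRetained(i8**)
    declare void @llvm.objc.release(i8*)
    declare void @llvm.objc.destroyWeak(i8**)

    define void @weak_roundtrip(i8* %obj) {
      %slot = alloca i8*
      ; Register %slot as a weak reference to %obj.
      %w = call i8* @llvm.objc.initWeak(i8** %slot, i8* %obj)
      ; Obtain a +1 strong reference (or null if the object is gone).
      %strong = call i8* @llvm.objc.loadWeakRetained(i8** %slot)
      call void @llvm.objc.release(i8* %strong)
      ; Unregister the weak slot before it goes out of scope.
      call void @llvm.objc.destroyWeak(i8** %slot)
      ret void
    }
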
Preserving Debug Information Intrinsics
---------------------------------------
These intrinsics are used to carry certain debuginfo together with
IR-level operations. For example, it may be desirable to
know the structure/union name and the original user-level field
indices. Such information is lost in the IR GetElementPtr instruction
since the IR types are different from debuginfo types and unions
are converted to structs in IR.
23403 '``llvm.preserve.array.access.index``' Intrinsic
23404 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
declare <ret_type>
@llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(<type> base,
                                                                     i32 dim,
                                                                     i32 index)
The '``llvm.preserve.array.access.index``' intrinsic returns the getelementptr address
based on array base ``base``, array dimension ``dim`` and the last access index ``index``
into the array. The return type ``ret_type`` is a pointer type to the array element.
The array ``dim`` and ``index`` are preserved, which is more robust than a
getelementptr instruction, which may be subject to compiler transformation.
The ``llvm.preserve.access.index`` type of metadata is attached to this call instruction
to provide array or pointer debuginfo type.
The metadata is a ``DICompositeType`` or ``DIDerivedType`` representing the
debuginfo version of ``type``.
23431 The ``base`` is the array base address. The ``dim`` is the array dimension.
23432 The ``base`` is a pointer if ``dim`` equals 0.
23433 The ``index`` is the last access index into the array or pointer.
23435 The ``base`` argument must be annotated with an :ref:`elementtype
23436 <attr_elementtype>` attribute at the call-site. This attribute specifies the
23437 getelementptr element type.
The '``llvm.preserve.array.access.index``' intrinsic produces the same result
as a getelementptr with base ``base`` and access operands ``{dim's 0's, index}``,
that is, ``dim`` zero indices followed by ``index``.
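
The equivalence can be sketched for a one-dimensional array. The function
names are hypothetical, and the ``!llvm.preserve.access.index`` metadata that
a frontend would attach to the call is omitted here for brevity. With
``dim`` equal to 1 and ``index`` equal to 3, the access operands are
``{0, 3}``.

.. code-block:: llvm

    declare i32* @llvm.preserve.array.access.index.p0i32.p0a10i32([10 x i32]*, i32, i32)

    define i32* @elem3_preserved([10 x i32]* %a) {
      ; dim = 1, index = 3; elementtype gives the getelementptr element type.
      %p = call i32* @llvm.preserve.array.access.index.p0i32.p0a10i32(
          [10 x i32]* elementtype([10 x i32]) %a, i32 1, i32 3)
      ret i32* %p
    }

    define i32* @elem3_gep([10 x i32]* %a) {
      ; The same address written as a plain getelementptr.
      %p = getelementptr [10 x i32], [10 x i32]* %a, i32 0, i32 3
      ret i32* %p
    }
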
23445 '``llvm.preserve.union.access.index``' Intrinsic
23446 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
declare <type>
@llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(<type> base,
                                                                  i32 di_index)
23459 The '``llvm.preserve.union.access.index``' intrinsic carries the debuginfo field index
23460 ``di_index`` and returns the ``base`` address.
23461 The ``llvm.preserve.access.index`` type of metadata is attached to this call instruction
23462 to provide union debuginfo type.
23463 The metadata is a ``DICompositeType`` representing the debuginfo version of ``type``.
23464 The return type ``type`` is the same as the ``base`` type.
23469 The ``base`` is the union base address. The ``di_index`` is the field index in debuginfo.
23474 The '``llvm.preserve.union.access.index``' intrinsic returns the ``base`` address.
23476 '``llvm.preserve.struct.access.index``' Intrinsic
23477 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
declare <ret_type>
@llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(<type> base,
                                                           i32 gep_index,
                                                           i32 di_index)
23491 The '``llvm.preserve.struct.access.index``' intrinsic returns the getelementptr address
23492 based on struct base ``base`` and IR struct member index ``gep_index``.
23493 The ``llvm.preserve.access.index`` type of metadata is attached to this call instruction
23494 to provide struct debuginfo type.
23495 The metadata is a ``DICompositeType`` representing the debuginfo version of ``type``.
23496 The return type ``ret_type`` is a pointer type to the structure member.
23501 The ``base`` is the structure base address. The ``gep_index`` is the struct member index
23502 based on IR structures. The ``di_index`` is the struct member index based on debuginfo.
23504 The ``base`` argument must be annotated with an :ref:`elementtype
23505 <attr_elementtype>` attribute at the call-site. This attribute specifies the
23506 getelementptr element type.
23511 The '``llvm.preserve.struct.access.index``' intrinsic produces the same result
23512 as a getelementptr with base ``base`` and access operands ``{0, gep_index}``.
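
A minimal sketch for a two-field struct. The type ``%struct.s`` and the
function names are assumptions, and the ``!llvm.preserve.access.index``
metadata that a frontend would attach to the call is omitted for brevity. Here
the IR member index and the debuginfo member index happen to coincide.

.. code-block:: llvm

    %struct.s = type { i32, i32 }

    declare i32* @llvm.preserve.struct.access.index.p0i32.p0s_struct.ss(%struct.s*, i32, i32)

    define i32* @second_field_preserved(%struct.s* %p) {
      ; gep_index = 1 and di_index = 1 both select the second member.
      %f = call i32* @llvm.preserve.struct.access.index.p0i32.p0s_struct.ss(
          %struct.s* elementtype(%struct.s) %p, i32 1, i32 1)
      ret i32* %f
    }

    define i32* @second_field_gep(%struct.s* %p) {
      ; The same address written as a plain getelementptr with operands {0, 1}.
      %f = getelementptr %struct.s, %struct.s* %p, i32 0, i32 1
      ret i32* %f
    }
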