1 ===========================
2 QBE Intermediate Language
3 ===========================
10 1. <@ Basic Concepts >
18 3. <@ Constants and Vals >
21 * <@ Aggregate Types >
28 * <@ Arithmetic and Bits >
36 8. <@ Instructions Index >
41 The intermediate language (IL) is a higher-level language
42 than the machine's assembly language. It smoothes most
43 of the irregularities of the underlying hardware and
44 allows an infinite number of temporaries to be used.
45 This higher abstraction level lets frontend programmers
46 focus on language design issues.
51 The intermediate language is provided to QBE as text.
52 Usually, one file is generated per each compilation unit from
53 the frontend input language. An IL file is a sequence of
54 <@ Definitions > for data, functions, and types. Once
55 processed by QBE, the resulting file can be assembled and
56 linked using a standard toolchain (e.g., GNU binutils).
58 Here is a complete "Hello World" IL file which defines a
59 function that prints to the screen. Since the string is
60 not a first class object (only the pointer is) it is
61 defined outside the function's body. Comments start with
62 a # character and finish with the end of the line.
64 # Define the string constant.
65 data $str = { b "hello world", b 0 }
67 export function w $main() {
69 # Call the puts function with $str as argument.
70 %r =w call $puts(l $str)
74 If you have read the LLVM language reference, you might
75 recognize the example above. In comparison, QBE makes a
76 much lighter use of types and the syntax is terser.
81 The language syntax is vaporously described in the sections
82 below using BNF syntax. The different BNF constructs used
85 * Keywords are enclosed between quotes;
86 * `... | ...` expresses alternatives;
87 * `( ... )` groups syntax;
88 * `[ ... ]` marks the nested syntax as optional;
89 * `( ... ),` designates a comma-separated list of the
91 * `...*` and `...+` are used for arbitrary and
92 at-least-once repetition respectively.
97 The intermediate language makes heavy use of sigils, all
98 user-defined names are prefixed with a sigil. This is
99 to avoid keyword conflicts, and also to quickly spot the
100 scope and nature of identifiers.
102 * `:` is for user-defined <@ Aggregate Types>
103 * `$` is for globals (represented by a pointer)
104 * `%` is for function-scope temporaries
105 * `@` is for block labels
107 In this BNF syntax, we use `?IDENT` to designate an identifier
108 starting with the sigil `?`.
116 Individual tokens in IL files must be separated by one or
117 more spacing characters. Both spaces and tabs are recognized
118 as spacing characters. In data and type definitions, newlines
119 may also be used as spaces to prevent overly long lines. When
120 exactly one of two consecutive tokens is a symbol (for example
121 `,` or `=` or `{`), spacing may be omitted.
130 BASETY := 'w' | 'l' | 's' | 'd' # Base types
131 EXTTY := BASETY | 'b' | 'h' # Extended types
133 The IL makes minimal use of types. By design, the types
134 used are restricted to what is necessary for unambiguous
135 compilation to machine code and C interfacing. Unlike LLVM,
136 QBE is not using types as a means to safety; they are only
137 here for semantic purposes.
139 The four base types are `w` (word), `l` (long), `s` (single),
140 and `d` (double), they stand respectively for 32-bit and
141 64-bit integers, and 32-bit and 64-bit floating-point numbers.
142 There are no pointer types available; pointers are typed
143 by an integer type sufficiently wide to represent all memory
144 addresses (e.g., `l` on 64-bit architectures). Temporaries
145 in the IL can only have a base type.
147 Extended types contain base types plus `b` (byte) and `h`
148 (half word), respectively for 8-bit and 16-bit integers.
149 They are used in <@ Aggregate Types> and <@ Data> definitions.
151 For C interfacing, the IL also provides user-defined aggregate
152 types as well as signed and unsigned variants of the sub-word
153 extended types. Read more about these types in the
154 <@ Aggregate Types > and <@ Functions > sections.
159 The IL has a minimal subtyping feature, for integer types only.
160 Any value of type `l` can be used in a `w` context. In that
161 case, only the 32 least significant bits of the word value
164 Make note that it is the opposite of the usual subtyping on
165 integers (in C, we can safely use an `int` where a `long`
166 is expected). A long value cannot be used in word context.
167 The rationale is that a word can be signed or unsigned, so
168 extending it to a long could be done in two ways, either
169 by zero-extension, or by sign-extension.
171 - 3. Constants and Vals
172 -----------------------
176 ['-'] NUMBER # Decimal integer
177 | 's_' FP # Single-precision float
178 | 'd_' FP # Double-precision float
179 | $IDENT # Global symbol
183 | 'thread' $IDENT # Thread-local symbol
189 Constants come in two kinds: compile-time constants and
190 dynamic constants. Dynamic constants include compile-time
191 constants and other symbol variants that are only known at
192 program-load time or execution time. Consequently, dynamic
193 constants can only occur in function bodies.
195 The representation of integers is two's complement.
196 Floating-point numbers are represented using the
197 single-precision and double-precision formats of the
200 Constants specify a sequence of bits and are untyped.
201 They are always parsed as 64-bit blobs. Depending on
202 the context surrounding a constant, only some of its
203 bits are used. For example, in the program below, the
204 two variables defined have the same value since the first
205 operand of the subtraction is a word (32-bit) context.
208 %y =w sub 4294967295, 0
210 Because specifying floating-point constants by their bits
211 makes the code less readable, syntactic sugar is provided
212 to express them. Standard scientific notation is prefixed
213 with `s_` and `d_` for single and double precision numbers
214 respectively. Once again, the following example defines twice
215 the same double-precision constant.
218 %y =d add d_0, -4616189618054758400
220 Global symbols can also be used directly as constants;
221 they will be resolved and turned into actual numeric
222 constants by the linker.
224 When the `thread` keyword prefixes a symbol name, the
225 symbol's numeric value is resolved at runtime in the
226 thread-local storage.
228 Vals are used as arguments in regular, phi, and jump
229 instructions within function definitions. They are
230 either constants or function-scope temporaries.
239 | 'section' SECNAME [NL]
240 | 'section' SECNAME SECFLAGS [NL]
242 SECNAME := '"' .... '"'
243 SECFLAGS := '"' .... '"'
245 Function and data definitions (see below) can specify
246 linkage information to be passed to the assembler and
247 eventually to the linker.
249 The `export` linkage flag marks the defined item as
250 visible outside the current file's scope. If absent,
251 the symbol can only be referred to locally. Functions
252 compiled by QBE and called from C need to be exported.
254 The `thread` linkage flag can only qualify data
255 definitions. It mandates that the object defined is
256 stored in thread-local storage. Each time a runtime
257 thread starts, the supporting platform runtime is in
258 charge of making a new copy of the object for the
259 fresh thread. Objects in thread-local storage must
260 be accessed using the `thread $IDENT` syntax, as
261 specified in the <@ Constants and Vals > section.
263 A `section` flag can be specified to tell the linker to
264 put the defined item in a certain section. The use of
265 the section flag is platform dependent and we refer the
266 user to the documentation of their assembler and linker
267 for relevant information.
269 section ".init_array"
270 data $.init.f = { l $f }
272 The section flag can be used to add function pointers to
273 a global initialization list, as depicted above. Note
274 that some platforms provide a BSS section that can be
275 used to minimize the footprint of uniformly zeroed data.
276 When this section is available, QBE will automatically
277 make use of it and no section flag is required.
279 The section and export linkage flags should each appear
280 at most once in a definition. If multiple occurrences
281 are present, QBE is free to use any.
286 Definitions are the essential components of an IL file.
287 They can define three types of objects: aggregate types,
288 data, and functions. Aggregate types are never exported
289 and do not compile to any code. Data and function
290 definitions have file scope and are mutually recursive
291 (even across IL files). Their visibility can be controlled
300 'type' :IDENT '=' ['align' NUMBER]
305 'type' :IDENT '=' ['align' NUMBER]
314 'type' :IDENT '=' 'align' NUMBER '{' NUMBER '}'
316 SUBTY := EXTTY | :IDENT
318 Aggregate type definitions start with the `type` keyword.
319 They have file scope, but types must be defined before being
320 referenced. The inner structure of a type is expressed by a
321 comma-separated list of types enclosed in curly braces.
323 type :fourfloats = { s, s, d, d }
325 For ease of IL generation, a trailing comma is tolerated by
326 the parser. In case many items of the same type are
327 sequenced (like in a C array), the shorter array syntax
330 type :abyteandmanywords = { b, w 100 }
332 By default, the alignment of an aggregate type is the
333 maximum alignment of its members. The alignment can be
334 explicitly specified by the programmer.
336 type :cryptovector = align 16 { w 4 }
338 Union types allow the same chunk of memory to be used with
339 different layouts. They are defined by enclosing multiple
340 regular aggregate type bodies in a pair of curly braces.
341 Size and alignment of union types are set to the maximum size
342 and alignment of each variation or, in the case of alignment,
343 can be explicitly specified.
345 type :un9 = { { b } { s } }
347 Opaque types are used when the inner structure of an
348 aggregate cannot be specified; the alignment for opaque
349 types is mandatory. They are defined simply by enclosing
350 their size between curly braces.
352 type :opaque = align 16 { 32 }
360 'data' $IDENT '=' ['align' NUMBER]
367 $IDENT ['+' NUMBER] # Symbol and offset
368 | '"' ... '"' # String
371 Data definitions express objects that will be emitted in the
372 compiled file. Their visibility and location in the compiled
373 artifact are controlled with linkage flags described in the
374 <@ Linkage > section.
376 They define a global identifier (starting with the sigil
377 `$`), that will contain a pointer to the object specified
380 Objects are described by a sequence of fields that start with
381 a type letter. This letter can either be an extended type,
382 or the `z` letter. If the letter used is an extended type,
383 the data item following specifies the bits to be stored in
384 the field. When several data items follow a letter, they
385 initialize multiple fields of the same size.
387 The members of a struct will be packed. This means that
388 padding has to be emitted by the frontend when necessary.
389 Alignment of the whole data objects can be manually specified,
390 and when no alignment is provided, the maximum alignment from
391 the platform is used.
393 When the `z` letter is used the number following indicates
394 the size of the field; the contents of the field are zero
395 initialized. It can be used to add padding between fields
396 or zero-initialize big arrays.
398 Here are various examples of data definitions.
400 # Three 32-bit values 1, 2, and 3
401 # followed by a 0 byte.
402 data $a = { w 1 2 3, b 0 }
404 # A thousand bytes 0 initialized.
407 # An object containing two 64-bit
408 # fields, one with all bits sets and the
409 # other containing a pointer to the
411 data $c = { l -1, l $c }
419 'function' [ABITY] $IDENT '(' (PARAM), ')' [NL]
425 ABITY %IDENT # Regular parameter
426 | 'env' %IDENT # Environment parameter (first)
427 | '...' # Variadic marker (last)
429 SUBWTY := 'sb' | 'ub' | 'sh' | 'uh' # Sub-word types
430 ABITY := BASETY | SUBWTY | :IDENT
432 Function definitions contain the actual code to emit in
433 the compiled file. They define a global symbol that
434 contains a pointer to the function code. This pointer
435 can be used in `call` instructions or stored in memory.
437 The type given right before the function name is the
438 return type of the function. All return values of this
439 function must have this return type. If the return
440 type is missing, the function must not return any value.
442 The parameter list is a comma separated list of distinct
443 temporary names prefixed by types. The types are used
444 to correctly implement C compatibility. When an argument
445 has an aggregate type, a pointer to the aggregate is passed
446 by the caller. In the example below, we have to use a load
447 instruction to get the value of the first (and only)
448 member of the struct.
452 function w $getone(:one %p) {
458 If a function accepts or returns values that are smaller
459 than a word, such as `signed char` or `unsigned short` in C,
460 one of the sub-word type must be used. The sub-word types
461 `sb`, `ub`, `sh`, and `uh` stand, respectively, for signed
462 and unsigned 8-bit values, and signed and unsigned 16-bit
463 values. Parameters associated with a sub-word type of bit
464 width N only have their N least significant bits set and
465 have base type `w`. For example, the function
467 function w $addbyte(w %a, sb %b) {
474 needs to sign-extend its second argument before the
475 addition. Dually, return values with sub-word types do
476 not need to be sign or zero extended.
478 If the parameter list ends with `...`, the function is
479 a variadic function: it can accept a variable number of
480 arguments. To access the extra arguments provided by
481 the caller, use the `vastart` and `vaarg` instructions
482 described in the <@ Variadic > section.
484 Optionally, the parameter list can start with an
485 environment parameter `env %e`. This special parameter is
486 a 64-bit integer temporary (i.e., of type `l`). If the
487 function does not use its environment parameter, callers
488 can safely omit it. This parameter is invisible to a C
489 caller: for example, the function
491 export function w $add(env %e, w %a, w %b) {
497 must be given the C prototype `int add(int, int)`.
498 The intended use of this feature is to pass the
499 environment pointer of closures while retaining a
500 very good compatibility with C. The <@ Call > section
501 explains how to pass an environment parameter.
503 Since global symbols are defined mutually recursive,
504 there is no need for function declarations: a function
505 can be referenced before its definition.
506 Similarly, functions from other modules can be used
507 without previous declaration. All the type information
508 necessary to compile a call is in the instruction itself.
510 The syntax and semantics for the body of functions
511 are described in the <@ Control > section.
516 The IL represents programs as textual transcriptions of
517 control flow graphs. The control flow is serialized as
518 a sequence of blocks of straight-line code which are
519 connected using jump instructions.
526 @IDENT NL # Block label
527 ( PHI NL )* # Phi instructions
528 ( INST NL )* # Regular instructions
529 JUMP NL # Jump or return
531 All blocks have a name that is specified by a label at
532 their beginning. Then follows a sequence of instructions
533 that have "fall-through" flow. Finally one jump terminates
534 the block. The jump can either transfer control to another
535 block of the same function or return; jumps are described
538 The first block in a function must not be the target of
539 any jump in the program. If a jump to the function start
540 is needed, the frontend must insert an empty prelude block
541 at the beginning of the function.
543 When one block jumps to the next block in the IL file,
544 it is not necessary to write the jump instruction, it
545 will be automatically added by the parser. For example
546 the start block in the example below jumps directly
552 %x =w phi @start 100, @loop %x1
564 'jmp' @IDENT # Unconditional
565 | 'jnz' VAL, @IDENT, @IDENT # Conditional
566 | 'ret' [VAL] # Return
567 | 'hlt' # Termination
569 A jump instruction ends every block and transfers the
570 control to another program location. The target of
571 a jump must never be the first block in a function.
572 The three kinds of jumps available are described in
575 1. Unconditional jump.
577 Simply jumps to another block of the same function.
581 When its word argument is non-zero, it jumps to its
582 first label argument; otherwise it jumps to the other
583 label. The argument must be of word type; because of
584 subtyping a long argument can be passed, but only its
585 least significant 32 bits will be compared to 0.
589 Terminates the execution of the current function,
590 optionally returning a value to the caller. The value
591 returned must be of the type given in the function
592 prototype. If the function prototype does not specify
593 a return type, no return value can be used.
595 4. Program termination.
597 Terminates the execution of the program with a
598 target-dependent error. This instruction can be used
599 when it is expected that the execution never reaches
600 the end of the block it closes; for example, after
601 having called a function such as `exit()`.
606 Instructions are the smallest piece of code in the IL, they
607 form the body of <@ Blocks >. The IL uses a three-address
608 code, which means that one instruction computes an operation
609 between two operands and assigns the result to a third one.
611 An instruction has both a name and a return type, this
612 return type is a base type that defines the size of the
613 instruction's result. The type of the arguments can be
614 unambiguously inferred using the instruction name and the
615 return type. For example, for all arithmetic instructions,
616 the type of the arguments is the same as the return type.
617 The two additions below are valid if `%y` is a word or a long
618 (because of <@ Subtyping >).
623 Some instructions, like comparisons and memory loads
624 have operand types that differ from their return types.
625 For instance, two floating points can be compared to give a
626 word result (0 if the comparison succeeds, 1 if it fails).
630 In the example above, both operands have to have single type.
631 This is made explicit by the instruction suffix.
633 The types of instructions are described below using a short
634 type string. A type string specifies all the valid return
635 types an instruction can have, its arity, and the type of
636 its arguments depending on its return type.
638 Type strings begin with acceptable return types, then
639 follows, in parentheses, the possible types for the arguments.
640 If the N-th return type of the type string is used for an
641 instruction, the arguments must use the N-th type listed for
642 them in the type string. When an instruction does not have a
643 return type, the type string only contains the types of the
646 The following abbreviations are used.
648 * `T` stands for `wlsd`
649 * `I` stands for `wl`
650 * `F` stands for `sd`
651 * `m` stands for the type of pointers on the target; on
652 64-bit architectures it is the same as `l`
654 For example, consider the type string `wl(F)`, it mentions
655 that the instruction has only one argument and that if the
656 return type used is long, the argument must be of type double.
658 ~ Arithmetic and Bits
659 ~~~~~~~~~~~~~~~~~~~~~
661 * `add`, `sub`, `div`, `mul` -- `T(T,T)`
663 * `udiv`, `rem`, `urem` -- `I(I,I)`
664 * `or`, `xor`, `and` -- `I(I,I)`
665 * `sar`, `shr`, `shl` -- `I(I,ww)`
667 The base arithmetic instructions in the first bullet are
668 available for all types, integers and floating points.
670 When `div` is used with word or long return type, the
671 arguments are treated as signed. The unsigned integral
672 division is available as `udiv` instruction. When the
673 result of a division is not an integer, it is truncated
676 The signed and unsigned remainder operations are available
677 as `rem` and `urem`. The sign of the remainder is the same
678 as the one of the dividend. Its magnitude is smaller than
679 the divisor one. These two instructions and `udiv` are only
680 available with integer arguments and result.
682 Bitwise OR, AND, and XOR operations are available for both
683 integer types. Logical operations of typical programming
684 languages can be implemented using <@ Comparisons > and
687 Shift instructions `sar`, `shr`, and `shl`, shift right or
688 left their first operand by the amount from the second
689 operand. The shifting amount is taken modulo the size of
690 the result type. Shifting right can either preserve the
691 sign of the value (using `sar`), or fill the newly freed
692 bits with zeroes (using `shr`). Shifting left always
693 fills the freed bits with zeroes.
695 Remark that an arithmetic shift right (`sar`) is only
696 equivalent to a division by a power of two for non-negative
697 numbers. This is because the shift right "truncates"
698 towards minus infinity, while the division truncates
704 * Store instructions.
706 * `stored` -- `(d,m)`
707 * `stores` -- `(s,m)`
708 * `storel` -- `(l,m)`
709 * `storew` -- `(w,m)`
710 * `storeh` -- `(w,m)`
711 * `storeb` -- `(w,m)`
713 Store instructions exist to store a value of any base type
714 and any extended type. Since halfwords and bytes are not
715 first class in the IL, `storeh` and `storeb` take a word
716 as argument. Only the first 16 or 8 bits of this word will
717 be stored in memory at the address specified in the second
725 * `loadsw`, `loaduw` -- `I(mm)`
726 * `loadsh`, `loaduh` -- `I(mm)`
727 * `loadsb`, `loadub` -- `I(mm)`
729 For types smaller than long, two variants of the load
730 instruction are available: one will sign extend the loaded
731 value, while the other will zero extend it. Note that
732 all loads smaller than long can load to either a long or
735 The two instructions `loadsw` and `loaduw` have the same
736 effect when they are used to define a word temporary.
737 A `loadw` instruction is provided as syntactic sugar for
738 `loadsw` to make explicit that the extension mechanism
743 * `blit` -- `(m,m,w)`
745 The blit instruction copies in-memory data from its
746 first address argument to its second address argument.
747 The third argument is the number of bytes to copy. The
748 source and destination spans are required to be either
749 non-overlapping, or fully overlapping (source address
750 identical to the destination address). The byte count
751 argument must be a nonnegative numeric constant; it
752 cannot be a temporary.
754 One blit instruction may generate a number of
755 instructions proportional to its byte count argument,
756 consequently, it is recommended to keep this argument
757 relatively small. If large copies are necessary, it is
758 preferable that frontends generate calls to a supporting
765 * `alloc16` -- `m(l)`
767 These instructions allocate a chunk of memory on the
768 stack. The number ending the instruction name is the
769 alignment required for the allocated slot. QBE will
770 make sure that the returned address is a multiple of
771 that alignment value.
773 Stack allocation instructions are used, for example,
774 when compiling the C local variables, because their
775 address can be taken. When compiling Fortran,
776 temporaries can be used directly instead, because
777 it is illegal to take the address of a variable.
779 The following example makes use of some of the memory
780 instructions. Pointers are stored in long temporaries.
782 %A0 =l alloc4 8 # stack allocate an array A of 2 words
784 storew 43, %A0 # A[0] <- 43
785 storew 255, %A1 # A[1] <- 255
786 %v1 =w loadw %A0 # %v1 <- A[0] as word
787 %v2 =w loadsb %A1 # %v2 <- A[1] as signed byte
788 %v3 =w add %v1, %v2 # %v3 is 42 here
793 Comparison instructions return an integer value (either a word
794 or a long), and compare values of arbitrary types. The returned
795 value is 1 if the two operands satisfy the comparison
796 relation, or 0 otherwise. The names of comparisons respect
797 a standard naming scheme in three parts.
799 1. All comparisons start with the letter `c`.
801 2. Then comes a comparison type. The following
802 types are available for integer comparisons:
805 * `ne` for inequality
806 * `sle` for signed lower or equal
807 * `slt` for signed lower
808 * `sge` for signed greater or equal
809 * `sgt` for signed greater
810 * `ule` for unsigned lower or equal
811 * `ult` for unsigned lower
812 * `uge` for unsigned greater or equal
813 * `ugt` for unsigned greater
815 Floating point comparisons use one of these types:
818 * `ne` for inequality
819 * `le` for lower or equal
821 * `ge` for greater or equal
823 * `o` for ordered (no operand is a NaN)
824 * `uo` for unordered (at least one operand is a NaN)
826 Because floating point types always have a sign bit,
827 all the comparisons available are signed.
829 3. Finally, the instruction name is terminated with a
830 basic type suffix precising the type of the operands
833 For example, `cod` (`I(dd,dd)`) compares two double-precision
834 floating point numbers and returns 1 if the two floating points
835 are not NaNs, or 0 otherwise. The `csltw` (`I(ww,ww)`)
836 instruction compares two words representing signed numbers and
837 returns 1 when the first argument is smaller than the second one.
842 Conversion operations change the representation of a value,
843 possibly modifying it if the target type cannot hold the value
844 of the source type. Conversions can extend the precision of a
845 temporary (e.g., from signed 8-bit to 32-bit), or convert a
846 floating point into an integer and vice versa.
848 * `extsw`, `extuw` -- `l(w)`
849 * `extsh`, `extuh` -- `I(ww)`
850 * `extsb`, `extub` -- `I(ww)`
862 Extending the precision of a temporary is done using the
863 `ext` family of instructions. Because QBE types do not
864 specify the signedness (like in LLVM), extension instructions
865 exist to sign-extend and zero-extend a value. For example,
866 `extsb` takes a word argument and sign-extends the 8
867 least-significant bits to a full word or long, depending on
870 The instructions `exts` (extend single) and `truncd` (truncate
871 double) are provided to change the precision of a floating
872 point value. When the double argument of `truncd` cannot
873 be represented as a single-precision floating point, it is
874 truncated towards zero.
876 Converting between signed integers and floating points is done
877 using `stosi` (single to signed integer), `stoui` (single to
878 unsigned integer, `dtosi` (double to signed integer), `dtoui`
879 (double to unsigned integer), `swtof` (signed word to float),
880 `uwtof` (unsigned word to float), `sltof` (signed long to
881 float) and `ultof` (unsigned long to float).
883 Because of <@ Subtyping >, there is no need to have an
884 instruction to lower the precision of an integer temporary.
889 The `cast` and `copy` instructions return the bits of their
890 argument verbatim. However a `cast` will change an integer
891 into a floating point of the same width and vice versa.
893 * `cast` -- `wlsd(sdwl)`
896 Casts can be used to make bitwise operations on the
897 representation of floating point numbers. For example
898 the following program will compute the opposite of the
899 single-precision floating point number `%f` into `%rs`.
902 %b1 =w xor 2147483648, %b0 # flip the msb
909 CALL := [%IDENT '=' ABITY] 'call' VAL '(' (ARG), ')'
912 ABITY VAL # Regular argument
913 | 'env' VAL # Environment argument (first)
914 | '...' # Variadic marker
916 SUBWTY := 'sb' | 'ub' | 'sh' | 'uh' # Sub-word types
917 ABITY := BASETY | SUBWTY | :IDENT
919 The call instruction is special in several ways. It is not
920 a three-address instruction and requires the type of all
921 its arguments to be given. Also, the return type can be
922 either a base type or an aggregate type. These specifics
923 are required to compile calls with C compatibility (i.e.,
926 When an aggregate type is used as argument type or return
927 type, the value respectively passed or returned needs to be
928 a pointer to a memory location holding the value. This is
929 because aggregate types are not first-class citizens of
932 Sub-word types are used for arguments and return values
933 of width less than a word. Details on these types are
934 presented in the <@ Functions > section. Arguments with
935 sub-word types need not be sign or zero extended according
936 to their type. Calls with a sub-word return type define
937 a temporary of base type `w` with its most significant bits
940 Unless the called function does not return a value, a
941 return temporary must be specified, even if it is never
944 An environment parameter can be passed as first argument
945 using the `env` keyword. The passed value must be a 64-bit
946 integer. If the called function does not expect an environment
947 parameter, it will be safely discarded. See the <@ Functions >
948 section for more information about environment parameters.
950 When the called function is variadic, there must be a `...`
951 marker separating the named and variadic arguments.
956 The `vastart` and `vaarg` instructions provide a portable
957 way to access the extra parameters of a variadic function.
960 * `vaarg` -- `T(mmmm)`
962 The `vastart` instruction initializes a *variable argument
963 list* used to access the extra parameters of the enclosing
964 variadic function. It is safe to call it multiple times.
966 The `vaarg` instruction fetches the next argument from
967 a variable argument list. It is currently limited to
968 fetching arguments that have a base type. This instruction
969 is essentially effectful: calling it twice in a row will
970 return two consecutive arguments from the argument list.
972 Both instructions take a pointer to a variable argument
973 list as sole argument. The size and alignment of variable
974 argument lists depend on the target used. However, it
975 is possible to conservatively use the maximum size and
976 alignment required by all the targets.
978 type :valist = align 8 { 24 } # For amd64_sysv
979 type :valist = align 8 { 32 } # For arm64
980 type :valist = align 8 { 8 } # For rv64
982 The following example defines a variadic function adding
983 its first three arguments.
985 function s $add3(s %a, ...) {
989 %r =s call $vadd(s %a, l %ap)
993 function s $vadd(s %a, l %ap) {
1006 PHI := %IDENT '=' BASETY 'phi' ( @IDENT VAL ),
1008 First and foremost, phi instructions are NOT necessary when
1009 writing a frontend to QBE. One solution to avoid having to
1010 deal with SSA form is to use stack allocated variables for
1011 all source program variables and perform assignments and
1012 lookups using <@ Memory > operations. This is what LLVM
1015 Another solution is to simply emit code that is not in SSA
1016 form! Contrary to LLVM, QBE is able to fixup programs not
1017 in SSA form without requiring the boilerplate of loading
1018 and storing in memory. For example, the following program
1019 will be correctly compiled by QBE.
1031 Now, if you want to know what phi instructions are and how
1032 to use them in QBE, you can read the following.
1034 Phi instructions are specific to SSA form. In SSA form
1035 values can only be assigned once, without phi instructions,
1036 this requirement is too strong to represent many programs.
1037 For example consider the following C program.
1048 The variable `y` is assigned twice, the solution to
1049 translate it in SSA form is to insert a phi instruction.
1058 %y =w phi @ift 1, @iff 2
1061 Phi instructions return one of their arguments depending
1062 on where the control came from. In the example, `%y` is
1063 set to 1 if the `@ift` branch is taken, or it is set to
1066 An important remark about phi instructions is that QBE
1067 assumes that if a variable is defined by a phi it respects
1068 all the SSA invariants. So it is critical to not use phi
1069 instructions unless you know exactly what you are doing.
1071 - 8. Instructions Index
1072 -----------------------
1074 * <@ Arithmetic and Bits >:
1172 * <@ Cast and Copy > :