1 =====================================
2 Syntax of AMDGPU Instruction Operands
3 =====================================
11 The following notation is used throughout this document:
13 =================== =============================================================================
15 =================== =============================================================================
16 {0..N} Any integer value in the range from 0 to N (inclusive).
17 <x> Syntax and meaning of *x* is explained elsewhere.
18 =================== =============================================================================
20 .. _amdgpu_syn_operands:
30 Vector registers. There are 256 32-bit vector registers.
A sequence of *vector* registers may be used to operate on more than 32 bits of data.
34 Assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* registers.
36 =================================================== ====================================================================
38 =================================================== ====================================================================
39 **v**\<N> A single 32-bit *vector* register.
41 *N* must be a decimal integer number.
42 **v[**\ <N>\ **]** A single 32-bit *vector* register.
44 *N* may be specified as an
45 :ref:`integer number<amdgpu_synid_integer_number>`
46 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
47 **v[**\ <N>:<K>\ **]** A sequence of (\ *K-N+1*\ ) *vector* registers.
49 *N* and *K* may be specified as
50 :ref:`integer numbers<amdgpu_synid_integer_number>`
51 or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
52 **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *vector* registers.
54 Register indices must be specified as decimal integer numbers.
55 =================================================== ====================================================================
57 Note. *N* and *K* must satisfy the following conditions:
62 * *K-N+1* must be equal to 1, 2, 3, 4, 8 or 16.
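
The conditions on *N* and *K* can be checked mechanically. The following Python sketch is illustrative only and is not the assembler's implementation. The helper name ``check_vgpr_range`` is hypothetical, and it encodes only the rules stated in this section: 256 registers and the listed sequence sizes.

```python
VGPR_COUNT = 256                      # "There are 256 32-bit vector registers."
VGPR_SEQ_SIZES = {1, 2, 3, 4, 8, 16}  # sequence sizes listed in this section


def check_vgpr_range(n, k):
    """Return True if v[n:k] satisfies the rules stated in this section."""
    # Both indices must name existing registers, and a size of at least 1
    # implies n <= k.
    if not (0 <= n <= k < VGPR_COUNT):
        return False
    return (k - n + 1) in VGPR_SEQ_SIZES
```

Under these assumptions ``check_vgpr_range(1, 3)`` is accepted (3 registers), while ``check_vgpr_range(0, 4)`` is rejected (5 registers).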
83 Scalar 32-bit registers. The number of available *scalar* registers depends on GPU:
85 ======= ============================
86 GPU Number of *scalar* registers
87 ======= ============================
91 ======= ============================
A sequence of *scalar* registers may be used to operate on more than 32 bits of data.
94 Assembler currently supports sequences of 1, 2, 4, 8 and 16 *scalar* registers.
96 Pairs of *scalar* registers must be even-aligned (the first register must be even).
97 Sequences of 4 and more *scalar* registers must be quad-aligned.
99 ======================================================== ====================================================================
101 ======================================================== ====================================================================
102 **s**\ <N> A single 32-bit *scalar* register.
104 *N* must be a decimal integer number.
105 **s[**\ <N>\ **]** A single 32-bit *scalar* register.
107 *N* may be specified as an
108 :ref:`integer number<amdgpu_synid_integer_number>`
109 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
110 **s[**\ <N>:<K>\ **]** A sequence of (\ *K-N+1*\ ) *scalar* registers.
112 *N* and *K* may be specified as
113 :ref:`integer numbers<amdgpu_synid_integer_number>`
114 or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
115 **[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *scalar* registers.
117 Register indices must be specified as decimal integer numbers.
118 ======================================================== ====================================================================
120 Note. *N* and *K* must satisfy the following conditions:
122 * *N* must be properly aligned based on sequence size.
124 * 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
125 * 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
126 * *K-N+1* must be equal to 1, 2, 4, 8 or 16.
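
The alignment and range rules for *scalar* register sequences can be expressed as a short check. This is a minimal Python sketch under stated assumptions, not the assembler's code. The names ``required_alignment`` and ``check_sgpr_range`` are hypothetical, and *SMAX* is passed in by the caller because it depends on the GPU.

```python
SGPR_SEQ_SIZES = {1, 2, 4, 8, 16}  # sequence sizes listed in this section


def required_alignment(size):
    """Alignment required for the first register of a sequence."""
    if size == 1:
        return 1
    if size == 2:
        return 2  # pairs must be even-aligned
    return 4      # sequences of 4 and more must be quad-aligned


def check_sgpr_range(n, k, smax):
    """Return True if s[n:k] satisfies the rules stated in this section."""
    size = k - n + 1
    if size not in SGPR_SEQ_SIZES:
        return False
    if not (0 <= n < smax and 0 <= k < smax):
        return False
    return n % required_alignment(size) == 0
```

For example, with a placeholder ``smax`` of 100 (not a real GPU value), ``check_sgpr_range(4, 7, 100)`` is accepted, while ``check_sgpr_range(2, 5, 100)`` is rejected because the first register is not a multiple of 4.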
142 Examples of *scalar* registers with an invalid alignment:
149 .. _amdgpu_synid_trap:
154 A set of trap handler registers:
156 * :ref:`ttmp<amdgpu_synid_ttmp>`
157 * :ref:`tba<amdgpu_synid_tba>`
158 * :ref:`tma<amdgpu_synid_tma>`
160 .. _amdgpu_synid_ttmp:
Trap handler temporary scalar registers, 32 bits wide.
The number of available *ttmp* registers depends on the GPU:
168 ======= ===========================
169 GPU Number of *ttmp* registers
170 ======= ===========================
174 ======= ===========================
A sequence of *ttmp* registers may be used to operate on more than 32 bits of data.
177 Assembler currently supports sequences of 1, 2, 4, 8 and 16 *ttmp* registers.
179 Pairs of *ttmp* registers must be even-aligned (the first register must be even).
180 Sequences of 4 and more *ttmp* registers must be quad-aligned.
182 ============================================================= ====================================================================
184 ============================================================= ====================================================================
185 **ttmp**\ <N> A single 32-bit *ttmp* register.
187 *N* must be a decimal integer number.
188 **ttmp[**\ <N>\ **]** A single 32-bit *ttmp* register.
190 *N* may be specified as an
191 :ref:`integer number<amdgpu_synid_integer_number>`
192 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
193 **ttmp[**\ <N>:<K>\ **]** A sequence of (\ *K-N+1*\ ) *ttmp* registers.
195 *N* and *K* may be specified as
196 :ref:`integer numbers<amdgpu_synid_integer_number>`
197 or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
198 **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *ttmp* registers.
200 Register indices must be specified as decimal integer numbers.
201 ============================================================= ====================================================================
203 Note. *N* and *K* must satisfy the following conditions:
205 * *N* must be properly aligned based on sequence size.
207 * 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
208 * 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
209 * *K-N+1* must be equal to 1, 2, 4, 8 or 16.
223 [ttmp4,ttmp5,ttmp6,ttmp7]
225 Examples of *ttmp* registers with an invalid alignment:
232 .. _amdgpu_synid_tba:
Trap base address, 64 bits wide. Holds the pointer to the current trap handler program.
239 ================== ======================================================================= =============
240 Syntax Description Availability
241 ================== ======================================================================= =============
242 tba 64-bit *trap base address* register. GFX7, GFX8
243 [tba] 64-bit *trap base address* register (an alternative syntax). GFX7, GFX8
244 [tba_lo,tba_hi] 64-bit *trap base address* register (an alternative syntax). GFX7, GFX8
245 ================== ======================================================================= =============
247 High and low 32 bits of *trap base address* may be accessed as separate registers:
249 ================== ======================================================================= =============
250 Syntax Description Availability
251 ================== ======================================================================= =============
252 tba_lo Low 32 bits of *trap base address* register. GFX7, GFX8
253 tba_hi High 32 bits of *trap base address* register. GFX7, GFX8
254 [tba_lo] Low 32 bits of *trap base address* register (an alternative syntax). GFX7, GFX8
255 [tba_hi] High 32 bits of *trap base address* register (an alternative syntax). GFX7, GFX8
256 ================== ======================================================================= =============
258 Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9,
259 but *tba* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
261 .. _amdgpu_synid_tma:
Trap memory address, 64 bits wide.
268 ================= ======================================================================= ==================
269 Syntax Description Availability
270 ================= ======================================================================= ==================
271 tma 64-bit *trap memory address* register. GFX7, GFX8
272 [tma] 64-bit *trap memory address* register (an alternative syntax). GFX7, GFX8
273 [tma_lo,tma_hi] 64-bit *trap memory address* register (an alternative syntax). GFX7, GFX8
274 ================= ======================================================================= ==================
276 High and low 32 bits of *trap memory address* may be accessed as separate registers:
278 ================= ======================================================================= ==================
279 Syntax Description Availability
280 ================= ======================================================================= ==================
281 tma_lo Low 32 bits of *trap memory address* register. GFX7, GFX8
282 tma_hi High 32 bits of *trap memory address* register. GFX7, GFX8
283 [tma_lo] Low 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8
284 [tma_hi] High 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8
285 ================= ======================================================================= ==================
287 Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9,
288 but *tma* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
290 .. _amdgpu_synid_flat_scratch:
Flat scratch address, 64 bits wide. Holds the base address of scratch memory.
297 ================================== ================================================================
299 ================================== ================================================================
300 flat_scratch 64-bit *flat scratch* address register.
301 [flat_scratch] 64-bit *flat scratch* address register (an alternative syntax).
302 [flat_scratch_lo,flat_scratch_hi] 64-bit *flat scratch* address register (an alternative syntax).
303 ================================== ================================================================
305 High and low 32 bits of *flat scratch* address may be accessed as separate registers:
307 ========================= =========================================================================
309 ========================= =========================================================================
310 flat_scratch_lo Low 32 bits of *flat scratch* address register.
311 flat_scratch_hi High 32 bits of *flat scratch* address register.
312 [flat_scratch_lo] Low 32 bits of *flat scratch* address register (an alternative syntax).
313 [flat_scratch_hi] High 32 bits of *flat scratch* address register (an alternative syntax).
314 ========================= =========================================================================
316 .. _amdgpu_synid_xnack:
Xnack mask, 64 bits wide. Holds a 64-bit mask indicating which threads
received an *XNACK* due to a vector memory operation.
.. WARNING:: GFX7 does not support the *xnack* feature. Not all GFX8 and GFX9 :ref:`processors<amdgpu-processors>` support the *xnack* feature.
328 ============================== =====================================================
330 ============================== =====================================================
331 xnack_mask 64-bit *xnack mask* register.
332 [xnack_mask] 64-bit *xnack mask* register (an alternative syntax).
333 [xnack_mask_lo,xnack_mask_hi] 64-bit *xnack mask* register (an alternative syntax).
334 ============================== =====================================================
336 High and low 32 bits of *xnack mask* may be accessed as separate registers:
338 ===================== ==============================================================
340 ===================== ==============================================================
341 xnack_mask_lo Low 32 bits of *xnack mask* register.
342 xnack_mask_hi High 32 bits of *xnack mask* register.
343 [xnack_mask_lo] Low 32 bits of *xnack mask* register (an alternative syntax).
344 [xnack_mask_hi] High 32 bits of *xnack mask* register (an alternative syntax).
345 ===================== ==============================================================
347 .. _amdgpu_synid_vcc:
Vector condition code, 64 bits wide. A bit mask with one bit per thread;
353 it holds the result of a vector compare operation.
355 ================ =========================================================================
357 ================ =========================================================================
358 vcc 64-bit *vector condition code* register.
359 [vcc] 64-bit *vector condition code* register (an alternative syntax).
360 [vcc_lo,vcc_hi] 64-bit *vector condition code* register (an alternative syntax).
361 ================ =========================================================================
363 High and low 32 bits of *vector condition code* may be accessed as separate registers:
365 ================ =========================================================================
367 ================ =========================================================================
368 vcc_lo Low 32 bits of *vector condition code* register.
369 vcc_hi High 32 bits of *vector condition code* register.
370 [vcc_lo] Low 32 bits of *vector condition code* register (an alternative syntax).
371 [vcc_hi] High 32 bits of *vector condition code* register (an alternative syntax).
372 ================ =========================================================================
379 A 32-bit memory register. It has various uses,
380 including register indexing and bounds checking.
382 =========== ===================================================
384 =========== ===================================================
385 m0 A 32-bit *memory* register.
386 [m0] A 32-bit *memory* register (an alternative syntax).
387 =========== ===================================================
389 .. _amdgpu_synid_exec:
Execute mask, 64 bits wide. A bit mask with one bit per thread,
395 which is applied to vector instructions and controls which threads execute
396 and which ignore the instruction.
398 ===================== =================================================================
400 ===================== =================================================================
401 exec 64-bit *execute mask* register.
402 [exec] 64-bit *execute mask* register (an alternative syntax).
403 [exec_lo,exec_hi] 64-bit *execute mask* register (an alternative syntax).
404 ===================== =================================================================
406 High and low 32 bits of *execute mask* may be accessed as separate registers:
408 ===================== =================================================================
410 ===================== =================================================================
411 exec_lo Low 32 bits of *execute mask* register.
412 exec_hi High 32 bits of *execute mask* register.
413 [exec_lo] Low 32 bits of *execute mask* register (an alternative syntax).
414 [exec_hi] High 32 bits of *execute mask* register (an alternative syntax).
415 ===================== =================================================================
417 .. _amdgpu_synid_vccz:
A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros.
424 .. WARNING:: This operand is not currently supported by AMDGPU assembler.
426 .. _amdgpu_synid_execz:
431 A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros.
433 .. WARNING:: This operand is not currently supported by AMDGPU assembler.
435 .. _amdgpu_synid_scc:
440 A single bit flag indicating the result of a scalar compare operation.
442 .. WARNING:: This operand is not currently supported by AMDGPU assembler.
447 A special operand which supplies a 32-bit value
448 fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.
450 .. WARNING:: This operand is not currently supported by AMDGPU assembler.
452 .. _amdgpu_synid_constant:
457 A set of integer and floating-point *inline constants*:
459 * :ref:`iconst<amdgpu_synid_iconst>`
460 * :ref:`fconst<amdgpu_synid_fconst>`
462 These operands are encoded as a part of instruction.
464 If a number may be encoded as either
465 a :ref:`literal<amdgpu_synid_literal>` or
466 an :ref:`inline constant<amdgpu_synid_constant>`,
467 assembler selects the latter encoding as more efficient.
469 .. _amdgpu_synid_iconst:
474 An :ref:`integer number<amdgpu_synid_integer_number>`
475 encoded as an *inline constant*.
477 Only a small fraction of integer numbers may be encoded as *inline constants*.
478 They are enumerated in the table below.
479 Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
481 Integer *inline constants* are converted to
482 :ref:`expected operand type<amdgpu_syn_instruction_type>`
483 as described :ref:`here<amdgpu_synid_int_const_conv>`.
485 ================================== ====================================
487 ================================== ====================================
488 {0..64} Positive integer inline constants.
489 {-16..-1} Negative integer inline constants.
490 ================================== ====================================
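
The choice between an integer *inline constant* and a literal follows directly from the ranges in the table. The following Python sketch illustrates that choice only. The function name ``select_integer_encoding`` is hypothetical and the returned strings are labels for illustration, not assembler output.

```python
def select_integer_encoding(value):
    """Classify an integer by the encoding ranges given in the table.

    {0..64} and {-16..-1} are inline constants; any other integer
    has to be encoded as a literal.
    """
    if -16 <= value <= 64:
        return "inline constant"
    return "literal"
```

Under this sketch ``select_integer_encoding(64)`` yields ``"inline constant"`` and ``select_integer_encoding(65)`` yields ``"literal"``.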
492 .. WARNING:: GFX7 does not support inline constants for *f16* operands.
494 There are also symbolic inline constants which provide read-only access to H/W registers.
496 .. WARNING:: These inline constants are not currently supported by AMDGPU assembler.
500 ======================== ================================================ =============
501 Syntax Note Availability
502 ======================== ================================================ =============
503 shared_base Base address of shared memory region. GFX9
504 shared_limit Address of the end of shared memory region. GFX9
505 private_base Base address of private memory region. GFX9
506 private_limit Address of the end of private memory region. GFX9
507 pops_exiting_wave_id A dedicated counter for POPS. GFX9
508 ======================== ================================================ =============
510 .. _amdgpu_synid_fconst:
515 A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
516 encoded as an *inline constant*.
518 Only a small fraction of floating-point numbers may be encoded as *inline constants*.
519 They are enumerated in the table below.
520 Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
522 Floating-point *inline constants* are converted to
523 :ref:`expected operand type<amdgpu_syn_instruction_type>`
524 as described :ref:`here<amdgpu_synid_fp_const_conv>`.
===================== ===================================================== ==================
Value                 Note                                                  Availability
===================== ===================================================== ==================
0.0                   The same as integer constant 0.                       All GPUs
0.5                   Floating-point constant 0.5                           All GPUs
1.0                   Floating-point constant 1.0                           All GPUs
2.0                   Floating-point constant 2.0                           All GPUs
4.0                   Floating-point constant 4.0                           All GPUs
-0.5                  Floating-point constant -0.5                          All GPUs
-1.0                  Floating-point constant -1.0                          All GPUs
-2.0                  Floating-point constant -2.0                          All GPUs
-4.0                  Floating-point constant -4.0                          All GPUs
0.1592                1.0/(2.0*pi). Use only for 16-bit operands.           GFX8, GFX9
0.15915494            1.0/(2.0*pi). Use only for 16- and 32-bit operands.   GFX8, GFX9
0.15915494309189532   1.0/(2.0*pi).                                         GFX8, GFX9
===================== ===================================================== ==================
543 .. WARNING:: GFX7 does not support inline constants for *f16* operands.
545 .. _amdgpu_synid_literal:
A literal is specified as a 64-bit value and encoded as a separate 32-bit dword in the instruction stream.
552 If a number may be encoded as either
553 a :ref:`literal<amdgpu_synid_literal>` or
554 an :ref:`inline constant<amdgpu_synid_constant>`,
555 assembler selects the latter encoding as more efficient.
557 Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
558 :ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or
559 :ref:`expressions<amdgpu_synid_expression>`
560 (expressions are currently supported for 32-bit operands only).
562 A 64-bit literal value is converted by assembler
563 to an :ref:`expected operand type<amdgpu_syn_instruction_type>`
564 as described :ref:`here<amdgpu_synid_lit_conv>`.
An instruction may use only one literal, but several operands may refer to the same literal.
568 .. _amdgpu_synid_uimm8:
An 8-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
574 The value is encoded as part of the opcode so it is free to use.
576 .. _amdgpu_synid_uimm32:
581 A 32-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
582 The value is stored as a separate 32-bit dword in the instruction stream.
584 .. _amdgpu_synid_uimm20:
589 A 20-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
591 .. _amdgpu_synid_uimm21:
596 A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
598 .. WARNING:: Assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
600 .. _amdgpu_synid_simm21:
605 A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`.
.. WARNING:: Assembler currently supports 20-bit unsigned offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
609 .. _amdgpu_synid_off:
614 A special entity which indicates that the value of this operand is not used.
616 ================================== ===================================================
618 ================================== ===================================================
619 off Indicates an unused operand.
620 ================================== ===================================================
623 .. _amdgpu_synid_number:
628 .. _amdgpu_synid_integer_number:
633 Integer numbers are 64 bits wide.
634 They may be specified in binary, octal, hexadecimal and decimal formats:
636 ============== ====================================
638 ============== ====================================
639 Decimal [-]?[1-9][0-9]*
642 Hexadecimal [-]?0x[0-9a-fA-F]+
643 \ [-]?[0x]?[0-9][0-9a-fA-F]*[hH]
644 ============== ====================================
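
The decimal and hexadecimal patterns shown in the table can be tried out directly as regular expressions. The following Python sketch is illustrative only. The function ``classify_integer`` is hypothetical and covers just the patterns quoted here; it does not reproduce the assembler's lexer.

```python
import re

# Patterns copied from the table above.
DECIMAL = re.compile(r"[-]?[1-9][0-9]*")
HEX_PREFIX = re.compile(r"[-]?0x[0-9a-fA-F]+")
HEX_SUFFIX = re.compile(r"[-]?[0x]?[0-9][0-9a-fA-F]*[hH]")


def classify_integer(text):
    """Return the format name whose pattern matches the whole string."""
    if HEX_PREFIX.fullmatch(text) or HEX_SUFFIX.fullmatch(text):
        return "hexadecimal"
    if DECIMAL.fullmatch(text):
        return "decimal"
    return None  # not covered by the patterns in this sketch
```

Under this sketch ``0x1afe`` and ``1afeh`` are both classified as hexadecimal, and ``-1234`` as decimal.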
656 .. _amdgpu_synid_floating-point_number:
658 Floating-Point Numbers
659 ----------------------
661 All floating-point numbers are handled as double (64 bits wide).
663 Floating-point numbers may be specified in hexadecimal and decimal formats:
665 ============== ======================================================== ========================================================
667 ============== ======================================================== ========================================================
668 Decimal [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)? Must include either a decimal separator or an exponent.
669 Hexadecimal [-]0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+
670 ============== ======================================================== ========================================================
681 .. _amdgpu_synid_expression:
686 An expression specifies an address or a numeric value.
687 There are two kinds of expressions:
689 * :ref:`Absolute<amdgpu_synid_absolute_expression>`.
690 * :ref:`Relocatable<amdgpu_synid_relocatable_expression>`.
692 .. _amdgpu_synid_absolute_expression:
697 The value of an absolute expression remains the same after program relocation.
Absolute expressions must not include unassigned or relocatable values.
708 .. _amdgpu_synid_relocatable_expression:
710 Relocatable Expressions
711 -----------------------
713 The value of a relocatable expression depends on program relocation.
Note that use of relocatable expressions is limited to branch targets
and 32-bit :ref:`literals<amdgpu_synid_literal>`.
Additional information about relocation may be found :ref:`here<amdgpu-relocation-records>`.
724 y = x + 10 // x is not yet defined. Undefined symbols are assumed to be PC-relative.
730 Expressions and operands of expressions are interpreted as 64-bit integers.
732 Expressions may include 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>` (double).
733 However these operands are also handled as 64-bit integers
734 using binary representation of specified floating-point numbers.
735 No conversion from floating-point to integer is performed.
741 x = 0.1 // x is assigned an integer 4591870180066957722 which is a binary representation of 0.1.
742 y = x + x // y is a sum of two integer values; it is not equal to 0.2!
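
The effect of treating a floating-point number as a 64-bit integer can be reproduced outside the assembler. The following Python sketch uses the standard ``struct`` module to reinterpret the bits of a double. The helper names ``double_bits`` and ``bits_to_double`` are hypothetical.

```python
import struct


def double_bits(value):
    """Return the binary64 bit pattern of `value` as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", value))[0]


def bits_to_double(bits):
    """Reinterpret a signed 64-bit integer as a binary64 value."""
    return struct.unpack("<d", struct.pack("<q", bits))[0]


x = double_bits(0.1)  # integer holding the binary representation of 0.1
y = x + x             # integer addition, not floating-point addition
```

Here ``x`` is 4591870180066957722, and reinterpreting ``y`` as a double does not give 0.2, because the bit patterns were added as integers.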
747 Expressions are composed of
748 :ref:`symbols<amdgpu_synid_symbol>`,
749 :ref:`integer numbers<amdgpu_synid_integer_number>`,
750 :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
751 :ref:`binary operators<amdgpu_synid_expression_bin_op>`,
752 :ref:`unary operators<amdgpu_synid_expression_un_op>` and subexpressions.
754 Expressions may also use "." which is a reference to the current PC (program counter).
756 The syntax of expressions is shown below::
758 expr ::= expr binop expr | primaryexpr ;
760 primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ;
788 .. _amdgpu_synid_expression_bin_op:
793 Binary operators are described in the following table.
794 They operate on and produce 64-bit integers.
795 Operators with higher priority are performed first.
797 ========== ========= ===============================================
798 Operator Priority Meaning
799 ========== ========= ===============================================
800 \* 5 Integer multiplication.
801 / 5 Integer division.
802 % 5 Integer signed remainder.
803 \+ 4 Integer addition.
804 \- 4 Integer subtraction.
805 << 3 Integer shift left.
806 >> 3 Logical shift right.
807 == 2 Equality comparison.
808 != 2 Inequality comparison.
809 <> 2 Inequality comparison.
810 < 2 Signed less than comparison.
811 <= 2 Signed less than or equal comparison.
812 > 2 Signed greater than comparison.
813 >= 2 Signed greater than or equal comparison.
819 ========== ========= ===============================================
821 .. _amdgpu_synid_expression_un_op:
826 Unary operators are described in the following table.
827 They operate on and produce 64-bit integers.
829 ========== ===============================================
831 ========== ===============================================
834 \+ Integer unary plus.
835 \- Integer unary minus.
836 ========== ===============================================
838 .. _amdgpu_synid_symbol:
843 A symbol is a named 64-bit value, representing a relocatable
844 address or an absolute (non-relocatable) number.
846 Symbol names have the following syntax:
847 ``[a-zA-Z_.][a-zA-Z0-9_$.@]*``
849 The table below provides several examples of syntax used for symbol definition.
851 ================ ==========================================================
853 ================ ==========================================================
854 .globl <S> Declares a global symbol S without assigning it a value.
855 .set <S>, <E> Assigns the value of an expression E to a symbol S.
856 <S> = <E> Assigns the value of an expression E to a symbol S.
857 <S>: Declares a label S and assigns it the current PC value.
858 ================ ==========================================================
860 A symbol may be used before it is declared or assigned;
861 unassigned symbols are assumed to be PC-relative.
Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.
865 .. _amdgpu_synid_conv:
870 This section describes what happens when a 64-bit
871 :ref:`integer number<amdgpu_synid_integer_number>`, a
:ref:`floating-point number<amdgpu_synid_floating-point_number>` or a
873 :ref:`symbol<amdgpu_synid_symbol>`
874 is used for an operand which has a different type or size.
876 Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W:
878 * Values encoded as :ref:`inline constants<amdgpu_synid_constant>` are handled by H/W.
879 * Values encoded as :ref:`literals<amdgpu_synid_literal>` are converted by assembler.
881 .. _amdgpu_synid_const_conv:
886 .. _amdgpu_synid_int_const_conv:
888 Integer Inline Constants
889 ~~~~~~~~~~~~~~~~~~~~~~~~
891 Integer :ref:`inline constants<amdgpu_synid_constant>`
892 may be thought of as 64-bit
893 :ref:`integer numbers<amdgpu_synid_integer_number>`;
894 when used as operands they are truncated to the size of
895 :ref:`expected operand type<amdgpu_syn_instruction_type>`.
896 No data type conversions are performed.
904 v_add_u16 v0, -1, 0 // v0 = 0xFFFF
905 v_add_f16 v0, -1, 0 // v0 = 0xFFFF (NaN)
907 v_add_u32 v0, -1, 0 // v0 = 0xFFFFFFFF
908 v_add_f32 v0, -1, 0 // v0 = 0xFFFFFFFF (NaN)
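
The truncation of integer inline constants, and why -1 reads as NaN for floating-point operands, can be reproduced with a short Python sketch. It is illustrative only and does not model the hardware. The helper names ``truncate``, ``as_f16`` and ``as_f32`` are hypothetical.

```python
import math
import struct


def truncate(value, bits):
    """Keep the low `bits` bits of an integer (two's complement)."""
    return value & ((1 << bits) - 1)


def as_f16(bits16):
    """Reinterpret a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits16))[0]


def as_f32(bits32):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits32))[0]
```

With these helpers ``truncate(-1, 16)`` is ``0xFFFF``, and ``math.isnan(as_f16(0xFFFF))`` holds because an all-ones pattern is a NaN encoding.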
910 .. _amdgpu_synid_fp_const_conv:
912 Floating-Point Inline Constants
913 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
915 Floating-point :ref:`inline constants<amdgpu_synid_constant>`
916 may be thought of as 64-bit
917 :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`;
918 when used as operands they are converted to a floating-point number of
919 :ref:`expected operand size<amdgpu_syn_instruction_type>`.
927 v_add_f16 v0, 1.0, 0 // v0 = 0x3C00 (1.0)
928 v_add_u16 v0, 1.0, 0 // v0 = 0x3C00
930 v_add_f32 v0, 1.0, 0 // v0 = 0x3F800000 (1.0)
931 v_add_u32 v0, 1.0, 0 // v0 = 0x3F800000
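
The bit patterns in the comments above can be reproduced by converting a double to the operand size and reading back the raw bits. The following Python sketch is illustrative only. The helper names ``f16_bits`` and ``f32_bits`` are hypothetical.

```python
import struct


def f16_bits(value):
    """Convert to half precision and return the 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def f32_bits(value):
    """Convert to single precision and return the 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]
```

For instance ``f16_bits(1.0)`` is ``0x3C00`` and ``f32_bits(1.0)`` is ``0x3F800000``. An integer operand type sees the same bits, only without the floating-point interpretation.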
934 .. _amdgpu_synid_lit_conv:
939 .. _amdgpu_synid_int_lit_conv:
944 Integer :ref:`literals<amdgpu_synid_literal>`
945 are specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>`.
947 When used as operands they are converted to
948 :ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
============== ============== =============== ====================================================================
Expected type  Condition      Result          Note
============== ============== =============== ====================================================================
i16, u16, b16  cond(num,16)   num.u16         Truncate to 16 bits.
i32, u32, b32  cond(num,32)   num.u32         Truncate to 32 bits.
i64            cond(num,32)   {-1,num.i32}    Truncate to 32 bits and then sign-extend the result to 64 bits.
u64, b64       cond(num,32)   { 0,num.u32}    Truncate to 32 bits and then zero-extend the result to 64 bits.
f16            cond(num,16)   num.u16         Use low 16 bits as an f16 value.
f32            cond(num,32)   num.u32         Use low 32 bits as an f32 value.
f64            cond(num,32)   {num.u32,0}     Use low 32 bits of the number as high 32 bits
                                              of the result; low 32 bits of the result are zeroed.
============== ============== =============== ====================================================================
963 The condition *cond(X,S)* indicates if a 64-bit number *X*
964 can be converted to a smaller size *S* by truncation of upper bits.
965 There are two cases when the conversion is possible:
967 * The truncated bits are all 0.
* The truncated bits are all 1 and the value after truncation has its most significant bit set.
970 Examples of valid literals:
975 // Literal value after conversion:
976 v_add_u16 v0, 0xff00, v0 // 0xff00
977 v_add_u16 v0, 0xffffffffffffff00, v0 // 0xff00
978 v_add_u16 v0, -256, v0 // 0xff00
979 // Literal value after conversion:
980 s_bfe_i64 s[0:1], 0xffefffff, s3 // 0xffffffffffefffff
981 s_bfe_u64 s[0:1], 0xffefffff, s3 // 0x00000000ffefffff
982 v_ceil_f64_e32 v[0:1], 0xffefffff // 0xffefffff00000000 (-1.7976922776554302e308)
984 Examples of invalid literals:
990 v_add_u16 v0, 0x1ff00, v0 // truncated bits are not all 0 or 1
991 v_add_u16 v0, 0xffffffffffff00ff, v0 // truncated bits do not match MSB of the result
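
The condition *cond(X,S)* and the conversions in the table can be modelled in a few lines. The following Python sketch reproduces the valid and invalid examples above. It is an approximation under stated assumptions, not the assembler's code, and the names ``cond`` and ``convert_int_literal`` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def cond(num, size):
    """True if the 64-bit pattern of `num` can be truncated to `size` bits."""
    num &= MASK64                        # view the number as 64 raw bits
    upper = num >> size                  # the bits that would be truncated
    all_ones = (1 << (64 - size)) - 1
    msb_set = ((num >> (size - 1)) & 1) == 1
    return upper == 0 or (upper == all_ones and msb_set)


def convert_int_literal(num, expected):
    """Convert an integer literal to the bit pattern of the expected type."""
    size = 16 if expected.endswith("16") else 32
    if not cond(num, size):
        raise ValueError("literal cannot be truncated to %d bits" % size)
    low = num & ((1 << size) - 1)
    if expected == "i64":
        # Sign-extend the 32-bit value to 64 bits.
        return low | (0xFFFFFFFF00000000 if low >> 31 else 0)
    if expected == "f64":
        # Low 32 bits become the high 32 bits; the low half is zeroed.
        return low << 32
    return low  # 16- and 32-bit types; u64 and b64 are zero-extended
```

For example ``convert_int_literal(0xffefffff, "i64")`` gives ``0xffffffffffefffff``, while ``cond(0x1ff00, 16)`` is false because the truncated bits are neither all 0 nor all 1.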
993 .. _amdgpu_synid_fp_lit_conv:
995 Floating-Point Literals
996 ~~~~~~~~~~~~~~~~~~~~~~~
998 Floating-point :ref:`literals<amdgpu_synid_literal>` are specified as 64-bit
999 :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
1001 When used as operands they are converted to
1002 :ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
============== ============== ================= =================================================================
Expected type  Condition      Result            Note
============== ============== ================= =================================================================
i16, u16, b16  cond(num,16)   f16(num)          Convert to f16 and use bits of the result as an integer value.
i32, u32, b32  cond(num,32)   f32(num)          Convert to f32 and use bits of the result as an integer value.
i64, u64, b64  false          \-                Conversion disabled because of unclear semantics.
f16            cond(num,16)   f16(num)          Convert to f16.
f32            cond(num,32)   f32(num)          Convert to f32.
f64            true           {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result;
                                                zero-fill low 32 bits of the result.

                                                Note that the result may differ from the original number.
============== ============== ================= =================================================================
1018 The condition *cond(X,S)* indicates if an f64 number *X* can be converted
1019 to a smaller *S*-bit floating-point type without overflow or underflow.
Loss of precision is allowed.
1022 Examples of valid literals:
1028 v_add_f16 v1, 65500.0, v2
1029 v_add_f32 v1, 65600.0, v2
1031 // Literal value before conversion: 1.7976931348623157e308 (0x7fefffffffffffff)
1032 // Literal value after conversion: 1.7976922776554302e308 (0x7fefffff00000000)
1033 v_ceil_f64 v[0:1], 1.7976931348623157e308
1035 Examples of invalid literals:
1041 v_add_f16 v1, 65600.0, v2 // overflow
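
The overflow check and the f64 truncation described above can be approximated with the standard ``struct`` module. The following Python sketch is illustrative only: it detects overflow but does not check for underflow, and the helper names ``fits`` and ``f64_literal_bits`` are hypothetical.

```python
import struct


def fits(value, fmt):
    """True if `value` can be packed in the given struct format
    ("<e" for f16, "<f" for f32) without overflow."""
    try:
        struct.pack(fmt, value)
        return True
    except OverflowError:
        return False


def f64_literal_bits(value):
    """Keep the high 32 bits of the double and zero-fill the low 32 bits."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & 0xFFFFFFFF00000000
```

Under this sketch 65500.0 fits in f16 but 65600.0 does not, and ``f64_literal_bits(1.7976931348623157e308)`` is ``0x7fefffff00000000``, matching the example above.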
1043 .. _amdgpu_synid_exp_conv:
Expressions operate on and produce 64-bit integers.
1050 When used as operands they are truncated to
1051 :ref:`expected operand size<amdgpu_syn_instruction_type>`.
1052 No data type conversions are performed.
1061 v_sqrt_f32 v0, x // v0 = [low 32 bits of 0.1 (double)]
1062 v_sqrt_f32 v0, (0.1 + 0) // the same as above
1063 v_sqrt_f32 v0, 0.1 // v0 = [0.1 (double) converted to float]
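
The difference between an expression and a floating-point literal can be checked numerically. The following Python sketch is illustrative only; the helper names are hypothetical. It computes the 32-bit operand value for both forms of 0.1.

```python
import struct


def double_bits(value):
    """Return the binary64 bit pattern of `value` as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def float_bits(value):
    """Convert to single precision and return the 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


# Expression (x or (0.1 + 0)): the 64-bit integer is truncated to 32 bits.
expr_operand = double_bits(0.1) & 0xFFFFFFFF

# Floating-point literal 0.1: the double is converted to float.
literal_operand = float_bits(0.1)
```

The expression form yields ``0x9999999A``, the low half of the double, whereas the literal form yields ``0x3DCCCCCD``, which is 0.1 as a float.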