llvm/docs/AMDGPUOperandSyntax.rst

   1 =====================================
   2 Syntax of AMDGPU Instruction Operands
   3 =====================================
   4
   5 .. contents::
   6    :local:
   7
   8 Conventions
   9 ===========
  10
  11 The following notation is used throughout this document:
  12
  13     =================== =============================================================================
  14     Notation            Description
  15     =================== =============================================================================
  16     {0..N}              Any integer value in the range from 0 to N (inclusive).
  17     <x>                 Syntax and meaning of *x* are explained elsewhere.
  18     =================== =============================================================================
  19
  20 .. _amdgpu_syn_operands:
  21
  22 Operands
  23 ========
  24
  25 .. _amdgpu_synid_v:
  26
  27 v
  28 -
  29
  30 Vector registers. There are 256 32-bit vector registers.
  31
  32 A sequence of *vector* registers may be used to operate on more than 32 bits of data.
  33
  34 The assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 8, 16 and 32 *vector* registers.
  35
  36     =================================================== ====================================================================
  37     Syntax                                              Description
  38     =================================================== ====================================================================
  39     **v**\<N>                                           A single 32-bit *vector* register.
  40
  41                                                         *N* must be a decimal
  42                                                         :ref:`integer number<amdgpu_synid_integer_number>`.
  43     **v[**\ <N>\ **]**                                  A single 32-bit *vector* register.
  44
  45                                                         *N* may be specified as an
  46                                                         :ref:`integer number<amdgpu_synid_integer_number>`
  47                                                         or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
  48     **v[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *vector* registers.
  49
  50                                                         *N* and *K* may be specified as
  51                                                         :ref:`integer numbers<amdgpu_synid_integer_number>`
  52                                                         or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
  53     **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *vector* registers.
  54
  55                                                         Register indices must be specified as decimal
  56                                                         :ref:`integer numbers<amdgpu_synid_integer_number>`.
  57     =================================================== ====================================================================
  58
  59 Note: *N* and *K* must satisfy the following conditions:
  60
  61 * *N* <= *K*.
  62 * 0 <= *N* <= 255.
  63 * 0 <= *K* <= 255.
  64 * *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 8, 16 or 32.
  65
  66 GFX90A has an additional alignment requirement: pairs of *vector* registers must be even-aligned
  67 (first register must be even).
  68
  69 Examples:
  70
  71 .. parsed-literal::
  72
  73   v255
  74   v[0]
  75   v[0:1]
  76   v[1:1]
  77   v[0:3]
  78   v[2*2]
  79   v[1-1:2-1]
  80   [v252]
  81   [v252,v253,v254,v255]
  82
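The conditions on *N* and *K* listed above can be checked mechanically. The following Python sketch is illustrative only and is not part of the assembler. The helper name ``parse_vector_operand`` and its ``gfx90a`` flag are hypothetical, and the sketch handles only the ``vN``, ``v[N]`` and ``v[N:K]`` forms with decimal indices (no absolute expressions and no bracketed register lists).

```python
import re

# Sequence lengths listed above for vector registers.
ALLOWED_VECTOR_SIZES = {1, 2, 3, 4, 5, 6, 8, 16, 32}
VECTOR_REGISTER_COUNT = 256

# Hypothetical pattern: vN, v[N] or v[N:K] with decimal indices only.
_VECTOR_RE = re.compile(r"v(?:(\d+)|\[(\d+)(?::(\d+))?\])")


def parse_vector_operand(text, gfx90a=False):
    """Return (N, K) for a vector register operand, or raise ValueError."""
    match = _VECTOR_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported syntax: {text!r}")
    single, first, last = match.groups()
    n = int(single if single is not None else first)
    k = int(last) if last is not None else n
    if n > k:
        raise ValueError("N must not exceed K")
    if k >= VECTOR_REGISTER_COUNT:
        raise ValueError("register index out of range")
    size = k - n + 1
    if size not in ALLOWED_VECTOR_SIZES:
        raise ValueError(f"unsupported sequence length: {size}")
    # GFX90A rule from the text above: register pairs must be even-aligned.
    if gfx90a and size == 2 and n % 2 != 0:
        raise ValueError("register pair must be even-aligned on GFX90A")
    return n, k
```

With this sketch, ``v[0:3]`` yields ``(0, 3)``, while ``v[0:6]`` is rejected because a sequence of 7 registers is not supported.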
  83 .. _amdgpu_synid_nsa:
  84
  85 GFX10 *Image* instructions may use special *NSA* (Non-Sequential Address) syntax for *image addresses*:
  86
  87     ===================================== =================================================
  88     Syntax                                Description
  89     ===================================== =================================================
  90     **[Vm**, \ **Vn**, ... **Vk**\ **]**  A sequence of 32-bit *vector* registers.
  91                                           Each register may be specified using syntax
  92                                           defined :ref:`above<amdgpu_synid_v>`.
  93
  94                                           In contrast with standard syntax, registers
  95                                           in an *NSA* sequence are not required to have
  96                                           consecutive indices. Moreover, the same register
  97                                           may appear in the list more than once.
  98     ===================================== =================================================
  99
 100 Examples:
 101
 102 .. parsed-literal::
 103
 104   [v32,v1,v[2]]
 105   [v[32],v[1:1],[v2]]
 106   [v4,v4,v4,v4]
 107
 108 .. _amdgpu_synid_a:
 109
 110 a
 111 -
 112
 113 Accumulator registers. There are 256 32-bit accumulator registers.
 114
 115 A sequence of *accumulator* registers may be used to operate on more than 32 bits of data.
 116
 117 The assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 8, 16 and 32 *accumulator* registers.
 118
 119     =================================================== ========================================================= ====================================================================
 120     Syntax                                              An Alternative Syntax (SP3)                               Description
 121     =================================================== ========================================================= ====================================================================
 122     **a**\<N>                                           **acc**\<N>                                               A single 32-bit *accumulator* register.
 123
 124                                                                                                                   *N* must be a decimal
 125                                                                                                                   :ref:`integer number<amdgpu_synid_integer_number>`.
 126     **a[**\ <N>\ **]**                                  **acc[**\ <N>\ **]**                                      A single 32-bit *accumulator* register.
 127
 128                                                                                                                   *N* may be specified as an
 129                                                                                                                   :ref:`integer number<amdgpu_synid_integer_number>`
 130                                                                                                                   or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 131     **a[**\ <N>:<K>\ **]**                              **acc[**\ <N>:<K>\ **]**                                  A sequence of (\ *K-N+1*\ ) *accumulator* registers.
 132
 133                                                                                                                   *N* and *K* may be specified as
 134                                                                                                                   :ref:`integer numbers<amdgpu_synid_integer_number>`
 135                                                                                                                   or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
 136     **[a**\ <N>, \ **a**\ <N+1>, ... **a**\ <K>\ **]**  **[acc**\ <N>, \ **acc**\ <N+1>, ... **acc**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *accumulator* registers.
 137
 138                                                                                                                   Register indices must be specified as decimal
 139                                                                                                                   :ref:`integer numbers<amdgpu_synid_integer_number>`.
 140     =================================================== ========================================================= ====================================================================
 141
 142 Note: *N* and *K* must satisfy the following conditions:
 143
 144 * *N* <= *K*.
 145 * 0 <= *N* <= 255.
 146 * 0 <= *K* <= 255.
 147 * *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 8, 16 or 32.
 148
 149 GFX90A has an additional alignment requirement: pairs of *accumulator* registers must be even-aligned
 150 (first register must be even).
 151
 152 Examples:
 153
 154 .. parsed-literal::
 155
 156   a255
 157   a[0]
 158   a[0:1]
 159   a[1:1]
 160   a[0:3]
 161   a[2*2]
 162   a[1-1:2-1]
 163   [a252]
 164   [a252,a253,a254,a255]
 165
 166   acc0
 167   acc[1]
 168   [acc250]
 169   [acc2,acc3]
 170
 171 .. _amdgpu_synid_s:
 172
 173 s
 174 -
 175
 176 Scalar 32-bit registers. The number of available *scalar* registers depends on the GPU:
 177
 178     ======= ============================
 179     GPU     Number of *scalar* registers
 180     ======= ============================
 181     GFX7    104
 182     GFX8    102
 183     GFX9    102
 184     GFX10   106
 185     ======= ============================
 186
 187 A sequence of *scalar* registers may be used to operate on more than 32 bits of data.
 188 The assembler currently supports sequences of 1, 2, 4, 8, 16 and 32 *scalar* registers.
 189
 190 Pairs of *scalar* registers must be even-aligned (first register must be even).
 191 Sequences of 4 or more *scalar* registers must be quad-aligned (first register must be a multiple of 4).
 192
 193     ======================================================== ====================================================================
 194     Syntax                                                   Description
 195     ======================================================== ====================================================================
 196     **s**\ <N>                                               A single 32-bit *scalar* register.
 197
 198                                                              *N* must be a decimal
 199                                                              :ref:`integer number<amdgpu_synid_integer_number>`.
 200
 201     **s[**\ <N>\ **]**                                       A single 32-bit *scalar* register.
 202
 203                                                              *N* may be specified as an
 204                                                              :ref:`integer number<amdgpu_synid_integer_number>`
 205                                                              or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 206     **s[**\ <N>:<K>\ **]**                                   A sequence of (\ *K-N+1*\ ) *scalar* registers.
 207
 208                                                              *N* and *K* may be specified as
 209                                                              :ref:`integer numbers<amdgpu_synid_integer_number>`
 210                                                              or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
 211
 212     **[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]**       A sequence of (\ *K-N+1*\ ) *scalar* registers.
 213
 214                                                              Register indices must be specified as decimal
 215                                                              :ref:`integer numbers<amdgpu_synid_integer_number>`.
 216     ======================================================== ====================================================================
 217
 218 Note: *N* and *K* must satisfy the following conditions:
 219
 220 * *N* must be properly aligned based on sequence size.
 221 * *N* <= *K*.
 222 * 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
 223 * 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
 224 * *K-N+1* must be equal to 1, 2, 4, 8, 16 or 32.
 225
 226 Examples:
 227
 228 .. parsed-literal::
 229
 230   s0
 231   s[0]
 232   s[0:1]
 233   s[1:1]
 234   s[0:3]
 235   s[2*2]
 236   s[1-1:2-1]
 237   [s4]
 238   [s4,s5,s6,s7]
 239
 240 Examples of *scalar* registers with an invalid alignment:
 241
 242 .. parsed-literal::
 243
 244   s[1:2]
 245   s[2:5]
 246
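The alignment and range rules for *scalar* sequences can be summarized in a short Python sketch. This is an illustration under stated assumptions, not the assembler's implementation. The function names are hypothetical, and ``smax`` stands for *SMAX*, the number of available *scalar* registers taken from the table above.

```python
# Sequence lengths listed above for scalar registers.
ALLOWED_SCALAR_SIZES = {1, 2, 4, 8, 16, 32}


def required_alignment(size):
    """Required alignment of the first register for a sequence of `size` registers."""
    if size == 1:
        return 1
    if size == 2:
        return 2  # pairs must be even-aligned
    return 4      # sequences of 4 or more must be quad-aligned


def check_scalar_range(n, k, smax):
    """Return a list of rule violations for s[n:k]; the list is empty when valid."""
    problems = []
    if n > k:
        problems.append("N must not exceed K")
    if not (0 <= n < smax and 0 <= k < smax):
        problems.append("register index out of range")
    size = k - n + 1
    if size not in ALLOWED_SCALAR_SIZES:
        problems.append(f"unsupported sequence length {size}")
    elif n % required_alignment(size) != 0:
        problems.append(
            f"first register must be a multiple of {required_alignment(size)}"
        )
    return problems
```

For example, with ``smax=102`` (the GFX8 and GFX9 value), ``check_scalar_range(0, 3, 102)`` reports no problems, whereas both invalid examples above, ``s[1:2]`` and ``s[2:5]``, are reported as misaligned.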
 247 .. _amdgpu_synid_trap:
 248
 249 trap
 250 ----
 251
 252 A set of trap handler registers:
 253
 254 * :ref:`ttmp<amdgpu_synid_ttmp>`
 255 * :ref:`tba<amdgpu_synid_tba>`
 256 * :ref:`tma<amdgpu_synid_tma>`
 257
 258 .. _amdgpu_synid_ttmp:
 259
 260 ttmp
 261 ----
 262
 263 Trap handler temporary scalar registers, 32 bits wide.
 264 The number of available *ttmp* registers depends on the GPU:
 265
 266     ======= ===========================
 267     GPU     Number of *ttmp* registers
 268     ======= ===========================
 269     GFX7    12
 270     GFX8    12
 271     GFX9    16
 272     GFX10   16
 273     ======= ===========================
 274
 275 A sequence of *ttmp* registers may be used to operate on more than 32 bits of data.
 276 The assembler currently supports sequences of 1, 2, 4, 8 and 16 *ttmp* registers.
 277
 278 Pairs of *ttmp* registers must be even-aligned (first register must be even).
 279 Sequences of 4 or more *ttmp* registers must be quad-aligned (first register must be a multiple of 4).
 280
 281     ============================================================= ====================================================================
 282     Syntax                                                        Description
 283     ============================================================= ====================================================================
 284     **ttmp**\ <N>                                                 A single 32-bit *ttmp* register.
 285
 286                                                                   *N* must be a decimal
 287                                                                   :ref:`integer number<amdgpu_synid_integer_number>`.
 288     **ttmp[**\ <N>\ **]**                                         A single 32-bit *ttmp* register.
 289
 290                                                                   *N* may be specified as an
 291                                                                   :ref:`integer number<amdgpu_synid_integer_number>`
 292                                                                   or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 293     **ttmp[**\ <N>:<K>\ **]**                                     A sequence of (\ *K-N+1*\ ) *ttmp* registers.
 294
 295                                                                   *N* and *K* may be specified as
 296                                                                   :ref:`integer numbers<amdgpu_synid_integer_number>`
 297                                                                   or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
 298     **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]**   A sequence of (\ *K-N+1*\ ) *ttmp* registers.
 299
 300                                                                   Register indices must be specified as decimal
 301                                                                   :ref:`integer numbers<amdgpu_synid_integer_number>`.
 302     ============================================================= ====================================================================
 303
 304 Note: *N* and *K* must satisfy the following conditions:
 305
 306 * *N* must be properly aligned based on sequence size.
 307 * *N* <= *K*.
 308 * 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
 309 * 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
 310 * *K-N+1* must be equal to 1, 2, 4, 8 or 16.
 311
 312 Examples:
 313
 314 .. parsed-literal::
 315
 316   ttmp0
 317   ttmp[0]
 318   ttmp[0:1]
 319   ttmp[1:1]
 320   ttmp[0:3]
 321   ttmp[2*2]
 322   ttmp[1-1:2-1]
 323   [ttmp4]
 324   [ttmp4,ttmp5,ttmp6,ttmp7]
 325
 326 Examples of *ttmp* registers with an invalid alignment:
 327
 328 .. parsed-literal::
 329
 330   ttmp[1:2]
 331   ttmp[2:5]
 332
 333 .. _amdgpu_synid_tba:
 334
 335 tba
 336 ---
 337
 338 Trap base address, 64 bits wide. Holds the pointer to the current trap handler program.
 339
 340     ================== ======================================================================= =============
 341     Syntax             Description                                                             Availability
 342     ================== ======================================================================= =============
 343     tba                64-bit *trap base address* register.                                    GFX7, GFX8
 344     [tba]              64-bit *trap base address* register (an SP3 syntax).                    GFX7, GFX8
 345     [tba_lo,tba_hi]    64-bit *trap base address* register (an SP3 syntax).                    GFX7, GFX8
 346     ================== ======================================================================= =============
 347
 348 High and low 32 bits of *trap base address* may be accessed as separate registers:
 349
 350     ================== ======================================================================= =============
 351     Syntax             Description                                                             Availability
 352     ================== ======================================================================= =============
 353     tba_lo             Low 32 bits of *trap base address* register.                            GFX7, GFX8
 354     tba_hi             High 32 bits of *trap base address* register.                           GFX7, GFX8
 355     [tba_lo]           Low 32 bits of *trap base address* register (an SP3 syntax).            GFX7, GFX8
 356     [tba_hi]           High 32 bits of *trap base address* register (an SP3 syntax).           GFX7, GFX8
 357     ================== ======================================================================= =============
 358
 359 Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9 and GFX10,
 360 but *tba* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
 361
 362 .. _amdgpu_synid_tma:
 363
 364 tma
 365 ---
 366
 367 Trap memory address, 64 bits wide.
 368
 369     ================= ======================================================================= ==================
 370     Syntax            Description                                                             Availability
 371     ================= ======================================================================= ==================
 372     tma               64-bit *trap memory address* register.                                  GFX7, GFX8
 373     [tma]             64-bit *trap memory address* register (an SP3 syntax).                  GFX7, GFX8
 374     [tma_lo,tma_hi]   64-bit *trap memory address* register (an SP3 syntax).                  GFX7, GFX8
 375     ================= ======================================================================= ==================
 376
 377 High and low 32 bits of *trap memory address* may be accessed as separate registers:
 378
 379     ================= ======================================================================= ==================
 380     Syntax            Description                                                             Availability
 381     ================= ======================================================================= ==================
 382     tma_lo            Low 32 bits of *trap memory address* register.                          GFX7, GFX8
 383     tma_hi            High 32 bits of *trap memory address* register.                         GFX7, GFX8
 384     [tma_lo]          Low 32 bits of *trap memory address* register (an SP3 syntax).          GFX7, GFX8
 385     [tma_hi]          High 32 bits of *trap memory address* register (an SP3 syntax).         GFX7, GFX8
 386     ================= ======================================================================= ==================
 387
 388 Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9 and GFX10,
 389 but *tma* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
 390
 391 .. _amdgpu_synid_flat_scratch:
 392
 393 flat_scratch
 394 ------------
 395
 396 Flat scratch address, 64 bits wide. Holds the base address of scratch memory.
 397
 398     ================================== ================================================================
 399     Syntax                             Description
 400     ================================== ================================================================
 401     flat_scratch                       64-bit *flat scratch* address register.
 402     [flat_scratch]                     64-bit *flat scratch* address register (an SP3 syntax).
 403     [flat_scratch_lo,flat_scratch_hi]  64-bit *flat scratch* address register (an SP3 syntax).
 404     ================================== ================================================================
 405
 406 High and low 32 bits of *flat scratch* address may be accessed as separate registers:
 407
 408     ========================= =========================================================================
 409     Syntax                    Description
 410     ========================= =========================================================================
 411     flat_scratch_lo           Low 32 bits of *flat scratch* address register.
 412     flat_scratch_hi           High 32 bits of *flat scratch* address register.
 413     [flat_scratch_lo]         Low 32 bits of *flat scratch* address register (an SP3 syntax).
 414     [flat_scratch_hi]         High 32 bits of *flat scratch* address register (an SP3 syntax).
 415     ========================= =========================================================================
 416
 417 Note that *flat_scratch*, *flat_scratch_lo* and *flat_scratch_hi* are not accessible as assembler
 418 registers in GFX10, but *flat_scratch* is readable/writable with the help of
 419 *s_get_reg* and *s_set_reg* instructions.
 420
 421 .. _amdgpu_synid_xnack:
 422 .. _amdgpu_synid_xnack_mask:
 423
 424 xnack_mask
 425 ----------
 426
 427 Xnack mask, 64 bits wide. Holds a 64-bit mask indicating which threads
 428 received an *XNACK* due to a vector memory operation.
 429
 430 .. WARNING:: GFX7 does not support the *xnack* feature. For availability of this feature in other GPUs, refer to :ref:`this table<amdgpu-processors>`.
 431
 432 \
 433
 434     ============================== =====================================================
 435     Syntax                         Description
 436     ============================== =====================================================
 437     xnack_mask                     64-bit *xnack mask* register.
 438     [xnack_mask]                   64-bit *xnack mask* register (an SP3 syntax).
 439     [xnack_mask_lo,xnack_mask_hi]  64-bit *xnack mask* register (an SP3 syntax).
 440     ============================== =====================================================
 441
 442 High and low 32 bits of *xnack mask* may be accessed as separate registers:
 443
 444     ===================== ==============================================================
 445     Syntax                Description
 446     ===================== ==============================================================
 447     xnack_mask_lo         Low 32 bits of *xnack mask* register.
 448     xnack_mask_hi         High 32 bits of *xnack mask* register.
 449     [xnack_mask_lo]       Low 32 bits of *xnack mask* register (an SP3 syntax).
 450     [xnack_mask_hi]       High 32 bits of *xnack mask* register (an SP3 syntax).
 451     ===================== ==============================================================
 452
 453 Note that *xnack_mask*, *xnack_mask_lo* and *xnack_mask_hi* are not accessible as assembler
 454 registers in GFX10, but *xnack_mask* is readable/writable with the help of
 455 *s_get_reg* and *s_set_reg* instructions.
 456
 457 .. _amdgpu_synid_vcc:
 458 .. _amdgpu_synid_vcc_lo:
 459
 460 vcc
 461 ---
 462
 463 Vector condition code, 64 bits wide. A bit mask with one bit per thread;
 464 it holds the result of a vector compare operation.
 465
 466 Note that GFX10 H/W does not use the high 32 bits of *vcc* in *wave32* mode.
 467
 468     ================ =========================================================================
 469     Syntax           Description
 470     ================ =========================================================================
 471     vcc              64-bit *vector condition code* register.
 472     [vcc]            64-bit *vector condition code* register (an SP3 syntax).
 473     [vcc_lo,vcc_hi]  64-bit *vector condition code* register (an SP3 syntax).
 474     ================ =========================================================================
 475
 476 High and low 32 bits of *vector condition code* may be accessed as separate registers:
 477
 478     ================ =========================================================================
 479     Syntax           Description
 480     ================ =========================================================================
 481     vcc_lo           Low 32 bits of *vector condition code* register.
 482     vcc_hi           High 32 bits of *vector condition code* register.
 483     [vcc_lo]         Low 32 bits of *vector condition code* register (an SP3 syntax).
 484     [vcc_hi]         High 32 bits of *vector condition code* register (an SP3 syntax).
 485     ================ =========================================================================
 486
 487 .. _amdgpu_synid_m0:
 488
 489 m0
 490 --
 491
 492 A 32-bit memory register. It has various uses,
 493 including register indexing and bounds checking.
 494
 495     =========== ===================================================
 496     Syntax      Description
 497     =========== ===================================================
 498     m0          A 32-bit *memory* register.
 499     [m0]        A 32-bit *memory* register (an SP3 syntax).
 500     =========== ===================================================
 501
 502 .. _amdgpu_synid_exec:
 503
 504 exec
 505 ----
 506
 507 Execute mask, 64 bits wide. A bit mask with one bit per thread,
 508 which is applied to vector instructions and controls which threads execute
 509 and which ignore the instruction.
 510
 511 Note that GFX10 H/W does not use the high 32 bits of *exec* in *wave32* mode.
 512
 513     ===================== =================================================================
 514     Syntax                Description
 515     ===================== =================================================================
 516     exec                  64-bit *execute mask* register.
 517     [exec]                64-bit *execute mask* register (an SP3 syntax).
 518     [exec_lo,exec_hi]     64-bit *execute mask* register (an SP3 syntax).
 519     ===================== =================================================================
 520
 521 High and low 32 bits of *execute mask* may be accessed as separate registers:
 522
 523     ===================== =================================================================
 524     Syntax                Description
 525     ===================== =================================================================
 526     exec_lo               Low 32 bits of *execute mask* register.
 527     exec_hi               High 32 bits of *execute mask* register.
 528     [exec_lo]             Low 32 bits of *execute mask* register (an SP3 syntax).
 529     [exec_hi]             High 32 bits of *execute mask* register (an SP3 syntax).
 530     ===================== =================================================================
 531
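The relationship between a 64-bit mask register such as *exec* and its *exec_lo* and *exec_hi* halves can be sketched in Python. This is an illustration only. The function names are hypothetical, and the mapping of bit *i* to thread *i* is an assumption made for the example; the text above states only that there is one bit per thread.

```python
MASK32 = 0xFFFFFFFF


def split_mask(mask64):
    """Return the (lo, hi) 32-bit halves of a 64-bit mask."""
    return mask64 & MASK32, (mask64 >> 32) & MASK32


def active_threads(mask64, wave32=False):
    """List the indices of set bits; in wave32 mode only the low half is used."""
    # Assumption for this sketch: bit i of the mask corresponds to thread i.
    width = 32 if wave32 else 64
    return [i for i in range(width) if (mask64 >> i) & 1]
```

For a mask of ``0x0000000100000003``, ``split_mask`` returns ``(3, 1)``. In *wave32* mode ``active_threads`` ignores the bit stored in the high half.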
 532 .. _amdgpu_synid_vccz:
 533
 534 vccz
 535 ----
 536
 537 A single-bit flag indicating that :ref:`vcc<amdgpu_synid_vcc>` is all zeros.
 538
 539 Note: when GFX10 operates in *wave32* mode, this register reflects the state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.
 540
 541 .. _amdgpu_synid_execz:
 542
 543 execz
 544 -----
 545
 546 A single-bit flag indicating that :ref:`exec<amdgpu_synid_exec>` is all zeros.
 547
 548 Note: when GFX10 operates in *wave32* mode, this register reflects the state of :ref:`exec_lo<amdgpu_synid_exec>`.
 549
 550 .. _amdgpu_synid_scc:
 551
 552 scc
 553 ---
 554
 555 A single-bit flag indicating the result of a scalar compare operation.
 556
 557 .. _amdgpu_synid_lds_direct:
 558
 559 lds_direct
 560 ----------
 561
 562 A special operand which supplies a 32-bit value
 563 fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.
 564
 565 .. _amdgpu_synid_null:
 566
 567 null
 568 ----
 569
 570 This is a special operand which may be used as a source or a destination.
 571
 572 When used as a destination, the result of the operation is discarded.
 573
 574 When used as a source, it supplies a zero value.
 575
 576 GFX10 only.
 577
 578 .. WARNING:: Due to an H/W bug, this operand cannot be used with VALU instructions in the first generation of GFX10.
 579
 580 .. _amdgpu_synid_constant:
 581
 582 inline constant
 583 ---------------
 584
 585 An *inline constant* is an integer or a floating-point value encoded as a part of an instruction.
 586 Compare *inline constants* with :ref:`literals<amdgpu_synid_literal>`.
 587
 588 Inline constants include:
 589
 590 * :ref:`iconst<amdgpu_synid_iconst>`
 591 * :ref:`fconst<amdgpu_synid_fconst>`
 592 * :ref:`ival<amdgpu_synid_ival>`
 593
 594 If a number may be encoded as either
 595 a :ref:`literal<amdgpu_synid_literal>` or
 596 a :ref:`constant<amdgpu_synid_constant>`,
 597 the assembler selects the latter because it is the more efficient encoding.
 598
 599 .. _amdgpu_synid_iconst:
 600
 601 iconst
 602 ~~~~~~
 603
 604 An :ref:`integer number<amdgpu_synid_integer_number>` or
 605 an :ref:`absolute expression<amdgpu_synid_absolute_expression>`
 606 encoded as an *inline constant*.
 607
 608 Only a small fraction of integer numbers may be encoded as *inline constants*.
 609 They are enumerated in the table below.
 610 Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
 611
 612     ================================== ======================================
 613     Value                              Note
 614     ================================== ======================================
 615     {0..64}                            Non-negative integer inline constants.
 616     {-16..-1}                          Negative integer inline constants.
 617     ================================== ======================================
 618
 619 .. WARNING:: GFX7 does not support inline constants for *f16* operands.
 620
 621 .. _amdgpu_synid_fconst:
 622
 623 fconst
 624 ~~~~~~
 625
 626 A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
 627 encoded as an *inline constant*.
 628
 629 Only a small fraction of floating-point numbers may be encoded as *inline constants*.
 630 They are enumerated in the table below.
 631 Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
 632
 633     ===================== ===================================================== ==================
 634     Value                 Note                                                  Availability
 635     ===================== ===================================================== ==================
 636     0.0                   The same as integer constant 0.                       All GPUs
 637     0.5                   Floating-point constant 0.5                           All GPUs
 638     1.0                   Floating-point constant 1.0                           All GPUs
 639     2.0                   Floating-point constant 2.0                           All GPUs
 640     4.0                   Floating-point constant 4.0                           All GPUs
 641     -0.5                  Floating-point constant -0.5                          All GPUs
 642     -1.0                  Floating-point constant -1.0                          All GPUs
 643     -2.0                  Floating-point constant -2.0                          All GPUs
 644     -4.0                  Floating-point constant -4.0                          All GPUs
 645     0.1592                1.0/(2.0*pi). Use only for 16-bit operands.           GFX8, GFX9, GFX10
 646     0.15915494            1.0/(2.0*pi). Use only for 16- and 32-bit operands.   GFX8, GFX9, GFX10
 647     0.15915494309189532   1.0/(2.0*pi).                                         GFX8, GFX9, GFX10
 648     ===================== ===================================================== ==================
 649
 650 .. WARNING:: Floating-point inline constants cannot be used with *16-bit integer* operands. \
 651              Assembler will attempt to encode these values as literals.
 652
 653 .. WARNING:: GFX7 does not support inline constants for *f16* operands.
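
A minimal sketch of the floating-point table as a lookup, assuming the value is compared exactly as written in the source. The function name ``is_fconst`` and its parameters are hypothetical. The sketch ignores the f16 restriction on GFX7 and treats ``-0.0`` as not listed, because the table enumerates only ``0.0``.

```python
import math

# Values available on all GPUs, copied from the fconst table.
COMMON_FCONSTS = {0.0, 0.5, 1.0, 2.0, 4.0, -0.5, -1.0, -2.0, -4.0}

# Spellings of 1.0/(2.0*pi) and the operand widths (in bits) the table allows.
INV_2PI_SPELLINGS = {
    0.1592: {16},
    0.15915494: {16, 32},
    0.15915494309189532: {16, 32, 64},
}


def is_fconst(value: float, operand_bits: int, gfx8_or_later: bool) -> bool:
    """Return True if the table lists this value for the given operand width."""
    if value == 0.0 and math.copysign(1.0, value) < 0:
        return False  # -0.0 is not listed in the table
    if value in COMMON_FCONSTS:
        return True
    return gfx8_or_later and operand_bits in INV_2PI_SPELLINGS.get(value, set())
```
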
 654
 655 .. _amdgpu_synid_ival:
 656
 657 ival
 658 ~~~~
 659
 660 A symbolic operand encoded as an *inline constant*.
 661 These operands provide read-only access to H/W registers.
 662
 663     ======================== ================================================ =============
 664     Syntax                   Note                                             Availability
 665     ======================== ================================================ =============
 666     shared_base              Base address of shared memory region.            GFX9, GFX10
 667     shared_limit             Address of the end of shared memory region.      GFX9, GFX10
 668     private_base             Base address of private memory region.           GFX9, GFX10
 669     private_limit            Address of the end of private memory region.     GFX9, GFX10
 670     pops_exiting_wave_id     A dedicated counter for POPS.                    GFX9, GFX10
 671     ======================== ================================================ =============
 672
 673 .. _amdgpu_synid_literal:
 674
 675 literal
 676 -------
 677
 678 A *literal* is a 64-bit value which is :ref:`converted<amdgpu_synid_conv>` and encoded as a separate 32-bit dword in the instruction stream.
 679 Compare *literals* with :ref:`inline constants<amdgpu_synid_constant>`.
 680
 681 If a number may be encoded as either
 682 a :ref:`literal<amdgpu_synid_literal>` or
 683 an :ref:`inline constant<amdgpu_synid_constant>`,
 684 assembler selects the latter encoding as more efficient.
 685
 686 Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
 687 :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
 688 :ref:`absolute expressions<amdgpu_synid_absolute_expression>` or
 689 :ref:`relocatable expressions<amdgpu_synid_relocatable_expression>`.
 690
 691 An instruction may use only one literal, but several operands may refer to the same literal.
 692
 693 .. _amdgpu_synid_uimm8:
 694
 695 uimm8
 696 -----
 697
 698 An 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
 699 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 700 The value must be in the range 0..0xFF.
 701
 702 .. _amdgpu_synid_uimm32:
 703
 704 uimm32
 705 ------
 706
 707 A 32-bit :ref:`integer number<amdgpu_synid_integer_number>`
 708 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 709 The value must be in the range 0..0xFFFFFFFF.
 710
 711 .. _amdgpu_synid_uimm20:
 712
 713 uimm20
 714 ------
 715
 716 A 20-bit :ref:`integer number<amdgpu_synid_integer_number>`
 717 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 718
 719 The value must be in the range 0..0xFFFFF.
 720
 721 .. _amdgpu_synid_simm21:
 722
 723 simm21
 724 ------
 725
 726 A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`
 727 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 728
 729 The value must be in the range -0x100000..0x0FFFFF.
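
The ranges of the immediate operands described above can be summarized as a small range check. This is an illustrative sketch; ``IMMEDIATE_RANGES`` and ``fits`` are hypothetical names, and the bounds are copied from the text.

```python
# Inclusive (minimum, maximum) bounds for each immediate operand kind.
IMMEDIATE_RANGES = {
    "uimm8": (0, 0xFF),
    "uimm20": (0, 0xFFFFF),
    "uimm32": (0, 0xFFFFFFFF),
    "simm21": (-0x100000, 0x0FFFFF),
}


def fits(kind: str, value: int) -> bool:
    """Return True if the value lies in the range required by the operand kind."""
    low, high = IMMEDIATE_RANGES[kind]
    return low <= value <= high
```
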
 730
 731 .. _amdgpu_synid_off:
 732
 733 off
 734 ---
 735
 736 A special entity which indicates that the value of this operand is not used.
 737
 738     ================================== ===================================================
 739     Syntax                             Description
 740     ================================== ===================================================
 741     off                                Indicates an unused operand.
 742     ================================== ===================================================
 743
 744
 745 .. _amdgpu_synid_number:
 746
 747 Numbers
 748 =======
 749
 750 .. _amdgpu_synid_integer_number:
 751
 752 Integer Numbers
 753 ---------------
 754
 755 Integer numbers are 64 bits wide.
 756 They are converted to :ref:`expected operand type<amdgpu_syn_instruction_type>`
 757 as described :ref:`here<amdgpu_synid_int_conv>`.
 758
 759 Integer numbers may be specified in binary, octal, hexadecimal and decimal formats:
 760
 761     ============ =============================== ========
 762     Format       Syntax                          Example
 763     ============ =============================== ========
 764     Decimal      [-]?[1-9][0-9]*                 -1234
 765     Binary       [-]?0b[01]+                     0b1010
 766     Octal        [-]?0[0-7]+                     010
 767     Hexadecimal  [-]?0x[0-9a-fA-F]+              0xff
 768     \            [-]?[0x]?[0-9][0-9a-fA-F]*[hH]  0ffh
 769     ============ =============================== ========
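
The patterns in the table can be tried out directly with Python's ``re`` module. The sketch below copies the patterns verbatim and converts a matching string to its value. ``parse_integer`` is a hypothetical helper, not part of the assembler, and it accepts only what the patterns as written accept (a bare ``0``, for instance, matches none of them).

```python
import re

# (format name, pattern copied from the table, numeric base)
INTEGER_FORMATS = [
    ("Binary", re.compile(r"[-]?0b[01]+"), 2),
    ("Octal", re.compile(r"[-]?0[0-7]+"), 8),
    ("Hexadecimal", re.compile(r"[-]?0x[0-9a-fA-F]+"), 16),
    ("Hexadecimal", re.compile(r"[-]?[0x]?[0-9][0-9a-fA-F]*[hH]"), 16),
    ("Decimal", re.compile(r"[-]?[1-9][0-9]*"), 10),
]


def parse_integer(text: str):
    """Return (format name, value) for the first pattern that matches fully."""
    for name, pattern, base in INTEGER_FORMATS:
        if pattern.fullmatch(text):
            # Drop the trailing 'h'/'H' of the suffix form before conversion.
            digits = text[:-1] if text[-1] in "hH" else text
            return name, int(digits, base)
    raise ValueError("not an integer number: " + text)
```

Note that ``010`` is octal and therefore evaluates to 8, not 10.
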
 770
 771 .. _amdgpu_synid_floating-point_number:
 772
 773 Floating-Point Numbers
 774 ----------------------
 775
 776 All floating-point numbers are handled as double (64 bits wide).
 777 They are converted to
 778 :ref:`expected operand type<amdgpu_syn_instruction_type>`
 779 as described :ref:`here<amdgpu_synid_fp_conv>`.
 780
 781 Floating-point numbers may be specified in hexadecimal and decimal formats:
 782
 783     ============ ======================================================== ====================== ====================
 784     Format       Syntax                                                   Examples               Note
 785     ============ ======================================================== ====================== ====================
 786     Decimal      [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)?                    -1.234, 234e2          Must include either
 787                                                                                                  a decimal separator
 788                                                                                                  or an exponent.
 789     Hexadecimal  [-]?0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+  -0x1afp-10, 0x.1afp10
 790     ============ ======================================================== ====================== ====================
 791
 792 .. _amdgpu_synid_expression:
 793
 794 Expressions
 795 ===========
 796
 797 An expression is evaluated to a 64-bit integer.
 798 Note that floating-point expressions are not supported.
 799
 800 There are two kinds of expressions:
 801
 802 * :ref:`Absolute<amdgpu_synid_absolute_expression>`.
 803 * :ref:`Relocatable<amdgpu_synid_relocatable_expression>`.
 804
 805 .. _amdgpu_synid_absolute_expression:
 806
 807 Absolute Expressions
 808 --------------------
 809
 810 The value of an absolute expression does not change after program relocation.
 811 Absolute expressions must not include unassigned or relocatable values
 812 such as labels.
 813
 814 Absolute expressions are evaluated to 64-bit integer values and converted to
 815 :ref:`expected operand type<amdgpu_syn_instruction_type>`
 816 as described :ref:`here<amdgpu_synid_int_conv>`.
 817
 818 Examples:
 819
 820 .. parsed-literal::
 821
 822     x = -1
 823     y = x + 10
 824
 825 .. _amdgpu_synid_relocatable_expression:
 826
 827 Relocatable Expressions
 828 -----------------------
 829
 830 The value of a relocatable expression depends on program relocation.
 831
 832 Note that the use of relocatable expressions is limited to branch targets
 833 and 32-bit integer operands.
 834
 835 A relocatable expression is evaluated to a 64-bit integer value
 836 which depends on operand kind and :ref:`relocation type<amdgpu-relocation-records>`
 837 of symbol(s) used in the expression. For example, if an instruction refers to a label,
 838 this reference is evaluated to an offset from the address after the instruction
 839 to the label address:
 840
 841 .. parsed-literal::
 842
 843     label:
 844     v_add_co_u32_e32 v0, vcc, label, v1  // 'label' operand is evaluated to -4
 845
 846 Note that values of relocatable expressions are usually unknown at assembly time;
 847 they are resolved later by a linker and converted to
 848 :ref:`expected operand type<amdgpu_syn_instruction_type>`
 849 as described :ref:`here<amdgpu_synid_rl_conv>`.
 850
 851 Operands and Operations
 852 -----------------------
 853
 854 Expressions are composed of 64-bit integer operands and operations.
 855 Operands include :ref:`integer numbers<amdgpu_synid_integer_number>`
 856 and :ref:`symbols<amdgpu_synid_symbol>`.
 857
 858 Expressions may also use "." which is a reference to the current PC (program counter).
 859
 860 :ref:`Unary<amdgpu_synid_expression_un_op>` and :ref:`binary<amdgpu_synid_expression_bin_op>`
 861 operations produce 64-bit integer results.
 862
 863 Syntax of Expressions
 864 ---------------------
 865
 866 Syntax of expressions is shown below::
 867
 868     expr ::= expr binop expr | primaryexpr ;
 869
 870     primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ;
 871
 872     binop ::= '&&'
 873             | '||'
 874             | '|'
 875             | '^'
 876             | '&'
 877             | '!'
 878             | '=='
 879             | '!='
 880             | '<>'
 881             | '<'
 882             | '<='
 883             | '>'
 884             | '>='
 885             | '<<'
 886             | '>>'
 887             | '+'
 888             | '-'
 889             | '*'
 890             | '/'
 891             | '%' ;
 892
 893     unop ::= '~'
 894            | '+'
 895            | '-'
 896            | '!' ;
 897
 898 .. _amdgpu_synid_expression_bin_op:
 899
 900 Binary Operators
 901 ----------------
 902
 903 Binary operators are described in the following table.
 904 They operate on and produce 64-bit integers.
 905 Operators with higher priority are performed first.
 906
 907     ========== ========= ===============================================
 908     Operator   Priority  Meaning
 909     ========== ========= ===============================================
 910        \*         5      Integer multiplication.
 911        /          5      Integer division.
 912        %          5      Integer signed remainder.
 913        \+         4      Integer addition.
 914        \-         4      Integer subtraction.
 915        <<         3      Integer shift left.
 916        >>         3      Logical shift right.
 917        ==         2      Equality comparison.
 918        !=         2      Inequality comparison.
 919        <>         2      Inequality comparison.
 920        <          2      Signed less than comparison.
 921        <=         2      Signed less than or equal comparison.
 922        >          2      Signed greater than comparison.
 923        >=         2      Signed greater than or equal comparison.
 924       \|          1      Bitwise or.
 925        ^          1      Bitwise xor.
 926        &          1      Bitwise and.
 927        &&         0      Logical and.
 928        ||         0      Logical or.
 929     ========== ========= ===============================================
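
The effect of the priorities can be illustrated with a small precedence-climbing evaluator. This is a sketch under stated assumptions, not the assembler's parser: ``evaluate`` and the other names are hypothetical, only decimal and ``0x`` numbers are accepted, and only the arithmetic, shift and bitwise operators are covered (comparison and logical operators are omitted because their result values are not specified here). Wrapping results to signed 64 bits and truncating division toward zero are assumptions.

```python
import re

MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Wrap a Python int to a signed 64-bit value (assumed behavior)."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def trunc_div(a, b):
    """Integer division truncating toward zero (assumed, C-like)."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


# operator -> (priority from the table, implementation)
BINARY_OPS = {
    "*": (5, lambda a, b: a * b),
    "/": (5, trunc_div),
    "%": (5, lambda a, b: a - b * trunc_div(a, b)),
    "+": (4, lambda a, b: a + b),
    "-": (4, lambda a, b: a - b),
    "<<": (3, lambda a, b: a << b),
    ">>": (3, lambda a, b: (a & MASK64) >> b),  # logical shift right
    "|": (1, lambda a, b: a | b),
    "^": (1, lambda a, b: a ^ b),
    "&": (1, lambda a, b: a & b),
}
UNARY_OPS = {"~": lambda a: ~a, "+": lambda a: a, "-": lambda a: -a}
TOKEN = re.compile(r"\s*(<<|>>|0[xX][0-9a-fA-F]+|[1-9][0-9]*|0|[-+*/%|^&~()])")


def tokenize(text):
    """Split the expression into number, operator and parenthesis tokens."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("unexpected input at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Evaluate an expression; higher-priority operators bind first."""
    tokens = tokenize(text)
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def primary():
        token = next_token()
        if token == "(":
            value = expr(0)
            if next_token() != ")":
                raise ValueError("expected ')'")
            return value
        if token in UNARY_OPS:
            return to_signed64(UNARY_OPS[token](primary()))
        base = 16 if token[:2].lower() == "0x" else 10
        return to_signed64(int(token, base))

    def expr(min_priority):
        nonlocal pos
        left = primary()
        while pos < len(tokens) and tokens[pos] in BINARY_OPS:
            priority, operation = BINARY_OPS[tokens[pos]]
            if priority < min_priority:
                break
            pos += 1
            # Operators of equal priority associate to the left.
            right = expr(priority + 1)
            left = to_signed64(operation(left, right))
        return left

    result = expr(0)
    if pos != len(tokens):
        raise ValueError("unexpected token " + tokens[pos])
    return result
```

With this sketch, ``evaluate("1 << 4 | 1")`` shifts first because ``<<`` has priority 3 and ``|`` has priority 1, and ``evaluate("-1 >> 60")`` shows that the right shift is logical rather than arithmetic.
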
 930
 931 .. _amdgpu_synid_expression_un_op:
 932
 933 Unary Operators
 934 ---------------
 935
 936 Unary operators are described in the following table.
 937 They operate on and produce 64-bit integers.
 938
 939     ========== ===============================================
 940     Operator   Meaning
 941     ========== ===============================================
 942        !       Logical negation.
 943        ~       Bitwise negation.
 944        \+      Integer unary plus.
 945        \-      Integer unary minus.
 946     ========== ===============================================
 947
 948 .. _amdgpu_synid_symbol:
 949
 950 Symbols
 951 -------
 952
 953 A symbol is a named 64-bit integer value, representing a relocatable
 954 address or an absolute (non-relocatable) number.
 955
 956 Symbol names have the following syntax:
 957     ``[a-zA-Z_.][a-zA-Z0-9_$.@]*``
 958
 959 The table below provides several examples of syntax used for symbol definition.
 960
 961     ================ ==========================================================
 962     Syntax           Meaning
 963     ================ ==========================================================
 964     .globl <S>       Declares a global symbol S without assigning it a value.
 965     .set <S>, <E>    Assigns the value of an expression E to a symbol S.
 966     <S> = <E>        Assigns the value of an expression E to a symbol S.
 967     <S>:             Declares a label S and assigns it the current PC value.
 968     ================ ==========================================================
 969
 970 A symbol may be used before it is declared or assigned;
 971 unassigned symbols are assumed to be PC-relative.
 972
 973 Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.
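
The symbol name syntax given above can be checked with a regular expression. A minimal sketch, with the pattern copied from this section; ``is_symbol_name`` is a hypothetical helper.

```python
import re

# Pattern copied from the symbol name syntax in this section.
SYMBOL_NAME = re.compile(r"[a-zA-Z_.][a-zA-Z0-9_$.@]*")


def is_symbol_name(text: str) -> bool:
    """Return True if the whole string is a valid symbol name."""
    return SYMBOL_NAME.fullmatch(text) is not None
```
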
 974
 975 .. _amdgpu_synid_conv:
 976
 977 Type and Size Conversion
 978 ========================
 979
 980 This section describes what happens when a 64-bit
 981 :ref:`integer number<amdgpu_synid_integer_number>`, a
 982 :ref:`floating-point number<amdgpu_synid_floating-point_number>` or an
 983 :ref:`expression<amdgpu_synid_expression>`
 984 is used for an operand which has a different type or size.
 985
 986 .. _amdgpu_synid_int_conv:
 987
 988 Conversion of Integer Values
 989 ----------------------------
 990
 991 Instruction operands may be specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>` or
 992 :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. These values are converted to
 993 the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
 994
 995 1. *Validation*. Assembler checks if the input value may be truncated without loss to the required *truncation width*
 996 (see the table below). There are two cases when this operation is enabled:
 997
 998     * The truncated bits are all 0.
 999     * The truncated bits are all 1 and the value after truncation has its MSB set.
1000
1001 In all other cases assembler triggers an error.
1002
1003 2. *Conversion*. The input value is converted to the expected type as described in the table below.
1004 Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W (or both).
1005
1006     ============== ================= =============== ====================================================================
1007     Expected type  Truncation Width  Conversion      Description
1008     ============== ================= =============== ====================================================================
1009     i16, u16, b16  16                num.u16         Truncate to 16 bits.
1010     i32, u32, b32  32                num.u32         Truncate to 32 bits.
1011     i64            32                {-1,num.i32}    Truncate to 32 bits and then sign-extend the result to 64 bits.
1012     u64, b64       32                {0,num.u32}     Truncate to 32 bits and then zero-extend the result to 64 bits.
1013     f16            16                num.u16         Use low 16 bits as an f16 value.
1014     f32            32                num.u32         Use low 32 bits as an f32 value.
1015     f64            32                {num.u32,0}     Use low 32 bits of the number as high 32 bits
1016                                                      of the result; low 32 bits of the result are zeroed.
1017     ============== ================= =============== ====================================================================
1018
1019 Examples of enabled conversions:
1020
1021 .. parsed-literal::
1022
1023     // GFX9
1024
1025     v_add_u16 v0, -1, 0                   // src0 = 0xFFFF
1026     v_add_f16 v0, -1, 0                   // src0 = 0xFFFF (NaN)
1027                                           //
1028     v_add_u32 v0, -1, 0                   // src0 = 0xFFFFFFFF
1029     v_add_f32 v0, -1, 0                   // src0 = 0xFFFFFFFF (NaN)
1030                                           //
1031     v_add_u16 v0, 0xff00, v0              // src0 = 0xff00
1032     v_add_u16 v0, 0xffffffffffffff00, v0  // src0 = 0xff00
1033     v_add_u16 v0, -256, v0                // src0 = 0xff00
1034                                           //
1035     s_bfe_i64 s[0:1], 0xffefffff, s3      // src0 = 0xffffffffffefffff
1036     s_bfe_u64 s[0:1], 0xffefffff, s3      // src0 = 0x00000000ffefffff
1037     v_ceil_f64_e32 v[0:1], 0xffefffff     // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
1038                                           //
1039     x = 0xffefffff                        //
1040     s_bfe_i64 s[0:1], x, s3               // src0 = 0xffffffffffefffff
1041     s_bfe_u64 s[0:1], x, s3               // src0 = 0x00000000ffefffff
1042     v_ceil_f64_e32 v[0:1], x              // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
1043
1044 Examples of disabled conversions:
1045
1046 .. parsed-literal::
1047
1048     // GFX9
1049
1050     v_add_u16 v0, 0x1ff00, v0               // truncated bits are not all 0 or 1
1051     v_add_u16 v0, 0xffffffffffff00ff, v0    // truncated bits do not match MSB of the result
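
The validation and conversion steps for integer values can be modeled as follows. This is a sketch of the rules in the table above, not the assembler's implementation; ``convert_integer`` and ``TRUNCATION_WIDTH`` are hypothetical names, and the function returns the resulting operand bits as an unsigned integer.

```python
MASK64 = (1 << 64) - 1

# Truncation width for each expected operand type, as listed in the table.
TRUNCATION_WIDTH = {
    "i16": 16, "u16": 16, "b16": 16, "f16": 16,
    "i32": 32, "u32": 32, "b32": 32, "f32": 32,
    "i64": 32, "u64": 32, "b64": 32, "f64": 32,
}


def convert_integer(value: int, expected_type: str) -> int:
    """Validate a 64-bit integer and convert it to the expected operand type."""
    width = TRUNCATION_WIDTH[expected_type]
    bits64 = value & MASK64
    truncated = bits64 & ((1 << width) - 1)
    dropped = bits64 >> width
    all_ones = (1 << (64 - width)) - 1
    msb_set = bool((truncated >> (width - 1)) & 1)

    # Validation: dropped bits must be all 0, or all 1 with the MSB set.
    if not (dropped == 0 or (dropped == all_ones and msb_set)):
        raise ValueError("value cannot be truncated without loss")

    # Conversion.
    if expected_type == "i64":
        return truncated | (0xFFFFFFFF00000000 if msb_set else 0)  # sign-extend
    if expected_type == "f64":
        return truncated << 32  # low 32 bits become the high half
    return truncated  # zero-extended or used as is
```

For instance, ``convert_integer(-256, "u16")`` yields ``0xff00``, while ``convert_integer(0x1ff00, "u16")`` raises an error, matching the examples above.
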
1052
1053 .. _amdgpu_synid_fp_conv:
1054
1055 Conversion of Floating-Point Values
1056 -----------------------------------
1057
1058 Instruction operands may be specified as 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
1059 These values are converted to the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
1060
1061 1. *Validation*. Assembler checks if the input f64 number can be converted
1062 to the *required floating-point type* (see the table below) without overflow or underflow.
1063 Loss of precision is allowed. If this conversion is not possible, assembler triggers an error.
1064
1065 2. *Conversion*. The input value is converted to the expected type as described in the table below.
1066 Depending on operand kind, this is performed by either assembler or AMDGPU H/W (or both).
1067
1068     ============== ================ ================= =================================================================
1069     Expected type  Required FP Type Conversion        Description
1070     ============== ================ ================= =================================================================
1071     i16, u16, b16  f16              f16(num)          Convert to f16 and use bits of the result as an integer value.
1072                                                       The value has to be encoded as a literal or an error occurs.
1073                                                       Note that the value cannot be encoded as an inline constant.
1074     i32, u32, b32  f32              f32(num)          Convert to f32 and use bits of the result as an integer value.
1075     i64, u64, b64  \-               \-                Conversion disabled.
1076     f16            f16              f16(num)          Convert to f16.
1077     f32            f32              f32(num)          Convert to f32.
1078     f64            f64              {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result;
1079                                                       zero-fill low 32 bits of the result.
1080
1081                                                       Note that the result may differ from the original number.
1082     ============== ================ ================= =================================================================
1083
1084 Examples of enabled conversions:
1085
1086 .. parsed-literal::
1087
1088     // GFX9
1089
1090     v_add_f16 v0, 1.0, 0        // src0 = 0x3C00 (1.0)
1091     v_add_u16 v0, 1.0, 0        // src0 = 0x3C00
1092                                 //
1093     v_add_f32 v0, 1.0, 0        // src0 = 0x3F800000 (1.0)
1094     v_add_u32 v0, 1.0, 0        // src0 = 0x3F800000
1095
1096                                 // src0 before conversion:
1097                                 //   1.7976931348623157e308 = 0x7fefffffffffffff
1098                                 // src0 after conversion:
1099                                 //   1.7976922776554302e308 = 0x7fefffff00000000
1100     v_ceil_f64 v[0:1], 1.7976931348623157e308
1101
1102     v_add_f16 v1, 65500.0, v2   // ok for f16.
1103     v_add_f32 v1, 65600.0, v2   // ok for f32, but would result in overflow for f16.
1104
1105 Examples of disabled conversions:
1106
1107 .. parsed-literal::
1108
1109     // GFX9
1110
1111     v_add_f16 v1, 65600.0, v2    // overflow
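
The floating-point rules can be approximated with Python's ``struct`` module, which supports IEEE half, single and double formats. This is only a sketch: ``convert_float`` is a hypothetical name, and treating underflow as "a nonzero input becomes zero" is an assumption that may be stricter or looser than the assembler's own check.

```python
import struct

# expected type -> (struct format of the required FP type, integer format)
FP_FORMATS = {
    "i16": ("<e", "<H"), "u16": ("<e", "<H"), "b16": ("<e", "<H"),
    "f16": ("<e", "<H"),
    "i32": ("<f", "<I"), "u32": ("<f", "<I"), "b32": ("<f", "<I"),
    "f32": ("<f", "<I"),
}


def convert_float(value: float, expected_type: str) -> int:
    """Validate an f64 number and return the operand bits after conversion."""
    if expected_type in ("i64", "u64", "b64"):
        raise ValueError("conversion disabled for 64-bit integer operands")
    if expected_type == "f64":
        bits = struct.unpack("<Q", struct.pack("<d", value))[0]
        return bits & 0xFFFFFFFF00000000  # keep high 32 bits, zero the rest
    fp_format, int_format = FP_FORMATS[expected_type]
    try:
        packed = struct.pack(fp_format, value)  # rounds; precision may be lost
    except (OverflowError, struct.error):
        raise ValueError("overflow") from None
    if value != 0.0 and struct.unpack(fp_format, packed)[0] == 0.0:
        raise ValueError("underflow")  # assumed definition of underflow
    return struct.unpack(int_format, packed)[0]
```

Here ``convert_float(65500.0, "f16")`` succeeds with some loss of precision, while ``convert_float(65600.0, "f16")`` is rejected as an overflow.
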
1112
1113 .. _amdgpu_synid_rl_conv:
1114
1115 Conversion of Relocatable Values
1116 --------------------------------
1117
1118 :ref:`Relocatable expressions<amdgpu_synid_relocatable_expression>`
1119 may be used with 32-bit integer operands and jump targets.
1120
1121 When the value of a relocatable expression is resolved by a linker, it is
1122 converted as needed and truncated to the operand size. The conversion depends
1123 on :ref:`relocation type<amdgpu-relocation-records>` and operand kind.
1124
1125 For example, when a 32-bit operand of an instruction refers to a relocatable expression *expr*,
1126 this reference is evaluated to a 64-bit offset from the address after the
1127 instruction to the address being referenced, *counted in bytes*.
1128 Then the value is truncated to 32 bits and encoded as a literal:
1129
1130 .. parsed-literal::
1131
1132     expr = .
1133     v_add_co_u32_e32 v0, vcc, expr, v1  // 'expr' operand is evaluated to -4
1134                                         // and then truncated to 0xFFFFFFFC
1135
1136 As another example, when a branch instruction refers to a label,
1137 this reference is evaluated to an offset from the address after the
1138 instruction to the label address, *counted in dwords*.
1139 Then the value is truncated to 16 bits:
1140
1141 .. parsed-literal::
1142
1143     label:
1144     s_branch label  // 'label' operand is evaluated to -1 and truncated to 0xFFFF
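
The truncation in both examples is plain two's complement masking, which can be sketched as follows. The helper names ``truncate`` and ``branch_operand`` are hypothetical, and the 4-byte size of the branch instruction is an assumption of this sketch.

```python
def truncate(value: int, bits: int) -> int:
    """Keep the low 'bits' bits of a (possibly negative) integer."""
    return value & ((1 << bits) - 1)


def branch_operand(label_address: int, branch_address: int,
                   branch_size: int = 4) -> int:
    """Offset from the end of the branch to the label, in dwords, as 16 bits."""
    byte_offset = label_address - (branch_address + branch_size)
    return truncate(byte_offset // 4, 16)
```

A byte offset of -4 truncated to 32 bits gives ``truncate(-4, 32) == 0xFFFFFFFC``, and a branch to a label placed immediately before it gives ``branch_operand(0, 0) == 0xFFFF``.
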