.CD "as \(en assembler"
.SE "AS\(emASSEMBLER [IBM]"
This document describes the language accepted by the 80386 assembler
that is part of the Amsterdam Compiler Kit. Note that only the syntax is
described; only a few 386 instructions are shown as examples.
.SS "Tokens, Numbers, Character Constants, and Strings"
The syntax of numbers is the same as in C.
The constants 32, 040, and 0x20 all represent the same number, but are
written in decimal, octal, and hex, respectively.
The rules for character constants and strings are also the same as in C.
For example, \(fma\(fm is a character constant.
A typical string is "string".
Expressions may be formed with C operators, but must use [ and ] for
parentheses. (Normal parentheses are claimed by the operand syntax.)
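.LP
As an illustration of the bracket rule, the sketch below groups
subexpressions with [ and ]. The symbol names are hypothetical, and the
values in the comments assume ordinary C arithmetic.
.nf
~~~NBUF = 4 ! hypothetical constant
~~~BUFSIZE = 0x20 ! 32, written in hex
~~~.data4 [NBUF + 1] * BUFSIZE ! brackets group: 5 * 32 = 160
~~~mov eax, [NBUF + 1] * BUFSIZE ! same expression as a constant operand
.fi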
Symbols contain letters and digits, as well as three special characters:
dot, tilde, and underscore.
The first character may not be a digit or tilde.
The names of the 80386 registers are reserved. These are:
~~~ax, bx, cx, dx, eax, ebx, ecx, edx
~~~si, di, bp, sp, esi, edi, ebp, esp
~~~cs, ds, ss, es, fs, gs
The xx and exx variants of the eight general registers are treated as
synonyms by the assembler. Normally "ax" is the 16-bit low half of the
32-bit "eax" register. The assembler determines whether a 16-bit or 32-bit
operation is meant solely by looking at the instruction or the
instruction prefixes. It is, however, best to use the proper register
names when writing assembly so as not to confuse those who read the code.
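.LP
A minimal sketch of the rule above, assuming the \*(OQo16\*(CQ prefix from
the opcode list is written on the same line as the instruction it modifies:
.nf
~~~mov eax, ebx ! 32-bit move
~~~o16 mov ax, bx ! 16-bit move, selected by the o16 prefix
~~~mov ax, bx ! still a 32-bit move: ax is only a synonym for eax here
.fi
The last line is legal but misleading, which is why the proper register
names are recommended.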
The last group of six segment registers is used for selector + offset mode
addressing, in which the effective address is at a given offset in one of
the six segments.
Names of instructions and pseudo-ops are not reserved.
Alphabetic characters in opcodes and pseudo-ops must be in lower case.
Commas, blanks, and tabs are separators and can be interspersed freely
between tokens, but not within tokens.
Commas are only legal between operands.
The comment character is \*(OQ!\*(CQ.
The rest of the line is ignored.
The opcodes are listed below.
Notes: (1) Different names for the same instruction are separated by \*(OQ/\*(CQ.
(2) Square brackets ([]) indicate that 0 or 1 of the enclosed characters
may be included.
(3) Curly brackets ({}) work similarly, except that one of the
enclosed characters \fImust\fR be included.
Thus square brackets indicate an option, whereas curly brackets indicate
that a choice must be made.
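.LP
As a reading aid, the comment lines below spell out what three entries of
the opcode list expand to under these notes. They are only an expansion of
the notation, not additional opcodes.
.nf
~~~! mov[b] stands for: mov, movb
~~~! sal[b]/shl[b] stands for: sal, salb, shl, shlb
~~~! {cdsefg}seg stands for: cseg, dseg, sseg, eseg, fseg, gseg
.fi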
.if t .ta 0.25i 1.2i 3i
mov[b] dest, source ! Move word/byte from source to dest
push source ! Push source onto the stack
xchg[b] op1, op2 ! Exchange word/byte
o16 ! Operate on a 16 bit object instead of 32 bit
in[b] source ! Input from source I/O port
in[b] ! Input from DX I/O port
out[b] dest ! Output to dest I/O port
out[b] ! Output to DX I/O port
lds reg,source ! Load reg and DS from source
les reg,source ! Load reg and ES from source
lea reg,source ! Load effective address of source into reg
{cdsefg}seg ! Specify seg register for next instruction
a16 ! Use 16 bit addressing mode instead of 32 bit
lahf ! Load AH from flag register
sahf ! Store AH in flag register
aaa ! Adjust result of BCD addition
add[b] dest,source ! Add
adc[b] dest,source ! Add with carry
daa ! Decimal adjust after addition
inc[b] dest ! Increment by 1
aas ! Adjust result of BCD subtraction
sub[b] dest,source ! Subtract
sbb[b] dest,source ! Subtract with borrow from dest
das ! Decimal adjust after subtraction
dec[b] dest ! Decrement by one
cmp[b] dest,source ! Compare
aam ! Adjust result of BCD multiply
imul[b] source ! Signed multiply
mul[b] source ! Unsigned multiply
aad ! Adjust AX for BCD division
o16 cbw ! Sign extend AL into AH
o16 cwd ! Sign extend AX into DX
cwde ! Sign extend AX into EAX
cdq ! Sign extend EAX into EDX
idiv[b] source ! Signed divide
div[b] source ! Unsigned divide
and[b] dest,source ! Logical and
not[b] dest ! Logical not
or[b] dest,source ! Logical inclusive or
test[b] dest,source ! Logical test
xor[b] dest,source ! Logical exclusive or
sal[b]/shl[b] dest,CL ! Shift logical left
sar[b] dest,CL ! Shift arithmetic right
shr[b] dest,CL ! Shift logical right
rcl[b] dest,CL ! Rotate left, with carry
rcr[b] dest,CL ! Rotate right, with carry
rol[b] dest,CL ! Rotate left
ror[b] dest,CL ! Rotate right
.B "String Manipulation"
cmps[b] ! Compare string element ds:esi with es:edi
lods[b] ! Load from ds:esi into AL, AX, or EAX
movs[b] ! Move from ds:esi to es:edi
rep ! Repeat next instruction until ECX=0
repe/repz ! Repeat next instruction while ECX!=0 and ZF=1
repne/repnz ! Repeat next instruction while ECX!=0 and ZF=0
scas[b] ! Compare es:edi with AL/AX/EAX
stos[b] ! Store AL/AX/EAX in es:edi
.B "Control Transfer"
\fIAs\fR accepts a number of special jump opcodes that can assemble to
instructions with either a byte displacement, which can only reach targets
within \(mi126 to +129 bytes of the branch, or a 32-bit
displacement. The assembler automatically chooses a byte or 32-bit displacement
as appropriate.
The English translation of the opcodes should be obvious, with
\*(OQl(ess)\*(CQ and \*(OQg(reater)\*(CQ for signed comparisons, and
\*(OQb(elow)\*(CQ and \*(OQa(bove)\*(CQ for unsigned comparisons. There are
lots of synonyms to allow you to write "jump if not that" instead of "jump
if this".
The \*(OQcall\*(CQ, \*(OQjmp\*(CQ, and \*(OQret\*(CQ instructions can be
either intrasegment or
intersegment. The intersegment versions are indicated with
the suffix \*(OQf\*(CQ.
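.LP
A minimal sketch of the signed and unsigned variants, using hypothetical
labels that are assumed to be defined elsewhere in the same file:
.nf
~~~cmp eax, ebx ! compare dest eax with source ebx
~~~jl less ! taken if eax < ebx as signed numbers
~~~jb below ! taken if eax < ebx as unsigned numbers
~~~jmp done ! the assembler picks a byte or 32-bit displacement
.fi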
.if t .ta 0.25i 1.2i 3i
jmp[f] dest ! jump to dest (8 or 32-bit displacement)
call[f] dest ! call procedure
ret[f] ! return from procedure
ja/jnbe ! if above/not below or equal (unsigned)
jae/jnb/jnc ! if above or equal/not below/not carry (uns.)
jb/jnae/jc ! if below/not above nor equal/carry (unsigned)
jbe/jna ! if below or equal/not above (unsigned)
jg/jnle ! if greater/not less nor equal (signed)
jge/jnl ! if greater or equal/not less (signed)
jl/jnge ! if less/not greater nor equal (signed)
jle/jng ! if less or equal/not greater (signed)
je/jz ! if equal/zero
jne/jnz ! if not equal/not zero
jno ! if overflow not set
jnp/jpo ! if parity not set/parity odd
jp/jpe ! if parity set/parity even
jns ! if sign not set
.B "Iteration Control"
jcxz dest ! jump if ECX = 0
loop dest ! Decrement ECX and jump if ECX != 0
loope/loopz dest ! Decrement ECX and jump if ECX != 0 and ZF = 1
loopne/loopnz dest ! Decrement ECX and jump if ECX != 0 and ZF = 0
int n ! Software interrupt n
into ! Interrupt if overflow set
iretd ! Return from interrupt
clc ! Clear carry flag
cld ! Clear direction flag
cli ! Clear interrupt enable flag
cmc ! Complement carry flag
std ! Set direction flag
sti ! Set interrupt enable flag
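.LP
The fragment below sketches how a few of the listed opcodes combine into
a byte copy. It assumes that esi, edi, and ecx were loaded by hypothetical
earlier code with the source address, destination address, and byte count,
and that a prefix such as \*(OQrep\*(CQ may stand on a line of its own.
.nf
~~~cld ! clear direction flag so esi and edi count upward
~~~rep ! repeat the next instruction until ECX=0
~~~movsb ! move one byte from ds:esi to es:edi
.fi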
.SS "Location Counter"
The special symbol \*(OQ.\*(CQ is the location counter. Its value
is the address of the first byte of the instruction in which the symbol
appears, and it can be used in expressions.
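.LP
One common use of the location counter is to compute the length of a data
item. The sketch below uses hypothetical symbol names and assumes that the
difference of two addresses in the same segment is accepted as an
expression.
.nf
~~~msg:
~~~.ascii "hello" ! five bytes starting at msg
~~~msglen = . \(mi msg ! current location minus msg
.fi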
There are four different assembly segments: text, rom, data and bss.
Segments are declared and selected by the \fI.sect\fR pseudo-op. It is
customary to declare all segments at the top of an assembly file like
this:
~~~.sect .text; .sect .rom; .sect .data; .sect .bss
The assembler accepts up to 16 different segments, but
expects only four to be used. Anything can in principle be assembled
into any segment, but the
bss segment may only contain uninitialized data.
Note that the \*(OQ.\*(CQ symbol refers to the location in the current
segment.
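.LP
A sketch of a small file laid out this way is shown below. The symbol
names are hypothetical; the pseudo-ops used for data are the ones described
under storage allocation.
.nf
~~~.sect .text; .sect .rom; .sect .data; .sect .bss ! declare all four
~~~.sect .text ! instructions go here
~~~_getcount:
~~~mov eax, (_count)
~~~ret
~~~.sect .data ! initialized data
~~~_count:
~~~.data4 0
~~~.sect .bss ! uninitialized data only
~~~_scratch:
~~~.space 64
.fi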
There are two types of labels: name and numeric. Name labels consist of a name
followed by a colon (:).
The numeric labels are single digits. The nearest 0: label may be
referenced as 0f in the forward direction, or 0b backwards.
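.LP
The following sketch shows both directions of reference. It assumes, as
the word "nearest" suggests, that the same digit may be used for more than
one label.
.nf
~~~mov ecx, 10
~~~0: ! first numeric label
~~~add eax, ecx
~~~dec ecx
~~~jnz 0b ! backward: the nearest preceding 0:
~~~test eax, eax
~~~jz 0f ! forward: the nearest following 0:
~~~inc eax
~~~0: ! second label with the same digit
~~~ret
.fi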
.SS "Statement Syntax"
Each line consists of a single statement.
Blank or comment lines are allowed.
.SS "Instruction Statements"
The most general form of an instruction is
~~~label: opcode operand1, operand2 ! comment
.SS "Expression Semantics"
The following operators can be used:
+ \(mi * / & | ^ ~ << (shift left) >> (shift right) \(mi (unary minus).
32-bit integer arithmetic is used.
Division produces a truncated quotient.
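.LP
A few worked values, as a sketch. Brackets are used wherever grouping
matters, so the results do not depend on operator precedence.
.nf
~~~.data4 7 / 2 ! 3: the quotient is truncated
~~~.data4 1 << 4 ! 16
~~~.data4 0x0F & 0x3C ! 0x0C
~~~.data4 [1 << 4] | 3 ! 19
~~~.data4 \(mi[2 + 3] ! unary minus: \(mi5
.fi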
.SS "Addressing Modes"
Below is a list of the addressing modes supported.
Each one is followed by an example.
constant mov eax, 123456
direct access mov eax, (counter)
register mov eax, esi
indirect mov eax, (esi)
base + disp. mov eax, 6(ebp)
scaled index mov eax, (4*esi)
base + index mov eax, (ebp)(2*esi)
base + index + disp. mov eax, 10(edi)(1*esi)
Any of the constants or symbols may be replaced by expressions. Direct
access, constants and displacements may be any type of expression. A scaled
index with scale 1 may be written without the \*(OQ1*\*(CQ.
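.LP
The sketch below applies these rules with a symbolic displacement. The
symbol ARGS and its value are hypothetical; counter is the symbol used in
the list above.
.nf
~~~ARGS = 8 ! hypothetical offset of the first argument from ebp
~~~mov eax, ARGS(ebp) ! displacement given by a symbol
~~~mov ebx, ARGS+4(ebp) ! displacement given by an expression
~~~mov ecx, (ebx)(esi) ! base + index, scale 1 written without 1*
~~~mov edx, (counter+4) ! direct access through an expression
.fi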
The \*(OQcall\*(CQ and \*(OQjmp\*(CQ instructions can be interpreted
as a load into the instruction pointer.
call _routine ! Direct, intrasegment
call (subloc) ! Indirect, intrasegment
call 6(ebp) ! Indirect, intrasegment
call ebx ! Direct, intrasegment
call (ebx) ! Indirect, intrasegment
callf (subloc) ! Indirect, intersegment
callf seg:offs ! Direct, intersegment
.SS "Symbol Assignment"
Symbols can acquire values in one of two ways.
Using a symbol as a label sets it to \*(OQ.\*(CQ for the current
segment with type relocatable.
Alternatively, a symbol may be given a value via an assignment of the form
~~~symbol = expression
in which the symbol is assigned the value and type of the expression.
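.LP
Both ways are sketched below with hypothetical names:
.nf
~~~BUFSIZE = 512 ! assignment of a constant
~~~NBUFS = 4
~~~POOLSIZE = NBUFS * BUFSIZE ! computed from earlier symbols: 2048
~~~start: ! label: value is the current location, type relocatable
.fi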
.SS "Storage Allocation"
Space can be reserved for bytes, words, and longs using pseudo-ops.
They take one or more operands, and for each generate a value
whose size is a byte, word (2 bytes) or long (4 bytes). For example:
.data1 2, 6 ! allocate 2 bytes initialized to 2 and 6
.data2 3, 0x10 ! allocate 2 words initialized to 3 and 16
.data4 010 ! allocate a longword initialized to 8
.space 40 ! allocate 40 bytes of zeros
allocates 50 (decimal) bytes of storage, initializing the first two
bytes to 2 and 6, the next two words to 3 and 16, then one longword with
value 8 (010 octal), and finally 40 bytes of zeros.
.SS "String Allocation"
The pseudo-ops \fI.ascii\fR and \fI.asciz\fR
take one string argument and generate the ASCII character
codes for the letters in the string.
The latter automatically terminates the string with a null (0) byte.
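.LP
As a small illustration of the difference between the two:
.nf
~~~.ascii "abc" ! 3 bytes: 0x61 0x62 0x63
~~~.asciz "abc" ! 4 bytes: 0x61 0x62 0x63 0x00
.fi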
Sometimes it is necessary to force the next item to begin at a word, longword
or even a 16 byte address boundary.
The \fI.align\fR pseudo-op inserts zero or more null bytes until the current
location is a multiple of the argument of .align.
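.LP
A sketch of typical use, assuming the location was a multiple of 4 before
the first line:
.nf
~~~.data1 1 ! one byte, so the location is now odd
~~~.align 4 ! pad with null bytes up to a multiple of 4
~~~.data4 0x12345678 ! starts on a longword boundary
.fi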
.SS "Segment Control"
Every item assembled goes in one of the four segments: text, rom, data,
or bss. By using the \fI.sect\fR pseudo-op with argument
\fI.text, .rom, .data\fR or \fI.bss\fR, the programmer can force the
next items to go in a particular segment.
A symbol can be given global scope by including it in a \fI.define\fR pseudo-op.
Multiple names may be listed, separated by commas.
It must be used to export symbols defined in the current program.
Names not defined in the current program are treated as "undefined
external" automatically, although it is customary to make this explicit
with the \fI.extern\fR pseudo-op.
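.LP
The sketch below exports two symbols and imports one. All names are
hypothetical, and the segments are assumed to have been declared at the top
of the file.
.nf
~~~.define _counter, _bump ! export both names
~~~.extern _limit ! defined in another module
~~~.sect .text
~~~_bump:
~~~inc (_counter) ! add one to the exported counter
~~~mov eax, (_limit) ! read the imported symbol
~~~ret
~~~.sect .data
~~~_counter:
~~~.data4 0
.fi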
The \fI.comm\fR pseudo-op declares storage that can be common to more than
one module. There are two arguments: a name and an absolute expression giving
the size in bytes of the area named by the symbol.
The type of the symbol becomes
external. The statement can appear in any segment.
If you think this has something to do with FORTRAN, you are right.
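.LP
For example, with a hypothetical name and size:
.nf
~~~.comm _iobuf, 1024 ! 1024 bytes that several modules may share
.fi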
In the kernel directory, there are several assembly code files that are
worth inspecting as examples.
However, note that these files are designed to first be
run through the C preprocessor. (The very first character is a # to signal
this.) Thus they contain numerous constructs
that are not pure assembler.
For true assembler examples, compile any C program
using the \fB\(enS\fR flag.
This will result in an assembly language file with the same
name as the C source file, but ending with the .s suffix.