1 \
FLK simple
postfix assembler
3 \
Copyright (C) 1998 Lars Krueger
5 \
This file is
part of FLK.
7 \
FLK is
free software
; you
can redistribute
it and/or
8 \ modify
it under the
terms of the
GNU General Public License
9 \
as published by the
Free Software Foundation; either version
2
10 \
of the
License, or (at
your option) any later version
.
12 \
This program is
distributed in the
hope that
it will
be useful
,
13 \ but
WITHOUT ANY WARRANTY; without
even the
implied warranty
of
14 \
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15 \
GNU General Public License for more
details.
17 \
You should have
received a copy
of the GNU General Public License
18 \ along
with this
program; if not
, write
to the Free Software
19 \
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
21 \ $
Id: flkasm
.fs
,v
1.14 1998/09/20 09:34:14 root
Exp $
23 \
Revision 1.14 1998/09/20 09:34:14 root
24 \ changed ebp offset
to byte
(saving
3 bytes
per stack flush
)
26 \
Revision 1.13 1998/09/13 15:42:04 root
27 \ introduced separate
control flow stack
29 \
Revision 1.12 1998/07/16 19:31:37 root
30 \ added conditional
near jumps
32 \
Revision 1.11 1998/07/13 18:08:54 root
35 \
Revision 1.10 1998/07/03 20:57:50 root
38 \
Revision 1.9 1998/07/03 09:09:28 root
39 \ support
for level
2 optimizer
41 \
Revision 1.8 1998/06/08 22:14:51 root
42 \
added SIB-addressing
44 \
Revision 1.7 1998/06/03 16:31:37 root
47 \
Revision 1.6 1998/05/01 18:11:25 root
48 \
GNU license
text added
51 \
Revision 1.5 1998/04/30 09:42:25 root
54 \
Revision 1.4 1998/04/24 16:47:39 root
55 \
added some
instructions (fpu
)
56 \ corrected allocator
-store
58 \
Revision 1.3 1998/04/16 18:43:49 root
59 \ flow
control improved
61 \
Revision 1.2 1998/04/15 18:15:30 root
64 \
Revision 1.1 1998/04/07 20:10:33 root
68 \
This file implements
a classical
postfix assembler
for Intel 386+ and based
69 \ on that
a register allocator
for the optimizer. All operations are
70 \
"src dest op," or "dest op," . A postfix assembler is
controlled by
a few
71 \ variables
that contain information about
the type of operand
, its
size,
72 \ register number
and offset
. One set of variables
is required
per possible
73 \ operand i
.e
. a CPU supports
commands like
"add r0,r1,r17" meaning
"add the
74 \ contents of r1 and r17 and store the result in r0", three
sets of variables
75 \ are
used. Apart from some
stranger multiplication
operations the Intel 386
76 \ never uses
more than
2 operands
, further referred
to as source and
79 \
At the begin of the operation and after each stored
operation a pointer has
80 \
to be re-set that tells
the system whether
the source or the destination set
81 \
of variables
is meant
with the given operand
.
83 \
For a detailed description
of the 386 and above
see Intels manuals.
85 \
The next few constants
are here
for readability
.
87 \
The size of the operand.
88 0 CONSTANT SZ-8 \ al
through dh
89 1 CONSTANT SZ-32 \ eax
through esi
90 2 CONSTANT SZ-UNKNOWN \
for memory references
92 \
The type of the operand.
93 0 CONSTANT RT-REG \ register
94 1 CONSTANT RT-INDEX \
[register
]
95 2 CONSTANT RT-IMMED \ literal
96 3 CONSTANT RT-ABS \
[literal
]
98 \
The scale of a sib
-byte
104 \
Some error messages
are nessesary too
.
105 \
When this
messages occurs
something really went
wrong.
106 : (internal
-error) ( -- )
107 CR ." This is an internal assembler error." CR BYE ;
108 \
This message is printed whenever an invalid combination
of operands
is
109 \ found
, i
.e
. after
0 [ebp
] 0 [ecx
] mov
, .
110 : (unknown
-combination
) ( -- )
111 CR ." Illegal or unimplemented combination of operands." CR ABORT ;
112 \
If called
with a true flag in TOS (top
of stack
) the register cache
is full
.
113 : TooManyRegs IF ." Too many registers requested." ABORT THEN ;
114 \
Called with TRUE when the sizes of the operands are different.
115 : (sz
-mismatch
) IF ." Mismatching sizes of operands." ABORT THEN ;
117 \
Called to complain about
unknown size.
118 : (unknown-size) ( size -- )
119 SZ-UNKNOWN = IF ." Unknown operand size." ABORT THEN ;
121 \
Called to complain about
wrong size.
122 : (wrong-size) ( wrong? -- )
123 IF ." Wrong operand size." ABORT THEN ;
126 \
These words check
a type value for the given property
.
127 : isreg
? ( type -- flag )
129 : ismem
? ( type -- flag )
130 DUP RT-INDEX = \
type flag2
132 : isr
/m
? ( type -- flag )
133 DUP isreg
? \
type flag1
135 : isimm
? ( type -- flag )
138 \
The sets of variables
containing the details of the operands.
139 VARIABLE (ts
) VARIABLE (td
) \
type
140 VARIABLE (rs
) VARIABLE (rd
) \
register number
141 VARIABLE (ss
) VARIABLE (sd
) \
size
142 VARIABLE (os
) VARIABLE (od
) \ offset
/immediate
...
144 VARIABLE (sb
) VARIABLE (db
) \ sib base
145 VARIABLE (sc
) VARIABLE (dc
) \ sib
scale
146 VARIABLE (si
) VARIABLE (di
) \ sib index
148 \
Word operations in the 386+ are distinguished from byte
operations by
149 \ setting
Bit 0 in the opcode. The following VALUE contains
this Bit 0
152 \
To distinguish between
the "normal" (non
-SIB) addressing
and the
153 \
SIB-addressing mode
the following word is used. Setting if to TRUE makes
the
154 \
mod/rm byte
compiler switch
to SIB-mode
. It is reset
by (asm
-reset
).
157 \
To generate
byte-offsets
on demand
the next VALUE is used. Does it contain
158 \
TRUE, byte offsets are generated.
159 FALSE VALUE (byte-offs
)
161 \
Prefix to switch
to byte offsets.
162 : BOFFS TRUE TO (byte-offs
) ;
164 \
This word simply
sets the word/byte bit in the opcode depending
on the state
166 : (w
+) ( opcode -- opcode )
169 \
This VALUE is the pointer telling whether
source or destination VARIABLEs
173 \
Since no syntax checking
is performed (How? Why?) this check gives
a minimum
174 \ security
that the VARIABLEs contain valid numbers
.
175 : (#operands?) ( n -- )
176 2 > IF ." Too many operands." ABORT THEN ;
178 \
Before giving the first operand to an
operation the source/destination
179 \ pointer must
be set to its
initial state. This can be done by the user who
180 \ will
forget that now and then or by the system that never forgets
this.
181 \
Therefore this word must
be executed before
assembling the first instruction
182 \
and after
the assembly of each instruction
.
186 \
To advance
the pointer
from source to destination call
this word. A check on
187 \ valid number
of operands is placed here
because this word is called by all
188 \
words that advance
the pointer
.
190 (#operands) DUP (#operands?)
193 \
The next word delivers
the address
of a source or destination VARIABLE
194 \ depending
on the flag given. TRUE means
the destination set, FALSE the
196 : {s
/d
} ( flag a-src
a-dst
-- addr
)
203 \
For safety another
check of valid
operand number
is performed before
the
204 \
word decideds
between source and destination depending
on the state of the
205 \
source/destination pointer
.
206 : (s
/d
) ( addr
-source addr-dest
-- addr )
207 (#operands) DUP (#operands?)
210 \
These words return
the address of the variable depending
on the state of the
211 \
source/destination pointer
.
212 : (type) (ts
) (td
) (s
/d
) ;
213 : (size) (ss
) (sd
) (s
/d
) ;
214 : (regnum
) (rs
) (rd
) (s
/d
) ;
215 : (offs
) (os
) (od
) (s
/d
) ;
217 : (sib
-scale) (sc
) (dc
) (s
/d
) ;
218 : (sib
-index
) (si
) (di
) (s
/d
) ;
219 : (sib
-base
) (sb
) (db
) (s
/d
) ;
221 \
These words return
the address of the variable depending
on the flag given.
222 \
FALSE means
source, TRUE destination. All words have
the stack effect
224 : {type} (ts
) (td
) {s
/d
} ;
225 : {size} (ss
) (sd
) {s
/d
} ;
226 : {regnum
} (rs
) (rd
) {s
/d
} ;
227 : {offs
} (os
) (od
) {s
/d
} ;
229 : {sib
-scale} (sc
) (dc
) {s
/d
} ;
230 : {sib
-index
} (si
) (di
) {s
/d
} ;
231 : {sib
-base
} (sb
) (db
) {s
/d
} ;
233 \
Since every instructions requires a different number of operands these
234 \
words perform the check.
235 : need0op
( addr u
-- )
237 ." Operation " TYPE ." needs no operands." CR BYE
240 : need1op
( addr u
-- )
242 ." Operation " TYPE ." needs one operand." CR BYE
245 : need2op
( addr u
-- )
247 ." Operation " TYPE ." needs two operands." CR BYE
250 \
When dealing
with relocation tables
one needs
to know whether
a value has to
251 \
be relocated or not
. This VALUE does
the job
.
252 FALSE VALUE (relocate
?)
254 \
In some
cases the user of the assembler needs
to override the default of not
255 \ relocating
, i
.e
. when loading registers
with a value that is an address, so
256 \
the next word turns
relocation on for the the next call of ar
?, .
257 : +relocate
( -- ) TRUE TO (relocate
?) ;
259 \
Relocate the next compiled constant
depending on the flag.
260 : ?+relocate
( flag -- ) IF +relocate
THEN ;
262 \
This word chooses
the right
version of a the comma-operator
depending on the
263 \
relocation flag, which
is reset after
use.
265 (relocate
?) IF asm
-r
, ELSE asm
-, THEN
266 FALSE TO (relocate
?) ;
268 \
Access to the source and destination operand variables
is quite
often used,
269 \ so make
the job easier
and define
a few shortcuts
.
279 \
Whenever the current operand should be a register, this word is called. It
280 \
is used both
by the words for using
a specific
register and the meta
281 \ registers
. Remember to give
the size, because the number of the register
282 \ alone
doesn't tell which register is meant.
283 : (#reg) ( num size -- )
289 \ Refer to an indexed address. The base (or offset) is given by the user, the
290 \ number comes either from the words below or from the allocator.
291 : (#[reg]) ( offset num -- )
294 (offs) ! (+operand) ;
296 \ Refer to an indexed address. Base, index, scale and offset are given by the
298 : (#[sib]) ( offs scale index base -- )
304 (offs) ! (+operand) ;
309 \ Refer to an absolute address. It is nessesary to relocate this address.
312 (offs) ! (+operand) +relocate ;
314 \ The value in TOS is an immediate value. Even though it is allowed only as a
315 \ source, this is not checked here. If the value is an address, +relocate must
316 \ be called before calling the operation assembling word.
319 (offs) ! (+operand) ;
323 \ Mark the current operand as a double-word (32 bit) or byte (8 bit).
324 \ Nessesary before xx ## [eax] mov,
325 : DWORD SZ-32 (size) ! ;
326 : BYTE SZ-8 (size) ! ;
328 \ Symbolic names for the registers are easier to remember, so provide words
329 \ for both 32 bit and 8 bit registers. Because of the register bl and the
330 \ constant BL collision, the 8 bit registers begin with "reg-" .
331 : eax 0 SZ-32 (#reg) ;
332 : ecx 1 SZ-32 (#reg) ;
333 : edx 2 SZ-32 (#reg) ;
334 : ebx 3 SZ-32 (#reg) ;
335 : esp 4 SZ-32 (#reg) ;
336 : ebp 5 SZ-32 (#reg) ;
337 : esi 6 SZ-32 (#reg) ;
338 : edi 7 SZ-32 (#reg) ;
340 : reg-al 0 SZ-8 (#reg) ;
341 : reg-cl 1 SZ-8 (#reg) ;
342 : reg-dl 2 SZ-8 (#reg) ;
343 : reg-bl 3 SZ-8 (#reg) ;
344 : reg-ah 4 SZ-8 (#reg) ;
345 : reg-ch 5 SZ-8 (#reg) ;
346 : reg-dh 6 SZ-8 (#reg) ;
347 : reg-bh 7 SZ-8 (#reg) ;
357 \ This word performs three task. (i) it adjustes the sizes of operands, if one
358 \ size is unknown, (ii) it stops the execution if the sizes of both operands
359 \ and (iii) sets the word operation flag.
360 : (check-sizes) ( -- )
361 ss@ SZ-UNKNOWN = IF sd@ (ss) ! ELSE
362 sd@ SZ-UNKNOWN = IF ss@ (sd) ! THEN THEN
363 ss@ SZ-UNKNOWN = sd@ SZ-UNKNOWN = AND (sz-mismatch)
364 ss@ sd@ <> (sz-mismatch)
365 ss@ SZ-32 = IF 1 ELSE 0 THEN
368 \ Writing an offset or not depends on the type of the memory operand. mop is
369 \ TRUE when the memory operand is the destination operand, rop when the
370 \ register operand is.
371 : (offs/rm), ( mop rop -- )
372 OVER DUP {offs} @ \ mop rop mop offs
373 SWAP {regnum} @ 5 <> SWAP 0=
374 AND IF ( [reg] ) \ mop rop
377 ELSE ( n [reg] ) \ mop rop
378 OVER SWAP \ mop mop rop
380 {regnum} @ + SWAP \ hmod/rm mop
381 {offs} @ \ hmod/rm offs
390 \ This word compiles a mod/rm byte from the settings of operand types,
391 \ register names etc. if no SIB-byte is required.
392 : (mod/rm-no-sib) ( mop rop -- )
394 {type} @ \ mop rop mtype
395 RT-ABS OVER = IF DROP \ mop rop
396 {regnum} @ 8 * 5 + asm-c, \ mop
399 RT-REG = IF \ mop rop
400 192 SWAP {regnum} @ 8 * + \ mop mod/rm
401 SWAP {regnum} @ + asm-c, \
402 ELSE ( index ) \ mop rop
407 \ This word compiles a mod/rm byte from the settings of operand types,
408 \ register names etc. if a SIB-byte is required.
409 : (mod/rm-sib) ( mop rop -- )
410 {regnum} @ 8 * OVER {offs} @
414 4 + asm-C, ( mod/rm ) \ mop
415 DUP 2DUP \ mop mop mop mop
416 {sib-scale} @ 6 LSHIFT \ mop mop mop sib
417 SWAP {sib-index} @ 3 LSHIFT + \ mop mop sib
418 SWAP {sib-base} @ + \ mop sib
420 {offs} @ ?DUP IF ar?, THEN ;
422 \ This word is the main work horse of the assembler. It decides whether or not
423 \ to use sib addressing and calls the special compiler words for these cases.
424 : ((mod/rm)), ( mop rop -- )
431 \ The 386 can handle combinations with at least one memory operand. The
432 \ decision whether this memory operand is the source or the destination of an
433 \ operation is done by the opcode. This word is called with the value TRUE
434 \ when the memory operand is the source operand after the decision
435 \ produced this flag and compiled the opcode.
436 : (mod/rm), ( source-is-rm -- )
437 DUP INVERT SWAP ((mod/rm)), ;
439 \ In the reset-state, the sizes of both operands are unknown and no operands
440 \ have been accepted yet.
446 FALSE TO (byte-offs) ;
448 \ All tools for creating assembler operation words are ready so we can start
449 \ with the actual work.
451 \ The mov operation is a good example how such operation words are written. At
452 \ first the checks for valid number of operands and valid sizes are performed.
453 \ Then all supported register/memory combinations are compared with the
454 \ parameters and the right version is assembled then. If no combination can be
455 \ found, complain about it and leave. After successful assembly reset the
460 ts@ isreg? td@ isr/m? AND IF ( mov r/m, r )
461 $$ 88 (w+) asm-c, FALSE (mod/rm),
463 ts@ isr/m? td@ isreg? AND IF ( mov r, r/m )
464 $$ 8A (w+) asm-c, TRUE (mod/rm),
466 td@ isreg? ts@ isimm? AND IF
467 (wrd) 0<> IF $$ B8 ELSE $$ B0 THEN
469 (wrd) 0<> IF ar?, ELSE asm-c, THEN
471 ts@ isimm? td@ isr/m? AND IF
474 RT-REG OF FALSE $$ C0 rd@ + ENDOF
475 RT-ABS OF TRUE 5 ENDOF
476 RT-INDEX OF TRUE $$ 80 rd@ + ENDOF
477 ." Can't address by ##. Use #[]. " ABORT
479 asm-c, IF od@ asm-, THEN
482 (unknown-combination)
486 \ Compile a jmp operation
489 ts@ RT-IMMED = IF ( jmp 42 )
493 ts@ isr/m? ss@ SZ-8 <> AND IF
494 ss@ SZ-32 <> (wrong-size)
495 $$ FF asm-c, 4 ss@ (#reg) TRUE (mod/rm),
497 (unknown-combination)
501 \ Compile a conditional near (32 bit relative) jump
502 : (n-jcc,) ( opcode addr len -- )
505 $$ 0F asm-c, asm-c, os@ asm-here 4 + -
508 (unknown-combination)
511 : n-ja, $$ 87 S" n-ja," (n-jcc,) ;
512 : n-jae, $$ 83 S" n-jae," (n-jcc,) ;
513 : n-jb, $$ 82 S" n-jb," (n-jcc,) ;
514 : n-jbe, $$ 86 S" n-jbe," (n-jcc,) ;
515 : n-jc, $$ 82 S" n-jc," (n-jcc,) ;
516 : n-je, $$ 84 S" n-je," (n-jcc,) ;
517 : n-jg, $$ 8F S" n-jg," (n-jcc,) ;
518 : n-jge, $$ 8D S" n-jge," (n-jcc,) ;
519 : n-jl, $$ 8C S" n-jl," (n-jcc,) ;
520 : n-jle, $$ 8E S" n-jle," (n-jcc,) ;
521 : n-jna, $$ 86 S" n-jna," (n-jcc,) ;
522 : n-jnae, $$ 82 S" n-jnae," (n-jcc,) ;
523 : n-jnb, $$ 83 S" n-jnb," (n-jcc,) ;
524 : n-jnbe, $$ 87 S" n-jnbe," (n-jcc,) ;
525 : n-jnc, $$ 83 S" n-jnc," (n-jcc,) ;
526 : n-jne, $$ 85 S" n-jne," (n-jcc,) ;
527 : n-jng, $$ 8E S" n-jng," (n-jcc,) ;
528 : n-jnge, $$ 8C S" n-jnge," (n-jcc,) ;
529 : n-jnl, $$ 8D S" n-jnl," (n-jcc,) ;
530 : n-jnle, $$ 8F S" n-jnle," (n-jcc,) ;
531 : n-jno, $$ 81 S" n-jno," (n-jcc,) ;
532 : n-jnp, $$ 8B S" n-jnp," (n-jcc,) ;
533 : n-jns, $$ 89 S" n-jns," (n-jcc,) ;
534 : n-jnz, $$ 85 S" n-jnz," (n-jcc,) ;
535 : n-jo, $$ 80 S" n-jo," (n-jcc,) ;
536 : n-jp, $$ 8A S" n-jp," (n-jcc,) ;
537 : n-jpe, $$ 8A S" n-jpe," (n-jcc,) ;
538 : n-jpo, $$ 8B S" n-jpo," (n-jcc,) ;
539 : n-js, $$ 88 S" n-js," (n-jcc,) ;
540 : n-jz, $$ 84 S" n-jz," (n-jcc,) ;
549 IF ." Call address must be 32 bit" ABORT THEN
550 ts@ RT-IMMED = IF ( call 42 )
555 $$ FF asm-c, 2 SZ-32 (#reg) TRUE (mod/rm),
557 (unknown-combination)
561 \ Compile a single byte instruction
562 : <single-byte> ( byte -- )
565 : aaa, $$ 37 S" aaa," <single-byte> ;
566 : aas, $$ 3F S" aas," <single-byte> ;
567 : clc, $$ F8 S" clc," <single-byte> ;
568 : cld, $$ FC S" cld," <single-byte> ;
569 : cmc, $$ F5 S" cmc," <single-byte> ;
570 : cdq, $$ 99 S" cdq," <single-byte> ;
571 : cmpsb, $$ A6 S" cmpsb," <single-byte> ;
572 : cmpsd, $$ A7 S" cmpsd," <single-byte> ;
573 : daa, $$ 27 S" daa," <single-byte> ;
574 : das, $$ 2F S" das," <single-byte> ;
575 : movsb, $$ A4 S" movsb," <single-byte> ;
576 : movsd, $$ A5 S" movsd," <single-byte> ;
577 : movs, $$ A5 S" movs," <single-byte> ;
578 : lodsb, $$ AC S" lodsb," <single-byte> ;
579 : lodsd, $$ AD S" lodsd," <single-byte> ;
580 : nop, $$ 90 S" nop," <single-byte> ;
581 : repne, $$ F2 S" repne," <single-byte> ;
582 : repnz, $$ F2 S" repnz," <single-byte> ;
583 : repz, $$ F3 S" repz," <single-byte> ;
584 : popf, $$ 9D S" popf," <single-byte> ;
585 : ret, $$ C3 S" ret," <single-byte> ;
586 : pushf, $$ 9C S" pushf," <single-byte> ;
587 : rep, $$ F3 S" rep," <single-byte> ;
588 : repe, $$ F3 S" repe," <single-byte> ;
589 : scasb, $$ AE S" scasb," <single-byte> ;
590 : scasd, $$ AF S" scasd," <single-byte> ;
591 : stosd, $$ AB S" stosd," <single-byte> ;
592 : stc, $$ F9 S" stc," <single-byte> ;
593 : std, $$ FD S" std," <single-byte> ;
594 : xlat, $$ D7 S" xlat," <single-byte> ;
595 : stosb, $$ AA S" stosb," <single-byte> ;
596 : sahf, $$ E9 S" sahf," <single-byte> ;
597 : WORD: $$ 66 S" WORD:" <single-byte> ;
598 : wait, $$ 9B S" wait," <single-byte> ;
600 \ compile an alu-operation
601 : <alu> ( eax,i32 r32,i32 col r,rm -- )
605 ts@ isimm? IF \ eax,i32 r32,i32 col r,rm
606 DROP \ eax,i32 r32,i32 col
607 td@ isreg? rd@ 0= AND IF
608 2DROP (w+) asm-c, os@ ar?,
610 td@ isr/m? IF \ eax,i32 r32,i32 col
611 ROT DROP \ r32,i32 col
612 SWAP (w+) asm-c, \ col
618 (unknown-combination)
620 ELSE \ eax,i32 r32,i32 col r,rm
622 td@ isr/m? ts@ isreg? AND IF
626 td@ isreg? ts@ isr/m? AND IF
630 (unknown-combination)
634 : adc, $$ 14 $$ 80 $$ 02 $$ 10 S" adc," <alu> ;
635 : add, $$ 04 $$ 80 $$ 00 $$ 00 S" add," <alu> ;
636 : and, $$ 24 $$ 80 $$ 04 $$ 20 S" and," <alu> ;
637 : cmp, $$ 3C $$ 80 $$ 07 $$ 38 S" cmp," <alu> ;
638 : or, $$ 0C $$ 80 $$ 01 $$ 08 S" or," <alu> ;
639 : sbb, $$ 1C $$ 80 $$ 03 $$ 18 S" sbb," <alu> ;
640 : sub, $$ 2C $$ 80 $$ 05 $$ 28 S" sub," <alu> ;
641 : test, $$ A8 $$ F6 $$ 00 $$ 84 S" test," <alu> ;
642 : xor, $$ 34 $$ 80 $$ 06 $$ 30 S" xor," <alu> ;
645 : <inc>, ( column -- )
647 ts@ isreg? ss@ SZ-32 = AND IF
648 8 * $$ 40 + rs@ + asm-c,
650 ts@ isr/m? ts@ RT-ABS = OR IF
651 $$ FE (w+) asm-c, SZ-32 (#reg)
654 (unknown-combination)
661 : sreg=ecx/cl? ( -- flag )
662 ts@ isreg? rs@ 1 = AND ;
665 : <shift>, ( column -- )
668 sd@ SZ-32 = IF 1 ELSE 0 THEN TO (wrd)
669 ts@ isimm? td@ isr/m? AND os@ 1 = AND IF
675 ts@ isimm? td@ isr/m? AND IF
682 sreg=ecx/cl? td@ isr/m? AND IF
688 (unknown-combination)
701 \ compile a mul, div, neg or not
702 : <mul/neg>, ( column -- )
703 S" mul/neg/div" need1op
704 ts@ isr/m? ss@ SZ-UNKNOWN <> AND IF
705 $$ F6 (w+) asm-c, SZ-32 (#reg) TRUE (mod/rm),
707 (unknown-combination)
711 : not, 2 <mul/neg>, ;
712 : neg, 3 <mul/neg>, ;
713 : mul, 4 <mul/neg>, ;
714 : imul, 5 <mul/neg>, ;
715 : div, 6 <mul/neg>, ;
716 : idiv, 7 <mul/neg>, ;
718 \ produce an exchange operation
720 S" xchg," need2op (check-sizes)
721 ts@ isreg? rs@ 0= AND ss@ SZ-32 = AND
722 td@ isreg? AND IF ( xchg eax, reg)
725 $$ 86 (w+) asm-c, ts@ ismem? (mod/rm),
730 : <push/pop> ( m32 col rd -- )
733 IF ." push, and pop, only work on DWORDs." ABORT THEN
734 ts@ ismem? IF \ m32 col rd
736 SZ-32 (#reg) asm-c, TRUE (mod/rm),
738 ts@ isreg? IF \ m32 col
742 (unknown-combination)
743 THEN THEN (asm-reset) ;
745 : push, $$ FF 6 $$ 50 <push/pop> ;
753 $$ 8F 0 $$ 58 <push/pop>
760 IF ." setcc, requires a byte operand." ABORT THEN
762 $$ 0f asm-c, asm-c, 0 SZ-32 (#reg) TRUE (mod/rm),
764 (unknown-combination)
768 : seta, $$ 97 <setcc> ;
769 : setae, $$ 93 <setcc> ;
770 : setb, $$ 92 <setcc> ;
771 : setbe, $$ 96 <setcc> ;
772 : setc, $$ 92 <setcc> ;
773 : sete, $$ 94 <setcc> ;
774 : setg, $$ 9F <setcc> ;
775 : setge, $$ 9D <setcc> ;
776 : setl, $$ 9C <setcc> ;
777 : setle, $$ 9E <setcc> ;
778 : setna, $$ 96 <setcc> ;
779 : setnae, $$ 92 <setcc> ;
780 : setnb, $$ 93 <setcc> ;
781 : setnbe, $$ 97 <setcc> ;
782 : setnc, $$ 93 <setcc> ;
783 : setne, $$ 95 <setcc> ;
784 : setng, $$ 9E <setcc> ;
785 : setnge, $$ 9C <setcc> ;
786 : setnl, $$ 9D <setcc> ;
787 : setnle, $$ 9F <setcc> ;
788 : setno, $$ 91 <setcc> ;
789 : setnp, $$ 9B <setcc> ;
790 : setns, $$ 99 <setcc> ;
791 : setnz, $$ 95 <setcc> ;
792 : seto, $$ 90 <setcc> ;
793 : setp, $$ 9A <setcc> ;
794 : setpe, $$ 9A <setcc> ;
795 : setpo, $$ 9B <setcc> ;
796 : sets, $$ 98 <setcc> ;
797 : setz, $$ 94 <setcc> ;
799 \ The local label mechanism is quite simple but useful. The number passed to
800 \ jcond, is the number of the local label which can be used either as a
801 \ forward or backward jump label. Due to space constraints only one forward
802 \ jump can be used for one label, but an unlimited number of backward jumps to
803 \ this label. It is possible to use a label for a fwd jump first and then for
804 \ backward jumps. If you need more than one forward jump to the same place,
805 \ use different labels.
806 CREATE loclabel-tab MAXLOCALLABEL CELLS ALLOT
808 \ The given label number is checked and complained about if wrong.
809 : (chk-label-ind) ( ind -- )
810 MAXLOCALLABEL < INVERT
811 IF ." Label number too high." ABORT THEN ;
813 \ Provide simple access to the labels.
814 : >label ( label -- addr )
816 CELLS loclabel-tab + ;
818 \ An address of 0 for a label means that the label is not used yet. At the
819 \ start of each local label scope this state has to be set.
820 : reset-labels ( -- )
825 : save-labels ( -- labels n )
830 : restore-labels ( labels n -- )
833 MAXLOCALLABEL 1- I - >label !
836 \ Declare a local label. If the label is a forward jump calculate the offset,
837 \ check it and store it in the appropiate place.
839 DUP >label @ 0<> IF ( fwd jmp )
840 DUP >label @ \ label dst-addr
841 asm-here OVER 1+ - \ label dst-addr abs-dist
843 IF ." Jump out of bounds." ABORT THEN \ label dst-addr abs-dist
846 asm-here SWAP >label ! ;
848 \ Change the short-branch-target to the give address.
849 : change-$: ( addr label -- )
852 \ Compile a conditional jump.
853 : <jcc>, ( label opcode -- )
857 0= IF ( fwd jmp ) \ label
858 asm-here SWAP \ here label
861 ELSE ( bwd jmp) \ label
862 asm-here 1+ SWAP >label @ \ dst orig
864 IF ." Jump out of bounds." ABORT THEN \ abs-dist
868 : jae, $$ 73 <jcc>, ;
870 : jbe, $$ 76 <jcc>, ;
872 : jcxz, $$ E3 <jcc>, ;
875 : jge, $$ 7D <jcc>, ;
878 : jle, $$ 7E <jcc>, ;
879 : jnle, $$ 7F <jcc>, ;
880 : jna, $$ 76 <jcc>, ;
881 : jno, $$ 71 <jcc>, ;
882 : jnae, $$ 72 <jcc>, ;
883 : jnp, $$ 7B <jcc>, ;
884 : jnb, $$ 73 <jcc>, ;
885 : jns, $$ 79 <jcc>, ;
886 : jnbe, $$ 77 <jcc>, ;
887 : jnz, $$ 75 <jcc>, ;
888 : jnc, $$ 73 <jcc>, ;
890 : jne, $$ 75 <jcc>, ;
892 : jng, $$ 7E <jcc>, ;
893 : jpe, $$ 7A <jcc>, ;
894 : jnge, $$ 7C <jcc>, ;
895 : jpo, $$ 7B <jcc>, ;
896 : jnl, $$ 7D <jcc>, ;
899 : loopne, $$ E0 <jcc>, ;
900 : loopnz, $$ E0 <jcc>, ;
901 : loopz, $$ E1 <jcc>, ;
902 : loop, $$ E2 <jcc>, ;
903 : loope, $$ E1 <jcc>, ;
904 \ Uncondition short jump.
905 : jmpn, $$ EB <jcc>, ;
907 \ ------------------------------------------------------------------------------
908 \ ---------------------------- floating point words ----------------------------
909 \ ------------------------------------------------------------------------------
921 \ Check the given FPU register for valid index.
923 0 8 WITHIN INVERT IF ." Invalid FP-Stack register." ABORT THEN ;
925 \ Compile an FPU operation requiring a mod/rm parameter.
926 : <fop-mod/rm>, ( col op addr len -- )
928 SZ-32 (#reg) TRUE (mod/rm),
931 \ Compile an FPU operation without or with implicit parameters.
932 : <fop>, ( oc1 oc2 addr len -- )
937 \ Compile an FPU operation with one FPU register as parameter.
938 : <fopst>, ( st oc1 oc2 addr len -- )
939 need0op PLUCK st-range
943 \ Store operations in different formats.
944 : fst32, 2 $$ D9 S" fst32," <fop-mod/rm>, ;
945 : fst64, 2 $$ DD S" fst64," <fop-mod/rm>, ;
946 : fstp32, 3 $$ D9 S" fstp32," <fop-mod/rm>, ;
947 : fstp64, 3 $$ DD S" fstp64," <fop-mod/rm>, ;
948 : fstp80, 7 $$ DB S" fstp80," <fop-mod/rm>, ;
949 : fist16, 2 $$ DF S" fist16," <fop-mod/rm>, ;
950 : fist32, 2 $$ DB S" fist32," <fop-mod/rm>, ;
951 : fistp16, 3 $$ DF S" fistp16," <fop-mod/rm>, ;
952 : fistp32, 3 $$ DB S" fistp32," <fop-mod/rm>, ;
953 : fistp64, 7 $$ DF S" fistp64," <fop-mod/rm>, ;
954 : fbstp, 6 $$ DF S" fbstp," <fop-mod/rm>, ;
956 \ Loads in different formats.
957 : fld32, 0 $$ D9 S" fld32," <fop-mod/rm>, ;
958 : fld64, 0 $$ DD S" fld64," <fop-mod/rm>, ;
959 : fld80, 5 $$ DB S" fld80," <fop-mod/rm>, ;
960 : fild16, 0 $$ DF S" fild16," <fop-mod/rm>, ;
961 : fild32, 0 $$ DB S" fild32," <fop-mod/rm>, ;
962 : fild64, 5 $$ DF S" fild64," <fop-mod/rm>, ;
963 : fbld, 4 $$ DF S" fbld," <fop-mod/rm>, ;
965 \ Other operations to memory.
966 : frstor, 4 $$ DD S" frstor," <fop-mod/rm>, ;
967 : fnsave, 6 $$ DD S" fnsave," <fop-mod/rm>, ;
968 : fnstcw, 7 $$ D9 S" fnstcw," <fop-mod/rm>, ;
969 : fldcw, 5 $$ D9 S" fldcw," <fop-mod/rm>, ;
971 \ Calculations, comparing ops, etc.
972 : fchs, $$ D9 $$ E0 S" fchs," <fop>, ;
973 : fabs, $$ D9 $$ E1 S" fabs," <fop>, ;
974 : f2xm1, $$ D9 $$ F0 S" f2xm1," <fop>, ;
975 : fcos, $$ D9 $$ FF S" fcos," <fop>, ;
976 : fscale, $$ D9 $$ FD S" fscale," <fop>, ;
977 : fsin, $$ D9 $$ FE S" fsin," <fop>, ;
978 : fsincos, $$ D9 $$ FB S" fsincos," <fop>, ;
979 : fsqrt, $$ D9 $$ FA S" fsqrt," <fop>, ;
980 : ftst, $$ D9 $$ E4 S" ftst," <fop>, ;
981 : fxtract, $$ D9 $$ F4 S" fxtract," <fop>, ;
982 : fyl2x, $$ D9 $$ F1 S" fyl2x," <fop>, ;
983 : fyl2xp1, $$ D9 $$ F9 S" fyl2xp1," <fop>, ;
984 : fnstswax, $$ DF $$ E0 S" fnstswax," <fop>, ;
985 : fcompp, $$ DE $$ D9 S" fcompp," <fop>, ;
986 : fld1, $$ D9 $$ E8 S" fld1," <fop>, ;
987 : fldl2t, $$ D9 $$ E9 S" fldl2t," <fop>, ;
988 : fldl2e, $$ D9 $$ EA S" fldl2e," <fop>, ;
989 : fldpi, $$ D9 $$ EB S" fldpi," <fop>, ;
990 : fldlg2, $$ D9 $$ EC S" fldlg2," <fop>, ;
991 : fldln2, $$ D9 $$ ED S" fldln2," <fop>, ;
992 : fldz, $$ D9 $$ EE S" fldz," <fop>, ;
993 : fincstp, $$ D9 $$ F7 S" fincstp," <fop>, ;
994 : frndint, $$ D9 $$ FC S" frndint," <fop>, ;
995 : fxam, $$ D9 $$ E5 S" fxam," <fop>, ;
996 : fninit, $$ DB $$ E3 S" fninit," <fop>, ;
997 : fpatan, $$ D9 $$ F3 S" fpatan," <fop>, ;
998 : fprem, $$ D9 $$ F8 S" fprem," <fop>, ;
999 : fprem1, $$ D9 $$ F5 S" fprem1," <fop>, ;
1000 : fptan, $$ D9 $$ F2 S" fptan," <fop>, ;
1001 : ftst, $$ D9 $$ E4 S" ftst," <fop>, ;
1003 : fld, $$ D9 $$ C0 S" fld," <fopst>, ;
1004 : fmulp, $$ DE $$ C8 S" fmulp," <fopst>, ;
1005 : faddp, $$ DE $$ C0 S" faddp," <fopst>, ;
1006 : fsubp, $$ DE $$ E8 S" fsubp," <fopst>, ;
1007 : fdivp, $$ DE $$ F8 S" fdivp," <fopst>, ;
1008 : ffree, $$ DD $$ C0 S" ffree," <fopst>, ;
1009 : fxch, $$ D9 $$ C8 S" fxch," <fopst>, ;
1010 : fstp, $$ DD $$ D8 S" fstp," <fopst>, ;
1011 : fmul, $$ D8 $$ C8 S" fmul," <fopst>, ;
1012 : fmulr, $$ DC $$ C8 S" fmulr," <fopst>, ;
1013 : fcomp, $$ D8 $$ D8 S" fcomp," <fopst>, ;
1014 : fcom, $$ D8 $$ D0 S" fcom," <fopst>, ;
1015 \ Since I am much too lazy to write a special kind of assembler word for fsub
1016 \ I invented a mnemonic: fssub. It means: float-swap-subtract and is written
1017 \ as: fsub st(i), st.
1018 : fssub, $$ DC $$ E8 S" fssub," <fopst>, ;
1019 \ This is the normal subtraction operator: fsub st, st(i)
1020 : fsub, $$ D8 $$ E0 S" fsub," <fopst>, ;
1021 \ The reverse subtraction with pop.
1022 : fsubrp, $$ DE $$ E0 S" fsubrp," <fopst>, ;
1024 \ Compile a software interrupt.
1027 $$ CD asm-c, asm-c, (asm-reset) ;
1029 \ The load-effective-address operation. It is most useful with sib-addressing.
1032 $$ 8D asm-c, TRUE (mod/rm), (asm-reset) ;
1034 \ ==============================================================================
1035 \ =============================== meta assembler ===============================
1036 \ ==============================================================================
1038 \ The register allocator uses the registers eax, ebx, ecx, edx, esi and esi to
1039 \ cache the upper 6 items on the data stack. It is very time expensive to
1040 \ restore the correct possitions of all 6 registers especially if we assume
1041 \ that an average of 3 registers is cached. But caching only one register by
1042 \ default is never a bad idea.
1044 \ At the begin of each word, EAX caches TOS. Any other register is available.
1045 \ Using 32-bit-flat-memory-mode, only near calls are used. EBP is used as the
1046 \ data stack pointer. Therefore an offset is always nessesary. This offset
1047 \ is accumulated when moving data to and from the stack. It is assumed to be
1048 \ zero at the begin and end of each word, therefore EBP points to TOS.
1049 \ In conjunction with the (future) use
1050 \ of inlineable words, this saves one ADD/SUB EBP, 4 in every word using the
1051 \ stack, but introduces on ADD EBP,x in the return. This can be paired with the
1052 \ last store of the register, that have to be flushed to stack or the move
1053 \ to make EAX TOS or the RET itself.
1055 \ Using the registers as virtual TOS+x "stackrobatics" with less than seven
1056 \ registers do not cost any cycle, if no register has to be loaded before or
1057 \ flushed after. Only in case the op. request a special register in a special
1058 \ place, some mov, or xchg, is produced.
1061 \ : test ( a b c -- e )
1062 \ (1) ROT \ b c a 2 cycles for loading ebx, ecx
1063 \ (2) ROT \ c a b no cycle, just changing the
1064 \ the order while compiling
1065 \ (3) + \ c a+b one cycle for the add
1066 \ (4) SWAP \ a+b c no cycle
1067 \ (5) 2* 2* \ a+b 4*c 2 cycles for the shifts
1068 \ (6) + \ a+b+4*c 1 cycle for the add
1069 \ (7) ; 2 cycles for flush + bp ofs
1071 \ In line (1) EBX and ECX are loaded, because ROT asks for 3 used registers.
1072 \ In line (2) only the tables in the compiler are changed, but no code is
1074 \ Lines (3) and (5) perform their calculations by producing one ADD and
1076 \ Line (4) produces no code, just the tables are changed.
1077 \ In Line (7) the consistent state must be accomplished, therefore
1078 \ one register has to be flushed (and EAX may be loaded from an other register, I
1079 \ haven't
tracked this) and EBP has to be increased by 8.
1081 \
The registers need numbers
to identify them
.
1082 0 CONSTANT VREG-EAX \
virtual register numbers
for
1083 1 CONSTANT VREG-ECX \
register allocator
1089 -1 CONSTANT REG-NONE
1091 \
Translate vreg to reg using
a look
-up table
.
1092 CREATE ((vreg>reg
)) 0 , 1 , 2 , 3 , 6 , 7 ,
1093 : (vreg>reg
) ( vreg -- reg
)
1095 IF ." Called (vreg>reg) with invalid register." ABORT THEN
1096 CELLS ((vreg>reg
)) + @ ;
1098 \
This array contains
the state of the register allocator. The cell i contains
1099 \
the number of the register caching TOS+i
.
1100 CREATE tos
-cache
#USEREGS CHARS ALLOT
1101 : cache
! ( reg ind
-- )
1102 CHARS tos
-cache
+ C! ;
1103 : cache
@ ( ind
-- reg
)
1104 CHARS tos
-cache
+ C@ ;
1106 \
This array contains
the numbers
of registers that were
marked free.
1107 CREATE free-cache
#USEREGS CHARS ALLOT
1108 : free! ( reg ind
-- )
1109 CHARS free-cache
+ C! ;
1110 : free@ ( ind
-- reg
)
1111 CHARS free-cache
+ C@ ;
1113 \
Number of items
in tos
-cache
1116 \
Increase #tos-cache
1117 : (#tc++) #tos-cache 1+ TO #tos-cache ;
1119 \
Number of items
in free-cache
1123 : vreg>name
( vreg -- addr len
)
1125 VREG-EAX OF S" eax" ENDOF
1126 VREG-EBX OF S" ebx" ENDOF
1127 VREG-ECX OF S" ecx" ENDOF
1128 VREG-EDX OF S" edx" ENDOF
1129 VREG-ESI OF S" esi" ENDOF
1130 VREG-EDI OF S" edi" ENDOF
1134 \
Mark a register as requested with free.
1135 : (mark
-free) ( vreg -- )
1136 #free-req free! #free-req 1+ TO #free-req ;
1138 \
Check if vreg is marked free.
1139 : (#marked) ( vreg -- flag )
1140 #free-req 0 ?DO \ vreg
1142 DROP TRUE UNLOOP EXIT
1147 \
Number of consequtively
requested registers in compiler
1150 \
Increase #reg-req .
1151 : (#rr++) #reg-req 1+ TO #reg-req ;
1153 \
Accumulated offset to ebp
1156 \
Maintain a offs
-ebp
to be within +/- 124 to fit
into a byte.
1157 : (add-ebp
) ( n -- )
1159 DUP ABS 124 >= IF \
no
1165 \
Save the state of the allocator on the stack. No code
is produced.
1166 : save
-allocator ( -- allocator )
1167 #USEREGS 0 DO I cache@ LOOP
1168 #USEREGS 0 DO I free@ LOOP
1169 #tos-cache #free-req offs-ebp ;
1171 \
Restore the state of the allocator from the stack. No code
is produced.
1172 : restore
-allocator ( allocator -- )
1174 TO #free-req TO #tos-cache
1175 #USEREGS 0 DO #USEREGS 1- I - free! LOOP
1176 #USEREGS 0 DO #USEREGS 1- I - cache! LOOP ;
1178 \
Print the state of the allocator.
1185 #tos-cache 1- I - cache@ vreg>name TYPE ." "
1193 #free-req 1- I - free@ vreg>name TYPE ." "
1195 THEN ." offs: " offs
-ebp
. CR ;
1197 \
Check whether
vreg is used
1198 : (#used) ( vreg -- flag )
1201 DROP UNLOOP TRUE EXIT
1206 \ find
the vreg in current requested cache
slot
1207 : tc
(#rr) ( -- vreg )
1210 \
check whether enough
registers are in use
1211 : (#enough) ( -- flag )
1212 #reg-req #tos-cache < ;
1214 \
load the register vreg into current requested slot
1215 : (#load) ( vreg -- )
1216 BOFFS offs
-ebp
#tos-cache CELLS + [ebp]
1217 DUP (vreg>reg
) SZ-32 (#reg) mov,
1219 #tos-cache 1+ TO #tos-cache ;
1221 \ find
the cache-slot vreg is in. Return -1 for non
-cached
register
1222 : (#find) ( vreg -- nr )
1230 \ exchange
the meanings
and the contents of the regs
1231 \
free(vreg1
) used(vreg2
) -> mov vreg1
, vreg2
1232 \
used(vreg1
) free(vreg2
) -> mov vreg2
, vreg1
1233 \
else -> xchg vreg1
, vreg2
1234 : tc
-xchg
( vreg1 vreg2
-- )
1235 2DUP = IF 2DROP EXIT THEN
1236 DUP (#used) INVERT IF ( vreg2 free ) \ vreg1 vreg2
1238 (vreg>reg
) SZ-32 (#reg)
1239 (vreg>reg
) SZ-32 (#reg) mov, \ vreg1 vreg2
1240 SWAP (#find) \ vreg2 tos1
1243 OVER (#used) INVERT IF ( vreg1 free ) \ vreg1 vreg2
1245 (vreg>reg
) SZ-32 (#reg)
1246 (vreg>reg
) SZ-32 (#reg) mov, \ vreg1 vreg2
1247 (#find) \ vreg1 tos2
1251 (vreg>reg
) SZ-32 (#reg)
1252 (vreg>reg
) SZ-32 (#reg) xchg, \ vreg1 vreg2
1253 2DUP (#find) SWAP (#find) \ vreg1 vreg2 tos2 tos1
1254 ROT SWAP \ vreg1 tos2 vreg2 tos1
1259 \
Set the bit vreg in mask
, if vreg is cached.
1260 \
This word has an environmental
dependency. It assumes, that one cell
1261 \
has more than #USEREGS bits.
1262 : (cached-mask
) ( -- mask
)
1268 \
Find the first unrequested
register in cache or REG-NONE if all
are used.
1269 : (unrequested
) ( -- vreg )
1275 DUP 1 I LSHIFT AND 0= IF
1281 \ flush
the lowest cache slot to memory and return the free register
1282 : (flushreg
) ( -- vreg )
1283 #tos-cache 1- cache@ DUP \ vreg vreg
1284 ( mov [ebp
+ offs
+ {#tc-1}*4], vreg )
1285 (vreg>reg
) SZ-32 (#reg)
1286 BOFFS #tos-cache 1- CELLS offs-ebp + [ebp] mov,
1287 #tos-cache 1- TO #tos-cache ;
1289 \ request
a virtual register by number
1290 \
n is tos + #reg-req
1292 \
--------------------------------------------------------
1293 \
free(vreg) & #rr<#tc xchg(vreg,tc(#rr))
1294 \
free(vreg) & #rr=#tc load(vreg)
1295 \
used(vreg) & #rr<#tc xchg(vreg,tc(#rr))
1296 \
used(vreg) & #rr=#tc & free(vreg2) load(vreg2), xchg(vreg2,vreg)
1297 \
used(vreg) & #rr=#tc & used(vreg2) flush(#ur-1), xchg( #ur-1, vreg)
1298 : (#req) ( vreg -- )
1300 DUP (#used) IF \ vreg
1303 ELSE ( req
. reg
in use ) \
vreg
1304 (unrequested
) \
vreg vreg2
1305 DUP REG-NONE = IF ( none
free )
1306 DROP (flushreg
) \
vreg vreg2
1307 ELSE ( 1 free found )
1316 (unrequested
) \
vreg vreg2
1327 : req
-eax VREG-EAX (#req) ;
1328 : req
-ebx
VREG-EBX (#req) ;
1329 : req
-ecx
VREG-ECX (#req) ;
1330 : req
-edx
VREG-EDX (#req) ;
1331 : req
-edi
VREG-EDI (#req) ;
1332 : req
-esi
VREG-ESI (#req) ;
1334 \
Is register eax, ebx
, ecx
or edx
?
1335 : (is-a-d
) ( vreg -- flag )
1339 SWAP VREG-EDX = OR ;
1341 \
Request the register if unused
.
1342 : (#req-unused) ( vreg -- ok? )
1343 DUP (#used) INVERT DUP \ vreg ok? ok?
1344 IF SWAP (#req) ELSE NIP THEN ;
1346 \
Request any one of eax-edx
. Check if one of them
is not used. If so request
1347 \
it else check all
other requested below
if they
are eax-edx
. If so request
1351 #reg-req cache@ (is-a-d) IF
1355 VREG-EAX (#req-unused) IF EXIT THEN
1356 VREG-ECX (#req-unused) IF EXIT THEN
1357 VREG-EDX (#req-unused) IF EXIT THEN
1358 VREG-EBX (#req-unused) IF EXIT THEN
1359 #tos-cache #reg-req ?DO
1360 I cache@ DUP (is-a-d
) \
vreg general
?
1361 IF (#req) UNLOOP EXIT ELSE DROP THEN
1363 ." Can't request that many general registers." ABORT ;
1365 \
request any virtual register
1368 (unrequested
) \
vreg
1369 \
DUP ." req-any: " . CR
1373 \
request any BUT the register vreg
1374 \
n is tos + #reg-req
1376 \
--------------------------------------------------------
1377 \ enough
, n<>vreg ---
1378 \ enough
, n=vreg, free(vreg2
) xchg
(vreg,vreg2
)
1379 \ enough
, n=vreg, none
free, #rr<#ur-1 xchg(vreg,#ur-1)
1380 \ enough
, n=vreg, none
free, #rr=#ur-1 xchg(vreg,0)
1381 \
not enough, used(vreg) load(vreg2
)
1382 \
not enough, free(vreg), free(vreg2
) load(vreg2
)
1383 \
not enough, free(vreg), used(vreg2
) xchg
(vreg,vreg2
), load(vreg2
)
1384 : (xchg
-not) ( vreg -- )
1385 (unrequested
) \
vreg vreg2
1389 #reg-req #USEREGS 1- = IF
1397 : (#req-not) ( vreg -- )
1406 DUP (#used) IF \ vreg
1408 (unrequested
) DUP REG-NONE =
1412 (unrequested
) DUP REG-NONE <>
1416 TUCK \ vreg2
vreg vreg2
1422 : req
-not-eax VREG-EAX (#req-not) ;
1424 CREATE a-d
-table
VREG-EAX , VREG-EBX , VREG-ECX , VREG-EDX ,
1427 \ find
the first unmarked
register eax, ebx
, ecx
or edx
1428 \
return nr
or -1 if all
marked
1429 : (unmarked
-a-d
) ( -- vreg )
1431 forall
-a-d
DO \
addr
1432 DUP @ DUP (#marked) INVERT \ addr vreg mark
1437 LOOP DROP REG-NONE ;
1439 \ all
eax-edx
marked
1440 : (a-d
-marked) ( -- flag )
1441 (unmarked
-a-d
) REG-NONE = ;
1443 \
request a free register, but only eax, ebx
, ecx
or edx
1445 \
----------------------------------------------
1447 \
a-d req
, s
-d req flush
1449 \
a-d req
, s
-d unreq swap
, mark
1453 IF ." Can't request this many general registers." ABORT THEN
1455 DUP REG-NONE = IF \
vreg
1456 DROP (flushreg
) \
vreg
1458 DUP (is-a-d
) IF \
vreg
1461 (unmarked
-a-d
) \
vreg-s
-d
vreg-a-d
1462 TUCK tc
-xchg
(mark
-free)
1465 \
request a free register
1467 \
----------------------------------------------
1469 \ all
requested error
1470 \ all
cached flush
, mark
1471 \ uncached
(vreg) mark
1472 : req
-free ( -- ) ( OK )
1473 #free-req #reg-req +
1474 #USEREGS = IF ." All registers requested." ABORT THEN
1475 (unrequested
) \
vreg
1477 DROP (flushreg
) \
vreg
1481 \
request a free register by number vreg TODO more efficient
1483 \
----------------------------------------------------
1484 \
marked(vreg) error
1486 \ unmarked
(vreg), cached(vreg), unused
(v2
) swap
, mark
1487 \ unmarked
(vreg), all
cached, vreg=#ur-1 flush, mark
1488 \ unmarked
(vreg), all
cached, vreg<>#ur-1 flush, swap, mark
1489 : (#req-free) ( vreg -- )
1490 DUP (#marked) IF ." Register is already marked." ABORT THEN
1491 DUP (#used) INVERT IF
1494 (unrequested
) \
vreg vreg2
1495 2DUP = IF (internal
-error) THEN \
vreg vreg2
1496 DUP REG-NONE <> IF \
vreg vreg2
1497 SWAP TUCK tc
-xchg \
vreg
1501 DUP (#find) \ vreg nr
1502 DUP -1 = IF (internal
-error) THEN
1503 #USEREGS 1- OVER = IF
1504 (flushreg
) \
vreg vreg
1505 2DUP <> IF (internal
-error) THEN
1508 (flushreg
) \
vreg vreg2
1509 SWAP TUCK tc
-xchg
(mark
-free)
1514 : free-eax VREG-EAX (#req-free) ;
1515 : free-ecx
VREG-ECX (#req-free) ;
1516 : free-edx
VREG-EDX (#req-free) ;
1517 : free-edi
VREG-EDI (#req-free) ;
1518 : free-esi
VREG-ESI (#req-free) ;
1520 \ swap
the vregs in tos+n1
and tos+n2
1521 : tos-swap
( n1 n2
-- )
1522 2DUP #reg-req < SWAP #reg-req < AND
1523 INVERT IF ." Too few registers requested." ABORT THEN
1525 cache@ SWAP cache@ \ n1 n2 r2 r1
1527 cache! SWAP cache! ;
1531 #tos-cache DUP 0= IF ." No register in cache." ABORT THEN
1532 1- DUP TO #tos-cache
1535 DUP 1+ cache@ \
end curr
vreg
1536 OVER cache! \
end curr
1541 \
free n times
the tos+0
1547 \ put
the register in free-cache+n on top
of stack
1549 DUP #free-req >= IF ." Too few free registers requested." ABORT THEN
1551 ( make
space for register )
1552 #tos-cache BEGIN \ vreg i
1555 DUP cache@ OVER 1+ cache!
1561 \
reset register allocator ( at
start of compiler word )
1562 : regalloc
-reset ( -- )
1563 0 TO #reg-req 0 TO #free-req reset-labels ;
1565 \ initialize
register allocator ( at
start of word compilation
)
1566 : regalloc
-init
( -- )
1571 \ create
a consistent state before
ret/call
1572 : regalloc
-flush
( -- )
1576 \ make
sure only reg
is loaded
1582 \ make
sure it is eax
1583 0 cache@ VREG-EAX <> IF
1584 0 cache@ VREG-EAX tc
-xchg
1588 offs
-ebp
## ebp add,
1592 \ flush all
registers to stack and correct
ebp
1593 : regalloc
-flushjmp
( -- )
1600 offs
-ebp ## ebp add,
1601 THEN 0 TO offs
-ebp ;
1603 \ flush all
except 2 registers to stack and correct
ebp
1604 : regalloc
-flush
-do ( -- )
1611 offs
-ebp 8 + ## ebp add,
1616 \ access
to meta-register
1618 DUP #tos-cache >= IF ." Request more registers." ABORT THEN
1619 cache@ (vreg>reg
) SZ-32 (#reg) ;
1628 : ([tosn
]) ( offs
n -- )
1629 DUP #tos-cache >= IF ." Request more registers." ABORT THEN
1630 cache@ (vreg>reg
) (#[reg]) ;
1632 : [tos0
] 0 ([tosn
]) ;
1633 : [tos1
] 1 ([tosn
]) ;
1634 : [tos2
] 2 ([tosn
]) ;
1635 : [tos3
] 3 ([tosn
]) ;
1636 : [tos4
] 4 ([tosn
]) ;
1637 : [tos5
] 5 ([tosn
]) ;
1640 DUP #free-req >= IF ." Request more free registers." ABORT THEN
1641 free@ (vreg>reg
) SZ-32 (#reg) ;
1650 : ([freen
]) ( offs
n -- )
1651 DUP #free-req >= IF ." Request more free registers." ABORT THEN
1652 free@ (vreg>reg
) (#[reg]) ;
1654 : [free0
] 0 ([freen
]) ;
1655 : [free1
] 1 ([freen
]) ;
1656 : [free2
] 2 ([freen
]) ;
1657 : [free3
] 3 ([freen
]) ;
1658 : [free4
] 4 ([freen
]) ;
1659 : [free5
] 5 ([freen
]) ;
1661 : (vreg>reg_l
) ( vreg -- )
1662 DUP (is-a-d
) INVERT IF ." Can't get lower part of edi or esi." ABORT THEN
1664 VREG-EAX OF reg
-al
ENDOF
1665 VREG-EDX OF reg
-dl
ENDOF
1666 VREG-ECX OF reg
-cl
ENDOF
1667 VREG-EBX OF reg
-bl
ENDOF
1670 : (vreg>reg_h
) ( vreg -- )
1671 DUP (is-a-d
) INVERT IF ." Can't get higher part of edi or esi." ABORT THEN
1673 VREG-EAX OF reg
-ah
ENDOF
1674 VREG-EDX OF reg
-dh
ENDOF
1675 VREG-ECX OF reg
-ch
ENDOF
1676 VREG-EBX OF reg
-bh
ENDOF
1679 \
n is the nr
of the 32 bit virtual register
1681 DUP #free-req >= IF ." Request more free registers." ABORT THEN
1682 free@ (vreg>reg_l
) ;
1685 DUP #free-req >= IF ." Request more free registers." ABORT THEN
1686 free@ (vreg>reg_h
) ;
1688 : free0l
0 (free_l
) ;
1689 : free1l
1 (free_l
) ;
1690 : free2l
2 (free_l
) ;
1691 : free3l
3 (free_l
) ;
1692 : free4l
4 (free_l
) ;
1693 : free5l
5 (free_l
) ;
1694 : free0h
0 (free_h
) ;
1695 : free1h
1 (free_h
) ;
1696 : free2h
2 (free_h
) ;
1697 : free3h
3 (free_h
) ;
1698 : free4h
4 (free_h
) ;
1699 : free5h
5 (free_h
) ;
1702 DUP #tos-cache >= IF ." Request more registers." ABORT THEN
1703 cache@ (vreg>reg_l
) ;
1706 DUP #tos-cache >= IF ." Request more registers." ABORT THEN
1707 cache@ (vreg>reg_h
) ;
1722 \
Scale index base addressing. All words are ( offs
-- )
1723 : [tos0
+tos1
] SC-1 0 cache@ (vreg>reg
) 1 cache@ (vreg>reg
) (#[sib]) ;
1724 : [4*tos0
+tos1
] SC-4 0 cache@ (vreg>reg
) 1 cache@ (vreg>reg
) (#[sib]) ;
1726 \
==============================================================================
1727 \
=============================== compiler support =============================
1728 \
==============================================================================
1730 \
Produce a near jump and put
the address of the cell with the distance on the
1731 \
stack. Due to the fact that these jumps
are relative no relocation is
1733 : fwd
-jmp
( xt
-- addr )
1734 0 ## EXECUTE asm-here 4 - ;
1736 \
Resolve the jump to this address.
1737 : resolve
-jmp
( fwd
-addr -- )
1738 asm
-here \
addr here
1739 over
4 + - \
addr rel
1742 \
Save the allocator state in the returned dyn
-array.
1743 : allocator-state ( addr -- )
1744 #tos-cache OVER C! CHAR+ \ addr
1745 #tos-cache 0 ?DO \ addr
1752 \
Set the bit with the number vreg for each
register vreg in the state.
1753 : (state-mask
) ( state #regs -- mask )
1754 0 -ROT 0 ?DO \ mask
state
1755 DUP I CHARS + C@ \ mask
state vreg
1756 1 SWAP LSHIFT \ mask
state vrmask
1757 ROT OR SWAP \ mask
state
1760 \
Find the lowest bit set in x
and return it's number.
1761 : lowest-bit ( x -- nr )
1763 1 I LSHIFT OVER AND \ x reg?
1767 LOOP DROP REG-NONE ;
1769 \ Try to load a register that is in state but not in cache. There must be at
1770 \ least one of them since (alloc-load) is called only with fewer regs in cache
1772 : ((alloc-load)) ( state #regs -- )
1773 ( find regs in cache )
1774 (cached-mask) \ state #regs cache-mask
1775 ( find regs in state )
1776 -ROT (state-mask) \ cache-mask state-mask
1777 ( leave all flags that are in state-mask AND NOT in cache-mask )
1778 SWAP INVERT AND \ load-mask
1779 DUP 0= IF (internal-error) THEN \ load-mask
1783 \ Load as many registers as nessesary. Try to use those that needed.
1784 : (alloc-load) ( state #regs -- state #regs )
1786 #tos-cache \ state #regs #tc
1787 OVER \ state #regs #tc #regs
1794 \ Flush some register till the same number as in state in reached.
1795 : (alloc-flush) ( #regs -- #regs )
1797 #tos-cache OVER \ #regs #tc #regs
1803 \ Exchange vreg and cache(ind) if nessesary.
1804 : (alloc-adjust) ( vreg ind -- )
1812 \ Perform sign extension from a byte to a cell.
1813 : (sign-extend) ( c -- n )
1814 DUP 128 AND \ c sign-bit
1815 0<> 255 INVERT AND OR
1818 \ Retrieve the save offset and correct ebp if nessesary. If save-flags? is
1819 \ true, the CPU flags are saved before and restored afterwards ( add could
1821 : (fix-offs) ( save-flags? state ind -- )
1822 CHARS + C@ (sign-extend) \ save? dest-offs
1825 ELSE \ save? dest-offs
1826 OVER IF pushf, THEN \ save? dest-offs
1833 \ Rebuild the allocator to the given state. The index of the user data in
1834 \ state is returned.
1836 \ 1. too few regs cached? -> load registers
1837 \ 2. too many regs cached? -> flush registers
1839 : allocator-rebuild ( save-flags? state -- )
1840 DUP C@ SWAP CHAR+ SWAP \ save? state #regs
1842 (alloc-flush) \ save? state #regs
1843 0 SWAP \ save? state ind #regs
1844 0 ?DO \ save? state ind
1845 2DUP CHARS + C@ \ save? state ind reg(ind)
1846 OVER (alloc-adjust) \ save? state ind
1848 LOOP \ save? state ind
1851 \ Store the state in the allocator without generating code.
1852 : allocator-store ( state -- )
1853 DUP C@ \ state #regs
1855 1 CHARS SWAP \ state ind #regs
1857 2DUP CHARS + C@ \ state ind vreg
1858 I cache! \ state ind
1861 + C@ (sign-extend) TO offs-ebp ;