URFORTH.me

this is a direct-threaded, self-hosting Forth system for 32-bit x86 GNU/Linux.

UrForth level 1 doesn't need anything except UrForth itself to build the
system from the sources. there is a prebuilt binary in this repository
which can be used to bootstrap the system (with 'c.sh' shell script).

some notes about standards-compliance: i don't fuckin' care. pre-ANS
standards are too restrictive, and ANS standard is fubared with idiotic
"portability", and inability to get rid of obsolete concepts. this goes
up to 2012 (and beyond); just look at `FIND`! and fuck it, i don't expect
anybody (except me, of course) to use UrForth anyway.

also, about ANS "portability" crap: i strongly believe that default
architecture should be 32-bit two's-complement one, without any forced data
alignment rules. for non-conformant systems, an implementation should emulate
the abovementioned arch. this represents most architectures out there,
and if your arch is different, your Forth system should take care of that,
running standards-compliant code without any changes. and library authors
can avoid error-prone jumping through the hoops.

if you think that speed is more important than compatibility, there is
always an option to ignore any standards.

also, i'm not planning to create 64-bit versions of UrForth. "64-bitness"
is another thing i consider useless. and PIC code will not be supported.


more ANS idiocy: 2! and 2@ are storing 64-bit numbers as
big-dword-endian (the high dword goes to the lower address). i introduced
"2!LE" and "2@LE", and left "standard" words intact.
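to make the two layouts concrete, here is a minimal python sketch (not UrForth code). `store_2` and `store_2le` are made-up helper names, and the `2!LE` layout shown is an assumption read off its name: low dword first, i.e. an ordinary little-endian 64-bit value.

```python
import struct

def store_2(value):
    """ANS-style 2!: high dword at the lower address; each dword is little-endian on x86."""
    lo = value & 0xFFFFFFFF
    hi = (value >> 32) & 0xFFFFFFFF
    return struct.pack("<II", hi, lo)

def store_2le(value):
    """Assumed 2!LE layout: low dword at the lower address."""
    lo = value & 0xFFFFFFFF
    hi = (value >> 32) & 0xFFFFFFFF
    return struct.pack("<II", lo, hi)

value = 0x1122334455667788
print(store_2(value).hex())    # 4433221188776655
print(store_2le(value).hex())  # 8877665544332211
```

under that assumption, the second layout is byte-for-byte what C code expects from a 64-bit integer on x86.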

also, "cell" was another idiotic word choice. does "c@" operate on cells,
or on chars? i wonder if anybody there had even one working *brain* *cell*.
(got it? got a joke? it's a great joke!)


i added benchmark from BigForth. so far, UrForth is ~2.2 (2.1 with
debugger) times slower than BigForth on it. of course, benchmarks only
show how well the system can pass a benchmark, but still... it's not that
bad, considering that BigForth is subroutine-threaded (i.e. generates
native code), and somewhat optimised. also, it is only ~1.12 times slower
than unoptimised SPF4.


some advanced UrForth features, in no particular order:

* vocabularies have hashtables for word names (256 bytes per vocabulary).
  this makes searches ~50-100 times faster:
    FORTH -- 799 words, 64 of 64 buckets used, 4 min items, 21 max items, average: 12 words per bucket
  considering that each word is first checked for valid hash and length,
  the searcher usually does only one full string comparison.
  in other words: vocabulary searches are lightning fast.
  (ok, it's not that advanced, because most Forth systems out there do
  that, but hey, it's my book, written by my rules!)
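a rough python model of such a lookup, just to show why one full string comparison is usually enough. the hash function, the tuple layout and the `Vocabulary` class are all hypothetical; only the bucket count (64) comes from the stats above.

```python
BUCKETS = 64  # the FORTH stats above report 64 buckets

def name_hash(name):
    """Hypothetical hash; the real UrForth hash function is not described here."""
    h = 0
    for ch in name.encode("latin-1"):
        h = (h * 31 + ch) & 0xFFFFFFFF
    return h

class Vocabulary:
    def __init__(self):
        self.buckets = [[] for _ in range(BUCKETS)]
        self.full_compares = 0  # counts full string comparisons, for illustration

    def add(self, name, cfa):
        h = name_hash(name)
        self.buckets[h % BUCKETS].append((h, len(name), name, cfa))

    def find(self, name):
        h = name_hash(name)
        # newest definitions first, as a Forth search would do
        for whash, wlen, wname, cfa in reversed(self.buckets[h % BUCKETS]):
            if whash != h or wlen != len(name):
                continue  # cheap rejection: no string comparison needed
            self.full_compares += 1
            if wname == name:
                return cfa
        return None

voc = Vocabulary()
for i in range(799):
    voc.add("w%d" % i, i)
print(voc.find("w500"), voc.full_compares)  # 500 1
```

only one bucket out of 64 is walked, and inside it the hash and length checks reject almost everything before any string is compared.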

* number prefixes:
    $,#,0x,&h -- hex number (note that 2012 standard wants "#" for decimal)
    %,0b,&b -- binary number
    0o,&o -- octal number
    0d,&d -- decimal number

* number postfixes:
    nnnH -- hex
    nnnO -- octal
    nnnB -- binary (only for BASE<12)

* underscores in numbers are ignored:
    0x8000_00_00 is a valid number
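the rules from the three items above can be sketched as a small python function. this is an illustration, not the actual UrForth number parser: `parse_number` and `to_int` are made-up names, and the order in which the rules are tried, case-insensitivity and the lack of sign handling are all assumptions.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

PREFIXES = [("0x", 16), ("&h", 16), ("$", 16), ("#", 16),
            ("0b", 2), ("&b", 2), ("%", 2),
            ("0o", 8), ("&o", 8),
            ("0d", 10), ("&d", 10)]

def to_int(digits, radix):
    """Convert a digit string in the given radix; None if it is not valid."""
    if not digits:
        return None
    value = 0
    for ch in digits:
        d = DIGITS.find(ch)
        if d < 0 or d >= radix:
            return None
        value = value * radix + d
    return value

def parse_number(text, base=10):
    """Return the value of `text`, or None if it is not a number."""
    s = text.replace("_", "").lower()      # underscores are ignored
    for prefix, radix in PREFIXES:         # assumption: prefixes are tried first
        if s.startswith(prefix):
            value = to_int(s[len(prefix):], radix)
            if value is not None:
                return value
    postfixes = {"h": 16, "o": 8}
    if base < 12:                          # in higher bases "b" is a digit
        postfixes["b"] = 2
    if s and s[-1] in postfixes:
        value = to_int(s[:-1], postfixes[s[-1]])
        if value is not None:
            return value
    return to_int(s, base)                 # plain number in the current BASE
```

the `base < 12` check shows why the "B" postfix is restricted: with `base=16`, `parse_number("101b", base=16)` has to read "b" as a hex digit.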

* extended word search:
    you can use "a:b" to find word "b" in vocabulary "a".
    of course, "a:b:c" and such are allowed too.
    note that "colon access" will ignore "hidden" attribute. this is not a bug.

* UrForth has fully working BREAK and CONTINUE; they can be used in BEGIN loops and DO...LOOP.
  also, they know about CASE, so you can use BREAK/CONTINUE in OF/OTHERWISE clauses.

* BEGIN loops can contain arbitrary number of WHILE/NOT-WHILE parts,
  and they can be terminated with UNTIL/NOT-UNTIL even if WHILE is present.
  AGAIN is allowed too, for any kind of BEGIN loop.

* there is IFNOT in addition to IF. also, there are +IF and -IF, to check for
  positive and negative numbers respectively. zero is neither negative, nor
  positive. to include zero, use "+0IF" and "-0IF".

* the above positive/negative checks can be used with loops too ("+WHILE", and such).

* segfault handler will show you stack dump and backtrace.

* vocabularies support public and hidden words. it is possible to create
  "nested" vocabulary, which will see all parent's words (including hidden ones).

* hidden words in "current" vocabulary are always visible. to access hidden words in
  other vocabularies, use "vocname:wordname" syntax.

* multiline comments: normal (* ... *), and nested (( ... ))
  nested multiline comment allows other nested comments

* x86 assembler with deferred plug-in interfaces for memory r/w and label manager
  (can be used to create metacompilers, or even standalone assemblers).
  it is using normal intel syntax, not yoda-style. also, you don't need to
  separate operands with spaces, because assembler does its own input stream
  parsing.

* there are `[:` and `;]` to create cblocks. this feature can be used like this:
    : foreach-do ( cfa -- )  10 0 do i over execute loop drop ;
    : a  [: . cr ;] foreach-do ;
  internally, it compiles header-less word, and leaves its CFA on the stack.

* cblocks can be used in interpreter too, i.e. you can type this at REPL:
    [: 10 0 do i . cr loop ;] execute
  and it will work. and even this will work:
    [: ." hey!" cr [: 65 emit cr ;] execute ;] execute
  note that you cannot assign such cblocks to DEFERed words, because they
  will not live long enough.
  also, this feature is not really planned, it is just a side-effect of cblocks
  implementation. do not rely on it.

* internally, there is DP-TEMP variable. when it is 0, the normal DP is used
  for HERE. but when it is non-zero, all HERE-based words will use it instead.
  this is used to make cblocks work in interpreter, for example. no words
  except "HERE" and "N-ALLOT" are accessing "DP" directly.
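in other words, HERE picks one of two pointers. a tiny python model of that switch follows; the class and method names are made up for the sketch, and the real words of course work on dictionary memory, not on python integers.

```python
class Dictionary:
    def __init__(self, dp, dp_temp=0):
        self.dp = dp            # normal dictionary pointer
        self.dp_temp = dp_temp  # transient pointer; 0 means "not active"

    def here(self):
        """HERE: DP-TEMP wins whenever it is non-zero."""
        return self.dp_temp if self.dp_temp != 0 else self.dp

    def n_allot(self, size):
        """N-ALLOT ( size -- start-addr ): advance whichever pointer is active."""
        start = self.here()
        if self.dp_temp != 0:
            self.dp_temp += size
        else:
            self.dp += size
        return start
```

everything else is built on top of these two operations, so switching to the transient area needs no changes in the words that compile code.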

* "OVERRIDE" word can be used to override Forth words with other Forth words.
  it works like this:
    : newdot  ( n old-xtoken -- )
      check-some-condition if
        OVERRIDE-EXECUTE
      else
        2drop
      endif
    ;
    OVERRIDE . newdot
  or
    ' . ' newdot (OVERRIDE)

  note that overridden word can be called as usual, and override forces even
  previously compiled words to call `newdot`. also note that `newdot` cannot
  be called as normal forth word anymore (no checks are made, it will just
  make your system unusable due to UB).

  currently, there is no way to "unoverride" the word. it may be added later.

  also, remember that "old-xtoken" cannot be called with "EXECUTE". but you
  can freely pass "old-xtoken" around and call it with "OVERRIDE-EXECUTE" at
  any moment, and at any place.

  this can be used to create things like metacompilers, for example, without
  duplicating all compiler word definitions again.

  if you override an already overridden word, old override will be replaced
  with the new one (i.e. the overrides are not chained).

  you can chain overrides like this, if you want to:
    0 value (prev-ovr)  (hidden)
    : ovr-new  ( ... xtoken -- )
      (prev-ovr) ?drop override-execute
    ;
    get-override . to (prev-ovr)
    override . ovr-new

  note that this feature is highly experimental, and may change/disappear.

* the kernel includes "scattered colon" feature (invented by M. Gassanenko).
  most of initialisation and other chained things are implemented with it.
  also, there are interpreter and "to" scattered colon hooks.
  see "samples/scolon.f" for example code.
  scattered colons are great, you *WILL* miss them in other Forth systems! ;-)

* "REPLACE" word can be used to perform system-wide word replacement:
    replace oldword newword
  WARNING! you can replace any word with any other word (including constants,
  code words, and so on, on both sides; the desired effect is up to you). i.e.
  this will work:
    69 constant fuck
    : hell 666 ;
    replace fuck hell
    fuck 666 = .  ( true )
  note that aliases may be optimised to the original words, so replacing an alias
  may not work for compiled code.

* you can set dstack/rstack value with POKE/RPOKE. this is complementary to
  PICK/RPICK (and i aliased those to PEEK/RPEEK).
  usage is: `value index POKE`, where `index` is the same as in PICK.
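a small python model of the PICK/POKE pair, with the data stack as a list whose last element is the top. the function names are made up, and one detail is an assumption: `index` is counted after both `value` and `index` have been removed from the stack.

```python
def pick(stack, index):
    """`index PICK`: read the item `index` cells below the top (0 is the top)."""
    return stack[-1 - index]

def poke(stack, value, index):
    """`value index POKE`: overwrite the item that `index PICK` would read."""
    stack[-1 - index] = value

stack = [10, 20, 30]      # 30 is on top
poke(stack, 99, 1)        # like: 99 1 POKE
print(stack)              # [10, 99, 30]
```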

* there is FOR ... ENDFOR that accepts limit, and iterates [0..limit). it is
  safe to use with negative or zero limit -- the loop will be skipped in this case.
  i.e. `3 FOR I . ENDFOR` will print "0 1 2", and `-1 FOR I . ENDFOR` will print
  nothing.
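the same rule in python terms, as a minimal sketch: `for_endfor` is a made-up helper that returns the values `I` would take.

```python
def for_endfor(limit):
    """Values of I inside `limit FOR ... ENDFOR`: 0, 1, ... limit-1."""
    if limit <= 0:
        return []          # zero or negative limit: the body never runs
    return list(range(limit))

print(for_endfor(3))   # [0, 1, 2]
print(for_endfor(-1))  # []
```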

* interpreter extension mechanics:
  there are two scattered colon words:
    (INTERPRET-WFIND)      ( addr count -- cfa 1 // cfa -1 // addr count false )
    (INTERPRET-NOT-FOUND)  ( addr count -- true // addr count false )

  "(interpret-wfind)" will be called to find a word.
  return `1` if cfa is immediate, `-1` if it is a normal word.
  if your chain handler succeeded, you MUST EXIT from it!
  you can leave various data on the stack if your handler returned immediate word.

  "(interpret-not-found)" will be called when a word cannot be found, and cannot
  be converted to a number.
  if your chain handler succeeded, you MUST EXIT from it!
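the "(interpret-wfind)" contract can be modelled in python as a chain of handlers where the first success ends the chain (that early return plays the role of the mandatory EXIT). all names, addresses and the handler order below are hypothetical; the real thing is a scattered colon word working on the Forth stack.

```python
IMMEDIATE, NORMAL = 1, -1   # the flags returned next to the cfa

def run_wfind_chain(handlers, name):
    """Try handlers in order; the first one that succeeds ends the chain."""
    for handler in handlers:
        result = handler(name)   # (cfa, flag) on success, None otherwise
        if result is not None:
            return result        # models EXIT from the chain
    return None                  # models `addr count false`

def builtin_words(name):
    # stand-in for the normal dictionary search
    table = {"DUP": (0x1000, NORMAL), "IF": (0x2000, IMMEDIATE)}
    return table.get(name)

def my_extension(name):
    # hypothetical extension: resolve every name that starts with "ext:"
    if name.startswith("ext:"):
        return (0x3000, NORMAL)
    return None

print(run_wfind_chain([my_extension, builtin_words], "IF"))  # (8192, 1)
```

a handler that does not recognise the name simply falls through, leaving the input for the next one.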



some notable differences from ANS:

* number parser does not understand decimal dot, nor floating exponent; i.e. you cannot
  directly input double numbers or floats. this is because i don't see much use in
  that, and you can write your own prefix words if you need to (as fp did with "f#").
* TRUE is 1, not -1; there are LOGAND and LOGOR; NOT is logical, bitwise not is BITNOT.
* TIB is still there, no ANS SOURCE and such.
* TIB size is in variable #TIB; there's no need to finish TIB with any special byte. yet
  the parser will stop if it sees a zero byte in TIB.
* current tib line is in variable TIB-LINE# (if it is 0, no line counting and debug info).
* do not read TIB manually, use "tib-peekch" and "tib-getch".
* DOES> words use extra bytes after PFA.
* there is [COMPILE], and no CHAR or ASCII (use [CHAR] instead).
* [CHAR] will error on any string that is not one-char.
* use WFIND ( addr count -- cfa -1 // cfa 1 // false ) instead of idiotic FIND.
* COUNT expects cell-counted string; for byte-counted, use BCOUNT.
* WORD returns cell-counted string (at HERE) (but don't use it, use PARSE and PARSE-NAME).
* there is N-ALLOT ( size -- start-addr ) low-level word, which is used by ALLOT and
  others. it returns starting address of the allocated dictionary memory (which can
  be in transient DP-TEMP).
* i don't fuckin' know what "address unit" is, and why it is necessary. so MOVE always
  works with bytes.
* FIG-style "CFA->NFA" and such are still there.
* S" always unescapes string, because i see no reason to not do it. also, terminating
  zero byte is always there (but it is not included in count).
  S" will use byte-counted string if it can (with a different LIT word).
* ." cannot print strings longer than 255 chars (c'mon!).
* there are FIG-style VAR and DEFERED (VARIABLE and DEFER are ANS).
* PAD is not relative to HERE, it resides in separate memory area.
* floating point words are supported only partially, via FPU+FPU stack; printing and
  parsing of floating point numbers are very inexact, and cannot be used for round-trips.
  this is something i'm planning to fix eventually (prolly by porting Ryu and plan9 parser).
  note: 32-bit floats seem to round-trip under the default FPU mode, but it's still better
  to not rely on that.
* do not even try that ANS-allowed idiocy like "BEGIN ... WHILE ... UNTIL ... THEN".
  this is hard to read, and plainly stupid. in UrForth, you can have as many "WHILE"
  (or "NOT-WHILE") sections as you want to, and you can finish "BEGIN" with any of
  "REPEAT", "AGAIN", "UNTIL". there is no need for extra "THEN" (and the compiler will
  complain if you try to do such thing). also, there are "BREAK" and "CONTINUE" words.
* `ABORT"` is unconditional abort. use `?ABORT"` instead -- this is more logical, and is
  in line with `?ERROR`. "use-lib: ans" will change that.
* ALIAS will copy source word attributes ("hidden" and "immediate"). not copying attributes
  is another standard idiocy, and a source of bugs. i've never ever encountered a case
  where i wanted an alias to not be immediate when a source word is immediate, for example.
  "use-lib: ans" will change that.



FAQ ;-)

Q: i am calling functions from .so, and they are segfaulting at random!
it doesn't happen with C code, UrForth has a bug there!

A: nope. what is happening is that your system is broken, and doesn't
follow ABI. 32-bit ABI doesn't require the stack to be aligned in any
particular way (except being dword-aligned), but modern GCC not only
aligns the stack at 16 bytes, but generates code that expects the stack
to be always aligned like that. it is a violation of ABI, and a bug in
GCC. rebuild your system with "-mstackrealign" GCC flag to fix it.

Q: but everybody else is happy to do what GCC du..evelopers command them
to do! why can't you simply add stack aligning code to UrForth?!

A: because i see no reason to work around GCC bugs in my code. the longer
we tolerate GCC ABI breakage, the longer it will last.

Q: what the fuck is with your assembler code? what is "ld", "jr", "jp",
"cp", and other shit?

A: i am sorry. i used to write for Z80, and i am too lazy to switch
mnemonics. conventional x86 mnemonics are supported too, so you can use
whichever you like. consult Z80 manual to understand the shit i wrote, if
you really need to.


have fun, and happy hacking
Ketmar Dark // Invisible Vector