From 3b91bb70129a4d6276716c740e89a02854a0216a Mon Sep 17 00:00:00 2001
From: charles childers <charles@zenwalk.hyperion.net>
Date: Sat, 5 Jul 2008 21:42:52 -0400
Subject: [PATCH] More work on docs for the new opcodes.

---
 NEW-OPCODES | 83 ++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 43 insertions(+), 40 deletions(-)

diff --git a/NEW-OPCODES b/NEW-OPCODES
index 063b6ba..e9a3f38 100644
--- a/NEW-OPCODES
+++ b/NEW-OPCODES
@@ -4,6 +4,9 @@ Introduction
 This documentation describes additional features and functionality of Mat's extended Ngaro VM.
 Other implementations may also incorporate some or all of these extensions in the future.
 
+Work to support these extensions is being done in the 'compiler' module. Once this is worked
+out, an alternate Retro core using these extensions can be developed.
+
 
 
 DTC Streams and Token Threading Code
@@ -13,33 +16,33 @@ There exist three categories of executable code in Ngaro: CPU dependant machine
 and bytecode sequences. The C code primitives, which form the basic instruction set, are implemented
 as label sequences and are generated at compile time. A GCC extension allows label addresses to be
 stored into variables. Thanks to this feature it is possible to execute the primitives though a
-simple goto statement. 
+simple goto statement.
 
 Example: 
 
-  labelA: C code .. 
+  labelA: C code ..
 
-  void **pointer_array[255]; 
-  void *inst; 
+  void **pointer_array[255];
+  void *inst;
 
-  pointer_array[0] = &&labelA; 
-  ... 
+  pointer_array[0] = &&labelA;
+  ...
 
-  goto *inst++ = vm_code[pointer_array[vpc++]]; 
+  goto *(inst = pointer_array[vm_code[vpc++]]);
 
-With this technique it is possible to implement direct (DTC), indirect (ITC) and token threading (TTC)
-interpreters without the including of assembler code. The vm use TTC because it is a straightforward
-and fast way to decode the Ngaro bytecode. TTC has the disavantage to be slower then DTC because of
-the bytecode to address translation for executing but the advantage of lower TLB cache misses in the
-case of vm branch instructions. 
+With this technique it is possible to implement direct (DTC), indirect (ITC), and token threading (TTC)
+interpreters without the use of assembler code. The Ngaro VM uses TTC because it is a straightforward
+and fast way to decode the Ngaro bytecode. TTC has the disadvantage of being slower than DTC because
+each bytecode must be translated to an address before execution, but has the advantage of fewer TLB
+misses in the case of branch instructions.
 
-The vm is able to compile linear bytecode sequences to DTC code (a stream). DTC streams can then be
+The VM is able to compile linear bytecode sequences to DTC code (a stream). DTC streams can then be
 executed like all other bytecodes (so the interpreter can extend its instruction set at runtime). It
 is also possible to compile and execute a stream without extending the instruction set permanently
 (that's analog to the generation of super instructions in other forth systems like gforth). The
 combination of TTC and DTC combines a good branch performance with a faster dispatch method which can
 result in a huge performane gain over traditional threading techniques. However to profit from this
-capability the retro compiler must support the generation of super instructions at minimum. 
+capability the Retro compiler must support the generation of super instructions at minimum.
 
 
 
@@ -48,61 +51,61 @@ Compiling a DTC Stream
 
 To compile a new stream there exist a special instruction: 
 
-  SINST number of instructions: N instruction 1,instruction 2 .. instruction N 
+  SINST number of instructions: N instruction 1,instruction 2 .. instruction N
 
 Because of the incompatible way of threading beetween DTC and STC it is not possible to branch from
 a stream to bytecode directly and vis versa. Instead subroutines must be compiled seperatly to new
 instructions and can then be included as extended instruction with a XOP prefix (similar to a microcode
 technique very good known by Intel and Zilog to support more than 255 opcodes, by the way). Another
 possibility would be to use two special instructions (TCALL and TRETURN) for this task but these are
-not yet tested out and included in the actual vm version (coming soon :-). 
+not yet tested.
 
 The SINST bytecode push the start offset of the generated stream onto the data stack so the bytecode
-of the new instruction is identical with its stream offset. To implement a temporaly super instruction
+of the new instruction is identical with its stream offset. To implement a temporary super instruction
 the bytecode can be executed with the EXE instruction before the internal offset pointer to the stream
 memory is decreased to its old position by the TRESET instruction with the result to free the allocated
-memory of the instruction: 
+memory of the instruction:
 
 Example: 
 
-  SINST 6 
-  LI_IAC 0 
-  LIT 1 
-  ADD 
-  DUP 
-  LIT 1000000 
-  LT_JUMP 2 
-  EXE 
-  TRESET 
+  SINST 6
+  LI_IAC 0
+  LIT 1
+  ADD
+  DUP
+  LIT 1000000
+  LT_JUMP 2
+  EXE
+  TRESET
 
 These example shows two details of the SINST bytecode. The number of instructions without parameters is
 counted form one up in contrast to the immediate destination of conditional branches where parameters
 are counted too. Here, six bytecodes are compiled to a stream and the LI_JUMP instruction branches to
-the LIT bytecode. 
+the LIT bytecode.
 
 
 
 Avoiding Stack Addressing 
 -------------------------
 
-Register based vm designs offer the chance of a better vm performance because there primitives doesn't
-need to address the stack for arithmetic and logic operations. Ngaro compensate this though static
+Register-based VM designs offer the chance of better VM performance because their primitives don't need
+to address the stack for arithmetic and logic operations. Ngaro compensates for this through static
 2r-stack caching with the possibility to address the two cache registers for the first and second stack
-element directly: 
+element directly:
 
 
-  LI_IAC  value 
-  LI_IOP  value 
+  LI_IAC  value
+  LI_IOP  value
 
 These two instruction load the TOS and NOS cache registers (IAC and IOP) directly with its immediate
-parameters. 
+parameters.
 
 
-  RADD 
-  RSUB 
-  RMUL 
-  RAND  value 
-  ROR  value 
-  RXOR  value 
+  RADD
+  RSUB
+  RMUL
+  RAND value
+  ROR  value
+  RXOR value
 
 Arithmetic and logic operations, IOP or immediate parameter = operand, result stored in IAC.
-- 
2.11.4.GIT