doc/xlispins.doc

   1 I've just finished reading the xlisp 2.1 source code for the first
   2 time.  The tutorial and reference material included with the winterp
   3 distribution are well done, but I would have liked an overview of the
   4 interpreter internals.  Here's a first cut at such a document.
   5 Comments welcome...
   6
   7 Jeff Prothero
   8
   9 I've modified Jeff's overview to match current XLISP-PLUS 3.0
  10 practice, more thoroughly describe the operation of contexts, and
  11 describe initialization and save/restore.
  12
  13 Please note that "Xlisp 3.0" by David Betz is based on XScheme and
  14 doesn't follow the design described in this document.
  15
  16 Tom Almy
  17 1/15/97
  18
  19
  20 ---------------------------cut here-------------------------------
  21
  22 90Nov13 jsp@milton.u.washington.edu       Public Domain.
  23
  24                    +--------------------------+
  25                    | xlisp-plus 3.0 internals |
  26                    +--------------------------+
  27
  28
  29  Who should read this?
  30  ---------------------
  31
  32 Anyone poking through the C implementation of xlisp for the first
  33 time.  This is intended to provide a rough roadmap of the global
  34 xlisp structures and algorithms.  If you just want to write lisp code
  35 in xlisp, you don't need to read this file -- go read cldoc.txt.  If
  36 you want to tinker with the xlisp implementation code, you should
  37 *still* read that before reading this.  The following isn't intended
  38 to be exhaustively precise -- that's what the source code is for!  It
  39 is intended only to give you sufficient orientation give you a
  40 fighting chance to understand the code the first time through,
  41 instead of the third time.
  42
  43 At the bottom of the file you'll find an example of how to add new
  44 primitive functions to xlisp.
  45
  46
  47
  48  General note on MS-DOS Memory Models
  49  ------------------------------------
  50
  51 If you are not using MS-DOS ignore all the "FAR" and "NEAR" keywords.
  52 Also ignore MEDMEM, and most uppercase macros default to the lowercase
  53 library functions.
  54
  55 For MS-DOS, originally XLISP used Large memory model. However in
  56 order to allow for a clean MS Windows port (never completed :-() and
  57 to improve performance, the code was overhauled to use Medium memory
  58 model. Every xlisp object is is allocated from FAR memory, so
  59 pointers need to be cast to FAR. Local variables, and statically
  60 allocated global variables are NEAR, however. Since FAR strings cannot be
  61 handled by Medium model library string routines, they must be moded
  62 into a NEAR buffer first.
  63
  64
  65  Other added confusions
  66  ______________________
  67
  68
  69 There are two separate sets of memory allocation/garbage collection
  70 routines. One pair, xldmem/xlimage, was the traditional xlisp
  71 allocator. The second pair, dldmem/dlimage, additionally manages
  72 arrays and strings. In the original version, malloc and free are
  73 used, which can cause memory fragmentation problems. The dl pair
  74 compresses array memory segments to eliminate fragmentation.
  75
  76 There are many compilation options. Thank goodness many more have
  77 been eliminated.
  78
  79
  80  What is an LVAL?
  81  ----------------
  82
  83 An "LVAL" is the C type for a generic pointer to an xlisp
  84 garbage-collectable something.  (Cons cell, object, string, closure,
  85 symbol, vector, whatever.)  Virtually every variable in the
  86 interpreter is an LVAL.  Cons cells contain two LVAL slots,
  87 symbols contains four LVAL slots, etc.
  88
  89
  90
  91  What is the obarray?
  92  --------------------
  93
  94 The obarray is the xlisp symbol table.  More precisely, it is is a
  95 hashtable mapping ascii strings (symbol names) to symbols.  ("obarray"
  96 is a misnomer, since it contains only xlisp SYMBOLs, and in particular
  97 contains no xlisp OBJECTs.)  It is used when converting lisp
  98 expressions from text to internal form. Since it is a root for the
  99 garbage collector, it also serves to distinguish permanent
 100 global-variable symbols from other symbols -- you can permanently
 101 protect a symbol from the garbage collector by entering it into the
 102 obarray.  This is called "interning" the symbol.  The obarray is
 103 called "obarray" in C and "*OBARRAY*" in xlisp.
 104
 105 When the package facility is compiled, *OBARRAY* no longer exists,
 106 and obarray is a list of packages. A package is a structure (an
 107 array) of six elements. The first is an obarray for the internal
 108 symbols in the package, the second is an obarry for the external
 109 symbols, the third is a list of shadowing symbols, the fourth is a
 110 list of packages this package uses, the fifth is a list of packages
 111 which use this package, and the sixth is a list of the package's
 112 names with all but the first being the nicknames. In addition,
 113 symbols have an additional attribute in their structure which is the
 114 home package of the symbol.
 115
 116
 117  The Interpreter Stacks
 118  ----------------------
 119
 120 xlisp uses two stacks, an "evaluation stack" and an "argument stack".
 121 Both are roots for the garbage collector.  The evaluation stack is
 122 largely private to the interpreter and protects internal values from
 123 garbage collection, while the argument stack holds the conventional
 124 user-visible stackframes.
 125
 126
 127 The evaluation stack is an EDEPTH-long array of "LVAL" statically
 128 allocated.  It grows zeroward.
 129
 130 xlstkbase points to the zero-near end of the evaluation stack.
 131
 132 xlstktop points to the zero-far end of the evaluation stack: the
 133 occupied part of the stack lies between xlstack and xlstktop.  NOTE
 134 that xlstktop is *NOT* the top of the stack in the conventional sense
 135 of indicating the most recent entry on the stack: xlstktop is a static
 136 bounds pointer which never changes.
 137
 138 xlstack starts at the zero-far end of the evaluation stack.  *xlstack
 139 is the most recent LVAL on the stack.  The garbage collector MARKs
 140 everything reachable from the evaluation stack (among other things),
 141 so we frequently push things on this stack while C code is
 142 manipulating them. (Via xlsave(), xlprotect(), xlsave1(), xlprot1().)
 143
 144
 145 The argument stack is an ADEPTH-long array of "LVAL".  It also grows
 146 zeroward.  The evaluator pushes arguments on the argument stack at the
 147 start of a function call (form evaluation).  Built-in functions
 148 usually eat them directly off the stack.  For user-lisp functions
 149 xleval.c:evfun() pops them off the stack and binds them to the
 150 appropriate symbols before beginning execution of the function body
 151 proper.
 152
 153 xlargstkbase is the zero-near end of argument stack.
 154 xlargstktop is the zero-far end of argument stack.  Like xlstktop,
 155 xlargstktop is a static bounds pointer.
 156
 157 *xlsp ("sp"=="stack pointer") is the most recent item on the argument stack.
 158
 159 xlfp ("fp"=="frame pointer") is the base of the current stackframe.
 160
 161
 162
 163   What is a context?
 164   ------------------
 165
 166 An xlisp "context" is something like a checkpoint, recording a
 167 particular point buried in the execution history so that we can
 168 abort/return back to it.  Contexts are used to implement call/return,
 169 catch/throw, signals, gotos, and breaks.  xlcontext points to the
 170 chain of active contexts, the top one being the second-newest active
 171 context.  (The newest -- that is, current -- active context is
 172 implemented by the variables xlstack xlenv xlfenv xldenv xlcontext
 173 xlargv xlargc xlfp xlsp.)  Context records are written by
 174 xljump.c:xlbegin() and read by xljump.c:xljump().  Context records
 175 are C structures on the C program stack; They are not in the dynamic
 176 memory pool or on the lisp execution or argument stacks.
 177
 178 To create a context, the function defines a local CONTEXT structure,
 179 and then calls xlbegin, passing the address of the structure, the
 180 appropriate context flags (ored together, if more than one), and an
 181 expression which is the tag for return and throw contexts, and NIL
 182 for others. xlend is used to end the context. Within the context, a
 183 setjmp using the jump buffer in the context structure provides the
 184 mechanism to return to the function.
 185
 186 The xljump function takes three arguments, the target context, a mask
 187 value (which is returned by the setjmp in the target context), and a
 188 return value. The return value is typically NIL, but holds the return
 189 value of RETURN and THROW functions for those operations. xljump
 190 unwinds the contexts until the target context is found, or an
 191 UNWIND-PROTECT context is found. In the latter case, xunwindprotect
 192 saves the unwind target so that unwinding can resume.
 193
 194 The implemented context flags are:
 195
 196 CF_GO -- used by TAGBODY, searched by GO (and the xlgo function).
 197
 198 CF_RETURN -- used by named blocks and functions, searched by RETURN.
 199
 200 CF_THROW -- used by CATCH, searched by THROW.
 201
 202 CF_ERROR -- used by ERRSET to mark context of an error handler.
 203
 204 CF_UNWIND -- used by UNWIND-PROTECT.
 205
 206 CF_TOPLEVEL -- used at the top-level loop.
 207
 208 CF_BRKLEVEL -- used in top level and break loops. Searched on
 209 uncaught (with CF_ERROR) errors.
 210
 211 CF_CLEANUP -- used in the top level and break loops. Searched when
 212 going back a break level. Note that this and CF_BRKLEVEL always appear
 213 together in contexts, but imply different functionality when signaled.
 214
 215 CF_CONTINUE -- used in break loops to cause continuation.
 216
 217 In most cases, the mask value passed to xljump is the context flag.
 218 For the break loop, for example, this allows dispatching based on the
 219 flag value of BRKLEVEL (which reenters the debug loop), CLEANUP
 220 (which returns to the previous break loop), or CONTINUE (which
 221 attempts to continue execution). Starting with xlisp-plus 3.0, in the
 222 case of the TAGBODY context, the mask value is the index into the
 223 TAGBODY of the jump target.
 224
 225   What is an environment?
 226   -----------------------
 227
 228 An environment is basically a store of symbol-value pairs, used to
 229 resolve variable references by the lisp program.  xlisp maintains
 230 three environments, in the global variables xlenv, xlfenv and xldenv.
 231
 232 xlenv and xlfenv are linked-list stacks which are pushed when we
 233 enter a function and popped when we exit it.  We also switch
 234 xlenv+xlfenf environments entirely when we begin executing a new
 235 closure (user-fn written in lisp). The xlenv environment is enlarged
 236 during a LET function (or other special form with value bindings),
 237 while the xlfenv environment is enlarged with FLET/MACROLET/LABELS.
 238
 239 The xlenv environment is the most heavily used environment.  It is
 240 used to resolve everyday data references to local variables.  It
 241 consists of a list of frames (and objects).  Each frame is a list of
 242 sym-val pairs.  In the case of an object, we check all the instance
 243 and class variables of the object, then do the same for its
 244 superclass, until we run out of superclasses. If the symbol is not
 245 found it has not been bound and its global value is used.
 246
 247 The xlfenv environment is used to find function values instead of
 248 variable values.
 249
 250 When we send a message, we set xlenv to the value it had when the
 251 message CLOSURE was built, then push on (obj msg-class), where
 252 msg-class is the [super]class defining the method.  (We also set
 253 xlfenv to the value xlfenv had when the method was built.)  This makes
 254 the object instance variables part of the environment, and saves the
 255 information needed to correctly resolve references to class variables,
 256 and to implement SEND-SUPER.
 257
 258 The xldenv environment tracks the old values of global variables
 259 which we have changed but intend to restore later to their original
 260 values, particularly for variables (symbols) marked as F_SPECIAL, but
 261 also for progv. This is also used internally when we bind and unbind
 262 s_evalhook and s_applyhook (*EVALHOOK* and *APPLYHOOK*). It is a
 263 simple list of sym-val pairs, treated as a stack.
 264
 265 These environments are manipulated in C via the xlisp.h macros
 266 xlframe(e), xlbind(s,v), xlfbind(s,v), xlpbind(s,v,e), xldbind(s,v),
 267 xlunbind(e).
 268
 269
 270   How are multiple return values handled?
 271   ---------------------------------------
 272
 273 The first value is always returned via the C function mechanism.
 274 All return values are passed back in an array xlresults[]. The
 275 number of return values are specified in xlnumresults. In order to
 276 avoid having to repeat the multiple value code in every function, a
 277 flag in the SUBR or FSUBR cell indicates if multiple values are used,
 278 and code in xleval.c mimics multiple value functions for single value
 279 functions. The function VALUES, defined in xvalues() allows closures
 280 to return multiple values.
 281
 282
 283   How are xlisp entities stored and identified?
 284   ---------------------------------------------
 285
 286 Conceptually, xlisp manages memory as a single array of fixed-size
 287 objects.  Keeping all objects the same size simplifies memory
 288 management enormously, since any object can be allocated anywhere, and
 289 complex compacting schemes aren't needed.  Every LVAL pointer points
 290 somewhere in this array.  Every xlisp object has the basic format
 291 (xldmem.h:typdef struct node)
 292
 293  struct node {
 294      char n_type;
 295      char n_flags;
 296      LVAL car;
 297      LVAL cdr;
 298  }
 299
 300 where n_type is one of:
 301
 302  FREE     A node on the freelist.
 303  SUBR     A function implemented in C. (Needs evalutated arguments.)
 304  FSUBR    A special function implemented in C. (Needs unevaluated arguments).
 305  CONS     A regular lisp cons cell.
 306  FIXNUM   An integer.
 307  FLONUM   A floating-point number.
 308  STRING   A string.
 309  STREAM   An input or output file.
 310  CHAR     An ascii character.
 311  USTREAM  An internal stream.
 312  RATIO    A ratio.
 313  SYMBOL   A symbol.
 314  OBJECT   Any object, including class objects.
 315  VECTOR   A variable-size array of LVALs.
 316  CLOSURE  Result of DEFUN or LAMBDA -- a function written in lisp.
 317  STRUCT   A structure.
 318  COMPLEX  A complex number
 319  BIGNUM   A bignum (integer that won't fit a FIXNUM cell)
 320  PACKAGE  A package
 321
 322 Messages may be sent only to nodes with n_type == OBJECT.
 323
 324 Obviously, several of the above types won't fit in a fixed-size
 325 two-slot node.  The escape is to have them malloc() some memory and
 326 have one of the slots point to it -- VECTOR is the archetype.  For
 327 example, see xldmem.c:newvector().  To some extent, this malloc()
 328 hack simply exports the memory- fragmentation problem to the C
 329 malloc()/free() routines.  However, it helps keep xlisp simple, and
 330 it has the happy side-effect of unpinning the body of the vector, so
 331 that vectors can easily be expanded and contracted.  When the
 332 dldmem.c version of the memory manager is used, this memory is
 333 managed by xlisp and vector memory is allocated from compressable
 334 vector segments.
 335
 336 The garbage collector has special-case code for each of the above node
 337 types, so it can find all LVAL slots and recycle any malloc()ed ram
 338 when a node is garbage-collected. If the collected node is a STREAM,
 339 then it's associated file is closed.
 340
 341 Xlisp pre-allocates nodes for all ascii characters, and for small
 342 integers.  These nodes are never garbage-collected.
 343
 344 As a practical matter, allocating all nodes in a single array is not
 345 very sensible.  Instead, nodes are allocated as needed, in segments of
 346 one or two thousand nodes, and the segments linked by a pointer chain
 347 rooted at xldmem.c:segs.
 348
 349
 350
 351   How are vectors implemented?
 352   ----------------------------
 353
 354 An xlisp vector is a generic array of LVAL slots.  Vectors are also
 355 the canonical illustration of xlisp's escape mechanism for node types
 356 which need more than two LVAL slots (the maximum possible in the
 357 fixed-size nodes in the dynamic memory pool).  The node CAR/CDR slots
 358 for a vector hold a size field plus a pointer to a malloc()ed ram
 359 chunk, which is automatically free()ed when the vector is
 360 garbage-collected.
 361
 362 xldmem.h defines macros for reading and writing vector fields and
 363 slots: getsize(), getelement() and setelement().  It also defines
 364 macros for accessing each of the other types of xlisp nodes.
 365
 366
 367
 368   How are strings implemented?
 369   ----------------------------
 370
 371 Strings work much like vectors: The node has a pointer to a malloc()ed
 372 ram chunk which is automatically free()ed when the string gets
 373 garbage-collected. The size field of a string is size in bytes. Unlike C,
 374 the null character can be a string constituent.
 375
 376
 377
 378  How are various numeric types implemented?
 379  __________________________________________
 380
 381 There are five numeric types: FIXNUM, BIGNUM, RATIO, FLONUM, and
 382 COMPLEX. The first two constitute integers, the first three rational
 383 numbers. BIGNUM, RATIO, and COMPLEX are optional types, with BIGNUM and
 384 RATIO requiring COMPLEX.
 385
 386 Numeric cells are never modified, new cells are created as necessary.
 387
 388 FIXNUMs are integers that fit in a FIXTYPE (typically "long"). The
 389 car slot (basically) contains the number. There is a pool of
 390 preallocated small fixnums (range SFIXMIN to SFIXMAX) that get
 391 reused. Only these small FIXNUMs can be tested for equality with EQ.
 392
 393 BIGNUMs are used when an integer will not fit in a FIXNUM. A BIGNUM
 394 structure is like an character string, however the length is a
 395 multiple of sizeof(unsigned short). The first unsigned short cell
 396 contains the sign of the number, and the remaining unsigned short
 397 cells contain the number, most significant part first. The first
 398 numeric cell may be zero. Functions that can create bignum results
 399 will convert automatically to FIXNUM if the result fits, however with
 400 the function small BIGNUMs can exist, and the number of numeric cells
 401 must be at least two.
 402
 403 RATIOs have the car field pointing to the numerator cell and the cdr
 404 field pointing to the denominator cell. These values must be either
 405 both FIXNUM or both BIGNUM.
 406
 407 FLONUMs have the car (overflowing into the cdr) field containing the
 408 floating point value.
 409
 410 COMPLEXs were implemented as a two element array, but now are
 411 implemented like ratios. The two elements are either both FLONUMs or
 412 both integers (rationals if ratios/bignums allowed).
 413
 414
 415  How are streams implemented?
 416  ____________________________
 417
 418 A file stream node has four fields: a file pointer (type FILEP), a
 419 saved lookahead character, flags, and character position within a
 420 line.
 421
 422 The file pointer is either a FILE * or the index into the file table.
 423 In the preferred file table implementation, the index values CLOSED,
 424 STDIN, STDOUT, and CONSOLE are -1, and the first three entries which
 425 are forced to stdin, stdout, and stderr, respectively. The file table
 426 entry contains the FILE *, the actual file name (full pathname), and
 427 for Windows the reopen mode and file position. Under windows, all
 428 files are closed when waiting for input in order to be "friendly" to
 429 the Windows environment. The extra information allows closing the
 430 files and then reopening them exactly as they were.
 431
 432 The saved lookahead character allows the peekchar function and
 433 internal unreadchar function to work. The value is set to 0 when
 434 there is no lookahead character. This does not pose a problem since
 435 ASCII files do not have null characters.
 436
 437 The flags field tells if the file is open for reading, writing, or
 438 both, and also what the last direction was so that an fseek can be
 439 performed on direction change. An additional bit keeps track of the
 440 file being opened for ASCII or BINARY access.
 441
 442 The character position is used for tab calculations.
 443
 444 An unnamed stream is simply a TCONC structure. None of the above file
 445 information applies to unnamed streams.
 446
 447  How are symbols implemented?
 448  ----------------------------
 449
 450 A symbol is a generic user-visible lisp variable, with separate slots
 451 for print name, value, function, and property list.  Any or all of
 452 these slots (including name) may be NIL.  You create a symbol in C by
 453 calling "xlmakesym(name)" or "xlenter(name)" (to make a symbol and
 454 enter it in the obarray). You create a symbol in xlisp with the READ
 455 function, or by calling "(gensym)", or indirectly in various ways.
 456 Most of the symbol-specific code in the interpreter is in xlsym.c.
 457
 458 Physically, a symbol is implemented like a four- or five-slot vector.
 459 A couple of free bits in the node structure are used as flags for
 460 F_SPECIAL (special variables) and F_CONSTANT (constants).
 461
 462 A symbol is marked as unbound (no value) or undefined (no function)
 463 by placing a pointer to symbol s_unbound in the value or function
 464 field, respectively. The symbol s_unbound is not interned and is not
 465 used other than to set and check for unbound variables and undefined
 466 functions.
 467
 468 The symbol NIL is statically allocated so its address is a constant.
 469 This makes the frequent comparision of a pointer to NIL faster and
 470 more compact (in some cases).
 471
 472 Random musing: Abstractly, the LISP symbols plus cons cells (etc)
 473 constitute a single directed graph, and the symbols mark spots where
 474 normal recursive evaluation should stop.  Normal lisp programming
 475 practice is to have a symbol in every cycle in the graph, so that
 476 recursive traversal can be done without MARK bits.
 477
 478
 479
 480   How are closures implemented?
 481   -----------------------------
 482
 483 A closure, the return value from a lambda, is a regular coded-in-lisp
 484 fn.  Physically, it it implemented like an eleven-slot vector, with the
 485 node n_type field hacked to contain CLOSURE instead of VECTOR. The
 486 vector slots contain:
 487
 488  name   symbol -- 1st arg of DEFUN.  NIL for LAMBDA closures.
 489  type   (s_lambda or s_macro).
 490  args   List of "required" formal arguments (as symbols)
 491  oargs  List of "optional" args, each like: (name (default specified-p))
 492  rest   Name of "&rest" formal arg, else NIL.
 493  kargs  keyword args, each like: ((':foo 'bar default specified-p))
 494  aargs  &aux vars, each like: (('arg default))
 495  body   actual code (as lisp list) for fn.
 496  env    value of xlenv when the closure was built.  NIL for macros.
 497  fenv   value of xlfenv when the closure was built. NIL for macros.
 498  lambda The original formal args list in the defining form.
 499
 500 The lambda field is for printout purposes.  The remaining fields store
 501 a predigested version of the formal args list.  This is a limited form
 502 of compilation: by processing the args list at closure-creation time,
 503 we reduce the work needed during calls to the closure.
 504
 505
 506
 507   How are objects implemented?
 508   ----------------------------
 509
 510 An object is implemented like a vector, with the size determined by
 511 the number of instance variables.  The first slot in the vector points
 512 to the class of the object; the remaining slots hold the instance
 513 variables for the object.  An object needs enough slots to hold all
 514 the instance variables defined by its class, *plus* all the instance
 515 variables defined by all of its superclasses.
 516
 517
 518
 519   How are classes implemented?
 520   ----------------------------
 521
 522 A class is a specific kind of object, hence has a class pointer plus
 523 instance variables.  All classes have the following instance variables:
 524
 525  MESSAGES   A list of (interned-symbol method-closure) pairs.
 526  IVARS      Instance variable names: A list of interned symbols.
 527  CVARS      Class variable names:    A list of interned symbols.
 528  CVALS      Class variable values:   A vector of values.
 529  SUPERCLASS A pointer to the superclass.
 530  IVARCNT    Number of class instance variables, as a fixnum.
 531  IVARTOTAL  Total number of instance variables, as a fixnum.
 532  PNAME      Printname for this class
 533
 534 IVARCNT is the count of the number of instance variables defined by
 535 our class.  IVARTOTAL is the total number of instance variables in an
 536 object of this class -- IVARCNT for this class plus the IVARCNTs from
 537 all of our superclasses.
 538
 539
 540
 541
 542   How is the class hierarchy laid out?
 543   ------------------------------------
 544
 545 The fundamental objects are the OBJECT and CLASS class objects.  (Both
 546 are instances of class CLASS, and since CLASSes are a particular kind
 547 of OBJECT, both are also objects, with n_type==OBJECT.  Bear with me!)
 548
 549 OBJECT is the root of the class hierarchy: everything you can send a
 550 message to is of type OBJECT.  (Vectors, chars, integers and so forth
 551 stand outside the object hierarchy -- you can't send messages to them.
 552 I'm not sure why Dave did it this way.) OBJECT defines the messages:
 553
 554  :isnew -- Does nothing
 555  :class -- Returns contents of class-pointer slot.
 556  :show  -- Prints names of obj, obj->class and instance vars (for debugging).
 557  :prin1 -- print the object to argument stream
 558
 559 A CLASS is a specialized type of OBJECT (with instance variables like
 560 MESSAGES which generic OBJECTs lack), class CLASS hence has class
 561 OBJECT as its superclass.  The CLASS object defines the messages:
 562
 563  :new    -- Create new object with self.IVARTOTAL LVAR slots, plus
 564             one for the class pointer. Point class slot to self.
 565             Set new.n_type char to OBJECT.
 566  :isnew  -- Fill in IVARS, CVARS, CVALS, SUPERCLASS, IVARCNT and
 567             IVARTOTAL, using parameters from :new call.  (The
 568             :isnew msg inherits the :new msg parameters because
 569             the  :isnew msg is generated automatically after
 570             each :new   msg, courtesy of a special hack in
 571             xlobj.c:sendmsg().)
 572  :answer -- Add a (msg closure) pair to self.MESSAGES.
 573
 574
 575
 576 Here's a figure to summarize the above, with a generic object thrown
 577 in for good measure.  Note that all instances of CLASS will have a
 578 SUPERCLASS pointer, but no non-class object will.  Note also that the
 579 messages known to an object are those which can be reached by
 580 following exactly one Class Ptr and then zero or more Superclass Ptrs.
 581 For example, the generic object can respond to :ISNEW, :CLASS and
 582 :SHOW, but not to :NEW or :ANSWER.  (The functions implementing the
 583 given messages are shown in parentheses.)
 584
 585                                     NIL
 586                                      ^
 587                                      |
 588                                      |Superclass Ptr
 589                                      |
 590                             Msg+--------+
 591  :isnew (xlobj.c:obisnew) <----|  class |Class Ptr
 592  :class (xlobj.c:obclass) <----| OBJECT |------------+
 593  :show  (xlobj.c:objshow) <----|        |            |
 594                                +--------+            |
 595        +---------+                ^  ^               |
 596        | generic |Class Ptr       |  |               |
 597        | object  |----------------+  |Superclass Ptr |
 598        +---------+                   |               |
 599                             Msg+--------+            |
 600  :isnew (xlobj.c:clnew)   <----| class  |Class Ptr   |
 601  :new   (xlobj.c:clisnew) <----| CLASS  |--------+   |
 602  :answer(xlobj.c:clanswer)<----|        |        |   |
 603                                +--------+        |   |
 604                                   ^  ^           |   |
 605                                   |  |           |   |
 606                                   |  +-----------+   |
 607                                   +------------------+
 608
 609
 610 Thus, class CLASS inherits the :CLASS and :SHOW messages from class
 611 OBJECT, overrides the default :ISNEW message, and provides new the
 612 messages :NEW and :ANSWER.
 613
 614 New classes are created by (send CLASS :NEW ...) messages.  Their
 615 Class Ptr will point to CLASS.  By default, they will have OBJECT as
 616 their superclass, but this can be overridden by the second optional
 617 argument to :NEW.
 618
 619 The above basic structure is set up by xlobj.c:xloinit().
 620
 621
 622
 623   How do we look up the value of a variable?
 624   ------------------------------------------
 625
 626 When we're cruising along evaluating an expression and encounter a
 627 symbol, the symbol might refer to a global variable, an instance
 628 variable, or a class variable in any of our superclasses.  Figuring
 629 out which means digging throught the environment.  The canonical place
 630 this happens is in xleval.c:xleval(), which simply passes the buck to
 631 xlsym.c:xlgetvalue(), which in turn passes the buck to
 632 xlxsym.c:xlxgetvalue(), where the fun of scanning down xlenv begins.
 633 The xlenv environment looks something like
 634
 635          Backbone    Environment frame contents
 636          --------    --------------------------
 637 xlenv --> frame      ((sym val) (sym val) (sym val) ... )
 638           frame      ...
 639           object     (obj msg-class)
 640           frame      ...
 641           object     ...
 642           frame      ...
 643           ...
 644
 645 The "frame" lines are due to everyday nested constructs like LET
 646 expressions, while the "object" lines represent an object environment
 647 entered via a message send.  xlxgetvalue scans the enviroment left to
 648 right, and then top to bottom.  It scans down the regular environment
 649 frames itself, and calls xlobj.c:xlobjgetvalue() to search the object
 650 environment frames.
 651
 652 xlobjgetvalue() first searches for the symbol in the msg-class, then
 653 in all the successive superclasses of msg-class.  In each class, it
 654 first checks the list of instance-variable names in the IVARS slot,
 655 then the list of class-variables name in the CVARS slot.
 656
 657
 658
 659   How are function calls implemented?
 660   -----------------------------------
 661
 662 xleval.c contains the central expression-evaluation code.
 663 xleval.c:xleval() is the standard top-level entrypoint.  The two
 664 central functions are xleval.c:xlevform() and xleval.c:evfun().
 665 xlevform() can evaluate four kinds of expression nodes:
 666
 667 SUBR: A normal primitive fn coded in C.  We call evpushargs() to
 668 evaluate and push the arguments, then call the primitive.
 669
 670 FSUBR: A special primitive fn coded in C, which (like IF) wants its
 671 arguments unevaluated.  We call pushargs() (instead of evpushargs())
 672 and then the C fn.
 673
 674 CLOSURE: A preprocessed written-in-lisp fn from a DEFUN or LAMBDA.  We
 675 call evpushargs() and then evfun().
 676
 677 CONS: We issue an error if it isn't a LAMBDA, otherwise we call
 678 xleval.c:xlclose() to build a CLOSURE from the LAMBDA, and fall into
 679 the CLOSURE code.
 680
 681 The common thread in all the above cases is that we call evpushargs()
 682 or pushargs() to push all the arguments on the evaluation stack,
 683 leaving the number and location of the arguments in the global
 684 variables xlargc and xlargv.  The primitive C functions consume
 685 their arguments directly from the argument stack.
 686
 687 xleval.c:evfun() evaluates a CLOSURE by:
 688
 689 (1) Switching xlenv and xlfenv to the values they had when
 690 the CLOSURE was built. (These values are recorded in the CLOSURE.)
 691
 692 (2) Binding the arguments to the environment.  This involves scanning
 693 through the section of the argument stack indicated by xlargc/xlargv,
 694 using information from the CLOSURE to resolve keyword arguments
 695 correctly and assign appropriate default values to optional arguments,
 696 among other things.
 697
 698 (3) Evaluating the body of the function via xleval.c:xleval().
 699
 700 (4) Cleaning up and restoring the original environment.
 701
 702
 703
 704   How are message-sends implemented?
 705   ----------------------------------
 706
 707 We scan the MESSAGES list in the CLASS object of the recipient,
 708 looking for a (message-symbol method) pair that matches our message
 709 symbol.  If necessary, we scan the MESSAGES lists of the recipients
 710 superclasses too.  (xlobj.c:sendmsg().)  Once we find it, we basically
 711 do a normal function evaluation. (xlobjl.c:evmethod().)  Two oddities:
 712 We need to replace the message-symbol by the recipient on the argument
 713 stack to make things look normal, and we need to push an 'object'
 714 stack entry on the xlenv environment so we remember which class is
 715 handling the message.
 716
 717
 718
 719   How is garbage collection implemented?
 720   --------------------------------------
 721
 722 The dynamic memory pool managed by xlisp consists of a chain of memory
 723 *segments* rooted at global C variable "segs".  Each segment contains
 724 an array of "struct node"s plus a pointer to the next segment.  Each
 725 node contains a n_type field and a MARK bit, which is zero except
 726 during garbage collection.
 727
 728 Xlisp uses a simple, classical mark-and-sweep garbage collector.  When
 729 it runs out of memory (fnodes==NIL), it does a recursive traversal
 730 setting the MARK flag on all nodes reachable from the obarray, the
 731 three environments xlenv/xlfenv/xldenv, and the evaluation and
 732 argument stacks.  (A "switch" on the n_type field tells us how to find
 733 all the LVAL slots in the node (plus associated storage), and a
 734 pointer-reversal trick lets us avoid using too much stack space during
 735 the traversal.)  sweep() then adds all un-MARKed LVALs to fnodes, and
 736 clears the MARK bit on the remaining nodes.  If this fails to produce
 737 enough free nodes, a new segment is malloc()ed.
 738
 739 The code to do this stuff is mostly in xldmem.c.
 740
 741
 742 How is garbage collection of vectors/strings implemented?
 743 ---------------------------------------------------------
 744
 745 In the dldmem.c version, vector/string space is allocated from a
 746 memory pool maintained by xlisp, rather than relying on the C libary
 747 malloc() and free() functions. The pool is a linked list of VSEGMENT
 748 with the root vsegments.
 749
 750 When no free memory of a size necessary for an allocation request is
 751 available, a garbage collection is performed. If there is still no
 752 available memory, then a new vector segment is allocated.
 753
 754 The garbage collection process compacts the memory in each vector
 755 segment. This is an easy process because each allocated vector area
 756 is pointed to by only one location (in an LVAL), and a backpointer is
 757 maintained from the vector segment to the LVAL. Empty vector segments
 758 are returned to the heap using free() if there greater than 50% of
 759 the vector space is free. This is done to reduce thrashing while
 760 making memory available for allocation to LVALs.
 761
 762
 763  How does XLISP initialize?
 764  --------------------------
 765
 766 A major confusing aspect of xlisp is how it initializes, and how
 767 save/restore works. This is a multistep process that must take place
 768 in a specific order.
 769
 770 When xlisp starts, there is no obarray, and in fact no symbols at
 771 all. All initial symbols must be created as part of initialization.
 772 In addition the character and small integer cells must be created,
 773 and all the C variables that point to xlisp symbols must be
 774 initialized.
 775
 776 Initialization is mostly performed in xlinit.c, from the function
 777 xlinit(). This function is called from main() after main parses the
 778 command line, calls osinit() to initialize the OS interface, and sets
 779 the initial top level context. This initial context causes a fatal
 780 error to occur if any error happens during the initialization
 781 process.
 782
 783 The first step of xlinit() is to call xlminit(), which is in xldmem.c
 784 or dldmem.c. This initializes the pointers for the memory manager,
 785 stacks, creates the small integer and character LVAL segments, and
 786 creates the NIL symbol by filling in the statically allocated NIL
 787 LVAL from one that is temporarily created. This first call of
 788 xlmakesym will do the first garbage collection -- all of the roots
 789 used for the mark routine have been set so that marking will not
 790 occur (there is nothing to mark!). It is important, however, that
 791 garbage collection not occur again until initialization is completed.
 792 This can be assured by having the allocation segment size, NNODES, be
 793 large enough for the entire initialization.
 794
 795 The second step in xlinit() is to call xldbug.c:xldinit() to turn
 796 debugging off.
 797
 798 At this point, if a restore is to occur from a workspace file, then
 799 the restore is attempted. If the restore is sucessful, then
 800 initialization is finished. See "How does save/restore work?" which
 801 is the next question.
 802
 803 If the restore fails or there is no file to restore, an initial
 804 workspace must be created. This is done by function initwks(), also
 805 in xlinit.c.
 806
 807 Initwks() starts by calling four initialization functions.
 808
 809 xlsym.c:xlsinit() creates the symbol *OBARRAY* (and sets the
 810 variable obarray to point to it), creates the object array as the
 811 value, and enters obarray into the array.
 812
 813 xlsymbols() is called next. It enters all of the symbols referenced
 814 by the interpreter into the obarray, and saves their addresses in C
 815 variables. This function is also called during restores, so it is
 816 important that it does not change the value of any symbols where the
 817 value would be set by restore. If the unbound indicator symbol does
 818 not exist, one is created. Then it puts NIL in the obarray if not
 819 already there (NIL being already created), then all the other symbols
 820 are added (if they don't exist), and their addresses saved in C
 821 variables.
 822
 823 This function also initializes constants such as NIL, T, and PI.
 824 Because a saved workspace might have a different file stream
 825 environment, xlsymbols always initializes the standard file streams
 826 based on the current xlisp invocation. It builds the structure
 827 property for RANDOM-STATE structures. It then (shudder!) calls two
 828 other initialization functions xlobj.c:obsymbols() and ossymbols (in
 829 the appropriate *stuff.c file) to enter symbols used in the object
 830 feature and operating system specific code, respectively.
 831
 832 Returning to initwks(), two aditional initialization routines are
 833 called. Xlread.c:xlrinit() initializes the read-table and installs
 834 the standard read macros. Xlobj.c:xloinit() creates the class and
 835 object objects.
 836
 837 Since the NIL and *OBARRAY* symbols were created before the unbound
 838 marker symbol, initwks sets the function values of these symbols, and
 839 of the unbound marker symbol, to be unbound. It then initializes all
 840 the global variables. Finally it creates all the builtin function
 841 bindings from the funtab table. The synonym functions are created
 842 last.
 843
 844
 845  How are workspaces saved and restored?
 846  --------------------------------------
 847
 848 All the work is done in dlimage.c or xlimage.c, depending on the
 849 memory management scheme. The basic trick in a save is that memory
 850 locations upon a restore would be entirely different. Because of
 851 that, addresses written out are converted into offsets from the start
 852 of the segs LVAL segment list. In the restore operation, the offset
 853 is converted back into an address; if the offset is larger than the
 854 allocated segment memory, additional segment memory is allocated
 855 until an address can be calculated.
 856
 857 Looking at the save function, xlisave(char *fname), the argument
 858 string is taken as the name of a workspace file, and a binary file is
 859 created of that name. Then a garbage collection is performed since it
 860 would be wasteful to write out garbage.
 861
 862 The size of ftab is writen as a validity check, figuring that if the
 863 configuration changed, then this value would be different.
 864
 865 The offset of the *obarray* symbol is written next, since obarray is
 866 crucial to doing a restore -- it is used to get the addresses of all
 867 the other symbols used in the interpreter. Since NIL is a statically
 868 allocated symbol, the offsets to its function, property list, and
 869 printname are written. (I bet you didn't know you could define a
 870 function called NIL!)
 871
 872 Now the segs are traversed and all nodes are written out. The nodes
 873 are written in the format of a one byte tag followed by information
 874 dependent on the node type. Since many locations in the segment can
 875 be empty, one node type, FREE, has data that sets the next offset in
 876 the file, thus allowing skipping of many locations with a single
 877 command. The function setoffset(), called before writing tags in the
 878 other cases, handles writing the FREE entry. CONS, USTREAM, COMPLEX,
 879 and RATIO cells consist of two pointers, which are written after
 880 conversion into offsets. For all other node types, the raw
 881 information is written by writenode(). This could be optimized since
 882 not all information is needed (for instance, the address of arrays
 883 won't be needed!)
 884
 885 A terminator entry (FREE with length of zero) is written, and the
 886 segs are traversed again, looking for nodes with attached
 887 array/string data and streams. For the types where the attached data
 888 is an array, the array elements (pointers) are converted to offsets
 889 and written out. For a string, the characters are written. For a
 890 stream (assuming FILETABLE is used), the file's name and position are
 891 written if it is other than standard input, output, or error (which
 892 cannot be saved or restored).
 893
 894
 895 Restoring a workspace is somewhat more difficult. The file is opened
 896 and checked for validity. Then the old memory image is deleted.
 897 During the deletion, any open file streams other than standard input,
 898 output, or error are closed. All of the global memory allocation
 899 pointers and stacks are reset, just like in initialization. Since the
 900 fixnum and character segments are of fixed size and need a known
 901 location, there are allocated. Their values, however, will be filled
 902 in from the workspace file. This is another wasteful inefficiency,
 903 but at least these segments are small.
 904
 905 The global pointer obarray is read from the file. As mentioned
 906 earlier, the cviptr() function converts the offset in the file into a
 907 physical address, allocating more node segments as necessary. The
 908 array portion of the NIL symbol is allocated, and its function,
 909 property list, and printname pointers are read from the file.
 910
 911 Then the node information is read in. For type FREE, the offset is
 912 adjusted. For CONS, USTREAM, COMPLEX, and (when bignums defined)
 913 RATIO the car and cdr pointers are read, for other types, the LVAL
 914 data is read raw.
 915
 916
 917 The LVAL segments are scanned, and now the vector/string components
 918 of nodes are read. Since the order of nodes is unchanged, the data is
 919 read into the correct nodes. For vector types, the size field of the
 920 LVAL is consulted, vector space is allocated, and offsets are read
 921 from the file and converted into pointers. For strings, the string
 922 space is allocated and the string is read. For streams other than
 923 standard input, output, or error, the file name and position is read
 924 and an attempt is made to open the file. If the file can be opened,
 925 then it is positioned.
 926
 927 During the scan, for SUBR/FSUBR nodes funtab is consulted and the
 928 correct subroutine address is inserted.
 929
 930 During the whole process, if a tag is invalid or the file size is not
 931 correct, a fatal error occurs.
 932
 933 A garbage collection is performed to initialize the free space for
 934 future node allocation.
 935
 936 Finally xlsymbols() is called, as in the initializaton process, so
 937 that the C pointers in the interpreter can be set. This also sets the
 938 streams for standard input, output, and error to the correct values.
 939
 940
 941  How do I add a new primitive fn to xlisp?
 942  -----------------------------------------
 943
 944 Add a line to the end of xlftab.c:funtab[], and a function prototype
 945 to xlftab.h.  This table contains a list of triples:
 946
 947 The first element of each triple is the function name as it will
 948 appear to the programmer. Make it all upper case.
 949
 950 The second element is S (for SUBR) if (like most fns) your function
 951 wants its arguments pre-evaluated, else F (for FSUBR).
 952
 953 The third element is the name of the C function to call.
 954
 955 Remember that your arguments arrive on the xlisp argument rather
 956 than via the usual C parameter mechanism.
 957
 958 CAUTION: Try to keep your files separate from generic xlisp files, and
 959 to minimize the number of changes you make in the generic xlisp files.
 960 This way, you'll have an easier time re-installing your changes when
 961 new versions of xlisp come out.  It's a good idea to put a marker
 962 (like a comment with your initials) on each line you change or insert
 963 in the generic xlisp fileset.  For example, if you are going to add
 964 many primitive functions to your xlisp, use an #include file rather
 965 than putting them all in xlftab.c.
 966
 967 CAUTION: Remember that you usually need to protect the LVAL variables
 968 in your function from the garbage-collector.  It never hurts to do
 969 this, and often produces obscure bugs if you dont.  Generic code for
 970 a new primitive fn:
 971
 972 /* xlsamplefun - do useless stuff. */
 973 /* Called like (samplefun '(a c b) 1 2.0) */
 974 LVAL xlsamplefun()
 975 {
 976     /* Variables to hold the arguments: */
 977     LVAL    list_arg, integer_arg, float_arg;
 978
 979     /* Get the arguments, with appropriate errors */
 980     /* if any are of the wrong type.  Look in     */
 981     /* xlisp.h for macros to read other types of  */
 982     /* arguments.  Look in xlmath.c for examples  */
 983     /* of functions which can handle an argument  */
 984     /* which may be either int or float:          */
 985     list_arg    = xlgalist()  ;  /* "XLisp Get A LIST"   */
 986     integer_arg = xlgafixnum();  /* "XLisp Get A FIXNUM" */
 987     float_arg   = xlgaflonum();  /* "XLisp Get A FLONUM" */
 988
 989     /* Issue an error message if there are any extra arguments: */
 990     xllastarg();
 991
 992
 993
 994     /* Call a separate C function to do the actual  */
 995     /* work.  This way, the main function can       */
 996     /* be called from both xlisp code and C code.   */
 997     /* By convention, the name of the xlisp wrapper */
 998     /* starts with "xl", and the native C function  */
 999     /* has the same name minus the "xl" prefix:     */
1000     return samplefun( list_arg, integer_arg, float_arg );
1001 }
1002 LVAL samplefun( list_arg, integer_arg, float_arg )
1003 LVAL            list_arg, integer_arg, float_arg;
1004 {
1005     FIXTYPE val_of_integer_arg;
1006     FLOTYPE val_of_float_arg;
1007
1008     /* Variables which will point to LISP objects: */
1009     LVAL result;
1010
1011     LVAL list_ptr;
1012     LVAL float_ptr;
1013     LVAL int_ptr;
1014
1015     /* Protect our internal pointers by */
1016     /* pushing them on the evaluation   */
1017     /* stack so the garbage collector   */
1018     /* can't recycle them in the middle */
1019     /* of the routine:                  */
1020     xlstkcheck(3);    /* Make sure following xlsave */
1021                       /* calls won't overrun stack. */
1022     xlsave(list_ptr); /* Use xlsave1() if you don't */
1023     xlsave(float_ptr);/* do an xlstkcheck().        */
1024     xlsave(int_ptr);
1025
1026     /* Create an internal list structure, protected */
1027     /* against garbage collection until we exit fn: */
1028     list_ptr = cons(list_arg,list_arg);
1029
1030     /* Get the actual values of our fixnum and flonum: */
1031     val_of_integer_arg = getfixnum( integer_arg );
1032     val_of_float_arg   = getflonum( float_arg   );
1033
1034
1035     /*******************************************/
1036     /* You can have any amount of intermediate */
1037     /* computations at this point in the fn... */
1038     /*******************************************/
1039
1040
1041     /* Make new numeric values to return: */
1042     integer_ptr = cvflonum( val_of_integer_arg * 3   );
1043     float_ptr   = cvflonum( val_of_float_arg   * 3.0 );
1044
1045     /* Cons it all together to produce a return value: */
1046     result = cons(
1047         list_ptr,
1048         cons(
1049             integer_ptr,
1050             cons(
1051                 float_ptr,
1052                 NIL
1053             )
1054         )
1055     );
1056
1057     /* Restore the stack, cancelling the xlsave()s: */
1058     xlpopn(3); /* Use xlpop() for a single argument.*/
1059
1060     return result;
1061 }
1062
1063
1064 - Jeff   (S)
1065
1066