1 I've just finished reading the xlisp 2.1 source code for the first
2 time. The tutorial and reference material included with the winterp
3 distribution are well done, but I would have liked an overview of the
4 interpreter internals. Here's a first cut at such a document.
9 I've modified Jeff's overview to match current XLISP-PLUS 3.0
10 practice, more thoroughly describe the operation of contexts, and
11 describe initialization and save/restore.
13 Please note that "Xlisp 3.0" by David Betz is based on XScheme and
14 doesn't follow the design described in this document.
20 ---------------------------cut here-------------------------------
22 90Nov13 jsp@milton.u.washington.edu Public Domain.
24 +--------------------------+
25 | xlisp-plus 3.0 internals |
26 +--------------------------+
32 Anyone poking through the C implementation of xlisp for the first
33 time. This is intended to provide a rough roadmap of the global
34 xlisp structures and algorithms. If you just want to write lisp code
35 in xlisp, you don't need to read this file -- go read cldoc.txt. If
36 you want to tinker with the xlisp implementation code, you should
37 *still* read that before reading this. The following isn't intended
38 to be exhaustively precise -- that's what the source code is for! It
39 is intended only to give you sufficient orientation give you a
40 fighting chance to understand the code the first time through,
41 instead of the third time.
43 At the bottom of the file you'll find an example of how to add new
44 primitive functions to xlisp.
48 General note on MS-DOS Memory Models
49 ------------------------------------
51 If you are not using MS-DOS ignore all the "FAR" and "NEAR" keywords.
52 Also ignore MEDMEM, and most uppercase macros default to the lowercase
55 For MS-DOS, originally XLISP used Large memory model. However in
56 order to allow for a clean MS Windows port (never completed :-() and
57 to improve performance, the code was overhauled to use Medium memory
58 model. Every xlisp object is is allocated from FAR memory, so
59 pointers need to be cast to FAR. Local variables, and statically
60 allocated global variables are NEAR, however. Since FAR strings cannot be
61 handled by Medium model library string routines, they must be moded
62 into a NEAR buffer first.
65 Other added confusions
66 ______________________
69 There are two separate sets of memory allocation/garbage collection
70 routines. One pair, xldmem/xlimage, was the traditional xlisp
71 allocator. The second pair, dldmem/dlimage, additionally manages
72 arrays and strings. In the original version, malloc and free are
73 used, which can cause memory fragmentation problems. The dl pair
74 compresses array memory segments to eliminate fragmentation.
76 There are many compilation options. Thank goodness many more have
83 An "LVAL" is the C type for a generic pointer to an xlisp
84 garbage-collectable something. (Cons cell, object, string, closure,
85 symbol, vector, whatever.) Virtually every variable in the
86 interpreter is an LVAL. Cons cells contain two LVAL slots,
87 symbols contains four LVAL slots, etc.
94 The obarray is the xlisp symbol table. More precisely, it is is a
95 hashtable mapping ascii strings (symbol names) to symbols. ("obarray"
96 is a misnomer, since it contains only xlisp SYMBOLs, and in particular
97 contains no xlisp OBJECTs.) It is used when converting lisp
98 expressions from text to internal form. Since it is a root for the
99 garbage collector, it also serves to distinguish permanent
100 global-variable symbols from other symbols -- you can permanently
101 protect a symbol from the garbage collector by entering it into the
102 obarray. This is called "interning" the symbol. The obarray is
103 called "obarray" in C and "*OBARRAY*" in xlisp.
105 When the package facility is compiled, *OBARRAY* no longer exists,
106 and obarray is a list of packages. A package is a structure (an
107 array) of six elements. The first is an obarray for the internal
108 symbols in the package, the second is an obarry for the external
109 symbols, the third is a list of shadowing symbols, the fourth is a
110 list of packages this package uses, the fifth is a list of packages
111 which use this package, and the sixth is a list of the package's
112 names with all but the first being the nicknames. In addition,
113 symbols have an additional attribute in their structure which is the
114 home package of the symbol.
117 The Interpreter Stacks
118 ----------------------
120 xlisp uses two stacks, an "evaluation stack" and an "argument stack".
121 Both are roots for the garbage collector. The evaluation stack is
122 largely private to the interpreter and protects internal values from
123 garbage collection, while the argument stack holds the conventional
124 user-visible stackframes.
127 The evaluation stack is an EDEPTH-long array of "LVAL" statically
128 allocated. It grows zeroward.
130 xlstkbase points to the zero-near end of the evaluation stack.
132 xlstktop points to the zero-far end of the evaluation stack: the
133 occupied part of the stack lies between xlstack and xlstktop. NOTE
134 that xlstktop is *NOT* the top of the stack in the conventional sense
135 of indicating the most recent entry on the stack: xlstktop is a static
136 bounds pointer which never changes.
138 xlstack starts at the zero-far end of the evaluation stack. *xlstack
139 is the most recent LVAL on the stack. The garbage collector MARKs
140 everything reachable from the evaluation stack (among other things),
141 so we frequently push things on this stack while C code is
142 manipulating them. (Via xlsave(), xlprotect(), xlsave1(), xlprot1().)
145 The argument stack is an ADEPTH-long array of "LVAL". It also grows
146 zeroward. The evaluator pushes arguments on the argument stack at the
147 start of a function call (form evaluation). Built-in functions
148 usually eat them directly off the stack. For user-lisp functions
149 xleval.c:evfun() pops them off the stack and binds them to the
150 appropriate symbols before beginning execution of the function body
153 xlargstkbase is the zero-near end of argument stack.
154 xlargstktop is the zero-far end of argument stack. Like xlstktop,
155 xlargstktop is a static bounds pointer.
157 *xlsp ("sp"=="stack pointer") is the most recent item on the argument stack.
159 xlfp ("fp"=="frame pointer") is the base of the current stackframe.
166 An xlisp "context" is something like a checkpoint, recording a
167 particular point buried in the execution history so that we can
168 abort/return back to it. Contexts are used to implement call/return,
169 catch/throw, signals, gotos, and breaks. xlcontext points to the
170 chain of active contexts, the top one being the second-newest active
171 context. (The newest -- that is, current -- active context is
172 implemented by the variables xlstack xlenv xlfenv xldenv xlcontext
173 xlargv xlargc xlfp xlsp.) Context records are written by
174 xljump.c:xlbegin() and read by xljump.c:xljump(). Context records
175 are C structures on the C program stack; They are not in the dynamic
176 memory pool or on the lisp execution or argument stacks.
178 To create a context, the function defines a local CONTEXT structure,
179 and then calls xlbegin, passing the address of the structure, the
180 appropriate context flags (ored together, if more than one), and an
181 expression which is the tag for return and throw contexts, and NIL
182 for others. xlend is used to end the context. Within the context, a
183 setjmp using the jump buffer in the context structure provides the
184 mechanism to return to the function.
186 The xljump function takes three arguments, the target context, a mask
187 value (which is returned by the setjmp in the target context), and a
188 return value. The return value is typically NIL, but holds the return
189 value of RETURN and THROW functions for those operations. xljump
190 unwinds the contexts until the target context is found, or an
191 UNWIND-PROTECT context is found. In the latter case, xunwindprotect
192 saves the unwind target so that unwinding can resume.
194 The implemented context flags are:
196 CF_GO -- used by TAGBODY, searched by GO (and the xlgo function).
198 CF_RETURN -- used by named blocks and functions, searched by RETURN.
200 CF_THROW -- used by CATCH, searched by THROW.
202 CF_ERROR -- used by ERRSET to mark context of an error handler.
204 CF_UNWIND -- used by UNWIND-PROTECT.
206 CF_TOPLEVEL -- used at the top-level loop.
208 CF_BRKLEVEL -- used in top level and break loops. Searched on
209 uncaught (with CF_ERROR) errors.
211 CF_CLEANUP -- used in the top level and break loops. Searched when
212 going back a break level. Note that this and CF_BRKLEVEL always appear
213 together in contexts, but imply different functionality when signaled.
215 CF_CONTINUE -- used in break loops to cause continuation.
217 In most cases, the mask value passed to xljump is the context flag.
218 For the break loop, for example, this allows dispatching based on the
219 flag value of BRKLEVEL (which reenters the debug loop), CLEANUP
220 (which returns to the previous break loop), or CONTINUE (which
221 attempts to continue execution). Starting with xlisp-plus 3.0, in the
222 case of the TAGBODY context, the mask value is the index into the
223 TAGBODY of the jump target.
225 What is an environment?
226 -----------------------
228 An environment is basically a store of symbol-value pairs, used to
229 resolve variable references by the lisp program. xlisp maintains
230 three environments, in the global variables xlenv, xlfenv and xldenv.
232 xlenv and xlfenv are linked-list stacks which are pushed when we
233 enter a function and popped when we exit it. We also switch
234 xlenv+xlfenf environments entirely when we begin executing a new
235 closure (user-fn written in lisp). The xlenv environment is enlarged
236 during a LET function (or other special form with value bindings),
237 while the xlfenv environment is enlarged with FLET/MACROLET/LABELS.
239 The xlenv environment is the most heavily used environment. It is
240 used to resolve everyday data references to local variables. It
241 consists of a list of frames (and objects). Each frame is a list of
242 sym-val pairs. In the case of an object, we check all the instance
243 and class variables of the object, then do the same for its
244 superclass, until we run out of superclasses. If the symbol is not
245 found it has not been bound and its global value is used.
247 The xlfenv environment is used to find function values instead of
250 When we send a message, we set xlenv to the value it had when the
251 message CLOSURE was built, then push on (obj msg-class), where
252 msg-class is the [super]class defining the method. (We also set
253 xlfenv to the value xlfenv had when the method was built.) This makes
254 the object instance variables part of the environment, and saves the
255 information needed to correctly resolve references to class variables,
256 and to implement SEND-SUPER.
258 The xldenv environment tracks the old values of global variables
259 which we have changed but intend to restore later to their original
260 values, particularly for variables (symbols) marked as F_SPECIAL, but
261 also for progv. This is also used internally when we bind and unbind
262 s_evalhook and s_applyhook (*EVALHOOK* and *APPLYHOOK*). It is a
263 simple list of sym-val pairs, treated as a stack.
265 These environments are manipulated in C via the xlisp.h macros
266 xlframe(e), xlbind(s,v), xlfbind(s,v), xlpbind(s,v,e), xldbind(s,v),
270 How are multiple return values handled?
271 ---------------------------------------
273 The first value is always returned via the C function mechanism.
274 All return values are passed back in an array xlresults[]. The
275 number of return values are specified in xlnumresults. In order to
276 avoid having to repeat the multiple value code in every function, a
277 flag in the SUBR or FSUBR cell indicates if multiple values are used,
278 and code in xleval.c mimics multiple value functions for single value
279 functions. The function VALUES, defined in xvalues() allows closures
280 to return multiple values.
283 How are xlisp entities stored and identified?
284 ---------------------------------------------
286 Conceptually, xlisp manages memory as a single array of fixed-size
287 objects. Keeping all objects the same size simplifies memory
288 management enormously, since any object can be allocated anywhere, and
289 complex compacting schemes aren't needed. Every LVAL pointer points
290 somewhere in this array. Every xlisp object has the basic format
291 (xldmem.h:typdef struct node)
300 where n_type is one of:
302 FREE A node on the freelist.
303 SUBR A function implemented in C. (Needs evalutated arguments.)
304 FSUBR A special function implemented in C. (Needs unevaluated arguments).
305 CONS A regular lisp cons cell.
307 FLONUM A floating-point number.
309 STREAM An input or output file.
310 CHAR An ascii character.
311 USTREAM An internal stream.
314 OBJECT Any object, including class objects.
315 VECTOR A variable-size array of LVALs.
316 CLOSURE Result of DEFUN or LAMBDA -- a function written in lisp.
318 COMPLEX A complex number
319 BIGNUM A bignum (integer that won't fit a FIXNUM cell)
322 Messages may be sent only to nodes with n_type == OBJECT.
324 Obviously, several of the above types won't fit in a fixed-size
325 two-slot node. The escape is to have them malloc() some memory and
326 have one of the slots point to it -- VECTOR is the archetype. For
327 example, see xldmem.c:newvector(). To some extent, this malloc()
328 hack simply exports the memory- fragmentation problem to the C
329 malloc()/free() routines. However, it helps keep xlisp simple, and
330 it has the happy side-effect of unpinning the body of the vector, so
331 that vectors can easily be expanded and contracted. When the
332 dldmem.c version of the memory manager is used, this memory is
333 managed by xlisp and vector memory is allocated from compressable
336 The garbage collector has special-case code for each of the above node
337 types, so it can find all LVAL slots and recycle any malloc()ed ram
338 when a node is garbage-collected. If the collected node is a STREAM,
339 then it's associated file is closed.
341 Xlisp pre-allocates nodes for all ascii characters, and for small
342 integers. These nodes are never garbage-collected.
344 As a practical matter, allocating all nodes in a single array is not
345 very sensible. Instead, nodes are allocated as needed, in segments of
346 one or two thousand nodes, and the segments linked by a pointer chain
347 rooted at xldmem.c:segs.
351 How are vectors implemented?
352 ----------------------------
354 An xlisp vector is a generic array of LVAL slots. Vectors are also
355 the canonical illustration of xlisp's escape mechanism for node types
356 which need more than two LVAL slots (the maximum possible in the
357 fixed-size nodes in the dynamic memory pool). The node CAR/CDR slots
358 for a vector hold a size field plus a pointer to a malloc()ed ram
359 chunk, which is automatically free()ed when the vector is
362 xldmem.h defines macros for reading and writing vector fields and
363 slots: getsize(), getelement() and setelement(). It also defines
364 macros for accessing each of the other types of xlisp nodes.
368 How are strings implemented?
369 ----------------------------
371 Strings work much like vectors: The node has a pointer to a malloc()ed
372 ram chunk which is automatically free()ed when the string gets
373 garbage-collected. The size field of a string is size in bytes. Unlike C,
374 the null character can be a string constituent.
378 How are various numeric types implemented?
379 __________________________________________
381 There are five numeric types: FIXNUM, BIGNUM, RATIO, FLONUM, and
382 COMPLEX. The first two constitute integers, the first three rational
383 numbers. BIGNUM, RATIO, and COMPLEX are optional types, with BIGNUM and
384 RATIO requiring COMPLEX.
386 Numeric cells are never modified, new cells are created as necessary.
388 FIXNUMs are integers that fit in a FIXTYPE (typically "long"). The
389 car slot (basically) contains the number. There is a pool of
390 preallocated small fixnums (range SFIXMIN to SFIXMAX) that get
391 reused. Only these small FIXNUMs can be tested for equality with EQ.
393 BIGNUMs are used when an integer will not fit in a FIXNUM. A BIGNUM
394 structure is like an character string, however the length is a
395 multiple of sizeof(unsigned short). The first unsigned short cell
396 contains the sign of the number, and the remaining unsigned short
397 cells contain the number, most significant part first. The first
398 numeric cell may be zero. Functions that can create bignum results
399 will convert automatically to FIXNUM if the result fits, however with
400 the function small BIGNUMs can exist, and the number of numeric cells
401 must be at least two.
403 RATIOs have the car field pointing to the numerator cell and the cdr
404 field pointing to the denominator cell. These values must be either
405 both FIXNUM or both BIGNUM.
407 FLONUMs have the car (overflowing into the cdr) field containing the
408 floating point value.
410 COMPLEXs were implemented as a two element array, but now are
411 implemented like ratios. The two elements are either both FLONUMs or
412 both integers (rationals if ratios/bignums allowed).
415 How are streams implemented?
416 ____________________________
418 A file stream node has four fields: a file pointer (type FILEP), a
419 saved lookahead character, flags, and character position within a
422 The file pointer is either a FILE * or the index into the file table.
423 In the preferred file table implementation, the index values CLOSED,
424 STDIN, STDOUT, and CONSOLE are -1, and the first three entries which
425 are forced to stdin, stdout, and stderr, respectively. The file table
426 entry contains the FILE *, the actual file name (full pathname), and
427 for Windows the reopen mode and file position. Under windows, all
428 files are closed when waiting for input in order to be "friendly" to
429 the Windows environment. The extra information allows closing the
430 files and then reopening them exactly as they were.
432 The saved lookahead character allows the peekchar function and
433 internal unreadchar function to work. The value is set to 0 when
434 there is no lookahead character. This does not pose a problem since
435 ASCII files do not have null characters.
437 The flags field tells if the file is open for reading, writing, or
438 both, and also what the last direction was so that an fseek can be
439 performed on direction change. An additional bit keeps track of the
440 file being opened for ASCII or BINARY access.
442 The character position is used for tab calculations.
444 An unnamed stream is simply a TCONC structure. None of the above file
445 information applies to unnamed streams.
447 How are symbols implemented?
448 ----------------------------
450 A symbol is a generic user-visible lisp variable, with separate slots
451 for print name, value, function, and property list. Any or all of
452 these slots (including name) may be NIL. You create a symbol in C by
453 calling "xlmakesym(name)" or "xlenter(name)" (to make a symbol and
454 enter it in the obarray). You create a symbol in xlisp with the READ
455 function, or by calling "(gensym)", or indirectly in various ways.
456 Most of the symbol-specific code in the interpreter is in xlsym.c.
458 Physically, a symbol is implemented like a four- or five-slot vector.
459 A couple of free bits in the node structure are used as flags for
460 F_SPECIAL (special variables) and F_CONSTANT (constants).
462 A symbol is marked as unbound (no value) or undefined (no function)
463 by placing a pointer to symbol s_unbound in the value or function
464 field, respectively. The symbol s_unbound is not interned and is not
465 used other than to set and check for unbound variables and undefined
468 The symbol NIL is statically allocated so its address is a constant.
469 This makes the frequent comparision of a pointer to NIL faster and
470 more compact (in some cases).
472 Random musing: Abstractly, the LISP symbols plus cons cells (etc)
473 constitute a single directed graph, and the symbols mark spots where
474 normal recursive evaluation should stop. Normal lisp programming
475 practice is to have a symbol in every cycle in the graph, so that
476 recursive traversal can be done without MARK bits.
480 How are closures implemented?
481 -----------------------------
483 A closure, the return value from a lambda, is a regular coded-in-lisp
484 fn. Physically, it it implemented like an eleven-slot vector, with the
485 node n_type field hacked to contain CLOSURE instead of VECTOR. The
486 vector slots contain:
488 name symbol -- 1st arg of DEFUN. NIL for LAMBDA closures.
489 type (s_lambda or s_macro).
490 args List of "required" formal arguments (as symbols)
491 oargs List of "optional" args, each like: (name (default specified-p))
492 rest Name of "&rest" formal arg, else NIL.
493 kargs keyword args, each like: ((':foo 'bar default specified-p))
494 aargs &aux vars, each like: (('arg default))
495 body actual code (as lisp list) for fn.
496 env value of xlenv when the closure was built. NIL for macros.
497 fenv value of xlfenv when the closure was built. NIL for macros.
498 lambda The original formal args list in the defining form.
500 The lambda field is for printout purposes. The remaining fields store
501 a predigested version of the formal args list. This is a limited form
502 of compilation: by processing the args list at closure-creation time,
503 we reduce the work needed during calls to the closure.
507 How are objects implemented?
508 ----------------------------
510 An object is implemented like a vector, with the size determined by
511 the number of instance variables. The first slot in the vector points
512 to the class of the object; the remaining slots hold the instance
513 variables for the object. An object needs enough slots to hold all
514 the instance variables defined by its class, *plus* all the instance
515 variables defined by all of its superclasses.
519 How are classes implemented?
520 ----------------------------
522 A class is a specific kind of object, hence has a class pointer plus
523 instance variables. All classes have the following instance variables:
525 MESSAGES A list of (interned-symbol method-closure) pairs.
526 IVARS Instance variable names: A list of interned symbols.
527 CVARS Class variable names: A list of interned symbols.
528 CVALS Class variable values: A vector of values.
529 SUPERCLASS A pointer to the superclass.
530 IVARCNT Number of class instance variables, as a fixnum.
531 IVARTOTAL Total number of instance variables, as a fixnum.
532 PNAME Printname for this class
534 IVARCNT is the count of the number of instance variables defined by
535 our class. IVARTOTAL is the total number of instance variables in an
536 object of this class -- IVARCNT for this class plus the IVARCNTs from
537 all of our superclasses.
542 How is the class hierarchy laid out?
543 ------------------------------------
545 The fundamental objects are the OBJECT and CLASS class objects. (Both
546 are instances of class CLASS, and since CLASSes are a particular kind
547 of OBJECT, both are also objects, with n_type==OBJECT. Bear with me!)
549 OBJECT is the root of the class hierarchy: everything you can send a
550 message to is of type OBJECT. (Vectors, chars, integers and so forth
551 stand outside the object hierarchy -- you can't send messages to them.
552 I'm not sure why Dave did it this way.) OBJECT defines the messages:
554 :isnew -- Does nothing
555 :class -- Returns contents of class-pointer slot.
556 :show -- Prints names of obj, obj->class and instance vars (for debugging).
557 :prin1 -- print the object to argument stream
559 A CLASS is a specialized type of OBJECT (with instance variables like
560 MESSAGES which generic OBJECTs lack), class CLASS hence has class
561 OBJECT as its superclass. The CLASS object defines the messages:
563 :new -- Create new object with self.IVARTOTAL LVAR slots, plus
564 one for the class pointer. Point class slot to self.
565 Set new.n_type char to OBJECT.
566 :isnew -- Fill in IVARS, CVARS, CVALS, SUPERCLASS, IVARCNT and
567 IVARTOTAL, using parameters from :new call. (The
568 :isnew msg inherits the :new msg parameters because
569 the :isnew msg is generated automatically after
570 each :new msg, courtesy of a special hack in
572 :answer -- Add a (msg closure) pair to self.MESSAGES.
576 Here's a figure to summarize the above, with a generic object thrown
577 in for good measure. Note that all instances of CLASS will have a
578 SUPERCLASS pointer, but no non-class object will. Note also that the
579 messages known to an object are those which can be reached by
580 following exactly one Class Ptr and then zero or more Superclass Ptrs.
581 For example, the generic object can respond to :ISNEW, :CLASS and
582 :SHOW, but not to :NEW or :ANSWER. (The functions implementing the
583 given messages are shown in parentheses.)
591 :isnew (xlobj.c:obisnew) <----| class |Class Ptr
592 :class (xlobj.c:obclass) <----| OBJECT |------------+
593 :show (xlobj.c:objshow) <----| | |
596 | generic |Class Ptr | | |
597 | object |----------------+ |Superclass Ptr |
600 :isnew (xlobj.c:clnew) <----| class |Class Ptr |
601 :new (xlobj.c:clisnew) <----| CLASS |--------+ |
602 :answer(xlobj.c:clanswer)<----| | | |
610 Thus, class CLASS inherits the :CLASS and :SHOW messages from class
611 OBJECT, overrides the default :ISNEW message, and provides new the
612 messages :NEW and :ANSWER.
614 New classes are created by (send CLASS :NEW ...) messages. Their
615 Class Ptr will point to CLASS. By default, they will have OBJECT as
616 their superclass, but this can be overridden by the second optional
619 The above basic structure is set up by xlobj.c:xloinit().
623 How do we look up the value of a variable?
624 ------------------------------------------
626 When we're cruising along evaluating an expression and encounter a
627 symbol, the symbol might refer to a global variable, an instance
628 variable, or a class variable in any of our superclasses. Figuring
629 out which means digging throught the environment. The canonical place
630 this happens is in xleval.c:xleval(), which simply passes the buck to
631 xlsym.c:xlgetvalue(), which in turn passes the buck to
632 xlxsym.c:xlxgetvalue(), where the fun of scanning down xlenv begins.
633 The xlenv environment looks something like
635 Backbone Environment frame contents
636 -------- --------------------------
637 xlenv --> frame ((sym val) (sym val) (sym val) ... )
639 object (obj msg-class)
645 The "frame" lines are due to everyday nested constructs like LET
646 expressions, while the "object" lines represent an object environment
647 entered via a message send. xlxgetvalue scans the enviroment left to
648 right, and then top to bottom. It scans down the regular environment
649 frames itself, and calls xlobj.c:xlobjgetvalue() to search the object
652 xlobjgetvalue() first searches for the symbol in the msg-class, then
653 in all the successive superclasses of msg-class. In each class, it
654 first checks the list of instance-variable names in the IVARS slot,
655 then the list of class-variables name in the CVARS slot.
659 How are function calls implemented?
660 -----------------------------------
662 xleval.c contains the central expression-evaluation code.
663 xleval.c:xleval() is the standard top-level entrypoint. The two
664 central functions are xleval.c:xlevform() and xleval.c:evfun().
665 xlevform() can evaluate four kinds of expression nodes:
667 SUBR: A normal primitive fn coded in C. We call evpushargs() to
668 evaluate and push the arguments, then call the primitive.
670 FSUBR: A special primitive fn coded in C, which (like IF) wants its
671 arguments unevaluated. We call pushargs() (instead of evpushargs())
674 CLOSURE: A preprocessed written-in-lisp fn from a DEFUN or LAMBDA. We
675 call evpushargs() and then evfun().
677 CONS: We issue an error if it isn't a LAMBDA, otherwise we call
678 xleval.c:xlclose() to build a CLOSURE from the LAMBDA, and fall into
681 The common thread in all the above cases is that we call evpushargs()
682 or pushargs() to push all the arguments on the evaluation stack,
683 leaving the number and location of the arguments in the global
684 variables xlargc and xlargv. The primitive C functions consume
685 their arguments directly from the argument stack.
687 xleval.c:evfun() evaluates a CLOSURE by:
689 (1) Switching xlenv and xlfenv to the values they had when
690 the CLOSURE was built. (These values are recorded in the CLOSURE.)
692 (2) Binding the arguments to the environment. This involves scanning
693 through the section of the argument stack indicated by xlargc/xlargv,
694 using information from the CLOSURE to resolve keyword arguments
695 correctly and assign appropriate default values to optional arguments,
698 (3) Evaluating the body of the function via xleval.c:xleval().
700 (4) Cleaning up and restoring the original environment.
704 How are message-sends implemented?
705 ----------------------------------
707 We scan the MESSAGES list in the CLASS object of the recipient,
708 looking for a (message-symbol method) pair that matches our message
709 symbol. If necessary, we scan the MESSAGES lists of the recipients
710 superclasses too. (xlobj.c:sendmsg().) Once we find it, we basically
711 do a normal function evaluation. (xlobjl.c:evmethod().) Two oddities:
712 We need to replace the message-symbol by the recipient on the argument
713 stack to make things look normal, and we need to push an 'object'
714 stack entry on the xlenv environment so we remember which class is
715 handling the message.
719 How is garbage collection implemented?
720 --------------------------------------
722 The dynamic memory pool managed by xlisp consists of a chain of memory
723 *segments* rooted at global C variable "segs". Each segment contains
724 an array of "struct node"s plus a pointer to the next segment. Each
725 node contains a n_type field and a MARK bit, which is zero except
726 during garbage collection.
728 Xlisp uses a simple, classical mark-and-sweep garbage collector. When
729 it runs out of memory (fnodes==NIL), it does a recursive traversal
730 setting the MARK flag on all nodes reachable from the obarray, the
731 three environments xlenv/xlfenv/xldenv, and the evaluation and
732 argument stacks. (A "switch" on the n_type field tells us how to find
733 all the LVAL slots in the node (plus associated storage), and a
734 pointer-reversal trick lets us avoid using too much stack space during
735 the traversal.) sweep() then adds all un-MARKed LVALs to fnodes, and
736 clears the MARK bit on the remaining nodes. If this fails to produce
737 enough free nodes, a new segment is malloc()ed.
739 The code to do this stuff is mostly in xldmem.c.
742 How is garbage collection of vectors/strings implemented?
743 ---------------------------------------------------------
745 In the dldmem.c version, vector/string space is allocated from a
746 memory pool maintained by xlisp, rather than relying on the C libary
747 malloc() and free() functions. The pool is a linked list of VSEGMENT
748 with the root vsegments.
750 When no free memory of a size necessary for an allocation request is
751 available, a garbage collection is performed. If there is still no
752 available memory, then a new vector segment is allocated.
754 The garbage collection process compacts the memory in each vector
755 segment. This is an easy process because each allocated vector area
756 is pointed to by only one location (in an LVAL), and a backpointer is
757 maintained from the vector segment to the LVAL. Empty vector segments
758 are returned to the heap using free() if there greater than 50% of
759 the vector space is free. This is done to reduce thrashing while
760 making memory available for allocation to LVALs.
763 How does XLISP initialize?
764 --------------------------
766 A major confusing aspect of xlisp is how it initializes, and how
767 save/restore works. This is a multistep process that must take place
770 When xlisp starts, there is no obarray, and in fact no symbols at
771 all. All initial symbols must be created as part of initialization.
772 In addition the character and small integer cells must be created,
773 and all the C variables that point to xlisp symbols must be
776 Initialization is mostly performed in xlinit.c, from the function
777 xlinit(). This function is called from main() after main parses the
778 command line, calls osinit() to initialize the OS interface, and sets
779 the initial top level context. This initial context causes a fatal
780 error to occur if any error happens during the initialization
783 The first step of xlinit() is to call xlminit(), which is in xldmem.c
784 or dldmem.c. This initializes the pointers for the memory manager,
785 stacks, creates the small integer and character LVAL segments, and
786 creates the NIL symbol by filling in the statically allocated NIL
787 LVAL from one that is temporarily created. This first call of
788 xlmakesym will do the first garbage collection -- all of the roots
789 used for the mark routine have been set so that marking will not
790 occur (there is nothing to mark!). It is important, however, that
791 garbage collection not occur again until initialization is completed.
792 This can be assured by having the allocation segment size, NNODES, be
793 large enough for the entire initialization.
795 The second step in xlinit() is to call xldbug.c:xldinit() to turn
798 At this point, if a restore is to occur from a workspace file, then
799 the restore is attempted. If the restore is sucessful, then
800 initialization is finished. See "How does save/restore work?" which
801 is the next question.
803 If the restore fails or there is no file to restore, an initial
804 workspace must be created. This is done by function initwks(), also
807 Initwks() starts by calling four initialization functions.
809 xlsym.c:xlsinit() creates the symbol *OBARRAY* (and sets the
810 variable obarray to point to it), creates the object array as the
811 value, and enters obarray into the array.
813 xlsymbols() is called next. It enters all of the symbols referenced
814 by the interpreter into the obarray, and saves their addresses in C
815 variables. This function is also called during restores, so it is
816 important that it does not change the value of any symbols where the
817 value would be set by restore. If the unbound indicator symbol does
818 not exist, one is created. Then it puts NIL in the obarray if not
819 already there (NIL being already created), then all the other symbols
820 are added (if they don't exist), and their addresses saved in C
823 This function also initializes constants such as NIL, T, and PI.
824 Because a saved workspace might have a different file stream
825 environment, xlsymbols always initializes the standard file streams
826 based on the current xlisp invocation. It builds the structure
827 property for RANDOM-STATE structures. It then (shudder!) calls two
828 other initialization functions xlobj.c:obsymbols() and ossymbols (in
829 the appropriate *stuff.c file) to enter symbols used in the object
830 feature and operating system specific code, respectively.
832 Returning to initwks(), two aditional initialization routines are
833 called. Xlread.c:xlrinit() initializes the read-table and installs
834 the standard read macros. Xlobj.c:xloinit() creates the class and
837 Since the NIL and *OBARRAY* symbols were created before the unbound
838 marker symbol, initwks sets the function values of these symbols, and
839 of the unbound marker symbol, to be unbound. It then initializes all
840 the global variables. Finally it creates all the builtin function
841 bindings from the funtab table. The synonym functions are created
845 How are workspaces saved and restored?
846 --------------------------------------
848 All the work is done in dlimage.c or xlimage.c, depending on the
849 memory management scheme. The basic trick in a save is that memory
850 locations upon a restore would be entirely different. Because of
851 that, addresses written out are converted into offsets from the start
852 of the segs LVAL segment list. In the restore operation, the offset
853 is converted back into an address; if the offset is larger than the
854 allocated segment memory, additional segment memory is allocated
855 until an address can be calculated.
857 Looking at the save function, xlisave(char *fname), the argument
858 string is taken as the name of a workspace file, and a binary file is
859 created of that name. Then a garbage collection is performed since it
860 would be wasteful to write out garbage.
862 The size of ftab is writen as a validity check, figuring that if the
863 configuration changed, then this value would be different.
865 The offset of the *obarray* symbol is written next, since obarray is
866 crucial to doing a restore -- it is used to get the addresses of all
867 the other symbols used in the interpreter. Since NIL is a statically
868 allocated symbol, the offsets to its function, property list, and
869 printname are written. (I bet you didn't know you could define a
870 function called NIL!)
872 Now the segs are traversed and all nodes are written out. The nodes
873 are written in the format of a one byte tag followed by information
874 dependent on the node type. Since many locations in the segment can
875 be empty, one node type, FREE, has data that sets the next offset in
876 the file, thus allowing skipping of many locations with a single
877 command. The function setoffset(), called before writing tags in the
878 other cases, handles writing the FREE entry. CONS, USTREAM, COMPLEX,
879 and RATIO cells consist of two pointers, which are written after
880 conversion into offsets. For all other node types, the raw
881 information is written by writenode(). This could be optimized since
882 not all information is needed (for instance, the address of arrays
885 A terminator entry (FREE with length of zero) is written, and the
886 segs are traversed again, looking for nodes with attached
887 array/string data and streams. For the types where the attached data
888 is an array, the array elements (pointers) are converted to offsets
889 and written out. For a string, the characters are written. For a
890 stream (assuming FILETABLE is used), the file's name and position are
891 written if it is other than standard input, output, or error (which
892 cannot be saved or restored).
895 Restoring a workspace is somewhat more difficult. The file is opened
896 and checked for validity. Then the old memory image is deleted.
897 During the deletion, any open file streams other than standard input,
898 output, or error are closed. All of the global memory allocation
899 pointers and stacks are reset, just like in initialization. Since the
900 fixnum and character segments are of fixed size and need a known
901 location, there are allocated. Their values, however, will be filled
902 in from the workspace file. This is another wasteful inefficiency,
903 but at least these segments are small.
905 The global pointer obarray is read from the file. As mentioned
906 earlier, the cviptr() function converts the offset in the file into a
907 physical address, allocating more node segments as necessary. The
908 array portion of the NIL symbol is allocated, and its function,
909 property list, and printname pointers are read from the file.
911 Then the node information is read in. For type FREE, the offset is
912 adjusted. For CONS, USTREAM, COMPLEX, and (when bignums defined)
913 RATIO the car and cdr pointers are read, for other types, the LVAL
917 The LVAL segments are scanned, and now the vector/string components
918 of nodes are read. Since the order of nodes is unchanged, the data is
919 read into the correct nodes. For vector types, the size field of the
920 LVAL is consulted, vector space is allocated, and offsets are read
921 from the file and converted into pointers. For strings, the string
922 space is allocated and the string is read. For streams other than
923 standard input, output, or error, the file name and position is read
924 and an attempt is made to open the file. If the file can be opened,
925 then it is positioned.
927 During the scan, for SUBR/FSUBR nodes funtab is consulted and the
928 correct subroutine address is inserted.
930 During the whole process, if a tag is invalid or the file size is not
931 correct, a fatal error occurs.
933 A garbage collection is performed to initialize the free space for
934 future node allocation.
936 Finally xlsymbols() is called, as in the initializaton process, so
937 that the C pointers in the interpreter can be set. This also sets the
938 streams for standard input, output, and error to the correct values.
941 How do I add a new primitive fn to xlisp?
942 -----------------------------------------
944 Add a line to the end of xlftab.c:funtab[], and a function prototype
945 to xlftab.h. This table contains a list of triples:
947 The first element of each triple is the function name as it will
948 appear to the programmer. Make it all upper case.
950 The second element is S (for SUBR) if (like most fns) your function
951 wants its arguments pre-evaluated, else F (for FSUBR).
953 The third element is the name of the C function to call.
955 Remember that your arguments arrive on the xlisp argument rather
956 than via the usual C parameter mechanism.
958 CAUTION: Try to keep your files separate from generic xlisp files, and
959 to minimize the number of changes you make in the generic xlisp files.
960 This way, you'll have an easier time re-installing your changes when
961 new versions of xlisp come out. It's a good idea to put a marker
962 (like a comment with your initials) on each line you change or insert
963 in the generic xlisp fileset. For example, if you are going to add
964 many primitive functions to your xlisp, use an #include file rather
965 than putting them all in xlftab.c.
967 CAUTION: Remember that you usually need to protect the LVAL variables
968 in your function from the garbage-collector. It never hurts to do
969 this, and often produces obscure bugs if you dont. Generic code for
972 /* xlsamplefun - do useless stuff. */
973 /* Called like (samplefun '(a c b) 1 2.0) */
976 /* Variables to hold the arguments: */
977 LVAL list_arg, integer_arg, float_arg;
979 /* Get the arguments, with appropriate errors */
980 /* if any are of the wrong type. Look in */
981 /* xlisp.h for macros to read other types of */
982 /* arguments. Look in xlmath.c for examples */
983 /* of functions which can handle an argument */
984 /* which may be either int or float: */
985 list_arg = xlgalist() ; /* "XLisp Get A LIST" */
986 integer_arg = xlgafixnum(); /* "XLisp Get A FIXNUM" */
987 float_arg = xlgaflonum(); /* "XLisp Get A FLONUM" */
989 /* Issue an error message if there are any extra arguments: */
994 /* Call a separate C function to do the actual */
995 /* work. This way, the main function can */
996 /* be called from both xlisp code and C code. */
997 /* By convention, the name of the xlisp wrapper */
998 /* starts with "xl", and the native C function */
999 /* has the same name minus the "xl" prefix: */
1000 return samplefun( list_arg, integer_arg, float_arg );
1002 LVAL samplefun( list_arg, integer_arg, float_arg )
1003 LVAL list_arg, integer_arg, float_arg;
1005 FIXTYPE val_of_integer_arg;
1006 FLOTYPE val_of_float_arg;
1008 /* Variables which will point to LISP objects: */
1015 /* Protect our internal pointers by */
1016 /* pushing them on the evaluation */
1017 /* stack so the garbage collector */
1018 /* can't recycle them in the middle */
1019 /* of the routine: */
1020 xlstkcheck(3); /* Make sure following xlsave */
1021 /* calls won't overrun stack. */
1022 xlsave(list_ptr); /* Use xlsave1() if you don't */
1023 xlsave(float_ptr);/* do an xlstkcheck(). */
1026 /* Create an internal list structure, protected */
1027 /* against garbage collection until we exit fn: */
1028 list_ptr = cons(list_arg,list_arg);
1030 /* Get the actual values of our fixnum and flonum: */
1031 val_of_integer_arg = getfixnum( integer_arg );
1032 val_of_float_arg = getflonum( float_arg );
1035 /*******************************************/
1036 /* You can have any amount of intermediate */
1037 /* computations at this point in the fn... */
1038 /*******************************************/
1041 /* Make new numeric values to return: */
1042 integer_ptr = cvflonum( val_of_integer_arg * 3 );
1043 float_ptr = cvflonum( val_of_float_arg * 3.0 );
1045 /* Cons it all together to produce a return value: */
1057 /* Restore the stack, cancelling the xlsave()s: */
1058 xlpopn(3); /* Use xlpop() for a single argument.*/