No empty .Rs/.Re
[netbsd-mini2440.git] / share / doc / papers / px / pxin1.n
blob2b18a63d679badb18c9151130ee5af218cb225a9
1 .\" $NetBSD: pxin1.n,v 1.2 1998/01/09 06:41:55 perry Exp $
2 .\"
3 .\" Copyright (c) 1979 The Regents of the University of California.
4 .\" All rights reserved.
5 .\"
6 .\" Redistribution and use in source and binary forms, with or without
7 .\" modification, are permitted provided that the following conditions
8 .\" are met:
9 .\" 1. Redistributions of source code must retain the above copyright
10 .\" notice, this list of conditions and the following disclaimer.
11 .\" 2. Redistributions in binary form must reproduce the above copyright
12 .\" notice, this list of conditions and the following disclaimer in the
13 .\" documentation and/or other materials provided with the distribution.
14 .\" 3. Neither the name of the University nor the names of its contributors
15 .\" may be used to endorse or promote products derived from this software
16 .\" without specific prior written permission.
17 .\"
18 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
19 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21 .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
22 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
24 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
25 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
26 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
27 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
28 .\" SUCH DAMAGE.
29 .\"
30 .\" @(#)pxin1.n 5.2 (Berkeley) 4/17/91
31 .\"
32 .if !\n(xx .so tmac.p
33 .tr _\(ru
34 .nr H1 0
35 .NH
36 Organization
37 .PP
38 Most of
39 .I px
40 is written in the
41 .SM "VAX 11/780"
42 assembly language, using the
43 .UX
44 assembler
45 .I as.
46 Portions of
47 .I px
48 are also written in the
49 .UX
50 systems programming language C.
51 .I Px
52 consists of a main procedure that reads in the interpreter code,
53 a main interpreter loop that transfers successively to various
54 code segments implementing the abstract machine operations,
55 built-in procedures and functions,
56 and several routines that support the implementation of the
57 Pascal input-output environment.
58 .PP
59 The interpreter runs at a fraction of the speed of equivalent
60 compiled C code, with this fraction varying from 1/5 to 1/15.
61 The interpreter occupies 18.5K bytes of instruction space, shared among
62 all processes executing Pascal, and has 4.6K bytes of data space (constants,
63 error messages, etc.) a copy of which is allocated to each executing process.
64 .NH 2
65 Format of the object file
66 .PP
67 .I Px
68 normally interprets the code left in an object file by a run of the
69 Pascal translator
70 .I pi.
71 The file where the translator puts the object originally, and the most
72 commonly interpreted file, is called
73 .I obj.
74 In order that all persons using
75 .I px
76 share a common text image, this executable file is
77 a small process that coordinates with the interpreter to start
78 execution.
79 The interpreter code is placed
80 at the end of a special ``header'' file and the size of the initialized
81 data area of this header file is expanded to include this code,
82 so that during execution it is located at an
83 easily determined address in its data space.
84 When executed, the object process creates a
85 .I pipe ,
86 creates another process by doing a
87 .I fork ,
88 and arranges that the resulting parent process becomes an instance of
89 .I px .
90 The child process then writes the interpreter code through
91 the pipe that it has to the
92 interpreter process parent.
93 When this process is complete, the child exits.
94 .PP
95 The real advantage of this approach is that it does not require modifications
96 to the shell, and that the resultant objects are ``true objects'' not
97 requiring special treatment.
98 A simpler mechanism would be to determine the name of the file that was
99 executed and pass this to the interpreter.
100 However it is not possible to determine this name
101 in all cases.\*(Dd
103 \*(dd\ For instance, if the
104 .I pxref
105 program is placed in the directory
106 `/usr/bin'
107 then when the user types
108 ``pxref program.p''
109 the first argument to the program, nominally the programs name, is
110 ``pxref.''
111 While it would be possible to search in the standard place,
112 i.e. the current directory, and the system directories
113 `/bin'
115 `/usr/bin'
116 for a corresponding object file,
117 this would be expensive and not guaranteed to succeed.
118 Several shells exist that allow other directories to be searched
119 for commands, and there is,
120 in general,
121 no way to determine what these directories are.
123 .NH 2
124 General features of object code
126 Pascal object code is relocatable as all addressing references for
127 control transfers within the code are relative.
128 The code consists of instructions interspersed with inline data.
129 All instructions have a length that is an even number of bytes.
130 No variables are kept in the object code area.
132 The first byte of a Pascal interpreter instruction contains an operation
133 code.
134 This allows a total of 256 major operation codes, and 232 of these are
135 in use in the current
136 .I px.
137 The second byte of each interpreter instruction is called the
138 ``sub-operation code'',
139 or more commonly the
140 .I sub-opcode.
141 It contains a small integer that may, for example, be used as a
142 block-structure level for the associated operation.
143 If the instruction can take a longword constant,
144 this constant is often packed into the sub-opcode
145 if it fits into 8 bits and is not zero.
146 A sub-opcode value of zero specifies that the constant would not
147 fit and therefore follows in the next word.
148 This is a space optimization, the value of zero for flagging
149 the longer case being convenient because it is easy to test.
151 Other instruction formats are used.
152 The branching
153 instructions take an offset in the following word,
154 operators that load constants onto the stack
155 take arbitrarily long inline constant values,
156 and many operations deal exclusively with data on the
157 interpreter stack, requiring no inline data.
158 .NH 2
159 Stack structure of the interpreter
161 The interpreter emulates a stack-structured Pascal machine.
162 The ``load'' instructions put values onto the stack, where all
163 arithmetic operations take place.
164 The ``store'' instructions take values off the stack
165 and place them in an address that is also contained on the stack.
166 The only way to move data or to compute in the machine is with the stack.
168 To make the interpreter operations more powerful
169 and to thereby increase the interpreter speed,
170 the arithmetic operations in the interpreter are ``typed''.
171 That is, length conversion of arithmetic values occurs when they are
172 used in an operation.
173 This eliminates interpreter cycles for length conversion
174 and the associated overhead.
175 For example, when adding an integer that fits in one byte to one that
176 requires four bytes to store, no ``conversion'' operators are required.
177 The one byte integer is loaded onto the stack, followed by the four
178 byte integer, and then an adding operator is used that has, implicit
179 in its definition, the sizes of the arguments.
180 .NH 2
181 Data types in the interpreter
183 The interpreter deals with several different fundamental data types.
184 In the memory of the machine, 1, 2, and 4 byte integers are supported,
185 with only 2 and 4 byte integers being present on the stack.
186 The interpreter always converts to 4 byte integers when there is a possibility
187 of overflowing the shorter formats.
188 This corresponds to the Pascal language definition of overflow in
189 arithmetic operations that requires that the result be correct
190 if all partial values lie within the bounds of the base integer type:
191 4 byte integer values.
193 Character constants are treated similarly to 1 byte integers for
194 most purposes, as are Boolean values.
195 All enumerated types are treated as integer values of
196 an appropriate length, usually 1 byte.
197 The interpreter also has real numbers, occupying 8 bytes of storage,
198 and sets and strings of varying length.
199 The appropriate operations are included for each data type, such as
200 set union and intersection and an operation to write a string.
202 No special
203 .B packed
204 data formats are supported by the interpreter.
205 The smallest unit of storage occupied by any variable is one byte.
206 The built-ins
207 .I pack
209 .I unpack
210 thus degenerate to simple memory to memory transfers with
211 no special processing.
212 .NH 2
213 Runtime environment
215 The interpreter runtime environment uses a stack data area and a heap
216 data area, that are kept at opposite ends of memory
217 and grow towards each other.
218 All global variables and variables local to procedures and functions
219 are kept in the stack area.
220 Dynamically allocated variables and buffers for input/output are
221 allocated in the heap.
223 The addressing of block structured variables is done by using
224 a fixed display
225 that contains the address of its stack frame
226 for each statically active block.\*(Dg
228 \*(dg\ Here ``block'' is being used to mean any
229 .I procedure ,
230 .I function
231 or the main program.
233 This display is referenced by instructions that load and store
234 variables and maintained by the operations for
235 block entry and exit, and for non-local
236 .B goto
237 statements.
238 .NH 2
239 Dp, lc, loop
241 Three ``global'' variables in the interpreter, in addition to the
242 ``display'', are the
243 .I dp,
244 .I lc,
245 and the
246 .I loop.
248 .I dp
249 is a pointer to the display entry for the current block;
251 .I lc
252 is the abstract machine location counter;
253 and the
254 .I loop
255 is a register that holds the address of the main interpreter
256 loop so that returning to the loop to fetch the next instruction is
257 a fast operation.
258 .NH 2
259 The stack frame structure
261 Each active block
262 has a stack frame consisting of three parts:
263 a block mark, local variables, and temporary storage for partially
264 evaluated expressions.
265 The stack in the interpreter grows from the high addresses in memory
266 to the low addresses,
267 so that those parts of the stack frame that are ``on the top''
268 of the stack have the most negative offsets from the display
269 entry for the block.
270 The major parts of the stack frame are represented in Figure 1.1.
271 .so fig1.1.n
272 Note that the local variables of each block
273 have negative offsets from the corresponding display entry,
274 the ``first'' local variable having offset `\-2'.
275 .NH 2
276 The block mark
278 The block mark contains the saved information necessary
279 to restore the environment when the current block exits.
280 It consists of two parts.
281 The first and top-most part is saved by the
282 .SM CALL
283 instruction in the interpreter.
284 This information is not present for the main program
285 as it is never ``called''.
286 The second part of the block mark is created by the
287 .SM BEG
288 begin block operator that also allocates and clears the
289 local variable storage.
290 The format of these blocks is represented in Figure 1.2.
292 .so fig1.2.n
294 The data saved by the
295 .SM CALL
296 operator includes the line number
297 .I lino
298 of the point of call,
299 that is printed if the program execution ends abnormally;
300 the location counter
301 .I lc
302 giving the return address;
303 and the current display entry address
304 .I dp
305 at the time of call.
308 .SM BEG
309 begin operator saves the previous display contents at the level
310 of this block, so that the display can be restored on block exit.
311 A pointer to the beginning line number and the
312 name of this block is also saved.
313 This information is stored in the interpreter object code in-line after the
314 .SM BEG
315 operator.
316 It is used in printing a post-mortem backtrace.
317 The saved file name and buffer reference are necessary because of
318 the input/output structure
319 (this is discussed in detail in
320 sections 3.3 and 3.4).
321 The top of stack reference gives the value the stack pointer should
322 have when there are no expression temporaries on the stack.
323 It is used for a consistency check in the
324 .SM LINO
325 line number operators in the interpreter, that occurs before
326 each statement executed.
327 This helps to catch bugs in the interpreter, that often manifest
328 themselves by leaving the stack non-empty between statements.
330 Note that there is no explicit static link here.
331 Thus to set up the display correctly after a non-local
332 .B goto
333 statement one must ``unwind''
334 through all the block marks on the stack to rebuild the display.
335 .NH 2
336 Arguments and return values
338 A function returns its value into a space reserved by the calling
339 block.
340 Arguments to a
341 .B function
342 are placed on top of this return area.
343 For both
344 .B procedure
346 .B function
347 calls, arguments are placed at the end of the expression evaluation area
348 of the caller.
349 When a
350 .B function
351 completes, expression evaluation can continue
352 after popping the arguments to the
353 .B function
354 off the stack,
355 exactly as if the function value had been ``loaded''.
356 The arguments to a
357 .B procedure
358 are also popped off the stack by the caller
359 after its execution ends.
362 As a simple example consider the following stack structure
363 for a call to a function
364 .I f,
365 of the form ``f(a)''.
366 .so fig1.3.n
369 If we suppose that
370 .I f
371 returns a
372 .I real
373 and that
374 .I a
375 is an integer,
376 the calling sequence for this function would be:
379 lp-2w(8) l.
380 PUSH \-8
381 RV4:\fIl a\fR
382 CALL:\fIl f\fR
383 POP 4
387 Here we use the operator
388 .SM PUSH
389 to clear space for the return value,
390 load
391 .I a
392 on the stack with a ``right value'' operator,
393 call the function,
394 pop off the argument
395 .I a ,
396 and can then complete evaluation of the containing expression.
397 The operations used here will be explained in section 2.
399 If the function
400 .I f
401 were given by
403 10 \*bfunction\fR f(i: integer): real;
404 11 \*bbegin\fR
405 12 f := i
406 13 \*bend\fR;
408 then
409 .I f
410 would have code sequence:
413 lp-2w(8) l.
414 BEG:2 0
417 LV:\fIl\fR 40
418 RV4:\fIl\fR 32
419 AS48
424 Here the
425 .SM BEG
426 operator takes 9 bytes of inline data.
427 The first byte specifies the
428 length of the function name.
429 The second longword specifies the
430 amount of local variable storage, here none.
431 The succeeding two lines give the line number of the
432 .B begin
433 and the name of the block
434 for error traceback.
436 .SM BEG
437 operator places a name pointer in the block mark.
438 The body of the
439 .B function
440 first takes an address of the
441 .B function
442 result variable
443 .I f
444 using the address of operator
445 .SM LV
446 .I a .
447 The next operation in the interpretation of this function is the loading
448 of the value of
449 .I i .
450 .I I
451 is at the level of the
452 .B function
453 .I f ,
454 here symbolically
455 .I l,
456 and the first variable in the local variable area.
458 .B function
459 completes by assigning the 4 byte integer on the stack to the 8 byte
460 return location, hence the
461 .SM AS48
462 assignment operator, and then uses the
463 .SM END
464 operator to exit the current block.
465 .NH 2
466 The main interpreter loop
468 The main interpreter loop is simply:
471 iloop:
472 \fBcaseb\fR (lc)+,$0,$255
473 <table of opcode interpreter addresses>
476 The main opcode is extracted from the first byte of the instruction
477 and used to index into the table of opcode interpreter addresses.
478 Control is then transferred to the specified location.
479 The sub-opcode may be used to index the display,
480 as a small constant,
481 or to specify one of several relational operators.
482 In the cases where a constant is needed, but it
483 is not small enough to fit in the byte sub-operator,
484 a zero is placed there and the constant follows in the next word.
485 Zero is easily tested for,
486 as the instruction that fetches the
487 sub-opcode sets the condition code flags.
488 A construction like:
491 _OPER:
492 \fBcvtbl\fR (lc)+,r0
493 \fBbneq\fR L1
494 \fBcvtwl\fR (lc)+,r0
495 L1: ...
497 is all that is needed to effect this packing of data.
498 This technique saves space in the Pascal
499 .I obj
500 object code.
502 The address of the instruction at
503 .I iloop
504 is always contained in the register variable
505 .I loop .
506 Thus a return to the main interpreter is simply:
508 \fBjmp\fR (loop)
510 that is both quick and occupies little space.
511 .NH 2
512 Errors
514 Errors during interpretation fall into three classes:
516 1) Interpreter detected errors.
517 2) Hardware detected errors.
518 3) External events.
521 Interpreter detected errors include I/O errors and
522 built-in function errors.
523 These errors cause a subroutine call to an error routine
524 with a single parameter indicating the cause of the error.
525 Hardware errors such as range errors and overflows are
526 fielded by a special routine that determines the opcode
527 that caused the error.
528 It then calls the error routine with an appropriate error
529 parameter.
530 External events include interrupts and system limits such
531 as available memory.
532 They generate a call to the error routine with an
533 appropriate error code.
534 The error routine processes the error condition,
535 printing an appropriate error message and usually
536 a backtrace from the point of the error.