1 .\" $NetBSD
: pxin1
.n
,v
1.2 1998/01/09 06:41:55 perry Exp $
3 .\" Copyright (c
) 1979 The Regents of the University of California
.
4 .\" All rights reserved
.
6 .\" Redistribution
and use
in source
and binary forms
, with or without
7 .\" modification
, are permitted provided that the following conditions
9 .\" 1. Redistributions of source code must retain the above copyright
10 .\" notice
, this list of conditions
and the following disclaimer
.
11 .\" 2. Redistributions
in binary form must reproduce the above copyright
12 .\" notice
, this list of conditions
and the following disclaimer
in the
13 .\" documentation
and/or other materials provided with the distribution
.
14 .\" 3. Neither the name of the University nor the names of its contributors
15 .\" may be used to endorse or promote products derived from
this software
16 .\" without specific prior written permission
.
18 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS
'' AND
19 .\" ANY EXPRESS OR IMPLIED WARRANTIES
, INCLUDING
, BUT NOT LIMITED TO
, THE
20 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21 .\" ARE DISCLAIMED
. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
22 .\" FOR ANY DIRECT
, INDIRECT
, INCIDENTAL
, SPECIAL
, EXEMPLARY
, OR CONSEQUENTIAL
23 .\" DAMAGES (INCLUDING
, BUT NOT LIMITED TO
, PROCUREMENT OF SUBSTITUTE GOODS
24 .\" OR SERVICES
; LOSS OF USE
, DATA
, OR PROFITS
; OR BUSINESS INTERRUPTION
)
25 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY
, WHETHER IN CONTRACT
, STRICT
26 .\" LIABILITY
, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE
) ARISING IN ANY WAY
27 .\" OUT OF THE USE OF THIS SOFTWARE
, EVEN IF ADVISED OF THE POSSIBILITY OF
30 .\" @
(#)pxin1.n 5.2 (Berkeley) 4/17/91
42 assembly language
, using the
48 are also written
in the
50 systems programming language C
.
52 consists of a main procedure that reads
in the interpreter code
,
53 a main interpreter loop that transfers successively to various
54 code segments implementing the
abstract machine operations
,
55 built
-in procedures
and functions
,
56 and several routines that support the implementation of the
57 Pascal input
-output environment
.
59 The interpreter runs at a fraction of the speed of equivalent
60 compiled C code
, with
this fraction varying from
1/5 to
1/15.
61 The interpreter occupies
18.5K bytes of instruction space
, shared among
62 all processes executing Pascal
, and has
4.6K bytes of data
space (constants
,
63 error messages
, etc
.) a copy of which is allocated to each executing process
.
65 Format of the object file
68 normally interprets the code left
in an object file by a run of the
71 The file
where the translator puts the object originally
, and the most
72 commonly interpreted file
, is called
74 In order that all persons
using
76 share a common text image
, this executable file is
77 a small process that coordinates with the interpreter to start
79 The interpreter code is placed
80 at the end of a special ``header
'' file
and the size of the initialized
81 data area of
this header file is expanded to include
this code
,
82 so that during execution it is located at an
83 easily determined address
in its data space
.
84 When executed
, the object process creates a
86 creates another process by doing a
88 and arranges that the resulting parent process becomes an instance of
90 The child process then writes the interpreter code through
91 the pipe that it has to the
92 interpreter process parent
.
93 When
this process is complete
, the child exits
.
95 The real advantage of
this approach is that it does not require modifications
96 to the shell
, and that the resultant objects are ``
true objects
'' not
97 requiring special treatment
.
98 A simpler mechanism would be to determine the name of the file that was
99 executed
and pass
this to the interpreter
.
100 However it is not possible to determine
this name
103 \
*(dd\ For instance
, if the
105 program is placed
in the directory
107 then when the user types
109 the first argument to the program, nominally the programs name, is
111 While it would be possible to search in the standard place,
112 i.e. the current directory, and the system directories
116 for a corresponding object file,
117 this would be expensive and not guaranteed to succeed.
118 Several shells exist that allow other directories to be searched
119 for commands, and there is,
121 no way to determine what these directories are.
124 General features of object code
126 Pascal object code is relocatable as all addressing references for
127 control transfers within the code are relative.
128 The code consists of instructions interspersed with inline data.
129 All instructions have a length that is an even number of bytes.
130 No variables are kept in the object code area.
132 The first byte of a Pascal interpreter instruction contains an operation
134 This allows a total of 256 major operation codes, and 232 of these are
135 in use in the current
137 The second byte of each interpreter instruction is called the
138 ``sub-operation code'',
141 It contains a small integer that may, for example, be used as a
142 block-structure level for the associated operation.
143 If the instruction can take a longword constant,
144 this constant is often packed into the sub-opcode
145 if it fits into 8 bits and is not zero.
146 A sub-opcode value of zero specifies that the constant would not
147 fit and therefore follows in the next word.
148 This is a space optimization, the value of zero for flagging
149 the longer case being convenient because it is easy to test.
151 Other instruction formats are used.
153 instructions take an offset in the following word,
154 operators that load constants onto the stack
155 take arbitrarily long inline constant values,
156 and many operations deal exclusively with data on the
157 interpreter stack, requiring no inline data.
159 Stack structure of the interpreter
161 The interpreter emulates a stack-structured Pascal machine.
162 The ``load'' instructions put values onto the stack, where all
163 arithmetic operations take place.
164 The ``store'' instructions take values off the stack
165 and place them in an address that is also contained on the stack.
166 The only way to move data or to compute in the machine is with the stack.
168 To make the interpreter operations more powerful
169 and to thereby increase the interpreter speed,
170 the arithmetic operations in the interpreter are ``typed''.
171 That is, length conversion of arithmetic values occurs when they are
172 used in an operation.
173 This eliminates interpreter cycles for length conversion
174 and the associated overhead.
175 For example, when adding an integer that fits in one byte to one that
176 requires four bytes to store, no ``conversion'' operators are required.
177 The one byte integer is loaded onto the stack, followed by the four
178 byte integer, and then an adding operator is used that has, implicit
179 in its definition, the sizes of the arguments.
181 Data types in the interpreter
183 The interpreter deals with several different fundamental data types.
184 In the memory of the machine, 1, 2, and 4 byte integers are supported,
185 with only 2 and 4 byte integers being present on the stack.
186 The interpreter always converts to 4 byte integers when there is a possibility
187 of overflowing the shorter formats.
188 This corresponds to the Pascal language definition of overflow in
189 arithmetic operations that requires that the result be correct
190 if all partial values lie within the bounds of the base integer type:
191 4 byte integer values.
193 Character constants are treated similarly to 1 byte integers for
194 most purposes, as are Boolean values.
195 All enumerated types are treated as integer values of
196 an appropriate length, usually 1 byte.
197 The interpreter also has real numbers, occupying 8 bytes of storage,
198 and sets and strings of varying length.
199 The appropriate operations are included for each data type, such as
200 set union and intersection and an operation to write a string.
204 data formats are supported by the interpreter.
205 The smallest unit of storage occupied by any variable is one byte.
210 thus degenerate to simple memory to memory transfers with
211 no special processing.
215 The interpreter runtime environment uses a stack data area and a heap
216 data area, that are kept at opposite ends of memory
217 and grow towards each other.
218 All global variables and variables local to procedures and functions
219 are kept in the stack area.
220 Dynamically allocated variables and buffers for input/output are
221 allocated in the heap.
223 The addressing of block structured variables is done by using
225 that contains the address of its stack frame
226 for each statically active block.\*(Dg
228 \*(dg\ Here ``block'' is being used to mean any
233 This display is referenced by instructions that load and store
234 variables and maintained by the operations for
235 block entry and exit, and for non-local
241 Three ``global'' variables in the interpreter, in addition to the
249 is a pointer to the display entry for the current block;
252 is the abstract machine location counter;
255 is a register that holds the address of the main interpreter
256 loop so that returning to the loop to fetch the next instruction is
259 The stack frame structure
262 has a stack frame consisting of three parts:
263 a block mark, local variables, and temporary storage for partially
264 evaluated expressions.
265 The stack in the interpreter grows from the high addresses in memory
266 to the low addresses,
267 so that those parts of the stack frame that are ``on the top''
268 of the stack have the most negative offsets from the display
270 The major parts of the stack frame are represented in Figure 1.1.
272 Note that the local variables of each block
273 have negative offsets from the corresponding display entry,
274 the ``first'' local variable having offset `\-2'.
278 The block mark contains the saved information necessary
279 to restore the environment
when the current block exits
.
280 It consists of two parts
.
281 The first
and top
-most part is saved by the
283 instruction
in the interpreter
.
284 This information is not present
for the main program
285 as it is never ``called
''.
286 The second part of the block mark is created by the
288 begin block operator that also allocates
and clears the
289 local variable storage
.
290 The format of these blocks is represented
in Figure
1.2.
294 The data saved by the
296 operator includes the line number
298 of the point of call
,
299 that is printed
if the program execution ends abnormally
;
302 giving the return address
;
303 and the current display entry address
309 begin operator saves the previous display contents at the level
310 of
this block
, so that the display can be restored on block exit
.
311 A pointer to the beginning line number
and the
312 name of
this block is also saved
.
313 This information is stored
in the interpreter object code
in-line after the
316 It is used
in printing a post
-mortem backtrace
.
317 The saved file name
and buffer reference are necessary because of
318 the input
/output structure
319 (this is discussed
in detail
in
320 sections
3.3 and 3.4).
321 The top of stack reference gives the value the stack pointer should
322 have
when there are no expression temporaries on the stack
.
323 It is used
for a consistency check
in the
325 line number operators
in the interpreter
, that occurs before
326 each statement executed
.
327 This helps to
catch bugs
in the interpreter
, that often manifest
328 themselves by leaving the stack non
-empty between statements
.
330 Note that there is no explicit
static link here
.
331 Thus to set up the display correctly after a non
-local
333 statement one must ``unwind
''
334 through all the block marks on the stack to rebuild the display
.
336 Arguments
and return values
338 A function returns its value into a space reserved by the calling
342 are placed on top of
this return area
.
347 calls
, arguments are placed at the end of the expression evaluation area
351 completes
, expression evaluation can continue
352 after popping the arguments to the
355 exactly
as if the function value had been ``loaded
''.
358 are also popped off the stack by the caller
359 after its execution ends
.
362 As a simple example consider the following stack structure
363 for a call to a function
365 of the form ``
f(a
)''.
376 the calling sequence
for this function would be
:
387 Here we use the operator
389 to clear space
for the return value
,
392 on the stack with a ``right value
'' operator
,
396 and can then complete evaluation of the containing expression
.
397 The operations used here will be explained
in section
2.
403 10 \
*bfunction
\fR
f(i
: integer
): real
;
410 would have code sequence
:
426 operator takes
9 bytes of inline data
.
427 The first
byte specifies the
428 length of the function name
.
429 The second longword specifies the
430 amount of local variable storage
, here none
.
431 The succeeding two lines give the line number of the
433 and the name of the block
437 operator places a name pointer
in the block mark
.
440 first takes an address of the
444 using the address of operator
447 The next operation
in the interpretation of
this function is the loading
451 is at the level of the
456 and the first variable
in the local variable area
.
459 completes by assigning the
4 byte integer on the stack to the
8 byte
460 return location
, hence the
462 assignment operator
, and then uses the
464 operator to exit the current block
.
466 The main interpreter loop
468 The main interpreter loop is simply
:
472 \fBcaseb
\fR
(lc
)+,$0
,$255
473 <table of opcode interpreter addresses
>
476 The main opcode is extracted from the first
byte of the instruction
477 and used to index into the table of opcode interpreter addresses
.
478 Control is then transferred to the specified location
.
479 The sub
-opcode may be used to index the display
,
481 or to specify one of several relational operators
.
482 In the cases
where a constant is needed
, but it
483 is not small enough to fit
in the
byte sub
-operator
,
484 a zero is placed there
and the constant follows
in the next word
.
485 Zero is easily tested
for,
486 as the instruction that fetches the
487 sub
-opcode sets the condition code flags
.
497 is all that is needed to effect
this packing of data
.
498 This technique saves space
in the Pascal
502 The address of the instruction at
504 is always contained
in the register variable
506 Thus a return to the main interpreter is simply
:
510 that is both quick
and occupies little space
.
514 Errors during interpretation fall into three classes
:
516 1) Interpreter detected errors
.
517 2) Hardware detected errors
.
521 Interpreter detected errors include I
/O errors
and
522 built
-in function errors
.
523 These errors cause a subroutine call to an error routine
524 with a single parameter indicating the cause of the error
.
525 Hardware errors such
as range errors
and overflows are
526 fielded by a special routine that determines the opcode
527 that caused the error
.
528 It then calls the error routine with an appropriate error
530 External events include interrupts
and system limits such
532 They generate a call to the error routine with an
533 appropriate error code
.
534 The error routine processes the error condition
,
535 printing an appropriate error message
and usually
536 a backtrace from the point of the error
.