share/doc/papers/px/pxin1.n

   1 .\"     $NetBSD: pxin1.n,v 1.2 1998/01/09 06:41:55 perry Exp $
   2 .\"
   3 .\" Copyright (c) 1979 The Regents of the University of California.
   4 .\" All rights reserved.
   5 .\"
   6 .\" Redistribution and use in source and binary forms, with or without
   7 .\" modification, are permitted provided that the following conditions
   8 .\" are met:
   9 .\" 1. Redistributions of source code must retain the above copyright
  10 .\"    notice, this list of conditions and the following disclaimer.
  11 .\" 2. Redistributions in binary form must reproduce the above copyright
  12 .\"    notice, this list of conditions and the following disclaimer in the
  13 .\"    documentation and/or other materials provided with the distribution.
  14 .\" 3. Neither the name of the University nor the names of its contributors
  15 .\"    may be used to endorse or promote products derived from this software
  16 .\"    without specific prior written permission.
  17 .\"
  18 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  19 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  20 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  21 .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  22 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  23 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  24 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  25 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  26 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  27 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  28 .\" SUCH DAMAGE.
  29 .\"
  30 .\"     @(#)pxin1.n     5.2 (Berkeley) 4/17/91
  31 .\"
  32 .if !\n(xx .so tmac.p
  33 .tr _\(ru
  34 .nr H1 0
  35 .NH
  36 Organization
  37 .PP
  38 Most of
  39 .I px
  40 is written in the
  41 .SM "VAX 11/780"
  42 assembly language, using the
  43 .UX
  44 assembler
  45 .I as.
  46 Portions of
  47 .I px
  48 are also written in the
  49 .UX
  50 systems programming language C.
  51 .I Px
  52 consists of a main procedure that reads in the interpreter code,
  53 a main interpreter loop that transfers successively to various
  54 code segments implementing the abstract machine operations,
  55 built-in procedures and functions,
  56 and several routines that support the implementation of the
  57 Pascal input-output environment.
  58 .PP
  59 The interpreter runs at a fraction of the speed of equivalent
  60 compiled C code, with this fraction varying from 1/5 to 1/15.
  61 The interpreter occupies 18.5K bytes of instruction space, shared among
  62 all processes executing Pascal, and has 4.6K bytes of data space (constants,
  63 error messages, etc.) a copy of which is allocated to each executing process.
  64 .NH 2
  65 Format of the object file
  66 .PP
  67 .I Px
  68 normally interprets the code left in an object file by a run of the
  69 Pascal translator
  70 .I pi.
  71 The file where the translator puts the object originally, and the most
  72 commonly interpreted file, is called
  73 .I obj.
  74 In order that all persons using
  75 .I px
  76 share a common text image, this executable file is
  77 a small process that coordinates with the interpreter to start
  78 execution.
  79 The interpreter code is placed
  80 at the end of a special ``header'' file and the size of the initialized
  81 data area of this header file is expanded to include this code,
  82 so that during execution it is located at an
  83 easily determined address in its data space.
  84 When executed, the object process creates a
  85 .I pipe ,
  86 creates another process by doing a
  87 .I fork ,
  88 and arranges that the resulting parent process becomes an instance of
  89 .I px .
  90 The child process then writes the interpreter code through
  91 the pipe that it has to the
  92 interpreter process parent.
  93 When this process is complete, the child exits.
  94 .PP
  95 The real advantage of this approach is that it does not require modifications
  96 to the shell, and that the resultant objects are ``true objects'' not
  97 requiring special treatment.
  98 A simpler mechanism would be to determine the name of the file that was
  99 executed and pass this to the interpreter.
 100 However it is not possible to determine this name
 101 in all cases.\*(Dd
 102 .FS
 103 \*(dd\ For instance, if the
 104 .I pxref
 105 program is placed in the directory
 106 `/usr/bin'
 107 then when the user types
 108 ``pxref program.p''
 109 the first argument to the program, nominally the programs name, is
 110 ``pxref.''
 111 While it would be possible to search in the standard place,
 112 i.e. the current directory, and the system directories
 113 `/bin'
 114 and
 115 `/usr/bin'
 116 for a corresponding object file,
 117 this would be expensive and not guaranteed to succeed.
 118 Several shells exist that allow other directories to be searched
 119 for commands, and there is,
 120 in general,
 121 no way to determine what these directories are.
 122 .FE
 123 .NH 2
 124 General features of object code
 125 .PP
 126 Pascal object code is relocatable as all addressing references for
 127 control transfers within the code are relative.
 128 The code consists of instructions interspersed with inline data.
 129 All instructions have a length that is an even number of bytes.
 130 No variables are kept in the object code area.
 131 .PP
 132 The first byte of a Pascal interpreter instruction contains an operation
 133 code.
 134 This allows a total of 256 major operation codes, and 232 of these are
 135 in use in the current
 136 .I px.
 137 The second byte of each interpreter instruction is called the
 138 ``sub-operation code'',
 139 or more commonly the
 140 .I sub-opcode.
 141 It contains a small integer that may, for example, be used as a
 142 block-structure level for the associated operation.
 143 If the instruction can take a longword constant,
 144 this constant is often packed into the sub-opcode
 145 if it fits into 8 bits and is not zero.
 146 A sub-opcode value of zero specifies that the constant would not
 147 fit and therefore follows in the next word.
 148 This is a space optimization, the value of zero for flagging
 149 the longer case being convenient because it is easy to test.
 150 .PP
 151 Other instruction formats are used.
 152 The branching
 153 instructions take an offset in the following word,
 154 operators that load constants onto the stack
 155 take arbitrarily long inline constant values,
 156 and many operations deal exclusively with data on the
 157 interpreter stack, requiring no inline data.
 158 .NH 2
 159 Stack structure of the interpreter
 160 .PP
 161 The interpreter emulates a stack-structured Pascal machine.
 162 The ``load'' instructions put values onto the stack, where all
 163 arithmetic operations take place.
 164 The ``store'' instructions take values off the stack
 165 and place them in an address that is also contained on the stack.
 166 The only way to move data or to compute in the machine is with the stack.
 167 .PP
 168 To make the interpreter operations more powerful
 169 and to thereby increase the interpreter speed,
 170 the arithmetic operations in the interpreter are ``typed''.
 171 That is, length conversion of arithmetic values occurs when they are
 172 used in an operation.
 173 This eliminates interpreter cycles for length conversion
 174 and the associated overhead.
 175 For example, when adding an integer that fits in one byte to one that
 176 requires four bytes to store, no ``conversion'' operators are required.
 177 The one byte integer is loaded onto the stack, followed by the four
 178 byte integer, and then an adding operator is used that has, implicit
 179 in its definition, the sizes of the arguments.
 180 .NH 2
 181 Data types in the interpreter
 182 .PP
 183 The interpreter deals with several different fundamental data types.
 184 In the memory of the machine, 1, 2, and 4 byte integers are supported,
 185 with only 2 and 4 byte integers being present on the stack.
 186 The interpreter always converts to 4 byte integers when there is a possibility
 187 of overflowing the shorter formats.
 188 This corresponds to the Pascal language definition of overflow in
 189 arithmetic operations that requires that the result be correct
 190 if all partial values lie within the bounds of the base integer type:
 191 4 byte integer values.
 192 .PP
 193 Character constants are treated similarly to 1 byte integers for
 194 most purposes, as are Boolean values.
 195 All enumerated types are treated as integer values of
 196 an appropriate length, usually 1 byte.
 197 The interpreter also has real numbers, occupying 8 bytes of storage,
 198 and sets and strings of varying length.
 199 The appropriate operations are included for each data type, such as
 200 set union and intersection and an operation to write a string.
 201 .PP
 202 No special
 203 .B packed
 204 data formats are supported by the interpreter.
 205 The smallest unit of storage occupied by any variable is one byte.
 206 The built-ins
 207 .I pack
 208 and
 209 .I unpack
 210 thus degenerate to simple memory to memory transfers with
 211 no special processing.
 212 .NH 2
 213 Runtime environment
 214 .PP
 215 The interpreter runtime environment uses a stack data area and a heap
 216 data area, that are kept at opposite ends of memory
 217 and grow towards each other.
 218 All global variables and variables local to procedures and functions
 219 are kept in the stack area.
 220 Dynamically allocated variables and buffers for input/output are
 221 allocated in the heap.
 222 .PP
 223 The addressing of block structured variables is done by using
 224 a fixed display
 225 that contains the address of its stack frame
 226 for each statically active block.\*(Dg
 227 .FS
 228 \*(dg\ Here ``block'' is being used to mean any
 229 .I procedure ,
 230 .I function
 231 or the main program.
 232 .FE
 233 This display is referenced by instructions that load and store
 234 variables and maintained by the operations for
 235 block entry and exit, and for non-local
 236 .B goto
 237 statements.
 238 .NH 2
 239 Dp, lc, loop
 240 .PP
 241 Three ``global'' variables in the interpreter, in addition to the
 242 ``display'', are the
 243 .I dp,
 244 .I lc,
 245 and the
 246 .I loop.
 247 The
 248 .I dp
 249 is a pointer to the display entry for the current block;
 250 the
 251 .I lc
 252 is the abstract machine location counter;
 253 and the
 254 .I loop
 255 is a register that holds the address of the main interpreter
 256 loop so that returning to the loop to fetch the next instruction is
 257 a fast operation.
 258 .NH 2
 259 The stack frame structure
 260 .PP
 261 Each active block
 262 has a stack frame consisting of three parts:
 263 a block mark, local variables, and temporary storage for partially
 264 evaluated expressions.
 265 The stack in the interpreter grows from the high addresses in memory
 266 to the low addresses,
 267 so that those parts of the stack frame that are ``on the top''
 268 of the stack have the most negative offsets from the display
 269 entry for the block.
 270 The major parts of the stack frame are represented in Figure 1.1.
 271 .so fig1.1.n
 272 Note that the local variables of each block
 273 have negative offsets from the corresponding display entry,
 274 the ``first'' local variable having offset `\-2'.
 275 .NH 2
 276 The block mark
 277 .PP
 278 The block mark contains the saved information necessary
 279 to restore the environment when the current block exits.
 280 It consists of two parts.
 281 The first and top-most part is saved by the
 282 .SM CALL
 283 instruction in the interpreter.
 284 This information is not present for the main program
 285 as it is never ``called''.
 286 The second part of the block mark is created by the
 287 .SM BEG
 288 begin block operator that also allocates and clears the
 289 local variable storage.
 290 The format of these blocks is represented in Figure 1.2.
 291 .sp
 292 .so fig1.2.n
 293 .PP
 294 The data saved by the
 295 .SM CALL
 296 operator includes the line number
 297 .I lino
 298 of the point of call,
 299 that is printed if the program execution ends abnormally;
 300 the location counter
 301 .I lc
 302 giving the return address;
 303 and the current display entry address
 304 .I dp
 305 at the time of call.
 306 .PP
 307 The
 308 .SM BEG
 309 begin operator saves the previous display contents at the level
 310 of this block, so that the display can be restored on block exit.
 311 A pointer to the beginning line number and the
 312 name of this block is also saved.
 313 This information is stored in the interpreter object code in-line after the
 314 .SM BEG
 315 operator.
 316 It is used in printing a post-mortem backtrace.
 317 The saved file name and buffer reference are necessary because of
 318 the input/output structure
 319 (this is discussed in detail in
 320 sections 3.3 and 3.4).
 321 The top of stack reference gives the value the stack pointer should
 322 have when there are no expression temporaries on the stack.
 323 It is used for a consistency check in the
 324 .SM LINO
 325 line number operators in the interpreter, that occurs before
 326 each statement executed.
 327 This helps to catch bugs in the interpreter, that often manifest
 328 themselves by leaving the stack non-empty between statements.
 329 .PP
 330 Note that there is no explicit static link here.
 331 Thus to set up the display correctly after a non-local
 332 .B goto
 333 statement one must ``unwind''
 334 through all the block marks on the stack to rebuild the display.
 335 .NH 2
 336 Arguments and return values
 337 .PP
 338 A function returns its value into a space reserved by the calling
 339 block.
 340 Arguments to a
 341 .B function
 342 are placed on top of this return area.
 343 For both
 344 .B procedure
 345 and
 346 .B function
 347 calls, arguments are placed at the end of the expression evaluation area
 348 of the caller.
 349 When a
 350 .B function
 351 completes, expression evaluation can continue
 352 after popping the arguments to the
 353 .B function
 354 off the stack,
 355 exactly as if the function value had been ``loaded''.
 356 The arguments to a
 357 .B procedure
 358 are also popped off the stack by the caller
 359 after its execution ends.
 360 .KS
 361 .PP
 362 As a simple example consider the following stack structure
 363 for a call to a function
 364 .I f,
 365 of the form ``f(a)''.
 366 .so fig1.3.n
 367 .KE
 368 .PP
 369 If we suppose that
 370 .I f
 371 returns a
 372 .I real
 373 and that
 374 .I a
 375 is an integer,
 376 the calling sequence for this function would be:
 377 .DS
 378 .TS
 379 lp-2w(8) l.
 380 PUSH    \-8
 381 RV4:\fIl        a\fR
 382 CALL:\fIl       f\fR
 383 POP     4
 384 .TE
 385 .DE
 386 .ZP
 387 Here we use the operator
 388 .SM PUSH
 389 to clear space for the return value,
 390 load
 391 .I a
 392 on the stack with a ``right value'' operator,
 393 call the function,
 394 pop off the argument
 395 .I a ,
 396 and can then complete evaluation of the containing expression.
 397 The operations used here will be explained in section 2.
 398 .PP
 399 If the function
 400 .I f
 401 were given by
 402 .LS
 403  10 \*bfunction\fR f(i: integer): real;
 404  11 \*bbegin\fR
 405  12     f := i
 406  13 \*bend\fR;
 407 .LE
 408 then
 409 .I f
 410 would have code sequence:
 411 .DS
 412 .TS
 413 lp-2w(8) l.
 414 BEG:2   0
 415         11
 416         "f"
 417 LV:\fIl\fR      40
 418 RV4:\fIl\fR     32
 419 AS48
 420 END
 421 .TE
 422 .DE
 423 .ZP
 424 Here the
 425 .SM BEG
 426 operator takes 9 bytes of inline data.
 427 The first byte specifies the
 428 length of the function name.
 429 The second longword specifies the
 430 amount of local variable storage, here none.
 431 The succeeding two lines give the line number of the
 432 .B begin
 433 and the name of the block
 434 for error traceback.
 435 The
 436 .SM BEG
 437 operator places a name pointer in the block mark.
 438 The body of the
 439 .B function
 440 first takes an address of the
 441 .B function
 442 result variable
 443 .I f
 444 using the address of operator
 445 .SM LV
 446 .I a .
 447 The next operation in the interpretation of this function is the loading
 448 of the value of
 449 .I i .
 450 .I I
 451 is at the level of the
 452 .B function
 453 .I f ,
 454 here symbolically
 455 .I l,
 456 and the first variable in the local variable area.
 457 The
 458 .B function
 459 completes by assigning the 4 byte integer on the stack to the 8 byte
 460 return location, hence the
 461 .SM AS48
 462 assignment operator, and then uses the
 463 .SM END
 464 operator to exit the current block.
 465 .NH 2
 466 The main interpreter loop
 467 .PP
 468 The main interpreter loop is simply:
 469 .DS
 470 .mD
 471 iloop:
 472         \fBcaseb\fR     (lc)+,$0,$255
 473         <table of opcode interpreter addresses>
 474 .DE
 475 .ZP
 476 The main opcode is extracted from the first byte of the instruction
 477 and used to index into the table of opcode interpreter addresses.
 478 Control is then transferred to the specified location.
 479 The sub-opcode may be used to index the display,
 480 as a small constant,
 481 or to specify one of several relational operators.
 482 In the cases where a constant is needed, but it
 483 is not small enough to fit in the byte sub-operator,
 484 a zero is placed there and the constant follows in the next word.
 485 Zero is easily tested for,
 486 as the instruction that fetches the
 487 sub-opcode sets the condition code flags.
 488 A construction like:
 489 .DS
 490 .mD
 491 _OPER:
 492         \fBcvtbl\fR     (lc)+,r0
 493         \fBbneq\fR      L1
 494         \fBcvtwl\fR     (lc)+,r0
 495 L1:     ...
 496 .DE
 497 is all that is needed to effect this packing of data.
 498 This technique saves space in the Pascal
 499 .I obj
 500 object code.
 501 .PP
 502 The address of the instruction at
 503 .I iloop
 504 is always contained in the register variable
 505 .I loop .
 506 Thus a return to the main interpreter is simply:
 507 .DS
 508         \fBjmp\fR       (loop)
 509 .DE
 510 that is both quick and occupies little space.
 511 .NH 2
 512 Errors
 513 .PP
 514 Errors during interpretation fall into three classes:
 515 .DS
 516 1) Interpreter detected errors.
 517 2) Hardware detected errors.
 518 3) External events.
 519 .DE
 520 .PP
 521 Interpreter detected errors include I/O errors and
 522 built-in function errors.
 523 These errors cause a subroutine call to an error routine
 524 with a single parameter indicating the cause of the error.
 525 Hardware errors such as range errors and overflows are
 526 fielded by a special routine that determines the opcode
 527 that caused the error.
 528 It then calls the error routine with an appropriate error
 529 parameter.
 530 External events include interrupts and system limits such
 531 as available memory.
 532 They generate a call to the error routine with an
 533 appropriate error code.
 534 The error routine processes the error condition,
 535 printing an appropriate error message and usually
 536 a backtrace from the point of the error.