Doc/qua.tex

   1 \documentstyle[11pt]{article}
   2 \newcommand{\Cpp}{C\protect\raisebox{.18ex}{++}}
   3
   4 \title{
   5 Interactively Testing Remote Servers Using the Python Programming Language
   6 }
   7
   8 \author{
   9         Guido van Rossum \\
  10         Dept. AA, CWI, P.O. Box 94079 \\
  11         1090 GB Amsterdam, The Netherlands \\
  12         E-mail: {\tt guido@cwi.nl}
  13 \and
  14         Jelke de Boer \\
  15         HIO Enschede; P.O.Box 1326 \\
  16         7500 BH  Enschede, The Netherlands
  17 }
  18
  19 \begin{document}
  20
  21 \maketitle
  22
  23 \begin{abstract}
  24 This paper describes how two tools that were developed quite
  25 independently gained in power by a well-designed connection between
  26 them.  The tools are Python, an interpreted prototyping language, and
  27 AIL, a Remote Procedure Call stub generator.  The context is Amoeba, a
  28 well-known distributed operating system developed jointly by the Free
  29 University and CWI in Amsterdam.
  30
  31 As a consequence of their integration, both tools have profited:
  32 Python gained usability when used with Amoeba --- for which it was not
  33 specifically developed --- and AIL users now have a powerful
  34 interactive tool to test servers and to experiment with new
  35 client/server interfaces.%
  36 \footnote{
  37 An earlier version of this paper was presented at the Spring 1991
  38 EurOpen Conference in Troms{\o} under the title ``Linking a Stub
  39 Generator (AIL) to a Prototyping Language (Python).''
  40 }
  41 \end{abstract}
  42
  43 \section{Introduction}
  44
  45 Remote Procedure Call (RPC) interfaces, used in distributed systems
  46 like Amoeba
  47 \cite{Amoeba:IEEE,Amoeba:CACM},
  48 have a much more concrete character than local procedure call
  49 interfaces in traditional systems.  Because clients and servers may
  50 run on different machines, with possibly different word size, byte
  51 order, etc., much care is needed to describe interfaces exactly and to
  52 implement them in such a way that they continue to work when a client
  53 or server is moved to a different machine.  Since machines may fail
  54 independently, error handling must also be treated more carefully.
  55
  56 A common approach to such problems is to use a {\em stub generator}.
  57 This is a program that takes an interface description and transforms
  58 it into functions that must be compiled and linked with client and
  59 server applications.  These functions are called by the application
  60 code to take care of details of interfacing to the system's RPC layer,
  61 to implement transformations between data representations of different
  62 machines, to check for errors, etc.  They are called `stubs' because
  63 they don't actually perform the action that they are called for but
  64 only relay the parameters to the server
  65 \cite{RPC}.
  66
  67 Amoeba's stub generator is called AIL, which stands for Amoeba
  68 Interface Language
  69 \cite{AIL}.
  70 The first version of AIL generated only C functions, but an explicit
  71 goal of AIL's design was {\em retargetability}: it should be possible
  72 to add back-ends that generate stubs for different languages from the
  73 same interface descriptions.  Moreover, the stubs generated by
  74 different back-ends must be {\em interoperable}: a client written in
  75 Modula-3, say, should be able to use a server written in C, and vice
  76 versa.
  77
  78 This interoperability is the key to the success of the marriage
  79 between AIL and Python.  Python is a versatile interpreted language
  80 developed by the first author.  Originally intended as an alternative
  81 for the kind of odd jobs that are traditionally solved by a mixture of
  82 shell scripts, manually given shell commands, and an occasional ad hoc
  83 C program, Python has evolved into a general interactive prototyping
  84 language.  It has been applied to a wide range of problems, from
  85 replacements for large shell scripts to fancy graphics demos and
  86 multimedia applications.
  87
  88 One of Python's strengths is the ability for the user to type in some
  89 code and immediately run it: no compilation or linking is necessary.
  90 Interactive performance is further enhanced by Python's concise, clear
  91 syntax, its very-high-level data types, and its lack of declarations
  92 (which is compensated by run-time type checking).  All this makes
  93 programming in Python feel like a leisure trip compared to the hard
  94 work involved in writing and debugging even a smallish C program.
  95
  96 It should be clear by now that Python will be the ideal tool to test
  97 servers and their interfaces.  Especially during the development of a
  98 complex server, one often needs to generate test requests on an ad hoc
  99 basis, to answer questions like ``what happens if request X arrives
 100 when the server is in state Y,'' to test the behavior of the server
 101 with requests that touch its limitations, to check server responses to
 102 all sorts of wrong requests, etc.  Python's ability to immediately
 103 execute `improvised' code makes it a much better tool for this
 104 situation than C.
 105
 106 The link to AIL extends Python with the necessary functionality to
 107 connect to arbitrary servers, making the server testbed sketched above
 108 a reality.  Python's high-level data types, general programming
 109 features, and system interface ensure that it has all the power and
 110 flexibility needed for the job.
 111
 112 One could go even further than this.  Current distributed operating
 113 systems, based on client-server interaction, all lack a good command
 114 language or `shell' to give adequate access to available services.
 115 Python has considerable potential for becoming such a shell.
 116
 117 \subsection{Overview of this Paper}
 118
 119 The rest of this paper contains three major sections and a conclusion.
 120 First an overview of the Python programming language is given.  Next
 121 comes a short description of AIL, together with some relevant details
 122 about Amoeba.  Finally, the design and construction of the link
 123 between Python and AIL is described in much detail.  The conclusion
 124 looks back at the work and points out weaknesses and strengths of
 125 Python and AIL that were discovered in the process.
 126
 127 \section{An Overview of Python}
 128
 129 Python%
 130 \footnote{
 131 Named after the funny TV show, not the nasty reptile.
 132 }
 133 owes much to ABC
 134 \cite{ABC},
 135 a language developed at CWI as a programming language for non-expert
 136 computer users.  Python borrows freely from ABC's syntax and data
 137 types, but adds modules, exceptions and classes, extensibility, and
 138 the ability to call system functions.  The concepts of modules,
 139 exceptions and (to some extent) classes are influenced strongly by
 140 their occurrence in Modula-3
 141 \cite{Modula-3}.
 142
 143 Although Python resembles ABC in many ways, there is a a clear
 144 difference in application domain.  ABC is intended to be the only
 145 programming language for those who use a computer as a tool, but
 146 occasionally need to write a program.  For this reason, ABC is not
 147 just a programming language but also a programming environment, which
 148 comes with an integrated syntax-directed editor and some source
 149 manipulation commands.  Python, on the other hand, aims to be a tool
 150 for professional (system) programmers, for whom having a choice of
 151 languages with different feature sets makes it possible to choose `the
 152 right tool for the job.'  The features added to Python make it more
 153 useful than ABC in an environment where access to system functions
 154 (such as file and directory manipulations) are common.  They also
 155 support the building of larger systems and libraries.  The Python
 156 implementation offers little in the way of a programming environment,
 157 but is designed to integrate seamlessly with existing programming
 158 environments (e.g. UNIX and Emacs).
 159
 160 Perhaps the best introduction to Python is a short example.  The
 161 following is a complete Python program to list the contents of a UNIX
 162 directory.
 163 \begin{verbatim}
 164 import sys, posix
 165
 166 def ls(dirname):    # Print sorted directory contents
 167     names = posix.listdir(dirname)
 168     names.sort()
 169     for name in names:
 170         if name[0] != '.': print name
 171
 172 ls(sys.argv[1])
 173 \end{verbatim}
 174 The largest part of this program, in the middle starting with {\tt
 175 def}, is a function definition.  It defines a function named {\tt ls}
 176 with a single parameter called {\tt dirname}.  (Comments in Python
 177 start with `\#' and extend to the end of the line.)  The function body
 178 is indented: Python uses indentation for statement grouping instead of
 179 braces or begin/end keywords.  This is shorter to type and avoids
 180 frustrating mismatches between the perception of grouping by the user
 181 and the parser.  Python accepts one statement per line; long
 182 statements may be broken in pieces using the standard backslash
 183 convention.  If the body of a compound statement is a single, simple
 184 statement, it may be placed on the same line as the head.
 185
 186 The first statement of the function body calls the function {\tt
 187 listdir} defined in the module {\tt posix}.  This function returns a
 188 list of strings representing the contents of the directory name passed
 189 as a string argument, here the argument {\tt dirname}.  If {\tt
 190 dirname} were not a valid directory name, or perhaps not even a
 191 string, {\tt listdir} would raise an exception and the next statement
 192 would never be reached.  (Exceptions can be caught in Python; see
 193 later.)  Assuming {\tt listdir} returns normally, its result is
 194 assigned to the local variable {\tt names}.
 195
 196 The second statement calls the method {\tt sort} of the variable {\tt
 197 names}.  This method is defined for all lists in Python and does the
 198 obvious thing: the elements of the list are reordered according to
 199 their natural ordering relationship.  Since in our example the list
 200 contains strings, they are sorted in ascending \ASCII{} order.
 201
 202 The last two lines of the function contain a loop that prints all
 203 elements of the list whose first character isn't a period.  In each
 204 iteration, the {\tt for} statement assigns an element of the list to
 205 the local variable {\tt name}.  The {\tt print} statement is intended
 206 for simple-minded output; more elaborate formatting is possible with
 207 Python's string handling functions.
 208
 209 The other two parts of the program are easily explained.  The first
 210 line is an {\tt import} statement that tells the interpreter to import
 211 the modules {\tt sys} and {\tt posix}.  As it happens these are both
 212 built into the interpreter.  Importing a module (built-in or
 213 otherwise) only makes the module name available in the current scope;
 214 functions and data defined in the module are accessed through the dot
 215 notation as in {\tt posix.listdir}.  The scope rules of Python are
 216 such that the imported module name {\tt posix} is also available in
 217 the function {\tt ls} (this will be discussed in more detail later).
 218
 219 Finally, the last line of the program calls the {\tt ls} function with
 220 a definite argument.  It must be last since Python objects must be
 221 defined before they can be used; in particular, the function {\tt ls}
 222 must be defined before it can be called.  The argument to {\tt ls} is
 223 {\tt sys.argv[1]}, which happens to be the Python equivalent of {\tt
 224 \$1} in a shell script or {\tt argv[1]} in a C program's {\tt main}
 225 function.
 226
 227 \subsection{Python Data Types}
 228
 229 (This and the following subsections describe Python in quite a lot of
 230 detail.  If you are more interested in AIL, Amoeba and how they are
 231 linked with Python, you can skip to section 3 now.)
 232
 233 Python's syntax may not have big surprises (which is exactly as it
 234 should be), but its data types are quite different from what is found
 235 in languages like C, Ada or Modula-3.  All data types in Python, even
 236 integers, are `objects'.  All objects participate in a common garbage
 237 collection scheme (currently implemented using reference counting).
 238 Assignment is cheap, independent of object size and type: only a
 239 pointer to the assigned object is stored in the assigned-to variable.
 240 No type checking is performed on assignment; only specific operations
 241 like addition test for particular operand types.
 242
 243 The basic object types in Python are numbers, strings, tuples, lists
 244 and dictionaries.  Some other object types are open files, functions,
 245 modules, classes, and class instances; even types themselves are
 246 represented as objects.  Extension modules written in C can define
 247 additional object types; examples are objects representing windows and
 248 Amoeba capabilities.  Finally, the implementation itself makes heavy
 249 use of objects, and defines some private object types that aren't
 250 normally visible to the user.  There is no explicit pointer type in
 251 Python.
 252
 253 {\em Numbers}, both integers and floating point, are pretty
 254 straightforward.  The notation for numeric literals is the same as in
 255 C, including octal and hexadecimal integers; precision is the same as
 256 {\tt long} or {\tt double} in C\@.  A third numeric type, `long
 257 integer', written with an `L' suffix, can be used for arbitrary
 258 precision calculations.  All arithmetic, shifting and masking
 259 operations from C are supported.
 260
 261 {\em Strings} are `primitive' objects just like numbers.  String
 262 literals are written between single quotes, using similar escape
 263 sequences as in C\@.  Operations are built into the language to
 264 concatenate and to replicate strings, to extract substrings, etc.
 265 There is no limit to the length of the strings created by a program.
 266 There is no separate character data type; strings of length one do
 267 nicely.
 268
 269 {\em Tuples} are a way to `pack' small amounts of heterogeneous data
 270 together and carry them around as a unit.  Unlike structure members in
 271 C, tuple items are nameless.  Packing and unpacking assignments allow
 272 access to the items, for example:
 273 \begin{verbatim}
 274 x = 'Hi', (1, 2), 'World'   # x is a 3-item tuple,
 275                             # its middle item is (1, 2)
 276 p, q, r = x                 # unpack x into p, q and r
 277 a, b = q                    # unpack q into a and b
 278 \end{verbatim}
 279 A combination of packing and unpacking assignment can be used as
 280 parallel assignment, and is idiom for permutations, e.g.:
 281 \begin{verbatim}
 282 p, q = q, p                 # swap without temporary
 283 a, b, c = b, c, a           # cyclic permutation
 284 \end{verbatim}
 285 Tuples are also used for function argument lists if there is more than
 286 one argument.  A tuple object, once created, cannot be modified; but
 287 it is easy enough to unpack it and create a new, modified tuple from
 288 the unpacked items and assign this to the variable that held the
 289 original tuple object (which will then be garbage-collected).
 290
 291 {\em Lists} are array-like objects.  List items may be arbitrary
 292 objects and can be accessed and changed using standard subscription
 293 notation.  Lists support item insertion and deletion, and can
 294 therefore be used as queues, stacks etc.; there is no limit to their
 295 size.
 296
 297 Strings, tuples and lists together are {\em sequence} types.  These
 298 share a common notation for generic operations on sequences such as
 299 subscription, concatenation, slicing (taking subsequences) and
 300 membership tests.  As in C, subscripts start at 0.
 301
 302 {\em Dictionaries} are `mappings' from one domain to another.  The
 303 basic operations on dictionaries are item insertion, extraction and
 304 deletion, using subscript notation with the key as subscript.  (The
 305 current implementation allows only strings in the key domain, but a
 306 future version of the language may remove this restriction.)
 307
 308 \subsection{Statements}
 309
 310 Python has various kinds of simple statements, such as assignments
 311 and {\tt print} statements, and several kinds of compound statements,
 312 like {\tt if} and {\tt for} statements.  Formally, function
 313 definitions and {\tt import} statements are also statements, and there
 314 are no restrictions on the ordering of statements or their nesting:
 315 {\tt import} may be used inside a function, functions may be defined
 316 conditionally using an {\tt if} statement, etc.  The effect of a
 317 declaration-like statement takes place only when it is executed.
 318
 319 All statements except assignments and expression statements begin with
 320 a keyword: this makes the language easy to parse.  An overview of the
 321 most common statement forms in Python follows.
 322
 323 An {\em assignment} has the general form
 324 \vspace{\itemsep}
 325
 326 \noindent
 327 {\em variable $=$ variable $= ... =$ variable $=$ expression}
 328 \vspace{\itemsep}
 329
 330 It assigns the value of the expression to all listed variables.  (As
 331 shown in the section on tuples, variables and expressions can in fact
 332 be comma-separated lists.)  The assignment operator is not an
 333 expression operator; there are no horrible things in Python like
 334 \begin{verbatim}
 335 while (p = p->next) { ... }
 336 \end{verbatim}
 337 Expression syntax is mostly straightforward and will not be explained
 338 in detail here.
 339
 340 An {\em expression statement} is just an expression on a line by
 341 itself.  This writes the value of the expression to standard output,
 342 in a suitably unambiguous way, unless it is a `procedure call' (a
 343 function call that returns no value).  Writing the value is useful
 344 when Python is used in `calculator mode', and reminds the programmer
 345 not to ignore function results.
 346
 347 The {\tt if} statement allows conditional execution.  It has optional
 348 {\tt elif} and {\tt else} parts; a construct like {\tt
 349 if...elif...elif...elif...else} can be used to compensate for the
 350 absence of a {\em switch} or {\em case} statement.
 351
 352 Looping is done with {\tt while} and {\tt for} statements.  The latter
 353 (demonstrated in the `ls' example earlier) iterates over the elements
 354 of a `sequence' (see the discussion of data types below).  It is
 355 possible to terminate a loop with a {\tt break} statement or to start
 356 the next iteration with {\tt continue}.  Both looping statements have
 357 an optional {\tt else} clause which is executed after the loop is
 358 terminated normally, but skipped when it is terminated by {\tt break}.
 359 This can be handy for searches, to handle the case that the item is
 360 not found.
 361
 362 Python's {\em exception} mechanism is modelled after that of Modula-3.
 363 Exceptions are raised by the interpreter when an illegal operation is
 364 tried.  It is also possible to explicitly raise an exception with the
 365 {\tt raise} statement:
 366 \vspace{\itemsep}
 367
 368 \noindent
 369 {\tt raise {\em expression}, {\em expression}}
 370 \vspace{\itemsep}
 371
 372 The first expression identifies which exception should be raised;
 373 there are several built-in exceptions and the user may define
 374 additional ones.  The second, optional expression is passed to the
 375 handler, e.g. as a detailed error message.
 376
 377 Exceptions may be handled (caught) with the {\tt try} statement, which
 378 has the following general form:
 379 \vspace{\itemsep}
 380
 381 \noindent
 382 {\tt
 383 \begin{tabular}{l}
 384 try: {\em block} \\
 385 except {\em expression}, {\em variable}: {\em block} \\
 386 except {\em expression}, {\em variable}: {\em block} \\
 387 ... \\
 388 except: {\em block}
 389 \end{tabular}
 390 }
 391 \vspace{\itemsep}
 392
 393 When an exception is raised during execution of the first block, a
 394 search for an exception handler starts.  The first {\tt except} clause
 395 whose {\em expression} matches the exception is executed.  The
 396 expression may specify a list of exceptions to match against.  A
 397 handler without an expression serves as a `catch-all'.  If there is no
 398 match, the search for a handler continues with outer {\tt try}
 399 statements; if no match is found on the entire invocation stack, an
 400 error message and stack trace are printed, and the program is
 401 terminated (interactively, the interpreter returns to its main loop).
 402
 403 Note that the form of the {\tt except} clauses encourages a style of
 404 programming whereby only selected exceptions are caught, passing
 405 unanticipated exceptions on to the caller and ultimately to the user.
 406 This is preferable over a simpler `catch-all' error handling
 407 mechanism, where a simplistic handler intended to catch a single type
 408 of error like `file not found' can easily mask genuine programming
 409 errors --- especially in a language like Python which relies strongly
 410 on run-time checking and allows the catching of almost any type of
 411 error.
 412
 413 Other common statement forms, which we have already encountered, are
 414 function definitions, {\tt import} statements and {\tt print}
 415 statements.  There is also a {\tt del} statement to delete one or more
 416 variables, a {\tt return} statement to return from a function, and a
 417 {\tt global} statement to allow assignments to global variables.
 418 Finally, the {\tt pass} statement is a no-op.
 419
 420 \subsection{Execution Model}
 421
 422 A Python program is executed by a stack-based interpreter.
 423
 424 When a function is called, a new `execution environment' for it is
 425 pushed onto the stack.  An execution environment contains (among other
 426 data) pointers to two `symbol tables' that are used to hold variables:
 427 the local and the global symbol table.  The local symbol table
 428 contains local variables of the current function invocation (including
 429 the function arguments); the global symbol table contains variables
 430 defined in the module containing the current function.
 431
 432 The `global' symbol table is thus only global with respect to the
 433 current function.  There are no system-wide global variables; using
 434 the {\tt import} statement it is easy enough to reference variables
 435 that are defined in other modules.  A system-wide read-only symbol
 436 table is used for built-in functions and constants though.
 437
 438 On assignment to a variable, by default an entry for it is made in the
 439 local symbol table of the current execution environment.  The {\tt
 440 global} command can override this (it is not enough that a global
 441 variable by the same name already exists).  When a variable's value is
 442 needed, it is searched first in the local symbol table, then in the
 443 global one, and finally in the symbol table containing built-in
 444 functions and constants.
 445
 446 The term `variable' in this context refers to any name: functions and
 447 imported modules are searched in exactly the same way.
 448
 449 Names defined in a module's symbol table survive until the end of the
 450 program.  This approximates the semantics of file-static global
 451 variables in C or module variables in Modula-3.  A module is
 452 initialized the first time it is imported, by executing the text of
 453 the module as a parameterless function whose local and global symbol
 454 tables are the same, so names are defined in module's symbol table.
 455 (Modules implemented in C have another way to define symbols.)
 456
 457 A Python main program is read from standard input or from a script
 458 file passed as an argument to the interpreter.  It is executed as if
 459 an anonymous module was imported.  Since {\tt import} statements are
 460 executed like all other statements, the initialization order of the
 461 modules used in a program is defined by the flow of control through
 462 the program.
 463
 464 The `attribute' notation {\em m.name}, where {\em m} is a module,
 465 accesses the symbol {\em name} in that module's symbol table.  It can
 466 be assigned to as well.  This is in fact a special case of the
 467 construct {\em x.name} where {\em x} denotes an arbitrary object; the
 468 type of {\em x} determines how this is to be interpreted, and what
 469 assignment to it means.
 470
 471 For instance, when {\tt a} is a list object, {\tt a.append} yields a
 472 built-in `method' object which, when called, appends an item to {\tt a}.
 473 (If {\tt a} and {\tt b} are distinct list objects, {\tt a.append} and
 474 {\tt b.append} are distinguishable method objects.)  Normally, in
 475 statements like {\tt a.append(x)}, the method object {\tt a.append} is
 476 called and then discarded, but this is a matter of convention.
 477
 478 List attributes are read-only --- the user cannot define new list
 479 methods.  Some objects, like numbers and strings, have no attributes
 480 at all.  Like all type checking in Python, the meaning of an attribute
 481 is determined at run-time --- when the parser sees {\em x.name}, it
 482 has no idea of the type of {\em x}.  Note that {\em x} here does not
 483 have to be a variable --- it can be an arbitrary (perhaps
 484 parenthesized) expression.
 485
 486 Given the flexibility of the attribute notation, one is tempted to use
 487 methods to replace all standard operations.  Yet, Python has kept a
 488 small repertoire of built-in functions like {\tt len()} and {\tt
 489 abs()}.  The reason is that in some cases the function notation is
 490 more familiar than the method notation; just like programs would
 491 become less readable if all infix operators were replaced by function
 492 calls, they would become less readable if all function calls had to be
 493 replaced by method calls (and vice versa!).
 494
 495 The choice whether to make something a built-in function or a method
 496 is a matter of taste.  For arithmetic and string operations, function
 497 notation is preferred, since frequently the argument to such an
 498 operation is an expression using infix notation, as in {\tt abs(a+b)};
 499 this definitely looks better than {\tt (a+b).abs()}.  The choice
 500 between make something a built-in function or a function defined in a
 501 built-in method (requiring {\tt import}) is similarly guided by
 502 intuition; all in all, only functions needed by `general' programming
 503 techniques are built-in functions.
 504
 505 \subsection{Classes}
 506
 507 Python has a class mechanism distinct from the object-orientation
 508 already explained.  A class in Python is not much more than a
 509 collection of methods and a way to create class instances.  Class
 510 methods are ordinary functions whose first parameter is the class
 511 instance; they are called using the method notation.
 512
 513 For instance, a class can be defined as follows:
 514 \begin{verbatim}
 515 class Foo:
 516    def meth1(self, arg): ...
 517    def meth2(self): ...
 518 \end{verbatim}
 519 A class instance is created by
 520 {\tt x = Foo()}
 521 and its methods can be called thus:
 522 \begin{verbatim}
 523 x.meth1('Hi There!')
 524 x.meth2()
 525 \end{verbatim}
 526 The functions used as methods are also available as attributes of the
 527 class object, and the above method calls could also have been written
 528 as follows:
 529 \begin{verbatim}
 530 Foo.meth1(x, 'Hi There!')
 531 Foo.meth2(x)
 532 \end{verbatim}
 533 Class methods can store instance data by assigning to instance data
 534 attributes, e.g.:
 535 \begin{verbatim}
 536 self.size = 100
 537 self.title = 'Dear John'
 538 \end{verbatim}
 539 Data attributes do not have to be declared; as with local variables,
 540 they spring into existence when assigned to.  It is a matter of
 541 discretion to avoid name conflicts with method names.  This facility
 542 is also available to class users; instances of a method-less class can
 543 be used as records with named fields.
 544
 545 There is no built-in mechanism for instance initialization.  Classes
 546 by convention provide an {\tt init()} method which initializes the
 547 instance and then returns it, so the user can write
 548 \begin{verbatim}
 549 x = Foo().init('Dr. Strangelove')
 550 \end{verbatim}
 551
 552 Any user-defined class can be used as a base class to derive other
 553 classes.  However, built-in types like lists cannot be used as base
 554 classes.  (Incidentally, the same is true in \Cpp{} and Modula-3.)  A
 555 class may override any method of its base classes.  Instance methods
 556 are first searched in the method list of their class, and then,
 557 recursively, in the method lists of their base class.  Initialization
 558 methods of derived classes should explicitly call the initialization
 559 methods of their base class.
 560
 561 A simple form of multiple inheritance is also supported: a class can
 562 have multiple base classes, but the language rules for resolving name
 563 conflicts are somewhat simplistic, and consequently the feature has so
 564 far found little usage.
 565
 566 \subsection{The Python Library}
 567
 568 Python comes with an extensive library, structured as a collection of
 569 modules.  A few modules are built into the interpreter: these
 570 generally provide access to system libraries implemented in C such as
 571 mathematical functions or operating system calls.  Two built-in
 572 modules provide access to internals of the interpreter and its
 573 environment.  Even abusing these internals will at most cause an
 574 exception in the Python program; the interpreter will not dump core
 575 because of errors in Python code.
 576
 577 Most modules however are written in Python and distributed with the
 578 interpreter; they provide general programming tools like string
 579 operations and random number generators, provide more convenient
 580 interfaces to some built-in modules, or provide specialized services
 581 like a {\em getopt}-style command line option processor for
 582 stand-alone scripts.
 583
 584 There are also some modules written in Python that dig deep in the
 585 internals of the interpreter; there is a module to browse the stack
 586 backtrace when an unhandled exception has occurred, one to disassemble
 587 the internal representation of Python code, and even an interactive
 588 source code debugger which can trace Python code, set breakpoints,
 589 etc.
 590
 591 \subsection{Extensibility}
 592
 593 It is easy to add new built-in modules written in C to the Python
 594 interpreter.  Extensions appear to the Python user as built-in
 595 modules.  Using a built-in module is no different from using a module
 596 written in Python, but obviously the author of a built-in module can
 597 do things that cannot be implemented purely in Python.
 598
 599 In particular, built-in modules can contain Python-callable functions
 600 that call functions from particular system libraries (`wrapper
 601 functions'), and they can define new object types.  In general, if a
 602 built-in module defines a new object type, it should also provide at
 603 least one function that creates such objects.  Attributes of such
 604 object types are also implemented in C; they can return data
 605 associated with the object or methods, implemented as C functions.
 606
 607 For instance, an extension was created for Amoeba: it provides wrapper
 608 functions for the basic Amoeba name server functions, and defines a
 609 `capability' object type, whose methods are file server operations.
 610 Another extension is a built-in module called {\tt posix}; it provides
 611 wrappers around post UNIX system calls.  Extension modules also
 612 provide access to two different windowing/graphics interfaces: STDWIN
 613 \cite{STDWIN}
 614 (which connects to X11 on UNIX and to the Mac Toolbox on the
 615 Macintosh), and the Graphics Library (GL) for Silicon Graphics
 616 machines.
 617
 618 Any function in an extension module is supposed to type-check its
 619 arguments; the interpreter contains a convenience function to
 620 facilitate extracting C values from arguments and type-checking them
 621 at the same time.  Returning values is also painless, using standard
 622 functions to create Python objects from C values.
 623
 624 On some systems extension modules may be dynamically loaded, thus
 625 avoiding the need to maintain a private copy of the Python interpreter
 626 in order to use a private extension.
 627
 628 \section{A Short Description of AIL and Amoeba}
 629
 630 An RPC stub generator takes an interface description as input.  The
 631 designer of a stub generator has at least two choices for the input
 632 language: use a suitably restricted version of the target language, or
 633 design a new language.  The first solution was chosen, for instance,
 634 by the designers of Flume, the stub generator for the Topaz
 635 distributed operating system built at DEC SRC
 636 \cite{Flume,Evolving}.
 637
 638 Flume's one and only target language is Modula-2+ (the predecessor of
 639 Modula-3).  Modula-2+, like Modula-N for any N, has an interface
 640 syntax that is well suited as a stub generator input language: an
 641 interface module declares the functions that are `exported' by a
 642 module implementation, with their parameter and return types, plus the
 643 types and constants used for the parameters.  Therefore, the input to
 644 Flume is simply a Modula-2+ interface module.  But even in this ideal
 645 situation, an RPC stub generator needs to know things about functions
 646 that are not stated explicitly in the interface module: for instance,
 647 the transfer direction of VAR parameters (IN, OUT or both) is not
 648 given.  Flume solves this and other problems by a mixture of
 649 directives hidden in comments and a convention for the names of
 650 objects.  Thus, one could say that the designers of Flume really
 651 created a new language, even though it looks remarkably like their
 652 target language.
 653
 654 \subsection{The AIL Input Language}
 655
 656 Amoeba uses C as its primary programming language.  C function
 657 declarations (at least in `Classic' C) don't specify the types of
 658 the parameters, let alone their transfer direction.  Using this as
 659 input for a stub generator would require almost all information for
 660 the stub generator to be hidden inside comments, which would require a
 661 rather contorted scanner.  Therefore we decided to design the input
 662 syntax for Amoeba's stub generator `from scratch'.  This gave us the
 663 liberty to invent proper syntax not only for the transfer direction of
 664 parameters, but also for variable-length arrays.
 665
 666 On the other hand we decided not to abuse our freedom, and borrowed as
 667 much from C as we could.  For instance, AIL runs its input through the
 668 C preprocessor, so we get macros, include files and conditional
 669 compilation for free.  AIL's type declaration syntax is a superset of
 670 C's, so the user can include C header files to use the types declared
 671 there as function parameter types --- which are declared using
 672 function prototypes as in \Cpp{} or Standard C\@.  It should be clear by
 673 now that AIL's lexical conventions are also identical to C's.  The
 674 same is true for its expression syntax.
 675
 676 Where does AIL differ from C, then?  Function declarations in AIL are
 677 grouped in {\em classes}.  Classes in AIL are mostly intended as a
 678 grouping mechanism: all functions implemented by a server are grouped
 679 together in a class.  Inheritance is used to form new groups by adding
 680 elements to existing groups; multiple inheritance is supported to join
 681 groups together.  Classes can also contain constant and type
 682 definitions, and one form of output that AIL can generate is a header
 683 file for use by C programmers who wish to use functions from a
 684 particular AIL class.
 685
 686 Let's have a look at some (unrealistically simple) class definitions:
 687 \begin{verbatim}
 688 #include <amoeba.h>     /* Defines `capability', etc. */
 689
 690 class standard_ops [1000 .. 1999] {
 691     /* Operations supported by most interfaces */
 692     std_info(*, out char buf[size:100], out int size);
 693     std_destroy(*);
 694 };
 695 \end{verbatim}
 696 This defines a class called `standard\_ops' whose request codes are
 697 chosen by AIL from the range 1000-1999.  Request codes are small
 698 integers used to identify remote operations.  The author of the class
 699 must specify a range from which AIL chooses, and class authors must
 700 make sure they avoid conflicts, e.g. by using an `assigned number
 701 administration office'.  In the example, `std\_info' will be assigned
 702 request code 1000 and `std\_destroy' will get code 1001.  There is
 703 also an option to explicitly assign request codes, for compatibility
 704 with servers with manually written interfaces.
 705
 706 The class `standard\_ops' defines two operations, `std\_info' and
 707 `std\_destroy'.  The first parameter of each operation is a star
 708 (`*'); this is a placeholder for a capability that must be passed when
 709 the operation is called.  The description of Amoeba below explains the
 710 meaning and usage of capabilities; for now, it is sufficient to know
 711 that a capability is a small structure that uniquely identifies an
 712 object and a server or service.
 713
 714 The standard operation `std\_info' has two output parameters: a
 715 variable-size character buffer (which will be filled with a short
 716 descriptive string of the object to which the operation is applied)
 717 and an integer giving the length of this string.  The standard
 718 operation `std\_destroy' has no further parameters --- it just
 719 destroys the object, if the caller has the right to do so.
 720
 721 The next class is called `tty':
 722 \begin{verbatim}
 723 class tty [2000 .. 2099] {
 724     inherit standard_ops;
 725     const TTY_MAXBUF = 1000;
 726     tty_write(*, char buf[size:TTY_MAXBUF], int size);
 727     tty_read(*, out char buf[size:TTY_MAXBUF], out int size);
 728 };
 729 \end{verbatim}
 730 The request codes for operations defined in this class lie in the
 731 range 2000-2099; inherited operations use the request codes already
 732 assigned to them.  The operations defined by this class are
 733 `tty\_read' and `tty\_write', which pass variable-sized data buffers
 734 between client and server.  Class `tty' inherits class
 735 `standard\_ops', so tty objects also support the operations
 736 `std\_info' and `std\_destroy'.
 737
 738 Only the {\em interface} for `std\_info' and `std\_destroy' is shared
 739 between tty objects and other objects whose interface inherits
 740 `standard\_ops'; the implementation may differ.  Even multiple
 741 implementations of the `tty' interface may exist, e.g. a driver for a
 742 console terminal and a terminal emulator in a window.  To expand on
 743 the latter example, consider:
 744 \begin{verbatim}
 745 class window [2100 .. 2199] {
 746     inherit standard_ops;
 747     win_create(*, int x, int y, int width, int height,
 748                   out capability win_cap);
 749     win_reconfigure(*, int x, int y, int width, int height);
 750 };
 751
 752 class tty_emulator [2200 .. 2299] {
 753     inherit tty, window;
 754 };
 755 \end{verbatim}
 756 Here two new interface classes are defined.
 757 Class `window' could be used for creating and manipulating windows.
 758 Note that `win\_create' returns a capability for the new window.
 759 This request should probably should be sent to a generic window
 760 server capability, or it might create a subwindow when applied to a
 761 window object.
 762
 763 Class `tty\_emulator' demonstrates the essence of multiple inheritance.
 764 It is presumably the interface to a window-based terminal emulator.
 765 Inheritance is transitive, so `tty\_emulator' also implicitly inherits
 766 `standard\_ops'.
 767 In fact, it inherits it twice: once via `tty' and once via `window'.
 768 Since AIL class inheritance only means interface sharing, not
 769 implementation sharing, inheriting the same class multiple times is
 770 never a problem and has the same effect as inheriting it once.
 771
 772 Note that the power of AIL classes doesn't go as far as \Cpp{}.
 773 AIL classes cannot have data members, and there is
 774 no mechanism for a server that implements a derived class
 775 to inherit the implementation of the base
 776 class --- other than copying the source code.
 777 The syntax for class definitions and inheritance is also different.
 778
 779 \subsection{Amoeba}
 780
 781 The smell of `object-orientedness' that the use of classes in AIL
 782 creates matches nicely with Amoeba's object-oriented approach to
 783 RPC\@.  In Amoeba, almost all operating system entities (files,
 784 directories, processes, devices etc.) are implemented as {\em
 785 objects}.  Objects are managed by {\em services} and represented by
 786 {\em capabilities}.  A capability gives its holder access to the
 787 object it represents.  Capabilities are protected cryptographically
 788 against forgery and can thus be kept in user space.  A capability is a
 789 128-bit binary string, subdivided as follows:
 790
 791 % XXX Need a better version of this picture!
 792 \begin{verbatim}
 793         48             24          8           48       Bits
 794 +----------------+------------+--------+---------------+
 795 |    Service     |   Object   |  Perm. |     Check     |
 796 |      port      |   number   |  bits  |     word      |
 797 +----------------+------------+--------+---------------+
 798 \end{verbatim}
 799
 800 The service port is used by the RPC implementation in the Amoeba
 801 kernel to locate a server implementing the service that manages the
 802 object.  In many cases there is a one-to-one correspondence between
 803 servers and services (each service is implemented by exactly one
 804 server process), but some services are replicated.  For instance,
 805 Amoeba's directory service, which is crucial for gaining access to most
 806 other services, is implemented by two servers that listen on the same
 807 port and know about exactly the same objects.
 808
 809 The object number in the capability is used by the server receiving
 810 the request for identifying the object to which the operation applies.
 811 The permission bits specify which operations the holder of the capability
 812 may apply.  The last part of a capability is a 48-bit long `check
 813 word', which is used to prevent forgery.  The check word is computed
 814 by the server based upon the permission bits and a random key per object
 815 that it keeps secret.  If you change the permission bits you must compute
 816 the proper check word or else the server will refuse the capability.
 817 Due to the size of the check word and the nature of the cryptographic
 818 `one-way function' used to compute it, inverting this function is
 819 impractical, so forging capabilities is impossible.%
 820 \footnote{
 821 As computers become faster, inverting the one-way function becomes
 822 less impractical.
 823 Therefore, a next version of Amoeba will have 64-bit check words.
 824 }
 825
 826 A working Amoeba system is a collection of diverse servers, managing
 827 files, directories, processes, devices etc.  While most servers have
 828 their own interface, there are some requests that make sense for some
 829 or all object types.  For instance, the {\em std\_info()} request,
 830 which returns a short descriptive string, applies to all object types.
 831 Likewise, {\em std\_destroy()} applies to files, directories and
 832 processes, but not to devices.
 833
 834 Similarly, different file server implementations may want to offer the
 835 same interface for operations like {\em read()} and {\em write()} to
 836 their clients.  AIL's grouping of requests into classes is ideally
 837 suited to describe this kind of interface sharing, and a class
 838 hierarchy results which clearly shows the similarities between server
 839 interfaces (not necessarily their implementations!).
 840
 841 The base class of all classes defines the {\em std\_info()} request.
 842 Most server interfaces actually inherit a derived class that also
 843 defines {\em std\_destroy().} File servers inherit a class that
 844 defines the common operations on files, etc.
 845
 846 \subsection{How AIL Works}
 847
 848 The AIL stub generator functions in three phases:
 849 \begin{itemize}
 850 \item
 851 parsing,
 852 \item
 853 strategy determination,
 854 \item
 855 code generation.
 856 \end{itemize}
 857
 858 {\bf Phase one} parses the input and builds a symbol table containing
 859 everything it knows about the classes and other definitions found in
 860 the input.
 861
 862 {\bf Phase two} determines the strategy to use for each function
 863 declaration in turn and decides upon the request and reply message
 864 formats.  This is not a simple matter, because of various optimization
 865 attempts.  Amoeba's kernel interface for RPC requests takes a
 866 fixed-size header and one arbitrary-size buffer.  A large part of the
 867 header holds the capability of the object to which the request is
 868 directed, but there is some space left for a few integer parameters
 869 whose interpretation is left up to the server.  AIL tries to use these
 870 slots for simple integer parameters, for two reasons.
 871
 872 First, unlike the buffer, header fields are byte-swapped by the RPC
 873 layer in the kernel if necessary, so it saves a few byte swapping
 874 instructions in the user code.  Second, and more important, a common
 875 form of request transfers a few integers and one large buffer to or
 876 from a server.  The {\em read()} and {\em write()} requests of most
 877 file servers have this form, for instance.  If it is possible to place
 878 all integer parameters in the header, the address of the buffer
 879 parameter can be passed directly to the kernel RPC layer.  While AIL
 880 is perfectly capable of handling requests that do not fit this format,
 881 the resulting code involves allocating a new buffer and copying all
 882 parameters into it.  It is a top priority to avoid this copying
 883 (`marshalling') if at all possible, in order to maintain Amoeba's
 884 famous RPC performance.
 885
 886 When AIL resorts to copying parameters into a buffer, it reorders them
 887 so that integers indicating the lengths of variable-size arrays are
 888 placed in the buffer before the arrays they describe, since otherwise
 889 decoding the request would be impossible.  It also adds occasional
 890 padding bytes to ensure integers are aligned properly in the buffer ---
 891 this can speed up (un)marshalling.
 892
 893 {\bf Phase three} is the code generator, or back-end.  There are in
 894 fact many different back-ends that may be called in a single run to
 895 generate different types of output.  The most important output types
 896 are header files (for inclusion by the clients of an interface),
 897 client stubs, and `server main loop' code.  The latter decodes
 898 incoming requests in the server.  The generated code depends on the
 899 programming language requested, and there are separate back-ends for
 900 each supported language.
 901
 902 It is important that the strategy chosen by phase two is independent
 903 of the language requested for phase three --- otherwise the
 904 interoperability of servers and clients written in different languages
 905 would be compromised.
 906
 907 \section{Linking AIL to Python}
 908
 909 From the previous section it can be concluded that linking AIL to
 910 Python is a matter of writing a back-end for Python.  This is indeed
 911 what we did.
 912
 913 Considerable time went into the design of the back-end in order to
 914 make the resulting RPC interface for Python fit as smoothly as
 915 possible in Python's programming style.  For instance, the issues of
 916 parameter transfer, variable-size arrays, error handling, and call
 917 syntax were all solved in a manner that favors ease of use in Python
 918 rather than strict correspondence with the stubs generated for C,
 919 without compromising network-level compatibility.
 920
 921 \subsection{Mapping AIL Entities to Python}
 922
 923 For each programming language that AIL is to support, a mapping must
 924 be designed between the data types in AIL and those in that language.
 925 Other aspects of the programming languages, such as differences in
 926 function call semantics, must also be taken care of.
 927
 928 While the mapping for C is mostly straightforward, the mapping for
 929 Python requires a little thinking to get the best results for Python
 930 programmers.
 931
 932 \subsubsection{Parameter Transfer Direction}
 933
 934 Perhaps the simplest issue is that of parameter transfer direction.
 935 Parameters of functions declared in AIL are categorized as being of
 936 type {\tt in}, {\tt out} or {\tt in} {\tt out} (the same distinction
 937 as made in Ada).  Python only has call-by-value parameter semantics;
 938 functions can return multiple values as a tuple.  This means that,
 939 unlike the C back-end, the Python back-end cannot always generate
 940 Python functions with exactly the same parameter list as the AIL
 941 functions.
 942
 943 Instead, the Python parameter list consists of all {\tt in} and {\tt
 944 in} {\tt out} parameters, in the order in which they occur in the AIL
 945 parameter list; similarly, the Python function returns a tuple
 946 containing all {\tt in} {\tt out} and {\tt out} parameters.  In fact
 947 Python packs function parameters into a tuple as well, stressing the
 948 symmetry between parameters and return value.  For example, a stub
 949 with this AIL parameter list:
 950 \begin{verbatim}
 951 (*, in int p1, in out int p2, in int p3, out int p4)
 952 \end{verbatim}
 953 will have the following parameter list and return values in Python:
 954 \begin{verbatim}
 955 (p1, p2, p3)  ->  (p2, p4)
 956 \end{verbatim}
 957
 958 \subsubsection{Variable-size Entities}
 959
 960 The support for variable-size objects in AIL is strongly guided by the
 961 limitations of C in this matter.  Basically, AIL allows what is
 962 feasible in C: functions may have variable-size arrays as parameters
 963 (both input or output), provided their length is passed separately.
 964 In practice this is narrowed to the following rule: for each
 965 variable-size array parameter, there must be an integer parameter
 966 giving its length.  (An exception for null-terminated strings is
 967 planned but not yet realized.)
 968
 969 Variable-size arrays in AIL or C correspond to {\em sequences} in
 970 Python: lists, tuples or strings.  These are much easier to use than
 971 their C counterparts.  Given a sequence object in Python, it is always
 972 possible to determine its size: the built-in function {\tt len()}
 973 returns it.  It would be annoying to require the caller of an RPC stub
 974 with a variable-size parameter to also pass a parameter that
 975 explicitly gives its size.  Therefore we eliminate all parameters from
 976 the Python parameter list whose value is used as the size of a
 977 variable-size array.  Such parameters are easily found: the array
 978 bound expression contains the name of the parameter giving its size.
 979 This requires the stub code to work harder (it has to recover the
 980 value for size parameters from the corresponding sequence parameter),
 981 but at least part of this work would otherwise be needed as well, to
 982 check that the given and actual sizes match.
 983
 984 Because of the symmetry in Python between the parameter list and the
 985 return value of a function, the same elimination is performed on
 986 return values containing variable-size arrays: integers returned
 987 solely to tell the client the size of a returned array are not
 988 returned explicitly to the caller in Python.
 989
 990 \subsubsection{Error Handling}
 991
 992 Another point where Python is really better than C is the issue of
 993 error handling.  It is a fact of life that everything involving RPC
 994 may fail, for a variety of reasons outside the user's control: the
 995 network may be disconnected, the server may be down, etc.  Clients
 996 must be prepared to handle such failures and recover from them, or at
 997 least print an error message and die.  In C this means that every
 998 function returns an error status that must be checked by the caller,
 999 causing programs to be cluttered with error checks --- or worse,
1000 programs that ignore errors and carry on working with garbage data.
1001
1002 In Python, errors are generally indicated by exceptions, which can be
1003 handled out of line from the main flow of control if necessary, and
1004 cause immediate program termination (with a stack trace) if ignored.
1005 To profit from this feature, all RPC errors that may be encountered by
1006 AIL-generated stubs in Python are turned into exceptions.  An extra
1007 value passed together with the exception is used to relay the error
1008 code returned by the server to the handler.  Since in general RPC
1009 failures are rare, Python test programs can usually ignore exceptions
1010 --- making the program simpler --- without the risk of occasional
1011 errors going undetected.  (I still remember the embarrassment of a
1012 hundredfold speed improvement reported, long, long, ago, about a new
1013 version of a certain program, which later had to be attributed to a
1014 benchmark that silently dumped core...)
1015
1016 \subsubsection{Function Call Syntax}
1017
1018 Amoeba RPC operations always need a capability parameter (this is what
1019 the `*' in the AIL function templates stands for); the service is
1020 identified by the port field of the capability.  In C, the capability
1021 must always be the first parameter of the stub function, but in Python
1022 we can do better.
1023
1024 A Python capability is an opaque object type in its own right, which
1025 is used, for instance, as parameter to and return value from Amoeba's
1026 name server functions.  Python objects can have methods, so it is
1027 convenient to make all AIL-generated stubs methods of capabilities
1028 instead of just functions.  Therefore, instead of writing
1029 \begin{verbatim}
1030 some_stub(cap, other_parameters)
1031 \end{verbatim}
1032 as in C, Python programmers can write
1033 \begin{verbatim}
1034 cap.some_stub(other_parameters)
1035 \end{verbatim}
1036 This is better because it reduces name conflicts: in Python, no
1037 confusion is possible between a stub and a local or global variable or
1038 user-defined function with the same name.
1039
1040 \subsubsection{Example}
1041
1042 All the preceding principles can be seen at work in the following
1043 example.  Suppose a function is declared in AIL as follows:
1044 \begin{verbatim}
1045 some_stub(*, in char buf[size:1000], in int size,
1046              out int n_done, out int status);
1047 \end{verbatim}
1048 In C it might be called by the following code (including declarations,
1049 for clarity, but not initializations):
1050 \begin{verbatim}
1051 int err, n_done, status;
1052 capability cap;
1053 char buf[500];
1054 ...
1055 err = some_stub(&cap, buf, sizeof buf, &n_done, &status);
1056 if (err != 0) return err;
1057 printf("%d done; status = %d\n", n_done, status);
1058 \end{verbatim}
1059 Equivalent code in Python might be the following:
1060 \begin{verbatim}
1061 cap = ...
1062 buf = ...
1063 n_done, status = cap.some_stub(buf)
1064 print n_done, 'done;', 'status =', status
1065 \end{verbatim}
1066 No explicit error check is required in Python: if the RPC fails, an
1067 exception is raised so the {\tt print} statement is never reached.
1068
1069 \subsection{The Implementation}
1070
1071 More or less orthogonal to the issue of how to map AIL operations to
1072 the Python language is the question of how they should be implemented.
1073
1074 In principle it would be possible to use the same strategy that is
1075 used for C: add an interface to Amoeba's low-level RPC primitives to
1076 Python and generate Python code to marshal parameters into and out of
1077 a buffer.  However, Python's high-level data types are not well suited
1078 for marshalling: byte-level operations are clumsy and expensive, with
1079 the result that marshalling a single byte of data can take several
1080 Python statements.  This would mean that a large amount of code would
1081 be needed to implement a stub, which would cost a lot of time to parse
1082 and take up a lot of space in `compiled' form (as parse tree or pseudo
1083 code).  Execution of the marshalling code would be sluggish as well.
1084
1085 We therefore chose an alternate approach, writing the marshalling in
1086 C, which is efficient at such byte-level operations.  While it is easy
1087 enough to generate C code that can be linked with the Python
1088 interpreter, it would obviously not stimulate the use of Python for
1089 server testing if each change to an interface required relinking the
1090 interpreter (dynamic loading of C code is not yet available on
1091 Amoeba).  This is circumvented by the following solution: the
1092 marshalling is handled by a simple {\em virtual machine}, and AIL
1093 generates instructions for this machine.  An interpreter for the
1094 machine is linked into the Python interpreter and reads its
1095 instructions from a file written by AIL.
1096
1097 The machine language for our virtual machine is dubbed {\em Stubcode}.
1098 Stubcode is a super-specialized language.  There are two sets of of
1099 about a dozen instructions each: one set marshals Python objects
1100 representing parameters into a buffer, the other set (similar but not
1101 quite symmetric) unmarshals results from a buffer into Python objects.
1102 The Stubcode interpreter uses a stack to hold Python intermediate
1103 results.  Other state elements are an Amoeba header and buffer, a
1104 pointer indicating the current position in the buffer, and of course a
1105 program counter.  Besides (un)marshalling, the virtual machine must
1106 also implement type checking, and raise a Python exception when a
1107 parameter does not have the expected type.
1108
1109 The Stubcode interpreter marshals Python data types very efficiently,
1110 since each instruction can marshal a large amount of data.  For
1111 instance, a whole Python string is marshalled by a single Stubcode
1112 instruction, which (after some checking) executes the most efficient
1113 byte-copying loop possible --- it calls {\tt memcpy()}.
1114
1115
1116 Construction details of the Stubcode interpreter are straightforward.
1117 Most complications are caused by the peculiarities of AIL's strategy
1118 module and Python's type system.  By far the most complex single
1119 instruction is the `loop' instruction, which is used to marshal
1120 arrays.
1121
1122 As an example, here is the complete Stubcode program (with spaces and
1123 comments added for clarity) generated for the function {\tt
1124 some\_stub()} of the example above.  The stack contains pointers to
1125 Python objects, and its initial contents is the parameter to the
1126 function, the string {\tt buf}.  The final stack contents will be the
1127 function return value, the tuple {\tt (n\_done, status)}.  The name
1128 {\tt header} refers to the fixed size Amoeba RPC header structure.
1129 \vspace{1em}
1130
1131 {\tt
1132 \begin{tabular}{l l l}
1133 BufSize     & 1000            & {\em Allocate RPC buffer of 1000 bytes}    \\
1134 Dup         & 1               & {\em Duplicate stack top}                  \\
1135 StringS     &                 & {\em Replace stack top by its string size} \\
1136 PutI        & h\_extra int32  & {\em Store top element in }header.h\_extra \\
1137 TStringSlt  & 1000            & {\em Assert string size less than 1000}    \\
1138 PutVS       &                 & {\em Marshal variable-size string}         \\
1139             &                 &                                            \\
1140 Trans       & 1234            & {\em Execute the RPC (request code 1234)}  \\
1141             &                 &                                            \\
1142 GetI        & h\_extra int32  & {\em Push integer from} header.h\_extra    \\
1143 GetI        & h\_size int32   & {\em Push integer from} header.h\_size     \\
1144 Pack        & 2               & {\em Pack top 2 elements into a tuple}     \\
1145 \end{tabular}
1146 }
1147 \vspace{1em}
1148
1149 As much work as possible is done by the Python back-end in AIL, rather
1150 than in the Stubcode interpreter, to make the latter both simple and
1151 fast.  For instance, the decision to eliminate an array size parameter
1152 from the Python parameter list is taken by AIL, and Stubcode
1153 instructions are generated to recover the size from the actual
1154 parameter and to marshal it properly.  Similarly, there is a special
1155 alignment instruction (not used in the example) to meet alignment
1156 requirements.
1157
1158 Communication between AIL and the Stubcode generator is via the file
1159 system.  For each stub function, AIL creates a file in its output
1160 directory, named after the stub with a specific suffix.  This file
1161 contains a machine-readable version of the Stubcode program for the
1162 stub.  The Python user can specify a search path containing
1163 directories which the interpreter searches for a Stubcode file the
1164 first time the definition for a particular stub is needed.
1165
1166 The transformations on the parameter list and data types needed to map
1167 AIL data types to Python data types make it necessary to help the
1168 Python programmer a bit in figuring out the parameters to a call.
1169 Although in most cases the rules are simple enough, it is sometimes
1170 hard to figure out exactly what the parameter and return values of a
1171 particular stub are.  There are two sources of help in this case:
1172 first, the exception contains enough information so that the user can
1173 figure what type was expected; second, AIL's Python back-end
1174 optionally generates a human-readable `interface specification' file.
1175
1176 \section{Conclusion}
1177
1178 We have succeeded in creating a useful extension to Python that
1179 enables Amoeba server writers to test and experiment with their server
1180 in a much more interactive manner.  We hope that this facility will
1181 add to the popularity of AIL amongst Amoeba programmers.
1182
1183 Python's extensibility was proven convincingly by the exercise
1184 (performed by the second author) of adding the Stubcode interpreter to
1185 Python.  Standard data abstraction techniques are used to insulate
1186 extension modules from details of the rest of the Python interpreter.
1187 In the case of the Stubcode interpreter this worked well enough that
1188 it survived a major overhaul of the main Python interpreter virtually
1189 unchanged.
1190
1191 On the other hand, adding a new back-end to AIL turned out to be quite
1192 a bit of work.  One problem, specific to Python, was to be expected:
1193 Python's variable-size data types differ considerably from the
1194 C-derived data model that AIL favors.  Two additional problems we
1195 encountered were the complexity of the interface between AIL's second
1196 and third phases, and a number of remaining bugs in the second phase
1197 that surfaced when the implementation of the Python back-end was
1198 tested.  The bugs have been tracked down and fixed, but nothing
1199 has been done about the complexity of the interface.
1200
1201 \subsection{Future Plans}
1202
1203 AIL's C back-end generates server main loop code as well as client
1204 stubs.  The Python back-end currently only generates client stubs, so
1205 it is not yet possible to write servers in Python.  While it is
1206 clearly more important to be able to use Python as a client than as a
1207 server, the ability to write server prototypes in Python would be a
1208 valuable addition: it allows server designers to experiment with
1209 interfaces in a much earlier stage of the design, with a much smaller
1210 programming effort.  This makes it possible to concentrate on concepts
1211 first, before worrying about efficient implementation.
1212
1213 The unmarshalling done in the server is almost symmetric with the
1214 marshalling in the client, and vice versa, so relative small
1215 extensions to the Stubcode virtual machine will allow its use in a
1216 server main loop.  We hope to find the time to add this feature to a
1217 future version of Python.
1218
1219 \section{Availability}
1220
1221 The Python source distribution is available to Internet users by
1222 anonymous ftp to site {\tt ftp.cwi.nl} [IP address 192.16.184.180]
1223 from directory {\tt /pub}, file name {\tt python*.tar.Z} (where the
1224 {\tt *} stands for a version number).  This is a compressed UNIX tar
1225 file containing the C source and \LaTeX documentation for the Python
1226 interpreter.  It includes the Python library modules and the {\em
1227 Stubcode} interpreter, as well as many example Python programs.  Total
1228 disk space occupied by the distribution is about 3 Mb; compilation
1229 requires 1-3 Mb depending on the configuration built, the compile
1230 options, etc.
1231
1232 \bibliographystyle{plain}
1233
1234 \bibliography{quabib}
1235
1236 \end{document}