Doc/ext.tex

   1 \documentstyle[twoside,11pt,myformat]{report}
   2
   3 \title{Extending and Embedding the Python Interpreter}
   4
   5 \author{
   6         Guido van Rossum \\
   7         Dept. CST, CWI, P.O. Box 94079 \\
   8         1090 GB Amsterdam, The Netherlands \\
   9         E-mail: {\tt guido@cwi.nl}
  10 }
  11
  12 \date{4 May 1994 \\ Release 1.0.2} % XXX update before release!
  13
  14 % Tell \index to actually write the .idx file
  15 \makeindex
  16
  17 \begin{document}
  18
  19 \pagenumbering{roman}
  20
  21 \maketitle
  22
  23 \begin{abstract}
  24
  25 \noindent
  26 This document describes how to write modules in C or C++ to extend the
  27 Python interpreter.  It also describes how to use Python as an
  28 `embedded' language, and how extension modules can be loaded
  29 dynamically (at run time) into the interpreter, if the operating
  30 system supports this feature.
  31
  32 \end{abstract}
  33
  34 \pagebreak
  35
  36 {
  37 \parskip = 0mm
  38 \tableofcontents
  39 }
  40
  41 \pagebreak
  42
  43 \pagenumbering{arabic}
  44
  45
  46 \chapter{Extending Python with C or C++ code}
  47
  48
  49 \section{Introduction}
  50
  51 It is quite easy to add non-standard built-in modules to Python, if
  52 you know how to program in C.  A built-in module known to the Python
  53 programmer as \code{foo} is generally implemented by a file called
  54 \file{foomodule.c}.  All but the two most essential standard built-in
  55 modules also adhere to this convention, and in fact some of them form
  56 excellent examples of how to create an extension.
  57
  58 Extension modules can do two things that can't be done directly in
  59 Python: they can implement new data types (which are different from
  60 classes by the way), and they can make system calls or call C library
  61 functions.  Since the latter is usually the most important reason for
  62 adding an extension, I'll concentrate on adding `wrappers' around C
  63 library functions; the concrete example uses the wrapper for
  64 \code{system()} in module \code{posix}, found in (of course) the file
  65 \file{Modules/posixmodule.c}.
  66
  67 Note: unless otherwise mentioned, all file references in this
  68 document are relative to the toplevel directory of the Python
  69 distribution --- i.e. the directory that contains the \file{configure}
  70 script.
  71
  72 The compilation of an extension module depends on your system setup
  73 and the intended use of the module; details are given in a later
  74 section.
  75
  76
  77 \section{A first look at the code}
  78
  79 It is important not to be impressed by the size and complexity of
  80 the average extension module; much of this is straightforward
  81 `boilerplate' code (starting right with the copyright notice)!
  82
  83 Let's skip the boilerplate and have a look at an interesting function
  84 in \file{posixmodule.c} first:
  85
  86 \begin{verbatim}
  87     static object *
  88     posix_system(self, args)
  89         object *self;
  90         object *args;
  91     {
  92         char *command;
  93         int sts;
  94         if (!getargs(args, "s", &command))
  95             return NULL;
  96         sts = system(command);
  97         return mkvalue("i", sts);
  98     }
  99 \end{verbatim}
 100
 101 This is the prototypical top-level function in an extension module.
 102 It will be called (we'll see later how) when the Python program
 103 executes statements like
 104
 105 \begin{verbatim}
 106     >>> import posix
 107     >>> sts = posix.system('ls -l')
 108 \end{verbatim}
 109
 110 There is a straightforward translation from the arguments to the call
 111 in Python (here the single expression \code{'ls -l'}) to the arguments that
 112 are passed to the C function.  The C function always has two
 113 parameters, conventionally named \var{self} and \var{args}.  The
 114 \var{self} argument is used when the C function implements a builtin
 115 method --- this is advanced material and not covered in this document.
 116 In the example, \var{self} will always be a \code{NULL} pointer, since
 117 we are defining a function, not a method (this is done so that the
 118 interpreter doesn't have to understand two different types of C
 119 functions).
 120
 121 The \var{args} parameter will be a pointer to a Python object, or
 122 \code{NULL} if the Python function/method was called without
 123 arguments.  It is necessary to do full argument type checking on each
 124 call, since otherwise the Python user would be able to cause the
 125 Python interpreter to `dump core' by passing invalid arguments to a
 126 function in an extension module.  Because argument checking and
 127 converting arguments to C are such common tasks, there's a general
 128 function in the Python interpreter that combines them:
 129 \code{getargs()}.  It uses a template string to determine both the
 130 types of the Python argument and the types of the C variables into
 131 which it should store the converted values.\footnote{There are
 132 convenience macros \code{getnoarg()}, \code{getstrarg()},
 133 \code{getintarg()}, etc., for many common forms of \code{getargs()}
 134 templates.  These are relics from the past; the recommended practice
 135 is to call \code{getargs()} directly.}  (More about this later.)
 136
 137 If \code{getargs()} returns nonzero, the argument list has the right
 138 type and its components have been stored in the variables whose
 139 addresses are passed.  If it returns zero, an error has occurred.  In
  140 the latter case it has already raised an appropriate exception, so
 141 the calling function should return \code{NULL} immediately --- see the
 142 next section.
 143
 144
 145 \section{Intermezzo: errors and exceptions}
 146
 147 An important convention throughout the Python interpreter is the
 148 following: when a function fails, it should set an exception condition
 149 and return an error value (often a \code{NULL} pointer).  Exceptions
 150 are stored in a static global variable in \file{Python/errors.c}; if
 151 this variable is \code{NULL} no exception has occurred.  A second
 152 static global variable stores the `associated value' of the exception
 153 --- the second argument to \code{raise}.
 154
 155 The file \file{errors.h} declares a host of functions to set various
 156 types of exceptions.  The most common one is \code{err_setstr()} ---
 157 its arguments are an exception object (e.g. \code{RuntimeError} ---
 158 actually it can be any string object) and a C string indicating the
 159 cause of the error (this is converted to a string object and stored as
 160 the `associated value' of the exception).  Another useful function is
 161 \code{err_errno()}, which only takes an exception argument and
 162 constructs the associated value by inspection of the (UNIX) global
  163 variable \code{errno}.  The most general function is \code{err_set()}, which
 164 takes two object arguments, the exception and its associated value.
 165 You don't need to \code{INCREF()} the objects passed to any of these
 166 functions.
 167
 168 You can test non-destructively whether an exception has been set with
 169 \code{err_occurred()}.  However, most code never calls
 170 \code{err_occurred()} to see whether an error occurred or not, but
 171 relies on error return values from the functions it calls instead.
 172
 173 When a function that calls another function detects that the called
 174 function fails, it should return an error value (e.g. \code{NULL} or
 175 \code{-1}) but not call one of the \code{err_*} functions --- one has
 176 already been called.  The caller is then supposed to also return an
 177 error indication to {\em its} caller, again {\em without} calling
 178 \code{err_*()}, and so on --- the most detailed cause of the error was
 179 already reported by the function that first detected it.  Once the
 180 error has reached Python's interpreter main loop, this aborts the
 181 currently executing Python code and tries to find an exception handler
 182 specified by the Python programmer.
 183
 184 (There are situations where a module can actually give a more detailed
 185 error message by calling another \code{err_*} function, and in such
 186 cases it is fine to do so.  As a general rule, however, this is not
 187 necessary, and can cause information about the cause of the error to
 188 be lost: most operations can fail for a variety of reasons.)
 189
 190 To ignore an exception set by a function call that failed, the
 191 exception condition must be cleared explicitly by calling
 192 \code{err_clear()}.  The only time C code should call
 193 \code{err_clear()} is if it doesn't want to pass the error on to the
 194 interpreter but wants to handle it completely by itself (e.g. by
 195 trying something else or pretending nothing happened).
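
     The conventions described so far can be tried out in isolation.  The
     following stand-alone C fragment is only a toy model, not interpreter
     code: all names in it (\code{toy_err_setstr()}, \code{parse_digit()},
     \code{parse_pair()} and so on) are invented for this illustration, and
     plain C strings stand in for exception objects.  It sketches who sets
     the error, who merely passes the failure on, and who clears it:

     \begin{verbatim}
         #include <stddef.h>

         /* Toy stand-ins for the two static error variables (made-up names). */
         static const char *toy_exception = NULL;  /* NULL: no error pending */
         static const char *toy_value = NULL;      /* the `associated value' */

         static void toy_err_setstr(const char *exception, const char *message)
         {
             toy_exception = exception;
             toy_value = message;
         }

         static int toy_err_occurred(void)
         {
             return toy_exception != NULL;
         }

         static void toy_err_clear(void)
         {
             toy_exception = NULL;
             toy_value = NULL;
         }

         /* Lowest level: detects the problem, sets the error, returns -1. */
         static int parse_digit(char c)
         {
             if (c < '0' || c > '9') {
                 toy_err_setstr("ValueError", "not a digit");
                 return -1;
             }
             return c - '0';
         }

         /* Middle level: passes the failure on WITHOUT setting the error again. */
         static int parse_pair(const char *s)
         {
             int hi, lo;
             if ((hi = parse_digit(s[0])) < 0)
                 return -1;
             if ((lo = parse_digit(s[1])) < 0)
                 return -1;
             return 10 * hi + lo;
         }

         /* Top level: handles the error itself, so it must clear it. */
         static int parse_pair_or_default(const char *s, int fallback)
         {
             int n = parse_pair(s);
             if (n < 0) {
                 toy_err_clear();
                 return fallback;
             }
             return n;
         }
     \end{verbatim}

     In this sketch \code{parse_pair("4x")} returns \code{-1} with the error
     still set by \code{parse_digit()}, while
     \code{parse_pair_or_default("4x", 7)} returns \code{7} and leaves no
     error behind.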
 196
 197 Finally, the function \code{err_get()} gives you both error variables
 198 {\em and clears them}.  Note that even if an error occurred the second
 199 one may be \code{NULL}.  You have to \code{XDECREF()} both when you
 200 are finished with them.  I doubt you will need to use this function.
 201
 202 Note that a failing \code{malloc()} call must also be turned into an
 203 exception --- the direct caller of \code{malloc()} (or
 204 \code{realloc()}) must call \code{err_nomem()} and return a failure
 205 indicator itself.  All the object-creating functions
  206 (\code{newintobject()} etc.) already do this, so this note matters
  207 only if you call \code{malloc()} directly.
 208
 209 Also note that, with the important exception of \code{getargs()},
 210 functions that return an integer status usually return \code{0} or a
 211 positive value for success and \code{-1} for failure.
 212
 213 Finally, be careful about cleaning up garbage (making \code{XDECREF()}
 214 or \code{DECREF()} calls for objects you have already created) when
 215 you return an error!
 216
 217 The choice of which exception to raise is entirely yours.  There are
 218 predeclared C objects corresponding to all built-in Python exceptions,
  219 e.g. \code{ZeroDivisionError}, which you can use directly.  Of course,
  220 you should choose exceptions wisely --- don't use \code{TypeError} to
  221 mean that a file couldn't be opened (that should probably be
  222 \code{IOError}).  If anything's wrong with the argument list, the
  223 \code{getargs()} function raises \code{TypeError}.  If you have an
  224 argument whose value must be in a particular range or must
 225 satisfy other conditions, \code{ValueError} is appropriate.
 226
 227 You can also define a new exception that is unique to your module.
 228 For this, you usually declare a static object variable at the
 229 beginning of your file, e.g.
 230
 231 \begin{verbatim}
 232     static object *FooError;
 233 \end{verbatim}
 234
 235 and initialize it in your module's initialization function
 236 (\code{initfoo()}) with a string object, e.g. (leaving out the error
 237 checking for simplicity):
 238
 239 \begin{verbatim}
 240     void
 241     initfoo()
 242     {
 243         object *m, *d;
 244         m = initmodule("foo", foo_methods);
 245         d = getmoduledict(m);
 246         FooError = newstringobject("foo.error");
 247         dictinsert(d, "error", FooError);
 248     }
 249 \end{verbatim}
 250
 251
 252 \section{Back to the example}
 253
 254 Going back to \code{posix_system()}, you should now be able to
 255 understand this bit:
 256
 257 \begin{verbatim}
 258         if (!getargs(args, "s", &command))
 259             return NULL;
 260 \end{verbatim}
 261
 262 It returns \code{NULL} (the error indicator for functions of this
 263 kind) if an error is detected in the argument list, relying on the
 264 exception set by \code{getargs()}.  Otherwise the string value of the
 265 argument has been copied to the local variable \code{command} --- this
 266 is in fact just a pointer assignment and you are not supposed to
 267 modify the string to which it points.
 268
 269 If a function is called with multiple arguments, the argument list
 270 (the argument \code{args}) is turned into a tuple.  If it is called
 271 without arguments, \code{args} is \code{NULL}. \code{getargs()} knows
 272 about this; see later.
 273
 274 The next statement in \code{posix_system()} is a call to the C library
 275 function \code{system()}, passing it the string we just got from
 276 \code{getargs()}:
 277
 278 \begin{verbatim}
 279         sts = system(command);
 280 \end{verbatim}
 281
 282 Finally, \code{posix.system()} must return a value: the integer status
 283 returned by the C library \code{system()} function.  This is done
 284 using the function \code{mkvalue()}, which is something like the
 285 inverse of \code{getargs()}: it takes a format string and a variable
 286 number of C values and returns a new Python object.
 287
 288 \begin{verbatim}
 289         return mkvalue("i", sts);
 290 \end{verbatim}
 291
 292 In this case, it returns an integer object (yes, even integers are
 293 objects on the heap in Python!).  More info on \code{mkvalue()} is
 294 given later.
 295
  296 If you had a function that returned no useful value (a.k.a. a
 297 procedure), you would need this idiom:
 298
 299 \begin{verbatim}
 300         INCREF(None);
 301         return None;
 302 \end{verbatim}
 303
 304 \code{None} is a unique Python object representing `no value'.  It
 305 differs from \code{NULL}, which means `error' in most contexts.
 306
 307
 308 \section{The module's function table}
 309
 310 I promised to show how I made the function \code{posix_system()}
 311 callable from Python programs.  This is shown later in
 312 \file{Modules/posixmodule.c}:
 313
 314 \begin{verbatim}
 315     static struct methodlist posix_methods[] = {
 316         ...
 317         {"system",  posix_system},
 318         ...
 319         {NULL,      NULL}        /* Sentinel */
 320     };
 321
 322     void
 323     initposix()
 324     {
 325         (void) initmodule("posix", posix_methods);
 326     }
 327 \end{verbatim}
 328
 329 (The actual \code{initposix()} is somewhat more complicated, but many
 330 extension modules can be as simple as shown here.)  When the Python
 331 program first imports module \code{posix}, \code{initposix()} is
 332 called, which calls \code{initmodule()} with specific parameters.
 333 This creates a `module object' (which is inserted in the table
 334 \code{sys.modules} under the key \code{'posix'}), and adds
 335 built-in-function objects to the newly created module based upon the
 336 table (of type struct methodlist) that was passed as its second
 337 parameter.  The function \code{initmodule()} returns a pointer to the
 338 module object that it creates (which is unused here).  It aborts with
 339 a fatal error if the module could not be initialized satisfactorily,
 340 so you don't need to check for errors.
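
     The \code{\{NULL, NULL\}} sentinel is what lets code that receives the
     table find its end without a separate length.  The stand-alone sketch
     below is a toy model under that assumption, not the interpreter's own
     code: \code{struct toy_method}, \code{toy_count()} and
     \code{toy_lookup()} are hypothetical names, and the toy functions take
     and return plain ints instead of Python objects.

     \begin{verbatim}
         #include <stddef.h>
         #include <string.h>

         typedef int (*toy_function)(int);

         struct toy_method {
             const char *name;      /* name as seen by the caller */
             toy_function func;     /* C function implementing it */
         };

         static int toy_double(int x) { return 2 * x; }
         static int toy_negate(int x) { return -x; }

         static struct toy_method toy_methods[] = {
             {"double", toy_double},
             {"negate", toy_negate},
             {NULL,     NULL}        /* Sentinel */
         };

         /* Count the entries by walking the table up to the sentinel. */
         static int toy_count(const struct toy_method *table)
         {
             int n = 0;
             while (table[n].name != NULL)
                 n++;
             return n;
         }

         /* Find a function by name; NULL if the name is not in the table. */
         static toy_function toy_lookup(const struct toy_method *table,
                                        const char *name)
         {
             const struct toy_method *m;
             for (m = table; m->name != NULL; m++) {
                 if (strcmp(m->name, name) == 0)
                     return m->func;
             }
             return NULL;
         }
     \end{verbatim}

     Leaving out the sentinel would make both loops run past the end of the
     array, which is why every method table ends with it.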
 341
 342
 343 \section{Compilation and linkage}
 344
 345 There are two more things to do before you can use your new extension
 346 module: compiling and linking it with the Python system.  If you use
 347 dynamic loading, the details depend on the style of dynamic loading
 348 your system uses; see the chapter on Dynamic Loading for more info
 349 about this.
 350
 351 If you can't use dynamic loading, or if you want to make your module a
 352 permanent part of the Python interpreter, you will have to change the
 353 configuration setup and rebuild the interpreter.  Luckily, in the 1.0
 354 release this is very simple: just place your file (named
 355 \file{foomodule.c} for example) in the \file{Modules} directory, add a
 356 line to the file \file{Modules/Setup} describing your file:
 357
 358 \begin{verbatim}
 359     foo foomodule.o
 360 \end{verbatim}
 361
 362 and rebuild the interpreter by running \code{make} in the toplevel
 363 directory.  You can also run \code{make} in the \file{Modules}
  364 subdirectory, but then you must first rebuild the \file{Makefile}
 365 there by running \code{make Makefile}.  (This is necessary each time
 366 you change the \file{Setup} file.)
 367
 368
 369 \section{Calling Python functions from C}
 370
 371 So far we have concentrated on making C functions callable from
 372 Python.  The reverse is also useful: calling Python functions from C.
 373 This is especially the case for libraries that support so-called
 374 `callback' functions.  If a C interface makes use of callbacks, the
  375 equivalent Python interface often needs to provide a callback mechanism to the
 376 Python programmer; the implementation will require calling the Python
 377 callback functions from a C callback.  Other uses are also imaginable.
 378
 379 Fortunately, the Python interpreter is easily called recursively, and
 380 there is a standard interface to call a Python function.  (I won't
 381 dwell on how to call the Python parser with a particular string as
 382 input --- if you're interested, have a look at the implementation of
 383 the \samp{-c} command line option in \file{Python/pythonmain.c}.)
 384
 385 Calling a Python function is easy.  First, the Python program must
 386 somehow pass you the Python function object.  You should provide a
 387 function (or some other interface) to do this.  When this function is
 388 called, save a pointer to the Python function object (be careful to
  389 \code{INCREF()} it!) in a global variable --- or wherever you see fit.
 390 For example, the following function might be part of a module
 391 definition:
 392
 393 \begin{verbatim}
 394     static object *my_callback = NULL;
 395
 396     static object *
 397     my_set_callback(dummy, arg)
 398         object *dummy, *arg;
 399     {
 400         XDECREF(my_callback); /* Dispose of previous callback */
 401         my_callback = arg;
 402         XINCREF(my_callback); /* Remember new callback */
 403         /* Boilerplate for "void" return */
 404         INCREF(None);
 405         return None;
 406     }
 407 \end{verbatim}
 408
 409 This particular function doesn't do any typechecking on its argument
 410 --- that will be done by \code{call_object()}, which is a bit late but
 411 at least protects the Python interpreter from shooting itself in its
 412 foot.  (The problem with typechecking functions is that there are at
 413 least five different Python object types that can be called, so the
 414 test would be somewhat cumbersome.)
 415
 416 The macros \code{XINCREF()} and \code{XDECREF()} increment/decrement
 417 the reference count of an object and are safe in the presence of
 418 \code{NULL} pointers.  More info on them in the section on Reference
 419 Counts below.
 420
 421 Later, when it is time to call the function, you call the C function
 422 \code{call_object()}.  This function has two arguments, both pointers
 423 to arbitrary Python objects: the Python function, and the argument
 424 list.  The argument list must always be a tuple object, whose length
 425 is the number of arguments.  To call the Python function with no
 426 arguments, you must pass an empty tuple.  For example:
 427
 428 \begin{verbatim}
 429     object *arglist;
 430     object *result;
 431     ...
 432     /* Time to call the callback */
 433     arglist = mktuple(0);
 434     result = call_object(my_callback, arglist);
 435     DECREF(arglist);
 436 \end{verbatim}
 437
 438 \code{call_object()} returns a Python object pointer: this is
 439 the return value of the Python function.  \code{call_object()} is
 440 `reference-count-neutral' with respect to its arguments.  In the
 441 example a new tuple was created to serve as the argument list, which
 442 is \code{DECREF()}-ed immediately after the call.
 443
 444 The return value of \code{call_object()} is `new': either it is a
 445 brand new object, or it is an existing object whose reference count
 446 has been incremented.  So, unless you want to save it in a global
 447 variable, you should somehow \code{DECREF()} the result, even
 448 (especially!) if you are not interested in its value.
 449
 450 Before you do this, however, it is important to check that the return
 451 value isn't \code{NULL}.  If it is, the Python function terminated by raising
 452 an exception.  If the C code that called \code{call_object()} is
 453 called from Python, it should now return an error indication to its
 454 Python caller, so the interpreter can print a stack trace, or the
 455 calling Python code can handle the exception.  If this is not possible
 456 or desirable, the exception should be cleared by calling
 457 \code{err_clear()}.  For example:
 458
 459 \begin{verbatim}
 460     if (result == NULL)
 461         return NULL; /* Pass error back */
 462     /* Here maybe use the result */
 463     DECREF(result);
 464 \end{verbatim}
 465
 466 Depending on the desired interface to the Python callback function,
 467 you may also have to provide an argument list to \code{call_object()}.
 468 In some cases the argument list is also provided by the Python
 469 program, through the same interface that specified the callback
 470 function.  It can then be saved and used in the same manner as the
 471 function object.  In other cases, you may have to construct a new
 472 tuple to pass as the argument list.  The simplest way to do this is to
 473 call \code{mkvalue()}.  For example, if you want to pass an integral
 474 event code, you might use the following code:
 475
 476 \begin{verbatim}
 477     object *arglist;
 478     ...
 479     arglist = mkvalue("(l)", eventcode);
 480     result = call_object(my_callback, arglist);
 481     DECREF(arglist);
 482     if (result == NULL)
 483         return NULL; /* Pass error back */
 484     /* Here maybe use the result */
 485     DECREF(result);
 486 \end{verbatim}
 487
  488 Note the placement of \code{DECREF(arglist)} immediately after the call,
  489 before the error check!  Also note that, strictly speaking, this code is
 490 not complete: \code{mkvalue()} may run out of memory, and this should
 491 be checked.
 492
 493
 494 \section{Format strings for {\tt getargs()}}
 495
 496 The \code{getargs()} function is declared in \file{modsupport.h} as
 497 follows:
 498
 499 \begin{verbatim}
 500     int getargs(object *arg, char *format, ...);
 501 \end{verbatim}
 502
 503 The remaining arguments must be addresses of variables whose type is
 504 determined by the format string.  For the conversion to succeed, the
 505 \var{arg} object must match the format and the format must be exhausted.
 506 Note that while \code{getargs()} checks that the Python object really
 507 is of the specified type, it cannot check the validity of the
 508 addresses of C variables provided in the call: if you make mistakes
 509 there, your code will probably dump core.
 510
 511 A non-empty format string consists of a single `format unit'.  A
 512 format unit describes one Python object; it is usually a single
 513 character or a parenthesized sequence of format units.  The type of a
  514 format unit is determined from its first character, the `format
 515 letter':
 516
 517 \begin{description}
 518
 519 \item[\samp{s} (string)]
 520 The Python object must be a string object.  The C argument must be a
 521 \code{(char**)} (i.e. the address of a character pointer), and a pointer
 522 to the C string contained in the Python object is stored into it.  You
 523 must not provide storage to store the string; a pointer to an existing
 524 string is stored into the character pointer variable whose address you
 525 pass.  If the next character in the format string is \samp{\#},
 526 another C argument of type \code{(int*)} must be present, and the
 527 length of the Python string (not counting the trailing zero byte) is
 528 stored into it.
 529
 530 \item[\samp{z} (string or zero, i.e. \code{NULL})]
 531 Like \samp{s}, but the object may also be None.  In this case the
 532 string pointer is set to \code{NULL} and if a \samp{\#} is present the
 533 size is set to 0.
 534
 535 \item[\samp{b} (byte, i.e. char interpreted as tiny int)]
 536 The object must be a Python integer.  The C argument must be a
 537 \code{(char*)}.
 538
 539 \item[\samp{h} (half, i.e. short)]
 540 The object must be a Python integer.  The C argument must be a
 541 \code{(short*)}.
 542
 543 \item[\samp{i} (int)]
 544 The object must be a Python integer.  The C argument must be an
 545 \code{(int*)}.
 546
 547 \item[\samp{l} (long)]
 548 The object must be a (plain!) Python integer.  The C argument must be
 549 a \code{(long*)}.
 550
 551 \item[\samp{c} (char)]
 552 The Python object must be a string of length 1.  The C argument must
 553 be a \code{(char*)}.  (Don't pass an \code{(int*)}!)
 554
 555 \item[\samp{f} (float)]
 556 The object must be a Python int or float.  The C argument must be a
 557 \code{(float*)}.
 558
 559 \item[\samp{d} (double)]
 560 The object must be a Python int or float.  The C argument must be a
 561 \code{(double*)}.
 562
 563 \item[\samp{S} (string object)]
 564 The object must be a Python string.  The C argument must be an
 565 \code{(object**)} (i.e. the address of an object pointer).  The C
 566 program thus gets back the actual string object that was passed, not
 567 just a pointer to its array of characters and its size as for format
 568 character \samp{s}.  The reference count of the object has not been
 569 increased.
 570
 571 \item[\samp{O} (object)]
 572 The object can be any Python object, including None, but not
 573 \code{NULL}.  The C argument must be an \code{(object**)}.  This can be
 574 used if an argument list must contain objects of a type for which no
  575 format letter exists: the caller must then check that it has the right
 576 type.  The reference count of the object has not been increased.
 577
 578 \item[\samp{(} (tuple)]
 579 The object must be a Python tuple.  Following the \samp{(} character
 580 in the format string must come a number of format units describing the
 581 elements of the tuple, followed by a \samp{)} character.  Tuple
 582 format units may be nested.  (There are no exceptions for empty and
 583 singleton tuples; \samp{()} specifies an empty tuple and \samp{(i)} a
 584 singleton of one integer.  Normally you don't want to use the latter,
  585 since it is hard for the Python user to specify.)
 586
 587 \end{description}
 588
 589 More format characters will probably be added as the need arises.  It
  590 should be (but currently isn't) allowed to use Python long integers
  591 wherever integers are expected, and perform a range check.  (A range
 592 check is in fact always necessary for the \samp{b}, \samp{h} and
 593 \samp{i} format letters, but this is currently not implemented.)
 594
 595 Some example calls:
 596
 597 \begin{verbatim}
 598     int ok;
 599     int i, j;
 600     long k, l;
 601     char *s;
 602     int size;
 603
 604     ok = getargs(args, ""); /* No arguments */
 605         /* Python call: f() */
 606
 607     ok = getargs(args, "s", &s); /* A string */
 608         /* Possible Python call: f('whoops!') */
 609
 610     ok = getargs(args, "(lls)", &k, &l, &s); /* Two longs and a string */
 611         /* Possible Python call: f(1, 2, 'three') */
 612
 613     ok = getargs(args, "((ii)s#)", &i, &j, &s, &size);
 614         /* A pair of ints and a string, whose size is also returned */
  615         /* Possible Python call: f((1, 2), 'three') */
 616
 617     {
 618         int left, top, right, bottom, h, v;
 619         ok = getargs(args, "(((ii)(ii))(ii))",
 620                  &left, &top, &right, &bottom, &h, &v);
 621                  /* A rectangle and a point */
 622                  /* Possible Python call:
 623                     f( ((0, 0), (400, 300)), (10, 10)) */
 624     }
 625 \end{verbatim}
 626
 627 Note that the `top level' of a non-empty format string must consist of
 628 a single unit; strings like \samp{is} and \samp{(ii)s\#} are not valid
 629 format strings.  (But \samp{s\#} is.)  If you have multiple arguments,
 630 the format must therefore always be enclosed in parentheses, as in the
  631 examples \samp{((ii)s\#)} and \samp{(((ii)(ii))(ii))}.  (The current
 632 implementation does not complain when more than one unparenthesized
 633 format unit is given.  Sorry.)
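
     The single-unit rule can be stated as a small checker.  The following
     stand-alone sketch is hypothetical (the names \code{toy_skip_unit()}
     and \code{toy_format_ok()} are not part of Python) and is stricter
     than the current \code{getargs()}, which by the remark above lets some
     invalid formats through.  It only knows the format letters listed in
     this section:

     \begin{verbatim}
         #include <stddef.h>
         #include <string.h>

         /* Skip one format unit starting at p.  Returns a pointer just past
            the unit, or NULL if no valid unit starts there. */
         static const char *toy_skip_unit(const char *p)
         {
             if (*p == '(') {
                 p++;
                 while (*p != ')') {
                     if (*p == '\0')
                         return NULL;        /* missing ')' */
                     p = toy_skip_unit(p);
                     if (p == NULL)
                         return NULL;
                 }
                 return p + 1;               /* step over ')' */
             }
             if (*p == '\0' || strchr("szbhilcfdSO", *p) == NULL)
                 return NULL;                /* unknown format letter */
             if ((*p == 's' || *p == 'z') && p[1] == '#')
                 return p + 2;               /* `s#' and `z#' are one unit */
             return p + 1;
         }

         /* Nonzero if format is empty or consists of exactly one unit. */
         static int toy_format_ok(const char *format)
         {
             const char *end;
             if (*format == '\0')
                 return 1;
             end = toy_skip_unit(format);
             return end != NULL && *end == '\0';
         }
     \end{verbatim}

     With these definitions \code{toy_format_ok("((ii)s\#)")} is nonzero,
     while \samp{is} and \samp{(ii)s\#} are rejected because text remains
     after the first unit.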
 634
 635 The \code{getargs()} function does not support variable-length
 636 argument lists.  In simple cases you can fake these by trying several
 637 calls to
 638 \code{getargs()} until one succeeds, but you must take care to call
 639 \code{err_clear()} before each retry.  For example:
 640
 641 \begin{verbatim}
 642     static object *my_method(self, args) object *self, *args; {
 643         int i, j, k;
 644
 645         if (getargs(args, "(ii)", &i, &j)) {
 646             k = 0; /* Use default third argument */
 647         }
 648         else {
 649             err_clear();
 650             if (!getargs(args, "(iii)", &i, &j, &k))
 651                 return NULL;
 652         }
 653         /* ... use i, j and k here ... */
 654         INCREF(None);
 655         return None;
 656     }
 657 \end{verbatim}
 658
 659 (It is possible to think of an extension to the definition of format
 660 strings to accommodate this directly, e.g. placing a \samp{|} in a
 661 tuple might specify that the remaining arguments are optional.
 662 \code{getargs()} should then return one more than the number of
 663 variables stored into.)
 664
 665 Advanced users note: If you set the `varargs' flag in the method list
 666 for a function, the argument will always be a tuple (the `raw argument
 667 list').  In this case you must enclose single and empty argument lists
 668 in parentheses, e.g. \samp{(s)} and \samp{()}.
 669
 670
 671 \section{The {\tt mkvalue()} function}
 672
 673 This function is the counterpart to \code{getargs()}.  It is declared
 674 in \file{Include/modsupport.h} as follows:
 675
 676 \begin{verbatim}
 677     object *mkvalue(char *format, ...);
 678 \end{verbatim}
 679
 680 It supports exactly the same format letters as \code{getargs()}, but
 681 the arguments (which are input to the function, not output) must not
 682 be pointers, just values.  If a byte, short or float is passed to a
 683 varargs function, it is widened by the compiler to int or double, so
 684 \samp{b} and \samp{h} are treated as \samp{i} and \samp{f} is
 685 treated as \samp{d}.  \samp{S} is treated as \samp{O}, \samp{s} is
 686 treated as \samp{z}.  \samp{z\#} and \samp{s\#} are supported: a
 687 second argument specifies the length of the data (negative means use
 688 \code{strlen()}).  \samp{S} and \samp{O} add a reference to their
 689 argument (so you should \code{DECREF()} it if you've just created it
 690 and aren't going to use it again).
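
     The widening mentioned above is ordinary C behaviour for functions
     with a variable argument list and can be observed without Python.  In
     the following stand-alone sketch, \code{toy_sum()} is a made-up helper
     (it is not \code{mkvalue()} and builds no objects); it merely fetches
     its arguments according to format letters, reading \samp{b}, \samp{h}
     and \samp{i} as int and \samp{f} and \samp{d} as double:

     \begin{verbatim}
         #include <stdarg.h>

         /* Add up the numeric arguments described by format (hypothetical).
            Because of the default argument promotions, char and short
            arrive as int and float arrives as double, so they must be
            fetched that way. */
         static double toy_sum(const char *format, ...)
         {
             va_list ap;
             double total = 0.0;

             va_start(ap, format);
             for (; *format != '\0'; format++) {
                 switch (*format) {
                 case 'b':                   /* char, promoted to int */
                 case 'h':                   /* short, promoted to int */
                 case 'i':
                     total += va_arg(ap, int);
                     break;
                 case 'l':
                     total += va_arg(ap, long);
                     break;
                 case 'f':                   /* float, promoted to double */
                 case 'd':
                     total += va_arg(ap, double);
                     break;
                 default:
                     /* Unknown letter: argument type unknown, so stop. */
                     va_end(ap);
                     return total;
                 }
             }
             va_end(ap);
             return total;
         }
     \end{verbatim}

     For example, \code{toy_sum("bhf", (char) 1, (short) 2, 0.5f)} fetches
     two ints and a double and yields 3.5.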
 691
 692 If the argument for \samp{O} or \samp{S} is a \code{NULL} pointer, it is
  693 assumed that this happened because the call producing the argument
 694 found an error and set an exception.  Therefore, \code{mkvalue()} will
 695 return \code{NULL} but won't set an exception if one is already set.
 696 If no exception is set, \code{SystemError} is set.
 697
 698 If there is an error in the format string, the \code{SystemError}
 699 exception is set, since it is the calling C code's fault, not that of
 700 the Python user who sees the exception.
 701
 702 Example:
 703
 704 \begin{verbatim}
 705     return mkvalue("(ii)", 0, 0);
 706 \end{verbatim}
 707
 708 returns a tuple containing two zeros.  (Outer parentheses in the
 709 format string are actually superfluous, but you can use them for
 710 compatibility with \code{getargs()}, which requires them if more than
 711 one argument is expected.)
 712
 713
 714 \section{Reference counts}
 715
 716 Here's a useful explanation of \code{INCREF()} and \code{DECREF()}
 717 (after an original by Sjoerd Mullender).
 718
 719 Use \code{XINCREF()} or \code{XDECREF()} instead of \code{INCREF()} or
 720 \code{DECREF()} when the argument may be \code{NULL} --- the versions
  721 without \samp{X} are faster but will dump core when they encounter a
 722 \code{NULL} pointer.
 723
 724 The basic idea is: if you create an extra reference to an object, you
 725 must \code{INCREF()} it; if you throw away a reference to an object,
 726 you must \code{DECREF()} it.  Functions such as
 727 \code{newstringobject()}, \code{newsizedstringobject()},
 728 \code{newintobject()}, etc. create a reference to an object.  If you
 729 want to throw away the object thus created, you must use
 730 \code{DECREF()}.
 731
 732 If you put an object into a tuple or list using \code{settupleitem()}
 733 or \code{setlistitem()}, the idea is that you usually don't want to
 734 keep a reference of your own around, so Python does not
 735 \code{INCREF()} the elements.  It does \code{DECREF()} the old value.
 736 This means that if you put something into such an object using the
 737 functions Python provides for this, you must \code{INCREF()} the
 738 object if you also want to keep a separate reference to the object around.
 739 Also, if you replace an element, you should \code{INCREF()} the old
 740 element first if you want to keep it.  If you didn't \code{INCREF()}
 741 it before you replaced it, you are not allowed to look at it anymore,
 742 since it may have been freed.
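
The ownership convention for \code{settupleitem()} and
\code{setlistitem()} can be illustrated with a small self-contained
model.  The sketch below is not the Python API: the type
\code{toy_object} and the functions \code{toy_new()},
\code{toy_incref()}, \code{toy_decref()}, \code{toy_setitem()} and
\code{toy_demo()} are hypothetical stand-ins that only mimic the
counting rules described above.

```c
#include <stdlib.h>

/* Toy stand-in for a Python object: just a reference count. */
typedef struct {
    int refcnt;
} toy_object;

static int toy_live = 0;    /* number of toy objects not yet freed */

/* Like the new...object() functions: the caller gets one reference. */
toy_object *toy_new(void)
{
    toy_object *op = malloc(sizeof(toy_object));
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    toy_live++;
    return op;
}

void toy_incref(toy_object *op) { op->refcnt++; }

void toy_decref(toy_object *op)
{
    if (--op->refcnt == 0) {
        toy_live--;
        free(op);
    }
}

/* Like settupleitem(): takes over the caller's reference to item
   (no increment) and drops the reference to the old value, if any. */
void toy_setitem(toy_object **slot, toy_object *item)
{
    toy_object *old = *slot;
    *slot = item;
    if (old != NULL)
        toy_decref(old);
}

/* Walk through the cases from the text; returns the number of toy
   objects still alive at the end (0 means nothing leaked), or -1 if
   memory could not be allocated. */
int toy_demo(void)
{
    toy_object *slot = NULL;
    toy_object *a = toy_new();
    toy_object *b = toy_new();

    if (a == NULL || b == NULL) {
        if (a != NULL) toy_decref(a);
        if (b != NULL) toy_decref(b);
        return -1;
    }

    toy_incref(a);            /* we want to keep our own reference to a */
    toy_setitem(&slot, a);    /* the slot takes over one reference      */

    /* Replacing a with b drops the slot's reference to a; a survives
       only because of the extra toy_incref() above.  Our reference to
       b is given away here, so b must not be used once the slot has
       been emptied. */
    toy_setitem(&slot, b);
    toy_decref(a);            /* done with our own reference to a       */
    toy_setitem(&slot, NULL); /* emptying the slot frees b              */
    return toy_live;
}
```

In this model \code{toy_demo()} returns 0 when every object has been
released.  Leaving out the \code{toy_incref(a)} call would let the
replacement free \code{a} while the caller still holds a pointer to
it, which is exactly the situation the text warns about.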
 743
 744 Returning an object to Python (i.e. when your C function returns)
 745 creates a reference to an object, but it does not change the reference
 746 count.  When your code does not keep another reference to the object,
 747 you should not \code{INCREF()} or \code{DECREF()} it (assuming it is a
 748 newly created object).  When you do keep a reference around, you
 749 should \code{INCREF()} the object.  Also, when you return a global
 750 object such as \code{None}, you should \code{INCREF()} it.
 751
 752 If you want to return a tuple, you should consider using
 753 \code{mkvalue()}.  This function creates a new tuple with a reference
 754 count of 1 which you can return.  If any of the elements you put into
 755 the tuple are objects (format codes \samp{O} or \samp{S}), they
 756 are \code{INCREF()}'ed by \code{mkvalue()}.  If you don't want to keep
 757 references to those elements around, you should \code{DECREF()} them
 758 after having called \code{mkvalue()}.
 759
 760 Usually you don't have to worry about arguments.  They are
 761 \code{INCREF()}'ed before your function is called and
 762 \code{DECREF()}'ed after your function returns.  When you keep a
 763 reference to an argument, you should \code{INCREF()} it and
 764 \code{DECREF()} when you throw it away.  Also, when you return an
 765 argument, you should \code{INCREF()} it, because returning the
 766 argument creates an extra reference to it.
 767
 768 If you use \code{getargs()} to parse the arguments, you can get a
 769 reference to an object (by using \samp{O} in the format string).  This
 770 object was not \code{INCREF()}'ed, so you should not \code{DECREF()}
 771 it.  If you want to keep the object, you must \code{INCREF()} it
 772 yourself.
 773
 774 If you create your own type of objects, you should use \code{NEWOBJ()}
 775 to create the object.  This sets the reference count to 1.  If you
 776 want to throw away the object, you should use \code{DECREF()}.  When
 777 the reference count reaches zero, your type's \code{dealloc()}
 778 function is called.  In it, you should \code{DECREF()} all objects to
 779 which you keep references in your object, but you should not use
 780 \code{DECREF()} on your object.  You should use \code{DEL()} instead.
 781
 782
 783 \section{Writing extensions in C++}
 784
 785 It is possible to write extension modules in C++.  Some restrictions
 786 apply: since the main program (the Python interpreter) is compiled and
 787 linked by the C compiler, global or static objects with constructors
 788 cannot be used.  All functions that will be called directly or
 789 indirectly (i.e. via function pointers) by the Python interpreter will
 790 have to be declared using \code{extern "C"}; this applies to all
 791 `methods' as well as to the module's initialization function.
 792 It is unnecessary to enclose the Python header files in
 793 \code{extern "C" \{...\}} --- they do this already.
 794
 795
 796 \chapter{Embedding Python in another application}
 797
 798 Embedding Python is similar to extending it, but not quite.  The
 799 difference is that when you extend Python, the main program of the
 800 application is still the Python interpreter, while if you embed
 801 Python, the main program may have nothing to do with Python ---
 802 instead, some parts of the application occasionally call the Python
 803 interpreter to run some Python code.
 804
 805 So if you are embedding Python, you are providing your own main
 806 program.  One of the things this main program has to do is initialize
 807 the Python interpreter.  At the very least, you have to call the
 808 function \code{initall()}.  There are optional calls to pass command
 809 line arguments to Python.  Then later you can call the interpreter
 810 from any part of the application.
 811
 812 There are several different ways to call the interpreter: you can pass
 813 a string containing Python statements to \code{run_command()}, or you
 814 can pass a stdio file pointer and a file name (for identification in
 815 error messages only) to \code{run_script()}.  You can also call the
 816 lower-level operations described in the previous chapters to construct
 817 and use Python objects.
 818
 819 A simple demo of embedding Python can be found in the directory
 820 \file{Demo/embed}.
 821
 822
 823 \section{Embedding Python in C++}
 824
 825 It is also possible to embed Python in a C++ program; how this is done
 826 exactly will depend on the details of the C++ system used; in general
 827 you will need to write the main program in C++, and use the C++
 828 compiler to compile and link your program.  There is no need to
 829 recompile Python itself with C++.
 830
 831
 832 \chapter{Dynamic Loading}
 833
 834 On most modern systems it is possible to configure Python to support
 835 dynamic loading of extension modules implemented in C.  When shared
 836 libraries are used, dynamic loading is configured automatically;
 837 otherwise you have to select it as a build option (see below).  Once
 838 configured, dynamic loading is trivial to use: when a Python program
 839 executes \code{import foo}, the search for modules tries to find a
 840 file \file{foomodule.o} (\file{foomodule.so} when using shared
 841 libraries) in the module search path, and if one is found, it is
 842 loaded into the executing binary and executed.  Once loaded, the
 843 module acts just like a built-in extension module.
 844
 845 The advantages of dynamic loading are twofold: the `core' Python
 846 binary gets smaller, and users can extend Python with their own
 847 modules implemented in C without having to build and maintain their
 848 own copy of the Python interpreter.  There are also disadvantages:
 849 dynamic loading isn't available on all systems (this just means that
 850 on some systems you have to use static loading), and dynamically
 851 loading a module that was compiled for a different version of Python
 852 (e.g. with a different representation of objects) may dump core.
 853
 854
 855 \section{Configuring and building the interpreter for dynamic loading}
 856
 857 There are three styles of dynamic loading: one using shared libraries,
 858 one using SGI IRIX 4 dynamic loading, and one using GNU dynamic
 859 loading.
 860
 861 \subsection{Shared libraries}
 862
 863 The following systems support dynamic loading using shared libraries:
 864 SunOS 4; Solaris 2; SGI IRIX 5 (but not SGI IRIX 4!); and probably all
 865 systems derived from SVR4, or at least those SVR4 derivatives that
 866 support shared libraries (are there any that don't?).
 867
 868 You don't need to do anything to configure dynamic loading on these
 869 systems --- the \file{configure} script detects the presence of the
 870 \file{<dlfcn.h>} header file and automatically configures dynamic
 871 loading.
 872
 873 \subsection{SGI dynamic loading}
 874
 875 Only SGI IRIX 4 supports dynamic loading of modules using SGI dynamic
 876 loading.  (SGI IRIX 5 might also support it but it is inferior to
 877 using shared libraries so there is no reason to; a small test didn't
 878 work right away so I gave up trying to support it.)
 879
 880 Before you build Python, you first need to fetch and build the \code{dl}
 881 package written by Jack Jansen.  This is available by anonymous ftp
 882 from host \file{ftp.cwi.nl}, directory \file{pub/dynload}, file
 883 \file{dl-1.6.tar.Z}.  (The version number may change.)  Follow the
 884 instructions in the package's \file{README} file to build it.
 885
 886 Once you have built \code{dl}, you can configure Python to use it.  To
 887 this end, you run the \file{configure} script with the option
 888 \code{--with-dl=\var{directory}} where \var{directory} is the absolute
 889 pathname of the \code{dl} directory.
 890
 891 Now build and install Python as you normally would (see the
 892 \file{README} file in the toplevel Python directory).
 893
 894 \subsection{GNU dynamic loading}
 895
 896 GNU dynamic loading supports (according to its \file{README} file) the
 897 following hardware and software combinations: VAX (Ultrix), Sun 3
 898 (SunOS 3.4 and 4.0), Sparc (SunOS 4.0), Sequent Symmetry (Dynix), and
 899 Atari ST.  There is no reason to use it on a Sparc; I haven't seen a
 900 Sun 3 for years so I don't know if these have shared libraries or not.
 901
 902 You need to fetch and build two packages.  One is GNU DLD 3.2.3,
 903 available by anonymous ftp from host \file{ftp.cwi.nl}, directory
 904 \file{pub/dynload}, file \file{dld-3.2.3.tar.Z}.  (As far as I know,
 905 no further development on GNU DLD is being done.)  The other is an
 906 emulation of Jack Jansen's \code{dl} package that I wrote on top of
 907 GNU DLD 3.2.3.  This is available from the same host and directory,
 908 file \file{dl-dld-1.1.tar.Z}.  (The version number may change, but I doubt
 909 it will.)  Follow the instructions in each package's \file{README}
 910 file to configure and build them.
 911
 912 Now configure Python.  Run the \file{configure} script with the option
 913 \code{--with-dl-dld=\var{dl-directory},\var{dld-directory}} where
 914 \var{dl-directory} is the absolute pathname of the directory where you
 915 have built the \file{dl-dld} package, and \var{dld-directory} is that
 916 of the GNU DLD package.  The Python interpreter you build hereafter
 917 will support GNU dynamic loading.
 918
 919
 920 \section{Building a dynamically loadable module}
 921
 922 Since there are three styles of dynamic loading, there are also three
 923 groups of instructions for building a dynamically loadable module.
 924 Instructions common to all three styles are given first.  Assuming
 925 your module is called \code{foo}, the source filename must be
 926 \file{foomodule.c}, so the object name is \file{foomodule.o}.  The
 927 module must be written as a normal Python extension module (as
 928 described earlier).
 929
 930 Note that in all cases you will have to create your own Makefile that
 931 compiles your module file(s).  This Makefile will have to pass two
 932 \samp{-I} arguments to the C compiler which will make it find the
 933 Python header files.  If the Make variable \var{PYTHONTOP} points to
 934 the toplevel Python directory, your \var{CFLAGS} Make variable should
 935 contain the options \samp{-I\$(PYTHONTOP) -I\$(PYTHONTOP)/Include}.
 936 (Most header files are in the \file{Include} subdirectory, but the
 937 \file{config.h} header lives in the toplevel directory.)  You must
 938 also add \samp{-DHAVE_CONFIG_H} to the definition of \var{CFLAGS} to
 939 direct the Python headers to include \file{config.h}.
 940
 941
 942 \subsection{Shared libraries}
 943
 944 You must link the \samp{.o} file to produce a shared library.  This is
 945 done using a special invocation of the \UNIX{} loader/linker, {\em
 946 ld}(1).  Unfortunately the invocation differs slightly per system.
 947
 948 On SunOS 4, use
 949 \begin{verbatim}
 950     ld foomodule.o -o foomodule.so
 951 \end{verbatim}
 952
 953 On Solaris 2, use
 954 \begin{verbatim}
 955     ld -G foomodule.o -o foomodule.so
 956 \end{verbatim}
 957
 958 On SGI IRIX 5, use
 959 \begin{verbatim}
 960     ld -shared foomodule.o -o foomodule.so
 961 \end{verbatim}
 962
 963 On other systems, consult the manual page for {\em ld}(1) to find what
 964 flags, if any, must be used.
 965
 966 If your extension module uses system libraries that haven't already
 967 been linked with Python (e.g. a windowing system), these must be
 968 passed to the {\em ld} command as \samp{-l} options after the
 969 \samp{.o} file.
 970
 971 The resulting file \file{foomodule.so} must be copied into a directory
 972 along the Python module search path.
 973
 974
 975 \subsection{SGI dynamic loading}
 976
 977 {\bf IMPORTANT:} You must compile your extension module with the
 978 additional C flag \samp{-G0} (or \samp{-G 0}).  This instructs the
 979 assembler to generate position-independent code.
 980
 981 You don't need to link the resulting \file{foomodule.o} file; just
 982 copy it into a directory along the Python module search path.
 983
 984 The first time your extension is loaded, it takes some extra time and
 985 a few messages may be printed.  This creates a file
 986 \file{foomodule.ld} which is an image that can be loaded quickly into
 987 the Python interpreter process.  When a new Python interpreter is
 988 installed, the \code{dl} package detects this and rebuilds
 989 \file{foomodule.ld}.  The file \file{foomodule.ld} is placed in the
 990 directory where \file{foomodule.o} was found, unless this directory is
 991 unwritable; in that case it is placed in a temporary
 992 directory.\footnote{Check the manual page of the \code{dl} package for
 993 details.}
 994
 995 If your extension module uses additional system libraries, you must
 996 create a file \file{foomodule.libs} in the same directory as the
 997 \file{foomodule.o}.  This file should contain one or more lines with
 998 whitespace-separated options that will be passed to the linker ---
 999 normally only \samp{-l} options or absolute pathnames of libraries
1000 (\samp{.a} files) should be used.
1001
1002
1003 \subsection{GNU dynamic loading}
1004
1005 Just copy \file{foomodule.o} into a directory along the Python module
1006 search path.
1007
1008 If your extension module uses additional system libraries, you must
1009 create a file \file{foomodule.libs} in the same directory as the
1010 \file{foomodule.o}.  This file should contain one or more lines with
1011 whitespace-separated absolute pathnames of libraries (\samp{.a}
1012 files).  No \samp{-l} options can be used.
1013
1014
1015 \input{ext.ind}
1016
1017 \end{document}