Doc/ext.tex

   1 \documentstyle[twoside,11pt,myformat]{report}
   2
   3 % XXX PM Modulator
   4
   5 \title{Extending and Embedding the Python Interpreter}
   6
   7 \input{boilerplate}
   8
   9 % Tell \index to actually write the .idx file
  10 \makeindex
  11
  12 \begin{document}
  13
  14 \pagenumbering{roman}
  15
  16 \maketitle
  17
  18 \input{copyright}
  19
  20 \begin{abstract}
  21
  22 \noindent
  23 Python is an interpreted, object-oriented programming language.  This
  24 document describes how to write modules in C or \Cpp{} to extend the
  25 Python interpreter with new modules.  Those modules can define new
  26 functions but also new object types and their methods.  The document
  27 also describes how to embed the Python interpreter in another
  28 application, for use as an extension language.  Finally, it shows how
  29 to compile and link extension modules so that they can be loaded
  30 dynamically (at run time) into the interpreter, if the underlying
  31 operating system supports this feature.
  32
  33 This document assumes basic knowledge about Python.  For an informal
  34 introduction to the language, see the Python Tutorial.  The Python
  35 Reference Manual gives a more formal definition of the language.  The
  36 Python Library Reference documents the existing object types,
  37 functions and modules (both built-in and written in Python) that give
  38 the language its wide application range.
  39
  40 \end{abstract}
  41
  42 \pagebreak
  43
  44 {
  45 \parskip = 0mm
  46 \tableofcontents
  47 }
  48
  49 \pagebreak
  50
  51 \pagenumbering{arabic}
  52
  53
  54 \chapter{Extending Python with C or \Cpp{} code}
  55
  56
  57 \section{Introduction}
  58
  59 It is quite easy to add new built-in modules to Python, if you know
  60 how to program in C.  Such \dfn{extension modules} can do two things
  61 that can't be done directly in Python: they can implement new built-in
  62 object types, and they can call C library functions and system calls.
  63
  64 To support extensions, the Python API (Application Programmers
  65 Interface) defines a set of functions, macros and variables that
  66 provide access to most aspects of the Python run-time system.  The
  67 Python API is incorporated in a C source file by including the header
  68 \code{"Python.h"}.
  69
  70 The compilation of an extension module depends on its intended use as
  71 well as on your system setup; details are given in a later section.
  72
  73
  74 \section{A Simple Example}
  75
  76 Let's create an extension module called \samp{spam} (the favorite food
  77 of Monty Python fans...) and let's say we want to create a Python
  78 interface to the C library function \code{system()}.\footnote{An
  79 interface for this function already exists in the standard module
  80 \code{os}; it was chosen as a simple and straightforward example.}
  81 This function takes a null-terminated character string as argument and
  82 returns an integer.  We want this function to be callable from Python
  83 as follows:
  84
  85 \begin{verbatim}
  86     >>> import spam
  87     >>> status = spam.system("ls -l")
  88 \end{verbatim}
  89
  90 Begin by creating a file \file{spammodule.c}.  (In general, if a
  91 module is called \samp{spam}, the C file containing its implementation
  92 is called \file{spammodule.c}; if the module name is very long, like
  93 \samp{spammify}, the file name can be just \file{spammify.c}.)
  94
  95 The first line of our file can be:
  96
  97 \begin{verbatim}
  98     #include "Python.h"
  99 \end{verbatim}
 100
 101 which pulls in the Python API (you can add a comment describing the
 102 purpose of the module and a copyright notice if you like).
 103
 104 All user-visible symbols defined by \code{"Python.h"} have a prefix of
 105 \samp{Py} or \samp{PY}, except those defined in standard header files.
 106 For convenience, and since they are used extensively by the Python
 107 interpreter, \code{"Python.h"} includes a few standard header files:
 108 \code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>}, and
 109 \code{<stdlib.h>}.  If the latter header file does not exist on your
 110 system, it declares the functions \code{malloc()}, \code{free()} and
 111 \code{realloc()} directly.
 112
 113 The next thing we add to our module file is the C function that will
 114 be called when the Python expression \samp{spam.system(\var{string})}
 115 is evaluated (we'll see shortly how it ends up being called):
 116
 117 \begin{verbatim}
 118     static PyObject *
 119     spam_system(self, args)
 120         PyObject *self;
 121         PyObject *args;
 122     {
 123         char *command;
 124         int sts;
 125         if (!PyArg_ParseTuple(args, "s", &command))
 126             return NULL;
 127         sts = system(command);
 128         return Py_BuildValue("i", sts);
 129     }
 130 \end{verbatim}
 131
 132 There is a straightforward translation from the argument list in
 133 Python (e.g.\ the single expression \code{"ls -l"}) to the arguments
 134 passed to the C function.  The C function always has two arguments,
 135 conventionally named \var{self} and \var{args}.
 136
 137 The \var{self} argument is only used when the C function implements a
 138 built-in method.  This will be discussed later.  In the example,
 139 \var{self} will always be a \code{NULL} pointer, since we are defining
 140 a function, not a method.  (This is done so that the interpreter
 141 doesn't have to understand two different types of C functions.)
 142
 143 The \var{args} argument will be a pointer to a Python tuple object
 144 containing the arguments.  Each item of the tuple corresponds to an
 145 argument in the call's argument list.  The arguments are Python
 146 objects -- in order to do anything with them in our C function we have
 147 to convert them to C values.  The function \code{PyArg_ParseTuple()}
 148 in the Python API checks the argument types and converts them to C
 149 values.  It uses a template string to determine the required types of
 150 the arguments as well as the types of the C variables into which to
 151 store the converted values.  More about this later.
 152
 153 \code{PyArg_ParseTuple()} returns true (nonzero) if all arguments have
 154 the right type and the converted values have been stored in the variables
 155 whose addresses are passed.  It returns false (zero) if an invalid
 156 argument list was passed.  In the latter case it also raises an
 157 appropriate exception, so the calling function can return
 158 \code{NULL} immediately (as we saw in the example).
 159
 160
 161 \section{Intermezzo: Errors and Exceptions}
 162
 163 An important convention throughout the Python interpreter is the
 164 following: when a function fails, it should set an exception condition
 165 and return an error value (usually a \code{NULL} pointer).  Exceptions
 166 are stored in a static global variable inside the interpreter; if this
 167 variable is \code{NULL} no exception has occurred.  A second global
 168 variable stores the ``associated value'' of the exception (the second
 169 argument to \code{raise}).  A third variable contains the stack
 170 traceback in case the error originated in Python code.  These three
 171 variables are the C equivalents of the Python variables
 172 \code{sys.exc_type}, \code{sys.exc_value} and \code{sys.exc_traceback}
 173 (see the section on module \code{sys} in the Library Reference
 174 Manual).  It is important to know about them to understand how errors
 175 are passed around.
 176
 177 The Python API defines a number of functions to set various types of
 178 exceptions.
 179
 180 The most common one is \code{PyErr_SetString()}.  Its arguments are an
 181 exception object and a C string.  The exception object is usually a
 182 predefined object like \code{PyExc_ZeroDivisionError}.  The C string
 183 indicates the cause of the error and is converted to a Python string
 184 object and stored as the ``associated value'' of the exception.
 185
 186 Another useful function is \code{PyErr_SetFromErrno()}, which only
 187 takes an exception argument and constructs the associated value by
 188 inspection of the (\UNIX{}) global variable \code{errno}.  The most
 189 general function is \code{PyErr_SetObject()}, which takes two object
 190 arguments, the exception and its associated value.  You don't need to
 191 \code{Py_INCREF()} the objects passed to any of these functions.
 192
 193 You can test non-destructively whether an exception has been set with
 194 \code{PyErr_Occurred()}.  This returns the current exception object,
 195 or \code{NULL} if no exception has occurred.  You normally don't need
 196 to call \code{PyErr_Occurred()} to see whether an error occurred in a
 197 function call, since you should be able to tell from the return value.
 198
 199 When a function \var{f} that calls another function \var{g} detects
 200 that the latter fails, \var{f} should itself return an error value
 201 (e.g.\ \code{NULL} or \code{-1}).  It should \emph{not} call one of the
 202 \code{PyErr_*()} functions --- one has already been called by \var{g}.
 203 \var{f}'s caller is then supposed to also return an error indication
 204 to \emph{its} caller, again \emph{without} calling \code{PyErr_*()},
 205 and so on --- the most detailed cause of the error was already
 206 reported by the function that first detected it.  Once the error
 207 reaches the Python interpreter's main loop, this aborts the currently
 208 executing Python code and tries to find an exception handler specified
 209 by the Python programmer.
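
The propagation rule above can be sketched with a toy model in plain C.
This is an illustration only: it does not use \code{"Python.h"}, and the
names \code{toy_err_set()}, \code{toy_err_occurred()} and
\code{toy_err_clear()} are hypothetical stand-ins for the
\code{PyErr_*()} family, with a pair of static variables playing the
role of the interpreter's exception variables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the interpreter's error indicator (NOT the real API). */
static const char *err_type = NULL;   /* NULL means "no exception set" */
static const char *err_value = NULL;  /* the "associated value" */

static void toy_err_set(const char *type, const char *value)
{
    assert(type != NULL);
    err_type = type;
    err_value = value;
}

/* Non-destructive test, in the spirit of PyErr_Occurred(). */
static const char *toy_err_occurred(void) { return err_type; }
static const char *toy_err_value(void) { return err_value; }

static void toy_err_clear(void)
{
    err_type = NULL;
    err_value = NULL;
}

/* g detects the failure first: it sets the exception and returns -1. */
static int g(int divisor, int *out)
{
    if (divisor == 0) {
        toy_err_set("ZeroDivisionError", "division by zero");
        return -1;
    }
    *out = 100 / divisor;
    return 0;
}

/* f only propagates: it returns an error value and sets nothing itself. */
static int f(int divisor, int *out)
{
    int quotient;
    if (g(divisor, &quotient) < 0)
        return -1;              /* exception already set by g */
    *out = quotient + 1;
    return 0;
}
```

A caller of \code{f()} that wants to handle the failure itself would
inspect the indicator and then call \code{toy_err_clear()}; a caller
that cannot handle it simply returns its own error value in turn.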
 210
 211 (There are situations where a module can actually give a more detailed
 212 error message by calling another \code{PyErr_*()} function, and in
 213 such cases it is fine to do so.  As a general rule, however, this is
 214 not necessary, and can cause information about the cause of the error
 215 to be lost: most operations can fail for a variety of reasons.)
 216
 217 To ignore an exception set by a function call that failed, the exception
 218 condition must be cleared explicitly by calling \code{PyErr_Clear()}.
 219 The only time C code should call \code{PyErr_Clear()} is if it doesn't
 220 want to pass the error on to the interpreter but wants to handle it
 221 completely by itself (e.g. by trying something else or pretending
 222 nothing happened).
 223
 224 Note that a failing \code{malloc()} call must be turned into an
 225 exception --- the direct caller of \code{malloc()} (or
 226 \code{realloc()}) must call \code{PyErr_NoMemory()} and return a
 227 failure indicator itself.  All the object-creating functions
 228 (\code{PyInt_FromLong()} etc.) already do this, so this note only
 229 matters if you call \code{malloc()} directly.
 230
 231 Also note that, with the important exception of
 232 \code{PyArg_ParseTuple()} and friends, functions that return an
 233 integer status usually return a positive value or zero for success and
 234 \code{-1} for failure, like \UNIX{} system calls.
 235
 236 Finally, be careful to clean up garbage (by making \code{Py_XDECREF()}
 237 or \code{Py_DECREF()} calls for objects you have already created) when
 238 you return an error indicator!
 239
 240 The choice of which exception to raise is entirely yours.  There are
 241 predeclared C objects corresponding to all built-in Python exceptions,
 242 e.g.\ \code{PyExc_ZeroDivisionError}, which you can use directly.  Of
 243 course, you should choose exceptions wisely --- don't use
 244 \code{PyExc_TypeError} to mean that a file couldn't be opened (that
 245 should probably be \code{PyExc_IOError}).  If something's wrong with
 246 the argument list, the \code{PyArg_ParseTuple()} function usually
 247 raises \code{PyExc_TypeError}.  If you have an argument whose value
 248 must be in a particular range or must satisfy other conditions,
 249 \code{PyExc_ValueError} is appropriate.
 250
 251 You can also define a new exception that is unique to your module.
 252 For this, you usually declare a static object variable at the
 253 beginning of your file, e.g.
 254
 255 \begin{verbatim}
 256     static PyObject *SpamError;
 257 \end{verbatim}
 258
 259 and initialize it in your module's initialization function
 260 (\code{initspam()}) with a string object, e.g. (leaving out the error
 261 checking for now):
 262
 263 \begin{verbatim}
 264     void
 265     initspam()
 266     {
 267         PyObject *m, *d;
 268         m = Py_InitModule("spam", SpamMethods);
 269         d = PyModule_GetDict(m);
 270         SpamError = PyString_FromString("spam.error");
 271         PyDict_SetItemString(d, "error", SpamError);
 272     }
 273 \end{verbatim}
 274
 275 Note that the Python name for the exception object is
 276 \code{spam.error}.  It is conventional for module and exception names
 277 to be spelled in lower case.  It is also conventional that the
 278 \emph{value} of the exception object is the same as its name, e.g.\
 279 the string \code{"spam.error"}.
 280
 281
 282 \section{Back to the Example}
 283
 284 Going back to our example function, you should now be able to
 285 understand this statement:
 286
 287 \begin{verbatim}
 288         if (!PyArg_ParseTuple(args, "s", &command))
 289             return NULL;
 290 \end{verbatim}
 291
 292 It returns \code{NULL} (the error indicator for functions returning
 293 object pointers) if an error is detected in the argument list, relying
 294 on the exception set by \code{PyArg_ParseTuple()}.  Otherwise the
 295 string value of the argument has been copied to the local variable
 296 \code{command}.  This is a pointer assignment and you are not supposed
 297 to modify the string to which it points (so in Standard C, the variable
 298 \code{command} should properly be declared as \samp{const char
 299 *command}).
 300
 301 The next statement is a call to the \UNIX{} function \code{system()},
 302 passing it the string we just got from \code{PyArg_ParseTuple()}:
 303
 304 \begin{verbatim}
 305         sts = system(command);
 306 \end{verbatim}
 307
 308 Our \code{spam.system()} function must return the value of \code{sts}
 309 as a Python object.  This is done using the function
 310 \code{Py_BuildValue()}, which is something like the inverse of
 311 \code{PyArg_ParseTuple()}: it takes a format string and an arbitrary
 312 number of C values, and returns a new Python object.  More info on
 313 \code{Py_BuildValue()} is given later.
 314
 315 \begin{verbatim}
 316         return Py_BuildValue("i", sts);
 317 \end{verbatim}
 318
 319 In this case, it will return an integer object.  (Yes, even integers
 320 are objects on the heap in Python!)
 321
 322 If you have a C function that returns no useful value (a function
 323 returning \code{void}), the corresponding Python function must return
 324 \code{None}.   You need this idiom to do so:
 325
 326 \begin{verbatim}
 327         Py_INCREF(Py_None);
 328         return Py_None;
 329 \end{verbatim}
 330
 331 \code{Py_None} is the C name for the special Python object
 332 \code{None}.  It is a genuine Python object (not a \code{NULL}
 333 pointer, which means ``error'' in most contexts, as we have seen).
 334
 335
 336 \section{The Module's Method Table and Initialization Function}
 337
 338 I promised to show how \code{spam_system()} is called from Python
 339 programs.  First, we need to list its name and address in a ``method
 340 table'':
 341
 342 \begin{verbatim}
 343     static PyMethodDef SpamMethods[] = {
 344         ...
 345         {"system",  spam_system, 1},
 346         ...
 347         {NULL,      NULL}        /* Sentinel */
 348     };
 349 \end{verbatim}
 350
 351 Note the third entry (\samp{1}).  This is a flag telling the
 352 interpreter the calling convention to be used for the C function.  It
 353 should normally always be \samp{1}; a value of \samp{0} means that an
 354 obsolete variant of \code{PyArg_ParseTuple()} is used.
 355
 356 The method table must be passed to the interpreter in the module's
 357 initialization function (which should be the only non-\code{static}
 358 item defined in the module file):
 359
 360 \begin{verbatim}
 361     void
 362     initspam()
 363     {
 364         (void) Py_InitModule("spam", SpamMethods);
 365     }
 366 \end{verbatim}
 367
 368 When the Python program imports module \code{spam} for the first time,
 369 \code{initspam()} is called.  It calls \code{Py_InitModule()}, which
 370 creates a ``module object'' (which is inserted in the dictionary
 371 \code{sys.modules} under the key \code{"spam"}), and inserts built-in
 372 function objects into the newly created module based upon the table
 373 (an array of \code{PyMethodDef} structures) that was passed as its
 374 second argument.  \code{Py_InitModule()} returns a pointer to the
 375 module object that it creates (which is unused here).  It aborts with
 376 a fatal error if the module could not be initialized satisfactorily,
 377 so the caller doesn't need to check for errors.
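
To make the role of the sentinel entry concrete, here is a toy model of
a name-to-function table in plain C.  It is only a sketch: the types
\code{ToyMethodDef} and \code{ToyFunction} and the function
\code{toy_find_method()} are hypothetical and much simpler than the
real \code{PyMethodDef} machinery, which builds function objects rather
than returning raw function pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified signature; real methods take (self, args). */
typedef int (*ToyFunction)(const char *arg);

typedef struct {
    const char *name;   /* name visible to the caller */
    ToyFunction func;   /* address of the C implementation */
    int flags;          /* calling-convention flag, unused in this toy */
} ToyMethodDef;

static int toy_length(const char *arg)
{
    return (int) strlen(arg);
}

static ToyMethodDef ToyMethods[] = {
    {"length", toy_length, 1},
    {NULL,     NULL,       0}   /* Sentinel */
};

/* Walk the table until the sentinel (name == NULL) is reached. */
static ToyFunction toy_find_method(const ToyMethodDef *table, const char *name)
{
    const ToyMethodDef *def;

    assert(table != NULL && name != NULL);
    for (def = table; def->name != NULL; def++) {
        if (strcmp(def->name, name) == 0)
            return def->func;
    }
    return NULL;   /* no such method */
}
```

Because the loop stops at the first entry whose name is \code{NULL}, the
table needs no separate length field; forgetting the sentinel would let
the loop run past the end of the array.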
 378
 379
 380 \section{Compilation and Linkage}
 381
 382 There are two more things to do before you can use your new extension:
 383 compiling and linking it with the Python system.  If you use dynamic
 384 loading, the details depend on the style of dynamic loading your
 385 system uses; see the chapter on Dynamic Loading for more info about
 386 this.
 387
 388 If you can't use dynamic loading, or if you want to make your module a
 389 permanent part of the Python interpreter, you will have to change the
 390 configuration setup and rebuild the interpreter.  Luckily, this is
 391 very simple: just place your file (\file{spammodule.c} for example) in
 392 the \file{Modules} directory, add a line to the file
 393 \file{Modules/Setup} describing your file:
 394
 395 \begin{verbatim}
 396     spam spammodule.o
 397 \end{verbatim}
 398
 399 and rebuild the interpreter by running \code{make} in the toplevel
 400 directory.  You can also run \code{make} in the \file{Modules}
 401 subdirectory, but then you must first rebuild the \file{Makefile}
 402 there by running \code{make Makefile}.  (This is necessary each time
 403 you change the \file{Setup} file.)
 404
 405 If your module requires additional libraries to link with, these can
 406 be listed on the line in the \file{Setup} file as well, for instance:
 407
 408 \begin{verbatim}
 409     spam spammodule.o -lX11
 410 \end{verbatim}
 411
 412
 413 \section{Calling Python Functions From C}
 414
 415 So far we have concentrated on making C functions callable from
 416 Python.  The reverse is also useful: calling Python functions from C.
 417 This is especially the case for libraries that support so-called
 418 ``callback'' functions.  If a C interface makes use of callbacks, the
 419 equivalent Python interface often needs to provide a callback mechanism to the
 420 Python programmer; the implementation will require calling the Python
 421 callback functions from a C callback.  Other uses are also imaginable.
 422
 423 Fortunately, the Python interpreter is easily called recursively, and
 424 there is a standard interface to call a Python function.  (I won't
 425 dwell on how to call the Python parser with a particular string as
 426 input --- if you're interested, have a look at the implementation of
 427 the \samp{-c} command line option in \file{Python/pythonmain.c}.)
 428
 429 Calling a Python function is easy.  First, the Python program must
 430 somehow pass you the Python function object.  You should provide a
 431 function (or some other interface) to do this.  When this function is
 432 called, save a pointer to the Python function object (be careful to
 433 \code{Py_INCREF()} it!) in a global variable, or wherever you see fit.
 434 For example, the following function might be part of a module
 435 definition:
 436
 437 \begin{verbatim}
 438     static PyObject *my_callback = NULL;
 439
 440     static PyObject *
 441     my_set_callback(dummy, arg)
 442         PyObject *dummy, *arg;
 443     {
 444         Py_XDECREF(my_callback); /* Dispose of previous callback */
 445         Py_XINCREF(arg); /* Add a reference to new callback */
 446         my_callback = arg; /* Remember new callback */
 447         /* Boilerplate to return "None" */
 448         Py_INCREF(Py_None);
 449         return Py_None;
 450     }
 451 \end{verbatim}
 452
 453 The macros \code{Py_XINCREF()} and \code{Py_XDECREF()} increment/decrement
 454 the reference count of an object and are safe in the presence of
 455 \code{NULL} pointers.  More info on them in the section on Reference
 456 Counts below.
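
The bookkeeping done when a callback is replaced can be traced with a
toy reference-counted object in plain C.  This is a sketch under stated
assumptions, not the real implementation: \code{ToyObject},
\code{TOY_XINCREF()}, \code{TOY_XDECREF()} and \code{toy_set_callback()}
are hypothetical names, and the toy never frees anything.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object carrying only a reference count; a stand-in, not PyObject. */
typedef struct {
    int refcnt;
} ToyObject;

/* NULL-tolerant variants, mirroring the behaviour described above. */
#define TOY_XINCREF(op) do { if ((op) != NULL) (op)->refcnt++; } while (0)
#define TOY_XDECREF(op) do { if ((op) != NULL) (op)->refcnt--; } while (0)

static ToyObject *toy_callback = NULL;

static void toy_set_callback(ToyObject *arg)
{
    assert(arg == NULL || arg->refcnt > 0);
    TOY_XINCREF(arg);            /* take a reference to the new callback */
    TOY_XDECREF(toy_callback);   /* release the previous one, if any */
    toy_callback = arg;          /* remember the new callback */
}
```

This sketch increments before it decrements, so that registering the
object that is already stored never lets its count drop in between; that
ordering is a choice made for this illustration.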
 457
 458 Later, when it is time to call the function, you call the C function
 459 \code{PyEval_CallObject()}.  This function has two arguments, both
 460 pointers to arbitrary Python objects: the Python function, and the
 461 argument list.  The argument list must always be a tuple object, whose
 462 length is the number of arguments.  To call the Python function with
 463 no arguments, pass an empty tuple; to call it with one argument, pass
 464 a singleton tuple.  \code{Py_BuildValue()} returns a tuple when its
 465 format string consists of zero or more format codes between
 466 parentheses.  For example:
 467
 468 \begin{verbatim}
 469     int arg;
 470     PyObject *arglist;
 471     PyObject *result;
 472     ...
 473     arg = 123;
 474     ...
 475     /* Time to call the callback */
 476     arglist = Py_BuildValue("(i)", arg);
 477     result = PyEval_CallObject(my_callback, arglist);
 478     Py_DECREF(arglist);
 479 \end{verbatim}
 480
 481 \code{PyEval_CallObject()} returns a Python object pointer: this is
 482 the return value of the Python function.  \code{PyEval_CallObject()} is
 483 ``reference-count-neutral'' with respect to its arguments.  In the
 484 example a new tuple was created to serve as the argument list, which
 485 is \code{Py_DECREF()}-ed immediately after the call.
 486
 487 The return value of \code{PyEval_CallObject()} is ``new'': either it
 488 is a brand new object, or it is an existing object whose reference
 489 count has been incremented.  So, unless you want to save it in a
 490 global variable, you should somehow \code{Py_DECREF()} the result,
 491 even (especially!) if you are not interested in its value.
 492
 493 Before you do this, however, it is important to check that the return
 494 value isn't \code{NULL}.  If it is, the Python function terminated by raising
 495 an exception.  If the C code that called \code{PyEval_CallObject()} is
 496 called from Python, it should now return an error indication to its
 497 Python caller, so the interpreter can print a stack trace, or the
 498 calling Python code can handle the exception.  If this is not possible
 499 or desirable, the exception should be cleared by calling
 500 \code{PyErr_Clear()}.  For example:
 501
 502 \begin{verbatim}
 503     if (result == NULL)
 504         return NULL; /* Pass error back */
 505     ...use result...
 506     Py_DECREF(result);
 507 \end{verbatim}
 508
 509 Depending on the desired interface to the Python callback function,
 510 you may also have to provide an argument list to \code{PyEval_CallObject()}.
 511 In some cases the argument list is also provided by the Python
 512 program, through the same interface that specified the callback
 513 function.  It can then be saved and used in the same manner as the
 514 function object.  In other cases, you may have to construct a new
 515 tuple to pass as the argument list.  The simplest way to do this is to
 516 call \code{Py_BuildValue()}.  For example, if you want to pass an integral
 517 event code, you might use the following code:
 518
 519 \begin{verbatim}
 520     PyObject *arglist;
 521     ...
 522     arglist = Py_BuildValue("(l)", eventcode);
 523     result = PyEval_CallObject(my_callback, arglist);
 524     Py_DECREF(arglist);
 525     if (result == NULL)
 526         return NULL; /* Pass error back */
 527     /* Here maybe use the result */
 528     Py_DECREF(result);
 529 \end{verbatim}
 530
 531 Note the placement of \code{Py_DECREF(arglist)} immediately after the call,
 532 before the error check!  Also note that strictly speaking this code is
 533 not complete: \code{Py_BuildValue()} may run out of memory, and this should
 534 be checked.
 535
 536
 537 \section{Format Strings for {\tt PyArg_ParseTuple()}}
 538
 539 The \code{PyArg_ParseTuple()} function is declared as follows:
 540
 541 \begin{verbatim}
 542     int PyArg_ParseTuple(PyObject *arg, char *format, ...);
 543 \end{verbatim}
 544
 545 The \var{arg} argument must be a tuple object containing an argument
 546 list passed from Python to a C function.  The \var{format} argument
 547 must be a format string, whose syntax is explained below.  The
 548 remaining arguments must be addresses of variables whose type is
 549 determined by the format string.  For the conversion to succeed, the
 550 \var{arg} object must match the format and the format must be
 551 exhausted.
 552
 553 Note that while \code{PyArg_ParseTuple()} checks that the Python
 554 arguments have the required types, it cannot check the validity of the
 555 addresses of C variables passed to the call: if you make mistakes
 556 there, your code will probably crash or at least overwrite random bits
 557 in memory.  So be careful!
 558
 559 A format string consists of zero or more ``format units''.  A format
 560 unit describes one Python object; it is usually a single character or
 561 a parenthesized sequence of format units.  With a few exceptions, a
 562 format unit that is not a parenthesized sequence normally corresponds
 563 to a single address argument to \code{PyArg_ParseTuple()}.  In the
 564 following description, the quoted form is the format unit; the entry
 565 in (round) parentheses is the Python object type that matches the
 566 format unit; and the entry in [square] brackets is the type of the C
 567 variable(s) whose address should be passed.  (Use the \samp{\&}
 568 operator to pass a variable's address.)
 569
 570 \begin{description}
 571
 572 \item[\samp{s} (string) {[char *]}]
 573 Convert a Python string to a C pointer to a character string.  You
 574 must not provide storage for the string itself; a pointer to an
 575 existing string is stored into the character pointer variable whose
 576 address you pass.  The C string is null-terminated.  The Python string
 577 must not contain embedded null bytes; if it does, a \code{TypeError}
 578 exception is raised.
 579
 580 \item[\samp{s\#} (string) {[char *, int]}]
 581 This variant on \code{'s'} stores into two C variables, the first one
 582 a pointer to a character string, the second one its length.  In this
 583 case the Python string may contain embedded null bytes.
 584
 585 \item[\samp{z} (string or \code{None}) {[char *]}]
 586 Like \samp{s}, but the Python object may also be \code{None}, in which
 587 case the C pointer is set to \code{NULL}.
 588
 589 \item[\samp{z\#} (string or \code{None}) {[char *, int]}]
 590 This is to \code{'s\#'} as \code{'z'} is to \code{'s'}.
 591
 592 \item[\samp{b} (integer) {[char]}]
 593 Convert a Python integer to a tiny int, stored in a C \code{char}.
 594
 595 \item[\samp{h} (integer) {[short int]}]
 596 Convert a Python integer to a C \code{short int}.
 597
 598 \item[\samp{i} (integer) {[int]}]
 599 Convert a Python integer to a plain C \code{int}.
 600
 601 \item[\samp{l} (integer) {[long int]}]
 602 Convert a Python integer to a C \code{long int}.
 603
 604 \item[\samp{c} (string of length 1) {[char]}]
 605 Convert a Python character, represented as a string of length 1, to a
 606 C \code{char}.
 607
 608 \item[\samp{f} (float) {[float]}]
 609 Convert a Python floating point number to a C \code{float}.
 610
 611 \item[\samp{d} (float) {[double]}]
 612 Convert a Python floating point number to a C \code{double}.
 613
 614 \item[\samp{O} (object) {[PyObject *]}]
 615 Store a Python object (without any conversion) in a C object pointer.
 616 The C program thus receives the actual object that was passed.  The
 617 object's reference count is not increased.  The pointer stored is not
 618 \code{NULL}.
 619
 620 \item[\samp{O!} (object) {[\var{typeobject}, PyObject *]}]
 621 Store a Python object in a C object pointer.  This is similar to
 622 \samp{O}, but takes two C arguments: the first is the address of a
 623 Python type object, the second is the address of the C variable (of
 624 type \code{PyObject *}) into which the object pointer is stored.
 625 If the Python object does not have the required type, a
 626 \code{TypeError} exception is raised.
 627
 628 \item[\samp{O\&} (object) {[\var{converter}, \var{anything}]}]
 629 Convert a Python object to a C variable through a \var{converter}
 630 function.  This takes two arguments: the first is a function, the
 631 second is the address of a C variable (of arbitrary type), converted
 632 to \code{void *}.  The \var{converter} function in turn is called as
 633 follows:
 634
 635 \code{\var{status} = \var{converter}(\var{object}, \var{address});}
 636
 637 where \var{object} is the Python object to be converted and
 638 \var{address} is the \code{void *} argument that was passed to
 639 \code{PyArg_ParseTuple()}.  The returned \var{status} should be
 640 \code{1} for a successful conversion and \code{0} if the conversion
 641 has failed.  When the conversion fails, the \var{converter} function
 642 should raise an exception.
 643
 644 \item[\samp{S} (string) {[PyStringObject *]}]
 645 Like \samp{O} but requires that the Python object is a string object;
 646 a \code{TypeError} exception is raised if it is not.  The C variable
 647 may also be declared as \code{PyObject *}.
 648
 649 \item[\samp{(\var{items})} (tuple) {[\var{matching-items}]}]
 650 The object must be a Python tuple whose length is the number of format
 651 units in \var{items}.  The C arguments must correspond to the
 652 individual format units in \var{items}.  Format units for tuples may
 653 be nested.
 654
 655 \end{description}
 656
 657 It is possible to pass Python long integers where integers are
 658 requested; however no proper range checking is done -- the most
 659 significant bits are silently truncated when the receiving field is
 660 too small to receive the value (actually, the semantics are inherited
 661 from downcasts in C, so your mileage may vary).
 662
 663 A few other characters have a meaning in a format string.  These may
 664 not occur inside nested parentheses.  They are:
 665
 666 \begin{description}
 667
 668 \item[\samp{|}]
 669 Indicates that the remaining arguments in the Python argument list are
 670 optional.  The C variables corresponding to optional arguments should
 671 be initialized to their default value --- when an optional argument is
 672 not specified, \code{PyArg_ParseTuple()} does not touch the contents
 673 of the corresponding C variable(s).
 674
 675 \item[\samp{:}]
 676 The list of format units ends here; the string after the colon is used
 677 as the function name in error messages (the ``associated value'' of
 678 the exceptions that \code{PyArg_ParseTuple()} raises).
 679
 680 \item[\samp{;}]
 681 The list of format units ends here; the string after the semicolon is used
 682 as the error message \emph{instead} of the default error message.
 683 Clearly, \samp{:} and \samp{;} mutually exclude each other.
 684
 685 \end{description}
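
The effect of \samp{|}, \samp{:} and \samp{;} on a format string can be
illustrated with a small scanner in plain C.  This is a toy model, not
the interpreter's parser: \code{FormatSummary} and
\code{toy_scan_format()} are hypothetical names, and the scanner only
counts top-level format units and records the text after \samp{:} or
\samp{;}; it performs no conversions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Summary of a format string (toy model). */
typedef struct {
    int min_args;        /* top-level units before '|' */
    int max_args;        /* all top-level units */
    const char *suffix;  /* text after ':' or ';', or NULL */
    char suffix_kind;    /* ':', ';' or '\0' */
} FormatSummary;

/* Returns 0 on success, -1 if the format string is malformed. */
static int toy_scan_format(const char *format, FormatSummary *out)
{
    int depth = 0, count = 0, min_args = -1;
    const char *p;

    assert(format != NULL && out != NULL);
    out->suffix = NULL;
    out->suffix_kind = '\0';
    for (p = format; *p != '\0'; p++) {
        char c = *p;
        if (c == '(') {
            if (depth++ == 0)
                count++;        /* a parenthesized group is one unit */
        } else if (c == ')') {
            if (--depth < 0)
                return -1;      /* unbalanced parenthesis */
        } else if (c == '|' || c == ':' || c == ';') {
            if (depth != 0)
                return -1;      /* not allowed inside parentheses */
            if (c == '|') {
                if (min_args >= 0)
                    return -1;  /* a second '|' */
                min_args = count;
            } else {
                out->suffix = p + 1;
                out->suffix_kind = c;
                break;          /* the list of format units ends here */
            }
        } else if (c == '#' || c == '!' || c == '&') {
            /* modifier of the preceding unit: "s#", "O!", "O&" */
        } else if (depth == 0) {
            count++;            /* an ordinary top-level unit */
        }
    }
    if (depth != 0)
        return -1;
    out->min_args = (min_args >= 0) ? min_args : count;
    out->max_args = count;
    return 0;
}
```

For the format \code{"s|si:open"} the scanner reports one required
argument, at most three arguments, and the function name \code{open}
for use in error messages.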
 686
 687 Some example calls:
 688
 689 \begin{verbatim}
 690     int ok;
 691     int i, j;
 692     long k, l;
 693     char *s;
 694     int size;
 695
 696     ok = PyArg_ParseTuple(args, ""); /* No arguments */
 697         /* Python call: f() */
 698
 699     ok = PyArg_ParseTuple(args, "s", &s); /* A string */
 700         /* Possible Python call: f('whoops!') */
 701
 702     ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */
 703         /* Possible Python call: f(1, 2, 'three') */
 704
 705     ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
 706         /* A pair of ints and a string, whose size is also returned */
        /* Possible Python call: f((1, 2), 'three') */
 708
 709     {
 710         char *file;
 711         char *mode = "r";
 712         int bufsize = 0;
 713         ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
 714         /* A string, and optionally another string and an integer */
 715         /* Possible Python calls:
 716            f('spam')
 717            f('spam', 'w')
 718            f('spam', 'wb', 100000) */
 719     }
 720
 721     {
 722         int left, top, right, bottom, h, v;
 723         ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
 724                  &left, &top, &right, &bottom, &h, &v);
 725                  /* A rectangle and a point */
 726                  /* Possible Python call:
 727                     f(((0, 0), (400, 300)), (10, 10)) */
 728     }
 729 \end{verbatim}
 730
 731
 732 \section{The {\tt Py_BuildValue()} Function}
 733
 734 This function is the counterpart to \code{PyArg_ParseTuple()}.  It is
 735 declared as follows:
 736
 737 \begin{verbatim}
 738     PyObject *Py_BuildValue(char *format, ...);
 739 \end{verbatim}
 740
 741 It recognizes a set of format units similar to the ones recognized by
 742 \code{PyArg_ParseTuple()}, but the arguments (which are input to the
 743 function, not output) must not be pointers, just values.  It returns a
 744 new Python object, suitable for returning from a C function called
 745 from Python.
 746
One difference with \code{PyArg_ParseTuple()}: while the latter
requires its first argument to be a tuple (since Python argument lists
are always represented as tuples internally), \code{Py_BuildValue()} does
not always build a tuple.  It builds a tuple only if its format string
contains two or more format units.  If the format string is empty, it
returns \code{None}; if it contains exactly one format unit, it
returns whatever object is described by that format unit.  To force it
to return a tuple of size zero or one, parenthesize the format string.
 755
 756 In the following description, the quoted form is the format unit; the
 757 entry in (round) parentheses is the Python object type that the format
 758 unit will return; and the entry in [square] brackets is the type of
 759 the C value(s) to be passed.
 760
 761 The characters space, tab, colon and comma are ignored in format
 762 strings (but not within format units such as \samp{s\#}).  This can be
 763 used to make long format strings a tad more readable.
 764
 765 \begin{description}
 766
 767 \item[\samp{s} (string) {[char *]}]
 768 Convert a null-terminated C string to a Python object.  If the C
 769 string pointer is \code{NULL}, \code{None} is returned.
 770
 771 \item[\samp{s\#} (string) {[char *, int]}]
 772 Convert a C string and its length to a Python object.  If the C string
 773 pointer is \code{NULL}, the length is ignored and \code{None} is
 774 returned.
 775
 776 \item[\samp{z} (string or \code{None}) {[char *]}]
 777 Same as \samp{s}.
 778
 779 \item[\samp{z\#} (string or \code{None}) {[char *, int]}]
 780 Same as \samp{s\#}.
 781
 782 \item[\samp{i} (integer) {[int]}]
 783 Convert a plain C \code{int} to a Python integer object.
 784
 785 \item[\samp{b} (integer) {[char]}]
 786 Same as \samp{i}.
 787
 788 \item[\samp{h} (integer) {[short int]}]
 789 Same as \samp{i}.
 790
 791 \item[\samp{l} (integer) {[long int]}]
 792 Convert a C \code{long int} to a Python integer object.
 793
 794 \item[\samp{c} (string of length 1) {[char]}]
 795 Convert a C \code{int} representing a character to a Python string of
 796 length 1.
 797
 798 \item[\samp{d} (float) {[double]}]
 799 Convert a C \code{double} to a Python floating point number.
 800
 801 \item[\samp{f} (float) {[float]}]
 802 Same as \samp{d}.
 803
 804 \item[\samp{O} (object) {[PyObject *]}]
 805 Pass a Python object untouched (except for its reference count, which
is incremented by one).  If the object passed in is a \code{NULL}
pointer, it is assumed that the call producing the argument found an
error and set an exception.  Therefore, \code{Py_BuildValue()} will
return \code{NULL} but won't raise an exception of its own.  If no
exception has been raised yet, \code{PyExc_SystemError} is set.
 812
 813 \item[\samp{S} (object) {[PyObject *]}]
 814 Same as \samp{O}.
 815
 816 \item[\samp{O\&} (object) {[\var{converter}, \var{anything}]}]
 817 Convert \var{anything} to a Python object through a \var{converter}
 818 function.  The function is called with \var{anything} (which should be
 819 compatible with \code{void *}) as its argument and should return a
 820 ``new'' Python object, or \code{NULL} if an error occurred.
 821
 822 \item[\samp{(\var{items})} (tuple) {[\var{matching-items}]}]
 823 Convert a sequence of C values to a Python tuple with the same number
 824 of items.
 825
 826 \item[\samp{[\var{items}]} (list) {[\var{matching-items}]}]
 827 Convert a sequence of C values to a Python list with the same number
 828 of items.
 829
 830 \item[\samp{\{\var{items}\}} (dictionary) {[\var{matching-items}]}]
 831 Convert a sequence of C values to a Python dictionary.  Each pair of
 832 consecutive C values adds one item to the dictionary, serving as key
 833 and value, respectively.
 834
 835 \end{description}
 836
 837 If there is an error in the format string, the
 838 \code{PyExc_SystemError} exception is raised and \code{NULL} returned.
 839
 840 Examples (to the left the call, to the right the resulting Python value):
 841
 842 \begin{verbatim}
 843     Py_BuildValue("")                        None
 844     Py_BuildValue("i", 123)                  123
 845     Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
 846     Py_BuildValue("s", "hello")              'hello'
 847     Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
 848     Py_BuildValue("s#", "hello", 4)          'hell'
 849     Py_BuildValue("()")                      ()
 850     Py_BuildValue("(i)", 123)                (123,)
 851     Py_BuildValue("(ii)", 123, 456)          (123, 456)
 852     Py_BuildValue("(i,i)", 123, 456)         (123, 456)
 853     Py_BuildValue("[i,i]", 123, 456)         [123, 456]
 854     Py_BuildValue("{s:i,s:i}",
 855                   "abc", 123, "def", 456)    {'abc': 123, 'def': 456}
 856     Py_BuildValue("((ii)(ii)) (ii)",
 857                   1, 2, 3, 4, 5, 6)          (((1, 2), (3, 4)), (5, 6))
 858 \end{verbatim}
 859
 860
 861 \section{Reference Counts}
 862
 863 \subsection{Introduction}
 864
 865 In languages like C or \Cpp{}, the programmer is responsible for
 866 dynamic allocation and deallocation of memory on the heap.  In C, this
 867 is done using the functions \code{malloc()} and \code{free()}.  In
 868 \Cpp{}, the operators \code{new} and \code{delete} are used with
 869 essentially the same meaning; they are actually implemented using
 870 \code{malloc()} and \code{free()}, so we'll restrict the following
 871 discussion to the latter.
 872
 873 Every block of memory allocated with \code{malloc()} should eventually
 874 be returned to the pool of available memory by exactly one call to
 875 \code{free()}.  It is important to call \code{free()} at the right
 876 time.  If a block's address is forgotten but \code{free()} is not
 877 called for it, the memory it occupies cannot be reused until the
 878 program terminates.  This is called a \dfn{memory leak}.  On the other
 879 hand, if a program calls \code{free()} for a block and then continues
 880 to use the block, it creates a conflict with re-use of the block
through another \code{malloc()} call.  This is called \dfn{using freed
memory}.  It has the same bad consequences as referencing uninitialized
data --- core dumps, wrong results, mysterious crashes.
 884
 885 Common causes of memory leaks are unusual paths through the code.  For
 886 instance, a function may allocate a block of memory, do some
 887 calculation, and then free the block again.  Now a change in the
 888 requirements for the function may add a test to the calculation that
 889 detects an error condition and can return prematurely from the
 890 function.  It's easy to forget to free the allocated memory block when
 891 taking this premature exit, especially when it is added later to the
 892 code.  Such leaks, once introduced, often go undetected for a long
 893 time: the error exit is taken only in a small fraction of all calls,
 894 and most modern machines have plenty of virtual memory, so the leak
 895 only becomes apparent in a long-running process that uses the leaking
 896 function frequently.  Therefore, it's important to prevent leaks from
happening by having a coding convention or strategy that minimizes
this kind of error.
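
One such convention, sketched below in plain C, is to route every exit
through a single cleanup label, so that an error test added later cannot
bypass the call to \code{free()}.  The names \code{tracked_malloc()},
\code{tracked_free()}, \code{live_blocks} and \code{count_upper()} are
invented for this example; the counter exists only to make it visible
that no block is left allocated.

\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping: number of blocks allocated but not freed. */
static int live_blocks = 0;

static void *
tracked_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void
tracked_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Count the upper-case ASCII letters in text, working on a private
   copy.  Returns -1 if text contains a non-ASCII byte or if memory
   runs out. */
static int
count_upper(const char *text)
{
    int result = -1;
    size_t i, n = strlen(text);
    char *copy = tracked_malloc(n + 1);

    if (copy == NULL)
        return -1;              /* nothing was allocated, nothing to free */
    memcpy(copy, text, n + 1);

    for (i = 0; i < n; i++) {
        if ((unsigned char)copy[i] > 127)
            goto done;          /* premature exit still frees the copy */
    }
    result = 0;
    for (i = 0; i < n; i++) {
        if (copy[i] >= 'A' && copy[i] <= 'Z')
            result++;
    }
done:
    tracked_free(copy);
    return result;
}
\end{verbatim}

After any call to \code{count_upper()}, successful or not,
\code{live_blocks} is back at zero.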
 899
 900 Since Python makes heavy use of \code{malloc()} and \code{free()}, it
 901 needs a strategy to avoid memory leaks as well as the use of freed
 902 memory.  The chosen method is called \dfn{reference counting}.  The
 903 principle is simple: every object contains a counter, which is
 904 incremented when a reference to the object is stored somewhere, and
 905 which is decremented when a reference to it is deleted.  When the
 906 counter reaches zero, the last reference to the object has been
 907 deleted and the object is freed.
 908
 909 An alternative strategy is called \dfn{automatic garbage collection}.
 910 (Sometimes, reference counting is also referred to as a garbage
 911 collection strategy, hence my use of ``automatic'' to distinguish the
 912 two.)  The big advantage of automatic garbage collection is that the
 913 user doesn't need to call \code{free()} explicitly.  (Another claimed
 914 advantage is an improvement in speed or memory usage --- this is no
 915 hard fact however.)  The disadvantage is that for C, there is no
 916 truly portable automatic garbage collector, while reference counting
 917 can be implemented portably (as long as the functions \code{malloc()}
 918 and \code{free()} are available --- which the C Standard guarantees).
 919 Maybe some day a sufficiently portable automatic garbage collector
 920 will be available for C.  Until then, we'll have to live with
 921 reference counts.
 922
 923 \subsection{Reference Counting in Python}
 924
 925 There are two macros, \code{Py_INCREF(x)} and \code{Py_DECREF(x)},
 926 which handle the incrementing and decrementing of the reference count.
 927 \code{Py_DECREF()} also frees the object when the count reaches zero.
 928 For flexibility, it doesn't call \code{free()} directly --- rather, it
 929 makes a call through a function pointer in the object's \dfn{type
 930 object}.  For this purpose (and others), every object also contains a
 931 pointer to its type object.
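
The counter and the dispatch through the type object can be sketched in
a few lines of plain C.  This is a toy model, not the interpreter's
actual definitions: all names starting with \code{toy_} or \code{TOY_}
are made up for the illustration.

\begin{verbatim}
#include <assert.h>
#include <stdlib.h>

struct toy_type;

/* Every object starts with a counter and a pointer to its type object. */
typedef struct toy_object {
    int refcnt;
    struct toy_type *type;
} toy_object;

/* The type object holds the function that knows how to free an instance. */
typedef struct toy_type {
    const char *name;
    void (*dealloc)(toy_object *);
} toy_type;

#define TOY_INCREF(op) ((op)->refcnt++)
#define TOY_DECREF(op) \
    do { \
        toy_object *toy_tmp = (toy_object *)(op); \
        assert(toy_tmp->refcnt > 0); \
        if (--toy_tmp->refcnt == 0) \
            toy_tmp->type->dealloc(toy_tmp);  /* not free() directly */ \
    } while (0)

static int toy_freed = 0;       /* counts deallocations, for demonstration */

static void
toy_int_dealloc(toy_object *op)
{
    toy_freed++;
    free(op);
}

static toy_type toy_int_type = { "toy_int", toy_int_dealloc };

typedef struct {
    toy_object head;            /* must come first */
    long value;
} toy_int;

/* Create a new object; the caller owns the single reference to it. */
static toy_object *
toy_int_new(long value)
{
    toy_int *p = malloc(sizeof(toy_int));
    if (p == NULL)
        return NULL;
    p->head.refcnt = 1;
    p->head.type = &toy_int_type;
    p->value = value;
    return &p->head;
}
\end{verbatim}

An object created by \code{toy_int_new()} is freed by the
\code{TOY_DECREF()} that brings its counter to zero, and not before.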
 932
 933 The big question now remains: when to use \code{Py_INCREF(x)} and
 934 \code{Py_DECREF(x)}?  Let's first introduce some terms.  Nobody
 935 ``owns'' an object; however, you can \dfn{own a reference} to an
 936 object.  An object's reference count is now defined as the number of
 937 owned references to it.  The owner of a reference is responsible for
 938 calling \code{Py_DECREF()} when the reference is no longer needed.
 939 Ownership of a reference can be transferred.  There are three ways to
 940 dispose of an owned reference: pass it on, store it, or call
 941 \code{Py_DECREF()}.  Forgetting to dispose of an owned reference creates
 942 a memory leak.
 943
 944 It is also possible to \dfn{borrow}\footnote{The metaphor of
 945 ``borrowing'' a reference is not completely correct: the owner still
 946 has a copy of the reference.} a reference to an object.  The borrower
 947 of a reference should not call \code{Py_DECREF()}.  The borrower must
 948 not hold on to the object longer than the owner from which it was
 949 borrowed.  Using a borrowed reference after the owner has disposed of
 950 it risks using freed memory and should be avoided
 951 completely.\footnote{Checking that the reference count is at least 1
 952 \strong{does not work} --- the reference count itself could be in
 953 freed memory and may thus be reused for another object!}
 954
 955 The advantage of borrowing over owning a reference is that you don't
 956 need to take care of disposing of the reference on all possible paths
 957 through the code --- in other words, with a borrowed reference you
 958 don't run the risk of leaking when a premature exit is taken.  The
disadvantage of borrowing over owning is that there are some subtle
situations where, in seemingly correct code, a borrowed reference can be
used after the owner from which it was borrowed has in fact disposed
of it.
 963
A borrowed reference can be changed into an owned reference by calling
\code{Py_INCREF()}.  This does not affect the status of the owner from
which the reference was borrowed --- it creates a new owned reference
with full owner responsibilities (i.e., the new owner must dispose of
its reference properly, just as the previous owner must).
 969
 970 \subsection{Ownership Rules}
 971
 972 Whenever an object reference is passed into or out of a function, it
 973 is part of the function's interface specification whether ownership is
 974 transferred with the reference or not.
 975
 976 Most functions that return a reference to an object pass on ownership
with the reference.  In particular, all functions whose purpose is
to create a new object, e.g.\ \code{PyInt_FromLong()} and
\code{Py_BuildValue()}, pass ownership to the receiver.  Even if, in
some cases, the object you receive is not brand new, you still receive
ownership of the reference.  For instance,
\code{PyInt_FromLong()} maintains a cache of popular values and can
return a reference to a cached item.
 984
 985 Many functions that extract objects from other objects also transfer
 986 ownership with the reference, for instance
\code{PyObject_GetAttrString()}.  The picture is less clear here,
however, since a few common routines are exceptions:
 989 \code{PyTuple_GetItem()}, \code{PyList_GetItem()} and
 990 \code{PyDict_GetItem()} (and \code{PyDict_GetItemString()}) all return
 991 references that you borrow from the tuple, list or dictionary.
 992
 993 The function \code{PyImport_AddModule()} also returns a borrowed
 994 reference, even though it may actually create the object it returns:
 995 this is possible because an owned reference to the object is stored in
 996 \code{sys.modules}.
 997
 998 When you pass an object reference into another function, in general,
 999 the function borrows the reference from you --- if it needs to store
1000 it, it will use \code{Py_INCREF()} to become an independent owner.
1001 There are exactly two important exceptions to this rule:
1002 \code{PyTuple_SetItem()} and \code{PyList_SetItem()}.  These functions
1003 take over ownership of the item passed to them --- even if they fail!
1004 (Note that \code{PyDict_SetItem()} and friends don't take over
1005 ownership --- they are ``normal''.)
1006
1007 When a C function is called from Python, it borrows references to its
1008 arguments from the caller.  The caller owns a reference to the object,
1009 so the borrowed reference's lifetime is guaranteed until the function
returns.  Only when such a borrowed reference must be stored or passed
on does it have to be turned into an owned reference by calling
\code{Py_INCREF()}.
1013
1014 The object reference returned from a C function that is called from
Python must be an owned reference --- ownership is transferred from the
1016 function to its caller.
1017
1018 \subsection{Thin Ice}
1019
1020 There are a few situations where seemingly harmless use of a borrowed
1021 reference can lead to problems.  These all have to do with implicit
1022 invocations of the interpreter, which can cause the owner of a
1023 reference to dispose of it.
1024
1025 The first and most important case to know about is using
1026 \code{Py_DECREF()} on an unrelated object while borrowing a reference
1027 to a list item.  For instance:
1028
1029 \begin{verbatim}
void bug(PyObject *list) {
    PyObject *item = PyList_GetItem(list, 0);     /* borrowed reference */
    PyList_SetItem(list, 1, PyInt_FromLong(0L));  /* may run arbitrary code */
    PyObject_Print(item, stdout, 0); /* BUG! */
}
1035 \end{verbatim}
1036
1037 This function first borrows a reference to \code{list[0]}, then
1038 replaces \code{list[1]} with the value \code{0}, and finally prints
1039 the borrowed reference.  Looks harmless, right?  But it's not!
1040
1041 Let's follow the control flow into \code{PyList_SetItem()}.  The list
1042 owns references to all its items, so when item 1 is replaced, it has
1043 to dispose of the original item 1.  Now let's suppose the original
1044 item 1 was an instance of a user-defined class, and let's further
1045 suppose that the class defined a \code{__del__()} method.  If this
1046 class instance has a reference count of 1, disposing of it will call
1047 its \code{__del__()} method.
1048
1049 Since it is written in Python, the \code{__del__()} method can execute
1050 arbitrary Python code.  Could it perhaps do something to invalidate
1051 the reference to \code{item} in \code{bug()}?  You bet!  Assuming that
1052 the list passed into \code{bug()} is accessible to the
1053 \code{__del__()} method, it could execute a statement to the effect of
1054 \code{del list[0]}, and assuming this was the last reference to that
1055 object, it would free the memory associated with it, thereby
1056 invalidating \code{item}.
1057
1058 The solution, once you know the source of the problem, is easy:
1059 temporarily increment the reference count.  The correct version of the
1060 function reads:
1061
1062 \begin{verbatim}
void no_bug(PyObject *list) {
    PyObject *item = PyList_GetItem(list, 0);     /* borrowed reference */
    Py_INCREF(item);                              /* now owned */
    PyList_SetItem(list, 1, PyInt_FromLong(0L));
    PyObject_Print(item, stdout, 0);
    Py_DECREF(item);                              /* dispose of it again */
}
1070 \end{verbatim}
1071
1072 This is a true story.  An older version of Python contained variants
1073 of this bug and someone spent a considerable amount of time in a C
1074 debugger to figure out why his \code{__del__()} methods would fail...
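
The sequence of events in this first case can be replayed outside the
interpreter with a toy model.  The sketch below is plain C and uses no
Python API; \code{toy_obj}, \code{slots}, \code{toy_set_slot()},
\code{drop_slot0()} and \code{safe_use()} are names invented for the
illustration, and the finalizer merely stands in for a \code{__del__()}
method.

\begin{verbatim}
#include <stdlib.h>

typedef struct toy_obj {
    int refcnt;
    void (*finalizer)(void);    /* stands in for a __del__() method */
} toy_obj;

static toy_obj *slots[2];       /* toy "list": owns one reference per slot */
static int freed_count = 0;     /* number of objects freed so far */

static toy_obj *
toy_new(void (*finalizer)(void))
{
    toy_obj *op = malloc(sizeof(toy_obj));
    if (op != NULL) {
        op->refcnt = 1;
        op->finalizer = finalizer;
    }
    return op;
}

static void
toy_decref(toy_obj *op)
{
    if (--op->refcnt == 0) {
        if (op->finalizer != NULL)
            op->finalizer();    /* may run arbitrary code */
        freed_count++;
        free(op);
    }
}

/* Store item in slot i, taking over ownership of the reference and
   disposing of the previous occupant, in the manner of PyList_SetItem(). */
static void
toy_set_slot(int i, toy_obj *item)
{
    toy_obj *old = slots[i];
    slots[i] = item;
    if (old != NULL)
        toy_decref(old);
}

/* A finalizer with an effect comparable to "del list[0]". */
static void
drop_slot0(void)
{
    toy_set_slot(0, NULL);
}

/* The safe pattern: own the reference while other code may run.
   Returns the reference count observed after the replacement. */
static int
safe_use(void)
{
    toy_obj *item = slots[0];   /* borrowed from the list */
    int seen;

    item->refcnt++;             /* now we own a reference too */
    toy_set_slot(1, toy_new(NULL));   /* old slot 1 may run its finalizer */
    seen = item->refcnt;        /* safe: item is still alive */
    toy_decref(item);           /* dispose of our reference */
    return seen;
}
\end{verbatim}

If slot 0 holds an object with reference count 1 and slot 1 holds an
object whose finalizer is \code{drop_slot0()}, then \code{safe_use()}
returns 1: the list's reference is gone, but the one taken by
\code{safe_use()} kept the object alive.  Without the increment, the
count would have reached zero inside \code{toy_set_slot()} and
\code{item} would point to freed memory.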
1075
1076 The second case of problems with a borrowed reference is a variant
1077 involving threads.  Normally, multiple threads in the Python
1078 interpreter can't get in each other's way, because there is a global
1079 lock protecting Python's entire object space.  However, it is possible
1080 to temporarily release this lock using the macro
1081 \code{Py_BEGIN_ALLOW_THREADS}, and to re-acquire it using
1082 \code{Py_END_ALLOW_THREADS}.  This is common around blocking I/O
1083 calls, to let other threads use the CPU while waiting for the I/O to
1084 complete.  Obviously, the following function has the same problem as
1085 the previous one:
1086
1087 \begin{verbatim}
void bug(PyObject *list) {
    PyObject *item = PyList_GetItem(list, 0);     /* borrowed reference */
    Py_BEGIN_ALLOW_THREADS
    ...some blocking I/O call...
    Py_END_ALLOW_THREADS
    PyObject_Print(item, stdout, 0); /* BUG! */
}
1095 \end{verbatim}
1096
1097 \subsection{NULL Pointers}
1098
1099 In general, functions that take object references as arguments don't
1100 expect you to pass them \code{NULL} pointers, and will dump core (or
1101 cause later core dumps) if you do so.  Functions that return object
1102 references generally return \code{NULL} only to indicate that an
1103 exception occurred.  The reason for not testing for \code{NULL}
1104 arguments is that functions often pass the objects they receive on to
other functions --- if each function were to test for \code{NULL},
1106 there would be a lot of redundant tests and the code would run slower.
1107
1108 It is better to test for \code{NULL} only at the ``source'', i.e.\
1109 when a pointer that may be \code{NULL} is received, e.g.\ from
1110 \code{malloc()} or from a function that may raise an exception.
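
A minimal sketch of this convention in plain C follows.  The function
names \code{reversed_copy()} and \code{copy_reversed()} are made up for
the example; only the outer function, which receives the pointer from
\code{malloc()}, tests for \code{NULL}.

\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* Inner helper: assumes dst and src are valid; no NULL test here. */
static void
copy_reversed(char *dst, const char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[n - 1 - i];
    dst[n] = '\0';
}

/* Outer function: malloc() is the "source" of a possibly-NULL pointer,
   so this is the one place where the test is made. */
static char *
reversed_copy(const char *src)
{
    size_t n = strlen(src);
    char *dst = malloc(n + 1);
    if (dst == NULL)
        return NULL;            /* report the failure to the caller */
    copy_reversed(dst, src, n);
    return dst;
}
\end{verbatim}

The caller of \code{reversed_copy()} is in turn at a ``source'' and
must test the result before using it.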
1111
1112 The macros \code{Py_INCREF()} and \code{Py_DECREF()}
1113 don't check for \code{NULL} pointers --- however, their variants
1114 \code{Py_XINCREF()} and \code{Py_XDECREF()} do.
1115
1116 The macros for checking for a particular object type
1117 (\code{Py\var{type}_Check()}) don't check for \code{NULL} pointers ---
1118 again, there is much code that calls several of these in a row to test
1119 an object against various different expected types, and this would
1120 generate redundant tests.  There are no variants with \code{NULL}
1121 checking.
1122
1123 The C function calling mechanism guarantees that the argument list
1124 passed to C functions (\code{args} in the examples) is never
1125 \code{NULL} --- in fact it guarantees that it is always a tuple.%
1126 \footnote{These guarantees don't hold when you use the ``old'' style
1127 calling convention --- this is still found in much existing code.}
1128
1129 It is a severe error to ever let a \code{NULL} pointer ``escape'' to
1130 the Python user.
1131
1132
1133 \section{Writing Extensions in \Cpp{}}
1134
1135 It is possible to write extension modules in \Cpp{}.  Some restrictions
1136 apply.  If the main program (the Python interpreter) is compiled and
1137 linked by the C compiler, global or static objects with constructors
1138 cannot be used.  This is not a problem if the main program is linked
1139 by the \Cpp{} compiler.  All functions that will be called directly or
indirectly (i.e.\ via function pointers) by the Python interpreter will
have to be declared using \code{extern "C"}; this applies to all
``methods'' as well as to the module's initialization function.
It is unnecessary to enclose the Python header files in
\code{extern "C" \{...\}} --- they use this form already if the symbol
\samp{__cplusplus} is defined (all recent \Cpp{} compilers define this
symbol).
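
For a source file that may be compiled either as C or as \Cpp{}, the
same guard can be written around your own declarations.  The sketch
below is valid C as it stands; the function name \code{initspam()}
assumes a module called \code{spam}, and both its body and the variable
\code{spam_initialized} are placeholders for this illustration.

\begin{verbatim}
#ifdef __cplusplus
extern "C" {                    /* only seen by a C++ compiler */
#endif

/* Initialization function of a hypothetical module "spam". */
void initspam(void);

#ifdef __cplusplus
}
#endif

static int spam_initialized = 0;    /* illustration only */

void
initspam(void)
{
    /* A real module would register its methods here. */
    spam_initialized = 1;
}
\end{verbatim}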
1147
1148 \chapter{Embedding Python in another application}
1149
1150 Embedding Python is similar to extending it, but not quite.  The
1151 difference is that when you extend Python, the main program of the
1152 application is still the Python interpreter, while if you embed
1153 Python, the main program may have nothing to do with Python ---
1154 instead, some parts of the application occasionally call the Python
1155 interpreter to run some Python code.
1156
1157 So if you are embedding Python, you are providing your own main
1158 program.  One of the things this main program has to do is initialize
1159 the Python interpreter.  At the very least, you have to call the
1160 function \code{Py_Initialize()}.  There are optional calls to pass command
1161 line arguments to Python.  Then later you can call the interpreter
1162 from any part of the application.
1163
1164 There are several different ways to call the interpreter: you can pass
1165 a string containing Python statements to \code{PyRun_SimpleString()},
1166 or you can pass a stdio file pointer and a file name (for
1167 identification in error messages only) to \code{PyRun_SimpleFile()}.  You
1168 can also call the lower-level operations described in the previous
1169 chapters to construct and use Python objects.
1170
1171 A simple demo of embedding Python can be found in the directory
1172 \file{Demo/embed}.
1173
1174
1175 \section{Embedding Python in \Cpp{}}
1176
1177 It is also possible to embed Python in a \Cpp{} program; precisely how this
1178 is done will depend on the details of the \Cpp{} system used; in general you
1179 will need to write the main program in \Cpp{}, and use the \Cpp{} compiler
1180 to compile and link your program.  There is no need to recompile Python
1181 itself using \Cpp{}.
1182
1183
1184 \chapter{Dynamic Loading}
1185
1186 On most modern systems it is possible to configure Python to support
dynamic loading of extension modules implemented in C.  When shared
libraries are used, dynamic loading is configured automatically;
1189 otherwise you have to select it as a build option (see below).  Once
1190 configured, dynamic loading is trivial to use: when a Python program
1191 executes \code{import spam}, the search for modules tries to find a
1192 file \file{spammodule.o} (\file{spammodule.so} when using shared
1193 libraries) in the module search path, and if one is found, it is
1194 loaded into the executing binary and executed.  Once loaded, the
1195 module acts just like a built-in extension module.
1196
1197 The advantages of dynamic loading are twofold: the ``core'' Python
1198 binary gets smaller, and users can extend Python with their own
1199 modules implemented in C without having to build and maintain their
1200 own copy of the Python interpreter.  There are also disadvantages:
1201 dynamic loading isn't available on all systems (this just means that
1202 on some systems you have to use static loading), and dynamically
1203 loading a module that was compiled for a different version of Python
(e.g.\ with a different representation of objects) may dump core.
1205
1206
1207 \section{Configuring and Building the Interpreter for Dynamic Loading}
1208
1209 There are three styles of dynamic loading: one using shared libraries,
1210 one using SGI IRIX 4 dynamic loading, and one using GNU dynamic
1211 loading.
1212
1213 \subsection{Shared Libraries}
1214
1215 The following systems support dynamic loading using shared libraries:
1216 SunOS 4; Solaris 2; SGI IRIX 5 (but not SGI IRIX 4!); and probably all
1217 systems derived from SVR4, or at least those SVR4 derivatives that
1218 support shared libraries (are there any that don't?).
1219
1220 You don't need to do anything to configure dynamic loading on these
systems --- the \file{configure} script detects the presence of the
1222 \file{<dlfcn.h>} header file and automatically configures dynamic
1223 loading.
1224
1225 \subsection{SGI IRIX 4 Dynamic Loading}
1226
1227 Only SGI IRIX 4 supports dynamic loading of modules using SGI dynamic
1228 loading.  (SGI IRIX 5 might also support it but it is inferior to
1229 using shared libraries so there is no reason to; a small test didn't
1230 work right away so I gave up trying to support it.)
1231
1232 Before you build Python, you first need to fetch and build the \code{dl}
1233 package written by Jack Jansen.  This is available by anonymous ftp
1234 from host \file{ftp.cwi.nl}, directory \file{pub/dynload}, file
1235 \file{dl-1.6.tar.Z}.  (The version number may change.)  Follow the
1236 instructions in the package's \file{README} file to build it.
1237
1238 Once you have built \code{dl}, you can configure Python to use it.  To
1239 this end, you run the \file{configure} script with the option
1240 \code{--with-dl=\var{directory}} where \var{directory} is the absolute
1241 pathname of the \code{dl} directory.
1242
Now build and install Python as you normally would (see the
\file{README} file in the toplevel Python directory).
1245
1246 \subsection{GNU Dynamic Loading}
1247
1248 GNU dynamic loading supports (according to its \file{README} file) the
1249 following hardware and software combinations: VAX (Ultrix), Sun 3
1250 (SunOS 3.4 and 4.0), Sparc (SunOS 4.0), Sequent Symmetry (Dynix), and
1251 Atari ST.  There is no reason to use it on a Sparc; I haven't seen a
1252 Sun 3 for years so I don't know if these have shared libraries or not.
1253
1254 You need to fetch and build two packages.  One is GNU DLD 3.2.3,
1255 available by anonymous ftp from host \file{ftp.cwi.nl}, directory
1256 \file{pub/dynload}, file \file{dld-3.2.3.tar.Z}.  (As far as I know,
1257 no further development on GNU DLD is being done.)  The other is an
1258 emulation of Jack Jansen's \code{dl} package that I wrote on top of
1259 GNU DLD 3.2.3.  This is available from the same host and directory,
file \file{dl-dld-1.1.tar.Z}.  (The version number may change --- but I doubt
it will.)  Follow the instructions in each package's \file{README}
file to configure and build them.
1263
1264 Now configure Python.  Run the \file{configure} script with the option
1265 \code{--with-dl-dld=\var{dl-directory},\var{dld-directory}} where
1266 \var{dl-directory} is the absolute pathname of the directory where you
1267 have built the \file{dl-dld} package, and \var{dld-directory} is that
1268 of the GNU DLD package.  The Python interpreter you build hereafter
1269 will support GNU dynamic loading.
1270
1271
1272 \section{Building a Dynamically Loadable Module}
1273
1274 Since there are three styles of dynamic loading, there are also three
1275 groups of instructions for building a dynamically loadable module.
Instructions common to all three styles are given first.  Assuming
1277 your module is called \code{spam}, the source filename must be
1278 \file{spammodule.c}, so the object name is \file{spammodule.o}.  The
1279 module must be written as a normal Python extension module (as
1280 described earlier).
1281
1282 Note that in all cases you will have to create your own Makefile that
1283 compiles your module file(s).  This Makefile will have to pass two
1284 \samp{-I} arguments to the C compiler which will make it find the
1285 Python header files.  If the Make variable \var{PYTHONTOP} points to
1286 the toplevel Python directory, your \var{CFLAGS} Make variable should
1287 contain the options \samp{-I\$(PYTHONTOP) -I\$(PYTHONTOP)/Include}.
1288 (Most header files are in the \file{Include} subdirectory, but the
1289 \file{config.h} header lives in the toplevel directory.)  You must
1290 also add \samp{-DHAVE_CONFIG_H} to the definition of \var{CFLAGS} to
1291 direct the Python headers to include \file{config.h}.
1292
1293
1294 \subsection{Shared Libraries}
1295
1296 You must link the \samp{.o} file to produce a shared library.  This is
1297 done using a special invocation of the \UNIX{} loader/linker, {\em
1298 ld}(1).  Unfortunately the invocation differs slightly per system.
1299
1300 On SunOS 4, use
1301 \begin{verbatim}
1302     ld spammodule.o -o spammodule.so
1303 \end{verbatim}
1304
1305 On Solaris 2, use
1306 \begin{verbatim}
1307     ld -G spammodule.o -o spammodule.so
1308 \end{verbatim}
1309
1310 On SGI IRIX 5, use
1311 \begin{verbatim}
1312     ld -shared spammodule.o -o spammodule.so
1313 \end{verbatim}
1314
1315 On other systems, consult the manual page for \code{ld}(1) to find what
1316 flags, if any, must be used.
1317
1318 If your extension module uses system libraries that haven't already
been linked with Python (e.g.\ a windowing system), these must be
1320 passed to the \code{ld} command as \samp{-l} options after the
1321 \samp{.o} file.
1322
1323 The resulting file \file{spammodule.so} must be copied into a directory
1324 along the Python module search path.
1325
1326
1327 \subsection{SGI IRIX 4 Dynamic Loading}
1328
1329 {\bf IMPORTANT:} You must compile your extension module with the
additional C flag \samp{-G0} (or \samp{-G 0}).  This instructs the
1331 assembler to generate position-independent code.
1332
1333 You don't need to link the resulting \file{spammodule.o} file; just
1334 copy it into a directory along the Python module search path.
1335
1336 The first time your extension is loaded, it takes some extra time and
1337 a few messages may be printed.  This creates a file
1338 \file{spammodule.ld} which is an image that can be loaded quickly into
1339 the Python interpreter process.  When a new Python interpreter is
1340 installed, the \code{dl} package detects this and rebuilds
1341 \file{spammodule.ld}.  The file \file{spammodule.ld} is placed in the
1342 directory where \file{spammodule.o} was found, unless this directory is
1343 unwritable; in that case it is placed in a temporary
1344 directory.\footnote{Check the manual page of the \code{dl} package for
1345 details.}
1346
If your extension module uses additional system libraries, you must
1348 create a file \file{spammodule.libs} in the same directory as the
1349 \file{spammodule.o}.  This file should contain one or more lines with
1350 whitespace-separated options that will be passed to the linker ---
1351 normally only \samp{-l} options or absolute pathnames of libraries
1352 (\samp{.a} files) should be used.
1353
1354
1355 \subsection{GNU Dynamic Loading}
1356
1357 Just copy \file{spammodule.o} into a directory along the Python module
1358 search path.
1359
If your extension module uses additional system libraries, you must
1361 create a file \file{spammodule.libs} in the same directory as the
1362 \file{spammodule.o}.  This file should contain one or more lines with
1363 whitespace-separated absolute pathnames of libraries (\samp{.a}
1364 files).  No \samp{-l} options can be used.
1365
1366
1367 \input{ext.ind}
1368
1369 \end{document}