Doc/ext/ext.tex

   1 \documentclass{manual}
   2
   3 % XXX PM explain how to add new types to Python
   4
   5 \title{Extending and Embedding the Python Interpreter}
   6
   7 \input{boilerplate}
   8
   9 % Tell \index to actually write the .idx file
  10 \makeindex
  11
  12 \begin{document}
  13
  14 \maketitle
  15
  16 \ifhtml
  17 \chapter*{Front Matter\label{front}}
  18 \fi
  19
  20 \input{copyright}
  21
  22 %begin{latexonly}
  23 \vspace{1in}
  24 %end{latexonly}
  25 \strong{\large Acknowledgements}
  26
  27 % XXX This needs to be checked and updated manually before each
  28 % release.
  29
  30 The following people have contributed sections to this document:  Jim
  31 Fulton, Konrad Hinsen, Chris Phoenix, and Neil Schemenauer.
  32
  33 \begin{abstract}
  34
  35 \noindent
  36 Python is an interpreted, object-oriented programming language.  This
  37 document describes how to write modules in C or \Cpp{} to extend the
  38 Python interpreter with new modules.  Those modules can define new
  39 functions but also new object types and their methods.  The document
  40 also describes how to embed the Python interpreter in another
  41 application, for use as an extension language.  Finally, it shows how
  42 to compile and link extension modules so that they can be loaded
  43 dynamically (at run time) into the interpreter, if the underlying
  44 operating system supports this feature.
  45
  46 This document assumes basic knowledge about Python.  For an informal
  47 introduction to the language, see the
  48 \citetitle[../tut/tut.html]{Python Tutorial}.  The
  49 \citetitle[../ref/ref.html]{Python Reference Manual} gives a more
  50 formal definition of the language.  The
  51 \citetitle[../lib/lib.html]{Python Library Reference} documents the
  52 existing object types, functions and modules (both built-in and
  53 written in Python) that give the language its wide application range.
  54
  55 For a detailed description of the whole Python/C API, see the separate
  56 \citetitle[../api/api.html]{Python/C API Reference Manual}.
  57
  58 \end{abstract}
  59
  60 \tableofcontents
  61
  62
  63 \chapter{Extending Python with C or \Cpp{} \label{intro}}
  64
  65
  66 It is quite easy to add new built-in modules to Python, if you know
  67 how to program in C.  Such \dfn{extension modules} can do two things
  68 that can't be done directly in Python: they can implement new built-in
  69 object types, and they can call C library functions and system calls.
  70
  71 To support extensions, the Python API (Application Programmers
  72 Interface) defines a set of functions, macros and variables that
  73 provide access to most aspects of the Python run-time system.  The
  74 Python API is incorporated in a C source file by including the header
  75 \code{"Python.h"}.
  76
  77 The compilation of an extension module depends on its intended use as
  78 well as on your system setup; details are given in a later section.
  79
  80
  81 \section{A Simple Example
  82          \label{simpleExample}}
  83
  84 Let's create an extension module called \samp{spam} (the favorite food
  85 of Monty Python fans...) and let's say we want to create a Python
  86 interface to the C library function \cfunction{system()}.\footnote{An
  87 interface for this function already exists in the standard module
  88 \module{os}; it was chosen as a simple and straightforward example.}
  89 This function takes a null-terminated character string as argument and
  90 returns an integer.  We want this function to be callable from Python
  91 as follows:
  92
  93 \begin{verbatim}
  94 >>> import spam
  95 >>> status = spam.system("ls -l")
  96 \end{verbatim}
  97
  98 Begin by creating a file \file{spammodule.c}.  (In general, if a
  99 module is called \samp{spam}, the C file containing its implementation
 100 is called \file{spammodule.c}; if the module name is very long, like
 101 \samp{spammify}, the file name can be just \file{spammify.c}.)
 102
 103 The first line of our file can be:
 104
 105 \begin{verbatim}
 106 #include "Python.h"
 107 \end{verbatim}
 108
 109 which pulls in the Python API (you can add a comment describing the
 110 purpose of the module and a copyright notice if you like).
 111
 112 All user-visible symbols defined by \code{"Python.h"} have a prefix of
 113 \samp{Py} or \samp{PY}, except those defined in standard header files.
 114 For convenience, and since they are used extensively by the Python
 115 interpreter, \code{"Python.h"} includes a few standard header files:
 116 \code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>}, and
 117 \code{<stdlib.h>}.  If the latter header file does not exist on your
 118 system, it declares the functions \cfunction{malloc()},
 119 \cfunction{free()} and \cfunction{realloc()} directly.
 120
 121 The next thing we add to our module file is the C function that will
 122 be called when the Python expression \samp{spam.system(\var{string})}
 123 is evaluated (we'll see shortly how it ends up being called):
 124
 125 \begin{verbatim}
 126 static PyObject *
 127 spam_system(self, args)
 128     PyObject *self;
 129     PyObject *args;
 130 {
 131     char *command;
 132     int sts;
 133
 134     if (!PyArg_ParseTuple(args, "s", &command))
 135         return NULL;
 136     sts = system(command);
 137     return Py_BuildValue("i", sts);
 138 }
 139 \end{verbatim}
 140
 141 There is a straightforward translation from the argument list in
 142 Python (e.g.\ the single expression \code{"ls -l"}) to the arguments
 143 passed to the C function.  The C function always has two arguments,
 144 conventionally named \var{self} and \var{args}.
 145
 146 The \var{self} argument is only used when the C function implements a
 147 built-in method, not a function. In the example, \var{self} will
 148 always be a \NULL{} pointer, since we are defining a function, not a
 149 method.  (This is done so that the interpreter doesn't have to
 150 understand two different types of C functions.)
 151
 152 The \var{args} argument will be a pointer to a Python tuple object
 153 containing the arguments.  Each item of the tuple corresponds to an
 154 argument in the call's argument list.  The arguments are Python
 155 objects --- in order to do anything with them in our C function we have
 156 to convert them to C values.  The function \cfunction{PyArg_ParseTuple()}
 157 in the Python API checks the argument types and converts them to C
 158 values.  It uses a template string to determine the required types of
 159 the arguments as well as the types of the C variables into which to
 160 store the converted values.  More about this later.
 161
 162 \cfunction{PyArg_ParseTuple()} returns true (nonzero) if all arguments have
 163 the right type and their components have been stored in the variables
 164 whose addresses are passed.  It returns false (zero) if an invalid
 165 argument list was passed.  In the latter case it also raises an
 166 appropriate exception, so the calling function can return
 167 \NULL{} immediately (as we saw in the example).
 168
 169
 170 \section{Intermezzo: Errors and Exceptions
 171          \label{errors}}
 172
 173 An important convention throughout the Python interpreter is the
 174 following: when a function fails, it should set an exception condition
 175 and return an error value (usually a \NULL{} pointer).  Exceptions
 176 are stored in a static global variable inside the interpreter; if this
 177 variable is \NULL{} no exception has occurred.  A second global
 178 variable stores the ``associated value'' of the exception (the second
 179 argument to \keyword{raise}).  A third variable contains the stack
 180 traceback in case the error originated in Python code.  These three
 181 variables are the C equivalents of the Python variables
 182 \code{sys.exc_type}, \code{sys.exc_value} and \code{sys.exc_traceback} (see
 183 the section on module \module{sys} in the
 184 \citetitle[../lib/lib.html]{Python Library Reference}).  It is
 185 important to know about them to understand how errors are passed
 186 around.
 187
 188 The Python API defines a number of functions to set various types of
 189 exceptions.
 190
 191 The most common one is \cfunction{PyErr_SetString()}.  Its arguments
 192 are an exception object and a C string.  The exception object is
 193 usually a predefined object like \cdata{PyExc_ZeroDivisionError}.  The
 194 C string indicates the cause of the error and is converted to a
 195 Python string object and stored as the ``associated value'' of the
 196 exception.
 197
 198 Another useful function is \cfunction{PyErr_SetFromErrno()}, which only
 199 takes an exception argument and constructs the associated value by
 200 inspection of the (\UNIX{}) global variable \cdata{errno}.  The most
 201 general function is \cfunction{PyErr_SetObject()}, which takes two object
 202 arguments, the exception and its associated value.  You don't need to
 203 \cfunction{Py_INCREF()} the objects passed to any of these functions.
 204
 205 You can test non-destructively whether an exception has been set with
 206 \cfunction{PyErr_Occurred()}.  This returns the current exception object,
 207 or \NULL{} if no exception has occurred.  You normally don't need
 208 to call \cfunction{PyErr_Occurred()} to see whether an error occurred in a
 209 function call, since you should be able to tell from the return value.
 210
 211 When a function \var{f} that calls another function \var{g} detects
 212 that the latter fails, \var{f} should itself return an error value
 213 (e.g.\ \NULL{} or \code{-1}).  It should \emph{not} call one of the
 214 \cfunction{PyErr_*()} functions --- one has already been called by \var{g}.
 215 \var{f}'s caller is then supposed to also return an error indication
 216 to \emph{its} caller, again \emph{without} calling \cfunction{PyErr_*()},
 217 and so on --- the most detailed cause of the error was already
 218 reported by the function that first detected it.  Once the error
 219 reaches the Python interpreter's main loop, this aborts the currently
 220 executing Python code and tries to find an exception handler specified
 221 by the Python programmer.
 222
 223 (There are situations where a module can actually give a more detailed
 224 error message by calling another \cfunction{PyErr_*()} function, and in
 225 such cases it is fine to do so.  As a general rule, however, this is
 226 not necessary, and can cause information about the cause of the error
 227 to be lost: most operations can fail for a variety of reasons.)
 228
 229 To ignore an exception set by a function call that failed, the exception
 230 condition must be cleared explicitly by calling \cfunction{PyErr_Clear()}.
 231 The only time C code should call \cfunction{PyErr_Clear()} is if it doesn't
 232 want to pass the error on to the interpreter but wants to handle it
 233 completely by itself (e.g.\ by trying something else or pretending
 234 nothing happened).
 235
 236 Note that a failing \cfunction{malloc()} call must be turned into an
 237 exception --- the direct caller of \cfunction{malloc()} (or
 238 \cfunction{realloc()}) must call \cfunction{PyErr_NoMemory()} and
 239 return a failure indicator itself.  All the object-creating functions
 240 (\cfunction{PyInt_FromLong()} etc.) already do this, so this note
 241 matters only if you call \cfunction{malloc()} directly.
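
As a minimal sketch of this rule, the helper below copies a C string
into freshly allocated memory.  The name
\cfunction{spam_copy_string()} is hypothetical and serves only as an
illustration; the point is that the direct caller of
\cfunction{malloc()} sets the exception and returns a failure
indicator:

\begin{verbatim}
#include "Python.h"

/* Hypothetical helper: return a malloc()'ed copy of s, or NULL with
   a MemoryError exception set. */
static char *
spam_copy_string(s)
    const char *s;
{
    char *copy = (char *) malloc(strlen(s) + 1);

    if (copy == NULL) {
        PyErr_NoMemory();       /* Set the exception ... */
        return NULL;            /* ... and report the failure */
    }
    strcpy(copy, s);
    return copy;
}
\end{verbatim}

A caller that receives \NULL{} from such a helper should simply return
its own error value without setting another exception.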
 242
 243 Also note that, with the important exception of
 244 \cfunction{PyArg_ParseTuple()} and friends, functions that return an
 245 integer status usually return a positive value or zero for success and
 246 \code{-1} for failure, like \UNIX{} system calls.
 247
 248 Finally, be careful to clean up garbage (by making
 249 \cfunction{Py_XDECREF()} or \cfunction{Py_DECREF()} calls for objects
 250 you have already created) when you return an error indicator!
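
For instance, a function that creates two objects must release the
first one if creating the second one fails.  The following sketch uses
a hypothetical function \cfunction{spam_pair()}, invented for this
illustration, that takes no arguments and returns the tuple
\code{(1, 2)}:

\begin{verbatim}
#include "Python.h"

/* Hypothetical function: return the tuple (1, 2). */
static PyObject *
spam_pair(self, args)
    PyObject *self;
    PyObject *args;
{
    PyObject *first, *second, *result;

    if (!PyArg_ParseTuple(args, ""))
        return NULL;
    first = PyInt_FromLong(1L);
    if (first == NULL)
        return NULL;            /* Nothing to clean up yet */
    second = PyInt_FromLong(2L);
    if (second == NULL) {
        Py_DECREF(first);       /* Release what was already created */
        return NULL;
    }
    /* The "O" format adds its own reference to each object, so our
       references are released whether or not the call succeeds. */
    result = Py_BuildValue("(OO)", first, second);
    Py_DECREF(first);
    Py_DECREF(second);
    return result;              /* NULL here means an exception is set */
}
\end{verbatim}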
 251
 252 The choice of which exception to raise is entirely yours.  There are
 253 predeclared C objects corresponding to all built-in Python exceptions,
 254 e.g.\ \cdata{PyExc_ZeroDivisionError}, which you can use directly.  Of
 255 course, you should choose exceptions wisely --- don't use
 256 \cdata{PyExc_TypeError} to mean that a file couldn't be opened (that
 257 should probably be \cdata{PyExc_IOError}).  If something's wrong with
 258 the argument list, the \cfunction{PyArg_ParseTuple()} function usually
 259 raises \cdata{PyExc_TypeError}.  If you have an argument whose value
 260 must be in a particular range or must satisfy other conditions,
 261 \cdata{PyExc_ValueError} is appropriate.
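
The following sketch shows how the two cases can be told apart.  The
function \cfunction{spam_percent()} and its range of 0 to 100 are
hypothetical and chosen only for illustration:

\begin{verbatim}
#include "Python.h"

/* Hypothetical function: accept one integer in the range 0..100. */
static PyObject *
spam_percent(self, args)
    PyObject *self;
    PyObject *args;
{
    int value;

    /* A wrong argument type or count: PyArg_ParseTuple() has already
       set the exception, so just pass the failure on. */
    if (!PyArg_ParseTuple(args, "i", &value))
        return NULL;
    /* The type is right but the value is not: set ValueError here. */
    if (value < 0 || value > 100) {
        PyErr_SetString(PyExc_ValueError,
                        "value must be in the range 0..100");
        return NULL;
    }
    return Py_BuildValue("i", value);
}
\end{verbatim}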
 262
 263 You can also define a new exception that is unique to your module.
 264 For this, you usually declare a static object variable at the
 265 beginning of your file, e.g.
 266
 267 \begin{verbatim}
 268 static PyObject *SpamError;
 269 \end{verbatim}
 270
 271 and initialize it in your module's initialization function
 272 (\cfunction{initspam()}) with an exception object, e.g.\ (leaving out
 273 the error checking for now):
 274
 275 \begin{verbatim}
 276 void
 277 initspam()
 278 {
 279     PyObject *m, *d;
 280
 281     m = Py_InitModule("spam", SpamMethods);
 282     d = PyModule_GetDict(m);
 283     SpamError = PyErr_NewException("spam.error", NULL, NULL);
 284     PyDict_SetItemString(d, "error", SpamError);
 285 }
 286 \end{verbatim}
 287
 288 Note that the Python name for the exception object is
 289 \exception{spam.error}.  The \cfunction{PyErr_NewException()} function
 290 may create either a string or class, depending on whether the
 291 \programopt{-X} flag was passed to the interpreter.  If
 292 \programopt{-X} was used, \cdata{SpamError} will be a string object,
 293 otherwise it will be a class object with the base class being
 294 \exception{Exception}, described in the
 295 \citetitle[../lib/lib.html]{Python Library Reference} under ``Built-in
 296 Exceptions.''
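
Once initialized, \cdata{SpamError} can be raised like any predefined
exception.  As a sketch, a variant of \cfunction{spam_system()} might
raise it when the command could not be run.  Treating a negative
return value of \cfunction{system()} as the failure condition is an
assumption made for this illustration, as is the message text:

\begin{verbatim}
#include "Python.h"

static PyObject *SpamError;     /* Initialized in initspam() */

static PyObject *
spam_system(self, args)
    PyObject *self;
    PyObject *args;
{
    char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    if (sts < 0) {
        /* Assumption: a negative status means the command could
           not be run at all. */
        PyErr_SetString(SpamError, "System command failed");
        return NULL;
    }
    return Py_BuildValue("i", sts);
}
\end{verbatim}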
 297
 298
 299 \section{Back to the Example
 300          \label{backToExample}}
 301
 302 Going back to our example function, you should now be able to
 303 understand this statement:
 304
 305 \begin{verbatim}
 306     if (!PyArg_ParseTuple(args, "s", &command))
 307         return NULL;
 308 \end{verbatim}
 309
 310 It returns \NULL{} (the error indicator for functions returning
 311 object pointers) if an error is detected in the argument list, relying
 312 on the exception set by \cfunction{PyArg_ParseTuple()}.  Otherwise the
 313 string value of the argument has been copied to the local variable
 314 \cdata{command}.  This is a pointer assignment and you are not supposed
 315 to modify the string to which it points (so in Standard C, the variable
 316 \cdata{command} should properly be declared as \samp{const char
 317 *command}).
 318
 319 The next statement is a call to the C library function
 320 \cfunction{system()}, passing it the string we just got from
 321 \cfunction{PyArg_ParseTuple()}:
 322
 323 \begin{verbatim}
 324     sts = system(command);
 325 \end{verbatim}
 326
 327 Our \function{spam.system()} function must return the value of
 328 \cdata{sts} as a Python object.  This is done using the function
 329 \cfunction{Py_BuildValue()}, which is something like the inverse of
 330 \cfunction{PyArg_ParseTuple()}: it takes a format string and an
 331 arbitrary number of C values, and returns a new Python object.
 332 More info on \cfunction{Py_BuildValue()} is given later.
 333
 334 \begin{verbatim}
 335     return Py_BuildValue("i", sts);
 336 \end{verbatim}
 337
 338 In this case, it will return an integer object.  (Yes, even integers
 339 are objects on the heap in Python!)
 340
 341 If you have a C function that returns no useful value (a function
 342 returning \ctype{void}), the corresponding Python function must return
 343 \code{None}.   You need this idiom to do so:
 344
 345 \begin{verbatim}
 346     Py_INCREF(Py_None);
 347     return Py_None;
 348 \end{verbatim}
 349
 350 \cdata{Py_None} is the C name for the special Python object
 351 \code{None}.  It is a genuine Python object rather than a \NULL{}
 352 pointer, which means ``error'' in most contexts, as we have seen.
 353
 354
 355 \section{The Module's Method Table and Initialization Function
 356          \label{methodTable}}
 357
 358 I promised to show how \cfunction{spam_system()} is called from Python
 359 programs.  First, we need to list its name and address in a ``method
 360 table'':
 361
 362 \begin{verbatim}
 363 static PyMethodDef SpamMethods[] = {
 364     ...
 365     {"system",  spam_system, METH_VARARGS},
 366     ...
 367     {NULL,      NULL}        /* Sentinel */
 368 };
 369 \end{verbatim}
 370
 371 Note the third entry (\samp{METH_VARARGS}).  This is a flag telling
 372 the interpreter the calling convention to be used for the C
 373 function.  It should normally always be \samp{METH_VARARGS} or
 374 \samp{METH_VARARGS | METH_KEYWORDS}; a value of \code{0} means that an
 375 obsolete variant of \cfunction{PyArg_ParseTuple()} is used.
 376
 377 When using only \samp{METH_VARARGS}, the function should expect
 378 the Python-level parameters to be passed in as a tuple acceptable for
 379 parsing via \cfunction{PyArg_ParseTuple()}; more information on this
 380 function is provided below.
 381
 382 The \constant{METH_KEYWORDS} bit may be set in the third field if keyword
 383 arguments should be passed to the function.  In this case, the C
 384 function should accept a third \samp{PyObject *} parameter which will
 385 be a dictionary of keywords.  Use \cfunction{PyArg_ParseTupleAndKeywords()}
 386 to parse the arguments to such a function.
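
A function that accepts keyword arguments might look like the
following sketch.  The function \cfunction{spam_repeat()}, its
behavior, and its keyword names are hypothetical; only the calling
convention is the point.  The \samp{|} in the format string marks the
remaining arguments as optional, so \var{count} keeps its default
value when the caller omits it:

\begin{verbatim}
#include "Python.h"

/* Hypothetical function: run a command `count' times and return the
   status of the last run. */
static PyObject *
spam_repeat(self, args, keywds)
    PyObject *self;
    PyObject *args;
    PyObject *keywds;
{
    char *command;
    int count = 1;              /* Default if not supplied */
    int sts = 0;
    static char *kwlist[] = {"command", "count", NULL};

    if (!PyArg_ParseTupleAndKeywords(args, keywds, "s|i", kwlist,
                                     &command, &count))
        return NULL;
    while (count-- > 0)
        sts = system(command);
    return Py_BuildValue("i", sts);
}

static PyMethodDef SpamMethods[] = {
    /* The cast is needed because the function takes three arguments */
    {"repeat",  (PyCFunction)spam_repeat, METH_VARARGS | METH_KEYWORDS},
    {NULL,      NULL}        /* Sentinel */
};
\end{verbatim}

From Python, such a function could then be called as
\samp{spam.repeat("ls -l", count=2)}.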
 387
 388 The method table must be passed to the interpreter in the module's
 389 initialization function (which should be the only non-\code{static}
 390 item defined in the module file):
 391
 392 \begin{verbatim}
 393 void
 394 initspam()
 395 {
 396     (void) Py_InitModule("spam", SpamMethods);
 397 }
 398 \end{verbatim}
 399
 400 When the Python program imports module \module{spam} for the first
 401 time, \cfunction{initspam()} is called.  It calls
 402 \cfunction{Py_InitModule()}, which creates a ``module object'' (which
 403 is inserted in the dictionary \code{sys.modules} under the key
 404 \code{"spam"}), and inserts built-in function objects into the newly
 405 created module based upon the table (an array of \ctype{PyMethodDef}
 406 structures) that was passed as its second argument.
 407 \cfunction{Py_InitModule()} returns a pointer to the module object
 408 that it creates (which is unused here).  It aborts with a fatal error
 409 if the module could not be initialized satisfactorily, so the caller
 410 doesn't need to check for errors.
 411
 412 \strong{Note:}  Removing entries from \code{sys.modules} or importing
 413 compiled modules into multiple interpreters within a process (or
 414 following a \cfunction{fork()} without an intervening
 415 \cfunction{exec()}) can create problems for some extension modules.
 416 Extension module authors should exercise caution when initializing
 417 internal data structures.
 418
 419
 420 \section{Compilation and Linkage
 421          \label{compilation}}
 422
 423 There are two more things to do before you can use your new extension:
 424 compiling and linking it with the Python system.  If you use dynamic
 425 loading, the details depend on the style of dynamic loading your
 426 system uses; see the chapter ``Dynamic Loading'' for more information
 427 about this.
 428
 429 If you can't use dynamic loading, or if you want to make your module a
 430 permanent part of the Python interpreter, you will have to change the
 431 configuration setup and rebuild the interpreter.  Luckily, this is
 432 very simple: just place your file (\file{spammodule.c} for example) in
 433 the \file{Modules/} directory of an unpacked source distribution, add
 434 a line to the file \file{Modules/Setup.local} describing your file:
 435
 436 \begin{verbatim}
 437 spam spammodule.o
 438 \end{verbatim}
 439
 440 and rebuild the interpreter by running \program{make} in the toplevel
 441 directory.  You can also run \program{make} in the \file{Modules/}
 442 subdirectory, but then you must first rebuild \file{Makefile}
 443 there by running `\program{make} Makefile'.  (This is necessary each
 444 time you change the \file{Setup} file.)
 445
 446 If your module requires additional libraries to link with, these can
 447 be listed on the line in the configuration file as well, for instance:
 448
 449 \begin{verbatim}
 450 spam spammodule.o -lX11
 451 \end{verbatim}
 452
 453 \section{Calling Python Functions from C
 454          \label{callingPython}}
 455
 456 So far we have concentrated on making C functions callable from
 457 Python.  The reverse is also useful: calling Python functions from C.
 458 This is especially the case for libraries that support so-called
 459 ``callback'' functions.  If a C interface makes use of callbacks, the
 460 equivalent Python interface often needs to provide a callback mechanism to the
 461 Python programmer; the implementation will require calling the Python
 462 callback functions from a C callback.  Other uses are also imaginable.
 463
 464 Fortunately, the Python interpreter is easily called recursively, and
 465 there is a standard interface to call a Python function.  (I won't
 466 dwell on how to call the Python parser with a particular string as
 467 input --- if you're interested, have a look at the implementation of
 468 the \programopt{-c} command line option in \file{Python/pythonmain.c}
 469 from the Python source code.)
 470
 471 Calling a Python function is easy.  First, the Python program must
 472 somehow pass you the Python function object.  You should provide a
 473 function (or some other interface) to do this.  When this function is
 474 called, save a pointer to the Python function object (be careful to
 475 \cfunction{Py_INCREF()} it!) in a global variable --- or wherever you
 476 see fit. For example, the following function might be part of a module
 477 definition:
 478
 479 \begin{verbatim}
 480 static PyObject *my_callback = NULL;
 481
 482 static PyObject *
 483 my_set_callback(dummy, args)
 484     PyObject *dummy, *args;
 485 {
 486     PyObject *result = NULL;
 487     PyObject *temp;
 488
 489     if (PyArg_ParseTuple(args, "O:set_callback", &temp)) {
 490         if (!PyCallable_Check(temp)) {
 491             PyErr_SetString(PyExc_TypeError, "parameter must be callable");
 492             return NULL;
 493         }
 494         Py_XINCREF(temp);         /* Add a reference to new callback */
 495         Py_XDECREF(my_callback);  /* Dispose of previous callback */
 496         my_callback = temp;       /* Remember new callback */
 497         /* Boilerplate to return "None" */
 498         Py_INCREF(Py_None);
 499         result = Py_None;
 500     }
 501     return result;
 502 }
 503 \end{verbatim}
 504
 505 This function must be registered with the interpreter using the
 506 \constant{METH_VARARGS} flag; this is described in section
 507 \ref{methodTable}, ``The Module's Method Table and Initialization
 508 Function.''  The \cfunction{PyArg_ParseTuple()} function and its
 509 arguments are documented in section \ref{parseTuple}, ``Format Strings
 510 for \cfunction{PyArg_ParseTuple()}.''
 511
 512 The macros \cfunction{Py_XINCREF()} and \cfunction{Py_XDECREF()}
 513 increment/decrement the reference count of an object and are safe in
 514 the presence of \NULL{} pointers (but note that \var{temp} will not be
 515 \NULL{} in this context).  More information on them is given in section
 516 \ref{refcounts}, ``Reference Counts.''
 517
 518 Later, when it is time to call the function, you call the C function
 519 \cfunction{PyEval_CallObject()}.  This function has two arguments, both
 520 pointers to arbitrary Python objects: the Python function, and the
 521 argument list.  The argument list must always be a tuple object, whose
 522 length is the number of arguments.  To call the Python function with
 523 no arguments, pass an empty tuple; to call it with one argument, pass
 524 a singleton tuple.  \cfunction{Py_BuildValue()} returns a tuple when its
 525 format string consists of zero or more format codes between
 526 parentheses.  For example:
 527
 528 \begin{verbatim}
 529     int arg;
 530     PyObject *arglist;
 531     PyObject *result;
 532     ...
 533     arg = 123;
 534     ...
 535     /* Time to call the callback */
 536     arglist = Py_BuildValue("(i)", arg);
 537     result = PyEval_CallObject(my_callback, arglist);
 538     Py_DECREF(arglist);
 539 \end{verbatim}
 540
 541 \cfunction{PyEval_CallObject()} returns a Python object pointer: this is
 542 the return value of the Python function.  \cfunction{PyEval_CallObject()} is
 543 ``reference-count-neutral'' with respect to its arguments.  In the
 544 example a new tuple was created to serve as the argument list, which
 545 is \cfunction{Py_DECREF()}-ed immediately after the call.
 546
 547 The return value of \cfunction{PyEval_CallObject()} is ``new'': either it
 548 is a brand new object, or it is an existing object whose reference
 549 count has been incremented.  So, unless you want to save it in a
 550 global variable, you should somehow \cfunction{Py_DECREF()} the result,
 551 even (especially!) if you are not interested in its value.
 552
 553 Before you do this, however, it is important to check that the return
 554 value isn't \NULL{}.  If it is, the Python function terminated by
 555 raising an exception.  If the C code that called
 556 \cfunction{PyEval_CallObject()} is called from Python, it should now
 557 return an error indication to its Python caller, so the interpreter
 558 can print a stack trace, or the calling Python code can handle the
 559 exception.  If this is not possible or desirable, the exception should
 560 be cleared by calling \cfunction{PyErr_Clear()}.  For example:
 561
 562 \begin{verbatim}
 563     if (result == NULL)
 564         return NULL; /* Pass error back */
 565     ...use result...
 566     Py_DECREF(result);
 567 \end{verbatim}
 568
 569 Depending on the desired interface to the Python callback function,
 570 you may also have to provide an argument list to
 571 \cfunction{PyEval_CallObject()}.  In some cases the argument list is
 572 also provided by the Python program, through the same interface that
 573 specified the callback function.  It can then be saved and used in the
 574 same manner as the function object.  In other cases, you may have to
 575 construct a new tuple to pass as the argument list.  The simplest way
 576 to do this is to call \cfunction{Py_BuildValue()}.  For example, if
 577 you want to pass an integral event code, you might use the following
 578 code:
 579
 580 \begin{verbatim}
 581     PyObject *arglist;
 582     ...
 583     arglist = Py_BuildValue("(l)", eventcode);
 584     result = PyEval_CallObject(my_callback, arglist);
 585     Py_DECREF(arglist);
 586     if (result == NULL)
 587         return NULL; /* Pass error back */
 588     /* Here maybe use the result */
 589     Py_DECREF(result);
 590 \end{verbatim}
 591
 592 Note the placement of \samp{Py_DECREF(arglist)} immediately after the
 593 call, before the error check!  Also note that, strictly speaking, this
 594 code is not complete: \cfunction{Py_BuildValue()} may run out of
 595 memory, and this should be checked.
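
A more complete version could be sketched as follows.  The helper
\cfunction{call_my_callback()} and its convention of returning
\code{0} for success and \code{-1} for failure are assumptions made
for this illustration; \cdata{my_callback} is the variable set by
\cfunction{my_set_callback()}:

\begin{verbatim}
#include "Python.h"

static PyObject *my_callback = NULL;    /* Set by my_set_callback() */

/* Hypothetical helper: call the callback with one event code.
   Return 0 on success, -1 with an exception set on failure. */
static int
call_my_callback(eventcode)
    long eventcode;
{
    PyObject *arglist, *result;

    if (my_callback == NULL) {
        PyErr_SetString(PyExc_RuntimeError, "no callback has been set");
        return -1;
    }
    arglist = Py_BuildValue("(l)", eventcode);
    if (arglist == NULL)
        return -1;              /* Exception already set */
    result = PyEval_CallObject(my_callback, arglist);
    Py_DECREF(arglist);
    if (result == NULL)
        return -1;              /* Pass error back */
    Py_DECREF(result);          /* The result itself is not needed */
    return 0;
}
\end{verbatim}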
 596
 597
 598 \section{Format Strings for \cfunction{PyArg_ParseTuple()}
 599          \label{parseTuple}}
 600
 601 The \cfunction{PyArg_ParseTuple()} function is declared as follows:
 602
 603 \begin{verbatim}
 604 int PyArg_ParseTuple(PyObject *arg, char *format, ...);
 605 \end{verbatim}
 606
 607 The \var{arg} argument must be a tuple object containing an argument
 608 list passed from Python to a C function.  The \var{format} argument
 609 must be a format string, whose syntax is explained below.  The
 610 remaining arguments must be addresses of variables whose type is
 611 determined by the format string.  For the conversion to succeed, the
 612 \var{arg} object must match the format and the format must be
 613 exhausted.
 614
 615 Note that while \cfunction{PyArg_ParseTuple()} checks that the Python
 616 arguments have the required types, it cannot check the validity of the
 617 addresses of C variables passed to the call: if you make mistakes
 618 there, your code will probably crash or at least overwrite random bits
 619 in memory.  So be careful!
 620
 621 A format string consists of zero or more ``format units''.  A format
 622 unit describes one Python object; it is usually a single character or
 623 a parenthesized sequence of format units.  With a few exceptions, a
 624 format unit that is not a parenthesized sequence normally corresponds
 625 to a single address argument to \cfunction{PyArg_ParseTuple()}.  In the
 626 following description, the quoted form is the format unit; the entry
 627 in (round) parentheses is the Python object type that matches the
 628 format unit; and the entry in [square] brackets is the type of the C
 629 variable(s) whose address should be passed.  (Use the \samp{\&}
 630 operator to pass a variable's address.)
 631
 632 \begin{description}
 633
 634 \item[\samp{s} (string) {[char *]}]
 635 Convert a Python string to a C pointer to a character string.  You
 636 must not provide storage for the string itself; a pointer to an
 637 existing string is stored into the character pointer variable whose
 638 address you pass.  The C string is null-terminated.  The Python string
 639 must not contain embedded null bytes; if it does, a \exception{TypeError}
 640 exception is raised.
 641
 642 \item[\samp{s\#} (string) {[char *, int]}]
 643 This variant on \samp{s} stores into two C variables, the first one
 644 a pointer to a character string, the second one its length.  In this
 645 case the Python string may contain embedded null bytes.
 646
 647 \item[\samp{z} (string or \code{None}) {[char *]}]
 648 Like \samp{s}, but the Python object may also be \code{None}, in which
 649 case the C pointer is set to \NULL{}.
 650
 651 \item[\samp{z\#} (string or \code{None}) {[char *, int]}]
 652 This is to \samp{s\#} as \samp{z} is to \samp{s}.
 653
 654 \item[\samp{b} (integer) {[char]}]
 655 Convert a Python integer to a tiny int, stored in a C \ctype{char}.
 656
 657 \item[\samp{h} (integer) {[short int]}]
 658 Convert a Python integer to a C \ctype{short int}.
 659
 660 \item[\samp{i} (integer) {[int]}]
 661 Convert a Python integer to a plain C \ctype{int}.
 662
 663 \item[\samp{l} (integer) {[long int]}]
 664 Convert a Python integer to a C \ctype{long int}.
 665
 666 \item[\samp{c} (string of length 1) {[char]}]
 667 Convert a Python character, represented as a string of length 1, to a
 668 C \ctype{char}.
 669
 670 \item[\samp{f} (float) {[float]}]
 671 Convert a Python floating point number to a C \ctype{float}.
 672
 673 \item[\samp{d} (float) {[double]}]
 674 Convert a Python floating point number to a C \ctype{double}.
 675
 676 \item[\samp{D} (complex) {[Py_complex]}]
 677 Convert a Python complex number to a C \ctype{Py_complex} structure.
 678
 679 \item[\samp{O} (object) {[PyObject *]}]
 680 Store a Python object (without any conversion) in a C object pointer.
 681 The C program thus receives the actual object that was passed.  The
 682 object's reference count is not increased.  The pointer stored is not
 683 \NULL{}.
 684
 685 \item[\samp{O!} (object) {[\var{typeobject}, PyObject *]}]
 686 Store a Python object in a C object pointer.  This is similar to
 687 \samp{O}, but takes two C arguments: the first is the address of a
 688 Python type object, the second is the address of the C variable (of
 689 type \ctype{PyObject *}) into which the object pointer is stored.
 690 If the Python object does not have the required type, a
 691 \exception{TypeError} exception is raised.
 692
 693 \item[\samp{O\&} (object) {[\var{converter}, \var{anything}]}]
 694 Convert a Python object to a C variable through a \var{converter}
 695 function.  This takes two arguments: the first is a function, the
 696 second is the address of a C variable (of arbitrary type), converted
 697 to \ctype{void *}.  The \var{converter} function in turn is called as
 698 follows:
 699
 700 \var{status}\code{ = }\var{converter}\code{(}\var{object}, \var{address}\code{);}
 701
 702 where \var{object} is the Python object to be converted and
 703 \var{address} is the \ctype{void *} argument that was passed to
 704 \cfunction{PyArg_ParseTuple()}.  The returned \var{status} should be
 705 \code{1} for a successful conversion and \code{0} if the conversion
 706 has failed.  When the conversion fails, the \var{converter} function
 707 should raise an exception.
 708
 709 \item[\samp{S} (string) {[PyStringObject *]}]
 710 Like \samp{O} but requires that the Python object is a string object.
 711 Raises a \exception{TypeError} exception if the object is not a string
 712 object.  The C variable may also be declared as \ctype{PyObject *}.
 713
 714 \item[\samp{t\#} (read-only character buffer) {[char *, int]}]
 715 Like \samp{s\#}, but accepts any object which implements the read-only
 716 buffer interface.  The \ctype{char *} variable is set to point to the
 717 first byte of the buffer, and the \ctype{int} is set to the length of
 718 the buffer.  Only single-segment buffer objects are accepted;
 719 \exception{TypeError} is raised for all others.
 720
 721 \item[\samp{w} (read-write character buffer) {[char *]}]
 722 Similar to \samp{s}, but accepts any object which implements the
 723 read-write buffer interface.  The caller must determine the length of
 724 the buffer by other means, or use \samp{w\#} instead.  Only
 725 single-segment buffer objects are accepted; \exception{TypeError} is
 726 raised for all others.
 727
 728 \item[\samp{w\#} (read-write character buffer) {[char *, int]}]
 729 Like \samp{s\#}, but accepts any object which implements the
 730 read-write buffer interface.  The \ctype{char *} variable is set to
 731 point to the first byte of the buffer, and the \ctype{int} is set to
 732 the length of the buffer.  Only single-segment buffer objects are
 733 accepted; \exception{TypeError} is raised for all others.
 734
 735 \item[\samp{(\var{items})} (tuple) {[\var{matching-items}]}]
 736 The object must be a Python sequence whose length is the number of
 737 format units in \var{items}.  The C arguments must correspond to the
 738 individual format units in \var{items}.  Format units for sequences
 739 may be nested.
 740
 741 \strong{Note:} Prior to Python version 1.5.2, this format specifier
 742 only accepted a tuple containing the individual parameters, not an
 743 arbitrary sequence.  Code which previously caused a
 744 \exception{TypeError} to be raised here may now proceed without an
 745 exception.  This is not expected to be a problem for existing code.
 746
 747 \end{description}
 748
 749 It is possible to pass Python long integers where integers are
 750 requested; however no proper range checking is done --- the most
 751 significant bits are silently truncated when the receiving field is
 752 too small to receive the value (actually, the semantics are inherited
 753 from downcasts in C --- your mileage may vary).
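
The effect can be reproduced in plain C, without the Python API.  The
sketch below is only an illustration: the helper
\cfunction{narrow_to_uchar()} is hypothetical.  It narrows to an
unsigned type because that conversion is well defined (reduction
modulo \code{UCHAR_MAX + 1}); narrowing to a signed type that cannot
hold the value is implementation-defined, which is where the mileage
varies.

\begin{verbatim}
#include <limits.h>

/* Hypothetical helper: keep only the low-order bits that fit in an
 * unsigned char.  The high-order bits are silently discarded. */
unsigned char
narrow_to_uchar(long value)
{
    /* Conversion to an unsigned type reduces modulo UCHAR_MAX + 1. */
    return (unsigned char) value;
}
\end{verbatim}

With an 8-bit \ctype{char}, passing \code{0x1234L} yields \code{0x34}:
the upper byte is lost without any diagnostic.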
 754
 755 A few other characters have a meaning in a format string.  These may
 756 not occur inside nested parentheses.  They are:
 757
 758 \begin{description}
 759
 760 \item[\samp{|}]
 761 Indicates that the remaining arguments in the Python argument list are
 762 optional.  The C variables corresponding to optional arguments should
 763 be initialized to their default value --- when an optional argument is
 764 not specified, \cfunction{PyArg_ParseTuple()} does not touch the contents
 765 of the corresponding C variable(s).
 766
 767 \item[\samp{:}]
 768 The list of format units ends here; the string after the colon is used
 769 as the function name in error messages (the ``associated value'' of
 770 the exception that \cfunction{PyArg_ParseTuple()} raises).
 771
 772 \item[\samp{;}]
 773 The list of format units ends here; the string after the semicolon is used
 774 as the error message \emph{instead} of the default error message.
 775 Clearly, \samp{:} and \samp{;} mutually exclude each other.
 776
 777 \end{description}
 778
 779 Some example calls:
 780
 781 \begin{verbatim}
 782     int ok;
 783     int i, j;
 784     long k, l;
 785     char *s;
 786     int size;
 787
 788     ok = PyArg_ParseTuple(args, ""); /* No arguments */
 789         /* Python call: f() */
 790 \end{verbatim}
 791
 792 \begin{verbatim}
 793     ok = PyArg_ParseTuple(args, "s", &s); /* A string */
 794         /* Possible Python call: f('whoops!') */
 795 \end{verbatim}
 796
 797 \begin{verbatim}
 798     ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */
 799         /* Possible Python call: f(1, 2, 'three') */
 800 \end{verbatim}
 801
 802 \begin{verbatim}
 803     ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
 804         /* A pair of ints and a string, whose size is also returned */
 805         /* Possible Python call: f((1, 2), 'three') */
 806 \end{verbatim}
 807
 808 \begin{verbatim}
 809     {
 810         char *file;
 811         char *mode = "r";
 812         int bufsize = 0;
 813         ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
 814         /* A string, and optionally another string and an integer */
 815         /* Possible Python calls:
 816            f('spam')
 817            f('spam', 'w')
 818            f('spam', 'wb', 100000) */
 819     }
 820 \end{verbatim}
 821
 822 \begin{verbatim}
 823     {
 824         int left, top, right, bottom, h, v;
 825         ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
 826                  &left, &top, &right, &bottom, &h, &v);
 827         /* A rectangle and a point */
 828         /* Possible Python call:
 829            f(((0, 0), (400, 300)), (10, 10)) */
 830     }
 831 \end{verbatim}
 832
 833 \begin{verbatim}
 834     {
 835         Py_complex c;
 836         ok = PyArg_ParseTuple(args, "D:myfunction", &c);
 837         /* a complex, also providing a function name for errors */
 838         /* Possible Python call: myfunction(1+2j) */
 839     }
 840 \end{verbatim}
 841
 842
 843 \section{Keyword Parsing with \cfunction{PyArg_ParseTupleAndKeywords()}
 844          \label{parseTupleAndKeywords}}
 845
 846 The \cfunction{PyArg_ParseTupleAndKeywords()} function is declared as
 847 follows:
 848
 849 \begin{verbatim}
 850 int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict,
 851                                 char *format, char **kwlist, ...);
 852 \end{verbatim}
 853
 854 The \var{arg} and \var{format} parameters are identical to those of the
 855 \cfunction{PyArg_ParseTuple()} function.  The \var{kwdict} parameter
 856 is the dictionary of keywords received as the third parameter from the
 857 Python runtime.  The \var{kwlist} parameter is a \NULL{}-terminated
 858 list of strings which identify the parameters; the names are matched
 859 with the type information from \var{format} from left to right.
 860
 861 \strong{Note:}  Nested tuples cannot be parsed when using keyword
 862 arguments!  Keyword parameters passed in which are not present in the
 863 \var{kwlist} will cause \exception{TypeError} to be raised.
 864
 865 Here is an example module which uses keywords, based on an example by
 866 Geoff Philbrick (\email{philbrick@hks.com}):%
 867 \index{Philbrick, Geoff}
 868
 869 \begin{verbatim}
 870 #include <stdio.h>
 871 #include "Python.h"
 872
 873 static PyObject *
 874 keywdarg_parrot(self, args, keywds)
 875     PyObject *self;
 876     PyObject *args;
 877     PyObject *keywds;
 878 {
 879     int voltage;
 880     char *state = "a stiff";
 881     char *action = "voom";
 882     char *type = "Norwegian Blue";
 883
 884     static char *kwlist[] = {"voltage", "state", "action", "type", NULL};
 885
 886     if (!PyArg_ParseTupleAndKeywords(args, keywds, "i|sss", kwlist,
 887                                      &voltage, &state, &action, &type))
 888         return NULL;
 889
 890     printf("-- This parrot wouldn't %s if you put %i Volts through it.\n",
 891            action, voltage);
 892     printf("-- Lovely plumage, the %s -- It's %s!\n", type, state);
 893
 894     Py_INCREF(Py_None);
 895
 896     return Py_None;
 897 }
 898
 899 static PyMethodDef keywdarg_methods[] = {
 900     /* The cast of the function is necessary since PyCFunction values
 901      * only take two PyObject* parameters, and keywdarg_parrot() takes
 902      * three.
 903      */
 904     {"parrot", (PyCFunction)keywdarg_parrot, METH_VARARGS|METH_KEYWORDS},
 905     {NULL,  NULL}   /* sentinel */
 906 };
 907
 908 void
 909 initkeywdarg()
 910 {
 911   /* Create the module and add the functions */
 912   Py_InitModule("keywdarg", keywdarg_methods);
 913 }
 914 \end{verbatim}
 915
 916
 917 \section{The \cfunction{Py_BuildValue()} Function
 918          \label{buildValue}}
 919
 920 This function is the counterpart to \cfunction{PyArg_ParseTuple()}.  It is
 921 declared as follows:
 922
 923 \begin{verbatim}
 924 PyObject *Py_BuildValue(char *format, ...);
 925 \end{verbatim}
 926
 927 It recognizes a set of format units similar to the ones recognized by
 928 \cfunction{PyArg_ParseTuple()}, but the arguments (which are input to the
 929 function, not output) must not be pointers, just values.  It returns a
 930 new Python object, suitable for returning from a C function called
 931 from Python.
 932
 933 One difference with \cfunction{PyArg_ParseTuple()}: while the latter
 934 requires its first argument to be a tuple (since Python argument lists
 935 are always represented as tuples internally),
 936 \cfunction{Py_BuildValue()} does not always build a tuple.  It builds
 937 a tuple only if its format string contains two or more format units.
 938 If the format string is empty, it returns \code{None}; if it contains
 939 exactly one format unit, it returns whatever object is described by
 940 that format unit.  To force it to return a tuple of size 0 or 1,
 941 parenthesize the format string.
 942
 943 In the following description, the quoted form is the format unit; the
 944 entry in (round) parentheses is the Python object type that the format
 945 unit will return; and the entry in [square] brackets is the type of
 946 the C value(s) to be passed.
 947
 948 The characters space, tab, colon and comma are ignored in format
 949 strings (but not within format units such as \samp{s\#}).  This can be
 950 used to make long format strings a tad more readable.
 951
 952 \begin{description}
 953
 954 \item[\samp{s} (string) {[char *]}]
 955 Convert a null-terminated C string to a Python object.  If the C
 956 string pointer is \NULL{}, \code{None} is returned.
 957
 958 \item[\samp{s\#} (string) {[char *, int]}]
 959 Convert a C string and its length to a Python object.  If the C string
 960 pointer is \NULL{}, the length is ignored and \code{None} is
 961 returned.
 962
 963 \item[\samp{z} (string or \code{None}) {[char *]}]
 964 Same as \samp{s}.
 965
 966 \item[\samp{z\#} (string or \code{None}) {[char *, int]}]
 967 Same as \samp{s\#}.
 968
 969 \item[\samp{i} (integer) {[int]}]
 970 Convert a plain C \ctype{int} to a Python integer object.
 971
 972 \item[\samp{b} (integer) {[char]}]
 973 Same as \samp{i}.
 974
 975 \item[\samp{h} (integer) {[short int]}]
 976 Same as \samp{i}.
 977
 978 \item[\samp{l} (integer) {[long int]}]
 979 Convert a C \ctype{long int} to a Python integer object.
 980
 981 \item[\samp{c} (string of length 1) {[char]}]
 982 Convert a C \ctype{int} representing a character to a Python string of
 983 length 1.
 984
 985 \item[\samp{d} (float) {[double]}]
 986 Convert a C \ctype{double} to a Python floating point number.
 987
 988 \item[\samp{f} (float) {[float]}]
 989 Same as \samp{d}.
 990
 991 \item[\samp{O} (object) {[PyObject *]}]
 992 Pass a Python object untouched (except for its reference count, which
 993 is incremented by one).  If the object passed in is a \NULL{}
 994 pointer, it is assumed that this was caused because the call producing
 995 the argument found an error and set an exception.  Therefore,
 996 \cfunction{Py_BuildValue()} will return \NULL{} but won't raise an
 997 exception.  If no exception has been raised yet,
 998 \cdata{PyExc_SystemError} is set.
 999
1000 \item[\samp{S} (object) {[PyObject *]}]
1001 Same as \samp{O}.
1002
1003 \item[\samp{N} (object) {[PyObject *]}]
1004 Same as \samp{O}, except it doesn't increment the reference count on
1005 the object.  Useful when the object is created by a call to an object
1006 constructor in the argument list.
1007
1008 \item[\samp{O\&} (object) {[\var{converter}, \var{anything}]}]
1009 Convert \var{anything} to a Python object through a \var{converter}
1010 function.  The function is called with \var{anything} (which should be
1011 compatible with \ctype{void *}) as its argument and should return a
1012 ``new'' Python object, or \NULL{} if an error occurred.
1013
1014 \item[\samp{(\var{items})} (tuple) {[\var{matching-items}]}]
1015 Convert a sequence of C values to a Python tuple with the same number
1016 of items.
1017
1018 \item[\samp{[\var{items}]} (list) {[\var{matching-items}]}]
1019 Convert a sequence of C values to a Python list with the same number
1020 of items.
1021
1022 \item[\samp{\{\var{items}\}} (dictionary) {[\var{matching-items}]}]
1023 Convert a sequence of C values to a Python dictionary.  Each pair of
1024 consecutive C values adds one item to the dictionary, serving as key
1025 and value, respectively.
1026
1027 \end{description}
1028
1029 If there is an error in the format string, the
1030 \cdata{PyExc_SystemError} exception is raised and \NULL{} returned.
1031
1032 Examples (to the left the call, to the right the resulting Python value):
1033
1034 \begin{verbatim}
1035     Py_BuildValue("")                        None
1036     Py_BuildValue("i", 123)                  123
1037     Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
1038     Py_BuildValue("s", "hello")              'hello'
1039     Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
1040     Py_BuildValue("s#", "hello", 4)          'hell'
1041     Py_BuildValue("()")                      ()
1042     Py_BuildValue("(i)", 123)                (123,)
1043     Py_BuildValue("(ii)", 123, 456)          (123, 456)
1044     Py_BuildValue("(i,i)", 123, 456)         (123, 456)
1045     Py_BuildValue("[i,i]", 123, 456)         [123, 456]
1046     Py_BuildValue("{s:i,s:i}",
1047                   "abc", 123, "def", 456)    {'abc': 123, 'def': 456}
1048     Py_BuildValue("((ii)(ii)) (ii)",
1049                   1, 2, 3, 4, 5, 6)          (((1, 2), (3, 4)), (5, 6))
1050 \end{verbatim}
1051
1052
1053 \section{Reference Counts
1054          \label{refcounts}}
1055
1056 In languages like C or \Cpp{}, the programmer is responsible for
1057 dynamic allocation and deallocation of memory on the heap.  In C,
1058 this is done using the functions \cfunction{malloc()} and
1059 \cfunction{free()}.  In \Cpp{}, the operators \keyword{new} and
1060 \keyword{delete} are used with essentially the same meaning; they are
1061 typically implemented using \cfunction{malloc()} and
1062 \cfunction{free()}, so we'll restrict the following discussion to the
1063 latter.
1064
1065 Every block of memory allocated with \cfunction{malloc()} should
1066 eventually be returned to the pool of available memory by exactly one
1067 call to \cfunction{free()}.  It is important to call
1068 \cfunction{free()} at the right time.  If a block's address is
1069 forgotten but \cfunction{free()} is not called for it, the memory it
1070 occupies cannot be reused until the program terminates.  This is
1071 called a \dfn{memory leak}.  On the other hand, if a program calls
1072 \cfunction{free()} for a block and then continues to use the block, it
1073 creates a conflict with re-use of the block through another
1074 \cfunction{malloc()} call.  This is called \dfn{using freed memory}.
1075 It has the same bad consequences as referencing uninitialized data ---
1076 core dumps, wrong results, mysterious crashes.
1077
1078 Common causes of memory leaks are unusual paths through the code.  For
1079 instance, a function may allocate a block of memory, do some
1080 calculation, and then free the block again.  Now a change in the
1081 requirements for the function may add a test to the calculation that
1082 detects an error condition and can return prematurely from the
1083 function.  It's easy to forget to free the allocated memory block when
1084 taking this premature exit, especially when it is added later to the
1085 code.  Such leaks, once introduced, often go undetected for a long
1086 time: the error exit is taken only in a small fraction of all calls,
1087 and most modern machines have plenty of virtual memory, so the leak
1088 only becomes apparent in a long-running process that uses the leaking
1089 function frequently.  Therefore, it's important to prevent leaks from
1090 happening by having a coding convention or strategy that minimizes
1091 this kind of error.
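
One such convention, sketched here in plain C, is to route every exit
taken after the allocation through a single cleanup label, so that an
early error exit cannot skip the call to \cfunction{free()}.  The
function \cfunction{sum_digits()} and its error convention are
hypothetical, chosen only for illustration.

\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: sum the decimal digits in text.
 * Returns the sum, or -1 on allocation failure or a non-digit. */
int
sum_digits(const char *text)
{
    int result = -1;
    size_t i, n = strlen(text);
    char *copy = malloc(n + 1);

    if (copy == NULL)
        return -1;                /* nothing allocated, nothing to free */
    memcpy(copy, text, n + 1);

    result = 0;
    for (i = 0; i < n; i++) {
        if (copy[i] < '0' || copy[i] > '9') {
            result = -1;
            goto done;            /* premature exit still reaches free() */
        }
        result += copy[i] - '0';
    }

done:
    free(copy);                   /* the only place the block is released */
    return result;
}
\end{verbatim}

A test added later to the loop only needs to set \code{result} and
jump to \code{done}; it cannot forget the deallocation.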
1092
1093 Since Python makes heavy use of \cfunction{malloc()} and
1094 \cfunction{free()}, it needs a strategy to avoid memory leaks as well
1095 as the use of freed memory.  The chosen method is called
1096 \dfn{reference counting}.  The principle is simple: every object
1097 contains a counter, which is incremented when a reference to the
1098 object is stored somewhere, and which is decremented when a reference
1099 to it is deleted.  When the counter reaches zero, the last reference
1100 to the object has been deleted and the object is freed.
1101
1102 An alternative strategy is called \dfn{automatic garbage collection}.
1103 (Sometimes, reference counting is also referred to as a garbage
1104 collection strategy, hence my use of ``automatic'' to distinguish the
1105 two.)  The big advantage of automatic garbage collection is that the
1106 user doesn't need to call \cfunction{free()} explicitly.  (Another claimed
1107 advantage is an improvement in speed or memory usage --- this is no
1108 hard fact however.)  The disadvantage is that for C, there is no
1109 truly portable automatic garbage collector, while reference counting
1110 can be implemented portably (as long as the functions \cfunction{malloc()}
1111 and \cfunction{free()} are available --- which the C Standard guarantees).
1112 Maybe some day a sufficiently portable automatic garbage collector
1113 will be available for C.  Until then, we'll have to live with
1114 reference counts.
1115
1116 \subsection{Reference Counting in Python
1117             \label{refcountsInPython}}
1118
1119 There are two macros, \code{Py_INCREF(x)} and \code{Py_DECREF(x)},
1120 which handle the incrementing and decrementing of the reference count.
1121 \cfunction{Py_DECREF()} also frees the object when the count reaches zero.
1122 For flexibility, it doesn't call \cfunction{free()} directly --- rather, it
1123 makes a call through a function pointer in the object's \dfn{type
1124 object}.  For this purpose (and others), every object also contains a
1125 pointer to its type object.
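
The mechanism can be modelled in a few lines of plain C.  The sketch
below is a toy, not Python's actual implementation: the names
\ctype{ToyObject}, \ctype{ToyType}, \cfunction{toy_incref()} and
\cfunction{toy_decref()} are hypothetical.  It shows an object that
carries a counter and a pointer to its type, and a decrement that
calls the type's deallocation function when the counter reaches zero.

\begin{verbatim}
#include <stdlib.h>

struct ToyObject;

/* Toy "type object": holds the function that releases an instance. */
typedef struct {
    const char *name;
    void (*dealloc)(struct ToyObject *);
} ToyType;

/* Toy object header: a reference count plus a pointer to the type. */
typedef struct ToyObject {
    int refcnt;
    ToyType *type;
} ToyObject;

static int toy_live_objects = 0;  /* bookkeeping for the demonstration */

static void
toy_dealloc(ToyObject *op)
{
    toy_live_objects--;
    free(op);
}

static ToyType toy_type = {"toy", toy_dealloc};

/* Create an object; the caller owns the single initial reference. */
ToyObject *
toy_new(void)
{
    ToyObject *op = malloc(sizeof(ToyObject));
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->type = &toy_type;
    toy_live_objects++;
    return op;
}

void
toy_incref(ToyObject *op)
{
    op->refcnt++;
}

/* Drop one reference; at zero, free through the type object. */
void
toy_decref(ToyObject *op)
{
    if (--op->refcnt == 0)
        op->type->dealloc(op);
}
\end{verbatim}

Because the release goes through \code{op->type->dealloc}, each type
can supply its own cleanup without \cfunction{toy_decref()} knowing
about it.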
1126
1127 The big question now remains: when to use \code{Py_INCREF(x)} and
1128 \code{Py_DECREF(x)}?  Let's first introduce some terms.  Nobody
1129 ``owns'' an object; however, you can \dfn{own a reference} to an
1130 object.  An object's reference count is now defined as the number of
1131 owned references to it.  The owner of a reference is responsible for
1132 calling \cfunction{Py_DECREF()} when the reference is no longer
1133 needed.  Ownership of a reference can be transferred.  There are three
1134 ways to dispose of an owned reference: pass it on, store it, or call
1135 \cfunction{Py_DECREF()}.  Forgetting to dispose of an owned reference
1136 creates a memory leak.
1137
1138 It is also possible to \dfn{borrow}\footnote{The metaphor of
1139 ``borrowing'' a reference is not completely correct: the owner still
1140 has a copy of the reference.} a reference to an object.  The borrower
1141 of a reference should not call \cfunction{Py_DECREF()}.  The borrower must
1142 not hold on to the object longer than the owner from which it was
1143 borrowed.  Using a borrowed reference after the owner has disposed of
1144 it risks using freed memory and should be avoided
1145 completely.\footnote{Checking that the reference count is at least 1
1146 \strong{does not work} --- the reference count itself could be in
1147 freed memory and may thus be reused for another object!}
1148
1149 The advantage of borrowing over owning a reference is that you don't
1150 need to take care of disposing of the reference on all possible paths
1151 through the code --- in other words, with a borrowed reference you
1152 don't run the risk of leaking when a premature exit is taken.  The
1153 disadvantage of borrowing over leaking is that there are some subtle
1154 situations where in seemingly correct code a borrowed reference can be
1155 used after the owner from which it was borrowed has in fact disposed
1156 of it.
1157
1158 A borrowed reference can be changed into an owned reference by calling
1159 \cfunction{Py_INCREF()}.  This does not affect the status of the owner from
1160 which the reference was borrowed --- it creates a new owned reference,
1161 and gives full owner responsibilities (i.e., the new owner must
1162 dispose of the reference properly, as well as the previous owner).
1163
1164
1165 \subsection{Ownership Rules
1166             \label{ownershipRules}}
1167
1168 Whenever an object reference is passed into or out of a function, it
1169 is part of the function's interface specification whether ownership is
1170 transferred with the reference or not.
1171
1172 Most functions that return a reference to an object pass on ownership
1173 with the reference.  In particular, all functions whose purpose is
1174 to create a new object, e.g.\ \cfunction{PyInt_FromLong()} and
1175 \cfunction{Py_BuildValue()}, pass ownership to the receiver.  Even if in
1176 fact, in some cases, you don't receive a reference to a brand new
1177 object, you still receive ownership of the reference.  For instance,
1178 \cfunction{PyInt_FromLong()} maintains a cache of popular values and can
1179 return a reference to a cached item.
1180
1181 Many functions that extract objects from other objects also transfer
1182 ownership with the reference, for instance
1183 \cfunction{PyObject_GetAttrString()}.  The picture is less clear here,
1184 however, since a few common routines are exceptions:
1185 \cfunction{PyTuple_GetItem()}, \cfunction{PyList_GetItem()},
1186 \cfunction{PyDict_GetItem()}, and \cfunction{PyDict_GetItemString()}
1187 all return references that you borrow from the tuple, list or
1188 dictionary.
1189
1190 The function \cfunction{PyImport_AddModule()} also returns a borrowed
1191 reference, even though it may actually create the object it returns:
1192 this is possible because an owned reference to the object is stored in
1193 \code{sys.modules}.
1194
1195 When you pass an object reference into another function, in general,
1196 the function borrows the reference from you --- if it needs to store
1197 it, it will use \cfunction{Py_INCREF()} to become an independent
1198 owner.  There are exactly two important exceptions to this rule:
1199 \cfunction{PyTuple_SetItem()} and \cfunction{PyList_SetItem()}.  These
1200 functions take over ownership of the item passed to them --- even if
1201 they fail!  (Note that \cfunction{PyDict_SetItem()} and friends don't
1202 take over ownership --- they are ``normal.'')
1203
1204 When a C function is called from Python, it borrows references to its
1205 arguments from the caller.  The caller owns a reference to the object,
1206 so the borrowed reference's lifetime is guaranteed until the function
1207 returns.  Only when such a borrowed reference must be stored or passed
1208 on must it be turned into an owned reference by calling
1209 \cfunction{Py_INCREF()}.
1210
1211 The object reference returned from a C function that is called from
1212 Python must be an owned reference --- ownership is transferred from the
1213 function to its caller.
1214
1215
1216 \subsection{Thin Ice
1217             \label{thinIce}}
1218
1219 There are a few situations where seemingly harmless use of a borrowed
1220 reference can lead to problems.  These all have to do with implicit
1221 invocations of the interpreter, which can cause the owner of a
1222 reference to dispose of it.
1223
1224 The first and most important case to know about is using
1225 \cfunction{Py_DECREF()} on an unrelated object while borrowing a
1226 reference to a list item.  For instance:
1227
1228 \begin{verbatim}
1229 void bug(PyObject *list) {
1230     PyObject *item = PyList_GetItem(list, 0);
1231
1232     PyList_SetItem(list, 1, PyInt_FromLong(0L));
1233     PyObject_Print(item, stdout, 0); /* BUG! */
1234 }
1235 \end{verbatim}
1236
1237 This function first borrows a reference to \code{list[0]}, then
1238 replaces \code{list[1]} with the value \code{0}, and finally prints
1239 the borrowed reference.  Looks harmless, right?  But it's not!
1240
1241 Let's follow the control flow into \cfunction{PyList_SetItem()}.  The list
1242 owns references to all its items, so when item 1 is replaced, it has
1243 to dispose of the original item 1.  Now let's suppose the original
1244 item 1 was an instance of a user-defined class, and let's further
1245 suppose that the class defined a \method{__del__()} method.  If this
1246 class instance has a reference count of 1, disposing of it will call
1247 its \method{__del__()} method.
1248
1249 Since it is written in Python, the \method{__del__()} method can execute
1250 arbitrary Python code.  Could it perhaps do something to invalidate
1251 the reference to \code{item} in \cfunction{bug()}?  You bet!  Assuming
1252 that the list passed into \cfunction{bug()} is accessible to the
1253 \method{__del__()} method, it could execute a statement to the effect of
1254 \samp{del list[0]}, and assuming this was the last reference to that
1255 object, it would free the memory associated with it, thereby
1256 invalidating \code{item}.
1257
1258 The solution, once you know the source of the problem, is easy:
1259 temporarily increment the reference count.  The correct version of the
1260 function reads:
1261
1262 \begin{verbatim}
1263 void no_bug(PyObject *list) {
1264     PyObject *item = PyList_GetItem(list, 0);
1265
1266     Py_INCREF(item);
1267     PyList_SetItem(list, 1, PyInt_FromLong(0L));
1268     PyObject_Print(item, stdout, 0);
1269     Py_DECREF(item);
1270 }
1271 \end{verbatim}
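
The hazard and its cure can be reproduced without the interpreter.
The following plain C sketch is a toy model under stated assumptions:
\ctype{Item}, \cfunction{slot_replace()} and the \code{on_free} hook
(standing in for a \method{__del__()} method) are all hypothetical.
Replacing slot 1 disposes of the old occupant, whose hook clears slot
0; the temporary extra reference keeps the borrowed item alive across
that call.

\begin{verbatim}
#include <stdlib.h>

typedef struct Item {
    int refcnt;
    int value;
    void (*on_free)(void);        /* stand-in for a __del__() method */
} Item;

static Item *slots[2];            /* toy "list" owning its two items */
static int frees = 0;             /* how many items have been freed */

static Item *
item_new(int value, void (*on_free)(void))
{
    Item *it = malloc(sizeof(Item));
    if (it == NULL)
        abort();
    it->refcnt = 1;
    it->value = value;
    it->on_free = on_free;
    return it;
}

static void item_incref(Item *it) { it->refcnt++; }

static void
item_decref(Item *it)
{
    if (--it->refcnt == 0) {
        if (it->on_free != NULL)
            it->on_free();        /* arbitrary code runs here */
        frees++;
        free(it);
    }
}

/* Like PyList_SetItem(): takes over the new item, drops the old one. */
static void
slot_replace(int i, Item *new_item)
{
    Item *old = slots[i];
    slots[i] = new_item;
    if (old != NULL)
        item_decref(old);
}

/* The hook does the equivalent of "del list[0]". */
static void
clear_slot0(void)
{
    slot_replace(0, NULL);
}

/* Returns the value read through the protected reference. */
int
no_bug_demo(void)
{
    Item *item;
    int seen;

    frees = 0;
    slots[0] = item_new(42, NULL);
    slots[1] = item_new(7, clear_slot0);

    item = slots[0];              /* borrowed from the list */
    item_incref(item);            /* now we own a reference too */
    slot_replace(1, item_new(0, NULL));  /* hook clears slot 0 */
    seen = item->value;           /* safe: our reference keeps it alive */
    item_decref(item);            /* last reference: item is freed here */

    slot_replace(1, NULL);        /* tidy up the remaining item */
    return seen;
}
\end{verbatim}

Without the call to \cfunction{item_incref()}, the hook would drop the
last reference to the item and the read of \code{item->value} would
touch freed memory.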
1272
1273 This is a true story.  An older version of Python contained variants
1274 of this bug and someone spent a considerable amount of time in a C
1275 debugger to figure out why his \method{__del__()} methods would fail...
1276
1277 The second case of problems with a borrowed reference is a variant
1278 involving threads.  Normally, multiple threads in the Python
1279 interpreter can't get in each other's way, because there is a global
1280 lock protecting Python's entire object space.  However, it is possible
1281 to temporarily release this lock using the macro
1282 \code{Py_BEGIN_ALLOW_THREADS}, and to re-acquire it using
1283 \code{Py_END_ALLOW_THREADS}.  This is common around blocking I/O
1284 calls, to let other threads use the CPU while waiting for the I/O to
1285 complete.  Obviously, the following function has the same problem as
1286 the previous one:
1287
1288 \begin{verbatim}
1289 void bug(PyObject *list) {
1290     PyObject *item = PyList_GetItem(list, 0);
1291     Py_BEGIN_ALLOW_THREADS
1292     ...some blocking I/O call...
1293     Py_END_ALLOW_THREADS
1294     PyObject_Print(item, stdout, 0); /* BUG! */
1295 }
1296 \end{verbatim}
1297
1298
1299 \subsection{NULL Pointers
1300             \label{nullPointers}}
1301
1302 In general, functions that take object references as arguments do not
1303 expect you to pass them \NULL{} pointers, and will dump core (or
1304 cause later core dumps) if you do so.  Functions that return object
1305 references generally return \NULL{} only to indicate that an
1306 exception occurred.  The reason for not testing for \NULL{}
1307 arguments is that functions often pass the objects they receive on to
1308 other function --- if each function were to test for \NULL{},
1309 there would be a lot of redundant tests and the code would run slower.
1310
1311 It is better to test for \NULL{} only at the ``source'', i.e.\ when a
1312 pointer that may be \NULL{} is received, e.g.\ from
1313 \cfunction{malloc()} or from a function that may raise an exception.
1314
1315 The macros \cfunction{Py_INCREF()} and \cfunction{Py_DECREF()}
1316 do not check for \NULL{} pointers --- however, their variants
1317 \cfunction{Py_XINCREF()} and \cfunction{Py_XDECREF()} do.
1318
1319 The macros for checking for a particular object type
1320 (\code{Py\var{type}_Check()}) don't check for \NULL{} pointers ---
1321 again, there is much code that calls several of these in a row to test
1322 an object against various different expected types, and this would
1323 generate redundant tests.  There are no variants with \NULL{}
1324 checking.
1325
1326 The C function calling mechanism guarantees that the argument list
1327 passed to C functions (\code{args} in the examples) is never
1328 \NULL{} --- in fact it guarantees that it is always a tuple.\footnote{
1329 These guarantees don't hold when you use the ``old'' style
1330 calling convention --- this is still found in much existing code.}
1331
1332 It is a severe error to ever let a \NULL{} pointer ``escape'' to
1333 the Python user.
1334
1335
1336 \section{Writing Extensions in \Cpp{}
1337          \label{cplusplus}}
1338
1339 It is possible to write extension modules in \Cpp{}.  Some restrictions
1340 apply.  If the main program (the Python interpreter) is compiled and
1341 linked by the C compiler, global or static objects with constructors
1342 cannot be used.  This is not a problem if the main program is linked
1343 by the \Cpp{} compiler.  Functions that will be called by the
1344 Python interpreter (in particular, module initialization functions)
1345 have to be declared using \code{extern "C"}.
1346 It is unnecessary to enclose the Python header files in
1347 \code{extern "C" \{...\}} --- they use this form already if the symbol
1348 \samp{__cplusplus} is defined (all recent \Cpp{} compilers define this
1349 symbol).
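
The same guard can be applied to your own declarations.  The fragment
below is a minimal sketch that compiles as either C or \Cpp{}; the
function name \cfunction{initexample()} is hypothetical, and its body
is a placeholder so that the fragment stands alone.

\begin{verbatim}
/* When compiled as C++, give the declaration C linkage so that the
 * name is not mangled; when compiled as C, the guard vanishes. */
#ifdef __cplusplus
extern "C" {
#endif

void initexample(void);   /* hypothetical module initialization function */

#ifdef __cplusplus
}
#endif

static int example_initialized = 0;

void
initexample(void)
{
    /* A real module would call Py_InitModule() here. */
    example_initialized = 1;
}
\end{verbatim}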
1350

\section{Providing a C API for an Extension Module
         \label{using-cobjects}}
\sectionauthor{Konrad Hinsen}{hinsen@cnrs-orleans.fr}

Many extension modules just provide new functions and types to be
used from Python, but sometimes the code in an extension module can
be useful for other extension modules. For example, an extension
module could implement a type ``collection'' which works like lists
without order. Just as the standard Python list type has a C API
which permits extension modules to create and manipulate lists, this
new collection type should have a set of C functions for direct
manipulation from other extension modules.

At first sight this seems easy: just write the functions (without
declaring them \keyword{static}, of course), provide an appropriate
header file, and document the C API. And in fact this would work if
all extension modules were always linked statically with the Python
interpreter. When modules are used as shared libraries, however, the
symbols defined in one module may not be visible to another module.
The details of visibility depend on the operating system; some systems
use one global namespace for the Python interpreter and all extension
modules (e.g.\ Windows), whereas others require an explicit list of
imported symbols at module link time (e.g.\ AIX), or offer a choice of
different strategies (most Unices). And even if symbols are globally
visible, the module whose functions one wishes to call might not have
been loaded yet!

Portability therefore requires making no assumptions about
symbol visibility. This means that all symbols in extension modules
should be declared \keyword{static}, except for the module's
initialization function, in order to avoid name clashes with other
extension modules (as discussed in section~\ref{methodTable}). And it
means that symbols that \emph{should} be accessible from other
extension modules must be exported in a different way.

Python provides a special mechanism to pass C-level information
(i.e.\ pointers) from one extension module to another one: CObjects.
A CObject is a Python data type which stores a pointer (\ctype{void
*}).  CObjects can only be created and accessed via their C API, but
they can be passed around like any other Python object. In particular,
they can be assigned to a name in an extension module's namespace.
Other extension modules can then import this module, retrieve the
value of this name, and then retrieve the pointer from the CObject.

There are many ways in which CObjects can be used to export the C API
of an extension module. Each name could get its own CObject, or all C
API pointers could be stored in an array whose address is published in
a CObject. And the various tasks of storing and retrieving the pointers
can be distributed in different ways between the module providing the
code and the client modules.

The following example demonstrates an approach that puts most of the
burden on the writer of the exporting module, which is appropriate
for commonly used library modules. It stores all C API pointers
(just one in the example!) in an array of \ctype{void} pointers which
becomes the value of a CObject. The header file corresponding to
the module provides a macro that takes care of importing the module
and retrieving its C API pointers; client modules only have to call
this macro before accessing the C API.
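
The pointer-array technique itself does not depend on Python and can be
tried out in plain C.  The following sketch assumes a hypothetical API
with a single function; all \code{Demo_*} and \code{demo_*} names are
invented for illustration, and the step that would normally pass the
array's address through a CObject is replaced by a direct function
call.

```c
#include <string.h>

/* Slot numbers and table size, as a shared header file would define them. */
#define Demo_Length_NUM 0
#define Demo_API_pointers 1

/* --- exporting side --- */
static int
demo_length(const char *text)
{
    return (int)strlen(text);
}

static void *Demo_API[Demo_API_pointers];

/* Fill the table and return its address; in a real module this
   address would be wrapped in a CObject.  Storing a function pointer
   in a void * is not guaranteed by ISO C, but the approach relies on it. */
static void **
demo_export_api(void)
{
    Demo_API[Demo_Length_NUM] = (void *)demo_length;
    return Demo_API;
}

/* --- client side --- */
static void **Demo_API_imported;

/* Cast the stored pointer back to its real type before calling it. */
#define Demo_Length \
    (*(int (*)(const char *)) Demo_API_imported[Demo_Length_NUM])

static int
demo_client_call(const char *text)
{
    if (Demo_API_imported == NULL)
        Demo_API_imported = demo_export_api();  /* stands in for the import */
    return Demo_Length(text);
}
```

The client never refers to \cfunction{demo_length()} by name; it only
needs the slot number and the function's signature, both of which come
from the shared definitions at the top.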

The exporting module is a modification of the \module{spam} module from
section~\ref{simpleExample}. The function \function{spam.system()}
does not call the C library function \cfunction{system()} directly,
but a function \cfunction{PySpam_System()}, which would of course do
something more complicated in reality (such as adding ``spam'' to
every command). This function \cfunction{PySpam_System()} is also
exported to other extension modules.

The function \cfunction{PySpam_System()} is a plain C function,
declared \keyword{static} like everything else:

\begin{verbatim}
static int
PySpam_System(command)
    char *command;
{
    return system(command);
}
\end{verbatim}

The function \cfunction{spam_system()} is modified in a trivial way:

\begin{verbatim}
static PyObject *
spam_system(self, args)
    PyObject *self;
    PyObject *args;
{
    char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = PySpam_System(command);
    return Py_BuildValue("i", sts);
}
\end{verbatim}

At the beginning of the module, right after the line

\begin{verbatim}
#include "Python.h"
\end{verbatim}

two more lines must be added:

\begin{verbatim}
#define SPAM_MODULE
#include "spammodule.h"
\end{verbatim}

The \code{\#define} is used to tell the header file that it is being
included in the exporting module, not a client module. Finally,
the module's initialization function must take care of initializing
the C API pointer array:

\begin{verbatim}
void
initspam()
{
    PyObject *m, *d;
    static void *PySpam_API[PySpam_API_pointers];
    PyObject *c_api_object;

    m = Py_InitModule("spam", SpamMethods);

    /* Initialize the C API pointer array */
    PySpam_API[PySpam_System_NUM] = (void *)PySpam_System;

    /* Create a CObject containing the API pointer array's address */
    c_api_object = PyCObject_FromVoidPtr((void *)PySpam_API, NULL);

    /* Create a name for this object in the module's namespace */
    if (c_api_object != NULL) {
        d = PyModule_GetDict(m);
        PyDict_SetItemString(d, "_C_API", c_api_object);
        /* The dictionary now holds its own reference */
        Py_DECREF(c_api_object);
    }
}
\end{verbatim}

Note that \code{PySpam_API} is declared \keyword{static}; otherwise
the pointer array would disappear when \cfunction{initspam()} terminates!

The bulk of the work is in the header file \file{spammodule.h},
which looks like this:

\begin{verbatim}
#ifndef Py_SPAMMODULE_H
#define Py_SPAMMODULE_H
#ifdef __cplusplus
extern "C" {
#endif

/* Header file for spammodule */

/* C API functions */
#define PySpam_System_NUM 0
#define PySpam_System_RETURN int
#define PySpam_System_PROTO Py_PROTO((char *command))

/* Total number of C API pointers */
#define PySpam_API_pointers 1


#ifdef SPAM_MODULE
/* This section is used when compiling spammodule.c */

static PySpam_System_RETURN PySpam_System PySpam_System_PROTO;

#else
/* This section is used in modules that use spammodule's API */

static void **PySpam_API;

#define PySpam_System \
 (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM])

/* The NULL test guards against a spam module without a "_C_API" entry */
#define import_spam() \
{ \
  PyObject *module = PyImport_ImportModule("spam"); \
  if (module != NULL) { \
    PyObject *module_dict = PyModule_GetDict(module); \
    PyObject *c_api_object = PyDict_GetItemString(module_dict, "_C_API"); \
    if (c_api_object != NULL && PyCObject_Check(c_api_object)) { \
      PySpam_API = (void **)PyCObject_AsVoidPtr(c_api_object); \
    } \
  } \
}

#endif

#ifdef __cplusplus
}
#endif

#endif /* !defined(Py_SPAMMODULE_H) */
\end{verbatim}

All that a client module must do in order to have access to the
function \cfunction{PySpam_System()} is to call the function (or
rather macro) \cfunction{import_spam()} in its initialization
function:

\begin{verbatim}
void
initclient()
{
    Py_InitModule("client", ClientMethods);
    import_spam();
}
\end{verbatim}

The main disadvantage of this approach is that the file
\file{spammodule.h} is rather complicated. However, the
basic structure is the same for each function that is
exported, so it has to be learned only once.

Finally it should be mentioned that CObjects offer additional
functionality, which is especially useful for memory allocation and
deallocation of the pointer stored in a CObject. The details
are described in the \citetitle[../api/api.html]{Python/C API
Reference Manual} in the section ``CObjects'' and in the
implementation of CObjects (files \file{Include/cobject.h} and
\file{Objects/cobject.c} in the Python source code distribution).


\chapter{Building C and \Cpp{} Extensions on \UNIX{}
         \label{building-on-unix}}

\sectionauthor{Jim Fulton}{jim@Digicool.com}


%The make file make file, building C extensions on Unix


Starting in Python 1.4, Python provides a special make file that
generates the make files used to build dynamically-linked extensions
and custom interpreters.  The generated make file reflects various
system variables determined by configure when the Python interpreter
was built, so people building modules don't have to resupply these
settings.  This vastly simplifies the process of building extensions
and custom interpreters on \UNIX{} systems.

This special make file is distributed as the file
\file{Misc/Makefile.pre.in} in the Python source distribution.  The
first step in building extensions or custom interpreters is to copy
this make file to a development directory containing extension module
source.

\file{Makefile.pre.in} uses metadata
provided in a file named \file{Setup}.  The format of the \file{Setup}
file is the same as the \file{Setup} (or \file{Setup.in}) file
provided in the \file{Modules/} directory of the Python source
distribution.  The \file{Setup} file contains variable definitions:

\begin{verbatim}
EC=/projects/ExtensionClass
\end{verbatim}

and module description lines.  It can also contain blank lines and
comment lines that start with \character{\#}.

A module description line includes a module name, source files,
options, variable references, and other input files, such
as libraries or object files.  Consider a simple example:

\begin{verbatim}
ExtensionClass ExtensionClass.c
\end{verbatim}

This is the simplest form of a module definition line.  It defines a
module, \module{ExtensionClass}, which has a single source file,
\file{ExtensionClass.c}.

This slightly more complex example uses an \strong{-I} option to
specify an include directory:

\begin{verbatim}
EC=/projects/ExtensionClass
cPersistence cPersistence.c -I$(EC)
\end{verbatim} % $ <-- bow to font lock

This example also illustrates the format for variable references.

For systems that support dynamic linking, the \file{Setup} file should
begin:

\begin{verbatim}
*shared*
\end{verbatim}

to indicate that the modules defined in \file{Setup} are to be built
as dynamically linked modules.  A line containing only \samp{*static*}
can be used to indicate the subsequently listed modules should be
statically linked.

Here is a complete \file{Setup} file for building a
\module{cPersistence} module:

\begin{verbatim}
# Set-up file to build the cPersistence module.
# Note that the text should begin in the first column.
*shared*

# We need the path to the directory containing the ExtensionClass
# include file.
EC=/projects/ExtensionClass
cPersistence cPersistence.c -I$(EC)
\end{verbatim} % $ <-- bow to font lock

After the \file{Setup} file has been created, \file{Makefile.pre.in}
is run with the \samp{boot} target to create a make file:

\begin{verbatim}
make -f Makefile.pre.in boot
\end{verbatim}

This creates the file \file{Makefile}.  To build the extensions, simply
run the created make file:

\begin{verbatim}
make
\end{verbatim}

It's not necessary to re-run \file{Makefile.pre.in} if the
\file{Setup} file is changed.  The make file automatically rebuilds
itself if the \file{Setup} file changes.


\section{Building Custom Interpreters \label{custom-interps}}

The make file built by \file{Makefile.pre.in} can be run with the
\samp{static} target to build an interpreter:

\begin{verbatim}
make static
\end{verbatim}

Any modules defined in the \file{Setup} file before the
\samp{*shared*} line will be statically linked into the interpreter.
Typically, a \samp{*shared*} line is omitted from the \file{Setup}
file when a custom interpreter is desired.


\section{Module Definition Options \label{module-defn-options}}

Several compiler options are supported:

\begin{tableii}{l|l}{}{Option}{Meaning}
  \lineii{-C}{Tell the C pre-processor not to discard comments}
  \lineii{-D\var{name}=\var{value}}{Define a macro}
  \lineii{-I\var{dir}}{Specify an include directory, \var{dir}}
  \lineii{-L\var{dir}}{Specify a link-time library directory, \var{dir}}
  \lineii{-R\var{dir}}{Specify a run-time library directory, \var{dir}}
  \lineii{-l\var{lib}}{Link a library, \var{lib}}
  \lineii{-U\var{name}}{Undefine a macro}
\end{tableii}

Other compiler options can be included (snuck in) by putting them
in variables.

Source files can include files with \file{.c}, \file{.C}, \file{.cc},
\file{.cpp}, \file{.cxx}, and \file{.c++} extensions.

Other input files include files with \file{.a}, \file{.o}, \file{.sl},
and \file{.so} extensions.


\section{Example \label{module-defn-example}}

Here is a more complicated example from \file{Modules/Setup.in}:

\begin{verbatim}
GMP=/ufs/guido/src/gmp
mpz mpzmodule.c -I$(GMP) $(GMP)/libgmp.a
\end{verbatim}

which could also be written as:

\begin{verbatim}
mpz mpzmodule.c -I$(GMP) -L$(GMP) -lgmp
\end{verbatim}


\section{Distributing your extension modules
         \label{distributing}}

When distributing your extension modules in source form, make sure to
include a \file{Setup} file.  The \file{Setup} file should be named
\file{Setup.in} in the distribution.  \file{Makefile.pre.in} will copy
\file{Setup.in} to \file{Setup}.
Distributing a \file{Setup.in} file makes it easy for people to
customize the \file{Setup} file while keeping the original in
\file{Setup.in}.

It is a good idea to include a copy of \file{Makefile.pre.in} for
people who do not have a source distribution of Python.

Do not distribute a make file.  People building your modules
should use \file{Makefile.pre.in} to build their own make file.  A
\file{README} file included in the package should provide simple
instructions to perform the build.

Work is being done to make building and installing Python extensions
easier for all platforms; this work is likely to supplant the current
approach at some point in the future.  For more information or to
participate in the effort, refer to
\url{http://www.python.org/sigs/distutils-sig/} on the Python Web
site.


\chapter{Building C and \Cpp{} Extensions on Windows
         \label{building-on-windows}}


This chapter briefly explains how to create a Windows extension module
for Python using Microsoft Visual \Cpp{}, and follows with more
detailed background information on how it works.  The explanatory
material is useful for both the Windows programmer learning to build
Python extensions and the \UNIX{} programmer interested in producing
software which can be successfully built on both \UNIX{} and Windows.


\section{A Cookbook Approach \label{win-cookbook}}

\sectionauthor{Neil Schemenauer}{neil_schemenauer@transcanada.com}

This section provides a recipe for building a Python extension on
Windows.

Grab the binary installer from \url{http://www.python.org/} and
install Python.  The binary installer has all of the required header
files except for \file{config.h}.

Get the source distribution and extract it into a convenient location.
Copy the \file{config.h} from the \file{PC/} directory into the
\file{include/} directory created by the installer.

Create a \file{Setup} file for your extension module, as described in
chapter~\ref{building-on-unix}.

Get David Ascher's \file{compile.py} script from
\url{http://starship.python.net/crew/da/compile/}.  Run the script to
create Microsoft Visual \Cpp{} project files.

Open the DSW file in V\Cpp{} and select \strong{Build}.

If your module creates a new type, you may have trouble with this line:

\begin{verbatim}
    PyObject_HEAD_INIT(&PyType_Type)
\end{verbatim}

Change it to:

\begin{verbatim}
    PyObject_HEAD_INIT(NULL)
\end{verbatim}

and add the following to the module initialization function:

\begin{verbatim}
    MyObject_Type.ob_type = &PyType_Type;
\end{verbatim}

Refer to section 3 of the Python FAQ
(\url{http://www.python.org/doc/FAQ.html}) for details on why you must
do this.

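The two-step initialization can be illustrated without the Python
headers.  In this sketch, \code{demo_typeobject}, \code{Demo_Type_Type}
and \code{Demo_MyObject_Type} are hypothetical stand-ins for
\ctype{PyTypeObject}, \code{PyType_Type} and an extension type; only
the pattern (a \code{NULL} placeholder in the static initializer,
filled in at run time) is meant to carry over.

```c
#include <stddef.h>

/* Hypothetical stand-in for a type object structure. */
typedef struct demo_typeobject {
    struct demo_typeobject *ob_type;
    const char *tp_name;
} demo_typeobject;

/* Stand-in for the type of all types. */
static demo_typeobject Demo_Type_Type = { NULL, "type" };

/* Step 1: leave the type pointer NULL in the static initializer. */
static demo_typeobject Demo_MyObject_Type = { NULL, "MyObject" };

/* Step 2: fill the pointer in at run time, as the module
   initialization function would. */
static void
demo_init_types(void)
{
    Demo_MyObject_Type.ob_type = &Demo_Type_Type;
}
```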

\section{Differences Between \UNIX{} and Windows
         \label{dynamic-linking}}
\sectionauthor{Chris Phoenix}{cphoenix@best.com}


\UNIX{} and Windows use completely different paradigms for run-time
loading of code.  Before you try to build a module that can be
dynamically loaded, be aware of how your system works.

In \UNIX{}, a shared object (\file{.so}) file contains code to be used
by the program, and also the names of functions and data that it expects
to find in the program.  When the file is joined to the program, all
references to those functions and data in the file's code are changed
to point to the actual locations in the program where the functions
and data are placed in memory.  This is basically a link operation.

In Windows, a dynamic-link library (\file{.dll}) file has no dangling
references.  Instead, an access to functions or data goes through a
lookup table.  So the DLL code does not have to be fixed up at runtime
to refer to the program's memory; instead, the code already uses the
DLL's lookup table, and the lookup table is modified at runtime to
point to the functions and data.

In \UNIX{}, there is only one type of library file (\file{.a}) which
contains code from several object files (\file{.o}).  During the link
step to create a shared object file (\file{.so}), the linker may find
that it doesn't know where an identifier is defined.  The linker will
look for it in the object files in the libraries; if it finds it, it
will include all the code from that object file.

In Windows, there are two types of library, a static library and an
import library (both called \file{.lib}).  A static library is like a
\UNIX{} \file{.a} file; it contains code to be included as necessary.
An import library is basically used only to reassure the linker that a
certain identifier is legal, and will be present in the program when
the DLL is loaded.  So the linker uses the information from the
import library to build the lookup table for using identifiers that
are not included in the DLL.  When an application or a DLL is linked,
an import library may be generated, which will need to be used for all
future DLLs that depend on the symbols in the application or DLL.

Suppose you are building two dynamic-load modules, B and C, which should
share another block of code A.  On \UNIX{}, you would \emph{not} pass
\file{A.a} to the linker for \file{B.so} and \file{C.so}; that would
cause it to be included twice, so that B and C would each have their
own copy.  In Windows, building \file{A.dll} will also build
\file{A.lib}.  You \emph{do} pass \file{A.lib} to the linker for B and
C.  \file{A.lib} does not contain code; it just contains information
which will be used at runtime to access A's code.

In Windows, using an import library is sort of like using \samp{import
spam}; it gives you access to spam's names, but does not create a
separate copy.  On \UNIX{}, linking with a library is more like
\samp{from spam import *}; it does create a separate copy.


\section{Using DLLs in Practice \label{win-dlls}}
\sectionauthor{Chris Phoenix}{cphoenix@best.com}

Windows Python is built in Microsoft Visual \Cpp{}; using other
compilers may or may not work (though Borland seems to).  The rest of
this section is MSV\Cpp{} specific.

When creating DLLs in Windows, you must pass \file{python15.lib} to
the linker.  To build two DLLs, spam and ni (which uses C functions
found in spam), you could use these commands:

\begin{verbatim}
cl /LD /I/python/include spam.c ../libs/python15.lib
cl /LD /I/python/include ni.c spam.lib ../libs/python15.lib
\end{verbatim}

The first command creates three files: \file{spam.obj},
\file{spam.dll} and \file{spam.lib}.  \file{spam.dll} does not contain
any Python functions (such as \cfunction{PyArg_ParseTuple()}), but it
does know how to find the Python code thanks to \file{python15.lib}.

The second command creates \file{ni.dll} (and \file{.obj} and
\file{.lib}), which knows how to find the necessary functions from
spam, and also from the Python executable.

Not every identifier is exported to the lookup table.  If you want any
other modules (including Python) to be able to see your identifiers,
you have to say \samp{_declspec(dllexport)}, as in \samp{void
_declspec(dllexport) initspam(void)} or \samp{PyObject
_declspec(dllexport) *NiGetSpamData(void)}.
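
One way to keep such declarations readable is to hide the export
specifier behind a macro.  The sketch below is an illustration under
stated assumptions: \code{DEMO_EXPORT} and the two functions are
hypothetical names, and it uses the \code{__declspec} spelling found in
Microsoft's documentation, which Visual \Cpp{} is expected to treat the
same as \code{_declspec}.  On other compilers the macro expands to
nothing, so the file still compiles.

```c
/* DEMO_EXPORT is a hypothetical helper macro.  With Microsoft Visual C++
   it expands to the export specifier; elsewhere it expands to nothing. */
#if defined(_MSC_VER)
#define DEMO_EXPORT __declspec(dllexport)
#else
#define DEMO_EXPORT
#endif

/* Exported: other DLLs can reach this function through the lookup table. */
DEMO_EXPORT int
demo_spam_count(void)
{
    return 3;
}

/* Not exported: declared static, so it is visible only in this file. */
static int
demo_private_helper(void)
{
    return demo_spam_count() + 1;
}
```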

Developer Studio will throw in a lot of import libraries that you do
not really need, adding about 100K to your executable.  To get rid of
them, use the Project Settings dialog, Link tab, to specify
\emph{ignore default libraries}.  Add the correct
\file{msvcrt\var{xx}.lib} to the list of libraries.


\chapter{Embedding Python in Another Application
         \label{embedding}}

Embedding Python is similar to extending it, but not quite.  The
difference is that when you extend Python, the main program of the
application is still the Python interpreter, while if you embed
Python, the main program may have nothing to do with Python;
instead, some parts of the application occasionally call the Python
interpreter to run some Python code.

So if you are embedding Python, you are providing your own main
program.  One of the things this main program has to do is initialize
the Python interpreter.  At the very least, you have to call the
function \cfunction{Py_Initialize()}.  There are optional calls to
pass command line arguments to Python.  Then later you can call the
interpreter from any part of the application.

There are several different ways to call the interpreter: you can pass
a string containing Python statements to
\cfunction{PyRun_SimpleString()}, or you can pass a stdio file pointer
and a file name (for identification in error messages only) to
\cfunction{PyRun_SimpleFile()}.  You can also call the lower-level
operations described in the previous chapters to construct and use
Python objects.

A simple demo of embedding Python can be found in the directory
\file{Demo/embed/} of the source distribution.


\section{Embedding Python in \Cpp{}
         \label{embeddingInCplusplus}}

It is also possible to embed Python in a \Cpp{} program.  Precisely how
this is done depends on the details of the \Cpp{} system used; in
general you will need to write the main program in \Cpp{} and use the
\Cpp{} compiler to compile and link your program.  There is no need to
recompile Python itself using \Cpp{}.

\end{document}