\documentstyle[twoside,11pt,myformat]{report}

\title{Extending and Embedding the Python Interpreter}

\author{
    Guido van Rossum \\
    Dept. CST, CWI, P.O. Box 94079 \\
    1090 GB Amsterdam, The Netherlands \\
    E-mail: {\tt guido@cwi.nl}
}

\date{4 May 1994 \\ Release 1.0.2}  % XXX update before release!

% Tell \index to actually write the .idx file
\makeindex

\begin{document}

\pagenumbering{roman}

\maketitle

\begin{abstract}

\noindent
This document describes how to write modules in C or C++ to extend the
Python interpreter.  It also describes how to use Python as an
`embedded' language, and how extension modules can be loaded
dynamically (at run time) into the interpreter, if the operating
system supports this feature.

\end{abstract}

\pagebreak

{
\parskip = 0mm
\tableofcontents
}

\pagebreak

\pagenumbering{arabic}


\chapter{Extending Python with C or C++ code}


\section{Introduction}

It is quite easy to add non-standard built-in modules to Python, if
you know how to program in C.  A built-in module known to the Python
programmer as \code{foo} is generally implemented by a file called
\file{foomodule.c}.  All but the two most essential standard built-in
modules also adhere to this convention, and in fact some of them form
excellent examples of how to create an extension.

Extension modules can do two things that can't be done directly in
Python: they can implement new data types (which are different from
classes, by the way), and they can make system calls or call C library
functions.  Since the latter is usually the most important reason for
adding an extension, I'll concentrate on adding `wrappers' around C
library functions; the concrete example uses the wrapper for
\code{system()} in module \code{posix}, found in (of course) the file
\file{Modules/posixmodule.c}.

Note: unless otherwise mentioned, all file references in this
document are relative to the top-level directory of the Python
distribution --- i.e. the directory that contains the \file{configure}
script.

The compilation of an extension module depends on your system setup
and the intended use of the module; details are given in a later
section.


\section{A first look at the code}

It is important not to be impressed by the size and complexity of
the average extension module; much of this is straightforward
`boilerplate' code (starting right with the copyright notice)!

Let's skip the boilerplate and have a look at an interesting function
in \file{posixmodule.c} first:

\begin{verbatim}
static object *
posix_system(self, args)
    object *self;
    object *args;
{
    char *command;
    int sts;
    if (!getargs(args, "s", &command))
        return NULL;
    sts = system(command);
    return mkvalue("i", sts);
}
\end{verbatim}

This is the prototypical top-level function in an extension module.
It will be called (we'll see later how) when the Python program
executes statements like

\begin{verbatim}
>>> import posix
>>> sts = posix.system('ls -l')
\end{verbatim}

There is a straightforward translation from the arguments to the call
in Python (here the single expression \code{'ls -l'}) to the arguments that
are passed to the C function.  The C function always has two
parameters, conventionally named \var{self} and \var{args}.  The
\var{self} argument is used when the C function implements a built-in
method --- this is advanced material and not covered in this document.
In the example, \var{self} will always be a \code{NULL} pointer, since
we are defining a function, not a method (this is done so that the
interpreter doesn't have to understand two different types of C
functions).

The \var{args} parameter will be a pointer to a Python object, or
\code{NULL} if the Python function/method was called without
arguments.  It is necessary to do full argument type checking on each
call, since otherwise the Python user would be able to cause the
Python interpreter to `dump core' by passing invalid arguments to a
function in an extension module.  Because argument checking and
converting arguments to C are such common tasks, there's a general
function in the Python interpreter that combines them:
\code{getargs()}.  It uses a template string to determine both the
types of the Python argument and the types of the C variables into
which it should store the converted values.\footnote{There are
convenience macros \code{getnoarg()}, \code{getstrarg()},
\code{getintarg()}, etc., for many common forms of \code{getargs()}
templates.  These are relics from the past; the recommended practice
is to call \code{getargs()} directly.}  (More about this later.)

If \code{getargs()} returns nonzero, the argument list has the right
type and its components have been stored in the variables whose
addresses are passed.  If it returns zero, an error has occurred.  In
the latter case it has already raised an appropriate exception, so
the calling function should return \code{NULL} immediately --- see the
next section.


\section{Intermezzo: errors and exceptions}

An important convention throughout the Python interpreter is the
following: when a function fails, it should set an exception condition
and return an error value (often a \code{NULL} pointer).  Exceptions
are stored in a static global variable in \file{Python/errors.c}; if
this variable is \code{NULL} no exception has occurred.  A second
static global variable stores the `associated value' of the exception
--- the second argument to \code{raise}.

The file \file{errors.h} declares a host of functions to set various
types of exceptions.  The most common one is \code{err_setstr()} ---
its arguments are an exception object (e.g. \code{RuntimeError} ---
actually it can be any string object) and a C string indicating the
cause of the error (this is converted to a string object and stored as
the `associated value' of the exception).  Another useful function is
\code{err_errno()}, which only takes an exception argument and
constructs the associated value by inspection of the (UNIX) global
variable \code{errno}.  The most general function is \code{err_set()}, which
takes two object arguments, the exception and its associated value.
You don't need to \code{INCREF()} the objects passed to any of these
functions.

You can test non-destructively whether an exception has been set with
\code{err_occurred()}.  However, most code never calls
\code{err_occurred()} to see whether an error occurred or not, but
relies on error return values from the functions it calls instead.

When a function that calls another function detects that the called
function fails, it should return an error value (e.g. \code{NULL} or
\code{-1}) but not call one of the \code{err_*} functions --- one has
already been called.  The caller is then supposed to also return an
error indication to {\em its} caller, again {\em without} calling
\code{err_*()}, and so on --- the most detailed cause of the error was
already reported by the function that first detected it.  Once the
error reaches the interpreter's main loop, the currently executing
Python code is aborted and the interpreter tries to find an exception
handler specified by the Python programmer.
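
The propagation rule can be tried out in a small, self-contained toy
model written in plain C.  This is only a sketch: it does not use the
interpreter's API, and \code{toy_err_setstr()}, \code{toy_err_occurred()},
\code{toy_err_clear()}, \code{toy_parse_digit()} and
\code{toy_sum_digits()} are hypothetical names invented for the
illustration.  The `exception' is just a pair of static string
pointers.  Only the function that first detects the problem sets it;
its caller returns \code{-1} without touching it.

\begin{verbatim}
#include <stddef.h>

/* Toy error indicator (hypothetical, not the interpreter's): one static
   "current exception" plus its associated value.  NULL means none set. */
static const char *toy_exc = NULL;
static const char *toy_exc_value = NULL;

static void toy_err_setstr(const char *exc, const char *value)
{
    toy_exc = exc;
    toy_exc_value = value;
}

static int toy_err_occurred(void)
{
    return toy_exc != NULL;
}

static void toy_err_clear(void)
{
    toy_exc = NULL;
    toy_exc_value = NULL;
}

/* Lowest level: detects the problem, sets the exception and
   returns the error indicator -1. */
static int toy_parse_digit(char c, int *out)
{
    if (c < '0' || c > '9') {
        toy_err_setstr("ValueError", "not a digit");
        return -1;
    }
    *out = c - '0';
    return 0;
}

/* Caller: on failure it only passes -1 on; it does not call
   toy_err_setstr() again, because the exception is already set. */
static int toy_sum_digits(const char *s, int *sum)
{
    int d;

    *sum = 0;
    for (; *s != '\0'; s++) {
        if (toy_parse_digit(*s, &d) < 0)
            return -1;
        *sum += d;
    }
    return 0;
}
\end{verbatim}

Calling \code{toy_sum_digits()} on a string that contains a non-digit
therefore returns \code{-1} with the exception set by
\code{toy_parse_digit()} still in place, until something calls
\code{toy_err_clear()}.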

(There are situations where a module can actually give a more detailed
error message by calling another \code{err_*} function, and in such
cases it is fine to do so.  As a general rule, however, this is not
necessary, and can cause information about the cause of the error to
be lost: most operations can fail for a variety of reasons.)

To ignore an exception set by a function call that failed, the
exception condition must be cleared explicitly by calling
\code{err_clear()}.  The only time C code should call
\code{err_clear()} is if it doesn't want to pass the error on to the
interpreter but wants to handle it completely by itself (e.g. by
trying something else or pretending nothing happened).

The function \code{err_get()} gives you both error variables
{\em and clears them}.  Note that even if an error occurred the second
one may be \code{NULL}.  You have to \code{XDECREF()} both when you
are finished with them.  I doubt you will need to use this function.

Note that a failing \code{malloc()} call must also be turned into an
exception --- the direct caller of \code{malloc()} (or
\code{realloc()}) must call \code{err_nomem()} and return a failure
indicator itself.  All the object-creating functions
(\code{newintobject()} etc.) already do this, so this note only
matters if you call \code{malloc()} directly.
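
The toy sketch below shows where that responsibility lies.  It assumes
a hypothetical stand-in \code{toy_err_nomem()} for \code{err_nomem()}
and a replaceable allocator hook \code{toy_alloc} (both invented here
so that the failure path can be exercised); it is not the
interpreter's code.

\begin{verbatim}
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy error indicator (hypothetical stand-in for the interpreter's). */
static const char *toy_exc = NULL;

/* Hypothetical stand-in for err_nomem(): records a MemoryError. */
static void toy_err_nomem(void)
{
    toy_exc = "MemoryError";
}

/* Allocator hook so the failure path can be tested; defaults to malloc. */
static void *(*toy_alloc)(size_t) = malloc;

/* An allocator that always fails, to simulate running out of memory. */
static void *toy_failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

/* Direct caller of the allocator: on failure it raises the exception
   itself and returns NULL; its own callers then just pass NULL on. */
static char *toy_copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = toy_alloc(len);

    if (p == NULL) {
        toy_err_nomem();
        return NULL;
    }
    memcpy(p, s, len);
    return p;
}
\end{verbatim}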

Also note that, with the important exception of \code{getargs()},
functions that return an integer status usually return \code{0} or a
positive value for success and \code{-1} for failure.

Finally, be careful about cleaning up garbage (making \code{XDECREF()}
or \code{DECREF()} calls for objects you have already created) when
you return an error!

The choice of which exception to raise is entirely yours.  There are
predeclared C objects corresponding to all built-in Python exceptions,
e.g. \code{ZeroDivisionError}, which you can use directly.  Of course,
you should choose exceptions wisely --- don't use \code{TypeError} to
mean that a file couldn't be opened (that should probably be
\code{IOError}).  If anything's wrong with the argument list the
\code{getargs()} function raises \code{TypeError}.  If you have an
argument whose value must be in a particular range or must
satisfy other conditions, \code{ValueError} is appropriate.

You can also define a new exception that is unique to your module.
For this, you usually declare a static object variable at the
beginning of your file, e.g.

\begin{verbatim}
static object *FooError;
\end{verbatim}

and initialize it in your module's initialization function
(\code{initfoo()}) with a string object, e.g. (leaving out the error
checking for simplicity):

\begin{verbatim}
void
initfoo()
{
    object *m, *d;
    m = initmodule("foo", foo_methods);
    d = getmoduledict(m);
    FooError = newstringobject("foo.error");
    dictinsert(d, "error", FooError);
}
\end{verbatim}


\section{Back to the example}

Going back to \code{posix_system()}, you should now be able to
understand this bit:

\begin{verbatim}
    if (!getargs(args, "s", &command))
        return NULL;
\end{verbatim}

It returns \code{NULL} (the error indicator for functions of this
kind) if an error is detected in the argument list, relying on the
exception set by \code{getargs()}.  Otherwise the string value of the
argument has been stored in the local variable \code{command} --- this
is in fact just a pointer assignment and you are not supposed to
modify the string to which it points.

If a function is called with multiple arguments, the argument list
(the argument \code{args}) is turned into a tuple.  If it is called
with a single argument, \code{args} is that argument itself, and if it
is called without arguments, \code{args} is \code{NULL}.
\code{getargs()} knows about this; see later.

The next statement in \code{posix_system()} is a call to the C library
function \code{system()}, passing it the string we just got from
\code{getargs()}:

\begin{verbatim}
    sts = system(command);
\end{verbatim}

Finally, \code{posix.system()} must return a value: the integer status
returned by the C library \code{system()} function.  This is done
using the function \code{mkvalue()}, which is something like the
inverse of \code{getargs()}: it takes a format string and a variable
number of C values and returns a new Python object.

\begin{verbatim}
    return mkvalue("i", sts);
\end{verbatim}

In this case, it returns an integer object (yes, even integers are
objects on the heap in Python!).  More info on \code{mkvalue()} is
given later.

If you had a function that returned no useful value (a.k.a. a
procedure), you would need this idiom:

\begin{verbatim}
    INCREF(None);
    return None;
\end{verbatim}

\code{None} is a unique Python object representing `no value'.  It
differs from \code{NULL}, which means `error' in most contexts.  The
\code{INCREF()} is needed because returning an object gives the caller
a new reference to it; see the section on reference counts.
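
Why the \code{INCREF()} matters can be seen in a toy model of a shared
singleton.  All names below are hypothetical and this is not how
\code{None} is actually declared.  Every caller that receives the
object will eventually \code{DECREF()} it, so the function must add a
reference before handing it out; otherwise the count would drop with
every call and the singleton would eventually look like garbage.

\begin{verbatim}
/* Toy object: just a reference count (hypothetical). */
typedef struct {
    long refcnt;
} toy_object;

/* A statically allocated singleton standing in for None.  It starts
   with one reference, owned by the "interpreter", and is never freed. */
static toy_object toy_none = { 1 };

#define TOY_INCREF(op) ((op)->refcnt++)
#define TOY_DECREF(op) ((op)->refcnt--)   /* toy: never deallocates */

/* A "procedure" with no useful result: it returns the shared
   singleton, adding a reference for its caller first. */
static toy_object *toy_procedure(void)
{
    TOY_INCREF(&toy_none);
    return &toy_none;
}
\end{verbatim}

After the caller drops the reference it was given, the count is back
where it started, which is exactly the balance the idiom preserves.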


\section{The module's function table}

I promised to show how I made the function \code{posix_system()}
callable from Python programs.  The relevant code appears further down
in \file{Modules/posixmodule.c}:

\begin{verbatim}
static struct methodlist posix_methods[] = {
    {"system",  posix_system},
    {NULL,      NULL}            /* Sentinel */
};

void
initposix()
{
    (void) initmodule("posix", posix_methods);
}
\end{verbatim}

(The actual \code{initposix()} is somewhat more complicated, but many
extension modules can be as simple as shown here.)  When the Python
program first imports module \code{posix}, \code{initposix()} is
called, which calls \code{initmodule()} with specific parameters.
This creates a `module object' (which is inserted in the table
\code{sys.modules} under the key \code{'posix'}), and adds
built-in-function objects to the newly created module based upon the
table (of type \code{struct methodlist}) that was passed as its second
parameter.  The function \code{initmodule()} returns a pointer to the
module object that it creates (which is unused here).  It aborts with
a fatal error if the module could not be initialized satisfactorily,
so you don't need to check for errors.
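
A table terminated by a sentinel entry can be walked without knowing
its length.  The toy sketch below uses a hypothetical
\code{struct toy_methodlist} whose functions take and return a plain
\code{int} instead of Python objects, and looks a name up in such a
table.  It only illustrates the sentinel convention; it is not the
code of \code{initmodule()}.

\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Hypothetical toy method table entry: a name and a C function. */
typedef int (*toy_cfunc)(int);

struct toy_methodlist {
    const char *name;
    toy_cfunc func;
};

static int toy_double(int x) { return 2 * x; }
static int toy_negate(int x) { return -x; }

static struct toy_methodlist toy_methods[] = {
    {"double",  toy_double},
    {"negate",  toy_negate},
    {NULL,      NULL}            /* Sentinel */
};

/* Scan entries until the sentinel; no separate length is needed. */
static toy_cfunc toy_lookup(const struct toy_methodlist *ml,
                            const char *name)
{
    for (; ml->name != NULL; ml++)
        if (strcmp(ml->name, name) == 0)
            return ml->func;
    return NULL;                 /* name not in the table */
}
\end{verbatim}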


\section{Compilation and linkage}

There are two more things to do before you can use your new extension
module: compiling it and linking it with the Python system.  If you use
dynamic loading, the details depend on the style of dynamic loading
your system uses; see the chapter on Dynamic Loading for more info
about this.

If you can't use dynamic loading, or if you want to make your module a
permanent part of the Python interpreter, you will have to change the
configuration setup and rebuild the interpreter.  Luckily, in the 1.0
release this is very simple: just place your file (named
\file{foomodule.c} for example) in the \file{Modules} directory, add a
line to the file \file{Modules/Setup} describing your file:

\begin{verbatim}
foo foomodule.o
\end{verbatim}

and rebuild the interpreter by running \code{make} in the top-level
directory.  You can also run \code{make} in the \file{Modules}
subdirectory, but then you must first rebuild the \file{Makefile}
there by running \code{make Makefile}.  (This is necessary each time
you change the \file{Setup} file.)


\section{Calling Python functions from C}

So far we have concentrated on making C functions callable from
Python.  The reverse is also useful: calling Python functions from C.
This is especially the case for libraries that support so-called
`callback' functions.  If a C interface makes use of callbacks, the
equivalent Python often needs to provide a callback mechanism to the
Python programmer; the implementation will require calling the Python
callback functions from a C callback.  Other uses are also imaginable.

Fortunately, the Python interpreter is easily called recursively, and
there is a standard interface to call a Python function.  (I won't
dwell on how to call the Python parser with a particular string as
input --- if you're interested, have a look at the implementation of
the \samp{-c} command line option in \file{Python/pythonmain.c}.)

Calling a Python function is easy.  First, the Python program must
somehow pass you the Python function object.  You should provide a
function (or some other interface) to do this.  When this function is
called, save a pointer to the Python function object (be careful to
\code{INCREF()} it!) in a global variable --- or wherever you see fit.
For example, the following function might be part of a module
definition:

\begin{verbatim}
static object *my_callback = NULL;

static object *
my_set_callback(dummy, arg)
    object *dummy, *arg;
{
    XDECREF(my_callback); /* Dispose of previous callback */
    my_callback = arg;
    XINCREF(my_callback); /* Remember new callback */
    /* Boilerplate for "void" return */
    INCREF(None);
    return None;
}
\end{verbatim}

This particular function doesn't do any type checking on its argument
--- that will be done by \code{call_object()}, which is a bit late but
at least protects the Python interpreter from shooting itself in the
foot.  (The problem with checking whether the argument is callable is
that there are at least five different Python object types that can be
called, so the test would be somewhat cumbersome.)

The macros \code{XINCREF()} and \code{XDECREF()} increment/decrement
the reference count of an object and are safe in the presence of
\code{NULL} pointers.  More info on them in the section on Reference
Counts below.
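
Here is a self-contained toy version of the callback bookkeeping, using
a hypothetical reference-counted \code{toy_object} and \code{NULL}-safe
macros in the spirit of \code{XINCREF()} and \code{XDECREF()}.  Unlike
\code{my_set_callback()} above, this sketch increments the new callback
before releasing the old one, which keeps it safe even if the current
callback is passed again while nothing else holds a reference to it.

\begin{verbatim}
#include <stddef.h>
#include <stdlib.h>

/* Toy reference-counted object (hypothetical). */
typedef struct {
    long refcnt;
} toy_object;

static int toy_freed = 0;           /* number of objects deallocated */

static toy_object *toy_new(void)
{
    toy_object *op = malloc(sizeof *op);
    if (op != NULL)
        op->refcnt = 1;             /* the creator owns one reference */
    return op;
}

static void toy_decref(toy_object *op)
{
    if (--op->refcnt == 0) {
        free(op);
        toy_freed++;
    }
}

/* NULL-safe variants, in the spirit of XINCREF() and XDECREF(). */
#define TOY_XINCREF(op) do { if ((op) != NULL) (op)->refcnt++; } while (0)
#define TOY_XDECREF(op) do { if ((op) != NULL) toy_decref(op); } while (0)

static toy_object *toy_callback = NULL;

/* Keep a reference to the new callback and drop the old one (which is
   NULL the first time).  Incrementing first makes re-setting the same
   object safe. */
static void toy_set_callback(toy_object *arg)
{
    TOY_XINCREF(arg);
    TOY_XDECREF(toy_callback);
    toy_callback = arg;
}
\end{verbatim}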

Later, when it is time to call the function, you call the C function
\code{call_object()}.  This function has two arguments, both pointers
to arbitrary Python objects: the Python function, and the argument
list.  The argument list must always be a tuple object, whose length
is the number of arguments.  To call the Python function with no
arguments, you must pass an empty tuple.  For example:

\begin{verbatim}
    object *arglist;
    object *result;

    /* Time to call the callback */
    arglist = mktuple(0);
    result = call_object(my_callback, arglist);
    DECREF(arglist);
\end{verbatim}

\code{call_object()} returns a Python object pointer: this is
the return value of the Python function.  \code{call_object()} is
`reference-count-neutral' with respect to its arguments.  In the
example a new tuple was created to serve as the argument list, which
is \code{DECREF()}-ed immediately after the call.

The return value of \code{call_object()} is `new': either it is a
brand new object, or it is an existing object whose reference count
has been incremented.  So, unless you want to save it in a global
variable, you should somehow \code{DECREF()} the result, even
(especially!) if you are not interested in its value.

Before you do this, however, it is important to check that the return
value isn't \code{NULL}.  If it is, the Python function terminated by raising
an exception.  If the C code that called \code{call_object()} is
called from Python, it should now return an error indication to its
Python caller, so the interpreter can print a stack trace, or the
calling Python code can handle the exception.  If this is not possible
or desirable, the exception should be cleared by calling
\code{err_clear()}.  For example:

\begin{verbatim}
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    DECREF(result);
\end{verbatim}

Depending on the desired interface to the Python callback function,
you may also have to provide an argument list to \code{call_object()}.
In some cases the argument list is also provided by the Python
program, through the same interface that specified the callback
function.  It can then be saved and used in the same manner as the
function object.  In other cases, you may have to construct a new
tuple to pass as the argument list.  The simplest way to do this is to
call \code{mkvalue()}.  For example, if you want to pass an integral
event code, you might use the following code:

\begin{verbatim}
    object *arglist;
    object *result;

    arglist = mkvalue("(l)", eventcode);
    result = call_object(my_callback, arglist);
    DECREF(arglist);
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    DECREF(result);
\end{verbatim}

Note the placement of \code{DECREF(arglist)} immediately after the call,
before the error check!  Also note that strictly speaking this code is
not complete: \code{mkvalue()} may run out of memory, and this should
be checked.
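
The complete sequence, including the check that was left out above,
can be sketched with toy stand-ins: \code{toy_new_int()} plays the role
of \code{mkvalue()}, \code{toy_call()} that of \code{call_object()}, and
a single toy object replaces the argument tuple.  All of these names
are hypothetical; the counter \code{toy_live} only exists so that a
test can confirm nothing leaks on either the success or the error path.

\begin{verbatim}
#include <stddef.h>
#include <stdlib.h>

/* Toy object (hypothetical): a reference count and an integer value. */
typedef struct {
    long refcnt;
    long value;
} toy_object;

static int toy_live = 0;            /* objects currently allocated */

static toy_object *toy_new_int(long v)
{
    toy_object *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;                /* a real API would set MemoryError */
    op->refcnt = 1;
    op->value = v;
    toy_live++;
    return op;
}

static void toy_decref(toy_object *op)
{
    if (--op->refcnt == 0) {
        free(op);
        toy_live--;
    }
}

/* Stand-in for the Python callback: returns a new object, or NULL
   ("raised an exception") for negative input. */
static toy_object *toy_call(toy_object *arg)
{
    if (arg->value < 0)
        return NULL;
    return toy_new_int(arg->value * 10);
}

/* Check the argument construction, release the argument right after
   the call, then check the result and release it too. */
static int toy_fire_event(long eventcode, long *out)
{
    toy_object *arglist, *result;

    arglist = toy_new_int(eventcode);
    if (arglist == NULL)
        return -1;                  /* construction failed */
    result = toy_call(arglist);
    toy_decref(arglist);            /* before the error check */
    if (result == NULL)
        return -1;                  /* pass the error back */
    *out = result->value;
    toy_decref(result);             /* needed even if the value is unused */
    return 0;
}
\end{verbatim}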


\section{Format strings for {\tt getargs()}}

The \code{getargs()} function is declared in \file{modsupport.h} as
follows:

\begin{verbatim}
int getargs(object *arg, char *format, ...);
\end{verbatim}

The remaining arguments must be addresses of variables whose type is
determined by the format string.  For the conversion to succeed, the
\var{arg} object must match the format and the format must be exhausted.
Note that while \code{getargs()} checks that the Python object really
is of the specified type, it cannot check the validity of the
addresses of C variables provided in the call: if you make mistakes
there, your code will probably dump core.

A non-empty format string consists of a single `format unit'.  A
format unit describes one Python object; it is usually a single
character or a parenthesized sequence of format units.  The type of a
format unit is determined from its first character, the `format
letter':

\begin{description}

\item[\samp{s} (string)]
The Python object must be a string object.  The C argument must be a
\code{(char**)} (i.e. the address of a character pointer), and a pointer
to the C string contained in the Python object is stored into it.  You
must not provide storage to store the string; a pointer to an existing
string is stored into the character pointer variable whose address you
pass.  If the next character in the format string is \samp{\#},
another C argument of type \code{(int*)} must be present, and the
length of the Python string (not counting the trailing zero byte) is
stored into it.

\item[\samp{z} (string or zero, i.e. \code{NULL})]
Like \samp{s}, but the object may also be None.  In this case the
string pointer is set to \code{NULL} and if a \samp{\#} is present the
size is set to 0.

\item[\samp{b} (byte, i.e. char interpreted as tiny int)]
The object must be a Python integer.  The C argument must be a
\code{(char*)}.

\item[\samp{h} (half, i.e. short)]
The object must be a Python integer.  The C argument must be a
\code{(short*)}.

\item[\samp{i} (int)]
The object must be a Python integer.  The C argument must be an
\code{(int*)}.

\item[\samp{l} (long)]
The object must be a (plain!) Python integer.  The C argument must be
a \code{(long*)}.

\item[\samp{c} (char)]
The Python object must be a string of length 1.  The C argument must
be a \code{(char*)}.  (Don't pass an \code{(int*)}!)

\item[\samp{f} (float)]
The object must be a Python int or float.  The C argument must be a
\code{(float*)}.

\item[\samp{d} (double)]
The object must be a Python int or float.  The C argument must be a
\code{(double*)}.

\item[\samp{S} (string object)]
The object must be a Python string.  The C argument must be an
\code{(object**)} (i.e. the address of an object pointer).  The C
program thus gets back the actual string object that was passed, not
just a pointer to its array of characters and its size as for format
character \samp{s}.  The reference count of the object has not been
increased.

\item[\samp{O} (object)]
The object can be any Python object, including None, but not
\code{NULL}.  The C argument must be an \code{(object**)}.  This can be
used if an argument list must contain objects of a type for which no
format letter exists: the caller must then check that it has the right
type.  The reference count of the object has not been increased.

\item[\samp{(} (tuple)]
The object must be a Python tuple.  Following the \samp{(} character
in the format string must come a number of format units describing the
elements of the tuple, followed by a \samp{)} character.  Tuple
format units may be nested.  (There are no exceptions for empty and
singleton tuples; \samp{()} specifies an empty tuple and \samp{(i)} a
singleton of one integer.  Normally you don't want to use the latter,
since it is hard for the Python user to specify.)

\end{description}

More format characters will probably be added as the need arises.  It
should be (but currently isn't) allowed to use Python long integers
wherever integers are expected, with a range check.  (A range
check is in fact always necessary for the \samp{b}, \samp{h} and
\samp{i} format letters, but this is currently not implemented.)

Some example calls:

\begin{verbatim}
    int ok;
    int i, j;
    long k, l;
    char *s;
    int size;

    ok = getargs(args, ""); /* No arguments */
        /* Python call: f() */

    ok = getargs(args, "s", &s); /* A string */
        /* Possible Python call: f('whoops!') */

    ok = getargs(args, "(lls)", &k, &l, &s); /* Two longs and a string */
        /* Possible Python call: f(1, 2, 'three') */

    ok = getargs(args, "((ii)s#)", &i, &j, &s, &size);
        /* A pair of ints and a string, whose size is also returned */
        /* Possible Python call: f((1, 2), 'three') */

    {
        int left, top, right, bottom, h, v;
        ok = getargs(args, "(((ii)(ii))(ii))",
                 &left, &top, &right, &bottom, &h, &v);
                 /* A rectangle and a point */
                 /* Possible Python call:
                    f( ((0, 0), (400, 300)), (10, 10)) */
    }
\end{verbatim}

Note that the `top level' of a non-empty format string must consist of
a single unit; strings like \samp{is} and \samp{(ii)s\#} are not valid
format strings.  (But \samp{s\#} is.)  If you have multiple arguments,
the format must therefore always be enclosed in parentheses, as in the
examples \samp{((ii)s\#)} and \samp{(((ii)(ii))(ii))}.  (The current
implementation does not complain when more than one unparenthesized
format unit is given.  Sorry.)
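
To make the single-unit rule concrete, here is a hypothetical checker
written for this note; it is not part of \code{getargs()}.  It accepts
the empty string or exactly one format unit built from the letters
listed above, an optional \samp{\#} after \samp{s} or \samp{z}, and
parenthesized groups.  Unlike the current implementation, it does
reject extra top-level units.

\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Parse one format unit at *p and advance *p past it.
   Returns 0 on success, -1 if the text is not a valid unit. */
static int toy_parse_unit(const char **p)
{
    char c = **p;

    if (c == '(') {
        (*p)++;
        while (**p != ')') {        /* zero or more nested units */
            if (toy_parse_unit(p) < 0)
                return -1;          /* also catches a missing ')' */
        }
        (*p)++;                     /* skip the ')' */
        return 0;
    }
    if (c == '\0' || strchr("szbhilcfdSO", c) == NULL)
        return -1;                  /* end of string or unknown letter */
    (*p)++;
    if ((c == 's' || c == 'z') && **p == '#')
        (*p)++;                     /* optional length suffix */
    return 0;
}

/* A format string is valid if it is empty or is exactly one unit. */
static int toy_valid_format(const char *fmt)
{
    if (*fmt == '\0')
        return 1;
    if (toy_parse_unit(&fmt) < 0)
        return 0;
    return *fmt == '\0';            /* nothing may follow the unit */
}
\end{verbatim}

For example, it accepts \samp{((ii)s\#)} and \samp{(((ii)(ii))(ii))}
but rejects \samp{is} and \samp{(ii)s\#}.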

The \code{getargs()} function does not support variable-length
argument lists.  In simple cases you can fake these by trying several
calls to \code{getargs()} until one succeeds, but you must take care to
call \code{err_clear()} before each retry.  For example:

\begin{verbatim}
static object *
my_method(self, args)
    object *self, *args;
{
    int i, j, k;

    if (getargs(args, "(ii)", &i, &j)) {
        k = 0;          /* Use default third argument */
    }
    else {
        err_clear();
        if (!getargs(args, "(iii)", &i, &j, &k))
            return NULL;
    }
    /* ... use i, j and k here ... */
    INCREF(None);
    return None;
}
\end{verbatim}

(It is possible to think of an extension to the definition of format
strings to accommodate this directly, e.g. placing a \samp{|} in a
tuple might specify that the remaining arguments are optional.
\code{getargs()} should then return one more than the number of
variables stored into.)

Advanced users note: If you set the `varargs' flag in the method list
for a function, the argument will always be a tuple (the `raw argument
list').  In this case you must enclose single and empty argument lists
in parentheses, e.g. \samp{(s)} and \samp{()}.


\section{The {\tt mkvalue()} function}

This function is the counterpart to \code{getargs()}.  It is declared
in \file{Include/modsupport.h} as follows:

\begin{verbatim}
object *mkvalue(char *format, ...);
\end{verbatim}

It supports exactly the same format letters as \code{getargs()}, but
the arguments (which are input to the function, not output) must not
be pointers, just values.  If a byte, short or float is passed to a
varargs function, it is widened by the compiler to int or double, so
\samp{b} and \samp{h} are treated as \samp{i} and \samp{f} is
treated as \samp{d}.  \samp{S} is treated as \samp{O}, \samp{s} is
treated as \samp{z}.  \samp{z\#} and \samp{s\#} are supported: a
second argument specifies the length of the data (negative means use
\code{strlen()}).  \samp{S} and \samp{O} add a reference to their
argument (so you should \code{DECREF()} it if you've just created it
and aren't going to use it again).
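
The widening rule is ordinary C behaviour (the default argument
promotions) and can be demonstrated without the interpreter.  The
hypothetical \code{toy_sum()} below walks a format string the way a
\code{mkvalue()}-like function might and fetches each variadic argument
by its promoted type; asking \code{va_arg()} for \code{char},
\code{short} or \code{float} would be undefined behaviour.

\begin{verbatim}
#include <stdarg.h>

/* Hypothetical helper: sums its variadic arguments as a double.
   'b' and 'h' arguments arrive promoted to int, 'f' arrives promoted
   to double, so they are fetched exactly like 'i' and 'd'. */
static double toy_sum(const char *format, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, format);
    for (; *format != '\0'; format++) {
        switch (*format) {
        case 'b': case 'h': case 'i':
            total += va_arg(ap, int);       /* never va_arg(ap, short) */
            break;
        case 'l':
            total += va_arg(ap, long);
            break;
        case 'f': case 'd':
            total += va_arg(ap, double);    /* never va_arg(ap, float) */
            break;
        default:
            va_end(ap);
            return -1.0;                    /* toy error value: bad letter */
        }
    }
    va_end(ap);
    return total;
}
\end{verbatim}

A call such as \code{toy_sum("bh", c, s)} with a \code{char} \code{c}
and a \code{short} \code{s} therefore works unchanged, because the
compiler has already converted both to \code{int} at the call site.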

If the argument for \samp{O} or \samp{S} is a \code{NULL} pointer, it is
assumed that the call producing the argument found an error and set an
exception.  Therefore, \code{mkvalue()} will
return \code{NULL} but won't set an exception if one is already set.
If no exception is set, \code{SystemError} is set.

If there is an error in the format string, the \code{SystemError}
exception is set, since it is the calling C code's fault, not that of
the Python user who sees the exception.

Example:

\begin{verbatim}
    return mkvalue("(ii)", 0, 0);
\end{verbatim}

returns a tuple containing two zeros.  (Outer parentheses in the
format string are actually superfluous, but you can use them for
compatibility with \code{getargs()}, which requires them if more than
one argument is expected.)


\section{Reference counts}

Here's a useful explanation of \code{INCREF()} and \code{DECREF()}
(after an original by Sjoerd Mullender).

Use \code{XINCREF()} or \code{XDECREF()} instead of \code{INCREF()} or
\code{DECREF()} when the argument may be \code{NULL} --- the versions
without \samp{X} are faster but will dump core when they encounter a
\code{NULL} pointer.

The basic idea is: if you create an extra reference to an object, you
must \code{INCREF()} it; if you throw away a reference to an object,
you must \code{DECREF()} it.  Functions such as
\code{newstringobject()}, \code{newsizedstringobject()},
\code{newintobject()}, etc. create a reference to an object.  If you
want to throw away the object thus created, you must use
\code{DECREF()}.

If you put an object into a tuple or list using \code{settupleitem()}
or \code{setlistitem()}, the idea is that you usually don't want to
keep a reference of your own around, so Python does not
\code{INCREF()} the elements.  It does \code{DECREF()} the old value.
This means that if you put something into such an object using the
functions Python provides for this, you must \code{INCREF()} the
object if you also want to keep a separate reference to the object around.
Also, if you replace an element, you should \code{INCREF()} the old
element first if you want to keep it.  If you didn't \code{INCREF()}
it before you replaced it, you are not allowed to look at it anymore,
since it may have been freed.
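
The ownership transfer can be modelled with a toy fixed-size container.
In this sketch, \code{toy_setitem()} is a hypothetical stand-in for
\code{settupleitem()}: it stores the caller's reference without
incrementing it and releases the reference to whatever it replaces.  A
counter of freed objects shows when each object actually goes away.

\begin{verbatim}
#include <stddef.h>
#include <stdlib.h>

/* Toy reference-counted object (hypothetical). */
typedef struct {
    long refcnt;
} toy_object;

static int toy_freed = 0;           /* number of objects deallocated */

static toy_object *toy_new(void)
{
    toy_object *op = malloc(sizeof *op);
    if (op != NULL)
        op->refcnt = 1;
    return op;
}

/* NULL-safe release of one reference. */
static void toy_xdecref(toy_object *op)
{
    if (op != NULL && --op->refcnt == 0) {
        free(op);
        toy_freed++;
    }
}

/* Toy fixed-size container. */
typedef struct {
    toy_object *item[4];
} toy_tuple;

/* Like settupleitem(): the container takes over the caller's
   reference to v (no increment) and releases the old element. */
static void toy_setitem(toy_tuple *t, int i, toy_object *v)
{
    toy_object *old = t->item[i];
    t->item[i] = v;
    toy_xdecref(old);
}
\end{verbatim}

In the usage below, the extra increment on \code{a} before storing it
is what allows the caller to keep using \code{a} after it has been
replaced in the container; \code{b} was stored without one, so
replacing it frees it.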
744 Returning an object to Python (i.e. when your C function returns)
745 creates a reference to an object, but it does not change the reference
746 count. When your code does not keep another reference to the object,
747 you should not \code{INCREF()} or \code{DECREF()} it (assuming it is a
748 newly created object). When you do keep a reference around, you
749 should \code{INCREF()} the object. Also, when you return a global
750 object such as \code{None}, you should \code{INCREF()} it.
752 If you want to return a tuple, you should consider using
753 \code{mkvalue()}. This function creates a new tuple with a reference
754 count of 1 which you can return. If any of the elements you put into
755 the tuple are objects (format codes \samp{O} or \samp{S}), they
756 are \code{INCREF()}'ed by \code{mkvalue()}. If you don't want to keep
757 references to those elements around, you should \code{DECREF()} them
758 after having called \code{mkvalue()}.
Usually you don't have to worry about arguments. They are
\code{INCREF()}'ed before your function is called and
\code{DECREF()}'ed after your function returns. When you keep a
reference to an argument, you should \code{INCREF()} it, and
\code{DECREF()} it when you throw it away. Also, when you return an
argument, you should \code{INCREF()} it, because returning the
argument creates an extra reference to it.

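
A minimal sketch of these two points follows. It uses a hypothetical
toy object type rather than Python's API. \code{toycall()} stands in
for the interpreter, which INCREFs the argument before the call and
DECREFs it afterwards. \code{identity()} is a made-up method that
returns its argument and therefore has to INCREF it.

\begin{verbatim}
#include <stdlib.h>

/* Hypothetical toy object: only the reference count is modelled. */
typedef struct toyobject {
    int refcnt;
} toyobject;

static int toylive = 0;     /* objects allocated and not yet freed */

static toyobject *newtoy(void)      /* caller receives one reference */
{
    toyobject *op = malloc(sizeof *op);
    if (op != NULL) {
        op->refcnt = 1;
        toylive++;
    }
    return op;
}

static void toyincref(toyobject *op) { op->refcnt++; }

static void toydecref(toyobject *op)
{
    if (--op->refcnt == 0) {
        free(op);
        toylive--;
    }
}

/* A method that returns its argument.  Returning creates an extra
   reference, so the argument must be INCREF'ed first. */
static toyobject *identity(toyobject *arg)
{
    toyincref(arg);
    return arg;
}

/* Stand-in for the interpreter's side of a call: the argument is
   INCREF'ed before and DECREF'ed after; the result goes to the caller. */
static toyobject *toycall(toyobject *(*func)(toyobject *), toyobject *arg)
{
    toyobject *res;

    toyincref(arg);
    res = func(arg);
    toydecref(arg);
    return res;
}
\end{verbatim}

Suppose \code{identity()} did not call \code{toyincref()}. The
object's count would fall back to 1 after the call, although both the
original pointer and the result are in use. Releasing both would then
decrement the count of an object that had already been freed.
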
If you use \code{getargs()} to parse the arguments, you can get a
reference to an object (by using \samp{O} in the format string). This
object was not \code{INCREF()}'ed, so you should not \code{DECREF()}
it. If you want to keep the object, you must \code{INCREF()} it
yourself.

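
The sketch below shows a function that keeps an object it received as
a borrowed reference, such as one obtained with the \samp{O} format
code. It uses a hypothetical toy object type (\code{toyobject},
\code{newtoy()}, \code{toyincref()}, \code{toydecref()}) instead of
Python's API. The function \code{keep()} and the variable
\code{saved} are also made up for illustration.

\begin{verbatim}
#include <stdlib.h>

/* Hypothetical toy object: only the reference count is modelled. */
typedef struct toyobject {
    int refcnt;
} toyobject;

static int toylive = 0;     /* objects allocated and not yet freed */

static toyobject *newtoy(void)      /* caller receives one reference */
{
    toyobject *op = malloc(sizeof *op);
    if (op != NULL) {
        op->refcnt = 1;
        toylive++;
    }
    return op;
}

static void toyincref(toyobject *op) { op->refcnt++; }

static void toydecref(toyobject *op)
{
    if (--op->refcnt == 0) {
        free(op);
        toylive--;
    }
}

/* An object kept by the module between calls (initially none). */
static toyobject *saved = NULL;

/* obj is a borrowed reference, as obtained from getargs() with the
   O format code: it was not INCREF'ed for us, so keeping it requires
   an INCREF of our own. */
static void keep(toyobject *obj)
{
    toyobject *old = saved;

    toyincref(obj);     /* INCREF first, in case obj == old */
    saved = obj;
    if (old != NULL)
        toydecref(old); /* release the previously kept object */
}
\end{verbatim}

The new reference is taken before the old one is released. This order
matters when the same object is kept twice in a row, because releasing
first could free the object before the new reference is taken.
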
If you create your own type of objects, you should use \code{NEWOBJ()}
to create the object. This sets the reference count to 1. If you
want to throw away the object, you should use \code{DECREF()}. When
the reference count reaches zero, your type's \code{dealloc()}
function is called. In it, you should \code{DECREF()} all objects to
which you keep references in your object, but you should not use
\code{DECREF()} on your object itself. You should use \code{DEL()}
instead.


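
A toy object type can sketch how the work is divided between
\code{DECREF()}, the type's \code{dealloc()} function and
\code{DEL()}. Everything here is hypothetical and does not use
Python's API. \code{toyalloc()} plays the role of \code{NEWOBJ()} and
\code{toydel()} plays that of \code{DEL()}. \code{boxobject} is a
made-up type that keeps a reference to one other object.

\begin{verbatim}
#include <stdlib.h>

typedef struct toyobject toyobject;

/* Hypothetical toy type: only a deallocation function. */
typedef struct toytype {
    void (*dealloc)(toyobject *);
} toytype;

/* Common header of every toy object. */
struct toyobject {
    int refcnt;
    const toytype *type;
};

static int toylive = 0;     /* objects allocated and not yet freed */

/* Plays the role of NEWOBJ(): allocates and sets the count to 1. */
static toyobject *toyalloc(size_t size, const toytype *type)
{
    toyobject *op = malloc(size);
    if (op != NULL) {
        op->refcnt = 1;
        op->type = type;
        toylive++;
    }
    return op;
}

static void toyincref(toyobject *op) { op->refcnt++; }

/* Plays the role of DECREF(): calls the type's dealloc() at zero. */
static void toydecref(toyobject *op)
{
    if (--op->refcnt == 0)
        op->type->dealloc(op);
}

/* Plays the role of DEL(): releases the memory, nothing else. */
static void toydel(toyobject *op)
{
    free(op);
    toylive--;
}

/* A type that holds no references. */
static void plaindealloc(toyobject *op) { toydel(op); }
static const toytype plaintype = { plaindealloc };

/* A type that keeps a reference to one other object. */
typedef struct {
    toyobject head;     /* must come first so the casts are valid */
    toyobject *member;
} boxobject;

static void boxdealloc(toyobject *op)
{
    boxobject *box = (boxobject *)op;
    toydecref(box->member);   /* DECREF the objects we refer to... */
    toydel(op);               /* ...but DEL, not DECREF, ourselves */
}
static const toytype boxtype = { boxdealloc };

/* member must not be NULL; the box takes a reference of its own. */
static toyobject *boxnew(toyobject *member)
{
    boxobject *box = (boxobject *)toyalloc(sizeof(boxobject), &boxtype);
    if (box == NULL)
        return NULL;
    toyincref(member);
    box->member = member;
    return &box->head;
}
\end{verbatim}

Calling \code{toydecref()} on the box from inside \code{boxdealloc()}
would be wrong. The count is already zero at that point, so it would
become negative and the box would never be freed.
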
\section{Writing extensions in C++}

It is possible to write extension modules in C++. Some restrictions
apply: since the main program (the Python interpreter) is compiled and
linked by the C compiler, global or static objects with constructors
cannot be used. All functions that will be called directly or
indirectly (i.e.\ via function pointers) by the Python interpreter will
have to be declared using \code{extern "C"}; this applies to all
`methods' as well as to the module's initialization function.
It is unnecessary to enclose the Python header files in
\code{extern "C" \{...\}}; they do this already.


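
A header that must work from both C and C++ usually achieves this
with a conditional \code{extern "C"} block, as in the sketch below.
The sketch is written in C and shows only the common idiom. The
header contents, the function \code{initfoo()} and the flag
\code{fooinitialized} are hypothetical, and the actual Python headers
may arrange this differently in detail.

\begin{verbatim}
/* Sketch of a header usable from both C and C++.  A C++ compiler
   defines __cplusplus and sees the extern "C" block; a C compiler
   skips it. */
#ifdef __cplusplus
extern "C" {
#endif

void initfoo(void);     /* hypothetical module initialization function */

#ifdef __cplusplus
}
#endif

/* Definition, as it might appear in a module compiled as C. */
int fooinitialized = 0;

void initfoo(void)
{
    fooinitialized = 1; /* a real module would register its methods here */
}
\end{verbatim}

Because the declaration has C linkage in both languages, a module
written in C++ can define \code{initfoo()} under the same unmangled
name that the C-compiled interpreter expects to call.
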
\chapter{Embedding Python in another application}

Embedding Python is similar to extending it, but not quite. The
difference is that when you extend Python, the main program of the
application is still the Python interpreter, while if you embed
Python, the main program may have nothing to do with Python;
instead, some parts of the application occasionally call the Python
interpreter to run some Python code.

So if you are embedding Python, you are providing your own main
program. One of the things this main program has to do is initialize
the Python interpreter. At the very least, you have to call the
function \code{initall()}. There are optional calls to pass
command-line arguments to Python. After that, you can call the
interpreter from any part of the application.

There are several different ways to call the interpreter: you can pass
a string containing Python statements to \code{run_command()}, or you
can pass a stdio file pointer and a file name (for identification in
error messages only) to \code{run_script()}. You can also call the
lower-level operations described in the previous chapters to construct
and use Python objects.

A simple demo of embedding Python can be found in the directory
\file{Demo/embed}.


\section{Embedding Python in C++}

It is also possible to embed Python in a C++ program. Exactly how this
is done will depend on the details of the C++ system used; in general,
you will need to write the main program in C++ and use the C++
compiler to compile and link your program. There is no need to
recompile Python itself with C++.


\chapter{Dynamic Loading}

On most modern systems it is possible to configure Python to support
dynamic loading of extension modules implemented in C. When shared
libraries are used, dynamic loading is configured automatically;
otherwise you have to select it as a build option (see below). Once
configured, dynamic loading is trivial to use: when a Python program
executes \code{import foo}, the search for modules tries to find a
file \file{foomodule.o} (\file{foomodule.so} when using shared
libraries) in the module search path, and if one is found, it is
loaded into the executing binary and executed. Once loaded, the
module acts just like a built-in extension module.

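
The naming convention can be sketched as a small C helper that forms
the name looked for in one directory of the module search path. The
function \code{modulefilename()} and the directory in the example are
hypothetical. The sketch shows only the convention described above,
not the interpreter's actual search code.

\begin{verbatim}
#include <stdio.h>

/* Hypothetical helper: build the file name that "import name" would
   look for in directory dir (foo -> foomodule.o, or foomodule.so
   when shared libraries are used).  Returns 0 on success, -1 if the
   buffer is too small. */
static int modulefilename(char *buf, size_t bufsize, const char *dir,
                          const char *name, int shared)
{
    int n = snprintf(buf, bufsize, "%s/%smodule%s", dir, name,
                     shared ? ".so" : ".o");
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return 0;
}
\end{verbatim}

For example, with the directory \file{/usr/local/lib/python}, the
module name \code{foo} and shared libraries enabled, the helper
produces \file{/usr/local/lib/python/foomodule.so}.
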
The advantages of dynamic loading are twofold: the `core' Python
binary gets smaller, and users can extend Python with their own
modules implemented in C without having to build and maintain their
own copy of the Python interpreter. There are also disadvantages:
dynamic loading isn't available on all systems (this just means that
on some systems you have to use static loading), and dynamically
loading a module that was compiled for a different version of Python
(e.g. with a different representation of objects) may dump core.


\section{Configuring and building the interpreter for dynamic loading}

There are three styles of dynamic loading: one using shared libraries,
one using SGI IRIX 4 dynamic loading, and one using GNU dynamic
loading.

\subsection{Shared libraries}

The following systems support dynamic loading using shared libraries:
SunOS 4; Solaris 2; SGI IRIX 5 (but not SGI IRIX 4!); and probably all
systems derived from SVR4, or at least those SVR4 derivatives that
support shared libraries (are there any that don't?).

You don't need to do anything to configure dynamic loading on these
systems; the \file{configure} script detects the presence of the
\file{<dlfcn.h>} header file and automatically configures dynamic
loading.

\subsection{SGI dynamic loading}

Only SGI IRIX 4 supports dynamic loading of modules using SGI dynamic
loading. (SGI IRIX 5 might also support it, but it is inferior to
using shared libraries, so there is no reason to use it there; a small
test didn't work right away, so I gave up trying to support it.)

Before you build Python, you first need to fetch and build the \code{dl}
package written by Jack Jansen. This is available by anonymous ftp
from host \file{ftp.cwi.nl}, directory \file{pub/dynload}, file
\file{dl-1.6.tar.Z}. (The version number may change.) Follow the
instructions in the package's \file{README} file to build it.

Once you have built \code{dl}, you can configure Python to use it. To
this end, you run the \file{configure} script with the option
\code{--with-dl=\var{directory}} where \var{directory} is the absolute
pathname of the \code{dl} directory.

Now build and install Python as you normally would (see the
\file{README} file in the toplevel Python directory).

\subsection{GNU dynamic loading}

GNU dynamic loading supports (according to its \file{README} file) the
following hardware and software combinations: VAX (Ultrix), Sun 3
(SunOS 3.4 and 4.0), Sparc (SunOS 4.0), Sequent Symmetry (Dynix), and
Atari ST. There is no reason to use it on a Sparc, since SunOS 4
supports shared libraries; I haven't seen a Sun 3 for years, so I
don't know whether these have shared libraries or not.

You need to fetch and build two packages. One is GNU DLD 3.2.3,
available by anonymous ftp from host \file{ftp.cwi.nl}, directory
\file{pub/dynload}, file \file{dld-3.2.3.tar.Z}. (As far as I know,
no further development on GNU DLD is being done.) The other is an
emulation of Jack Jansen's \code{dl} package that I wrote on top of
GNU DLD 3.2.3. This is available from the same host and directory,
file \file{dl-dld-1.1.tar.Z}. (The version number may change, but I
doubt it will.) Follow the instructions in each package's
\file{README} file to configure and build them.

Now configure Python. Run the \file{configure} script with the option
\code{--with-dl-dld=\var{dl-directory},\var{dld-directory}} where
\var{dl-directory} is the absolute pathname of the directory where you
have built the \file{dl-dld} package, and \var{dld-directory} is that
of the GNU DLD package. The Python interpreter you build hereafter
will support GNU dynamic loading.


\section{Building a dynamically loadable module}

Since there are three styles of dynamic loading, there are also three
groups of instructions for building a dynamically loadable module.
Instructions common to all three styles are given first. Assuming
your module is called \code{foo}, the source filename must be
\file{foomodule.c}, so the object name is \file{foomodule.o}. The
module must be written as a normal Python extension module (as
described earlier).

Note that in all cases you will have to create your own Makefile that
compiles your module file(s). This Makefile will have to pass two
\samp{-I} arguments to the C compiler so that it can find the
Python header files. If the Make variable \var{PYTHONTOP} points to
the toplevel Python directory, your \var{CFLAGS} Make variable should
contain the options \samp{-I\$(PYTHONTOP) -I\$(PYTHONTOP)/Include}.
(Most header files are in the \file{Include} subdirectory, but the
\file{config.h} header lives in the toplevel directory.) You must
also add \samp{-DHAVE_CONFIG_H} to the definition of \var{CFLAGS} to
direct the Python headers to include \file{config.h}.


\subsection{Shared libraries}

You must link the \samp{.o} file to produce a shared library. This is
done using a special invocation of the \UNIX{} loader/linker, {\em
ld}(1). Unfortunately the invocation differs slightly from system to
system.

On SunOS 4, use
\begin{verbatim}
ld foomodule.o -o foomodule.so
\end{verbatim}

On Solaris 2, use
\begin{verbatim}
ld -G foomodule.o -o foomodule.so
\end{verbatim}

On SGI IRIX 5, use
\begin{verbatim}
ld -shared foomodule.o -o foomodule.so
\end{verbatim}

On other systems, consult the manual page for {\em ld}(1) to find what
flags, if any, must be used.

If your extension module uses system libraries that haven't already
been linked with Python (e.g. a windowing system), these must be
passed to the {\em ld} command as \samp{-l} options after the
\samp{.o} file.

The resulting file \file{foomodule.so} must be copied into a directory
along the Python module search path.


\subsection{SGI dynamic loading}

{\bf IMPORTANT:} You must compile your extension module with the
additional C flag \samp{-G0} (or \samp{-G 0}). This instructs the
assembler to generate position-independent code.

You don't need to link the resulting \file{foomodule.o} file; just
copy it into a directory along the Python module search path.

The first time your extension is loaded, it takes some extra time and
a few messages may be printed. This creates a file
\file{foomodule.ld} which is an image that can be loaded quickly into
the Python interpreter process. When a new Python interpreter is
installed, the \code{dl} package detects this and rebuilds
\file{foomodule.ld}. The file \file{foomodule.ld} is placed in the
directory where \file{foomodule.o} was found, unless this directory is
unwritable; in that case it is placed in a temporary
directory.\footnote{Check the manual page of the \code{dl} package for
details.}

If your extension module uses additional system libraries, you must
create a file \file{foomodule.libs} in the same directory as the
\file{foomodule.o}. This file should contain one or more lines with
whitespace-separated options that will be passed to the linker;
normally only \samp{-l} options or absolute pathnames of libraries
(\samp{.a} files) should be used.


\subsection{GNU dynamic loading}

Just copy \file{foomodule.o} into a directory along the Python module
search path.

If your extension module uses additional system libraries, you must
create a file \file{foomodule.libs} in the same directory as the
\file{foomodule.o}. This file should contain one or more lines with
whitespace-separated absolute pathnames of libraries (\samp{.a}
files). No \samp{-l} options can be used.


\input{ext.ind}

\end{document}