\chapter{Introduction \label{intro}}

The Application Programmer's Interface to Python gives C and
\Cpp{} programmers access to the Python interpreter at a variety of
levels.  The API is equally usable from \Cpp, but for brevity it is
generally referred to as the Python/C API.  There are two
fundamentally different reasons for using the Python/C API.  The first
reason is to write \emph{extension modules} for specific purposes;
these are C modules that extend the Python interpreter.  This is
probably the most common use.  The second reason is to use Python as a
component in a larger application; this technique is generally
referred to as \dfn{embedding} Python in an application.

Writing an extension module is a relatively well-understood process,
where a ``cookbook'' approach works well.  There are several tools
that automate the process to some extent.  While people have embedded
Python in other applications since its early existence, the process of
embedding Python is less straightforward than writing an extension.

Many API functions are useful independent of whether you're embedding
or extending Python; moreover, most applications that embed Python
will need to provide a custom extension as well, so it's probably a
good idea to become familiar with writing an extension before
attempting to embed Python in a real application.

\section{Include Files \label{includes}}

All function, type and macro definitions needed to use the Python/C
API are included in your code by the single line
\code{\#include "Python.h"}.

This implies inclusion of the following standard headers:
\code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>},
\code{<limits.h>}, and \code{<stdlib.h>} (if available).

\begin{notice}[warning]
  Since Python may define some pre-processor definitions which affect
  the standard headers on some systems, you \emph{must} include
  \file{Python.h} before any standard headers are included.
\end{notice}

All user-visible names defined by \file{Python.h} (except those defined by
the included standard headers) have one of the prefixes \samp{Py} or
\samp{_Py}.  Names beginning with \samp{_Py} are for internal use by
the Python implementation and should not be used by extension writers.
Structure member names do not have a reserved prefix.

\strong{Important:} user code should never define names that begin
with \samp{Py} or \samp{_Py}.  This confuses the reader, and
jeopardizes the portability of the user code to future Python
versions, which may define additional names beginning with one of
these prefixes.

The header files are typically installed with Python.  On \UNIX, these
are located in the directories
\file{\envvar{prefix}/include/python\var{version}/} and
\file{\envvar{exec_prefix}/include/python\var{version}/}, where
\envvar{prefix} and \envvar{exec_prefix} are defined by the
corresponding parameters to Python's \program{configure} script and
\var{version} is \code{sys.version[:3]}.  On Windows, the headers are
installed in \file{\envvar{prefix}/include}, where \envvar{prefix} is
the installation directory specified to the installer.

To include the headers, place both directories (if different) on your
compiler's search path for includes.  Do \emph{not} place the parent
directories on the search path and then use
\samp{\#include <python\shortversion/Python.h>}; this will break on
multi-platform builds since the platform-independent headers under
\envvar{prefix} include the platform-specific headers from
\envvar{exec_prefix}.

\Cpp{} users should note that though the API is defined entirely using
C, the header files do properly declare the entry points to be
\code{extern "C"}, so there is no need to do anything special to use
the API from \Cpp.

\section{Objects, Types and Reference Counts \label{objects}}

Most Python/C API functions have one or more arguments as well as a
return value of type \ctype{PyObject*}.  This type is a pointer
to an opaque data type representing an arbitrary Python
object.  Since all Python object types are treated the same way by the
Python language in most situations (e.g., assignments, scope rules,
and argument passing), it is only fitting that they should be
represented by a single C type.  Almost all Python objects live on the
heap: you never declare an automatic or static variable of type
\ctype{PyObject}; only pointer variables of type \ctype{PyObject*} can
be declared.  The sole exception is the type objects\obindex{type};
since these must never be deallocated, they are typically static
\ctype{PyTypeObject} objects.

All Python objects (even Python integers) have a \dfn{type} and a
\dfn{reference count}.  An object's type determines what kind of object
it is (e.g., an integer, a list, or a user-defined function; there are
many more as explained in the \citetitle[../ref/ref.html]{Python
Reference Manual}).  For each of the well-known types there is a macro
to check whether an object is of that type; for instance,
\samp{PyList_Check(\var{a})} is true if (and only if) the object
pointed to by \var{a} is a Python list.

\subsection{Reference Counts \label{refcounts}}

The reference count is important because today's computers have a
finite (and often severely limited) memory size; it counts how many
different places there are that have a reference to an object.  Such a
place could be another object, or a global (or static) C variable, or
a local variable in some C function.  When an object's reference count
becomes zero, the object is deallocated.  If it contains references to
other objects, their reference count is decremented.  Those other
objects may be deallocated in turn, if this decrement makes their
reference count become zero, and so on.  (There's an obvious problem
with objects that reference each other here; for now, the solution is
``don't do that.'')

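The cascade described above can be sketched with a toy model in plain
C.  This is an illustration only and not the interpreter's actual
object layout: the names \code{ToyObj}, \code{toy_new()},
\code{toy_incref()}, \code{toy_decref()} and the counter
\code{toy_live} are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy object; the real PyObject layout is different. */
typedef struct ToyObj {
    long refcnt;             /* how many places hold a reference */
    struct ToyObj *child;    /* optional owned reference to another object */
} ToyObj;

long toy_live = 0;           /* number of objects currently allocated */

/* Create an object with one reference, owned by the caller.  The new
   object takes over the caller's reference to child (may be NULL). */
ToyObj *toy_new(ToyObj *child)
{
    ToyObj *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->child = child;
    toy_live++;
    return op;
}

void toy_incref(ToyObj *op)
{
    op->refcnt++;
}

void toy_decref(ToyObj *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0) {
        ToyObj *child = op->child;
        free(op);
        toy_live--;
        if (child != NULL)
            toy_decref(child);   /* may deallocate the child in turn */
    }
}
```

Dropping the last reference to an outer object frees it and then
decrements the object it contains, which survives only as long as some
other place still holds a reference to it.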
Reference counts are always manipulated explicitly.  The normal way is
to use the macro \cfunction{Py_INCREF()}\ttindex{Py_INCREF()} to
increment an object's reference count by one, and
\cfunction{Py_DECREF()}\ttindex{Py_DECREF()} to decrement it by
one.  The \cfunction{Py_DECREF()} macro is considerably more complex
than the incref one, since it must check whether the reference count
becomes zero and then cause the object's deallocator to be called.
The deallocator is a function pointer contained in the object's type
structure.  The type-specific deallocator takes care of decrementing
the reference counts for other objects contained in the object if this
is a compound object type, such as a list, as well as performing any
additional finalization that's needed.  There's no chance that the
reference count can overflow; at least as many bits are used to hold
the reference count as there are distinct memory locations in virtual
memory (assuming \code{sizeof(long) >= sizeof(char*)}).  Thus, the
reference count increment is a simple operation.

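To make the role of the type-specific deallocator concrete, here is a
minimal sketch in plain C of a decref macro that dispatches through a
function pointer stored in a type structure.  It is an assumption-laden
model, not the real \cfunction{Py_DECREF()}: \code{ToyType},
\code{ToyObject}, \code{TOY_INCREF}, \code{TOY_DECREF} and the two
example types are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ToyObject ToyObject;

/* Hypothetical type structure: only the deallocator slot is modeled. */
typedef struct ToyType {
    const char *name;
    void (*dealloc)(ToyObject *);
} ToyType;

struct ToyObject {
    long refcnt;
    const ToyType *type;
    ToyObject *item;         /* used only by the compound "box" type */
};

#define TOY_INCREF(op) ((void)((op)->refcnt++))
#define TOY_DECREF(op)                          \
    do {                                        \
        ToyObject *toy_tmp = (op);              \
        if (--toy_tmp->refcnt == 0)             \
            toy_tmp->type->dealloc(toy_tmp);    \
    } while (0)

int toy_dealloc_calls = 0;   /* lets a caller observe deallocations */

void toy_leaf_dealloc(ToyObject *op)
{
    toy_dealloc_calls++;
    free(op);
}

void toy_box_dealloc(ToyObject *op)
{
    /* A compound type first releases the references it owns. */
    if (op->item != NULL)
        TOY_DECREF(op->item);
    toy_dealloc_calls++;
    free(op);
}

const ToyType toy_leaf_type = { "leaf", toy_leaf_dealloc };
const ToyType toy_box_type = { "box", toy_box_dealloc };

/* Create an object of the given type; a box takes over the caller's
   reference to item. */
ToyObject *toy_new(const ToyType *type, ToyObject *item)
{
    ToyObject *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->type = type;
    op->item = item;
    return op;
}
```

The increment is a single operation, while the decrement has to test
for zero and call whatever deallocator the object's type provides.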
It is not necessary to increment an object's reference count for every
local variable that contains a pointer to an object.  In theory, the
object's reference count goes up by one when the variable is made to
point to it and it goes down by one when the variable goes out of
scope.  However, these two cancel each other out, so at the end the
reference count hasn't changed.  The only real reason to use the
reference count is to prevent the object from being deallocated as
long as our variable is pointing to it.  If we know that there is at
least one other reference to the object that lives at least as long as
our variable, there is no need to increment the reference count
temporarily.  An important situation where this arises is in objects
that are passed as arguments to C functions in an extension module
that are called from Python; the call mechanism guarantees to hold a
reference to every argument for the duration of the call.

However, a common pitfall is to extract an object from a list and
hold on to it for a while without incrementing its reference count.
Some other operation might conceivably remove the object from the
list, decrementing its reference count and possibly deallocating it.
The real danger is that innocent-looking operations may invoke
arbitrary Python code which could do this; there is a code path which
allows control to flow back to the user from a \cfunction{Py_DECREF()},
so almost any operation is potentially dangerous.

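The pitfall and its remedy can be modeled without the interpreter.  In
the following sketch a one-slot container stands in for a list; all
names (\code{ToyItem}, \code{ToySlot}, \code{toy_slot_get()},
\code{toy_slot_replace()}, \code{toy_read_after_replace()}) are
hypothetical, and the \code{freed_flag} member exists only so that a
deallocation can be observed safely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy item; freed_flag records deallocation for the demo. */
typedef struct ToyItem {
    long refcnt;
    int value;
    int *freed_flag;
} ToyItem;

typedef struct ToySlot {
    ToyItem *item;           /* the container owns this reference */
} ToySlot;

ToyItem *toy_item_new(int value, int *freed_flag)
{
    ToyItem *it = malloc(sizeof *it);
    if (it == NULL)
        return NULL;
    it->refcnt = 1;
    it->value = value;
    it->freed_flag = freed_flag;
    *freed_flag = 0;
    return it;
}

void toy_item_incref(ToyItem *it) { it->refcnt++; }

void toy_item_decref(ToyItem *it)
{
    if (--it->refcnt == 0) {
        *it->freed_flag = 1;
        free(it);
    }
}

/* Borrowed reference: the caller does not own the result. */
ToyItem *toy_slot_get(ToySlot *slot) { return slot->item; }

/* Replace the stored item, releasing the container's old reference. */
void toy_slot_replace(ToySlot *slot, ToyItem *new_item)
{
    ToyItem *old = slot->item;
    slot->item = new_item;   /* takes over the caller's reference */
    if (old != NULL)
        toy_item_decref(old);
}

/* Safe pattern: own a reference across the call that may drop the item. */
int toy_read_after_replace(ToySlot *slot, ToyItem *replacement)
{
    int value;
    ToyItem *it = toy_slot_get(slot);
    toy_item_incref(it);                 /* protect the borrowed item */
    toy_slot_replace(slot, replacement); /* would otherwise free it */
    value = it->value;                   /* still valid: we own a reference */
    toy_item_decref(it);
    return value;
}
```

Without the call to \code{toy_item_incref()}, the read of
\code{it->value} would touch memory that the replacement had already
freed.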
A safe approach is to always use the generic operations (functions
whose name begins with \samp{PyObject_}, \samp{PyNumber_},
\samp{PySequence_} or \samp{PyMapping_}).  These operations always
increment the reference count of the object they return.  This leaves
the caller with the responsibility to call
\cfunction{Py_DECREF()} when they are done with the result; this soon
becomes second nature.

\subsubsection{Reference Count Details \label{refcountDetails}}

The reference count behavior of functions in the Python/C API is best
explained in terms of \emph{ownership of references}.  Ownership
pertains to references, never to objects (objects are not owned: they
are always shared).  ``Owning a reference'' means being responsible for
calling \cfunction{Py_DECREF()} on it when the reference is no longer
needed.  Ownership can also be transferred, meaning that the code that
receives ownership of the reference then becomes responsible for
eventually decref'ing it by calling \cfunction{Py_DECREF()} or
\cfunction{Py_XDECREF()} when it's no longer needed, or for passing on
this responsibility (usually to its caller).
When a function passes ownership of a reference on to its caller, the
caller is said to receive a \emph{new} reference.  When no ownership
is transferred, the caller is said to \emph{borrow} the reference.
Nothing needs to be done for a borrowed reference.

Conversely, when a calling function passes in a reference to an
object, there are two possibilities: the function \emph{steals} a
reference to the object, or it does not.  Few functions steal
references; the two notable exceptions are
\cfunction{PyList_SetItem()}\ttindex{PyList_SetItem()} and
\cfunction{PyTuple_SetItem()}\ttindex{PyTuple_SetItem()}, which
steal a reference to the item (but not to the tuple or list into which
the item is put!).  These functions were designed to steal a reference
because of a common idiom for populating a tuple or list with newly
created objects; for example, the code to create the tuple \code{(1,
2, "three")} could look like this (forgetting about error handling for
the moment; a better way to code this is shown below):

\begin{verbatim}
PyObject *t;

t = PyTuple_New(3);
PyTuple_SetItem(t, 0, PyInt_FromLong(1L));
PyTuple_SetItem(t, 1, PyInt_FromLong(2L));
PyTuple_SetItem(t, 2, PyString_FromString("three"));
\end{verbatim}

Incidentally, \cfunction{PyTuple_SetItem()} is the \emph{only} way to
set tuple items; \cfunction{PySequence_SetItem()} and
\cfunction{PyObject_SetItem()} refuse to do this since tuples are an
immutable data type.  You should only use
\cfunction{PyTuple_SetItem()} for tuples that you are creating
yourself.

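The difference between a setter that steals a reference and one that
does not can be sketched in plain C.  This is a toy model under stated
assumptions, not the list or tuple implementation: \code{ToyItem},
\code{ToyCell}, \code{toy_cell_set_steal()} and
\code{toy_cell_set_share()} are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ToyItem {
    long refcnt;
} ToyItem;

typedef struct ToyCell {
    ToyItem *item;           /* reference owned by the cell, or NULL */
} ToyCell;

ToyItem *toy_item_new(void)  /* returns a new (owned) reference */
{
    ToyItem *it = malloc(sizeof *it);
    if (it != NULL)
        it->refcnt = 1;
    return it;
}

void toy_item_decref(ToyItem *it)
{
    if (--it->refcnt == 0)
        free(it);
}

void toy_cell_clear(ToyCell *cell)
{
    if (cell->item != NULL) {
        toy_item_decref(cell->item);
        cell->item = NULL;
    }
}

/* Steals: the caller's reference becomes the cell's reference. */
void toy_cell_set_steal(ToyCell *cell, ToyItem *it)
{
    toy_cell_clear(cell);
    cell->item = it;
}

/* Does not steal: the cell acquires its own, additional reference. */
void toy_cell_set_share(ToyCell *cell, ToyItem *it)
{
    it->refcnt++;
    toy_cell_clear(cell);
    cell->item = it;
}
```

After a stealing call the caller has nothing left to release; after a
non-stealing call the caller still owns its reference and must
eventually decrement it.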
Equivalent code for populating a list can be written using
\cfunction{PyList_New()} and \cfunction{PyList_SetItem()}.  Such code
can also use \cfunction{PySequence_SetItem()}; this illustrates the
difference between the two (the extra \cfunction{Py_DECREF()} calls):

\begin{verbatim}
PyObject *l, *x;

l = PyList_New(3);
x = PyInt_FromLong(1L);
PySequence_SetItem(l, 0, x); Py_DECREF(x);
x = PyInt_FromLong(2L);
PySequence_SetItem(l, 1, x); Py_DECREF(x);
x = PyString_FromString("three");
PySequence_SetItem(l, 2, x); Py_DECREF(x);
\end{verbatim}

You might find it strange that the ``recommended'' approach takes more
code.  However, in practice, you will rarely use these ways of
creating and populating a tuple or list.  There's a generic function,
\cfunction{Py_BuildValue()}, that can create most common objects from
C values, directed by a \dfn{format string}.  For example, the
above two blocks of code could be replaced by the following (which
also takes care of the error checking):

\begin{verbatim}
PyObject *t, *l;

t = Py_BuildValue("(iis)", 1, 2, "three");
l = Py_BuildValue("[iis]", 1, 2, "three");
\end{verbatim}

It is much more common to use \cfunction{PyObject_SetItem()} and
friends with items whose references you are only borrowing, like
arguments that were passed in to the function you are writing.  In
that case, their behavior regarding reference counts is much saner,
since you don't have to increment a reference count so you can give a
reference away (``have it be stolen'').  For example, this function
sets all items of a list (actually, any mutable sequence) to a given
item:

\begin{verbatim}
int
set_all(PyObject *target, PyObject *item)
{
    int i, n;

    n = PyObject_Length(target);
    if (n < 0)
        return -1;
    for (i = 0; i < n; i++) {
        /* PyObject_SetItem() expects the index as an object */
        PyObject *index = PyInt_FromLong(i);
        if (index == NULL)
            return -1;
        if (PyObject_SetItem(target, index, item) < 0) {
            Py_DECREF(index);
            return -1;
        }
        Py_DECREF(index);
    }
    return 0;
}
\end{verbatim}
\ttindex{set_all()}

The situation is slightly different for function return values.
While passing a reference to most functions does not change your
ownership responsibilities for that reference, many functions that
return a reference to an object give you ownership of the reference.
The reason is simple: in many cases, the returned object is created
on the fly, and the reference you get is the only reference to the
object.  Therefore, the generic functions that return object
references, like \cfunction{PyObject_GetItem()} and
\cfunction{PySequence_GetItem()}, always return a new reference (the
caller becomes the owner of the reference).

It is important to realize that whether you own a reference returned
by a function depends only on which function you call: \emph{the
plumage} (the type of the object passed as an
argument to the function) \emph{doesn't enter into it!}  Thus, if you
extract an item from a list using \cfunction{PyList_GetItem()}, you
don't own the reference, but if you obtain the same item from the
same list using \cfunction{PySequence_GetItem()} (which happens to
take exactly the same arguments), you do own a reference to the
returned object.

Here is an example of how you could write a function that computes the
sum of the items in a list of integers; once using
\cfunction{PyList_GetItem()}\ttindex{PyList_GetItem()}, and once using
\cfunction{PySequence_GetItem()}\ttindex{PySequence_GetItem()}.

\begin{verbatim}
long
sum_list(PyObject *list)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PyList_Size(list);
    if (n < 0)
        return -1; /* Not a list */
    for (i = 0; i < n; i++) {
        item = PyList_GetItem(list, i); /* Can't fail */
        if (!PyInt_Check(item)) continue; /* Skip non-integers */
        total += PyInt_AsLong(item);
    }
    return total;
}
\end{verbatim}
\ttindex{sum_list()}

\begin{verbatim}
long
sum_sequence(PyObject *sequence)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PySequence_Length(sequence);
    if (n < 0)
        return -1; /* Has no length */
    for (i = 0; i < n; i++) {
        item = PySequence_GetItem(sequence, i);
        if (item == NULL)
            return -1; /* Not a sequence, or other failure */
        if (PyInt_Check(item))
            total += PyInt_AsLong(item);
        Py_DECREF(item); /* Discard reference ownership */
    }
    return total;
}
\end{verbatim}
\ttindex{sum_sequence()}

\subsection{Types \label{types}}

There are few other data types that play a significant role in
the Python/C API; most are simple C types such as \ctype{int},
\ctype{long}, \ctype{double} and \ctype{char*}.  A few structure types
are used to describe static tables used to list the functions exported
by a module or the data attributes of a new object type, and another
is used to describe the value of a complex number.  These will
be discussed together with the functions that use them.

\section{Exceptions \label{exceptions}}

The Python programmer only needs to deal with exceptions if specific
error handling is required; unhandled exceptions are automatically
propagated to the caller, then to the caller's caller, and so on, until
they reach the top-level interpreter, where they are reported to the
user accompanied by a stack traceback.

For C programmers, however, error checking always has to be explicit.
All functions in the Python/C API can raise exceptions, unless an
explicit claim is made otherwise in a function's documentation.  In
general, when a function encounters an error, it sets an exception,
discards any object references that it owns, and returns an
error indicator, usually \NULL{} or \code{-1}.  A few functions
return a Boolean true/false result, with false indicating an error.
Very few functions return no explicit error indicator or have an
ambiguous return value, and require explicit testing for errors with
\cfunction{PyErr_Occurred()}\ttindex{PyErr_Occurred()}.

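The case of an ambiguous return value can be illustrated with a small
model in plain C.  The error indicator here is a single global string,
which is an assumption made for brevity; \code{toy_err_set()},
\code{toy_err_occurred()}, \code{toy_err_clear()},
\code{toy_parse_long()} and \code{toy_parse_checked()} are hypothetical
names and not part of the Python/C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical error indicator for a single-threaded toy program. */
const char *toy_error = NULL;

void toy_err_set(const char *message) { toy_error = message; }
const char *toy_err_occurred(void) { return toy_error; }
void toy_err_clear(void) { toy_error = NULL; }

/* Returns the parsed value, or -1 with the error indicator set.
   Because -1 is also a valid result, the return value is ambiguous. */
long toy_parse_long(const char *text)
{
    char *end;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0') {
        toy_err_set("invalid integer literal");
        return -1;
    }
    return value;
}

/* Stores the parsed value in *result; returns 0 on success, -1 on error. */
int toy_parse_checked(const char *text, long *result)
{
    long value = toy_parse_long(text);
    if (value == -1 && toy_err_occurred() != NULL)
        return -1;           /* genuine failure, indicator already set */
    *result = value;
    return 0;
}
```

The caller of \code{toy_parse_long()} cannot tell success from failure
by the return value alone, so it consults the error indicator whenever
it sees \code{-1}.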
Exception state is maintained in per-thread storage (this is
equivalent to using global storage in an unthreaded application).  A
thread can be in one of two states: an exception has occurred, or not.
The function \cfunction{PyErr_Occurred()} can be used to check for
this: it returns a borrowed reference to the exception type object
when an exception has occurred, and \NULL{} otherwise.  There are a
number of functions to set the exception state:
\cfunction{PyErr_SetString()}\ttindex{PyErr_SetString()} is the most
common (though not the most general) function to set the exception
state, and \cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} clears the
exception state.

The full exception state consists of three objects (all of which can
be \NULL): the exception type, the corresponding exception
value, and the traceback.  These have the same meanings as the Python
\withsubitem{(in module sys)}{
  \ttindex{exc_type}\ttindex{exc_value}\ttindex{exc_traceback}}
objects \code{sys.exc_type}, \code{sys.exc_value}, and
\code{sys.exc_traceback}; however, they are not the same: the Python
objects represent the last exception being handled by a Python
\keyword{try} \ldots\ \keyword{except} statement, while the C level
exception state only exists while an exception is being passed on
between C functions until it reaches the Python bytecode interpreter's
main loop, which takes care of transferring it to \code{sys.exc_type}
and friends.

Note that starting with Python 1.5, the preferred, thread-safe way to
access the exception state from Python code is to call the function
\withsubitem{(in module sys)}{\ttindex{exc_info()}}
\function{sys.exc_info()}, which returns the per-thread exception state
for Python code.  Also, the semantics of both ways to access the
exception state have changed so that a function which catches an
exception will save and restore its thread's exception state so as to
preserve the exception state of its caller.  This prevents common bugs
in exception handling code caused by an innocent-looking function
overwriting the exception being handled; it also reduces the often
unwanted lifetime extension for objects that are referenced by the
stack frames in the traceback.

As a general principle, a function that calls another function to
perform some task should check whether the called function raised an
exception, and if so, pass the exception state on to its caller.  It
should discard any object references that it owns, and return an
error indicator, but it should \emph{not} set another exception;
that would overwrite the exception that was just raised, and lose
important information about the exact cause of the error.

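The principle can be sketched in plain C with a global error string
standing in for the exception state.  Everything in this block is
hypothetical (\code{toy_error}, \code{toy_buffers_live},
\code{toy_digit_value()}, \code{toy_sum_digits()}); it only shows a
caller releasing what it owns and returning an error indicator without
replacing the error set by the callee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

const char *toy_error = NULL;      /* hypothetical error indicator */
int toy_buffers_live = 0;          /* owned resources currently allocated */

/* Lowest level: detects the problem and sets the error indicator. */
int toy_digit_value(char c)
{
    if (c < '0' || c > '9') {
        toy_error = "not a digit";
        return -1;
    }
    return c - '0';
}

/* Middle level: owns a scratch buffer while it works. */
int toy_sum_digits(const char *text)
{
    int total = 0;
    int *scratch = malloc(sizeof *scratch);
    if (scratch == NULL) {
        toy_error = "out of memory";
        return -1;
    }
    toy_buffers_live++;
    *scratch = 0;
    for (; *text != '\0'; text++) {
        int d = toy_digit_value(*text);
        if (d < 0) {
            total = -1;      /* pass the callee's error on unchanged */
            goto done;
        }
        *scratch += d;
    }
    total = *scratch;
done:
    free(scratch);           /* discard what we own on every path */
    toy_buffers_live--;
    return total;
}
```

Because \code{toy_sum_digits()} leaves \code{toy_error} alone on the
failure path, its own caller still sees the original cause.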
A simple example of detecting exceptions and passing them on is shown
in the \cfunction{sum_sequence()}\ttindex{sum_sequence()} example
above.  It so happens that this example doesn't need to clean up any
owned references when it detects an error.  The following example
function shows some error cleanup.  First, to remind you why you like
Python, we show the equivalent Python code:

\begin{verbatim}
def incr_item(dict, key):
    try:
        item = dict[key]
    except KeyError:
        item = 0
    dict[key] = item + 1
\end{verbatim}
\ttindex{incr_item()}

Here is the corresponding C code, in all its glory:

\begin{verbatim}
int
incr_item(PyObject *dict, PyObject *key)
{
    /* Objects all initialized to NULL for Py_XDECREF */
    PyObject *item = NULL, *const_one = NULL, *incremented_item = NULL;
    int rv = -1; /* Return value initialized to -1 (failure) */

    item = PyObject_GetItem(dict, key);
    if (item == NULL) {
        /* Handle KeyError only: */
        if (!PyErr_ExceptionMatches(PyExc_KeyError))
            goto error;

        /* Clear the error and use zero: */
        PyErr_Clear();
        item = PyInt_FromLong(0L);
        if (item == NULL)
            goto error;
    }
    const_one = PyInt_FromLong(1L);
    if (const_one == NULL)
        goto error;

    incremented_item = PyNumber_Add(item, const_one);
    if (incremented_item == NULL)
        goto error;

    if (PyObject_SetItem(dict, key, incremented_item) < 0)
        goto error;
    rv = 0; /* Success */
    /* Continue with cleanup code */

 error:
    /* Cleanup code, shared by success and failure path */

    /* Use Py_XDECREF() to ignore NULL references */
    Py_XDECREF(item);
    Py_XDECREF(const_one);
    Py_XDECREF(incremented_item);

    return rv; /* -1 for error, 0 for success */
}
\end{verbatim}
\ttindex{incr_item()}

This example represents an endorsed use of the \keyword{goto} statement
in C!  It illustrates the use of
\cfunction{PyErr_ExceptionMatches()}\ttindex{PyErr_ExceptionMatches()} and
\cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} to
handle specific exceptions, and the use of
\cfunction{Py_XDECREF()}\ttindex{Py_XDECREF()} to
dispose of owned references that may be \NULL{} (note the
\character{X} in the name; \cfunction{Py_DECREF()} would crash when
confronted with a \NULL{} reference).  It is important that the
variables used to hold owned references are initialized to \NULL{} for
this to work; likewise, the proposed return value is initialized to
\code{-1} (failure) and only set to success after the final call made
is successful.

\section{Embedding Python \label{embedding}}

The one important task that only embedders (as opposed to extension
writers) of the Python interpreter have to worry about is the
initialization, and possibly the finalization, of the Python
interpreter.  Most functionality of the interpreter can only be used
after the interpreter has been initialized.

The basic initialization function is
\cfunction{Py_Initialize()}\ttindex{Py_Initialize()}.
This initializes the table of loaded modules, and creates the
fundamental modules \module{__builtin__}\refbimodindex{__builtin__},
\module{__main__}\refbimodindex{__main__}, \module{sys}\refbimodindex{sys},
and \module{exceptions}.\refbimodindex{exceptions}  It also initializes
the module search path (\code{sys.path}).%
\indexiii{module}{search}{path}
\withsubitem{(in module sys)}{\ttindex{path}}

\cfunction{Py_Initialize()} does not set the ``script argument list''
(\code{sys.argv}).  If this variable is needed by Python code that
will be executed later, it must be set explicitly with a call to
\code{PySys_SetArgv(\var{argc},
\var{argv})}\ttindex{PySys_SetArgv()} subsequent to the call to
\cfunction{Py_Initialize()}.

On most systems (in particular, on \UNIX{} and Windows, although the
details are slightly different),
\cfunction{Py_Initialize()} calculates the module search path based
upon its best guess for the location of the standard Python
interpreter executable, assuming that the Python library is found in a
fixed location relative to the Python interpreter executable.  In
particular, it looks for a directory named
\file{lib/python\shortversion} relative to the parent directory where
the executable named \file{python} is found on the shell command
search path (the environment variable \envvar{PATH}).

For instance, if the Python executable is found in
\file{/usr/local/bin/python}, it will assume that the libraries are in
\file{/usr/local/lib/python\shortversion}.  (In fact, this particular path
is also the ``fallback'' location, used when no executable file named
\file{python} is found along \envvar{PATH}.)  The user can override
this behavior by setting the environment variable \envvar{PYTHONHOME},
or insert additional directories in front of the standard path by
setting \envvar{PYTHONPATH}.

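The default rule described above (take the directory that contains the
executable, go up one level, and append
\file{lib/python\var{version}}) can be sketched as a small string
helper in plain C.  This is a simplified illustration and not the
logic in \file{Modules/getpath.c}: the helper name
\code{toy_guess_lib_dir()} is hypothetical, the version string is
passed in by the caller, and neither \envvar{PATH} lookup nor
\envvar{PYTHONHOME} is modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive "<parent of bin dir>/lib/python<version>"
   from the full path of the executable.  Returns 0 on success and -1
   if the path has too few components or the buffer is too small. */
int toy_guess_lib_dir(const char *exe_path, const char *version,
                      char *out, size_t out_size)
{
    const char *last, *prev;
    int written;

    last = strrchr(exe_path, '/');           /* slash before "python" */
    if (last == NULL || last == exe_path)
        return -1;
    prev = last - 1;
    while (prev > exe_path && *prev != '/')  /* slash before "bin" */
        prev--;
    if (*prev != '/')
        return -1;
    written = snprintf(out, out_size, "%.*s/lib/python%s",
                       (int)(prev - exe_path), exe_path, version);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    return 0;
}
```

Called with \code{"/usr/local/bin/python"} and an assumed version
string such as \code{"2.3"}, the helper fills the buffer with
\code{/usr/local/lib/python2.3}.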
The embedding application can steer the search by calling
\code{Py_SetProgramName(\var{file})}\ttindex{Py_SetProgramName()}
\emph{before} calling \cfunction{Py_Initialize()}.  Note that
\envvar{PYTHONHOME} still overrides this and \envvar{PYTHONPATH} is
still inserted in front of the standard path.  An application that
requires total control has to provide its own implementation of
\cfunction{Py_GetPath()}\ttindex{Py_GetPath()},
\cfunction{Py_GetPrefix()}\ttindex{Py_GetPrefix()},
\cfunction{Py_GetExecPrefix()}\ttindex{Py_GetExecPrefix()}, and
\cfunction{Py_GetProgramFullPath()}\ttindex{Py_GetProgramFullPath()} (all
defined in \file{Modules/getpath.c}).

Sometimes, it is desirable to ``uninitialize'' Python.  For instance,
the application may want to start over (make another call to
\cfunction{Py_Initialize()}) or the application is simply done with its
use of Python and wants to free all memory allocated by Python.  This
can be accomplished by calling \cfunction{Py_Finalize()}.  The function
\cfunction{Py_IsInitialized()}\ttindex{Py_IsInitialized()} returns
true if Python is currently in the initialized state.  More
information about these functions is given in a later chapter.