% libparser.tex

% Introductory documentation for the new parser built-in module.

% Copyright 1995 Virginia Polytechnic Institute and State University
% and Fred L. Drake, Jr. This copyright notice must be distributed on
% all copies, but this document otherwise may be distributed as part
% of the Python distribution. No fee may be charged for this document
% in any representation, either on paper or electronically. This
% restriction does not affect other elements in a distributed package
% in any way.

\section{Built-in Module \sectcode{parser}}
\bimodindex{parser}

% ==== 2. ====
% Give a short overview of what the module does.
% If it is platform specific, mention this.
% Mention other important restrictions or general operating principles.

The \code{parser} module provides an interface to Python's internal
parser and byte-code compiler. The primary purpose of this interface
is to allow Python code to edit the parse tree of a Python expression
and create executable code from it. This can be better than trying
to parse and modify an arbitrary Python code fragment as a string,
because it guarantees that the fragment is parsed in exactly the same
way as the code forming the application. It is also faster.

A few points about this module are important for making use of the
data structures it creates. This section is not a tutorial on editing
the parse trees for Python code.

Most importantly, a good understanding of the Python grammar processed
by the internal parser is required. For full information on the
language syntax, refer to the Language Reference. The parser itself
is created from a grammar specification defined in the file
\code{Grammar/Grammar} in the standard Python distribution. The parse
trees stored in the ``AST objects'' created by the \code{expr()} and
\code{suite()} functions, described below, are the actual output of
the internal parser. The AST objects created by \code{tuple2ast()}
faithfully simulate those structures.

Each element of the tuples returned by \code{ast2tuple()} has a simple
form. Tuples representing non-terminal elements in the grammar always
have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given
symbolic names in the C header file \code{Include/graminit.h} and the
Python module \code{Lib/symbol.py}. Each additional element of the
tuple represents a component of the production as recognized in the
input string: these are always tuples which have the same form as the
parent. Note that keywords used to identify the parent node type,
such as the keyword \code{if} in an \emph{if\_stmt}, are included in
the node tree without any special treatment. For example, the
\code{if} keyword is represented by the tuple \code{(1, 'if')}, where
\code{1} is the numeric value associated with all \code{NAME}
elements, including variable and function names defined by the user.

Terminal elements are represented in much the same way, but they have
no child elements and instead carry the source text which was
identified. The example of the \code{if} keyword above is
representative. The various types of terminal symbols are defined in
the C header file \code{Include/token.h} and the Python module
\code{Lib/token.py}.
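
The tuple layout described above can be explored without calling the
parser at all. The following is a minimal sketch which walks a
hand-written tuple and collects its terminal elements. The tree is
not real parser output, and the production numbers \code{IF\_STMT} and
\code{TEST} are hypothetical values chosen for illustration; real
values come from \code{Lib/symbol.py} and depend on the Python
version. The helper names \code{is\_terminal()} and \code{terminals()}
are likewise not part of the \code{parser} module.

```python
import token  # standard module naming the terminal symbol numbers

# Hypothetical production numbers, chosen only for this sketch.
# Real values are defined in Lib/symbol.py and vary between versions.
IF_STMT = 292
TEST = 304

# Hand-written tree in the documented shape; this is NOT real parser output.
tree = (IF_STMT,
        (token.NAME, 'if'),
        (TEST, (token.NAME, 'x')),
        (token.COLON, ':'))


def is_terminal(node):
    """A terminal is a two-element tuple whose second element is source text."""
    return len(node) == 2 and isinstance(node[1], str)


def terminals(node):
    """Return the (token number, text) pairs of a tree in source order."""
    if is_terminal(node):
        return [node]
    leaves = []
    for child in node[1:]:  # node[0] is the production number
        leaves.extend(terminals(child))
    return leaves


leaf_text = [text for number, text in terminals(tree)]
```

Here \code{leaf\_text} holds the source text of each terminal in order,
and the keyword \code{if} appears as an ordinary \code{NAME} element,
as noted above.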

The AST objects are not strictly required to support the functionality
of this module, but are provided for three purposes: to allow an
application to amortize the cost of processing complex parse trees, to
provide a parse tree representation which conserves memory space when
compared to the Python tuple representation, and to ease the creation
of additional modules in C which manipulate parse trees. A simple
``wrapper'' module may be created in Python if desired to hide the use
of AST objects.


% ==== 3. ====
% List the public functions defined by the module. Begin with a
% standard phrase. You may also list the exceptions and other data
% items defined in the module, insofar as they are important for the
% user.

The \code{parser} module defines the following functions:

% ---- 3.1. ----
% Redefine the ``indexsubitem'' macro to point to this module
% (alternatively, you can put this at the top of the file):

\renewcommand{\indexsubitem}{(in module parser)}

% ---- 3.2. ----
% For each function, use a ``funcdesc'' block. This has exactly two
% parameters (each parameter is contained in a set of curly braces):
% the first parameter is the function name (this automatically
% generates an index entry); the second parameter is the function's
% argument list. If there are no arguments, use an empty pair of
% curly braces. If there is more than one argument, separate the
% arguments with backslash-comma. Optional parts of the parameter
% list are contained in \optional{...} (this generates a set of square
% brackets around its parameter). Arguments are automatically set in
% italics in the parameter list. Each argument should be mentioned at
% least once in the description; each usage (even inside \code{...})
% should be enclosed in \var{...}.

\begin{funcdesc}{ast2tuple}{ast}
This function accepts an AST object from the caller in
\code{\var{ast}} and returns a Python tuple representing the
equivalent parse tree. The resulting tuple representation can be used
for inspection or the creation of a new parse tree in tuple form.
This function does not fail so long as memory is available to build
the tuple representation.
\end{funcdesc}
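
Because tuples are immutable, creating a new parse tree in tuple form
means building a fresh tuple rather than modifying the old one. The
following sketch shows one way this might be done for a simple edit,
renaming a \code{NAME} terminal. The function \code{rename()} is a
hypothetical helper, the production number \code{ARITH\_EXPR} is a
made-up value, and the tree is hand-written rather than taken from
\code{ast2tuple()}.

```python
import token  # standard module naming the terminal symbol numbers


def rename(node, old, new):
    """Return a copy of a tuple parse tree with NAME terminals spelled
    `old` replaced by `new`.  The input tree is left untouched."""
    if len(node) == 2 and isinstance(node[1], str):
        # Terminal element: (token number, source text).
        if node[0] == token.NAME and node[1] == old:
            return (node[0], new)
        return node
    # Non-terminal: keep the production number, rebuild each child.
    return (node[0],) + tuple(rename(child, old, new) for child in node[1:])


# Hypothetical production number; real values come from Lib/symbol.py.
ARITH_EXPR = 313

# Hand-written tree in the documented shape; this is NOT real parser output.
tree = (ARITH_EXPR,
        (token.NAME, 'a'),
        (token.PLUS, '+'),
        (token.NUMBER, '5'))

edited = rename(tree, 'a', 'b')
```

A tree edited this way would still have to pass the validation
performed by \code{tuple2ast()} before it could be compiled, which
requires the production numbers of the running version of Python.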

\begin{funcdesc}{compileast}{ast\optional{\, filename \code{= '<ast>'}}}
The Python byte compiler can be invoked on an AST object to produce
code objects which can be used as part of an \code{exec} statement or
a call to the built-in \code{eval()} function. This function provides
the interface to the compiler, passing it the internal parse tree from
\code{\var{ast}} together with the source file name specified by the
\code{\var{filename}} parameter. The default value supplied for
\code{\var{filename}} indicates that the source was an AST object.
\end{funcdesc}

\begin{funcdesc}{expr}{string}
The \code{expr()} function parses the parameter \code{\var{string}}
as if it were an input to \code{compile(\var{string}, 'eval')}. If
the parse succeeds, an AST object is created to hold the internal
parse tree representation; otherwise an appropriate exception is
raised.
\end{funcdesc}

\begin{funcdesc}{isexpr}{ast}
When \code{\var{ast}} represents an \code{'eval'} form, this function
returns a true value (\code{1}); otherwise it returns false
(\code{0}). This is useful, since code objects normally cannot be
queried for this information using existing built-in functions. Note
that the code objects created by \code{compileast()} cannot be queried
like this either, and are identical to those created by the built-in
\code{compile()} function.
\end{funcdesc}

\begin{funcdesc}{issuite}{ast}
This function mirrors \code{isexpr()} in that it reports whether an
AST object represents a suite of statements. It is not safe to assume
that this function is equivalent to \code{not isexpr(\var{ast})}, as
additional syntactic fragments may be supported in the future.
\end{funcdesc}

\begin{funcdesc}{suite}{string}
The \code{suite()} function parses the parameter \code{\var{string}}
as if it were an input to \code{compile(\var{string}, 'exec')}. If
the parse succeeds, an AST object is created to hold the internal
parse tree representation; otherwise an appropriate exception is
raised.
\end{funcdesc}

\begin{funcdesc}{tuple2ast}{tuple}
This function accepts a parse tree represented as a tuple and builds
an internal representation if possible. If it can validate that the
tree conforms to the Python syntax and all nodes are valid node types
in the host version of Python, an AST object is created from the
internal representation and returned to the caller. If there is a
problem creating the internal representation, or if the tree cannot be
validated, a \code{ParserError} exception is raised. An AST object
created this way should not be assumed to compile correctly; normal
exceptions raised by compilation may still occur when the AST
object is passed to \code{compileast()}. This will normally indicate
problems not related to syntax (such as a \code{MemoryError}
exception).
\end{funcdesc}


% --- 3.4. ---
% Exceptions are described using a ``excdesc'' block. This has only
% one parameter: the exception name.

\subsection{Exceptions and Error Handling}

The \code{parser} module defines a single exception, but may also pass
on built-in exceptions from other portions of the Python runtime
environment. See each function for information about the exceptions
it can raise.

\begin{excdesc}{ParserError}
Exception raised when a failure occurs within the parser module. This
is generally produced for validation failures rather than the built-in
\code{SyntaxError} raised during normal parsing.
The exception argument is either a string describing the reason for
the failure or a tuple containing the tuple which caused the failure,
taken from a parse tree passed to \code{tuple2ast()}, and an
explanatory string. Calls to \code{tuple2ast()} need to be able to
handle either form of argument, while calls to other functions in the
module will only need to be aware of the simple string values.
\end{excdesc}
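
A handler for \code{tuple2ast()} therefore has to distinguish the two
argument forms. The following sketch shows one way this might be
written. The class \code{ParserError} defined here is only a stand-in
so that the sketch is self-contained, \code{describe()} is a
hypothetical helper, and the ordering of the pair (offending tuple
first, explanatory string second) is assumed from the description
above.

```python
class ParserError(Exception):
    """Stand-in for the module's exception, defined here only so that
    this sketch is self-contained."""


def describe(error):
    """Return (message, offending_subtree) for either argument form.

    offending_subtree is None when the argument is a plain string.
    Assumes the pair form is ordered (subtree, message).
    """
    # The pair may arrive as one tuple argument or as two arguments.
    argument = error.args[0] if len(error.args) == 1 else error.args
    if isinstance(argument, str):
        return argument, None
    subtree, message = argument
    return message, subtree


# Plain string form, as raised by the other functions in the module.
simple = describe(ParserError('parse tree does not use a valid start symbol'))

# Pair form, as may be raised by tuple2ast(); the contents are made up.
detailed = describe(ParserError(((1, 'if'), 'illegal node construct')))
```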

Note that the functions \code{compileast()}, \code{expr()}, and
\code{suite()} may raise exceptions which are normally raised by the
parsing and compilation process. These include the built-in
exceptions \code{MemoryError}, \code{OverflowError},
\code{SyntaxError}, and \code{SystemError}. In these cases, the
exceptions carry all the meaning normally associated with them. Refer
to the descriptions of each function for detailed information.

% ---- 3.5. ----
% There is no standard block type for classes. I generally use
% ``funcdesc'' blocks, since class instantiation looks very much like
% a function call.


% ==== 4. ====
% Now is probably a good time for a complete example. (Alternatively,
% an example giving the flavor of the module may be given before the
% detailed list of functions.)

\subsection{Example}

A simple example:

\begin{verbatim}
>>> import parser
>>> ast = parser.expr('a + 5')
>>> code = parser.compileast(ast)
>>> a = 5
>>> eval(code)
10
\end{verbatim}


\subsection{AST Objects}

AST objects (returned by \code{expr()}, \code{suite()}, and
\code{tuple2ast()}, described above) have no methods of their own.
Some of the module's functions which accept an AST object as their
first argument may become object methods in the future.

Ordered and equality comparisons are supported between AST objects.

\renewcommand{\indexsubitem}{(ast method)}

%\begin{funcdesc}{empty}{}
%Empty the can into the trash.
%\end{funcdesc}