1 \chapter{Python compiler package
\label{compiler
}}
3 \sectionauthor{Jeremy Hylton
}{jeremy@zope.com
}
6 The Python compiler package is a tool for analyzing Python source code
7 and generating Python bytecode. The compiler contains libraries to
8 generate an abstract syntax tree from Python source code and to
9 generate Python bytecode from the tree.
11 The
\refmodule{compiler
} package is a Python source to bytecode
12 translator written in Python. It uses the built-in parser and
13 standard
\refmodule{parser
} module to generated a concrete syntax
14 tree. This tree is used to generate an abstract syntax tree (AST) and
17 The full functionality of the package duplicates the builtin compiler
18 provided with the Python interpreter. It is intended to match its
19 behavior almost exactly. Why implement another compiler that does the
20 same thing? The package is useful for a variety of purposes. It can
21 be modified more easily than the builtin compiler. The AST it
22 generates is useful for analyzing Python source code.
24 This chapter explains how the various components of the
25 \refmodule{compiler
} package work. It blends reference material with
28 The following modules are part of the
\refmodule{compiler
} package:
33 \section{The basic interface
}
35 \declaremodule{}{compiler
}
37 The top-level of the package defines four functions. If you import
38 \module{compiler
}, you will get these functions and a collection of
39 modules contained in the package.
41 \begin{funcdesc
}{parse
}{buf
}
42 Returns an abstract syntax tree for the Python source code in
\var{buf
}.
43 The function raises SyntaxError if there is an error in the source
44 code. The return value is a
\class{compiler.ast.Module
} instance that
48 \begin{funcdesc
}{parseFile
}{path
}
49 Return an abstract syntax tree for the Python source code in the file
50 specified by
\var{path
}. It is equivalent to
51 \code{parse(open(
\var{path
}).read())
}.
54 \begin{funcdesc
}{walk
}{ast, visitor
\optional{, verbose
}}
55 Do a pre-order walk over the abstract syntax tree
\var{ast
}. Call the
56 appropriate method on the
\var{visitor
} instance for each node
60 \begin{funcdesc
}{compile
}{source, filename, mode, flags=None,
62 Compile the string
\var{source
}, a Python module, statement or
63 expression, into a code object that can be executed by the exec
64 statement or
\function{eval()
}. This function is a replacement for the
65 built-in
\function{compile()
} function.
67 The
\var{filename
} will be used for run-time error messages.
69 The
\var{mode
} must be 'exec' to compile a module, 'single' to compile a
70 single (interactive) statement, or 'eval' to compile an expression.
72 The
\var{flags
} and
\var{dont_inherit
} arguments affect future-related
73 statements, but are not supported yet.
76 \begin{funcdesc
}{compileFile
}{source
}
77 Compiles the file
\var{source
} and generates a .pyc file.
80 The
\module{compiler
} package contains the following modules:
81 \refmodule[compiler.ast
]{ast
},
\module{consts
},
\module{future
},
82 \module{misc
},
\module{pyassem
},
\module{pycodegen
},
\module{symbols
},
83 \module{transformer
}, and
\refmodule[compiler.visitor
]{visitor
}.
87 There are some problems with the error checking of the compiler
88 package. The interpreter detects syntax errors in two distinct
89 phases. One set of errors is detected by the interpreter's parser,
90 the other set by the compiler. The compiler package relies on the
91 interpreter's parser, so it get the first phases of error checking for
92 free. It implements the second phase itself, and that implement is
93 incomplete. For example, the compiler package does not raise an error
94 if a name appears more than once in an argument list:
95 \code{def f(x, x): ...
}
97 A future version of the compiler should fix these problems.
99 \section{Python Abstract Syntax
}
101 The
\module{compiler.ast
} module defines an abstract syntax for
102 Python. In the abstract syntax tree, each node represents a syntactic
103 construct. The root of the tree is
\class{Module
} object.
105 The abstract syntax offers a higher level interface to parsed Python
106 source code. The
\ulink{\module{parser
}}
107 {http://www.python.org/doc/current/lib/module-parser.html
}
108 module and the compiler written in C for the Python interpreter use a
109 concrete syntax tree. The concrete syntax is tied closely to the
110 grammar description used for the Python parser. Instead of a single
111 node for a construct, there are often several levels of nested nodes
112 that are introduced by Python's precedence rules.
114 The abstract syntax tree is created by the
115 \module{compiler.transformer
} module. The transformer relies on the
116 builtin Python parser to generate a concrete syntax tree. It
117 generates an abstract syntax tree from the concrete tree.
119 The
\module{transformer
} module was created by Greg
120 Stein
\index{Stein, Greg
} and Bill Tutt
\index{Tutt, Bill
} for an
121 experimental Python-to-C compiler. The current version contains a
122 number of modifications and improvements, but the basic form of the
123 abstract syntax and of the transformer are due to Stein and Tutt.
125 \subsection{AST Nodes
}
127 \declaremodule{}{compiler.ast
}
129 The
\module{compiler.ast
} module is generated from a text file that
130 describes each node type and its elements. Each node type is
131 represented as a class that inherits from the abstract base class
132 \class{compiler.ast.Node
} and defines a set of named attributes for
135 \begin{classdesc
}{Node
}{}
137 The
\class{Node
} instances are created automatically by the parser
138 generator. The recommended interface for specific
\class{Node
}
139 instances is to use the public attributes to access child nodes. A
140 public attribute may be bound to a single node or to a sequence of
141 nodes, depending on the
\class{Node
} type. For example, the
142 \member{bases
} attribute of the
\class{Class
} node, is bound to a
143 list of base class nodes, and the
\member{doc
} attribute is bound to
146 Each
\class{Node
} instance has a
\member{lineno
} attribute which may
147 be
\code{None
}. XXX Not sure what the rules are for which nodes
148 will have a useful lineno.
151 All
\class{Node
} objects offer the following methods:
153 \begin{methoddesc
}{getChildren
}{}
154 Returns a flattened list of the child nodes and objects in the
155 order they occur. Specifically, the order of the nodes is the
156 order in which they appear in the Python grammar. Not all of the
157 children are
\class{Node
} instances. The names of functions and
158 classes, for example, are plain strings.
161 \begin{methoddesc
}{getChildNodes
}{}
162 Returns a flattened list of the child nodes in the order they
163 occur. This method is like
\method{getChildren()
}, except that it
164 only returns those children that are
\class{Node
} instances.
167 Two examples illustrate the general structure of
\class{Node
}
168 classes. The
\keyword{while
} statement is defined by the following
172 while_stmt: "while" expression ":" suite
176 The
\class{While
} node has three attributes:
\member{test
},
177 \member{body
}, and
\member{else_
}. (If the natural name for an
178 attribute is also a Python reserved word, it can't be used as an
179 attribute name. An underscore is appended to the word to make it a
180 legal identifier, hence
\member{else_
} instead of
\keyword{else
}.)
182 The
\keyword{if
} statement is more complicated because it can include
186 if_stmt: 'if' test ':' suite ('elif' test ':' suite)*
['else' ':' suite
]
189 The
\class{If
} node only defines two attributes:
\member{tests
} and
190 \member{else_
}. The
\member{tests
} attribute is a sequence of test
191 expression, consequent body pairs. There is one pair for each
192 \keyword{if
}/
\keyword{elif
} clause. The first element of the pair is
193 the test expression. The second elements is a
\class{Stmt
} node that
194 contains the code to execute if the test is true.
196 The
\method{getChildren()
} method of
\class{If
} returns a flat list of
197 child nodes. If there are three
\keyword{if
}/
\keyword{elif
} clauses
198 and no
\keyword{else
} clause, then
\method{getChildren()
} will return
199 a list of six elements: the first test expression, the first
200 \class{Stmt
}, the second text expression, etc.
202 The following table lists each of the
\class{Node
} subclasses defined
203 in
\module{compiler.ast
} and each of the public attributes available
204 on their instances. The values of most of the attributes are
205 themselves
\class{Node
} instances or sequences of instances. When the
206 value is something other than an instance, the type is noted in the
207 comment. The attributes are listed in the order in which they are
208 returned by
\method{getChildren()
} and
\method{getChildNodes()
}.
213 \subsection{Assignment nodes
}
215 There is a collection of nodes used to represent assignments. Each
216 assignment statement in the source code becomes a single
217 \class{Assign
} node in the AST. The
\member{nodes
} attribute is a
218 list that contains a node for each assignment target. This is
219 necessary because assignment can be chained, e.g.
\code{a = b =
2}.
220 Each
\class{Node
} in the list will be one of the following classes:
221 \class{AssAttr
},
\class{AssList
},
\class{AssName
}, or
224 Each target assignment node will describe the kind of object being
225 assigned to:
\class{AssName
} for a simple name, e.g.
\code{a =
1}.
226 \class{AssAttr
} for an attribute assigned, e.g.
\code{a.x =
1}.
227 \class{AssList
} and
\class{AssTuple
} for list and tuple expansion
228 respectively, e.g.
\code{a, b, c = a_tuple
}.
230 The target assignment nodes also have a
\member{flags
} attribute that
231 indicates whether the node is being used for assignment or in a delete
232 statement. The
\class{AssName
} is also used to represent a delete
233 statement, e.g.
\class{del x
}.
235 When an expression contains several attribute references, an
236 assignment or delete statement will contain only one
\class{AssAttr
}
237 node -- for the final attribute reference. The other attribute
238 references will be represented as
\class{Getattr
} nodes in the
239 \member{expr
} attribute of the
\class{AssAttr
} instance.
241 \subsection{Examples
}
243 This section shows several simple examples of ASTs for Python source
244 code. The examples demonstrate how to use the
\function{parse()
}
245 function, what the repr of an AST looks like, and how to access
246 attributes of an AST node.
248 The first module defines a single function. Assume it is stored in
249 \file{/tmp/doublelib.py
}.
252 """This is an example module.
254 This is the docstring.
258 "Return twice the argument"
262 In the interactive interpreter session below, I have reformatted the
263 long AST reprs for readability. The AST reprs use unqualified class
264 names. If you want to create an instance from a repr, you must import
265 the class names from the
\module{compiler.ast
} module.
269 >>> mod = compiler.parseFile("/tmp/doublelib.py")
271 Module('This is an example module.
\n\nThis is the docstring.
\n',
272 Stmt(
[Function('double',
['x'
],
[],
0, 'Return twice the argument',
273 Stmt(
[Return(Mul((Name('x'), Const(
2))))
]))
]))
274 >>> from compiler.ast import *
275 >>> Module('This is an example module.
\n\nThis is the docstring.
\n',
276 ... Stmt(
[Function('double',
['x'
],
[],
0, 'Return twice the argument',
277 ... Stmt(
[Return(Mul((Name('x'), Const(
2))))
]))
]))
278 Module('This is an example module.
\n\nThis is the docstring.
\n',
279 Stmt(
[Function('double',
['x'
],
[],
0, 'Return twice the argument',
280 Stmt(
[Return(Mul((Name('x'), Const(
2))))
]))
]))
282 'This is an example module.
\n\nThis is the docstring.
\n'
283 >>> for node in mod.node.nodes:
286 Function('double',
['x'
],
[],
0, 'Return twice the argument',
287 Stmt(
[Return(Mul((Name('x'), Const(
2))))
]))
288 >>> func = mod.node.nodes
[0]
290 Stmt(
[Return(Mul((Name('x'), Const(
2))))
])
293 \section{Using Visitors to Walk ASTs
}
295 \declaremodule{}{compiler.visitor
}
297 The visitor pattern is ... The
\refmodule{compiler
} package uses a
298 variant on the visitor pattern that takes advantage of Python's
299 introspection features to elminiate the need for much of the visitor's
302 The classes being visited do not need to be programmed to accept
303 visitors. The visitor need only define visit methods for classes it
304 is specifically interested in; a default visit method can handle the
307 XXX The magic
\method{visit()
} method for visitors.
309 \begin{funcdesc
}{walk
}{tree, visitor
\optional{, verbose
}}
312 \begin{classdesc
}{ASTVisitor
}{}
314 The
\class{ASTVisitor
} is responsible for walking over the tree in the
315 correct order. A walk begins with a call to
\method{preorder()
}. For
316 each node, it checks the
\var{visitor
} argument to
\method{preorder()
}
317 for a method named `visitNodeType,' where NodeType is the name of the
318 node's class, e.g. for a
\class{While
} node a
\method{visitWhile()
}
319 would be called. If the method exists, it is called with the node as
322 The visitor method for a particular node type can control how child
323 nodes are visited during the walk. The
\class{ASTVisitor
} modifies
324 the visitor argument by adding a visit method to the visitor; this
325 method can be used to visit a particular child node. If no visitor is
326 found for a particular node type, the
\method{default()
} method is
330 \class{ASTVisitor
} objects have the following methods:
332 XXX describe extra arguments
334 \begin{methoddesc
}{default
}{node
\optional{,
\moreargs}}
337 \begin{methoddesc
}{dispatch
}{node
\optional{,
\moreargs}}
340 \begin{methoddesc
}{preorder
}{tree, visitor
}
344 \section{Bytecode Generation
}
346 The code generator is a visitor that emits bytecodes. Each visit method
347 can call the
\method{emit()
} method to emit a new bytecode. The basic
348 code generator is specialized for modules, classes, and functions. An
349 assembler converts that emitted instructions to the low-level bytecode
350 format. It handles things like generator of constant lists of code
351 objects and calculation of jump offsets.