1 # Copyright (C) 2007-2010, Parrot Foundation
4 =head1 PDD 26: Compiler Tools - Abstract Syntax Tree
8 This PDD describes the node types and semantics of the Parrot
9 Abstract Syntax Tree representation.
17 The Parrot Abstract Syntax Tree (PAST) is a set of Parrot
18 classes useful for generating an abstract semantic representation
19 of a program written in a high level language such as Perl or
20 Python. Once a program has been translated into its equivalent
21 PAST representation, the Parrot Compiler Toolkit can be used to
22 generate executable PIR or bytecode from the abstract syntax
25 PAST is designed to provide the common operations and semantics
26 needed by many of the high level languages targeted by Parrot,
27 while also being extensible to support the unique needs of
34 C<PAST::Node> is the base class for all nodes in an abstract
35 syntax tree. Each C<PAST::Node> element has an array of
36 children nodes (which may be empty), as well as a number of
37 attributes and accessor methods for the node.
39 Every PAST node has C<name>, C<source>, and C<pos> attributes.
40 The C<name> attribute is the node's name, if any, while
41 C<source> and C<pos> are used to identify the location of the
42 node in the original source code.
46 =item init([child1, child2, ..., ] [attr1=>val1, attr2=>val2, ...])
48 Initialize a PAST node with the given children and attributes.
49 Adds each child to the node (using the C<push> method, below)
50 and calls the appropriate accessor method for each attribute.
52 =item new([child1, child2, ..., ] [attr1=>val1, attr2=>val2, ...])
54 Create a new PAST node with the same type as the invocant
55 and initialized with the given children and attributes.
56 Returns the newly created node.
60 Add C<child> to the end of the node's array of children.
64 Remove the last child from the node's array of children.
69 Add C<child> to the beginning of the node's array of children.
73 Remove the first child from the node's array of children.
78 Return a newly initialized C<Iterator> for the node's list
83 Set the invocant's C<source> and C<pos> attributes to be the
84 same as C<val>. If C<val> is another PAST node, then C<source>
85 and C<pos> are simply copied from that node. Otherwise, C<val>
86 is assumed to be a C<Match> object (e.g., from PGE) and the
87 source and position information are obtained from that.
91 Accessor method -- sets/returns the C<name> attribute of the node.
95 Accessor method -- for nodes that are being used as named arguments,
96 sets/returns the name to be associated with the argument.
100 Accessor method -- sets/returns the "flatten" flag on nodes being
103 =item attr(STR attrname, PMC value, INT has_value)
105 Helper method for writing accessors -- if C<has_value> is true
106 then set the node's value of C<attrname> to C<value>. Returns
107 the (resulting) value of C<attrname> for the invocant.
113 C<PAST::Stmts> is simply a C<PAST::Node> used to contain a
114 sequence of other PAST nodes to be evaluated. It has no
115 specific methods or attributes.
119 C<PAST::Val> nodes represent constant values in the abstract
120 syntax tree. The C<value> attribute represents the value of
127 Accessor method to get/set the constant value for this node.
129 =item returns([typename])
131 Get/set the type of value to be generated by this node.
132 If not specified, the type is taken from the type of
133 the value attribute given above.
139 C<PAST::Var> nodes represent variable-like items within the
140 abstract syntax tree. The C<name> attribute (inherited from
141 C<PAST::Node>) identifies the name of the variable as given
142 by the high level language program. The other attributes for
143 C<PAST::Var> nodes provide the details of how the variable
144 is defined and accessed.
150 A variable node's "scope" indicates how the variable is to be
151 defined or accessed within the program. The available scopes
158 Lexical variables are scoped to the C<PAST::Block> in which
159 they are declared, including any nested blocks. If the
160 node's C<isdecl> attribute is true, then this node defines
161 a new lexical variable within the current block. Otherwise,
162 the node refers to a lexical variable already declared in the current
167 Package variables represent global or namespace-managed
168 variables in the program. The node's C<namespace> attribute
169 may be used to explicitly identify the namespace of the variable,
170 otherwise it is assumed to be within the namespace of the
171 C<PAST::Block> containing the node.
175 Parameter variables are the formal arguments to subroutine and
176 methods, typically represented as C<PAST::Block> nodes. The
177 parameter's lexical name is given by the node's C<name> attribute.
178 Positional parameters are defined by the order in which they
179 appear in the C<PAST::Block> node, named parameters have values
180 for the C<named> attribute. Optional parameters are identified
181 via the C<viviself> attribute (see below) indicating how the parameter
182 is to be initialized if not supplied by the caller. Slurpy parameters
183 are indicated via the node's C<slurpy> attribute.
187 Keyed variables represent the elements of aggregates such as
188 arrays and hashes. Nodes representing keyed elements have two
189 children; the first child is the PAST representation of the
190 aggregate, and the second child is the PAST representation of
191 the key or index. The C<vivibase> attribute below may be used
192 to specify how to generate the aggregate if it doesn't already
197 Attribute variables represent object attributes (in some languages
198 they're called "member variables"). The attribute's name is given
199 by the node's C<name> attribute. Nodes representing attribute
200 variables have an optional child, representing the object to which
201 the attribute belongs. If this child is not present, the attribute
202 is assumed to belong to the current invocant, indicated by the
203 special variable C<self> (which is implicitly passed to all subs
204 that are flagged as a C<:method> or C<:vtable>).
208 Register variables are limited in scope to the C<PAST::Block> node
209 in which they are declared. This is different from the C<lexical>
210 scope, which I<includes> any nested C<PAST::Block> nodes. If the
211 node's C<isdecl> attribute is true, then this node defines a new
212 register variable within the current block. Register variables
213 are mapped to Parrot registers, and are useful for handling the
214 PIR pseudo-variable C<self> and storing intermediate results.
215 Names given to the C<name> attribute must conform to rules for
216 PIR identifiers. If no C<name> attribute is set, Parrot registers
217 are used. In this case, setting the C<isdecl> does not have any
223 If C<scope> is not explicitly provided in the node, then PAST will
224 look at the local symbol tables of any outer C<PAST::Block> nodes
225 to try to determine the scope of the named variable. If this still
226 does not result in a scope, then 'lexical' is assumed.
228 =item viviself([value])
230 Accessor method for the C<viviself> attribute, which specifies
231 how uninitialized variables are to be initialized when first
232 used. The C<viviself> attribute may be either a string or array
233 representing a type (e.g., C<Integer>), or it may be a PAST
234 representation for generating the variable's initial value.
236 =item vivibase([value])
238 Accessor method for the C<vivibase> attribute, which specifies
239 how the base portion of aggregate variables (see "keyed scope" above)
240 is to be created if it doesn't already exist. C<Vivibase>
241 may either be a string or array representing the type of the
242 newly created aggregate, or it may be a PAST representation for
243 generating the aggregate.
247 Accessor method for the C<isdecl> attribute. A true value
248 of C<isdecl> indicates that the variable given by this node
249 is being created at this point within the current lexical
250 scope. Otherwise, the node refers to a pre-existing variable
251 (possibly from an outer scope).
255 Accessor method for the C<lvalue> attribute, which indicates
256 whether this variable is being used in an lvalue context.
260 Accessor method for the C<slurpy> attribute of parameter variables.
261 A true value of C<slurpy> indicates that the parameter variable
262 given by this node is to be created as a slurpy parameter (consuming
263 all remaining arguments passed in). Named slurpy parameters are
264 indicated by having a non-empty C<named> attribute and a true value of
271 C<PAST::Op> nodes represent the operations in an abstract
272 syntax tree. The primary function of the node is given by
273 its C<pasttype> attribute, secondary functions may be indicated
274 by the node's C<name>, C<pirop>, or other attributes as given
279 =item pasttype([value])
281 Accessor method for the node's C<pasttype> attribute. The
282 C<pasttype> is the primary indicator of the type of operation
283 to be performed, the operands are typically given as the
284 children of the node. Defined values of C<pasttype> are:
290 A short-circuiting chain of operations. In a sequence of nodes
291 with pasttype 'chain', the right operand of a node serves as
292 the left operand of its parent. Each node is evaluated only
293 once, and the first false result short-circuits the chain.
294 In other words, C<< $x < $y < $z >> is true only if
295 $x < $y and $y < $z, but $y only gets evaluated once.
299 Copy the value of the node's second child into the variable
300 expression given by its first child.
304 Bind the variable given by the node's first child to the
305 value given by its second child.
309 The first, second, and third children represent the
310 "condition", "then", and "else" parts of a conditional
311 expression. The first child is evaluated; if the result
312 is true then the second child is evaluated and returned
313 as the result of the C<PAST::Op> node, otherwise the
314 third child is evaluated and returned as the result.
315 This implements the standard if-then-else logic needed
316 by most higher level languages, and can also be used
317 for implementing the ternary operator.
319 If the node is missing its second ("then") or third ("else")
320 child, then the result of the condition is used as the
321 corresponding result of the operation. This makes it easy
322 to implement the "short-circuit and" operator commonly used in many
323 high level languages. For example, the standard C<&&> operator
324 may be implemented using an "if" node, where the left operand is
325 the first (condition) child, the right operand is the
326 second (then) child, and the third child is left as null
329 It's also possible to build a "short-circuit or" (C<||>)
330 operator using this pasttype, by setting the left operand to
331 C<||> as the first child and the right operand as the I<third>
332 child (leaving the second child as null). However, it's probably
333 simpler to use the "unless" type as described below.
337 Same as C<if> above, except that the second child is evaluated
338 if the first child evaluates to false and the third child is
339 evaluated if the first child evaluates to true.
341 The C<unless> type can be used to implement "short-circuit or"
342 semantics; simply set the first child to the left operand and
343 the second child to the right operand, leaving the third
344 child empty or uninitialized. If the first child evaluates to
345 true it is returned as the result of the operation, otherwise the
346 second child is evaluated and returned as the result.
350 Evaluate the first child (condition), if the result is true
351 then evaluate the second child (body) and repeat.
355 Evaluate the first child (condition), if the result is false then evaluate
356 the second child (body) and repeat.
358 =item repeat_while, repeat_until
360 Same as C<while> and C<until> above, except the second child is evaluated
361 before the conditional first child is evaluated for continuation of
366 Iterate over the first child in groups of elements given by
367 C<arity> (default 1). For each iteration, invoke the second
368 child, passing the elements as parameters.
372 Call the subroutine given by the C<name> attribute, passing
373 the results of any child nodes as arguments. If the node
374 has no C<name> attribute, then the first child is assumed
375 to evaluate to a callable subroutine, and any remaining
376 children are used as arguments.
380 Invoke the method given by C<name> on the first child,
381 passing the results of any child nodes as arguments. If the
382 node has no C<name> attribute, then the first child is
383 evaluated as a method to be called, the second child is
384 the invocant, and any remaining children are arguments to
389 Execute the PIR opcode given by the C<pirop> attribute. See the
390 C<pirop> method below for details. This is also the default
391 behavior when a C<pirop> attribute is set and C<pasttype> is
396 Execute the sequence of PIR statements given by the
397 node's C<inline> attribute (a string). See the C<inline>
398 method below for details.
402 Generate a return exception, using the first child (if any) as
407 Evaluates the first child, if any exceptions occur then they
408 are handled by the code given by the second child (if any).
412 Evaluate the child nodes looking for exactly one true result
413 to be returned. If two true results are encountered, the
414 operation immediately short-circuits and returns false.
415 Otherwise, after all children have been evaluated the result
416 of any true child is used as the result of the operation, or
417 the result of the last child if none of the children evaluated
422 =item pirop([opcode])
424 Get/set the PIR opcode to be executed for this node. Internally
425 the PAST and POST (Parrot Opcode Syntax Tree) implementations
426 understand the register types available for common PIR operations
427 and will handle any needed register or constant conversion of
428 operands automatically. Note that except for the C<assign>
429 opcode, any destination is typically generated automatically
430 and should not be explicitly given as a child operand to the node.
431 The table of PIR opcodes that PAST "knows" about is given in
432 F<compilers/pct/src/PAST/Compiler.pir> .
436 Get/set whether this node is an lvalue, or treats its first
437 child as an lvalue (e.g., for assignment).
439 =item inline([STRING code])
441 Get/set the code to be used for inline PIR when C<pasttype> is
442 set to "inline". The C<code> argument is PIR text to be inserted in
443 the final generated code sequence. Sequences of "%0", "%1",
444 "%2", ... "%9" in C<code> are replaced with the evaluated
445 results of the first, second, third, ..., tenth children nodes.
446 (If you need more than ten arguments to your inline PIR, consider
447 making it a subroutine call instead.)
449 The register to hold the result of the inline PIR operation is
450 given by "%r", "%t", or "%u" in the C<code> string:
452 %r - Generate a unique PMC register for the result.
453 %t - Generate a unique PMC register for the result,
454 and initialize it with an object of type C<returns>
455 {{Pm: or possibly C<viviself> }}
456 before the execution of the inline PIR.
457 %u - Re-use the first child's PMC (%0) if it's a temporary
458 result, otherwise same as %t above.
460 If none of %r, %t, or %u appear in C<code>, then the first
461 child's (%0) is used as the result of this operation.
467 C<PAST::Block> nodes represent lexical scopes within an abstract
468 syntax tree, and roughly translate to individual Parrot subroutines.
469 A C<PAST::Block> node nested within another C<PAST::Block> node
470 acts like a nested lexical scope.
472 If the block has a C<name> attribute, that becomes the name of
473 the resulting Parrot sub. Otherwise, a unique name is automatically
474 generated for the block.
476 Each PAST::Block node can maintain its own local symbol table, see
477 the C<symbol> method below.
481 =item blocktype([type])
483 Get/set the type of the block to C<type>. The currently
484 understood values are 'declaration', 'immediate', and 'method'.
485 'Declaration' indicates that a block is simply being defined at
486 this point, the result of the node is a reference to the block.
487 A C<blocktype> of 'immediate' indicates a block that is to be immediately
488 executed when it is evaluated in the AST, and the result of the
489 last operation is used as the return value for the block.
491 =item closure([value])
493 Get/set the closure flag for the block to C<value>. If the closure
494 flag on a (non-immediate) block is true, then the node returns a
495 reference to a clone of the block that has captured the current
498 =item namespace([value])
500 Get/set the namespace for this particular block (and any nested
501 blocks that do not explicitly provide a namespace). C<Value>
502 may either be a simple string or an array of strings representing
507 Get/set the HLL namespace for this block (and any nested blocks
508 that do not explicitly provide a C<hll>).
510 =item symbol(name, [attr1 => val1, attr2 => val2, ...])
512 Each PAST::Block node can use the C<symbol> method to maintain
513 its own compile-time notion of a local symbol table. Each value
514 of C<name> is given its own hash to hold information about that
515 symbol for the block (i.e., the symbol table acts like a hash of
516 hashes indexed by symbol name). If the C<symbol> method is called
517 with named arguments, then the method sets the entries in the hash
518 corresponding to C<name> in the block's symbol table. If C<symbol>
519 is called with just a single C<name> argument, then the current hash
520 for local symbol C<name> is returned.
522 HLLs are free to place any values in the symbol hashes that
523 may be useful. However, the C<scope> entry for a symbol is
524 typically used to provide the C<scope> attribute for any nested
525 C<PAST::Var> nodes that do not provide an explicit C<scope> attribute.
527 =item symbol_defaults([attr1 => val1, attr2 => val2, ...])
529 Sets default attributes for symbols that aren't explicitly
530 given by the C<symbol> method above. For example, an abstract
531 syntax tree can use this method on a top-level C<PAST::Block>
532 to specify the C<scope> attribute to be used for all
533 variable nodes that don't otherwise provide one.
537 Get/set the unique subid to be used for this block. If
538 no subid is explicitly set, a unique subid is randomly
539 generated for the block.
541 =item pirflags([value])
543 Get/set any PIR flags to be used when generating the block.
545 =item compiler([compiler_name])
547 Specify that the children nodes of this block are to be compiled
548 using C<compiler_name> instead of being treated as standard PAST
549 nodes. This is useful when a program may contain components of
550 code written in other HLLs. For example, the C<perl6> compiler
551 uses this feature to use PGE to compile pre-parsed Perl 6 regular
552 expressions, rather than trying to represent the semantics of
553 those expressions in PAST itself.
555 When code is generated from a C<PAST::Block> node having a
556 C<compiler> attribute, the compiler is invoked with C<name>
557 and C<outer> arguments so that any generated subs can have
558 names and lexical scoping appropriate to the block (it's the
559 responsibility of the external compiler to support this).
563 =head3 PAST::CompUnit
565 C<PAST::CompUnit> nodes are used for the abstract representation
566 of compilation units. Specific attributes and semantics are
567 yet to be determined.
579 vim: expandtab shiftwidth=4: