docs/pdds/pdd26_ast.pod

   1 # Copyright (C) 2007-2010, Parrot Foundation
   2 # $Id$
   3
   4 =head1 PDD 26: Compiler Tools - Abstract Syntax Tree
   5
   6 =head2 Abstract
   7
   8 This PDD describes the node types and semantics of the Parrot
   9 Abstract Syntax Tree representation.
  10
  11 =head2 Version
  12
  13 $Revision$
  14
  15 =head2 Description
  16
  17 The Parrot Abstract Syntax Tree (PAST) is a set of Parrot
  18 classes useful for generating an abstract semantic representation
  19 of a program written in a high level language such as Perl or
  20 Python.  Once a program has been translated into its equivalent
  21 PAST representation, the Parrot Compiler Toolkit can be used to
  22 generate executable PIR or bytecode from the abstract syntax
  23 tree representation.
  24
  25 PAST is designed to provide the common operations and semantics
  26 needed by many of the high level languages targeted by Parrot,
  27 while also being extensible to support the unique needs of
  28 specific languages.
  29
  30 =head2 Implementation
  31
  32 =head3 PAST::Node
  33
  34 C<PAST::Node> is the base class for all nodes in an abstract
  35 syntax tree.  Each C<PAST::Node> element has an array of
  36 children nodes (which may be empty), as well as a number of
  37 attributes and accessor methods for the node.
  38
  39 Every PAST node has C<name>, C<source>, and C<pos> attributes.
  40 The C<name> attribute is the node's name, if any, while
  41 C<source> and C<pos> are used to identify the location of the
  42 node in the original source code.
  43
  44 =over 4
  45
  46 =item init([child1, child2, ..., ] [attr1=>val1, attr2=>val2, ...])
  47
  48 Initialize a PAST node with the given children and attributes.
  49 Adds each child to the node (using the C<push> method, below)
  50 and calls the appropriate accessor method for each attribute.
  51
  52 =item new([child1, child2, ..., ] [attr1=>val1, attr2=>val2, ...])
  53
  54 Create a new PAST node with the same type as the invocant
  55 and initialized with the given children and attributes.
  56 Returns the newly created node.
  57
  58 =item push(child)
  59
  60 Add C<child> to the end of the node's array of children.
  61
  62 =item pop()
  63
  64 Remove the last child from the node's array of children.
  65 Returns the child.
  66
  67 =item unshift(child)
  68
  69 Add C<child> to the beginning of the node's array of children.
  70
  71 =item shift()
  72
  73 Remove the first child from the node's array of children.
  74 Returns the child.
  75
  76 =item iterator( )
  77
  78 Return a newly initialized C<Iterator> for the node's list
  79 of children.
  80
  81 =item node(val)
  82
  83 Set the invocant's C<source> and C<pos> attributes to be the
  84 same as C<val>.  If C<val> is another PAST node, then C<source>
  85 and C<pos> are simply copied from that node.  Otherwise, C<val>
  86 is assumed to be a C<Match> object (e.g., from PGE) and the
  87 source and position information are obtained from that.
  88
  89 =item name([value])
  90
  91 Accessor method -- sets/returns the C<name> attribute of the node.
  92
  93 =item named([value])
  94
  95 Accessor method -- for nodes that are being used as named arguments,
  96 sets/returns the name to be associated with the argument.
  97
  98 =item flat([value])
  99
 100 Accessor method -- sets/returns the "flatten" flag on nodes being
 101 used as arguments.
 102
 103 =item attr(STR attrname, PMC value, INT has_value)
 104
 105 Helper method for writing accessors -- if C<has_value> is true
 106 then set the node's value of C<attrname> to C<value>.  Returns
 107 the (resulting) value of C<attrname> for the invocant.
 108
 109 =back
 110
 111 =head3 PAST::Stmts
 112
 113 C<PAST::Stmts> is simply a C<PAST::Node> used to contain a
 114 sequence of other PAST nodes to be evaluated.  It has no
 115 specific methods or attributes.
 116
 117 =head3 PAST::Val
 118
 119 C<PAST::Val> nodes represent constant values in the abstract
 120 syntax tree.  The C<value> attribute represents the value of
 121 the node itself.
 122
 123 =over 4
 124
 125 =item value([value])
 126
 127 Accessor method to get/set the constant value for this node.
 128
 129 =item returns([typename])
 130
 131 Get/set the type of value to be generated by this node.
 132 If not specified, the type is taken from the type of
 133 the value attribute given above.
 134
 135 =back
 136
 137 =head3 PAST::Var
 138
 139 C<PAST::Var> nodes represent variable-like items within the
 140 abstract syntax tree.  The C<name> attribute (inherited from
 141 C<PAST::Node>) identifies the name of the variable as given
 142 by the high level language program.  The other attributes for
 143 C<PAST::Var> nodes provide the details of how the variable
 144 is defined and accessed.
 145
 146 =over 4
 147
 148 =item scope([value])
 149
 150 A variable node's "scope" indicates how the variable is to be
 151 defined or accessed within the program.  The available scopes
 152 include:
 153
 154 =over 4
 155
 156 =item "lexical"
 157
 158 Lexical variables are scoped to the C<PAST::Block> in which
 159 they are declared, including any nested blocks.  If the
 160 node's C<isdecl> attribute is true, then this node defines
 161 a new lexical variable within the current block.  Otherwise,
 162 the node refers to a lexical variable already declared in the current
 163 or outer block.
 164
 165 =item "package"
 166
 167 Package variables represent global or namespace-managed
 168 variables in the program.  The node's C<namespace> attribute
 169 may be used to explicitly identify the namespace of the variable,
 170 otherwise it is assumed to be within the namespace of the
 171 C<PAST::Block> containing the node.
 172
 173 =item "parameter"
 174
 175 Parameter variables are the formal arguments to subroutine and
 176 methods, typically represented as C<PAST::Block> nodes.  The
 177 parameter's lexical name is given by the node's C<name> attribute.
 178 Positional parameters are defined by the order in which they
 179 appear in the C<PAST::Block> node, named parameters have values
 180 for the C<named> attribute.  Optional parameters are identified
 181 via the C<viviself> attribute (see below) indicating how the parameter
 182 is to be initialized if not supplied by the caller.  Slurpy parameters
 183 are indicated via the node's C<slurpy> attribute.
 184
 185 =item "keyed"
 186
 187 Keyed variables represent the elements of aggregates such as
 188 arrays and hashes.  Nodes representing keyed elements have two
 189 children; the first child is the PAST representation of the
 190 aggregate, and the second child is the PAST representation of
 191 the key or index.  The C<vivibase> attribute below may be used
 192 to specify how to generate the aggregate if it doesn't already
 193 exist.
 194
 195 =item "attribute"
 196
 197 Attribute variables represent object attributes (in some languages
 198 they're called "member variables"). The attribute's name is given
 199 by the node's C<name> attribute. Nodes representing attribute
 200 variables have an optional child, representing the object to which
 201 the attribute belongs. If this child is not present, the attribute
 202 is assumed to belong to the current invocant, indicated by the
 203 special variable C<self> (which is implicitly passed to all subs
 204 that are flagged as a C<:method> or C<:vtable>).
 205
 206 =item "register"
 207
 208 Register variables are limited in scope to the C<PAST::Block> node
 209 in which they are declared. This is different from the C<lexical>
 210 scope, which I<includes> any nested C<PAST::Block> nodes. If the
 211 node's C<isdecl> attribute is true, then this node defines a new
 212 register variable within the current block. Register variables
 213 are mapped to Parrot registers, and are useful for handling the
 214 PIR pseudo-variable C<self> and storing intermediate results.
 215 Names given to the C<name> attribute must conform to rules for
 216 PIR identifiers. If no C<name> attribute is set, Parrot registers
 217 are used. In this case, setting the C<isdecl> does not have any
 218 effect.
 219
 220
 221 =back
 222
 223 If C<scope> is not explicitly provided in the node, then PAST will
 224 look at the local symbol tables of any outer C<PAST::Block> nodes
 225 to try to determine the scope of the named variable.  If this still
 226 does not result in a scope, then 'lexical' is assumed.
 227
 228 =item viviself([value])
 229
 230 Accessor method for the C<viviself> attribute, which specifies
 231 how uninitialized variables are to be initialized when first
 232 used.  The C<viviself> attribute may be either a string or array
 233 representing a type (e.g., C<Integer>), or it may be a PAST
 234 representation for generating the variable's initial value.
 235
 236 =item vivibase([value])
 237
 238 Accessor method for the C<vivibase> attribute, which specifies
 239 how the base portion of aggregate variables (see "keyed scope" above)
 240 is to be created if it doesn't already exist.  C<Vivibase>
 241 may either be a string or array representing the type of the
 242 newly created aggregate, or it may be a PAST representation for
 243 generating the aggregate.
 244
 245 =item isdecl([flag])
 246
 247 Accessor method for the C<isdecl> attribute.  A true value
 248 of C<isdecl> indicates that the variable given by this node
 249 is being created at this point within the current lexical
 250 scope.  Otherwise, the node refers to a pre-existing variable
 251 (possibly from an outer scope).
 252
 253 =item lvalue([flag])
 254
 255 Accessor method for the C<lvalue> attribute, which indicates
 256 whether this variable is being used in an lvalue context.
 257
 258 =item slurpy([flag])
 259
 260 Accessor method for the C<slurpy> attribute of parameter variables.
 261 A true value of C<slurpy> indicates that the parameter variable
 262 given by this node is to be created as a slurpy parameter (consuming
 263 all remaining arguments passed in).  Named slurpy parameters are
 264 indicated by having a non-empty C<named> attribute and a true value of
 265 C<slurpy>.
 266
 267 =back
 268
 269 =head3 PAST::Op
 270
 271 C<PAST::Op> nodes represent the operations in an abstract
 272 syntax tree.  The primary function of the node is given by
 273 its C<pasttype> attribute, secondary functions may be indicated
 274 by the node's C<name>, C<pirop>, or other attributes as given
 275 below.
 276
 277 =over 4
 278
 279 =item pasttype([value])
 280
 281 Accessor method for the node's C<pasttype> attribute.  The
 282 C<pasttype> is the primary indicator of the type of operation
 283 to be performed, the operands are typically given as the
 284 children of the node.  Defined values of C<pasttype> are:
 285
 286 =over 4
 287
 288 =item chain
 289
 290 A short-circuiting chain of operations.  In a sequence of nodes
 291 with pasttype 'chain', the right operand of a node serves as
 292 the left operand of its parent.  Each node is evaluated only
 293 once, and the first false result short-circuits the chain.
 294 In other words,  C<<  $x < $y < $z >>  is true only if
 295 $x < $y and $y < $z, but $y only gets evaluated once.
 296
 297 =item copy
 298
 299 Copy the value of the node's second child into the variable
 300 expression given by its first child.
 301
 302 =item bind
 303
 304 Bind the variable given by the node's first child to the
 305 value given by its second child.
 306
 307 =item if
 308
 309 The first, second, and third children represent the
 310 "condition", "then", and "else" parts of a conditional
 311 expression.  The first child is evaluated; if the result
 312 is true then the second child is evaluated and returned
 313 as the result of the C<PAST::Op> node, otherwise the
 314 third child is evaluated and returned as the result.
 315 This implements the standard if-then-else logic needed
 316 by most higher level languages, and can also be used
 317 for implementing the ternary operator.
 318
 319 If the node is missing its second ("then") or third ("else")
 320 child, then the result of the condition is used as the
 321 corresponding result of the operation.  This makes it easy
 322 to implement the "short-circuit and" operator commonly used in many
 323 high level languages.  For example, the standard C<&&> operator
 324 may be implemented using an "if" node, where the left operand is
 325 the first (condition) child, the right operand is the
 326 second (then) child, and the third child is left as null
 327 or uninitialized.
 328
 329 It's also possible to build a "short-circuit or" (C<||>)
 330 operator using this pasttype, by setting the left operand to
 331 C<||> as the first child and the right operand as the I<third>
 332 child (leaving the second child as null).  However, it's probably
 333 simpler to use the "unless" type as described below.
 334
 335 =item unless
 336
 337 Same as C<if> above, except that the second child is evaluated
 338 if the first child evaluates to false and the third child is
 339 evaluated if the first child evaluates to true.
 340
 341 The C<unless> type can be used to implement "short-circuit or"
 342 semantics; simply set the first child to the left operand and
 343 the second child to the right operand, leaving the third
 344 child empty or uninitialized.  If the first child evaluates to
 345 true it is returned as the result of the operation, otherwise the
 346 second child is evaluated and returned as the result.
 347
 348 =item while
 349
 350 Evaluate the first child (condition), if the result is true
 351 then evaluate the second child (body) and repeat.
 352
 353 =item until
 354
 355 Evaluate the first child (condition), if the result is false then evaluate
 356 the second child (body) and repeat.
 357
 358 =item repeat_while, repeat_until
 359
 360 Same as C<while> and C<until> above, except the second child is evaluated
 361 before the conditional first child is evaluated for continuation of
 362 the loop.
 363
 364 =item for
 365
 366 Iterate over the first child in groups of elements given by
 367 C<arity> (default 1).  For each iteration, invoke the second
 368 child, passing the elements as parameters.
 369
 370 =item call
 371
 372 Call the subroutine given by the C<name> attribute, passing
 373 the results of any child nodes as arguments.  If the node
 374 has no C<name> attribute, then the first child is assumed
 375 to evaluate to a callable subroutine, and any remaining
 376 children are used as arguments.
 377
 378 =item callmethod
 379
 380 Invoke the method given by C<name> on the first child,
 381 passing the results of any child nodes as arguments.  If the
 382 node has no C<name> attribute, then the first child is
 383 evaluated as a method to be called, the second child is
 384 the invocant, and any remaining children are arguments to
 385 the method call.
 386
 387 =item pirop
 388
 389 Execute the PIR opcode given by the C<pirop> attribute.  See the
 390 C<pirop> method below for details.  This is also the default
 391 behavior when a C<pirop> attribute is set and C<pasttype> is
 392 not.
 393
 394 =item inline
 395
 396 Execute the sequence of PIR statements given by the
 397 node's C<inline> attribute (a string).  See the C<inline>
 398 method below for details.
 399
 400 =item return
 401
 402 Generate a return exception, using the first child (if any) as
 403 a return value.
 404
 405 =item try
 406
 407 Evaluates the first child, if any exceptions occur then they
 408 are handled by the code given by the second child (if any).
 409
 410 =item xor
 411
 412 Evaluate the child nodes looking for exactly one true result
 413 to be returned.  If two true results are encountered, the
 414 operation immediately short-circuits and returns false.
 415 Otherwise, after all children have been evaluated the result
 416 of any true child is used as the result of the operation, or
 417 the result of the last child if none of the children evaluated
 418 to true.
 419
 420 =back
 421
 422 =item pirop([opcode])
 423
 424 Get/set the PIR opcode to be executed for this node.  Internally
 425 the PAST and POST (Parrot Opcode Syntax Tree) implementations
 426 understand the register types available for common PIR operations
 427 and will handle any needed register or constant conversion of
 428 operands automatically.  Note that except for the C<assign>
 429 opcode, any destination is typically generated automatically
 430 and should not be explicitly given as a child operand to the node.
 431 The table of PIR opcodes that PAST "knows" about is given in
 432 F<compilers/pct/src/PAST/Compiler.pir> .
 433
 434 =item lvalue([flag])
 435
 436 Get/set whether this node is an lvalue, or treats its first
 437 child as an lvalue (e.g., for assignment).
 438
 439 =item inline([STRING code])
 440
 441 Get/set the code to be used for inline PIR when C<pasttype> is
 442 set to "inline".  The C<code> argument is PIR text to be inserted in
 443 the final generated code sequence.  Sequences of "%0", "%1",
 444 "%2", ... "%9" in C<code> are replaced with the evaluated
 445 results of the first, second, third, ..., tenth children nodes.
 446 (If you need more than ten arguments to your inline PIR, consider
 447 making it a subroutine call instead.)
 448
 449 The register to hold the result of the inline PIR operation is
 450 given by "%r", "%t", or "%u" in the C<code> string:
 451
 452   %r   - Generate a unique PMC register for the result.
 453   %t   - Generate a unique PMC register for the result,
 454          and initialize it with an object of type C<returns>
 455          {{Pm: or possibly C<viviself> }}
 456          before the execution of the inline PIR.
 457   %u   - Re-use the first child's PMC (%0) if it's a temporary
 458          result, otherwise same as %t above.
 459
 460 If none of %r, %t, or %u appear in C<code>, then the first
 461 child's (%0) is used as the result of this operation.
 462
 463 =back
 464
 465 =head3 PAST::Block
 466
 467 C<PAST::Block> nodes represent lexical scopes within an abstract
 468 syntax tree, and roughly translate to individual Parrot subroutines.
 469 A C<PAST::Block> node nested within another C<PAST::Block> node
 470 acts like a nested lexical scope.
 471
 472 If the block has a C<name> attribute, that becomes the name of
 473 the resulting Parrot sub.  Otherwise, a unique name is automatically
 474 generated for the block.
 475
 476 Each PAST::Block node can maintain its own local symbol table, see
 477 the C<symbol> method below.
 478
 479 =over 4
 480
 481 =item blocktype([type])
 482
 483 Get/set the type of the block to C<type>.  The currently
 484 understood values are 'declaration', 'immediate',  and 'method'.
 485 'Declaration' indicates that a block is simply being defined at
 486 this point, the result of the node is a reference to the block.
 487 A C<blocktype> of 'immediate' indicates a block that is to be immediately
 488 executed when it is evaluated in the AST, and the result of the
 489 last operation is used as the return value for the block.
 490
 491 =item closure([value])
 492
 493 Get/set the closure flag for the block to C<value>.  If the closure
 494 flag on a (non-immediate) block is true, then the node returns a
 495 reference to a clone of the block that has captured the current
 496 lexical context.
 497
 498 =item namespace([value])
 499
 500 Get/set the namespace for this particular block (and any nested
 501 blocks that do not explicitly provide a namespace).  C<Value>
 502 may either be a simple string or an array of strings representing
 503 a nested namespace.
 504
 505 =item hll([value])
 506
 507 Get/set the HLL namespace for this block (and any nested blocks
 508 that do not explicitly provide a C<hll>).
 509
 510 =item symbol(name, [attr1 => val1, attr2 => val2, ...])
 511
 512 Each PAST::Block node can use the C<symbol> method to maintain
 513 its own compile-time notion of a local symbol table.  Each value
 514 of C<name> is given its own hash to hold information about that
 515 symbol for the block (i.e., the symbol table acts like a hash of
 516 hashes indexed by symbol name).  If the C<symbol> method is called
 517 with named arguments, then the method sets the entries in the hash
 518 corresponding to C<name> in the block's symbol table.  If C<symbol>
 519 is called with just a single C<name> argument, then the current hash
 520 for local symbol C<name> is returned.
 521
 522 HLLs are free to place any values in the symbol hashes that
 523 may be useful.  However, the C<scope> entry for a symbol is
 524 typically used to provide the C<scope> attribute for any nested
 525 C<PAST::Var> nodes that do not provide an explicit C<scope> attribute.
 526
 527 =item symbol_defaults([attr1 => val1, attr2 => val2, ...])
 528
 529 Sets default attributes for symbols that aren't explicitly
 530 given by the C<symbol> method above.  For example, an abstract
 531 syntax tree can use this method on a top-level C<PAST::Block>
 532 to specify the C<scope> attribute to be used for all
 533 variable nodes that don't otherwise provide one.
 534
 535 =item subid([value])
 536
 537 Get/set the unique subid to be used for this block.  If
 538 no subid is explicitly set, a unique subid is randomly
 539 generated for the block.
 540
 541 =item pirflags([value])
 542
 543 Get/set any PIR flags to be used when generating the block.
 544
 545 =item compiler([compiler_name])
 546
 547 Specify that the children nodes of this block are to be compiled
 548 using C<compiler_name> instead of being treated as standard PAST
 549 nodes.  This is useful when a program may contain components of
 550 code written in other HLLs.  For example, the C<perl6> compiler
 551 uses this feature to use PGE to compile pre-parsed Perl 6 regular
 552 expressions, rather than trying to represent the semantics of
 553 those expressions in PAST itself.
 554
 555 When code is generated from a C<PAST::Block> node having a
 556 C<compiler> attribute, the compiler is invoked with C<name>
 557 and C<outer> arguments so that any generated subs can have
 558 names and lexical scoping appropriate to the block (it's the
 559 responsibility of the external compiler to support this).
 560
 561 =back
 562
 563 =head3 PAST::CompUnit
 564
 565 C<PAST::CompUnit> nodes are used for the abstract representation
 566 of compilation units.  Specific attributes and semantics are
 567 yet to be determined.
 568
 569 =head2 References
 570
 571 None.
 572
 573 =cut
 574
 575 __END__
 576 Local Variables:
 577   fill-column:78
 578 End:
 579 vim: expandtab shiftwidth=4: