src/backend/optimizer/README

   1 src/backend/optimizer/README
   2
   3 Optimizer
   4 =========
   5
   6 These directories take the Query structure returned by the parser, and
   7 generate a plan used by the executor.  The /plan directory generates the
   8 actual output plan, the /path code generates all possible ways to join the
   9 tables, and /prep handles various preprocessing steps for special cases.
  10 /util is utility stuff.  /geqo is the separate "genetic optimization" planner
  11 --- it does a semi-random search through the join tree space, rather than
  12 exhaustively considering all possible join trees.  (But each join considered
  13 by /geqo is given to /path to create paths for, so we consider all possible
  14 implementation paths for each specific join pair even in GEQO mode.)
  15
  16
  17 Paths and Join Pairs
  18 --------------------
  19
  20 During the planning/optimizing process, we build "Path" trees representing
  21 the different ways of doing a query.  We select the cheapest Path that
  22 generates the desired relation and turn it into a Plan to pass to the
  23 executor.  (There is pretty nearly a one-to-one correspondence between the
  24 Path and Plan trees, but Path nodes omit info that won't be needed during
  25 planning, and include info needed for planning that won't be needed by the
  26 executor.)
  27
  28 The optimizer builds a RelOptInfo structure for each base relation used in
  29 the query.  Base rels are either primitive tables, or subquery subselects
  30 that are planned via a separate recursive invocation of the planner.  A
  31 RelOptInfo is also built for each join relation that is considered during
  32 planning.  A join rel is simply a combination of base rels.  There is only
  33 one join RelOptInfo for any given set of baserels --- for example, the join
  34 {A B C} is represented by the same RelOptInfo no matter whether we build it
  35 by joining A and B first and then adding C, or joining B and C first and
  36 then adding A, etc.  These different means of building the joinrel are
  37 represented as Paths.  For each RelOptInfo we build a list of Paths that
  38 represent plausible ways to implement the scan or join of that relation.
  39 Once we've considered all the plausible Paths for a rel, we select the one
  40 that is cheapest according to the planner's cost estimates.  The final plan
  41 is derived from the cheapest Path for the RelOptInfo that includes all the
  42 base rels of the query.
  43
  44 Possible Paths for a primitive table relation include plain old sequential
  45 scan, plus index scans for any indexes that exist on the table, plus bitmap
  46 index scans using one or more indexes.  Specialized RTE types, such as
  47 function RTEs, may have only one possible Path.
  48
  49 Joins always occur using two RelOptInfos.  One is outer, the other inner.
  50 Outers drive lookups of values in the inner.  In a nested loop, lookups of
  51 values in the inner occur by scanning the inner path once per outer tuple
  52 to find each matching inner row.  In a mergejoin, inner and outer rows are
  53 ordered, and are accessed in order, so only one scan is required to perform
  54 the entire join: both inner and outer paths are scanned in-sync.  (There's
  55 not a lot of difference between inner and outer in a mergejoin...)  In a
  56 hashjoin, the inner is scanned first and all its rows are entered in a
  57 hashtable, then the outer is scanned and for each row we lookup the join
  58 key in the hashtable.
  59
  60 A Path for a join relation is actually a tree structure, with the topmost
  61 Path node representing the last-applied join method.  It has left and right
  62 subpaths that represent the scan or join methods used for the two input
  63 relations.
  64
  65
  66 Join Tree Construction
  67 ----------------------
  68
  69 The optimizer generates optimal query plans by doing a more-or-less
  70 exhaustive search through the ways of executing the query.  The best Path
  71 tree is found by a recursive process:
  72
  73 1) Take each base relation in the query, and make a RelOptInfo structure
  74 for it.  Find each potentially useful way of accessing the relation,
  75 including sequential and index scans, and make Paths representing those
  76 ways.  All the Paths made for a given relation are placed in its
  77 RelOptInfo.pathlist.  (Actually, we discard Paths that are obviously
  78 inferior alternatives before they ever get into the pathlist --- what
  79 ends up in the pathlist is the cheapest way of generating each potentially
  80 useful sort ordering and parameterization of the relation.)  Also create a
  81 RelOptInfo.joininfo list including all the join clauses that involve this
  82 relation.  For example, the WHERE clause "tab1.col1 = tab2.col1" generates
  83 entries in both tab1 and tab2's joininfo lists.
  84
  85 If we have only a single base relation in the query, we are done.
  86 Otherwise we have to figure out how to join the base relations into a
  87 single join relation.
  88
  89 2) Normally, any explicit JOIN clauses are "flattened" so that we just
  90 have a list of relations to join.  However, FULL OUTER JOIN clauses are
  91 never flattened, and other kinds of JOIN might not be either, if the
  92 flattening process is stopped by join_collapse_limit or from_collapse_limit
  93 restrictions.  Therefore, we end up with a planning problem that contains
  94 lists of relations to be joined in any order, where any individual item
  95 might be a sub-list that has to be joined together before we can consider
  96 joining it to its siblings.  We process these sub-problems recursively,
  97 bottom up.  Note that the join list structure constrains the possible join
  98 orders, but it doesn't constrain the join implementation method at each
  99 join (nestloop, merge, hash), nor does it say which rel is considered outer
 100 or inner at each join.  We consider all these possibilities in building
 101 Paths. We generate a Path for each feasible join method, and select the
 102 cheapest Path.
 103
 104 For each planning problem, therefore, we will have a list of relations
 105 that are either base rels or joinrels constructed per sub-join-lists.
 106 We can join these rels together in any order the planner sees fit.
 107 The standard (non-GEQO) planner does this as follows:
 108
 109 Consider joining each RelOptInfo to each other RelOptInfo for which there
 110 is a usable joinclause, and generate a Path for each possible join method
 111 for each such pair.  (If we have a RelOptInfo with no join clauses, we have
 112 no choice but to generate a clauseless Cartesian-product join; so we
 113 consider joining that rel to each other available rel.  But in the presence
 114 of join clauses we will only consider joins that use available join
 115 clauses.  Note that join-order restrictions induced by outer joins and
 116 IN/EXISTS clauses are also checked, to ensure that we find a workable join
 117 order in cases where those restrictions force a clauseless join to be done.)
 118
 119 If we only had two relations in the list, we are done: we just pick
 120 the cheapest path for the join RelOptInfo.  If we had more than two, we now
 121 need to consider ways of joining join RelOptInfos to each other to make
 122 join RelOptInfos that represent more than two list items.
 123
 124 The join tree is constructed using a "dynamic programming" algorithm:
 125 in the first pass (already described) we consider ways to create join rels
 126 representing exactly two list items.  The second pass considers ways
 127 to make join rels that represent exactly three list items; the next pass,
 128 four items, etc.  The last pass considers how to make the final join
 129 relation that includes all list items --- obviously there can be only one
 130 join rel at this top level, whereas there can be more than one join rel
 131 at lower levels.  At each level we use joins that follow available join
 132 clauses, if possible, just as described for the first level.
 133
 134 For example:
 135
 136     SELECT  *
 137     FROM    tab1, tab2, tab3, tab4
 138     WHERE   tab1.col = tab2.col AND
 139         tab2.col = tab3.col AND
 140         tab3.col = tab4.col
 141
 142     Tables 1, 2, 3, and 4 are joined as:
 143     {1 2},{2 3},{3 4}
 144     {1 2 3},{2 3 4}
 145     {1 2 3 4}
 146     (other possibilities will be excluded for lack of join clauses)
 147
 148     SELECT  *
 149     FROM    tab1, tab2, tab3, tab4
 150     WHERE   tab1.col = tab2.col AND
 151         tab1.col = tab3.col AND
 152         tab1.col = tab4.col
 153
 154     Tables 1, 2, 3, and 4 are joined as:
 155     {1 2},{1 3},{1 4}
 156     {1 2 3},{1 3 4},{1 2 4}
 157     {1 2 3 4}
 158
 159 We consider left-handed plans (the outer rel of an upper join is a joinrel,
 160 but the inner is always a single list item); right-handed plans (outer rel
 161 is always a single item); and bushy plans (both inner and outer can be
 162 joins themselves).  For example, when building {1 2 3 4} we consider
 163 joining {1 2 3} to {4} (left-handed), {4} to {1 2 3} (right-handed), and
 164 {1 2} to {3 4} (bushy), among other choices.  Although the jointree
 165 scanning code produces these potential join combinations one at a time,
 166 all the ways to produce the same set of joined base rels will share the
 167 same RelOptInfo, so the paths produced from different join combinations
 168 that produce equivalent joinrels will compete in add_path().
 169
 170 The dynamic-programming approach has an important property that's not
 171 immediately obvious: we will finish constructing all paths for a given
 172 relation before we construct any paths for relations containing that rel.
 173 This means that we can reliably identify the "cheapest path" for each rel
 174 before higher-level relations need to know that.  Also, we can safely
 175 discard a path when we find that another path for the same rel is better,
 176 without worrying that maybe there is already a reference to that path in
 177 some higher-level join path.  Without this, memory management for paths
 178 would be much more complicated.
 179
 180 Once we have built the final join rel, we use either the cheapest path
 181 for it or the cheapest path with the desired ordering (if that's cheaper
 182 than applying a sort to the cheapest other path).
 183
 184 If the query contains one-sided outer joins (LEFT or RIGHT joins), or
 185 IN or EXISTS WHERE clauses that were converted to semijoins or antijoins,
 186 then some of the possible join orders may be illegal.  These are excluded
 187 by having join_is_legal consult a side list of such "special" joins to see
 188 whether a proposed join is illegal.  (The same consultation allows it to
 189 see which join style should be applied for a valid join, ie, JOIN_INNER,
 190 JOIN_LEFT, etc.)
 191
 192
 193 Valid OUTER JOIN Optimizations
 194 ------------------------------
 195
 196 The planner's treatment of outer join reordering is based on the following
 197 identities:
 198
 199 1.      (A leftjoin B on (Pab)) innerjoin C on (Pac)
 200         = (A innerjoin C on (Pac)) leftjoin B on (Pab)
 201
 202 where Pac is a predicate referencing A and C, etc (in this case, clearly
 203 Pac cannot reference B, or the transformation is nonsensical).
 204
 205 2.      (A leftjoin B on (Pab)) leftjoin C on (Pac)
 206         = (A leftjoin C on (Pac)) leftjoin B on (Pab)
 207
 208 3.      (A leftjoin B on (Pab)) leftjoin C on (Pbc)
 209         = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
 210
 211 Identity 3 only holds if predicate Pbc must fail for all-null B rows
 212 (that is, Pbc is strict for at least one column of B).  If Pbc is not
 213 strict, the first form might produce some rows with nonnull C columns
 214 where the second form would make those entries null.
 215
 216 RIGHT JOIN is equivalent to LEFT JOIN after switching the two input
 217 tables, so the same identities work for right joins.
 218
 219 An example of a case that does *not* work is moving an innerjoin into or
 220 out of the nullable side of an outer join:
 221
 222         A leftjoin (B join C on (Pbc)) on (Pab)
 223         != (A leftjoin B on (Pab)) join C on (Pbc)
 224
 225 SEMI joins work a little bit differently.  A semijoin can be reassociated
 226 into or out of the lefthand side of another semijoin, left join, or
 227 antijoin, but not into or out of the righthand side.  Likewise, an inner
 228 join, left join, or antijoin can be reassociated into or out of the
 229 lefthand side of a semijoin, but not into or out of the righthand side.
 230
 231 ANTI joins work approximately like LEFT joins, except that identity 3
 232 fails if the join to C is an antijoin (even if Pbc is strict, and in
 233 both the cases where the other join is a leftjoin and where it is an
 234 antijoin).  So we can't reorder antijoins into or out of the RHS of a
 235 leftjoin or antijoin, even if the relevant clause is strict.
 236
 237 The current code does not attempt to re-order FULL JOINs at all.
 238 FULL JOIN ordering is enforced by not collapsing FULL JOIN nodes when
 239 translating the jointree to "joinlist" representation.  Other types of
 240 JOIN nodes are normally collapsed so that they participate fully in the
 241 join order search.  To avoid generating illegal join orders, the planner
 242 creates a SpecialJoinInfo node for each non-inner join, and join_is_legal
 243 checks this list to decide if a proposed join is legal.
 244
 245 What we store in SpecialJoinInfo nodes are the minimum sets of Relids
 246 required on each side of the join to form the outer join.  Note that
 247 these are minimums; there's no explicit maximum, since joining other
 248 rels to the OJ's syntactic rels may be legal.  Per identities 1 and 2,
 249 non-FULL joins can be freely associated into the lefthand side of an
 250 OJ, but in some cases they can't be associated into the righthand side.
 251 So the restriction enforced by join_is_legal is that a proposed join
 252 can't join a rel within or partly within an RHS boundary to one outside
 253 the boundary, unless the proposed join is a LEFT join that can associate
 254 into the SpecialJoinInfo's RHS using identity 3.
 255
 256 The use of minimum Relid sets has some pitfalls; consider a query like
 257         A leftjoin (B leftjoin (C innerjoin D) on (Pbcd)) on Pa
 258 where Pa doesn't mention B/C/D at all.  In this case a naive computation
 259 would give the upper leftjoin's min LHS as {A} and min RHS as {C,D} (since
 260 we know that the innerjoin can't associate out of the leftjoin's RHS, and
 261 enforce that by including its relids in the leftjoin's min RHS).  And the
 262 lower leftjoin has min LHS of {B} and min RHS of {C,D}.  Given such
 263 information, join_is_legal would think it's okay to associate the upper
 264 join into the lower join's RHS, transforming the query to
 265         B leftjoin (A leftjoin (C innerjoin D) on Pa) on (Pbcd)
 266 which yields totally wrong answers.  We prevent that by forcing the min RHS
 267 for the upper join to include B.  This is perhaps overly restrictive, but
 268 such cases don't arise often so it's not clear that it's worth developing a
 269 more complicated system.
 270
 271
 272 Pulling Up Subqueries
 273 ---------------------
 274
 275 As we described above, a subquery appearing in the range table is planned
 276 independently and treated as a "black box" during planning of the outer
 277 query.  This is necessary when the subquery uses features such as
 278 aggregates, GROUP, or DISTINCT.  But if the subquery is just a simple
 279 scan or join, treating the subquery as a black box may produce a poor plan
 280 compared to considering it as part of the entire plan search space.
 281 Therefore, at the start of the planning process the planner looks for
 282 simple subqueries and pulls them up into the main query's jointree.
 283
 284 Pulling up a subquery may result in FROM-list joins appearing below the top
 285 of the join tree.  Each FROM-list is planned using the dynamic-programming
 286 search method described above.
 287
 288 If pulling up a subquery produces a FROM-list as a direct child of another
 289 FROM-list, then we can merge the two FROM-lists together.  Once that's
 290 done, the subquery is an absolutely integral part of the outer query and
 291 will not constrain the join tree search space at all.  However, that could
 292 result in unpleasant growth of planning time, since the dynamic-programming
 293 search has runtime exponential in the number of FROM-items considered.
 294 Therefore, we don't merge FROM-lists if the result would have too many
 295 FROM-items in one list.
 296
 297
 298 Vars and PlaceHolderVars
 299 ------------------------
 300
 301 A Var node is simply the parse-tree representation of a table column
 302 reference.  However, in the presence of outer joins, that concept is
 303 more subtle than it might seem.  We need to distinguish the values of
 304 a Var "above" and "below" any outer join that could force the Var to
 305 null.  As an example, consider
 306
 307         SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z)
 308
 309 (Assume foo() is not strict, so that we can't reduce the left join to
 310 a plain join.)  A naive implementation might try to push the foo(t2.z)
 311 call down to the scan of t2, but that is not correct because
 312 (a) what foo() should actually see for a null-extended join row is NULL,
 313 and (b) if foo() returns false, we should suppress the t1 row from the
 314 join altogether, not emit it with a null-extended t2 row.  On the other
 315 hand, it *would* be correct (and desirable) to push that call down to
 316 the scan level if the query were
 317
 318         SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y AND foo(t2.z))
 319
 320 This motivates considering "t2.z" within the left join's ON clause
 321 to be a different value from "t2.z" outside the JOIN clause.  The
 322 former can be identified with t2.z as seen at the relation scan level,
 323 but the latter can't.
 324
 325 Another example occurs in connection with EquivalenceClasses (discussed
 326 below).  Given
 327
 328         SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE t1.x = 42
 329
 330 we would like to use the EquivalenceClass mechanisms to derive "t2.y = 42"
 331 to use as a restriction clause for the scan of t2.  (That works, because t2
 332 rows having y different from 42 cannot affect the query result.)  However,
 333 it'd be wrong to conclude that t2.y will be equal to t1.x in every joined
 334 row.  Part of the solution to this problem is to deem that "t2.y" in the
 335 ON clause refers to the relation-scan-level value of t2.y, but not to the
 336 value that y will have in joined rows, where it might be NULL rather than
 337 equal to t1.x.
 338
 339 Therefore, Var nodes are decorated with "varnullingrels", which are sets
 340 of the rangetable indexes of outer joins that potentially null the Var
 341 at the point where it appears in the query.  (Using a set, not an ordered
 342 list, is fine since it doesn't matter which join forced the value to null;
 343 and that avoids having to change the representation when we consider
 344 different outer-join orders.)  In the examples above, all occurrences of
 345 t1.x would have empty varnullingrels, since the left join doesn't null t1.
 346 The t2 references within the JOIN ON clauses would also have empty
 347 varnullingrels.  But outside the JOIN clauses, any Vars referencing t2
 348 would have varnullingrels containing the index of the JOIN's rangetable
 349 entry (RTE), so that they'd be understood as potentially different from
 350 the t2 values seen at scan level.  Labeling t2.z in the WHERE clause with
 351 the JOIN's RT index lets us recognize that that occurrence of foo(t2.z)
 352 cannot be pushed down to the t2 scan level: we cannot evaluate that value
 353 at the scan level, but only after the join has been done.
 354
 355 For LEFT and RIGHT outer joins, only Vars coming from the nullable side
 356 of the join are marked with that join's RT index.  For FULL joins, Vars
 357 from both inputs are marked.  (Such marking doesn't let us tell which
 358 side of the full join a Var came from; but that information can be found
 359 elsewhere at need.)
 360
 361 Notionally, a Var having nonempty varnullingrels can be thought of as
 362         CASE WHEN any-of-these-outer-joins-produced-a-null-extended-row
 363           THEN NULL
 364           ELSE the-scan-level-value-of-the-column
 365           END
 366 It's only notional, because no such calculation is ever done explicitly.
 367 In a finished plan, Vars occurring in scan-level plan nodes represent
 368 the actual table column values, but upper-level Vars are always
 369 references to outputs of lower-level plan nodes.  When a join node emits
 370 a null-extended row, it just returns nulls for the relevant output
 371 columns rather than copying up values from its input.  Because we don't
 372 ever have to do this calculation explicitly, it's not necessary to
 373 distinguish which side of an outer join got null-extended, which'd
 374 otherwise be essential information for FULL JOIN cases.
 375
 376 Outer join identity 3 (discussed above) complicates this picture
 377 a bit.  In the form
 378         A leftjoin (B leftjoin C on (Pbc)) on (Pab)
 379 all of the Vars in clauses Pbc and Pab will have empty varnullingrels,
 380 but if we start with
 381         (A leftjoin B on (Pab)) leftjoin C on (Pbc)
 382 then the parser will have marked Pbc's B Vars with the A/B join's
 383 RT index, making this form artificially different from the first.
 384 For discussion's sake, let's denote this marking with a star:
 385         (A leftjoin B on (Pab)) leftjoin C on (Pb*c)
 386 To cope with this, once we have detected that commuting these joins
 387 is legal, we generate both the Pbc and Pb*c forms of that ON clause,
 388 by either removing or adding the first join's RT index in the B Vars
 389 that the parser created.  While generating paths for a plan step that
 390 joins B and C, we include as a relevant join qual only the form that
 391 is appropriate depending on whether A has already been joined to B.
 392
 393 It's also worth noting that identity 3 makes "the left join's RT index"
 394 itself a bit of a fuzzy concept, since the syntactic scope of each join
 395 RTE will depend on which form was produced by the parser.  We resolve
 396 this by considering that a left join's identity is determined by its
 397 minimum set of right-hand-side input relations.  In both forms allowed
 398 by identity 3, we can identify the first join as having minimum RHS B
 399 and the second join as having minimum RHS C.
 400
 401 Another thing to notice is that C Vars appearing outside the nested
 402 JOIN clauses will be marked as nulled by both left joins if the
 403 original parser input was in the first form of identity 3, but if the
 404 parser input was in the second form, such Vars will only be marked as
 405 nulled by the second join.  This is not really a semantic problem:
 406 such Vars will be marked the same way throughout the upper part of the
 407 query, so they will all look equal() which is correct; and they will not
 408 look equal() to any C Var appearing in the JOIN ON clause or below these
 409 joins.  However, when building Vars representing the outputs of join
 410 relations, we need to ensure that their varnullingrels are set to
 411 values consistent with the syntactic join order, so that they will
 412 appear equal() to pre-existing Vars in the upper part of the query.
 413
 414 Outer joins also complicate handling of subquery pull-up.  Consider
 415
 416         SELECT ..., ss.x FROM tab1
 417           LEFT JOIN (SELECT *, 42 AS x FROM tab2) ss ON ...
 418
 419 We want to be able to pull up the subquery as discussed previously,
 420 but we can't just replace the "ss.x" Var in the top-level SELECT list
 421 with the constant 42.  That'd result in always emitting 42, rather
 422 than emitting NULL in null-extended join rows.
 423
 424 To solve this, we introduce the concept of PlaceHolderVars.
 425 A PlaceHolderVar is somewhat like a Var, in that its value originates
 426 at a relation scan level and can then be forced to null by higher-level
 427 outer joins; hence PlaceHolderVars carry a set of nulling rel IDs just
 428 like Vars.  Unlike a Var, whose original value comes from a table,
 429 a PlaceHolderVar's original value is defined by a query-determined
 430 expression ("42" in this example); so we represent the PlaceHolderVar
 431 as a node with that expression as child.  We insert a PlaceHolderVar
 432 whenever subquery pullup needs to replace a subquery-referencing Var
 433 that has nonempty varnullingrels with an expression that is not simply a
 434 Var.  (When the replacement expression is a pulled-up Var, we can just
 435 add the replaced Var's varnullingrels to its set.  Also, if the replaced
 436 Var has empty varnullingrels, we don't need a PlaceHolderVar: there is
 437 nothing that'd force the value to null, so the pulled-up expression is
 438 fine to use as-is.)  In a finished plan, a PlaceHolderVar becomes just
 439 the contained expression at whatever plan level it's supposed to be
 440 evaluated at, and then upper-level occurrences are replaced by Var
 441 references to that output column of the lower plan level.  That causes
 442 the value to go to null when appropriate at an outer join, in the same
 443 way as for normal Vars.  Thus, PlaceHolderVars are never seen outside
 444 the planner.
 445
 446 PlaceHolderVars (PHVs) are more complicated than Vars in another way:
 447 their original value might need to be calculated at a join, not a
 448 base-level relation scan.  This can happen when a pulled-up subquery
 449 contains a join.  Because of this, a PHV can create a join order
 450 constraint that wouldn't otherwise exist, to ensure that it can
 451 be calculated before it is used.  A PHV's expression can also contain
 452 LATERAL references, adding complications that are discussed below.
 453
 454
 455 Relation Identification and Qual Clause Placement
 456 -------------------------------------------------
 457
 458 A qual clause obtained from WHERE or JOIN/ON can be enforced at the lowest
 459 scan or join level that includes all relations used in the clause.  For
 460 this purpose we consider that outer joins listed in varnullingrels or
 461 phnullingrels are used in the clause, since we can't compute the qual's
 462 result correctly until we know whether such Vars have gone to null.
 463
 464 The one exception to this general rule is that a non-degenerate outer
 465 JOIN/ON qual (one that references the non-nullable side of the join)
 466 cannot be enforced below that join, even if it doesn't reference the
 467 nullable side.  Pushing it down into the non-nullable side would result
 468 in rows disappearing from the join's result, rather than appearing as
 469 null-extended rows.  To handle that, when we identify such a qual we
 470 artificially add the join's minimum input relid set to the set of
 471 relations it is considered to use, forcing it to be evaluated exactly at
 472 that join level.  The same happens for outer-join quals that mention no
 473 relations at all.
 474
 475 When attaching a qual clause to a join plan node that is performing an
 476 outer join, the qual clause is considered a "join clause" (that is, it is
 477 applied before the join performs null-extension) if it does not reference
 478 that outer join in any varnullingrels or phnullingrels set, or a "filter
 479 clause" (applied after null-extension) if it does reference that outer
 480 join.  A qual clause that originally appeared in that outer join's JOIN/ON
 481 will fall into the first category, since the parser would not have marked
 482 any of its Vars as referencing the outer join.  A qual clause that
 483 originally came from some upper ON clause or WHERE clause will be seen as
 484 referencing the outer join if it references any of the nullable side's
 485 Vars, since those Vars will be so marked by the parser.  But, if such a
 486 qual does not reference any nullable-side Vars, it's okay to push it down
 487 into the non-nullable side, so it won't get attached to the join node in
 488 the first place.
 489
 490 These things lead us to identify join relations within the planner
 491 by the sets of base relation RT indexes plus outer join RT indexes
 492 that they include.  In that way, the sets of relations used by qual
 493 clauses can be directly compared to join relations' relid sets to
 494 see where to place the clauses.  These identifying sets are unique
 495 because, for any given collection of base relations, there is only
 496 one valid set of outer joins to have performed along the way to
 497 joining that set of base relations (although the order of applying
 498 them could vary, as discussed above).
 499
 500 SEMI joins do not have RT indexes, because they are artifacts made by
 501 the planner rather than the parser.  (We could create rangetable
 502 entries for them, but there seems no need at present.)  This does not
 503 cause a problem for qual placement, because the nullable side of a
 504 semijoin is not referenceable from above the join, so there is never a
 505 need to cite it in varnullingrels or phnullingrels.  It does not cause a
 506 problem for join relation identification either, since whether a semijoin
 507 has been completed is again implicit in the set of base relations
 508 included in the join.
 509
 510 As usual, outer join identity 3 complicates matters.  If we start with
 511         (A leftjoin B on (Pab)) leftjoin C on (Pbc)
 512 then the parser will have marked any C Vars appearing above these joins
 513 with the RT index of the B/C join.  If we now transform to
 514         A leftjoin (B leftjoin C on (Pbc)) on (Pab)
 515 then it would appear that a clause using only such Vars could be pushed
 516 down and applied as a filter clause (not a join clause) at the lower
 517 B/C join.  But *this might not give the right answer* since the clause
 518 might see a non-null value for the C Var that will be replaced by null
 519 once the A/B join is performed.  We handle this by saying that the
 520 pushed-down join hasn't completely performed the work of the B/C join
 521 and hence is not entitled to include that outer join relid in its
 522 relid set.  When we form the A/B join, both outer joins' relids will
 523 be added to its relid set, and then the upper clause will be applied
 524 at the correct join level.  (Note there is no problem when identity 3
 525 is applied in the other direction: if we started with the second form
 526 then upper C Vars are marked with both outer join relids, so they
 527 cannot drop below whichever join is applied second.)  Similarly,
 528 Vars representing the output of a pushed-down join do not acquire
 529 nullingrel bits for that join until after the upper join is performed.
 530
 531 There is one additional complication for qual clause placement, which
 532 occurs when we have made multiple versions of an outer-join clause as
 533 described previously (that is, we have both "Pbc" and "Pb*c" forms of
 534 the same clause seen in outer join identity 3).  When forming an outer
 535 join we only want to apply one of the redundant versions of the clause.
 536 If we are forming the B/C join without having yet computed the A/B
 537 join, it's easy to reject the "Pb*c" form since its required relid
 538 set includes the A/B join relid which is not in the input.  However,
 539 if we form B/C after A/B, then both forms of the clause are applicable
 540 so far as that test can tell.  We have to look more closely to notice
 541 that the "Pbc" clause form refers to relation B which is no longer
 542 directly accessible.  While such a check could be performed using the
 543 per-relation RelOptInfo.nulling_relids data, it would be annoyingly
 544 expensive to do over and over as we consider different join paths.
 545 To make this simple and reliable, we compute an "incompatible_relids"
 546 set for each variant version (clone) of a redundant clause.  A clone
 547 clause should not be applied if any of the outer-join relids listed in
 548 incompatible_relids has already been computed below the current join.
 549
 550
 551 Optimizer Functions
 552 -------------------
 553
 554 The primary entry point is planner().
 555
 556 planner()
 557 set up for recursive handling of subqueries
 558 -subquery_planner()
 559  pull up sublinks and subqueries from rangetable, if possible
 560  canonicalize qual
 561      Attempt to simplify WHERE clause to the most useful form; this includes
 562      flattening nested AND/ORs and detecting clauses that are duplicated in
 563      different branches of an OR.
 564  simplify constant expressions
 565  process sublinks
 566  convert Vars of outer query levels into Params
 567 --grouping_planner()
 568   preprocess target list for non-SELECT queries
 569   handle UNION/INTERSECT/EXCEPT, GROUP BY, HAVING, aggregates,
 570         ORDER BY, DISTINCT, LIMIT
 571 ---query_planner()
 572    make list of base relations used in query
 573    split up the qual into restrictions (a=1) and joins (b=c)
 574    find qual clauses that enable merge and hash joins
 575 ----make_one_rel()
 576      set_base_rel_pathlists()
 577       find seqscan and all index paths for each base relation
 578       find selectivity of columns used in joins
 579      make_rel_from_joinlist()
 580       hand off join subproblems to a plugin, GEQO, or standard_join_search()
 581 ------standard_join_search()
 582       call join_search_one_level() for each level of join tree needed
 583       join_search_one_level():
 584         For each joinrel of the prior level, do make_rels_by_clause_joins()
 585         if it has join clauses, or make_rels_by_clauseless_joins() if not.
 586         Also generate "bushy plan" joins between joinrels of lower levels.
 587       Back at standard_join_search(), generate gather paths if needed for
 588       each newly constructed joinrel, then apply set_cheapest() to extract
 589       the cheapest path for it.
 590       Loop back if this wasn't the top join level.
 591   Back at grouping_planner:
 592   do grouping (GROUP BY) and aggregation
 593   do window functions
 594   make unique (DISTINCT)
 595   do sorting (ORDER BY)
 596   do limit (LIMIT/OFFSET)
 597 Back at planner():
 598 convert finished Path tree into a Plan tree
 599 do final cleanup after planning
 600
 601
 602 Optimizer Data Structures
 603 -------------------------
 604
 605 PlannerGlobal   - global information for a single planner invocation
 606
 607 PlannerInfo     - information for planning a particular Query (we make
 608                   a separate PlannerInfo node for each sub-Query)
 609
 610 RelOptInfo      - a relation or joined relations
 611
 612  RestrictInfo   - WHERE clauses, like "x = 3" or "y = z"
 613                   (note the same structure is used for restriction and
 614                    join clauses)
 615
 616  Path           - every way to generate a RelOptInfo(sequential,index,joins)
 617   A plain Path node can represent several simple plans, per its pathtype:
 618     T_SeqScan   - sequential scan
 619     T_SampleScan - tablesample scan
 620     T_FunctionScan - function-in-FROM scan
 621     T_TableFuncScan - table function scan
 622     T_ValuesScan - VALUES scan
 623     T_CteScan   - CTE (WITH) scan
 624     T_NamedTuplestoreScan - ENR scan
 625     T_WorkTableScan - scan worktable of a recursive CTE
 626     T_Result    - childless Result plan node (used for FROM-less SELECT)
 627   IndexPath     - index scan
 628   BitmapHeapPath - top of a bitmapped index scan
 629   TidPath       - scan by CTID
 630   TidRangePath  - scan a contiguous range of CTIDs
 631   SubqueryScanPath - scan a subquery-in-FROM
 632   ForeignPath   - scan a foreign table, foreign join or foreign upper-relation
 633   CustomPath    - for custom scan providers
 634   AppendPath    - append multiple subpaths together
 635   MergeAppendPath - merge multiple subpaths, preserving their common sort order
 636   GroupResultPath - childless Result plan node (used for degenerate grouping)
 637   MaterialPath  - a Material plan node
 638   MemoizePath   - a Memoize plan node for caching tuples from sub-paths
 639   UniquePath    - remove duplicate rows (either by hashing or sorting)
 640   GatherPath    - collect the results of parallel workers
 641   GatherMergePath - collect parallel results, preserving their common sort order
 642   ProjectionPath - a Result plan node with child (used for projection)
 643   ProjectSetPath - a ProjectSet plan node applied to some sub-path
 644   SortPath      - a Sort plan node applied to some sub-path
 645   IncrementalSortPath - an IncrementalSort plan node applied to some sub-path
 646   GroupPath     - a Group plan node applied to some sub-path
 647   UpperUniquePath - a Unique plan node applied to some sub-path
 648   AggPath       - an Agg plan node applied to some sub-path
 649   GroupingSetsPath - an Agg plan node used to implement GROUPING SETS
 650   MinMaxAggPath - a Result plan node with subplans performing MIN/MAX
 651   WindowAggPath - a WindowAgg plan node applied to some sub-path
 652   SetOpPath     - a SetOp plan node applied to two sub-paths
 653   RecursiveUnionPath - a RecursiveUnion plan node applied to two sub-paths
 654   LockRowsPath  - a LockRows plan node applied to some sub-path
 655   ModifyTablePath - a ModifyTable plan node applied to some sub-path(s)
 656   LimitPath     - a Limit plan node applied to some sub-path
 657   NestPath      - nested-loop joins
 658   MergePath     - merge joins
 659   HashPath      - hash joins
 660
 661  EquivalenceClass - a data structure representing a set of values known equal
 662
 663  PathKey        - a data structure representing the sort ordering of a path
 664
 665 The optimizer spends a good deal of its time worrying about the ordering
 666 of the tuples returned by a path.  The reason this is useful is that by
 667 knowing the sort ordering of a path, we may be able to use that path as
 668 the left or right input of a mergejoin and avoid an explicit sort step.
 669 Nestloops and hash joins don't really care what the order of their inputs
 670 is, but mergejoin needs suitably ordered inputs.  Therefore, all paths
 671 generated during the optimization process are marked with their sort order
 672 (to the extent that it is known) for possible use by a higher-level merge.
 673
 674 It is also possible to avoid an explicit sort step to implement a user's
 675 ORDER BY clause if the final path has the right ordering already, so the
 676 sort ordering is of interest even at the top level.  grouping_planner() will
 677 look for the cheapest path with a sort order matching the desired order,
 678 then compare its cost to the cost of using the cheapest-overall path and
 679 doing an explicit sort on that.
 680
 681 When we are generating paths for a particular RelOptInfo, we discard a path
 682 if it is more expensive than another known path that has the same or better
 683 sort order.  We will never discard a path that is the only known way to
 684 achieve a given sort order (without an explicit sort, that is).  In this
 685 way, the next level up will have the maximum freedom to build mergejoins
 686 without sorting, since it can pick from any of the paths retained for its
 687 inputs.
 688
 689
 690 EquivalenceClasses
 691 ------------------
 692
 693 During the deconstruct_jointree() scan of the query's qual clauses, we
 694 look for mergejoinable equality clauses A = B.  When we find one, we
 695 create an EquivalenceClass containing the expressions A and B to record
 696 that they are equal.  If we later find another equivalence clause B = C,
 697 we add C to the existing EquivalenceClass for {A B}; this may require
 698 merging two existing EquivalenceClasses.  At the end of the scan, we have
 699 sets of values that are known all transitively equal to each other.  We can
 700 therefore use a comparison of any pair of the values as a restriction or
 701 join clause (when these values are available at the scan or join, of
 702 course); furthermore, we need test only one such comparison, not all of
 703 them.  Therefore, equivalence clauses are removed from the standard qual
 704 distribution process.  Instead, when preparing a restriction or join clause
 705 list, we examine each EquivalenceClass to see if it can contribute a
 706 clause, and if so we select an appropriate pair of values to compare.  For
 707 example, if we are trying to join A's relation to C's, we can generate the
 708 clause A = C, even though this appeared nowhere explicitly in the original
 709 query.  This may allow us to explore join paths that otherwise would have
 710 been rejected as requiring Cartesian-product joins.
 711
 712 Sometimes an EquivalenceClass may contain a pseudo-constant expression
 713 (i.e., one not containing Vars or Aggs of the current query level, nor
 714 volatile functions).  In this case we do not follow the policy of
 715 dynamically generating join clauses: instead, we dynamically generate
 716 restriction clauses "var = const" wherever one of the variable members of
 717 the class can first be computed.  For example, if we have A = B and B = 42,
 718 we effectively generate the restriction clauses A = 42 and B = 42, and then
 719 we need not bother with explicitly testing the join clause A = B when the
 720 relations are joined.  In effect, all the class members can be tested at
 721 relation-scan level and there's never a need for join tests.
 722
 723 The precise technical interpretation of an EquivalenceClass is that it
 724 asserts that at any plan node where more than one of its member values
 725 can be computed, output rows in which the values are not all equal may
 726 be discarded without affecting the query result.  (We require all levels
 727 of the plan to enforce EquivalenceClasses, hence a join need not recheck
 728 equality of values that were computable by one of its children.)
 729
 730 Outer joins complicate this picture quite a bit, however.  While we could
 731 theoretically use mergejoinable equality clauses that appear in outer-join
 732 conditions as sources of EquivalenceClasses, there's a serious difficulty:
 733 the resulting deductions are not valid everywhere.  For example, given
 734
 735         SELECT * FROM a LEFT JOIN b ON (a.x = b.y AND a.x = 42);
 736
 737 we can safely derive b.y = 42 and use that in the scan of B, because B
 738 rows not having b.y = 42 will not contribute to the join result.  However,
 739 we cannot apply a.x = 42 at the scan of A, or we will remove rows that
 740 should appear in the join result.  We could apply a.x = 42 as an outer join
 741 condition (and then it would be unnecessary to also check a.x = b.y).
 742 This is not yet implemented, however.
 743
 744 A related issue is that constants appearing below an outer join are
 745 less constant than they appear.  Ordinarily, if we find "A = 1" and
 746 "B = 1", it's okay to put A and B into the same EquivalenceClass.
 747 But consider
 748
 749         SELECT * FROM a
 750           LEFT JOIN (SELECT * FROM b WHERE b.z = 1) b ON (a.x = b.y)
 751         WHERE a.x = 1;
 752
 753 It would be a serious error to conclude that a.x = b.z, so we cannot
 754 form a single EquivalenceClass {a.x b.z 1}.
 755
 756 This leads to considering EquivalenceClasses as applying within "join
 757 domains", which are sets of relations that are inner-joined to each other.
 758 (We can treat semijoins as if they were inner joins for this purpose.)
 759 There is a top-level join domain, and then each outer join in the query
 760 creates a new join domain comprising its nullable side.  Full joins create
 761 two join domains, one for each side.  EquivalenceClasses generated from
 762 WHERE are associated with the top-level join domain.  EquivalenceClasses
 763 generated from the ON clause of an outer join are associated with the
 764 domain created by that outer join.  EquivalenceClasses generated from the
 765 ON clause of an inner or semi join are associated with the syntactically
 766 most closely nested join domain.
 767
 768 Having defined these domains, we can fix the not-so-constant-constants
 769 problem by considering that constants only match EquivalenceClass members
 770 when they come from clauses within the same join domain.  In the above
 771 example, this means we keep {a.x 1} and {b.z 1} as separate
 772 EquivalenceClasses and don't erroneously merge them.  We don't have to
 773 worry about this for Vars (or expressions containing Vars), because
 774 references to the "same" column from different join domains will have
 775 different varnullingrels and thus won't be equal() anyway.
 776
 777 In the future, the join-domain concept may allow us to treat mergejoinable
 778 outer-join conditions as sources of EquivalenceClasses.  The idea would be
 779 that conditions derived from such classes could only be enforced at scans
 780 or joins that are within the appropriate join domain.  This is not
 781 implemented yet, however, as the details are trickier than they appear.
 782
 783 Another instructive example is:
 784
 785         SELECT *
 786           FROM a LEFT JOIN
 787                (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
 788                ON a.x = ss.y
 789           ORDER BY ss.y;
 790
 791 We can form the EquivalenceClass {b.y c.z 10} and thereby apply c.z = 10
 792 while scanning C, as well as b.y = 10 while scanning B, so that no clause
 793 needs to be checked at the inner join.  The left-join clause "a.x = ss.y"
 794 (really "a.x = b.y") is not considered an equivalence clause, so we do
 795 not insert a.x into that same EquivalenceClass; if we did, we'd falsely
 796 conclude a.x = 10.  In the future though we might be able to do that,
 797 if we can keep from applying a.x = 10 at the scan of A, which in principle
 798 we could do by noting that the EquivalenceClass only applies within the
 799 {B,C} join domain.
 800
 801 Also notice that ss.y in the ORDER BY is really b.y* (that is, the
 802 possibly-nulled form of b.y), so we will not confuse it with the b.y member
 803 of the lower EquivalenceClass.  Thus, we won't mistakenly conclude that
 804 that ss.y is equal to a constant, which if true would lead us to think that
 805 sorting for the ORDER BY is unnecessary (see discussion of PathKeys below).
 806 Instead, there will be a separate EquivalenceClass containing only b.y*,
 807 which will form the basis for the PathKey describing the required sort
 808 order.
 809
 810 Also consider this variant:
 811
 812         SELECT *
 813           FROM a LEFT JOIN
 814                (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
 815                ON a.x = ss.y
 816           WHERE a.x = 42;
 817
 818 We still form the EquivalenceClass {b.y c.z 10}, and additionally
 819 we have an EquivalenceClass {a.x 42} belonging to a different join domain.
 820 We cannot use "a.x = b.y" to merge these classes.  However, we can compare
 821 that outer join clause to the existing EquivalenceClasses and form the
 822 derived clause "b.y = 42", which we can treat as a valid equivalence
 823 within the lower join domain (since no row of that domain not having
 824 b.y = 42 can contribute to the outer-join result).  That makes the lower
 825 EquivalenceClass {42 b.y c.z 10}, resulting in the contradiction 10 = 42,
 826 which lets the planner deduce that the B/C join need not be computed at
 827 all: the result of that whole join domain can be forced to empty.
 828 (This gets implemented as a gating Result filter, since more usually the
 829 potential contradiction involves Param values rather than just Consts, and
 830 thus it has to be checked at runtime.  We can use the join domain to
 831 determine the join level at which to place the gating condition.)
 832
 833 There is an additional complication when re-ordering outer joins according
 834 to identity 3.  Recall that the two choices we consider for such joins are
 835
 836         A leftjoin (B leftjoin C on (Pbc)) on (Pab)
 837         (A leftjoin B on (Pab)) leftjoin C on (Pb*c)
 838
 839 where the star denotes varnullingrels markers on B's Vars.  When Pbc
 840 is (or includes) a mergejoinable clause, we have something like
 841
 842         A leftjoin (B leftjoin C on (b.b = c.c)) on (Pab)
 843         (A leftjoin B on (Pab)) leftjoin C on (b.b* = c.c)
 844
 845 We could generate an EquivalenceClause linking b.b and c.c, but if we
 846 then also try to link b.b* and c.c, we end with a nonsensical conclusion
 847 that b.b and b.b* are equal (at least in some parts of the plan tree).
 848 In any case, the conclusions we could derive from such a thing would be
 849 largely duplicative.  Conditions involving b.b* can't be computed below
 850 this join nest, while any conditions that can be computed would be
 851 duplicative of what we'd get from the b.b/c.c combination.  Therefore,
 852 we choose to generate an EquivalenceClause linking b.b and c.c, but
 853 "b.b* = c.c" is handled as just an ordinary clause.
 854
 855 To aid in determining the sort ordering(s) that can work with a mergejoin,
 856 we mark each mergejoinable clause with the EquivalenceClasses of its left
 857 and right inputs.  For an equivalence clause, these are of course the same
 858 EquivalenceClass.  For a non-equivalence mergejoinable clause (such as an
 859 outer-join qualification), we generate two separate EquivalenceClasses for
 860 the left and right inputs.  This may result in creating single-item
 861 equivalence "classes", though of course these are still subject to merging
 862 if other equivalence clauses are later found to bear on the same
 863 expressions.
 864
 865 Another way that we may form a single-item EquivalenceClass is in creation
 866 of a PathKey to represent a desired sort order (see below).  This happens
 867 if an ORDER BY or GROUP BY key is not mentioned in any equivalence
 868 clause.  We need to reason about sort orders in such queries, and our
 869 representation of sort ordering is a PathKey which depends on an
 870 EquivalenceClass, so we have to make an EquivalenceClass.  This is a bit
 871 different from the above cases because such an EquivalenceClass might
 872 contain an aggregate function or volatile expression.  (A clause containing
 873 a volatile function will never be considered mergejoinable, even if its top
 874 operator is mergejoinable, so there is no way for a volatile expression to
 875 get into EquivalenceClasses otherwise.  Aggregates are disallowed in WHERE
 876 altogether, so will never be found in a mergejoinable clause.)  This is just
 877 a convenience to maintain a uniform PathKey representation: such an
 878 EquivalenceClass will never be merged with any other.  Note in particular
 879 that a single-item EquivalenceClass {a.x} is *not* meant to imply an
 880 assertion that a.x = a.x; the practical effect of this is that a.x could
 881 be NULL.
 882
 883 An EquivalenceClass also contains a list of btree opfamily OIDs, which
 884 determines what the equalities it represents actually "mean".  All the
 885 equivalence clauses that contribute to an EquivalenceClass must have
 886 equality operators that belong to the same set of opfamilies.  (Note: most
 887 of the time, a particular equality operator belongs to only one family, but
 888 it's possible that it belongs to more than one.  We keep track of all the
 889 families to ensure that we can make use of an index belonging to any one of
 890 the families for mergejoin purposes.)
 891
 892 For the same sort of reason, an EquivalenceClass is also associated
 893 with a particular collation, if its datatype(s) care about collation.
 894
 895 An EquivalenceClass can contain "em_is_child" members, which are copies
 896 of members that contain appendrel parent relation Vars, transposed to
 897 contain the equivalent child-relation variables or expressions.  These
 898 members are *not* full-fledged members of the EquivalenceClass and do not
 899 affect the class's overall properties at all.  They are kept only to
 900 simplify matching of child-relation expressions to EquivalenceClasses.
 901 Most operations on EquivalenceClasses should ignore child members.
 902
 903
 904 PathKeys
 905 --------
 906
 907 The PathKeys data structure represents what is known about the sort order
 908 of the tuples generated by a particular Path.  A path's pathkeys field is a
 909 list of PathKey nodes, where the n'th item represents the n'th sort key of
 910 the result.  Each PathKey contains these fields:
 911
 912         * a reference to an EquivalenceClass
 913         * a btree opfamily OID (must match one of those in the EC)
 914         * a sort direction (ascending or descending)
 915         * a nulls-first-or-last flag
 916
 917 The EquivalenceClass represents the value being sorted on.  Since the
 918 various members of an EquivalenceClass are known equal according to the
 919 opfamily, we can consider a path sorted by any one of them to be sorted by
 920 any other too; this is what justifies referencing the whole
 921 EquivalenceClass rather than just one member of it.
 922
 923 In single/base relation RelOptInfo's, the Paths represent various ways
 924 of scanning the relation and the resulting ordering of the tuples.
 925 Sequential scan Paths have NIL pathkeys, indicating no known ordering.
 926 Index scans have Path.pathkeys that represent the chosen index's ordering,
 927 if any.  A single-key index would create a single-PathKey list, while a
 928 multi-column index generates a list with one element per key index column.
 929 Non-key columns specified in the INCLUDE clause of covering indexes don't
 930 have corresponding PathKeys in the list, because they have no influence on
 931 index ordering.  (Actually, since an index can be scanned either forward or
 932 backward, there are two possible sort orders and two possible PathKey lists
 933 it can generate.)
 934
 935 Note that a bitmap scan has NIL pathkeys since we can say nothing about
 936 the overall order of its result.  Also, an indexscan on an unordered type
 937 of index generates NIL pathkeys.  However, we can always create a pathkey
 938 by doing an explicit sort.  The pathkeys for a Sort plan's output just
 939 represent the sort key fields and the ordering operators used.
 940
 941 Things get more interesting when we consider joins.  Suppose we do a
 942 mergejoin between A and B using the mergeclause A.X = B.Y.  The output
 943 of the mergejoin is sorted by X --- but it is also sorted by Y.  Again,
 944 this can be represented by a PathKey referencing an EquivalenceClass
 945 containing both X and Y.
 946
 947 With a little further thought, it becomes apparent that nestloop joins
 948 can also produce sorted output.  For example, if we do a nestloop join
 949 between outer relation A and inner relation B, then any pathkeys relevant
 950 to A are still valid for the join result: we have not altered the order of
 951 the tuples from A.  Even more interesting, if there was an equivalence clause
 952 A.X=B.Y, and A.X was a pathkey for the outer relation A, then we can assert
 953 that B.Y is a pathkey for the join result; X was ordered before and still
 954 is, and the joined values of Y are equal to the joined values of X, so Y
 955 must now be ordered too.  This is true even though we used neither an
 956 explicit sort nor a mergejoin on Y.  (Note: hash joins cannot be counted
 957 on to preserve the order of their outer relation, because the executor
 958 might decide to "batch" the join, so we always set pathkeys to NIL for
 959 a hashjoin path.)
 960
 961 An outer join doesn't preserve the ordering of its nullable input
 962 relation(s), because it might insert nulls at random points in the
 963 ordering.  We don't need to think about this explicitly in the PathKey
 964 representation, because a PathKey representing a post-join variable
 965 will contain varnullingrel bits, making it not equal to a PathKey
 966 representing the pre-join value.
 967
 968 In general, we can justify using EquivalenceClasses as the basis for
 969 pathkeys because, whenever we scan a relation containing multiple
 970 EquivalenceClass members or join two relations each containing
 971 EquivalenceClass members, we apply restriction or join clauses derived from
 972 the EquivalenceClass.  This guarantees that any two values listed in the
 973 EquivalenceClass are in fact equal in all tuples emitted by the scan or
 974 join, and therefore that if the tuples are sorted by one of the values,
 975 they can be considered sorted by any other as well.  It does not matter
 976 whether the test clause is used as a mergeclause, or merely enforced
 977 after-the-fact as a qpqual filter.
 978
 979 Note that there is no particular difficulty in labeling a path's sort
 980 order with a PathKey referencing an EquivalenceClass that contains
 981 variables not yet joined into the path's output.  We can simply ignore
 982 such entries as not being relevant (yet).  This makes it possible to
 983 use the same EquivalenceClasses throughout the join planning process.
 984 In fact, by being careful not to generate multiple identical PathKey
 985 objects, we can reduce comparison of EquivalenceClasses and PathKeys
 986 to simple pointer comparison, which is a huge savings because add_path
 987 has to make a large number of PathKey comparisons in deciding whether
 988 competing Paths are equivalently sorted.
 989
 990 Pathkeys are also useful to represent an ordering that we wish to achieve,
 991 since they are easily compared to the pathkeys of a potential candidate
 992 path.  So, SortGroupClause lists are turned into pathkeys lists for use
 993 inside the optimizer.
 994
 995 An additional refinement we can make is to insist that canonical pathkey
 996 lists (sort orderings) do not mention the same EquivalenceClass more than
 997 once.  For example, in all these cases the second sort column is redundant,
 998 because it cannot distinguish values that are the same according to the
 999 first sort column:
1000         SELECT ... ORDER BY x, x
1001         SELECT ... ORDER BY x, x DESC
1002         SELECT ... WHERE x = y ORDER BY x, y
1003 Although a user probably wouldn't write "ORDER BY x,x" directly, such
1004 redundancies are more probable once equivalence classes have been
1005 considered.  Also, the system may generate redundant pathkey lists when
1006 computing the sort ordering needed for a mergejoin.  By eliminating the
1007 redundancy, we save time and improve planning, since the planner will more
1008 easily recognize equivalent orderings as being equivalent.
1009
1010 Another interesting property is that if the underlying EquivalenceClass
1011 contains a constant, then the pathkey is completely redundant and need not
1012 be sorted by at all!  Every interesting row must contain the same value,
1013 so there's no need to sort.  This might seem pointless because users
1014 are unlikely to write "... WHERE x = 42 ORDER BY x", but it allows us to
1015 recognize when particular index columns are irrelevant to the sort order:
1016 if we have "... WHERE x = 42 ORDER BY y", scanning an index on (x,y)
1017 produces correctly ordered data without a sort step.  We used to have very
1018 ugly ad-hoc code to recognize that in limited contexts, but discarding
1019 constant ECs from pathkeys makes it happen cleanly and automatically.
1020
1021
1022 Order of processing for EquivalenceClasses and PathKeys
1023 -------------------------------------------------------
1024
1025 As alluded to above, there is a specific sequence of phases in the
1026 processing of EquivalenceClasses and PathKeys during planning.  During the
1027 initial scanning of the query's quals (deconstruct_jointree followed by
1028 reconsider_outer_join_clauses), we construct EquivalenceClasses based on
1029 mergejoinable clauses found in the quals.  At the end of this process,
1030 we know all we can know about equivalence of different variables, so
1031 subsequently there will be no further merging of EquivalenceClasses.
1032 At that point it is possible to consider the EquivalenceClasses as
1033 "canonical" and build canonical PathKeys that reference them.  At this
1034 time we construct PathKeys for the query's ORDER BY and related clauses.
1035 (Any ordering expressions that do not appear elsewhere will result in
1036 the creation of new EquivalenceClasses, but this cannot result in merging
1037 existing classes, so canonical-ness is not lost.)
1038
1039 Because all the EquivalenceClasses are known before we begin path
1040 generation, we can use them as a guide to which indexes are of interest:
1041 if an index's column is not mentioned in any EquivalenceClass then that
1042 index's sort order cannot possibly be helpful for the query.  This allows
1043 short-circuiting of much of the processing of create_index_paths() for
1044 irrelevant indexes.
1045
1046 There are some cases where planner.c constructs additional
1047 EquivalenceClasses and PathKeys after query_planner has completed.
1048 In these cases, the extra ECs/PKs are needed to represent sort orders
1049 that were not considered during query_planner.  Such situations should be
1050 minimized since it is impossible for query_planner to return a plan
1051 producing such a sort order, meaning an explicit sort will always be needed.
1052 Currently this happens only for queries involving multiple window functions
1053 with different orderings, for which extra sorts are needed anyway.
1054
1055
1056 Parameterized Paths
1057 -------------------
1058
1059 The naive way to join two relations using a clause like WHERE A.X = B.Y
1060 is to generate a nestloop plan like this:
1061
1062         NestLoop
1063                 Filter: A.X = B.Y
1064                 -> Seq Scan on A
1065                 -> Seq Scan on B
1066
1067 We can make this better by using a merge or hash join, but it still
1068 requires scanning all of both input relations.  If A is very small and B is
1069 very large, but there is an index on B.Y, it can be enormously better to do
1070 something like this:
1071
1072         NestLoop
1073                 -> Seq Scan on A
1074                 -> Index Scan using B_Y_IDX on B
1075                         Index Condition: B.Y = A.X
1076
1077 Here, we are expecting that for each row scanned from A, the nestloop
1078 plan node will pass down the current value of A.X into the scan of B.
1079 That allows the indexscan to treat A.X as a constant for any one
1080 invocation, and thereby use it as an index key.  This is the only plan type
1081 that can avoid fetching all of B, and for small numbers of rows coming from
1082 A, that will dominate every other consideration.  (As A gets larger, this
1083 gets less attractive, and eventually a merge or hash join will win instead.
1084 So we have to cost out all the alternatives to decide what to do.)
1085
1086 It can be useful for the parameter value to be passed down through
1087 intermediate layers of joins, for example:
1088
1089         NestLoop
1090                 -> Seq Scan on A
1091                 Hash Join
1092                         Join Condition: B.Y = C.W
1093                         -> Seq Scan on B
1094                         -> Index Scan using C_Z_IDX on C
1095                                 Index Condition: C.Z = A.X
1096
1097 If all joins are plain inner joins then this is usually unnecessary,
1098 because it's possible to reorder the joins so that a parameter is used
1099 immediately below the nestloop node that provides it.  But in the
1100 presence of outer joins, such join reordering may not be possible.
1101
1102 Also, the bottom-level scan might require parameters from more than one
1103 other relation.  In principle we could join the other relations first
1104 so that all the parameters are supplied from a single nestloop level.
1105 But if those other relations have no join clause in common (which is
1106 common in star-schema queries for instance), the planner won't consider
1107 joining them directly to each other.  In such a case we need to be able
1108 to create a plan like
1109
1110     NestLoop
1111         -> Seq Scan on SmallTable1 A
1112         NestLoop
1113             -> Seq Scan on SmallTable2 B
1114             -> Index Scan using XYIndex on LargeTable C
1115                  Index Condition: C.X = A.AID and C.Y = B.BID
1116
1117 so we should be willing to pass down A.AID through a join even though
1118 there is no join order constraint forcing the plan to look like this.
1119
1120 Before version 9.2, Postgres used ad-hoc methods for planning and
1121 executing nestloop queries of this kind, and those methods could not
1122 handle passing parameters down through multiple join levels.
1123
1124 To plan such queries, we now use a notion of a "parameterized path",
1125 which is a path that makes use of a join clause to a relation that's not
1126 scanned by the path.  In the example two above, we would construct a
1127 path representing the possibility of doing this:
1128
1129         -> Index Scan using C_Z_IDX on C
1130                 Index Condition: C.Z = A.X
1131
1132 This path will be marked as being parameterized by relation A.  (Note that
1133 this is only one of the possible access paths for C; we'd still have a
1134 plain unparameterized seqscan, and perhaps other possibilities.)  The
1135 parameterization marker does not prevent joining the path to B, so one of
1136 the paths generated for the joinrel {B C} will represent
1137
1138         Hash Join
1139                 Join Condition: B.Y = C.W
1140                 -> Seq Scan on B
1141                 -> Index Scan using C_Z_IDX on C
1142                         Index Condition: C.Z = A.X
1143
1144 This path is still marked as being parameterized by A.  When we attempt to
1145 join {B C} to A to form the complete join tree, such a path can only be
1146 used as the inner side of a nestloop join: it will be ignored for other
1147 possible join types.  So we will form a join path representing the query
1148 plan shown above, and it will compete in the usual way with paths built
1149 from non-parameterized scans.
1150
1151 While all ordinary paths for a particular relation generate the same set
1152 of rows (since they must all apply the same set of restriction clauses),
1153 parameterized paths typically generate fewer rows than less-parameterized
1154 paths, since they have additional clauses to work with.  This means we
1155 must consider the number of rows generated as an additional figure of
1156 merit.  A path that costs more than another, but generates fewer rows,
1157 must be kept since the smaller number of rows might save work at some
1158 intermediate join level.  (It would not save anything if joined
1159 immediately to the source of the parameters.)
1160
1161 To keep cost estimation rules relatively simple, we make an implementation
1162 restriction that all paths for a given relation of the same parameterization
1163 (i.e., the same set of outer relations supplying parameters) must have the
1164 same rowcount estimate.  This is justified by insisting that each such path
1165 apply *all* join clauses that are available with the named outer relations.
1166 Different paths might, for instance, choose different join clauses to use
1167 as index clauses; but they must then apply any other join clauses available
1168 from the same outer relations as filter conditions, so that the set of rows
1169 returned is held constant.  This restriction doesn't degrade the quality of
1170 the finished plan: it amounts to saying that we should always push down
1171 movable join clauses to the lowest possible evaluation level, which is a
1172 good thing anyway.  The restriction is useful in particular to support
1173 pre-filtering of join paths in add_path_precheck.  Without this rule we
1174 could never reject a parameterized path in advance of computing its rowcount
1175 estimate, which would greatly reduce the value of the pre-filter mechanism.
1176
1177 To limit planning time, we have to avoid generating an unreasonably large
1178 number of parameterized paths.  We do this by only generating parameterized
1179 relation scan paths for index scans, and then only for indexes for which
1180 suitable join clauses are available.  There are also heuristics in join
1181 planning that try to limit the number of parameterized paths considered.
1182
1183 In particular, there's been a deliberate policy decision to favor hash
1184 joins over merge joins for parameterized join steps (those occurring below
1185 a nestloop that provides parameters to the lower join's inputs).  While we
1186 do not ignore merge joins entirely, joinpath.c does not fully explore the
1187 space of potential merge joins with parameterized inputs.  Also, add_path
1188 treats parameterized paths as having no pathkeys, so that they compete
1189 only on cost and rowcount; they don't get preference for producing a
1190 special sort order.  This creates additional bias against merge joins,
1191 since we might discard a path that could have been useful for performing
1192 a merge without an explicit sort step.  Since a parameterized path must
1193 ultimately be used on the inside of a nestloop, where its sort order is
1194 uninteresting, these choices do not affect any requirement for the final
1195 output order of a query --- they only make it harder to use a merge join
1196 at a lower level.  The savings in planning work justifies that.
1197
1198 Similarly, parameterized paths do not normally get preference in add_path
1199 for having cheap startup cost; that's seldom of much value when on the
1200 inside of a nestloop, so it seems not worth keeping extra paths solely for
1201 that.  An exception occurs for parameterized paths for the RHS relation of
1202 a SEMI or ANTI join: in those cases, we can stop the inner scan after the
1203 first match, so it's primarily startup not total cost that we care about.
1204
1205
1206 LATERAL subqueries
1207 ------------------
1208
1209 As of 9.3 we support SQL-standard LATERAL references from subqueries in
1210 FROM (and also functions in FROM).  The planner implements these by
1211 generating parameterized paths for any RTE that contains lateral
1212 references.  In such cases, *all* paths for that relation will be
1213 parameterized by at least the set of relations used in its lateral
1214 references.  (And in turn, join relations including such a subquery might
1215 not have any unparameterized paths.)  All the other comments made above for
1216 parameterized paths still apply, though; in particular, each such path is
1217 still expected to enforce any join clauses that can be pushed down to it,
1218 so that all paths of the same parameterization have the same rowcount.
1219
1220 We also allow LATERAL subqueries to be flattened (pulled up into the parent
1221 query) by the optimizer, but only when this does not introduce lateral
1222 references into JOIN/ON quals that would refer to relations outside the
1223 lowest outer join at/above that qual.  The semantics of such a qual would
1224 be unclear.  Note that even with this restriction, pullup of a LATERAL
1225 subquery can result in creating PlaceHolderVars that contain lateral
1226 references to relations outside their syntactic scope.  We still evaluate
1227 such PHVs at their syntactic location or lower, but the presence of such a
1228 PHV in the quals or targetlist of a plan node requires that node to appear
1229 on the inside of a nestloop join relative to the rel(s) supplying the
1230 lateral reference.  (Perhaps now that that stuff works, we could relax the
1231 pullup restriction?)
1232
1233
1234 Security-level constraints on qual clauses
1235 ------------------------------------------
1236
1237 To support row-level security and security-barrier views efficiently,
1238 we mark qual clauses (RestrictInfo nodes) with a "security_level" field.
1239 The basic concept is that a qual with a lower security_level must be
1240 evaluated before one with a higher security_level.  This ensures that
1241 "leaky" quals that might expose sensitive data are not evaluated until
1242 after the security barrier quals that are supposed to filter out
1243 security-sensitive rows.  However, many qual conditions are "leakproof",
1244 that is we trust the functions they use to not expose data.  To avoid
1245 unnecessarily inefficient plans, a leakproof qual is not delayed by
1246 security-level considerations, even if it has a higher syntactic
1247 security_level than another qual.
1248
1249 In a query that contains no use of RLS or security-barrier views, all
1250 quals will have security_level zero, so that none of these restrictions
1251 kick in; we don't even need to check leakproofness of qual conditions.
1252
1253 If there are security-barrier quals, they get security_level zero (and
1254 possibly higher, if there are multiple layers of barriers).  Regular quals
1255 coming from the query text get a security_level one more than the highest
1256 level used for barrier quals.
1257
1258 When new qual clauses are generated by EquivalenceClass processing,
1259 they must be assigned a security_level.  This is trickier than it seems.
1260 One's first instinct is that it would be safe to use the largest level
1261 found among the source quals for the EquivalenceClass, but that isn't
1262 safe at all, because it allows unwanted delays of security-barrier quals.
1263 Consider a barrier qual "t.x = t.y" plus a query qual "t.x = constant",
1264 and suppose there is another query qual "leaky_function(t.z)" that
1265 we mustn't evaluate before the barrier qual has been checked.
1266 We will have an EC {t.x, t.y, constant} which will lead us to replace
1267 the EC quals with "t.x = constant AND t.y = constant".  (We do not want
1268 to give up that behavior, either, since the latter condition could allow
1269 use of an index on t.y, which we would never discover from the original
1270 quals.)  If these generated quals are assigned the same security_level as
1271 the query quals, then it's possible for the leaky_function qual to be
1272 evaluated first, allowing leaky_function to see data from rows that
1273 possibly don't pass the barrier condition.
1274
1275 Instead, our handling of security levels with ECs works like this:
1276 * Quals are not accepted as source clauses for ECs in the first place
1277 unless they are leakproof or have security_level zero.
1278 * EC-derived quals are assigned the minimum (not maximum) security_level
1279 found among the EC's source clauses.
1280 * If the maximum security_level found among the EC's source clauses is
1281 above zero, then the equality operators selected for derived quals must
1282 be leakproof.  When no such operator can be found, the EC is treated as
1283 "broken" and we fall back to emitting its source clauses without any
1284 additional derived quals.
1285
1286 These rules together ensure that an untrusted qual clause (one with
1287 security_level above zero) cannot cause an EC to generate a leaky derived
1288 clause.  This makes it safe to use the minimum not maximum security_level
1289 for derived clauses.  The rules could result in poor plans due to not
1290 being able to generate derived clauses at all, but the risk of that is
1291 small in practice because most btree equality operators are leakproof.
1292 Also, by making exceptions for level-zero quals, we ensure that there is
1293 no plan degradation when no barrier quals are present.
1294
1295 Once we have security levels assigned to all clauses, enforcement
1296 of barrier-qual ordering restrictions boils down to two rules:
1297
1298 * Table scan plan nodes must not select quals for early execution
1299 (for example, use them as index qualifiers in an indexscan) unless
1300 they are leakproof or have security_level no higher than any other
1301 qual that is due to be executed at the same plan node.  (Use the
1302 utility function restriction_is_securely_promotable() to check
1303 whether it's okay to select a qual for early execution.)
1304
1305 * Normal execution of a list of quals must execute them in an order
1306 that satisfies the same security rule, ie higher security_levels must
1307 be evaluated later unless leakproof.  (This is handled in a single place
1308 by order_qual_clauses() in createplan.c.)
1309
1310 order_qual_clauses() uses a heuristic to decide exactly what to do with
1311 leakproof clauses.  Normally it sorts clauses by security_level then cost,
1312 being careful that the sort is stable so that we don't reorder clauses
1313 without a clear reason.  But this could result in a very expensive qual
1314 being done before a cheaper one that is of higher security_level.
1315 If the cheaper qual is leaky we have no choice, but if it is leakproof
1316 we could put it first.  We choose to sort leakproof quals as if they
1317 have security_level zero, but only when their cost is less than 10X
1318 cpu_operator_cost; that restriction alleviates the opposite problem of
1319 doing expensive quals first just because they're leakproof.
1320
1321 Additional rules will be needed to support safe handling of join quals
1322 when there is a mix of security levels among join quals; for example, it
1323 will be necessary to prevent leaky higher-security-level quals from being
1324 evaluated at a lower join level than other quals of lower security level.
1325 Currently there is no need to consider that since security-prioritized
1326 quals can only be single-table restriction quals coming from RLS policies
1327 or security-barrier views, and security-barrier view subqueries are never
1328 flattened into the parent query.  Hence enforcement of security-prioritized
1329 quals only happens at the table scan level.  With extra rules for safe
1330 handling of security levels among join quals, it should be possible to let
1331 security-barrier views be flattened into the parent query, allowing more
1332 flexibility of planning while still preserving required ordering of qual
1333 evaluation.  But that will come later.
1334
1335
1336 Post scan/join planning
1337 -----------------------
1338
1339 So far we have discussed only scan/join planning, that is, implementation
1340 of the FROM and WHERE clauses of a SQL query.  But the planner must also
1341 determine how to deal with GROUP BY, aggregation, and other higher-level
1342 features of queries; and in many cases there are multiple ways to do these
1343 steps and thus opportunities for optimization choices.  These steps, like
1344 scan/join planning, are handled by constructing Paths representing the
1345 different ways to do a step, then choosing the cheapest Path.
1346
1347 Since all Paths require a RelOptInfo as "parent", we create RelOptInfos
1348 representing the outputs of these upper-level processing steps.  These
1349 RelOptInfos are mostly dummy, but their pathlist lists hold all the Paths
1350 considered useful for each step.  Currently, we may create these types of
1351 additional RelOptInfos during upper-level planning:
1352
1353 UPPERREL_SETOP          result of UNION/INTERSECT/EXCEPT, if any
1354 UPPERREL_PARTIAL_GROUP_AGG      result of partial grouping/aggregation, if any
1355 UPPERREL_GROUP_AGG      result of grouping/aggregation, if any
1356 UPPERREL_WINDOW         result of window functions, if any
1357 UPPERREL_PARTIAL_DISTINCT       result of partial "SELECT DISTINCT", if any
1358 UPPERREL_DISTINCT       result of "SELECT DISTINCT", if any
1359 UPPERREL_ORDERED        result of ORDER BY, if any
1360 UPPERREL_FINAL          result of any remaining top-level actions
1361
1362 UPPERREL_FINAL is used to represent any final processing steps, currently
1363 LockRows (SELECT FOR UPDATE), LIMIT/OFFSET, and ModifyTable.  There is no
1364 flexibility about the order in which these steps are done, and thus no need
1365 to subdivide this stage more finely.
1366
1367 These "upper relations" are identified by the UPPERREL enum values shown
1368 above, plus a relids set, which allows there to be more than one upperrel
1369 of the same kind.  We use NULL for the relids if there's no need for more
1370 than one upperrel of the same kind.  Currently, in fact, the relids set
1371 is vestigial because it's always NULL, but that's expected to change in
1372 the future.  For example, in planning set operations, we might need the
1373 relids to denote which subset of the leaf SELECTs has been combined in a
1374 particular group of Paths that are competing with each other.
1375
1376 The result of subquery_planner() is always returned as a set of Paths
1377 stored in the UPPERREL_FINAL rel with NULL relids.  The other types of
1378 upperrels are created only if needed for the particular query.
1379
1380
1381 Parallel Query and Partial Paths
1382 --------------------------------
1383
1384 Parallel query involves dividing up the work that needs to be performed
1385 either by an entire query or some portion of the query in such a way that
1386 some of that work can be done by one or more worker processes, which are
1387 called parallel workers.  Parallel workers are a subtype of dynamic
1388 background workers; see src/backend/access/transam/README.parallel for a
1389 fuller description.  The academic literature on parallel query suggests
1390 that parallel execution strategies can be divided into essentially two
1391 categories: pipelined parallelism, where the execution of the query is
1392 divided into multiple stages and each stage is handled by a separate
1393 process; and partitioning parallelism, where the data is split between
1394 multiple processes and each process handles a subset of it.  The
1395 literature, however, suggests that gains from pipeline parallelism are
1396 often very limited due to the difficulty of avoiding pipeline stalls.
1397 Consequently, we do not currently attempt to generate query plans that
1398 use this technique.
1399
1400 Instead, we focus on partitioning parallelism, which does not require
1401 that the underlying table be partitioned.  It only requires that (1)
1402 there is some method of dividing the data from at least one of the base
1403 tables involved in the relation across multiple processes, (2) allowing
1404 each process to handle its own portion of the data, and then (3)
1405 collecting the results.  Requirements (2) and (3) are satisfied by the
1406 executor node Gather (or GatherMerge), which launches any number of worker
1407 processes and executes its single child plan in all of them, and perhaps
1408 in the leader also, if the children aren't generating enough data to keep
1409 the leader busy.  Requirement (1) is handled by the table scan node: when
1410 invoked with parallel_aware = true, this node will, in effect, partition
1411 the table on a block by block basis, returning a subset of the tuples from
1412 the relation in each worker where that scan node is executed.
1413
1414 Just as we do for non-parallel access methods, we build Paths to
1415 represent access strategies that can be used in a parallel plan.  These
1416 are, in essence, the same strategies that are available in the
1417 non-parallel plan, but there is an important difference: a path that
1418 will run beneath a Gather node returns only a subset of the query
1419 results in each worker, not all of them.  To form a path that can
1420 actually be executed, the (rather large) cost of the Gather node must be
1421 accounted for.  For this reason among others, paths intended to run
1422 beneath a Gather node - which we call "partial" paths since they return
1423 only a subset of the results in each worker - must be kept separate from
1424 ordinary paths (see RelOptInfo's partial_pathlist and the function
1425 add_partial_path).
1426
1427 One of the keys to making parallel query effective is to run as much of
1428 the query in parallel as possible.  Therefore, we expect it to generally
1429 be desirable to postpone the Gather stage until as near to the top of the
1430 plan as possible.  Expanding the range of cases in which more work can be
1431 pushed below the Gather (and costing them accurately) is likely to keep us
1432 busy for a long time to come.
1433
1434 Partitionwise joins
1435 -------------------
1436
1437 A join between two similarly partitioned tables can be broken down into joins
1438 between their matching partitions if there exists an equi-join condition
1439 between the partition keys of the joining tables. The equi-join between
1440 partition keys implies that all join partners for a given row in one
1441 partitioned table must be in the corresponding partition of the other
1442 partitioned table. Because of this the join between partitioned tables to be
1443 broken into joins between the matching partitions. The resultant join is
1444 partitioned in the same way as the joining relations, thus allowing an N-way
1445 join between similarly partitioned tables having equi-join condition between
1446 their partition keys to be broken down into N-way joins between their matching
1447 partitions. This technique of breaking down a join between partitioned tables
1448 into joins between their partitions is called partitionwise join. We will use
1449 term "partitioned relation" for either a partitioned table or a join between
1450 compatibly partitioned tables.
1451
1452 Even if the joining relations don't have exactly the same partition bounds,
1453 partitionwise join can still be applied by using an advanced
1454 partition-matching algorithm.  For both the joining relations, the algorithm
1455 checks whether every partition of one joining relation only matches one
1456 partition of the other joining relation at most.  In such a case the join
1457 between the joining relations can be broken down into joins between the
1458 matching partitions.  The join relation can then be considered partitioned.
1459 The algorithm produces the pairs of the matching partitions, plus the
1460 partition bounds for the join relation, to allow partitionwise join for
1461 computing the join.  The algorithm is implemented in partition_bounds_merge().
1462 For an N-way join relation considered partitioned this way, not every pair of
1463 joining relations can use partitionwise join.  For example:
1464
1465         (A leftjoin B on (Pab)) innerjoin C on (Pac)
1466
1467 where A, B, and C are partitioned tables, and A has an extra partition
1468 compared to B and C.  When considering partitionwise join for the join {A B},
1469 the extra partition of A doesn't have a matching partition on the nullable
1470 side, which is the case that the current implementation of partitionwise join
1471 can't handle.  So {A B} is not considered partitioned, and the pair of {A B}
1472 and C considered for the 3-way join can't use partitionwise join.  On the
1473 other hand, the pair of {A C} and B can use partitionwise join because {A C}
1474 is considered partitioned by eliminating the extra partition (see identity 1
1475 on outer join reordering).  Whether an N-way join can use partitionwise join
1476 is determined based on the first pair of joining relations that are both
1477 partitioned and can use partitionwise join.
1478
1479 The partitioning properties of a partitioned relation are stored in its
1480 RelOptInfo.  The information about data types of partition keys are stored in
1481 PartitionSchemeData structure. The planner maintains a list of canonical
1482 partition schemes (distinct PartitionSchemeData objects) so that RelOptInfo of
1483 any two partitioned relations with same partitioning scheme point to the same
1484 PartitionSchemeData object.  This reduces memory consumed by
1485 PartitionSchemeData objects and makes it easy to compare the partition schemes
1486 of joining relations.
1487
1488 Partitionwise aggregates/grouping
1489 ---------------------------------
1490
1491 If the GROUP BY clause contains all of the partition keys, all the rows
1492 that belong to a given group must come from a single partition; therefore,
1493 aggregation can be done completely separately for each partition. Otherwise,
1494 partial aggregates can be computed for each partition, and then finalized
1495 after appending the results from the individual partitions.  This technique of
1496 breaking down aggregation or grouping over a partitioned relation into
1497 aggregation or grouping over its partitions is called partitionwise
1498 aggregation.  Especially when the partition keys match the GROUP BY clause,
1499 this can be significantly faster than the regular method.