src/backend/optimizer/README

   1 $PostgreSQL$
   2
   3 Optimizer
   4 =========
   5
   6 These directories take the Query structure returned by the parser, and
   7 generate a plan used by the executor.  The /plan directory generates the
   8 actual output plan, the /path code generates all possible ways to join the
   9 tables, and /prep handles various preprocessing steps for special cases.
  10 /util is utility stuff.  /geqo is the separate "genetic optimization" planner
  11 --- it does a semi-random search through the join tree space, rather than
  12 exhaustively considering all possible join trees.  (But each join considered
  13 by /geqo is given to /path to create paths for, so we consider all possible
  14 implementation paths for each specific join pair even in GEQO mode.)
  15
  16
  17 Paths and Join Pairs
  18 --------------------
  19
  20 During the planning/optimizing process, we build "Path" trees representing
  21 the different ways of doing a query.  We select the cheapest Path that
  22 generates the desired relation and turn it into a Plan to pass to the
  23 executor.  (There is pretty much a one-to-one correspondence between the
  24 Path and Plan trees, but Path nodes omit info that won't be needed during
  25 planning, and include info needed for planning that won't be needed by the
  26 executor.)
  27
  28 The optimizer builds a RelOptInfo structure for each base relation used in
  29 the query.  Base rels are either primitive tables, or subquery subselects
  30 that are planned via a separate recursive invocation of the planner.  A
  31 RelOptInfo is also built for each join relation that is considered during
  32 planning.  A join rel is simply a combination of base rels.  There is only
  33 one join RelOptInfo for any given set of baserels --- for example, the join
  34 {A B C} is represented by the same RelOptInfo no matter whether we build it
  35 by joining A and B first and then adding C, or joining B and C first and
  36 then adding A, etc.  These different means of building the joinrel are
  37 represented as Paths.  For each RelOptInfo we build a list of Paths that
  38 represent plausible ways to implement the scan or join of that relation.
  39 Once we've considered all the plausible Paths for a rel, we select the one
  40 that is cheapest according to the planner's cost estimates.  The final plan
  41 is derived from the cheapest Path for the RelOptInfo that includes all the
  42 base rels of the query.
  43
  44 Possible Paths for a primitive table relation include plain old sequential
  45 scan, plus index scans for any indexes that exist on the table, plus bitmap
  46 index scans using one or more indexes.  A subquery base relation just has
  47 one Path, a "SubqueryScan" path (which links to the subplan that was built
  48 by a recursive invocation of the planner).  Likewise a function-RTE base
  49 relation has only one possible Path.
  50
  51 Joins always occur using two RelOptInfos.  One is outer, the other inner.
  52 Outers drive lookups of values in the inner.  In a nested loop, lookups of
  53 values in the inner occur by scanning the inner path once per outer tuple
  54 to find each matching inner row.  In a mergejoin, inner and outer rows are
  55 ordered, and are accessed in order, so only one scan is required to perform
  56 the entire join: both inner and outer paths are scanned in-sync.  (There's
  57 not a lot of difference between inner and outer in a mergejoin...)  In a
  58 hashjoin, the inner is scanned first and all its rows are entered in a
  59 hashtable, then the outer is scanned and for each row we lookup the join
  60 key in the hashtable.
  61
  62 A Path for a join relation is actually a tree structure, with the top
  63 Path node representing the join method.  It has left and right subpaths
  64 that represent the scan or join methods used for the two input relations.
  65
  66
  67 Join Tree Construction
  68 ----------------------
  69
  70 The optimizer generates optimal query plans by doing a more-or-less
  71 exhaustive search through the ways of executing the query.  The best Path
  72 tree is found by a recursive process:
  73
  74 1) Take each base relation in the query, and make a RelOptInfo structure
  75 for it.  Find each potentially useful way of accessing the relation,
  76 including sequential and index scans, and make Paths representing those
  77 ways.  All the Paths made for a given relation are placed in its
  78 RelOptInfo.pathlist.  (Actually, we discard Paths that are obviously
  79 inferior alternatives before they ever get into the pathlist --- what
  80 ends up in the pathlist is the cheapest way of generating each potentially
  81 useful sort ordering of the relation.)  Also create a RelOptInfo.joininfo
  82 list including all the join clauses that involve this relation.  For
  83 example, the WHERE clause "tab1.col1 = tab2.col1" generates entries in
  84 both tab1 and tab2's joininfo lists.
  85
  86 If we have only a single base relation in the query, we are done.
  87 Otherwise we have to figure out how to join the base relations into a
  88 single join relation.
  89
  90 2) Normally, any explicit JOIN clauses are "flattened" so that we just
  91 have a list of relations to join.  However, FULL OUTER JOIN clauses are
  92 never flattened, and other kinds of JOIN might not be either, if the
  93 flattening process is stopped by join_collapse_limit or from_collapse_limit
  94 restrictions.  Therefore, we end up with a planning problem that contains
  95 lists of relations to be joined in any order, where any individual item
  96 might be a sub-list that has to be joined together before we can consider
  97 joining it to its siblings.  We process these sub-problems recursively,
  98 bottom up.  Note that the join list structure constrains the possible join
  99 orders, but it doesn't constrain the join implementation method at each
 100 join (nestloop, merge, hash), nor does it say which rel is considered outer
 101 or inner at each join.  We consider all these possibilities in building
 102 Paths. We generate a Path for each feasible join method, and select the
 103 cheapest Path.
 104
 105 For each planning problem, therefore, we will have a list of relations
 106 that are either base rels or joinrels constructed per sub-join-lists.
 107 We can join these rels together in any order the planner sees fit.
 108 The standard (non-GEQO) planner does this as follows:
 109
 110 Consider joining each RelOptInfo to each other RelOptInfo for which there
 111 is a usable joinclause, and generate a Path for each possible join method
 112 for each such pair.  (If we have a RelOptInfo with no join clauses, we have
 113 no choice but to generate a clauseless Cartesian-product join; so we
 114 consider joining that rel to each other available rel.  But in the presence
 115 of join clauses we will only consider joins that use available join
 116 clauses.  Note that join-order restrictions induced by outer joins and
 117 IN/EXISTS clauses are also checked, to ensure that we find a workable join
 118 order in cases where those restrictions force a clauseless join to be done.)
 119
 120 If we only had two relations in the list, we are done: we just pick
 121 the cheapest path for the join RelOptInfo.  If we had more than two, we now
 122 need to consider ways of joining join RelOptInfos to each other to make
 123 join RelOptInfos that represent more than two list items.
 124
 125 The join tree is constructed using a "dynamic programming" algorithm:
 126 in the first pass (already described) we consider ways to create join rels
 127 representing exactly two list items.  The second pass considers ways
 128 to make join rels that represent exactly three list items; the next pass,
 129 four items, etc.  The last pass considers how to make the final join
 130 relation that includes all list items --- obviously there can be only one
 131 join rel at this top level, whereas there can be more than one join rel
 132 at lower levels.  At each level we use joins that follow available join
 133 clauses, if possible, just as described for the first level.
 134
 135 For example:
 136
 137     SELECT  *
 138     FROM    tab1, tab2, tab3, tab4
 139     WHERE   tab1.col = tab2.col AND
 140         tab2.col = tab3.col AND
 141         tab3.col = tab4.col
 142
 143     Tables 1, 2, 3, and 4 are joined as:
 144     {1 2},{2 3},{3 4}
 145     {1 2 3},{2 3 4}
 146     {1 2 3 4}
 147     (other possibilities will be excluded for lack of join clauses)
 148
 149     SELECT  *
 150     FROM    tab1, tab2, tab3, tab4
 151     WHERE   tab1.col = tab2.col AND
 152         tab1.col = tab3.col AND
 153         tab1.col = tab4.col
 154
 155     Tables 1, 2, 3, and 4 are joined as:
 156     {1 2},{1 3},{1 4}
 157     {1 2 3},{1 3 4},{1 2 4}
 158     {1 2 3 4}
 159
 160 We consider left-handed plans (the outer rel of an upper join is a joinrel,
 161 but the inner is always a single list item); right-handed plans (outer rel
 162 is always a single item); and bushy plans (both inner and outer can be
 163 joins themselves).  For example, when building {1 2 3 4} we consider
 164 joining {1 2 3} to {4} (left-handed), {4} to {1 2 3} (right-handed), and
 165 {1 2} to {3 4} (bushy), among other choices.  Although the jointree
 166 scanning code produces these potential join combinations one at a time,
 167 all the ways to produce the same set of joined base rels will share the
 168 same RelOptInfo, so the paths produced from different join combinations
 169 that produce equivalent joinrels will compete in add_path().
 170
 171 Once we have built the final join rel, we use either the cheapest path
 172 for it or the cheapest path with the desired ordering (if that's cheaper
 173 than applying a sort to the cheapest other path).
 174
 175 If the query contains one-sided outer joins (LEFT or RIGHT joins), or
 176 IN or EXISTS WHERE clauses that were converted to joins, then some of
 177 the possible join orders may be illegal.  These are excluded by having
 178 join_is_legal consult a side list of such "special" joins to see
 179 whether a proposed join is illegal.  (The same consultation allows it
 180 to see which join style should be applied for a valid join, ie,
 181 JOIN_INNER, JOIN_LEFT, etc.)
 182
 183
 184 Valid OUTER JOIN Optimizations
 185 ------------------------------
 186
 187 The planner's treatment of outer join reordering is based on the following
 188 identities:
 189
 190 1.      (A leftjoin B on (Pab)) innerjoin C on (Pac)
 191         = (A innerjoin C on (Pac)) leftjoin B on (Pab)
 192
 193 where Pac is a predicate referencing A and C, etc (in this case, clearly
 194 Pac cannot reference B, or the transformation is nonsensical).
 195
 196 2.      (A leftjoin B on (Pab)) leftjoin C on (Pac)
 197         = (A leftjoin C on (Pac)) leftjoin B on (Pab)
 198
 199 3.      (A leftjoin B on (Pab)) leftjoin C on (Pbc)
 200         = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
 201
 202 Identity 3 only holds if predicate Pbc must fail for all-null B rows
 203 (that is, Pbc is strict for at least one column of B).  If Pbc is not
 204 strict, the first form might produce some rows with nonnull C columns
 205 where the second form would make those entries null.
 206
 207 RIGHT JOIN is equivalent to LEFT JOIN after switching the two input
 208 tables, so the same identities work for right joins.
 209
 210 An example of a case that does *not* work is moving an innerjoin into or
 211 out of the nullable side of an outer join:
 212
 213         A leftjoin (B join C on (Pbc)) on (Pab)
 214         != (A leftjoin B on (Pab)) join C on (Pbc)
 215
 216 SEMI joins work a little bit differently.  A semijoin can be reassociated
 217 into or out of the lefthand side of another semijoin, but not into or out
 218 of the righthand side.  Likewise, an inner join, left join, or antijoin
 219 can be reassociated into or out of the lefthand side of a semijoin, but
 220 not into or out of the righthand side.
 221
 222 ANTI joins work approximately like LEFT joins, except that identity 3
 223 fails if the join to C is an antijoin (even if Pbc is strict, and in
 224 both the cases where the other join is a leftjoin and where it is an
 225 antijoin).  So we can't reorder antijoins into or out of the RHS of a
 226 leftjoin or antijoin, even if the relevant clause is strict.
 227
 228 The current code does not attempt to re-order FULL JOINs at all.
 229 FULL JOIN ordering is enforced by not collapsing FULL JOIN nodes when
 230 translating the jointree to "joinlist" representation.  Other types of
 231 JOIN nodes are normally collapsed so that they participate fully in the
 232 join order search.  To avoid generating illegal join orders, the planner
 233 creates a SpecialJoinInfo node for each non-inner join, and join_is_legal
 234 checks this list to decide if a proposed join is legal.
 235
 236 What we store in SpecialJoinInfo nodes are the minimum sets of Relids
 237 required on each side of the join to form the outer join.  Note that
 238 these are minimums; there's no explicit maximum, since joining other
 239 rels to the OJ's syntactic rels may be legal.  Per identities 1 and 2,
 240 non-FULL joins can be freely associated into the lefthand side of an
 241 OJ, but in some cases they can't be associated into the righthand side.
 242 So the restriction enforced by join_is_legal is that a proposed join
 243 can't join a rel within or partly within an RHS boundary to one outside
 244 the boundary, unless the join validly implements some outer join.
 245 (To support use of identity 3, we have to allow cases where an apparent
 246 violation of a lower OJ's RHS is committed while forming an upper OJ.
 247 If this wouldn't in fact be legal, the upper OJ's minimum LHS or RHS
 248 set must be expanded to include the whole of the lower OJ, thereby
 249 preventing it from being formed before the lower OJ is.)
 250
 251
 252 Pulling Up Subqueries
 253 ---------------------
 254
 255 As we described above, a subquery appearing in the range table is planned
 256 independently and treated as a "black box" during planning of the outer
 257 query.  This is necessary when the subquery uses features such as
 258 aggregates, GROUP, or DISTINCT.  But if the subquery is just a simple
 259 scan or join, treating the subquery as a black box may produce a poor plan
 260 compared to considering it as part of the entire plan search space.
 261 Therefore, at the start of the planning process the planner looks for
 262 simple subqueries and pulls them up into the main query's jointree.
 263
 264 Pulling up a subquery may result in FROM-list joins appearing below the top
 265 of the join tree.  Each FROM-list is planned using the dynamic-programming
 266 search method described above.
 267
 268 If pulling up a subquery produces a FROM-list as a direct child of another
 269 FROM-list, then we can merge the two FROM-lists together.  Once that's
 270 done, the subquery is an absolutely integral part of the outer query and
 271 will not constrain the join tree search space at all.  However, that could
 272 result in unpleasant growth of planning time, since the dynamic-programming
 273 search has runtime exponential in the number of FROM-items considered.
 274 Therefore, we don't merge FROM-lists if the result would have too many
 275 FROM-items in one list.
 276
 277
 278 Optimizer Functions
 279 -------------------
 280
 281 The primary entry point is planner().
 282
 283 planner()
 284  set up for recursive handling of subqueries
 285  do final cleanup after planning
 286 -subquery_planner()
 287  pull up sublinks and subqueries from rangetable, if possible
 288  canonicalize qual
 289      Attempt to simplify WHERE clause to the most useful form; this includes
 290      flattening nested AND/ORs and detecting clauses that are duplicated in
 291      different branches of an OR.
 292  simplify constant expressions
 293  process sublinks
 294  convert Vars of outer query levels into Params
 295 --grouping_planner()
 296   preprocess target list for non-SELECT queries
 297   handle UNION/INTERSECT/EXCEPT, GROUP BY, HAVING, aggregates,
 298         ORDER BY, DISTINCT, LIMIT
 299 --query_planner()
 300    pull out constant quals, which can be used to gate execution of the
 301         whole plan (if any are found, we make a top-level Result node
 302         to do the gating)
 303    make list of base relations used in query
 304    split up the qual into restrictions (a=1) and joins (b=c)
 305    find qual clauses that enable merge and hash joins
 306 ----make_one_rel()
 307      set_base_rel_pathlist()
 308       find seqscan and all index paths for each base relation
 309       find selectivity of columns used in joins
 310      make_rel_from_joinlist()
 311       hand off join subproblems to a plugin, GEQO, or standard_join_search()
 312 -----standard_join_search()
 313       call join_search_one_level() for each level of join tree needed
 314       join_search_one_level():
 315         For each joinrel of the prior level, do make_rels_by_clause_joins()
 316         if it has join clauses, or make_rels_by_clauseless_joins() if not.
 317         Also generate "bushy plan" joins between joinrels of lower levels.
 318       Back at standard_join_search(), apply set_cheapest() to extract the
 319       cheapest path for each newly constructed joinrel.
 320       Loop back if this wasn't the top join level.
 321    Back at query_planner:
 322     put back any constant quals by adding a Result node
 323  Back at grouping_planner:
 324  do grouping(GROUP)
 325  do aggregates
 326  make unique(DISTINCT)
 327  make sort(ORDER BY)
 328  make limit(LIMIT/OFFSET)
 329
 330
 331 Optimizer Data Structures
 332 -------------------------
 333
 334 PlannerGlobal   - global information for a single planner invocation
 335
 336 PlannerInfo     - information for planning a particular Query (we make
 337                   a separate PlannerInfo node for each sub-Query)
 338
 339 RelOptInfo      - a relation or joined relations
 340
 341  RestrictInfo   - WHERE clauses, like "x = 3" or "y = z"
 342                   (note the same structure is used for restriction and
 343                    join clauses)
 344
 345  Path           - every way to generate a RelOptInfo(sequential,index,joins)
 346   SeqScan       - a plain Path node with pathtype = T_SeqScan
 347   IndexPath     - index scans
 348   BitmapHeapPath - top of a bitmapped index scan
 349   TidPath       - scan by CTID
 350   AppendPath    - append multiple subpaths together
 351   ResultPath    - a Result plan node (used for FROM-less SELECT)
 352   MaterialPath  - a Material plan node
 353   UniquePath    - remove duplicate rows
 354   NestPath      - nested-loop joins
 355   MergePath     - merge joins
 356   HashPath      - hash joins
 357
 358  EquivalenceClass - a data structure representing a set of values known equal
 359
 360  PathKey        - a data structure representing the sort ordering of a path
 361
 362 The optimizer spends a good deal of its time worrying about the ordering
 363 of the tuples returned by a path.  The reason this is useful is that by
 364 knowing the sort ordering of a path, we may be able to use that path as
 365 the left or right input of a mergejoin and avoid an explicit sort step.
 366 Nestloops and hash joins don't really care what the order of their inputs
 367 is, but mergejoin needs suitably ordered inputs.  Therefore, all paths
 368 generated during the optimization process are marked with their sort order
 369 (to the extent that it is known) for possible use by a higher-level merge.
 370
 371 It is also possible to avoid an explicit sort step to implement a user's
 372 ORDER BY clause if the final path has the right ordering already, so the
 373 sort ordering is of interest even at the top level.  query_planner() will
 374 look for the cheapest path with a sort order matching the desired order,
 375 and grouping_planner() will compare its cost to the cost of using the
 376 cheapest-overall path and doing an explicit sort.
 377
 378 When we are generating paths for a particular RelOptInfo, we discard a path
 379 if it is more expensive than another known path that has the same or better
 380 sort order.  We will never discard a path that is the only known way to
 381 achieve a given sort order (without an explicit sort, that is).  In this
 382 way, the next level up will have the maximum freedom to build mergejoins
 383 without sorting, since it can pick from any of the paths retained for its
 384 inputs.
 385
 386
 387 EquivalenceClasses
 388 ------------------
 389
 390 During the deconstruct_jointree() scan of the query's qual clauses, we look
 391 for mergejoinable equality clauses A = B whose applicability is not delayed
 392 by an outer join; these are called "equivalence clauses".  When we find
 393 one, we create an EquivalenceClass containing the expressions A and B to
 394 record this knowledge.  If we later find another equivalence clause B = C,
 395 we add C to the existing EquivalenceClass for {A B}; this may require
 396 merging two existing EquivalenceClasses.  At the end of the scan, we have
 397 sets of values that are known all transitively equal to each other.  We can
 398 therefore use a comparison of any pair of the values as a restriction or
 399 join clause (when these values are available at the scan or join, of
 400 course); furthermore, we need test only one such comparison, not all of
 401 them.  Therefore, equivalence clauses are removed from the standard qual
 402 distribution process.  Instead, when preparing a restriction or join clause
 403 list, we examine each EquivalenceClass to see if it can contribute a
 404 clause, and if so we select an appropriate pair of values to compare.  For
 405 example, if we are trying to join A's relation to C's, we can generate the
 406 clause A = C, even though this appeared nowhere explicitly in the original
 407 query.  This may allow us to explore join paths that otherwise would have
 408 been rejected as requiring Cartesian-product joins.
 409
 410 Sometimes an EquivalenceClass may contain a pseudo-constant expression
 411 (i.e., one not containing Vars or Aggs of the current query level, nor
 412 volatile functions).  In this case we do not follow the policy of
 413 dynamically generating join clauses: instead, we dynamically generate
 414 restriction clauses "var = const" wherever one of the variable members of
 415 the class can first be computed.  For example, if we have A = B and B = 42,
 416 we effectively generate the restriction clauses A = 42 and B = 42, and then
 417 we need not bother with explicitly testing the join clause A = B when the
 418 relations are joined.  In effect, all the class members can be tested at
 419 relation-scan level and there's never a need for join tests.
 420
 421 The precise technical interpretation of an EquivalenceClass is that it
 422 asserts that at any plan node where more than one of its member values
 423 can be computed, output rows in which the values are not all equal may
 424 be discarded without affecting the query result.  (We require all levels
 425 of the plan to enforce EquivalenceClasses, hence a join need not recheck
 426 equality of values that were computable by one of its children.)  For an
 427 ordinary EquivalenceClass that is "valid everywhere", we can further infer
 428 that the values are all non-null, because all mergejoinable operators are
 429 strict.  However, we also allow equivalence clauses that appear below the
 430 nullable side of an outer join to form EquivalenceClasses; for these
 431 classes, the interpretation is that either all the values are equal, or
 432 all (except pseudo-constants) have gone to null.  (This requires a
 433 limitation that non-constant members be strict, else they might not go
 434 to null when the other members do.)  Consider for example
 435
 436         SELECT *
 437           FROM a LEFT JOIN
 438                (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
 439                ON a.x = ss.y
 440           WHERE a.x = 42;
 441
 442 We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby
 443 apply c.z = 10 while scanning c.  (The reason we disallow outerjoin-delayed
 444 clauses from forming EquivalenceClasses is exactly that we want to be able
 445 to push any derived clauses as far down as possible.)  But once above the
 446 outer join it's no longer necessarily the case that b.y = 10, and thus we
 447 cannot use such EquivalenceClasses to conclude that sorting is unnecessary
 448 (see discussion of PathKeys below).
 449
 450 In this example, notice also that a.x = ss.y (really a.x = b.y) is not an
 451 equivalence clause because its applicability to b is delayed by the outer
 452 join; thus we do not try to insert b.y into the equivalence class {a.x 42}.
 453 But since we see that a.x has been equated to 42 above the outer join, we
 454 are able to form a below-outer-join class {b.y 42}; this restriction can be
 455 added because no b/c row not having b.y = 42 can contribute to the result
 456 of the outer join, and so we need not compute such rows.  Now this class
 457 will get merged with {b.y c.z 10}, leading to the contradiction 10 = 42,
 458 which lets the planner deduce that the b/c join need not be computed at all
 459 because none of its rows can contribute to the outer join.  (This gets
 460 implemented as a gating Result filter, since more usually the potential
 461 contradiction involves Param values rather than just Consts, and thus has
 462 to be checked at runtime.)
 463
 464 To aid in determining the sort ordering(s) that can work with a mergejoin,
 465 we mark each mergejoinable clause with the EquivalenceClasses of its left
 466 and right inputs.  For an equivalence clause, these are of course the same
 467 EquivalenceClass.  For a non-equivalence mergejoinable clause (such as an
 468 outer-join qualification), we generate two separate EquivalenceClasses for
 469 the left and right inputs.  This may result in creating single-item
 470 equivalence "classes", though of course these are still subject to merging
 471 if other equivalence clauses are later found to bear on the same
 472 expressions.
 473
 474 Another way that we may form a single-item EquivalenceClass is in creation
 475 of a PathKey to represent a desired sort order (see below).  This is a bit
 476 different from the above cases because such an EquivalenceClass might
 477 contain an aggregate function or volatile expression.  (A clause containing
 478 a volatile function will never be considered mergejoinable, even if its top
 479 operator is mergejoinable, so there is no way for a volatile expression to
 480 get into EquivalenceClasses otherwise.  Aggregates are disallowed in WHERE
 481 altogether, so will never be found in a mergejoinable clause.)  This is just
 482 a convenience to maintain a uniform PathKey representation: such an
 483 EquivalenceClass will never be merged with any other.
 484
 485 An EquivalenceClass also contains a list of btree opfamily OIDs, which
 486 determines what the equalities it represents actually "mean".  All the
 487 equivalence clauses that contribute to an EquivalenceClass must have
 488 equality operators that belong to the same set of opfamilies.  (Note: most
 489 of the time, a particular equality operator belongs to only one family, but
 490 it's possible that it belongs to more than one.  We keep track of all the
 491 families to ensure that we can make use of an index belonging to any one of
 492 the families for mergejoin purposes.)
 493
 494
 495 PathKeys
 496 --------
 497
 498 The PathKeys data structure represents what is known about the sort order
 499 of the tuples generated by a particular Path.  A path's pathkeys field is a
 500 list of PathKey nodes, where the n'th item represents the n'th sort key of
 501 the result.  Each PathKey contains these fields:
 502
 503         * a reference to an EquivalenceClass
 504         * a btree opfamily OID (must match one of those in the EC)
 505         * a sort direction (ascending or descending)
 506         * a nulls-first-or-last flag
 507
 508 The EquivalenceClass represents the value being sorted on.  Since the
 509 various members of an EquivalenceClass are known equal according to the
 510 opfamily, we can consider a path sorted by any one of them to be sorted by
 511 any other too; this is what justifies referencing the whole
 512 EquivalenceClass rather than just one member of it.
 513
 514 In single/base relation RelOptInfo's, the Paths represent various ways
 515 of scanning the relation and the resulting ordering of the tuples.
 516 Sequential scan Paths have NIL pathkeys, indicating no known ordering.
 517 Index scans have Path.pathkeys that represent the chosen index's ordering,
 518 if any.  A single-key index would create a single-PathKey list, while a
 519 multi-column index generates a list with one element per index column.
 520 (Actually, since an index can be scanned either forward or backward, there
 521 are two possible sort orders and two possible PathKey lists it can
 522 generate.)
 523
 524 Note that a bitmap scan or multi-pass indexscan (OR clause scan) has NIL
 525 pathkeys since we can say nothing about the overall order of its result.
 526 Also, an indexscan on an unordered type of index generates NIL pathkeys.
 527 However, we can always create a pathkey by doing an explicit sort.  The
 528 pathkeys for a Sort plan's output just represent the sort key fields and
 529 the ordering operators used.
 530
 531 Things get more interesting when we consider joins.  Suppose we do a
 532 mergejoin between A and B using the mergeclause A.X = B.Y.  The output
 533 of the mergejoin is sorted by X --- but it is also sorted by Y.  Again,
 534 this can be represented by a PathKey referencing an EquivalenceClass
 535 containing both X and Y.
 536
 537 With a little further thought, it becomes apparent that nestloop joins
 538 can also produce sorted output.  For example, if we do a nestloop join
 539 between outer relation A and inner relation B, then any pathkeys relevant
 540 to A are still valid for the join result: we have not altered the order of
 541 the tuples from A.  Even more interesting, if there was an equivalence clause
 542 A.X=B.Y, and A.X was a pathkey for the outer relation A, then we can assert
 543 that B.Y is a pathkey for the join result; X was ordered before and still
 544 is, and the joined values of Y are equal to the joined values of X, so Y
 545 must now be ordered too.  This is true even though we used neither an
 546 explicit sort nor a mergejoin on Y.  (Note: hash joins cannot be counted
 547 on to preserve the order of their outer relation, because the executor
 548 might decide to "batch" the join, so we always set pathkeys to NIL for
 549 a hashjoin path.)  Exception: a RIGHT or FULL join doesn't preserve the
 550 ordering of its outer relation, because it might insert nulls at random
 551 points in the ordering.
 552
 553 In general, we can justify using EquivalenceClasses as the basis for
 554 pathkeys because, whenever we scan a relation containing multiple
 555 EquivalenceClass members or join two relations each containing
 556 EquivalenceClass members, we apply restriction or join clauses derived from
 557 the EquivalenceClass.  This guarantees that any two values listed in the
 558 EquivalenceClass are in fact equal in all tuples emitted by the scan or
 559 join, and therefore that if the tuples are sorted by one of the values,
 560 they can be considered sorted by any other as well.  It does not matter
 561 whether the test clause is used as a mergeclause, or merely enforced
 562 after-the-fact as a qpqual filter.
 563
 564 Note that there is no particular difficulty in labeling a path's sort
 565 order with a PathKey referencing an EquivalenceClass that contains
 566 variables not yet joined into the path's output.  We can simply ignore
 567 such entries as not being relevant (yet).  This makes it possible to
 568 use the same EquivalenceClasses throughout the join planning process.
 569 In fact, by being careful not to generate multiple identical PathKey
 570 objects, we can reduce comparison of EquivalenceClasses and PathKeys
 571 to simple pointer comparison, which is a huge savings because add_path
 572 has to make a large number of PathKey comparisons in deciding whether
 573 competing Paths are equivalently sorted.
 574
 575 Pathkeys are also useful to represent an ordering that we wish to achieve,
 576 since they are easily compared to the pathkeys of a potential candidate
 577 path.  So, SortGroupClause lists are turned into pathkeys lists for use
 578 inside the optimizer.
 579
 580 Because we have to generate pathkeys lists from the sort clauses before
 581 we've finished EquivalenceClass merging, we cannot use the pointer-equality
 582 method of comparing PathKeys in the earliest stages of the planning
 583 process.  Instead, we generate "non canonical" PathKeys that reference
 584 single-element EquivalenceClasses that might get merged later.  After we
 585 complete EquivalenceClass merging, we replace these with "canonical"
 586 PathKeys that reference only fully-merged classes, and after that we make
 587 sure we don't generate more than one copy of each "canonical" PathKey.
 588 Then it is safe to use pointer comparison on canonical PathKeys.
 589
 590 An additional refinement we can make is to insist that canonical pathkey
 591 lists (sort orderings) do not mention the same EquivalenceClass more than
 592 once.  For example, in all these cases the second sort column is redundant,
 593 because it cannot distinguish values that are the same according to the
 594 first sort column:
 595         SELECT ... ORDER BY x, x
 596         SELECT ... ORDER BY x, x DESC
 597         SELECT ... WHERE x = y ORDER BY x, y
 598 Although a user probably wouldn't write "ORDER BY x,x" directly, such
 599 redundancies are more probable once equivalence classes have been
 600 considered.  Also, the system may generate redundant pathkey lists when
 601 computing the sort ordering needed for a mergejoin.  By eliminating the
 602 redundancy, we save time and improve planning, since the planner will more
 603 easily recognize equivalent orderings as being equivalent.
 604
 605 Another interesting property is that if the underlying EquivalenceClass
 606 contains a constant and is not below an outer join, then the pathkey is
 607 completely redundant and need not be sorted by at all!  Every row must
 608 contain the same constant value, so there's no need to sort.  (If the EC is
 609 below an outer join, we still have to sort, since some of the rows might
 610 have gone to null and others not.  In this case we must be careful to pick
 611 a non-const member to sort by.  The assumption that all the non-const
 612 members go to null at the same plan level is critical here, else they might
 613 not produce the same sort order.)  This might seem pointless because users
 614 are unlikely to write "... WHERE x = 42 ORDER BY x", but it allows us to
 615 recognize when particular index columns are irrelevant to the sort order:
 616 if we have "... WHERE x = 42 ORDER BY y", scanning an index on (x,y)
 617 produces correctly ordered data without a sort step.  We used to have very
 618 ugly ad-hoc code to recognize that in limited contexts, but discarding
 619 constant ECs from pathkeys makes it happen cleanly and automatically.
 620
 621 You might object that a below-outer-join EquivalenceClass doesn't always
 622 represent the same values at every level of the join tree, and so using
 623 it to uniquely identify a sort order is dubious.  This is true, but we
 624 can avoid dealing with the fact explicitly because we always consider that
 625 an outer join destroys any ordering of its nullable inputs.  Thus, even
 626 if a path was sorted by {a.x} below an outer join, we'll re-sort if that
 627 sort ordering was important; and so using the same PathKey for both sort
 628 orderings doesn't create any real problem.
 629
 630
 631
 632 Though Bob Devine <bob.devine@worldnet.att.net> was not involved in the
 633 coding of our optimizer, he is available to field questions about
 634 optimizer topics.
 635
 636 -- bjm & tgl
 637