llvm/docs/MergeFunctions.rst

   1 =================================
   2 MergeFunctions pass, how it works
   3 =================================
   4
   5 .. contents::
   6    :local:
   7
   8 Introduction
   9 ============
  10 Sometimes code contains equal functions, or functions that does exactly the same
  11 thing even though they are non-equal on the IR level (e.g.: multiplication on 2
  12 and 'shl 1'). It could happen due to several reasons: mainly, the usage of
  13 templates and automatic code generators. Though, sometimes the user itself could
  14 write the same thing twice :-)
  15
  16 The main purpose of this pass is to recognize such functions and merge them.
  17
  18 This document is the extension to pass comments and describes the pass logic. It
  19 describes the algorithm that is used in order to compare functions and
  20 explains how we could combine equal functions correctly to keep the module
  21 valid.
  22
  23 Material is brought in a top-down form, so the reader could start to learn pass
  24 from high level ideas and end with low-level algorithm details, thus preparing
  25 him or her for reading the sources.
  26
  27 The main goal is to describe the algorithm and logic here and the concept. If
  28 you *don't want* to read the source code, but want to understand pass
  29 algorithms, this document is good for you. The author tries not to repeat the
  30 source-code and covers only common cases to avoid the cases of needing to
  31 update this document after any minor code changes.
  32
  33
  34 What should I know to be able to follow along with this document?
  35 -----------------------------------------------------------------
  36
  37 The reader should be familiar with common compile-engineering principles and
  38 LLVM code fundamentals. In this article, we assume the reader is familiar with
  39 `Single Static Assignment
  40 <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
  41 concept and has an understanding of
  42 `IR structure <https://llvm.org/docs/LangRef.html#high-level-structure>`_.
  43
  44 We will use terms such as
  45 "`module <https://llvm.org/docs/LangRef.html#high-level-structure>`_",
  46 "`function <https://llvm.org/docs/ProgrammersManual.html#the-function-class>`_",
  47 "`basic block <http://en.wikipedia.org/wiki/Basic_block>`_",
  48 "`user <https://llvm.org/docs/ProgrammersManual.html#the-user-class>`_",
  49 "`value <https://llvm.org/docs/ProgrammersManual.html#the-value-class>`_",
  50 "`instruction
  51 <https://llvm.org/docs/ProgrammersManual.html#the-instruction-class>`_".
  52
  53 As a good starting point, the Kaleidoscope tutorial can be used:
  54
  55 :doc:`tutorial/index`
  56
  57 It's especially important to understand chapter 3 of tutorial:
  58
  59 :doc:`tutorial/LangImpl03`
  60
  61 The reader should also know how passes work in LLVM. They could use this
  62 article as a reference and start point here:
  63
  64 :doc:`WritingAnLLVMPass`
  65
  66 What else? Well perhaps the reader should also have some experience in LLVM pass
  67 debugging and bug-fixing.
  68
  69 Narrative structure
  70 -------------------
  71 The article consists of three parts. The first part explains pass functionality
  72 on the top-level. The second part describes the comparison procedure itself.
  73 The third part describes the merging process.
  74
  75 In every part, the author tries to put the contents in the top-down form.
  76 The top-level methods will first be described followed by the terminal ones at
  77 the end, in the tail of each part. If the reader sees the reference to the
  78 method that wasn't described yet, they will find its description a bit below.
  79
  80 Basics
  81 ======
  82
  83 How to do it?
  84 -------------
  85 Do we need to merge functions? The obvious answer is: Yes, that is quite a
  86 possible case. We usually *do* have duplicates and it would be good to get rid
  87 of them. But how do we detect duplicates? This is the idea: we split functions
  88 into smaller bricks or parts and compare the "bricks" amount. If equal,
  89 we compare the "bricks" themselves, and then do our conclusions about functions
  90 themselves.
  91
  92 What could the difference be? For example, on a machine with 64-bit pointers
  93 (let's assume we have only one address space), one function stores a 64-bit
  94 integer, while another one stores a pointer. If the target is the machine
  95 mentioned above, and if functions are identical, except the parameter type (we
  96 could consider it as a part of function type), then we can treat a ``uint64_t``
  97 and a ``void*`` as equal.
  98
  99 This is just an example; more possible details are described a bit below.
 100
 101 As another example, the reader may imagine two more functions. The first
 102 function performs a multiplication by 2, while the second one performs an
 103 logical left shift by 1.
 104
 105 Possible solutions
 106 ^^^^^^^^^^^^^^^^^^
 107 Let's briefly consider possible options about how and what we have to implement
 108 in order to create full-featured functions merging, and also what it would
 109 mean for us.
 110
 111 Equal function detection obviously supposes that a "detector" method to be
 112 implemented and latter should answer the question "whether functions are equal".
 113 This "detector" method consists of tiny "sub-detectors", which each answers
 114 exactly the same question, but for function parts.
 115
 116 As the second step, we should merge equal functions. So it should be a "merger"
 117 method. "Merger" accepts two functions *F1* and *F2*, and produces *F1F2*
 118 function, the result of merging.
 119
 120 Having such routines in our hands, we can process a whole module, and merge all
 121 equal functions.
 122
 123 In this case, we have to compare every function with every another function. As
 124 the reader may notice, this way seems to be quite expensive. Of course we could
 125 introduce hashing and other helpers, but it is still just an optimization, and
 126 thus the level of O(N*N) complexity.
 127
 128 Can we reach another level? Could we introduce logarithmical search, or random
 129 access lookup? The answer is: "yes".
 130
 131 Random-access
 132 """""""""""""
 133 How it could this be done? Just convert each function to a number, and gather
 134 all of them in a special hash-table. Functions with equal hashes are equal.
 135 Good hashing means, that every function part must be taken into account. That
 136 means we have to convert every function part into some number, and then add it
 137 into the hash. The lookup-up time would be small, but such an approach adds some
 138 delay due to the hashing routine.
 139
 140 Logarithmical search
 141 """"""""""""""""""""
 142 We could introduce total ordering among the functions set, once ordered we
 143 could then implement a logarithmical search. Lookup time still depends on N,
 144 but adds a little of delay (*log(N)*).
 145
 146 Present state
 147 """""""""""""
 148 Both of the approaches (random-access and logarithmical) have been implemented
 149 and tested and both give a very good improvement. What was most
 150 surprising is that logarithmical search was faster; sometimes by up to 15%. The
 151 hashing method needs some extra CPU time, which is the main reason why it works
 152 slower; in most cases, total "hashing" time is greater than total
 153 "logarithmical-search" time.
 154
 155 So, preference has been granted to the "logarithmical search".
 156
 157 Though in the case of need, *logarithmical-search* (read "total-ordering") could
 158 be used as a milestone on our way to the *random-access* implementation.
 159
 160 Every comparison is based either on the numbers or on the flags comparison. In
 161 the *random-access* approach, we could use the same comparison algorithm.
 162 During comparison, we exit once we find the difference, but here we might have
 163 to scan the whole function body every time (note, it could be slower). Like in
 164 "total-ordering", we will track every number and flag, but instead of
 165 comparison, we should get the numbers sequence and then create the hash number.
 166 So, once again, *total-ordering* could be considered as a milestone for even
 167 faster (in theory) random-access approach.
 168
 169 MergeFunctions, main fields and runOnModule
 170 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 171 There are two main important fields in the class:
 172
 173 ``FnTree``  – the set of all unique functions. It keeps items that couldn't be
 174 merged with each other. It is defined as:
 175
 176 ``std::set<FunctionNode> FnTree;``
 177
 178 Here ``FunctionNode`` is a wrapper for ``llvm::Function`` class, with
 179 implemented “<” operator among the functions set (below we explain how it works
 180 exactly; this is a key point in fast functions comparison).
 181
 182 ``Deferred`` – merging process can affect bodies of functions that are in
 183 ``FnTree`` already. Obviously, such functions should be rechecked again. In this
 184 case, we remove them from ``FnTree``, and mark them to be rescanned, namely
 185 put them into ``Deferred`` list.
 186
 187 runOnModule
 188 """""""""""
 189 The algorithm is pretty simple:
 190
 191 1. Put all module's functions into the *worklist*.
 192
 193 2. Scan *worklist*'s functions twice: first enumerate only strong functions and
 194 then only weak ones:
 195
 196    2.1. Loop body: take a function from *worklist*  (call it *FCur*) and try to
 197    insert it into *FnTree*: check whether *FCur* is equal to one of functions
 198    in *FnTree*. If there *is* an equal function in *FnTree*
 199    (call it *FExists*): merge function *FCur* with *FExists*. Otherwise add
 200    the function from the *worklist* to *FnTree*.
 201
 202 3. Once the *worklist* scanning and merging operations are complete, check the
 203 *Deferred* list. If it is not empty: refill the *worklist* contents with
 204 *Deferred* list and redo step 2, if the *Deferred* list is empty, then exit
 205 from method.
 206
 207 Comparison and logarithmical search
 208 """""""""""""""""""""""""""""""""""
 209 Let's recall our task: for every function *F* from module *M*, we have to find
 210 equal functions *F`* in the shortest time possible , and merge them into a
 211 single function.
 212
 213 Defining total ordering among the functions set allows us to organize
 214 functions into a binary tree. The lookup procedure complexity would be
 215 estimated as O(log(N)) in this case. But how do we define *total-ordering*?
 216
 217 We have to introduce a single rule applicable to every pair of functions, and
 218 following this rule, then evaluate which of them is greater. What kind of rule
 219 could it be? Let's declare it as the "compare" method that returns one of 3
 220 possible values:
 221
 222 -1, left is *less* than right,
 223
 224 0, left and right are *equal*,
 225
 226 1, left is *greater* than right.
 227
 228 Of course it means, that we have to maintain
 229 *strict and non-strict order relation properties*:
 230
 231 * reflexivity (``a <= a``, ``a == a``, ``a >= a``),
 232 * antisymmetry (if ``a <= b`` and ``b <= a`` then ``a == b``),
 233 * transitivity (``a <= b`` and ``b <= c``, then ``a <= c``)
 234 * asymmetry (if ``a < b``, then ``a > b`` or ``a == b``).
 235
 236 As mentioned before, the comparison routine consists of
 237 "sub-comparison-routines", with each of them also consisting of
 238 "sub-comparison-routines", and so on. Finally, it ends up with primitive
 239 comparison.
 240
 241 Below, we will use the following operations:
 242
 243 #. ``cmpNumbers(number1, number2)`` is a method that returns -1 if left is less
 244    than right; 0, if left and right are equal; and 1 otherwise.
 245
 246 #. ``cmpFlags(flag1, flag2)`` is a hypothetical method that compares two flags.
 247    The logic is the same as in ``cmpNumbers``, where ``true`` is 1, and
 248    ``false`` is 0.
 249
 250 The rest of the article is based on *MergeFunctions.cpp* source code
 251 (found in *<llvm_dir>/lib/Transforms/IPO/MergeFunctions.cpp*). We would like
 252 to ask reader to keep this file open, so we could use it as a reference
 253 for further explanations.
 254
 255 Now, we're ready to proceed to the next chapter and see how it works.
 256
 257 Functions comparison
 258 ====================
 259 At first, let's define how exactly we compare complex objects.
 260
 261 Complex object comparison (function, basic-block, etc) is mostly based on its
 262 sub-object comparison results. It is similar to the next "tree" objects
 263 comparison:
 264
 265 #. For two trees *T1* and *T2* we perform *depth-first-traversal* and have
 266    two sequences as a product: "*T1Items*" and "*T2Items*".
 267
 268 #. We then compare chains "*T1Items*" and "*T2Items*" in
 269    the most-significant-item-first order. The result of items comparison
 270    would be the result of *T1* and *T2* comparison itself.
 271
 272 FunctionComparator::compare(void)
 273 ---------------------------------
 274 A brief look at the source code tells us that the comparison starts in the
 275 “``int FunctionComparator::compare(void)``” method.
 276
 277 1. The first parts to be compared are the function's attributes and some
 278 properties that is outside the “attributes” term, but still could make the
 279 function different without changing its body. This part of the comparison is
 280 usually done within simple *cmpNumbers* or *cmpFlags* operations (e.g.
 281 ``cmpFlags(F1->hasGC(), F2->hasGC())``). Below is a full list of function's
 282 properties to be compared on this stage:
 283
 284   * *Attributes* (those are returned by ``Function::getAttributes()``
 285     method).
 286
 287   * *GC*, for equivalence, *RHS* and *LHS* should be both either without
 288     *GC* or with the same one.
 289
 290   * *Section*, just like a *GC*: *RHS* and *LHS* should be defined in the
 291     same section.
 292
 293   * *Variable arguments*. *LHS* and *RHS* should be both either with or
 294     without *var-args*.
 295
 296   * *Calling convention* should be the same.
 297
 298 2. Function type. Checked by ``FunctionComparator::cmpType(Type*, Type*)``
 299 method. It checks return type and parameters type; the method itself will be
 300 described later.
 301
 302 3. Associate function formal parameters with each other. Then comparing function
 303 bodies, if we see the usage of *LHS*'s *i*-th argument in *LHS*'s body, then,
 304 we want to see usage of *RHS*'s *i*-th argument at the same place in *RHS*'s
 305 body, otherwise functions are different. On this stage we grant the preference
 306 to those we met later in function body (value we met first would be *less*).
 307 This is done by “``FunctionComparator::cmpValues(const Value*, const Value*)``”
 308 method (will be described a bit later).
 309
 310 4. Function body comparison. As it written in method comments:
 311
 312 “We do a CFG-ordered walk since the actual ordering of the blocks in the linked
 313 list is immaterial. Our walk starts at the entry block for both functions, then
 314 takes each block from each terminator in order. As an artifact, this also means
 315 that unreachable blocks are ignored.”
 316
 317 So, using this walk we get BBs from *left* and *right* in the same order, and
 318 compare them by “``FunctionComparator::compare(const BasicBlock*, const
 319 BasicBlock*)``” method.
 320
 321 We also associate BBs with each other, like we did it with function formal
 322 arguments (see ``cmpValues`` method below).
 323
 324 FunctionComparator::cmpType
 325 ---------------------------
 326 Consider how type comparison works.
 327
 328 1. Coerce pointer to integer. If left type is a pointer, try to coerce it to the
 329 integer type. It could be done if its address space is 0, or if address spaces
 330 are ignored at all. Do the same thing for the right type.
 331
 332 2. If left and right types are equal, return 0. Otherwise we need to give
 333 preference to one of them. So proceed to the next step.
 334
 335 3. If types are of different kind (different type IDs). Return result of type
 336 IDs comparison, treating them as numbers (use ``cmpNumbers`` operation).
 337
 338 4. If types are vectors or integers, return result of their pointers comparison,
 339 comparing them as numbers.
 340
 341 5. Check whether type ID belongs to the next group (call it equivalent-group):
 342
 343    * Void
 344
 345    * Float
 346
 347    * Double
 348
 349    * X86_FP80
 350
 351    * FP128
 352
 353    * PPC_FP128
 354
 355    * Label
 356
 357    * Metadata.
 358
 359    If ID belongs to group above, return 0. Since it's enough to see that
 360    types has the same ``TypeID``. No additional information is required.
 361
 362 6. Left and right are pointers. Return result of address space comparison
 363 (numbers comparison).
 364
 365 7. Complex types (structures, arrays, etc.). Follow complex objects comparison
 366 technique (see the very first paragraph of this chapter). Both *left* and
 367 *right* are to be expanded and their element types will be checked the same
 368 way. If we get -1 or 1 on some stage, return it. Otherwise return 0.
 369
 370 8. Steps 1-6 describe all the possible cases, if we passed steps 1-6 and didn't
 371 get any conclusions, then invoke ``llvm_unreachable``, since it's quite an
 372 unexpectable case.
 373
 374 cmpValues(const Value*, const Value*)
 375 -------------------------------------
 376 Method that compares local values.
 377
 378 This method gives us an answer to a very curious question: whether we could
 379 treat local values as equal, and which value is greater otherwise. It's
 380 better to start from example:
 381
 382 Consider the situation when we're looking at the same place in left
 383 function "*FL*" and in right function "*FR*". Every part of *left* place is
 384 equal to the corresponding part of *right* place, and (!) both parts use
 385 *Value* instances, for example:
 386
 387 .. code-block:: text
 388
 389    instr0 i32 %LV   ; left side, function FL
 390    instr0 i32 %RV   ; right side, function FR
 391
 392 So, now our conclusion depends on *Value* instances comparison.
 393
 394 The main purpose of this method is to determine relation between such values.
 395
 396 What can we expect from equal functions? At the same place, in functions
 397 "*FL*" and "*FR*" we expect to see *equal* values, or values *defined* at
 398 the same place in "*FL*" and "*FR*".
 399
 400 Consider a small example here:
 401
 402 .. code-block:: text
 403
 404   define void %f(i32 %pf0, i32 %pf1) {
 405     instr0 i32 %pf0 instr1 i32 %pf1 instr2 i32 123
 406   }
 407
 408 .. code-block:: text
 409
 410   define void %g(i32 %pg0, i32 %pg1) {
 411     instr0 i32 %pg0 instr1 i32 %pg0 instr2 i32 123
 412   }
 413
 414 In this example, *pf0* is associated with *pg0*, *pf1* is associated with
 415 *pg1*, and we also declare that *pf0* < *pf1*, and thus *pg0* < *pf1*.
 416
 417 Instructions with opcode "*instr0*" would be *equal*, since their types and
 418 opcodes are equal, and values are *associated*.
 419
 420 Instructions with opcode "*instr1*" from *f* is *greater* than instructions
 421 with opcode "*instr1*" from *g*; here we have equal types and opcodes, but
 422 "*pf1* is greater than "*pg0*".
 423
 424 Instructions with opcode "*instr2*" are equal, because their opcodes and
 425 types are equal, and the same constant is used as a value.
 426
 427 What we associate in cmpValues?
 428 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 429 * Function arguments. *i*-th argument from left function associated with
 430   *i*-th argument from right function.
 431 * BasicBlock instances. In basic-block enumeration loop we associate *i*-th
 432   BasicBlock from the left function with *i*-th BasicBlock from the right
 433   function.
 434 * Instructions.
 435 * Instruction operands. Note, we can meet *Value* here we have never seen
 436   before. In this case it is not a function argument, nor *BasicBlock*, nor
 437   *Instruction*. It is a global value. It is a constant, since it's the only
 438   supposed global here. The method also compares: Constants that are of the
 439   same type and if right constant can be losslessly bit-casted to the left
 440   one, then we also compare them.
 441
 442 How to implement cmpValues?
 443 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 444 *Association* is a case of equality for us. We just treat such values as equal,
 445 but, in general, we need to implement antisymmetric relation. As mentioned
 446 above, to understand what is *less*, we can use order in which we
 447 meet values. If both values have the same order in a function (met at the same
 448 time), we then treat values as *associated*. Otherwise – it depends on who was
 449 first.
 450
 451 Every time we run the top-level compare method, we initialize two identical
 452 maps (one for the left side, another one for the right side):
 453
 454 ``map<Value, int> sn_mapL, sn_mapR;``
 455
 456 The key of the map is the *Value* itself, the *value* – is its order (call it
 457 *serial number*).
 458
 459 To add value *V* we need to perform the next procedure:
 460
 461 ``sn_map.insert(std::make_pair(V, sn_map.size()));``
 462
 463 For the first *Value*, map will return *0*, for the second *Value* map will
 464 return *1*, and so on.
 465
 466 We can then check whether left and right values met at the same time with
 467 a simple comparison:
 468
 469 ``cmpNumbers(sn_mapL[Left], sn_mapR[Right]);``
 470
 471 Of course, we can combine insertion and comparison:
 472
 473 .. code-block:: c++
 474
 475   std::pair<iterator, bool>
 476     LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())), RightRes
 477     = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
 478   return cmpNumbers(LeftRes.first->second, RightRes.first->second);
 479
 480 Let's look, how whole method could be implemented.
 481
 482 1. We have to start with the bad news. Consider function self and
 483 cross-referencing cases:
 484
 485 .. code-block:: c++
 486
 487   // self-reference unsigned fact0(unsigned n) { return n > 1 ? n
 488   * fact0(n-1) : 1; } unsigned fact1(unsigned n) { return n > 1 ? n *
 489   fact1(n-1) : 1; }
 490
 491   // cross-reference unsigned ping(unsigned n) { return n!= 0 ? pong(n-1) : 0;
 492   } unsigned pong(unsigned n) { return n!= 0 ? ping(n-1) : 0; }
 493
 494 ..
 495
 496   This comparison has been implemented in initial *MergeFunctions* pass
 497   version. But, unfortunately, it is not transitive. And this is the only case
 498   we can't convert to less-equal-greater comparison. It is a seldom case, 4-5
 499   functions of 10000 (checked in test-suite), and, we hope, the reader would
 500   forgive us for such a sacrifice in order to get the O(log(N)) pass time.
 501
 502 2. If left/right *Value* is a constant, we have to compare them. Return 0 if it
 503 is the same constant, or use ``cmpConstants`` method otherwise.
 504
 505 3. If left/right is *InlineAsm* instance. Return result of *Value* pointers
 506 comparison.
 507
 508 4. Explicit association of *L* (left value) and *R*  (right value). We need to
 509 find out whether values met at the same time, and thus are *associated*. Or we
 510 need to put the rule: when we treat *L* < *R*. Now it is easy: we just return
 511 the result of numbers comparison:
 512
 513 .. code-block:: c++
 514
 515    std::pair<iterator, bool>
 516      LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())),
 517      RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
 518    if (LeftRes.first->second == RightRes.first->second) return 0;
 519    if (LeftRes.first->second < RightRes.first->second) return -1;
 520    return 1;
 521
 522 Now when *cmpValues* returns 0, we can proceed the comparison procedure.
 523 Otherwise, if we get (-1 or 1), we need to pass this result to the top level,
 524 and finish comparison procedure.
 525
 526 cmpConstants
 527 ------------
 528 Performs constants comparison as follows:
 529
 530 1. Compare constant types using ``cmpType`` method. If the result is -1 or 1,
 531 goto step 2, otherwise proceed to step 3.
 532
 533 2. If types are different, we still can check whether constants could be
 534 losslessly bitcasted to each other. The further explanation is modification of
 535 ``canLosslesslyBitCastTo`` method.
 536
 537    2.1 Check whether constants are of the first class types
 538    (``isFirstClassType`` check):
 539
 540    2.1.1. If both constants are *not* of the first class type: return result
 541    of ``cmpType``.
 542
 543    2.1.2. Otherwise, if left type is not of the first class, return -1. If
 544    right type is not of the first class, return 1.
 545
 546    2.1.3. If both types are of the first class type, proceed to the next step
 547    (2.1.3.1).
 548
 549    2.1.3.1. If types are vectors, compare their bitwidth using the
 550    *cmpNumbers*. If result is not 0, return it.
 551
 552    2.1.3.2. Different types, but not a vectors:
 553
 554    * if both of them are pointers, good for us, we can proceed to step 3.
 555    * if one of types is pointer, return result of *isPointer* flags
 556      comparison (*cmpFlags* operation).
 557    * otherwise we have no methods to prove bitcastability, and thus return
 558      result of types comparison (-1 or 1).
 559
 560 Steps below are for the case when types are equal, or case when constants are
 561 bitcastable:
 562
 563 3. One of constants is a "*null*" value. Return the result of
 564 ``cmpFlags(L->isNullValue, R->isNullValue)`` comparison.
 565
 566 4. Compare value IDs, and return result if it is not 0:
 567
 568 .. code-block:: c++
 569
 570   if (int Res = cmpNumbers(L->getValueID(), R->getValueID()))
 571     return Res;
 572
 573 5. Compare the contents of constants. The comparison depends on the kind of
 574 constants, but on this stage it is just a lexicographical comparison. Just see
 575 how it was described in the beginning of "*Functions comparison*" paragraph.
 576 Mathematically, it is equal to the next case: we encode left constant and right
 577 constant (with similar way *bitcode-writer* does). Then compare left code
 578 sequence and right code sequence.
 579
 580 compare(const BasicBlock*, const BasicBlock*)
 581 ---------------------------------------------
 582 Compares two *BasicBlock* instances.
 583
 584 It enumerates instructions from left *BB* and right *BB*.
 585
 586 1. It assigns serial numbers to the left and right instructions, using
 587 ``cmpValues`` method.
 588
 589 2. If one of left or right is *GEP* (``GetElementPtr``), then treat *GEP* as
 590 greater than other instructions. If both instructions are *GEPs* use ``cmpGEP``
 591 method for comparison. If result is -1 or 1, pass it to the top-level
 592 comparison (return it).
 593
 594    3.1. Compare operations. Call ``cmpOperation`` method. If result is -1 or
 595    1, return it.
 596
 597    3.2. Compare number of operands, if result is -1 or 1, return it.
 598
 599    3.3. Compare operands themselves, use ``cmpValues`` method. Return result
 600    if it is -1 or 1.
 601
 602    3.4. Compare type of operands, using ``cmpType`` method. Return result if
 603    it is -1 or 1.
 604
 605    3.5. Proceed to the next instruction.
 606
 607 4. We can finish instruction enumeration in 3 cases:
 608
 609    4.1. We reached the end of both left and right basic-blocks. We didn't
 610    exit on steps 1-3, so contents are equal, return 0.
 611
 612    4.2. We have reached the end of the left basic-block. Return -1.
 613
 614    4.3. Return 1 (we reached the end of the right basic block).
 615
 616 cmpGEP
 617 ------
 618 Compares two GEPs (``getelementptr`` instructions).
 619
 620 It differs from regular operations comparison with the only thing: possibility
 621 to use ``accumulateConstantOffset`` method.
 622
 623 So, if we get constant offset for both left and right *GEPs*, then compare it as
 624 numbers, and return comparison result.
 625
 626 Otherwise treat it like a regular operation (see previous paragraph).
 627
 628 cmpOperation
 629 ------------
 630 Compares instruction opcodes and some important operation properties.
 631
 632 1. Compare opcodes, if it differs return the result.
 633
 634 2. Compare number of operands. If it differs – return the result.
 635
 636 3. Compare operation types, use *cmpType*. All the same – if types are
 637 different, return result.
 638
 639 4. Compare *subclassOptionalData*, get it with ``getRawSubclassOptionalData``
 640 method, and compare it like a numbers.
 641
 642 5. Compare operand types.
 643
 644 6. For some particular instructions, check equivalence (relation in our case) of
 645 some significant attributes. For example, we have to compare alignment for
 646 ``load`` instructions.
 647
 648 O(log(N))
 649 ---------
 650 Methods described above implement order relationship. And latter, could be used
 651 for nodes comparison in a binary tree. So we can organize functions set into
 652 the binary tree and reduce the cost of lookup procedure from
 653 O(N*N) to O(log(N)).
 654
 655 Merging process, mergeTwoFunctions
 656 ==================================
 657 Once *MergeFunctions* detected that current function (*G*) is equal to one that
 658 were analyzed before (function *F*) it calls ``mergeTwoFunctions(Function*,
 659 Function*)``.
 660
 661 Operation affects ``FnTree`` contents with next way: *F* will stay in
 662 ``FnTree``. *G* being equal to *F* will not be added to ``FnTree``. Calls of
 663 *G* would be replaced with something else. It changes bodies of callers. So,
 664 functions that calls *G* would be put into ``Deferred`` set and removed from
 665 ``FnTree``, and analyzed again.
 666
 667 The approach is next:
 668
 669 1. Most wished case: when we can use alias and both of *F* and *G* are weak. We
 670 make both of them with aliases to the third strong function *H*. Actually *H*
 671 is *F*. See below how it's made (but it's better to look straight into the
 672 source code). Well, this is a case when we can just replace *G* with *F*
 673 everywhere, we use ``replaceAllUsesWith`` operation here (*RAUW*).
 674
 675 2. *F* could not be overridden, while *G* could. It would be good to do the
 676 next: after merging the places where overridable function were used, still use
 677 overridable stub. So try to make *G* alias to *F*, or create overridable tail
 678 call wrapper around *F* and replace *G* with that call.
 679
 680 3. Neither *F* nor *G* could be overridden. We can't use *RAUW*. We can just
 681 change the callers: call *F* instead of *G*.  That's what
 682 ``replaceDirectCallers`` does.
 683
 684 Below is a detailed body description.
 685
 686 If “F” may be overridden
 687 ------------------------
 688 As follows from ``mayBeOverridden`` comments: “whether the definition of this
 689 global may be replaced by something non-equivalent at link time”. If so, that's
 690 ok: we can use alias to *F* instead of *G* or change call instructions itself.
 691
 692 HasGlobalAliases, removeUsers
 693 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 694 First consider the case when we have global aliases of one function name to
 695 another. Our purpose is  make both of them with aliases to the third strong
 696 function. Though if we keep *F* alive and without major changes we can leave it
 697 in ``FnTree``. Try to combine these two goals.
 698
 699 Do stub replacement of *F* itself with an alias to *F*.
 700
 701 1. Create stub function *H*, with the same name and attributes like function
 702 *F*. It takes maximum alignment of *F* and *G*.
 703
 704 2. Replace all uses of function *F* with uses of function *H*. It is the two
 705 steps procedure instead. First of all, we must take into account, all functions
 706 from whom *F* is called would be changed: since we change the call argument
 707 (from *F* to *H*). If so we must to review these caller functions again after
 708 this procedure. We remove callers from ``FnTree``, method with name
 709 ``removeUsers(F)`` does that (don't confuse with ``replaceAllUsesWith``):
 710
 711    2.1. ``Inside removeUsers(Value*
 712    V)`` we go through the all values that use value *V* (or *F* in our context).
 713    If value is instruction, we go to function that holds this instruction and
 714    mark it as to-be-analyzed-again (put to ``Deferred`` set), we also remove
 715    caller from ``FnTree``.
 716
 717    2.2. Now we can do the replacement: call ``F->replaceAllUsesWith(H)``.
 718
 719 3. *H* (that now "officially" plays *F*'s role) is replaced with alias to *F*.
 720 Do the same with *G*: replace it with alias to *F*. So finally everywhere *F*
 721 was used, we use *H* and it is alias to *F*, and everywhere *G* was used we
 722 also have alias to *F*.
 723
 724 4. Set *F* linkage to private. Make it strong :-)
 725
 726 No global aliases, replaceDirectCallers
 727 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 728 If global aliases are not supported. We call ``replaceDirectCallers``. Just
 729 go through all calls of *G* and replace it with calls of *F*. If you look into
 730 the method you will see that it scans all uses of *G* too, and if use is callee
 731 (if user is call instruction and *G* is used as what to be called), we replace
 732 it with use of *F*.
 733
 734 If “F” could not be overridden, fix it!
 735 """""""""""""""""""""""""""""""""""""""
 736
 737 We call ``writeThunkOrAlias(Function *F, Function *G)``. Here we try to replace
 738 *G* with alias to *F* first. The next conditions are essential:
 739
 740 * target should support global aliases,
 741 * the address itself of  *G* should be not significant, not named and not
 742   referenced anywhere,
 743 * function should come with external, local or weak linkage.
 744
 745 Otherwise we write thunk: some wrapper that has *G's* interface and calls *F*,
 746 so *G* could be replaced with this wrapper.
 747
 748 *writeAlias*
 749
 750 As follows from *llvm* reference:
 751
 752 “Aliases act as *second name* for the aliasee value”. So we just want to create
 753 a second name for *F* and use it instead of *G*:
 754
 755 1. create global alias itself (*GA*),
 756
 757 2. adjust alignment of *F* so it must be maximum of current and *G's* alignment;
 758
 759 3. replace uses of *G*:
 760
 761    3.1. first mark all callers of *G* as to-be-analyzed-again, using
 762    ``removeUsers`` method (see chapter above),
 763
 764    3.2. call ``G->replaceAllUsesWith(GA)``.
 765
 766 4. Get rid of *G*.
 767
 768 *writeThunk*
 769
 770 As it written in method comments:
 771
 772 “Replace G with a simple tail call to bitcast(F). Also replace direct uses of G
 773 with bitcast(F). Deletes G.”
 774
 775 In general it does the same as usual when we want to replace callee, except the
 776 first point:
 777
 778 1. We generate tail call wrapper around *F*, but with interface that allows use
 779 it instead of *G*.
 780
 781 2. “As-usual”: ``removeUsers`` and ``replaceAllUsesWith`` then.
 782
 783 3. Get rid of *G*.
 784
 785