technical/parallel-checkout.html

   1 <?xml version="1.0" encoding="UTF-8"?>
   2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
   3     "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
   4 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
   5 <head>
   6 <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
   7 <meta name="generator" content="AsciiDoc 10.2.0" />
   8 <title>Parallel Checkout Design Notes</title>
   9 <style type="text/css">
  10 /* Shared CSS for AsciiDoc xhtml11 and html5 backends */
  11
  12 /* Default font. */
  13 body {
  14   font-family: Georgia,serif;
  15 }
  16
  17 /* Title font. */
  18 h1, h2, h3, h4, h5, h6,
  19 div.title, caption.title,
  20 thead, p.table.header,
  21 #toctitle,
  22 #author, #revnumber, #revdate, #revremark,
  23 #footer {
  24   font-family: Arial,Helvetica,sans-serif;
  25 }
  26
  27 body {
  28   margin: 1em 5% 1em 5%;
  29 }
  30
  31 a {
  32   color: blue;
  33   text-decoration: underline;
  34 }
  35 a:visited {
  36   color: fuchsia;
  37 }
  38
  39 em {
  40   font-style: italic;
  41   color: navy;
  42 }
  43
  44 strong {
  45   font-weight: bold;
  46   color: #083194;
  47 }
  48
  49 h1, h2, h3, h4, h5, h6 {
  50   color: #527bbd;
  51   margin-top: 1.2em;
  52   margin-bottom: 0.5em;
  53   line-height: 1.3;
  54 }
  55
  56 h1, h2, h3 {
  57   border-bottom: 2px solid silver;
  58 }
  59 h2 {
  60   padding-top: 0.5em;
  61 }
  62 h3 {
  63   float: left;
  64 }
  65 h3 + * {
  66   clear: left;
  67 }
  68 h5 {
  69   font-size: 1.0em;
  70 }
  71
  72 div.sectionbody {
  73   margin-left: 0;
  74 }
  75
  76 hr {
  77   border: 1px solid silver;
  78 }
  79
  80 p {
  81   margin-top: 0.5em;
  82   margin-bottom: 0.5em;
  83 }
  84
  85 ul, ol, li > p {
  86   margin-top: 0;
  87 }
  88 ul > li     { color: #aaa; }
  89 ul > li > * { color: black; }
  90
  91 .monospaced, code, pre {
  92   font-family: "Courier New", Courier, monospace;
  93   font-size: inherit;
  94   color: navy;
  95   padding: 0;
  96   margin: 0;
  97 }
  98 pre {
  99   white-space: pre-wrap;
 100 }
 101
 102 #author {
 103   color: #527bbd;
 104   font-weight: bold;
 105   font-size: 1.1em;
 106 }
 107 #email {
 108 }
 109 #revnumber, #revdate, #revremark {
 110 }
 111
 112 #footer {
 113   font-size: small;
 114   border-top: 2px solid silver;
 115   padding-top: 0.5em;
 116   margin-top: 4.0em;
 117 }
 118 #footer-text {
 119   float: left;
 120   padding-bottom: 0.5em;
 121 }
 122 #footer-badges {
 123   float: right;
 124   padding-bottom: 0.5em;
 125 }
 126
 127 #preamble {
 128   margin-top: 1.5em;
 129   margin-bottom: 1.5em;
 130 }
 131 div.imageblock, div.exampleblock, div.verseblock,
 132 div.quoteblock, div.literalblock, div.listingblock, div.sidebarblock,
 133 div.admonitionblock {
 134   margin-top: 1.0em;
 135   margin-bottom: 1.5em;
 136 }
 137 div.admonitionblock {
 138   margin-top: 2.0em;
 139   margin-bottom: 2.0em;
 140   margin-right: 10%;
 141   color: #606060;
 142 }
 143
 144 div.content { /* Block element content. */
 145   padding: 0;
 146 }
 147
 148 /* Block element titles. */
 149 div.title, caption.title {
 150   color: #527bbd;
 151   font-weight: bold;
 152   text-align: left;
 153   margin-top: 1.0em;
 154   margin-bottom: 0.5em;
 155 }
 156 div.title + * {
 157   margin-top: 0;
 158 }
 159
 160 td div.title:first-child {
 161   margin-top: 0.0em;
 162 }
 163 div.content div.title:first-child {
 164   margin-top: 0.0em;
 165 }
 166 div.content + div.title {
 167   margin-top: 0.0em;
 168 }
 169
 170 div.sidebarblock > div.content {
 171   background: #ffffee;
 172   border: 1px solid #dddddd;
 173   border-left: 4px solid #f0f0f0;
 174   padding: 0.5em;
 175 }
 176
 177 div.listingblock > div.content {
 178   border: 1px solid #dddddd;
 179   border-left: 5px solid #f0f0f0;
 180   background: #f8f8f8;
 181   padding: 0.5em;
 182 }
 183
 184 div.quoteblock, div.verseblock {
 185   padding-left: 1.0em;
 186   margin-left: 1.0em;
 187   margin-right: 10%;
 188   border-left: 5px solid #f0f0f0;
 189   color: #888;
 190 }
 191
 192 div.quoteblock > div.attribution {
 193   padding-top: 0.5em;
 194   text-align: right;
 195 }
 196
 197 div.verseblock > pre.content {
 198   font-family: inherit;
 199   font-size: inherit;
 200 }
 201 div.verseblock > div.attribution {
 202   padding-top: 0.75em;
 203   text-align: left;
 204 }
 205 /* DEPRECATED: Pre version 8.2.7 verse style literal block. */
 206 div.verseblock + div.attribution {
 207   text-align: left;
 208 }
 209
 210 div.admonitionblock .icon {
 211   vertical-align: top;
 212   font-size: 1.1em;
 213   font-weight: bold;
 214   text-decoration: underline;
 215   color: #527bbd;
 216   padding-right: 0.5em;
 217 }
 218 div.admonitionblock td.content {
 219   padding-left: 0.5em;
 220   border-left: 3px solid #dddddd;
 221 }
 222
 223 div.exampleblock > div.content {
 224   border-left: 3px solid #dddddd;
 225   padding-left: 0.5em;
 226 }
 227
 228 div.imageblock div.content { padding-left: 0; }
 229 span.image img { border-style: none; vertical-align: text-bottom; }
 230 a.image:visited { color: white; }
 231
 232 dl {
 233   margin-top: 0.8em;
 234   margin-bottom: 0.8em;
 235 }
 236 dt {
 237   margin-top: 0.5em;
 238   margin-bottom: 0;
 239   font-style: normal;
 240   color: navy;
 241 }
 242 dd > *:first-child {
 243   margin-top: 0.1em;
 244 }
 245
 246 ul, ol {
 247     list-style-position: outside;
 248 }
 249 ol.arabic {
 250   list-style-type: decimal;
 251 }
 252 ol.loweralpha {
 253   list-style-type: lower-alpha;
 254 }
 255 ol.upperalpha {
 256   list-style-type: upper-alpha;
 257 }
 258 ol.lowerroman {
 259   list-style-type: lower-roman;
 260 }
 261 ol.upperroman {
 262   list-style-type: upper-roman;
 263 }
 264
 265 div.compact ul, div.compact ol,
 266 div.compact p, div.compact p,
 267 div.compact div, div.compact div {
 268   margin-top: 0.1em;
 269   margin-bottom: 0.1em;
 270 }
 271
 272 tfoot {
 273   font-weight: bold;
 274 }
 275 td > div.verse {
 276   white-space: pre;
 277 }
 278
 279 div.hdlist {
 280   margin-top: 0.8em;
 281   margin-bottom: 0.8em;
 282 }
 283 div.hdlist tr {
 284   padding-bottom: 15px;
 285 }
 286 dt.hdlist1.strong, td.hdlist1.strong {
 287   font-weight: bold;
 288 }
 289 td.hdlist1 {
 290   vertical-align: top;
 291   font-style: normal;
 292   padding-right: 0.8em;
 293   color: navy;
 294 }
 295 td.hdlist2 {
 296   vertical-align: top;
 297 }
 298 div.hdlist.compact tr {
 299   margin: 0;
 300   padding-bottom: 0;
 301 }
 302
 303 .comment {
 304   background: yellow;
 305 }
 306
 307 .footnote, .footnoteref {
 308   font-size: 0.8em;
 309 }
 310
 311 span.footnote, span.footnoteref {
 312   vertical-align: super;
 313 }
 314
 315 #footnotes {
 316   margin: 20px 0 20px 0;
 317   padding: 7px 0 0 0;
 318 }
 319
 320 #footnotes div.footnote {
 321   margin: 0 0 5px 0;
 322 }
 323
 324 #footnotes hr {
 325   border: none;
 326   border-top: 1px solid silver;
 327   height: 1px;
 328   text-align: left;
 329   margin-left: 0;
 330   width: 20%;
 331   min-width: 100px;
 332 }
 333
 334 div.colist td {
 335   padding-right: 0.5em;
 336   padding-bottom: 0.3em;
 337   vertical-align: top;
 338 }
 339 div.colist td img {
 340   margin-top: 0.3em;
 341 }
 342
 343 @media print {
 344   #footer-badges { display: none; }
 345 }
 346
 347 #toc {
 348   margin-bottom: 2.5em;
 349 }
 350
 351 #toctitle {
 352   color: #527bbd;
 353   font-size: 1.1em;
 354   font-weight: bold;
 355   margin-top: 1.0em;
 356   margin-bottom: 0.1em;
 357 }
 358
 359 div.toclevel0, div.toclevel1, div.toclevel2, div.toclevel3, div.toclevel4 {
 360   margin-top: 0;
 361   margin-bottom: 0;
 362 }
 363 div.toclevel2 {
 364   margin-left: 2em;
 365   font-size: 0.9em;
 366 }
 367 div.toclevel3 {
 368   margin-left: 4em;
 369   font-size: 0.9em;
 370 }
 371 div.toclevel4 {
 372   margin-left: 6em;
 373   font-size: 0.9em;
 374 }
 375
 376 span.aqua { color: aqua; }
 377 span.black { color: black; }
 378 span.blue { color: blue; }
 379 span.fuchsia { color: fuchsia; }
 380 span.gray { color: gray; }
 381 span.green { color: green; }
 382 span.lime { color: lime; }
 383 span.maroon { color: maroon; }
 384 span.navy { color: navy; }
 385 span.olive { color: olive; }
 386 span.purple { color: purple; }
 387 span.red { color: red; }
 388 span.silver { color: silver; }
 389 span.teal { color: teal; }
 390 span.white { color: white; }
 391 span.yellow { color: yellow; }
 392
 393 span.aqua-background { background: aqua; }
 394 span.black-background { background: black; }
 395 span.blue-background { background: blue; }
 396 span.fuchsia-background { background: fuchsia; }
 397 span.gray-background { background: gray; }
 398 span.green-background { background: green; }
 399 span.lime-background { background: lime; }
 400 span.maroon-background { background: maroon; }
 401 span.navy-background { background: navy; }
 402 span.olive-background { background: olive; }
 403 span.purple-background { background: purple; }
 404 span.red-background { background: red; }
 405 span.silver-background { background: silver; }
 406 span.teal-background { background: teal; }
 407 span.white-background { background: white; }
 408 span.yellow-background { background: yellow; }
 409
 410 span.big { font-size: 2em; }
 411 span.small { font-size: 0.6em; }
 412
 413 span.underline { text-decoration: underline; }
 414 span.overline { text-decoration: overline; }
 415 span.line-through { text-decoration: line-through; }
 416
 417 div.unbreakable { page-break-inside: avoid; }
 418
 419
 420 /*
 421  * xhtml11 specific
 422  *
 423  * */
 424
 425 div.tableblock {
 426   margin-top: 1.0em;
 427   margin-bottom: 1.5em;
 428 }
 429 div.tableblock > table {
 430   border: 3px solid #527bbd;
 431 }
 432 thead, p.table.header {
 433   font-weight: bold;
 434   color: #527bbd;
 435 }
 436 p.table {
 437   margin-top: 0;
 438 }
 439 /* Because the table frame attribute is overridden by CSS in most browsers. */
 440 div.tableblock > table[frame="void"] {
 441   border-style: none;
 442 }
 443 div.tableblock > table[frame="hsides"] {
 444   border-left-style: none;
 445   border-right-style: none;
 446 }
 447 div.tableblock > table[frame="vsides"] {
 448   border-top-style: none;
 449   border-bottom-style: none;
 450 }
 451
 452
 453 /*
 454  * html5 specific
 455  *
 456  * */
 457
 458 table.tableblock {
 459   margin-top: 1.0em;
 460   margin-bottom: 1.5em;
 461 }
 462 thead, p.tableblock.header {
 463   font-weight: bold;
 464   color: #527bbd;
 465 }
 466 p.tableblock {
 467   margin-top: 0;
 468 }
 469 table.tableblock {
 470   border-width: 3px;
 471   border-spacing: 0px;
 472   border-style: solid;
 473   border-color: #527bbd;
 474   border-collapse: collapse;
 475 }
 476 th.tableblock, td.tableblock {
 477   border-width: 1px;
 478   padding: 4px;
 479   border-style: solid;
 480   border-color: #527bbd;
 481 }
 482
 483 table.tableblock.frame-topbot {
 484   border-left-style: hidden;
 485   border-right-style: hidden;
 486 }
 487 table.tableblock.frame-sides {
 488   border-top-style: hidden;
 489   border-bottom-style: hidden;
 490 }
 491 table.tableblock.frame-none {
 492   border-style: hidden;
 493 }
 494
 495 th.tableblock.halign-left, td.tableblock.halign-left {
 496   text-align: left;
 497 }
 498 th.tableblock.halign-center, td.tableblock.halign-center {
 499   text-align: center;
 500 }
 501 th.tableblock.halign-right, td.tableblock.halign-right {
 502   text-align: right;
 503 }
 504
 505 th.tableblock.valign-top, td.tableblock.valign-top {
 506   vertical-align: top;
 507 }
 508 th.tableblock.valign-middle, td.tableblock.valign-middle {
 509   vertical-align: middle;
 510 }
 511 th.tableblock.valign-bottom, td.tableblock.valign-bottom {
 512   vertical-align: bottom;
 513 }
 514
 515
 516 /*
 517  * manpage specific
 518  *
 519  * */
 520
 521 body.manpage h1 {
 522   padding-top: 0.5em;
 523   padding-bottom: 0.5em;
 524   border-top: 2px solid silver;
 525   border-bottom: 2px solid silver;
 526 }
 527 body.manpage h2 {
 528   border-style: none;
 529 }
 530 body.manpage div.sectionbody {
 531   margin-left: 3em;
 532 }
 533
 534 @media print {
 535   body.manpage div#toc { display: none; }
 536 }
 537
 538
 539 </style>
 540 <script type="text/javascript">
 541 /*<![CDATA[*/
 542 var asciidoc = {  // Namespace.
 543
 544 /////////////////////////////////////////////////////////////////////
 545 // Table Of Contents generator
 546 /////////////////////////////////////////////////////////////////////
 547
 548 /* Author: Mihai Bazon, September 2002
 549  * http://students.infoiasi.ro/~mishoo
 550  *
 551  * Table Of Content generator
 552  * Version: 0.4
 553  *
 554  * Feel free to use this script under the terms of the GNU General Public
 555  * License, as long as you do not remove or alter this notice.
 556  */
 557
 558  /* modified by Troy D. Hanson, September 2006. License: GPL */
 559  /* modified by Stuart Rackham, 2006, 2009. License: GPL */
 560
 561 // toclevels = 1..4.
 562 toc: function (toclevels) {
 563
 564   function getText(el) {
 565     var text = "";
 566     for (var i = el.firstChild; i != null; i = i.nextSibling) {
 567       if (i.nodeType == 3 /* Node.TEXT_NODE */) // IE doesn't speak constants.
 568         text += i.data;
 569       else if (i.firstChild != null)
 570         text += getText(i);
 571     }
 572     return text;
 573   }
 574
 575   function TocEntry(el, text, toclevel) {
 576     this.element = el;
 577     this.text = text;
 578     this.toclevel = toclevel;
 579   }
 580
 581   function tocEntries(el, toclevels) {
 582     var result = new Array;
 583     var re = new RegExp('[hH]([1-'+(toclevels+1)+'])');
 584     // Function that scans the DOM tree for header elements (the DOM2
 585     // nodeIterator API would be a better technique but not supported by all
 586     // browsers).
 587     var iterate = function (el) {
 588       for (var i = el.firstChild; i != null; i = i.nextSibling) {
 589         if (i.nodeType == 1 /* Node.ELEMENT_NODE */) {
 590           var mo = re.exec(i.tagName);
 591           if (mo && (i.getAttribute("class") || i.getAttribute("className")) != "float") {
 592             result[result.length] = new TocEntry(i, getText(i), mo[1]-1);
 593           }
 594           iterate(i);
 595         }
 596       }
 597     }
 598     iterate(el);
 599     return result;
 600   }
 601
 602   var toc = document.getElementById("toc");
 603   if (!toc) {
 604     return;
 605   }
 606
 607   // Delete existing TOC entries in case we're reloading the TOC.
 608   var tocEntriesToRemove = [];
 609   var i;
 610   for (i = 0; i < toc.childNodes.length; i++) {
 611     var entry = toc.childNodes[i];
 612     if (entry.nodeName.toLowerCase() == 'div'
 613      && entry.getAttribute("class")
 614      && entry.getAttribute("class").match(/^toclevel/))
 615       tocEntriesToRemove.push(entry);
 616   }
 617   for (i = 0; i < tocEntriesToRemove.length; i++) {
 618     toc.removeChild(tocEntriesToRemove[i]);
 619   }
 620
 621   // Rebuild TOC entries.
 622   var entries = tocEntries(document.getElementById("content"), toclevels);
 623   for (var i = 0; i < entries.length; ++i) {
 624     var entry = entries[i];
 625     if (entry.element.id == "")
 626       entry.element.id = "_toc_" + i;
 627     var a = document.createElement("a");
 628     a.href = "#" + entry.element.id;
 629     a.appendChild(document.createTextNode(entry.text));
 630     var div = document.createElement("div");
 631     div.appendChild(a);
 632     div.className = "toclevel" + entry.toclevel;
 633     toc.appendChild(div);
 634   }
 635   if (entries.length == 0)
 636     toc.parentNode.removeChild(toc);
 637 },
 638
 639
 640 /////////////////////////////////////////////////////////////////////
 641 // Footnotes generator
 642 /////////////////////////////////////////////////////////////////////
 643
 644 /* Based on footnote generation code from:
 645  * http://www.brandspankingnew.net/archive/2005/07/format_footnote.html
 646  */
 647
 648 footnotes: function () {
 649   // Delete existing footnote entries in case we're reloading the footnodes.
 650   var i;
 651   var noteholder = document.getElementById("footnotes");
 652   if (!noteholder) {
 653     return;
 654   }
 655   var entriesToRemove = [];
 656   for (i = 0; i < noteholder.childNodes.length; i++) {
 657     var entry = noteholder.childNodes[i];
 658     if (entry.nodeName.toLowerCase() == 'div' && entry.getAttribute("class") == "footnote")
 659       entriesToRemove.push(entry);
 660   }
 661   for (i = 0; i < entriesToRemove.length; i++) {
 662     noteholder.removeChild(entriesToRemove[i]);
 663   }
 664
 665   // Rebuild footnote entries.
 666   var cont = document.getElementById("content");
 667   var spans = cont.getElementsByTagName("span");
 668   var refs = {};
 669   var n = 0;
 670   for (i=0; i<spans.length; i++) {
 671     if (spans[i].className == "footnote") {
 672       n++;
 673       var note = spans[i].getAttribute("data-note");
 674       if (!note) {
 675         // Use [\s\S] in place of . so multi-line matches work.
 676         // Because JavaScript has no s (dotall) regex flag.
 677         note = spans[i].innerHTML.match(/\s*\[([\s\S]*)]\s*/)[1];
 678         spans[i].innerHTML =
 679           "[<a id='_footnoteref_" + n + "' href='#_footnote_" + n +
 680           "' title='View footnote' class='footnote'>" + n + "</a>]";
 681         spans[i].setAttribute("data-note", note);
 682       }
 683       noteholder.innerHTML +=
 684         "<div class='footnote' id='_footnote_" + n + "'>" +
 685         "<a href='#_footnoteref_" + n + "' title='Return to text'>" +
 686         n + "</a>. " + note + "</div>";
 687       var id =spans[i].getAttribute("id");
 688       if (id != null) refs["#"+id] = n;
 689     }
 690   }
 691   if (n == 0)
 692     noteholder.parentNode.removeChild(noteholder);
 693   else {
 694     // Process footnoterefs.
 695     for (i=0; i<spans.length; i++) {
 696       if (spans[i].className == "footnoteref") {
 697         var href = spans[i].getElementsByTagName("a")[0].getAttribute("href");
 698         href = href.match(/#.*/)[0];  // Because IE return full URL.
 699         n = refs[href];
 700         spans[i].innerHTML =
 701           "[<a href='#_footnote_" + n +
 702           "' title='View footnote' class='footnote'>" + n + "</a>]";
 703       }
 704     }
 705   }
 706 },
 707
 708 install: function(toclevels) {
 709   var timerId;
 710
 711   function reinstall() {
 712     asciidoc.footnotes();
 713     if (toclevels) {
 714       asciidoc.toc(toclevels);
 715     }
 716   }
 717
 718   function reinstallAndRemoveTimer() {
 719     clearInterval(timerId);
 720     reinstall();
 721   }
 722
 723   timerId = setInterval(reinstall, 500);
 724   if (document.addEventListener)
 725     document.addEventListener("DOMContentLoaded", reinstallAndRemoveTimer, false);
 726   else
 727     window.onload = reinstallAndRemoveTimer;
 728 }
 729
 730 }
 731 asciidoc.install();
 732 /*]]>*/
 733 </script>
 734 </head>
 735 <body class="article">
 736 <div id="header">
 737 <h1>Parallel Checkout Design Notes</h1>
 738 <span id="revdate">2023-08-21</span>
 739 </div>
 740 <div id="content">
 741 <div id="preamble">
 742 <div class="sectionbody">
 743 <div class="paragraph"><p>The "Parallel Checkout" feature attempts to use multiple processes to
 744 parallelize the work of uncompressing the blobs, applying in-core
 745 filters, and writing the resulting contents to the working tree during a
 746 checkout operation. It can be used by all checkout-related commands,
 747 such as <code>clone</code>, <code>checkout</code>, <code>reset</code>, <code>sparse-checkout</code>, and others.</p></div>
 748 <div class="paragraph"><p>These commands share the following basic structure:</p></div>
 749 <div class="ulist"><ul>
 750 <li>
 751 <p>
 752 Step 1: Read the current index file into memory.
 753 </p>
 754 </li>
 755 <li>
 756 <p>
 757 Step 2: Modify the in-memory index based upon the command, and
 758   temporarily mark all cache entries that need to be updated.
 759 </p>
 760 </li>
 761 <li>
 762 <p>
 763 Step 3: Populate the working tree to match the new candidate index.
 764   This includes iterating over all of the to-be-updated cache entries
 765   and delete, create, or overwrite the associated files in the working
 766   tree.
 767 </p>
 768 </li>
 769 <li>
 770 <p>
 771 Step 4: Write the new index to disk.
 772 </p>
 773 </li>
 774 </ul></div>
 775 <div class="paragraph"><p>Step 3 is the focus of the "parallel checkout" effort described here.</p></div>
 776 </div>
 777 </div>
 778 <div class="sect1">
 779 <h2 id="_sequential_implementation">Sequential Implementation</h2>
 780 <div class="sectionbody">
 781 <div class="paragraph"><p>For the purposes of discussion here, the current sequential
 782 implementation of Step 3 is divided in 3 parts, each one implemented in
 783 its own function:</p></div>
 784 <div class="ulist"><ul>
 785 <li>
 786 <p>
 787 Step 3a: <code>unpack-trees.c:check_updates()</code> contains a series of
 788   sequential loops iterating over the <code>cache_entry</code>'s array. The main
 789   loop in this function calls the Step 3b function for each of the
 790   to-be-updated entries.
 791 </p>
 792 </li>
 793 <li>
 794 <p>
 795 Step 3b: <code>entry.c:checkout_entry()</code> examines the existing working tree
 796   for file conflicts, collisions, and unsaved changes. It removes files
 797   and creates leading directories as necessary. It calls the Step 3c
 798   function for each entry to be written.
 799 </p>
 800 </li>
 801 <li>
 802 <p>
 803 Step 3c: <code>entry.c:write_entry()</code> loads the blob into memory, smudges
 804   it if necessary, creates the file in the working tree, writes the
 805   smudged contents, calls <code>fstat()</code> or <code>lstat()</code>, and updates the
 806   associated <code>cache_entry</code> struct with the stat information gathered.
 807 </p>
 808 </li>
 809 </ul></div>
 810 <div class="paragraph"><p>It wouldn&#8217;t be safe to perform Step 3b in parallel, as there could be
 811 race conditions between file creations and removals. Instead, the
 812 parallel checkout framework lets the sequential code handle Step 3b,
 813 and uses parallel workers to replace the sequential
 814 <code>entry.c:write_entry()</code> calls from Step 3c.</p></div>
 815 </div>
 816 </div>
 817 <div class="sect1">
 818 <h2 id="_rejected_multi_threaded_solution">Rejected Multi-Threaded Solution</h2>
 819 <div class="sectionbody">
 820 <div class="paragraph"><p>The most "straightforward" implementation would be to spread the set of
 821 to-be-updated cache entries across multiple threads. But due to the
 822 thread-unsafe functions in the object database code, we would have to use locks to
 823 coordinate the parallel operation. An early prototype of this solution
 824 showed that the multi-threaded checkout would bring performance
 825 improvements over the sequential code, but there was still too much lock
 826 contention. A <code>perf</code> profiling indicated that around 20% of the runtime
 827 during a local Linux clone (on an SSD) was spent in locking functions.
 828 For this reason this approach was rejected in favor of using multiple
 829 child processes, which led to a better performance.</p></div>
 830 </div>
 831 </div>
 832 <div class="sect1">
 833 <h2 id="_multi_process_solution">Multi-Process Solution</h2>
 834 <div class="sectionbody">
 835 <div class="paragraph"><p>Parallel checkout alters the aforementioned Step 3 to use multiple
 836 <code>checkout--worker</code> background processes to distribute the work. The
 837 long-running worker processes are controlled by the foreground Git
 838 command using the existing run-command API.</p></div>
 839 <div class="sect2">
 840 <h3 id="_overview">Overview</h3>
 841 <div class="paragraph"><p>Step 3b is only slightly altered; for each entry to be checked out, the
 842 main process performs the following steps:</p></div>
 843 <div class="ulist"><ul>
 844 <li>
 845 <p>
 846 M1: Check whether there is any untracked or unclean file in the
 847   working tree which would be overwritten by this entry, and decide
 848   whether to proceed (removing the file(s)) or not.
 849 </p>
 850 </li>
 851 <li>
 852 <p>
 853 M2: Create the leading directories.
 854 </p>
 855 </li>
 856 <li>
 857 <p>
 858 M3: Load the conversion attributes for the entry&#8217;s path.
 859 </p>
 860 </li>
 861 <li>
 862 <p>
 863 M4: Check, based on the entry&#8217;s type and conversion attributes,
 864   whether the entry is eligible for parallel checkout (more on this
 865   later). If it is eligible, enqueue the entry and the loaded
 866   attributes to later write the entry in parallel. If not, write the
 867   entry right away, using the default sequential code.
 868 </p>
 869 </li>
 870 </ul></div>
 871 <div class="paragraph"><p>Note: we save the conversion attributes associated with each entry
 872 because the workers don&#8217;t have access to the main process' index state,
 873 so they can&#8217;t load the attributes by themselves (and the attributes are
 874 needed to properly smudge the entry). Additionally, this has a positive
 875 impact on performance as (1) we don&#8217;t need to load the attributes twice
 876 and (2) the attributes machinery is optimized to handle paths in
 877 sequential order.</p></div>
 878 <div class="paragraph"><p>After all entries have passed through the above steps, the main process
 879 checks if the number of enqueued entries is sufficient to spread among
 880 the workers. If not, it just writes them sequentially. Otherwise, it
 881 spawns the workers and distributes the queued entries uniformly in
 882 continuous chunks. This aims to minimize the chances of two workers
 883 writing to the same directory simultaneously, which could increase lock
 884 contention in the kernel.</p></div>
 885 <div class="paragraph"><p>Then, for each assigned item, each worker:</p></div>
 886 <div class="ulist"><ul>
 887 <li>
 888 <p>
 889 W1: Checks if there is any non-directory file in the leading part of
 890   the entry&#8217;s path or if there already exists a file at the entry' path.
 891   If so, mark the entry with <code>PC_ITEM_COLLIDED</code> and skip it (more on
 892   this later).
 893 </p>
 894 </li>
 895 <li>
 896 <p>
 897 W2: Creates the file (with O_CREAT and O_EXCL).
 898 </p>
 899 </li>
 900 <li>
 901 <p>
 902 W3: Loads the blob into memory (inflating and delta reconstructing
 903   it).
 904 </p>
 905 </li>
 906 <li>
 907 <p>
 908 W4: Applies any required in-process filter, like end-of-line
 909   conversion and re-encoding.
 910 </p>
 911 </li>
 912 <li>
 913 <p>
 914 W5: Writes the result to the file descriptor opened at W2.
 915 </p>
 916 </li>
 917 <li>
 918 <p>
 919 W6: Calls <code>fstat()</code> or lstat()` on the just-written path, and sends
 920   the result back to the main process, together with the end status of
 921   the operation and the item&#8217;s identification number.
 922 </p>
 923 </li>
 924 </ul></div>
 925 <div class="paragraph"><p>Note that, when possible, steps W3 to W5 are delegated to the streaming
 926 machinery, removing the need to keep the entire blob in memory.</p></div>
 927 <div class="paragraph"><p>If the worker fails to read the blob or to write it to the working tree,
 928 it removes the created file to avoid leaving empty files behind. This is
 929 the <strong>only</strong> time a worker is allowed to remove a file.</p></div>
 930 <div class="paragraph"><p>As mentioned earlier, it is the responsibility of the main process to
 931 remove any file that blocks the checkout operation (or abort if the
 932 removal(s) would cause data loss and the user didn&#8217;t ask to <code>--force</code>).
 933 This is crucial to avoid race conditions and also to properly detect
 934 path collisions at Step W1.</p></div>
 935 <div class="paragraph"><p>After the workers finish writing the items and sending back the required
 936 information, the main process handles the results in two steps:</p></div>
 937 <div class="ulist"><ul>
 938 <li>
 939 <p>
 940 First, it updates the in-memory index with the <code>lstat()</code> information
 941   sent by the workers. (This must be done first as this information
 942   might me required in the following step.)
 943 </p>
 944 </li>
 945 <li>
 946 <p>
 947 Then it writes the items which collided on disk (i.e. items marked
 948   with <code>PC_ITEM_COLLIDED</code>). More on this below.
 949 </p>
 950 </li>
 951 </ul></div>
 952 </div>
 953 </div>
 954 </div>
 955 <div class="sect1">
 956 <h2 id="_path_collisions">Path Collisions</h2>
 957 <div class="sectionbody">
 958 <div class="paragraph"><p>Path collisions happen when two different paths correspond to the same
 959 entry in the file system. E.g. the paths <em>a</em> and <em>A</em> would collide in a
 960 case-insensitive file system.</p></div>
 961 <div class="paragraph"><p>The sequential checkout deals with collisions in the same way that it
 962 deals with files that were already present in the working tree before
 963 checkout. Basically, it checks if the path that it wants to write
 964 already exists on disk, makes sure the existing file doesn&#8217;t have
 965 unsaved data, and then overwrites it. (To be more pedantic: it deletes
 966 the existing file and creates the new one.) So, if there are multiple
 967 colliding files to be checked out, the sequential code will write each
 968 one of them but only the last will actually survive on disk.</p></div>
 969 <div class="paragraph"><p>Parallel checkout aims to reproduce the same behavior. However, we
 970 cannot let the workers racily write to the same file on disk. Instead,
 971 the workers detect when the entry that they want to check out would
 972 collide with an existing file, and mark it with <code>PC_ITEM_COLLIDED</code>.
 973 Later, the main process can sequentially feed these entries back to
 974 <code>checkout_entry()</code> without the risk of race conditions. On clone, this
 975 also has the effect of marking the colliding entries to later emit a
 976 warning for the user, like the classic sequential checkout does.</p></div>
 977 <div class="paragraph"><p>The workers are able to detect both collisions among the entries being
 978 concurrently written and collisions between a parallel-eligible entry
 979 and an ineligible entry. The general idea for collision detection is
 980 quite straightforward: for each parallel-eligible entry, the main
 981 process must remove all files that prevent this entry from being written
 982 (before enqueueing it). This includes any non-directory file in the
 983 leading path of the entry. Later, when a worker gets assigned the entry,
 984 it looks again for the non-directories files and for an already existing
 985 file at the entry&#8217;s path. If any of these checks finds something, the
 986 worker knows that there was a path collision.</p></div>
 987 <div class="paragraph"><p>Because parallel checkout can distinguish path collisions from the case
 988 where the file was already present in the working tree before checkout,
 989 we could alternatively choose to skip the checkout of colliding entries.
 990 However, each entry that doesn&#8217;t get written would have NULL <code>lstat()</code>
 991 fields on the index. This could cause performance penalties for
 992 subsequent commands that need to refresh the index, as they would have
 993 to go to the file system to see if the entry is dirty. Thus, if we have
 994 N entries in a colliding group and we decide to write and <code>lstat()</code> only
 995 one of them, every subsequent <code>git-status</code> will have to read, convert,
 996 and hash the written file N - 1 times. By checking out all colliding
 997 entries (like the sequential code does), we only pay the overhead once,
 998 during checkout.</p></div>
 999 </div>
1000 </div>
1001 <div class="sect1">
1002 <h2 id="_eligible_entries_for_parallel_checkout">Eligible Entries for Parallel Checkout</h2>
1003 <div class="sectionbody">
1004 <div class="paragraph"><p>As previously mentioned, not all entries passed to <code>checkout_entry()</code>
1005 will be considered eligible for parallel checkout. More specifically, we
1006 exclude:</p></div>
1007 <div class="ulist"><ul>
1008 <li>
1009 <p>
1010 Symbolic links; to avoid race conditions that, in combination with
1011   path collisions, could cause workers to write files at the wrong
1012   place. For example, if we were to concurrently check out a symlink
1013   <em>a</em> &#8594; <em>b</em> and a regular file <em>A/f</em> in a case-insensitive file system,
1014   we could potentially end up writing the file <em>A/f</em> at <em>a/f</em>, due to a
1015   race condition.
1016 </p>
1017 </li>
1018 <li>
1019 <p>
1020 Regular files that require external filters (either "one shot" filters
1021   or long-running process filters). These filters are black-boxes to Git
1022   and may have their own internal locking or non-concurrent assumptions.
1023   So it might not be safe to run multiple instances in parallel.
1024 </p>
1025 <div class="paragraph"><p>Besides, long-running filters may use the delayed checkout feature to
1026 postpone the return of some filtered blobs. The delayed checkout queue
1027 and the parallel checkout queue are not compatible and should remain
1028 separate.</p></div>
1029 <div class="paragraph"><p>Note: regular files that only require internal filters, like end-of-line
1030 conversion and re-encoding, are eligible for parallel checkout.</p></div>
1031 </li>
1032 </ul></div>
1033 <div class="paragraph"><p>Ineligible entries are checked out by the classic sequential codepath
1034 <strong>before</strong> spawning workers.</p></div>
1035 <div class="paragraph"><p>Note: submodules&#8217;s files are also eligible for parallel checkout (as
1036 long as they don&#8217;t fall into any of the excluding categories mentioned
1037 above). But since each submodule is checked out in its own child
1038 process, we don&#8217;t mix the superproject&#8217;s and the submodules' files in
1039 the same parallel checkout process or queue.</p></div>
1040 </div>
1041 </div>
1042 <div class="sect1">
1043 <h2 id="_the_api">The API</h2>
1044 <div class="sectionbody">
1045 <div class="paragraph"><p>The parallel checkout API was designed with the goal of minimizing
1046 changes to the current users of the checkout machinery. This means that
1047 they don&#8217;t have to call a different function for sequential or parallel
1048 checkout. As already mentioned, <code>checkout_entry()</code> will automatically
1049 insert the given entry in the parallel checkout queue when this feature
1050 is enabled and the entry is eligible; otherwise, it will just write the
1051 entry right away, using the sequential code. In general, callers of the
1052 parallel checkout API should look similar to this:</p></div>
1053 <div class="listingblock">
1054 <div class="content">
1055 <pre><code>int pc_workers, pc_threshold, err = 0;
1056 struct checkout state;
1057
1058 get_parallel_checkout_configs(&amp;pc_workers, &amp;pc_threshold);
1059
1060 /*
1061  * This check is not strictly required, but it
1062  * should save some time in sequential mode.
1063  */
1064 if (pc_workers &gt; 1)
1065         init_parallel_checkout();
1066
1067 for (each cache_entry ce to-be-updated)
1068         err |= checkout_entry(ce, &amp;state, NULL, NULL);
1069
1070 err |= run_parallel_checkout(&amp;state, pc_workers, pc_threshold, NULL, NULL);</code></pre>
1071 </div></div>
1072 </div>
1073 </div>
1074 </div>
1075 <div id="footnotes"><hr /></div>
1076 <div id="footer">
1077 <div id="footer-text">
1078 Last updated
1079  2022-11-11 23:55:30 PST
1080 </div>
1081 </div>
1082 </body>
1083 </html>