Autogenerated HTML docs for v2.42.0
[git-htmldocs.git] / technical / reftable.html
blob16c612f2491148ba3a424ea6e45a3c5e9d8f2118
1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
3 "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
4 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
5 <head>
6 <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
7 <meta name="generator" content="AsciiDoc 10.2.0" />
8 <title></title>
9 <style type="text/css">
10 /* Shared CSS for AsciiDoc xhtml11 and html5 backends */
12 /* Default font. */
13 body {
14 font-family: Georgia,serif;
17 /* Title font. */
18 h1, h2, h3, h4, h5, h6,
19 div.title, caption.title,
20 thead, p.table.header,
21 #toctitle,
22 #author, #revnumber, #revdate, #revremark,
23 #footer {
24 font-family: Arial,Helvetica,sans-serif;
27 body {
28 margin: 1em 5% 1em 5%;
31 a {
32 color: blue;
33 text-decoration: underline;
35 a:visited {
36 color: fuchsia;
39 em {
40 font-style: italic;
41 color: navy;
44 strong {
45 font-weight: bold;
46 color: #083194;
49 h1, h2, h3, h4, h5, h6 {
50 color: #527bbd;
51 margin-top: 1.2em;
52 margin-bottom: 0.5em;
53 line-height: 1.3;
56 h1, h2, h3 {
57 border-bottom: 2px solid silver;
59 h2 {
60 padding-top: 0.5em;
62 h3 {
63 float: left;
65 h3 + * {
66 clear: left;
68 h5 {
69 font-size: 1.0em;
72 div.sectionbody {
73 margin-left: 0;
76 hr {
77 border: 1px solid silver;
80 p {
81 margin-top: 0.5em;
82 margin-bottom: 0.5em;
85 ul, ol, li > p {
86 margin-top: 0;
88 ul > li { color: #aaa; }
89 ul > li > * { color: black; }
91 .monospaced, code, pre {
92 font-family: "Courier New", Courier, monospace;
93 font-size: inherit;
94 color: navy;
95 padding: 0;
96 margin: 0;
98 pre {
99 white-space: pre-wrap;
102 #author {
103 color: #527bbd;
104 font-weight: bold;
105 font-size: 1.1em;
107 #email {
109 #revnumber, #revdate, #revremark {
112 #footer {
113 font-size: small;
114 border-top: 2px solid silver;
115 padding-top: 0.5em;
116 margin-top: 4.0em;
118 #footer-text {
119 float: left;
120 padding-bottom: 0.5em;
122 #footer-badges {
123 float: right;
124 padding-bottom: 0.5em;
127 #preamble {
128 margin-top: 1.5em;
129 margin-bottom: 1.5em;
131 div.imageblock, div.exampleblock, div.verseblock,
132 div.quoteblock, div.literalblock, div.listingblock, div.sidebarblock,
133 div.admonitionblock {
134 margin-top: 1.0em;
135 margin-bottom: 1.5em;
137 div.admonitionblock {
138 margin-top: 2.0em;
139 margin-bottom: 2.0em;
140 margin-right: 10%;
141 color: #606060;
144 div.content { /* Block element content. */
145 padding: 0;
148 /* Block element titles. */
149 div.title, caption.title {
150 color: #527bbd;
151 font-weight: bold;
152 text-align: left;
153 margin-top: 1.0em;
154 margin-bottom: 0.5em;
156 div.title + * {
157 margin-top: 0;
160 td div.title:first-child {
161 margin-top: 0.0em;
163 div.content div.title:first-child {
164 margin-top: 0.0em;
166 div.content + div.title {
167 margin-top: 0.0em;
170 div.sidebarblock > div.content {
171 background: #ffffee;
172 border: 1px solid #dddddd;
173 border-left: 4px solid #f0f0f0;
174 padding: 0.5em;
177 div.listingblock > div.content {
178 border: 1px solid #dddddd;
179 border-left: 5px solid #f0f0f0;
180 background: #f8f8f8;
181 padding: 0.5em;
184 div.quoteblock, div.verseblock {
185 padding-left: 1.0em;
186 margin-left: 1.0em;
187 margin-right: 10%;
188 border-left: 5px solid #f0f0f0;
189 color: #888;
192 div.quoteblock > div.attribution {
193 padding-top: 0.5em;
194 text-align: right;
197 div.verseblock > pre.content {
198 font-family: inherit;
199 font-size: inherit;
201 div.verseblock > div.attribution {
202 padding-top: 0.75em;
203 text-align: left;
205 /* DEPRECATED: Pre version 8.2.7 verse style literal block. */
206 div.verseblock + div.attribution {
207 text-align: left;
210 div.admonitionblock .icon {
211 vertical-align: top;
212 font-size: 1.1em;
213 font-weight: bold;
214 text-decoration: underline;
215 color: #527bbd;
216 padding-right: 0.5em;
218 div.admonitionblock td.content {
219 padding-left: 0.5em;
220 border-left: 3px solid #dddddd;
223 div.exampleblock > div.content {
224 border-left: 3px solid #dddddd;
225 padding-left: 0.5em;
228 div.imageblock div.content { padding-left: 0; }
229 span.image img { border-style: none; vertical-align: text-bottom; }
230 a.image:visited { color: white; }
232 dl {
233 margin-top: 0.8em;
234 margin-bottom: 0.8em;
236 dt {
237 margin-top: 0.5em;
238 margin-bottom: 0;
239 font-style: normal;
240 color: navy;
242 dd > *:first-child {
243 margin-top: 0.1em;
246 ul, ol {
247 list-style-position: outside;
249 ol.arabic {
250 list-style-type: decimal;
252 ol.loweralpha {
253 list-style-type: lower-alpha;
255 ol.upperalpha {
256 list-style-type: upper-alpha;
258 ol.lowerroman {
259 list-style-type: lower-roman;
261 ol.upperroman {
262 list-style-type: upper-roman;
265 div.compact ul, div.compact ol,
266 div.compact p, div.compact p,
267 div.compact div, div.compact div {
268 margin-top: 0.1em;
269 margin-bottom: 0.1em;
272 tfoot {
273 font-weight: bold;
275 td > div.verse {
276 white-space: pre;
279 div.hdlist {
280 margin-top: 0.8em;
281 margin-bottom: 0.8em;
283 div.hdlist tr {
284 padding-bottom: 15px;
286 dt.hdlist1.strong, td.hdlist1.strong {
287 font-weight: bold;
289 td.hdlist1 {
290 vertical-align: top;
291 font-style: normal;
292 padding-right: 0.8em;
293 color: navy;
295 td.hdlist2 {
296 vertical-align: top;
298 div.hdlist.compact tr {
299 margin: 0;
300 padding-bottom: 0;
303 .comment {
304 background: yellow;
307 .footnote, .footnoteref {
308 font-size: 0.8em;
311 span.footnote, span.footnoteref {
312 vertical-align: super;
315 #footnotes {
316 margin: 20px 0 20px 0;
317 padding: 7px 0 0 0;
320 #footnotes div.footnote {
321 margin: 0 0 5px 0;
324 #footnotes hr {
325 border: none;
326 border-top: 1px solid silver;
327 height: 1px;
328 text-align: left;
329 margin-left: 0;
330 width: 20%;
331 min-width: 100px;
334 div.colist td {
335 padding-right: 0.5em;
336 padding-bottom: 0.3em;
337 vertical-align: top;
339 div.colist td img {
340 margin-top: 0.3em;
343 @media print {
344 #footer-badges { display: none; }
347 #toc {
348 margin-bottom: 2.5em;
351 #toctitle {
352 color: #527bbd;
353 font-size: 1.1em;
354 font-weight: bold;
355 margin-top: 1.0em;
356 margin-bottom: 0.1em;
359 div.toclevel0, div.toclevel1, div.toclevel2, div.toclevel3, div.toclevel4 {
360 margin-top: 0;
361 margin-bottom: 0;
363 div.toclevel2 {
364 margin-left: 2em;
365 font-size: 0.9em;
367 div.toclevel3 {
368 margin-left: 4em;
369 font-size: 0.9em;
371 div.toclevel4 {
372 margin-left: 6em;
373 font-size: 0.9em;
376 span.aqua { color: aqua; }
377 span.black { color: black; }
378 span.blue { color: blue; }
379 span.fuchsia { color: fuchsia; }
380 span.gray { color: gray; }
381 span.green { color: green; }
382 span.lime { color: lime; }
383 span.maroon { color: maroon; }
384 span.navy { color: navy; }
385 span.olive { color: olive; }
386 span.purple { color: purple; }
387 span.red { color: red; }
388 span.silver { color: silver; }
389 span.teal { color: teal; }
390 span.white { color: white; }
391 span.yellow { color: yellow; }
393 span.aqua-background { background: aqua; }
394 span.black-background { background: black; }
395 span.blue-background { background: blue; }
396 span.fuchsia-background { background: fuchsia; }
397 span.gray-background { background: gray; }
398 span.green-background { background: green; }
399 span.lime-background { background: lime; }
400 span.maroon-background { background: maroon; }
401 span.navy-background { background: navy; }
402 span.olive-background { background: olive; }
403 span.purple-background { background: purple; }
404 span.red-background { background: red; }
405 span.silver-background { background: silver; }
406 span.teal-background { background: teal; }
407 span.white-background { background: white; }
408 span.yellow-background { background: yellow; }
410 span.big { font-size: 2em; }
411 span.small { font-size: 0.6em; }
413 span.underline { text-decoration: underline; }
414 span.overline { text-decoration: overline; }
415 span.line-through { text-decoration: line-through; }
417 div.unbreakable { page-break-inside: avoid; }
421 * xhtml11 specific
423 * */
425 div.tableblock {
426 margin-top: 1.0em;
427 margin-bottom: 1.5em;
429 div.tableblock > table {
430 border: 3px solid #527bbd;
432 thead, p.table.header {
433 font-weight: bold;
434 color: #527bbd;
436 p.table {
437 margin-top: 0;
439 /* Because the table frame attribute is overridden by CSS in most browsers. */
440 div.tableblock > table[frame="void"] {
441 border-style: none;
443 div.tableblock > table[frame="hsides"] {
444 border-left-style: none;
445 border-right-style: none;
447 div.tableblock > table[frame="vsides"] {
448 border-top-style: none;
449 border-bottom-style: none;
454 * html5 specific
456 * */
458 table.tableblock {
459 margin-top: 1.0em;
460 margin-bottom: 1.5em;
462 thead, p.tableblock.header {
463 font-weight: bold;
464 color: #527bbd;
466 p.tableblock {
467 margin-top: 0;
469 table.tableblock {
470 border-width: 3px;
471 border-spacing: 0px;
472 border-style: solid;
473 border-color: #527bbd;
474 border-collapse: collapse;
476 th.tableblock, td.tableblock {
477 border-width: 1px;
478 padding: 4px;
479 border-style: solid;
480 border-color: #527bbd;
483 table.tableblock.frame-topbot {
484 border-left-style: hidden;
485 border-right-style: hidden;
487 table.tableblock.frame-sides {
488 border-top-style: hidden;
489 border-bottom-style: hidden;
491 table.tableblock.frame-none {
492 border-style: hidden;
495 th.tableblock.halign-left, td.tableblock.halign-left {
496 text-align: left;
498 th.tableblock.halign-center, td.tableblock.halign-center {
499 text-align: center;
501 th.tableblock.halign-right, td.tableblock.halign-right {
502 text-align: right;
505 th.tableblock.valign-top, td.tableblock.valign-top {
506 vertical-align: top;
508 th.tableblock.valign-middle, td.tableblock.valign-middle {
509 vertical-align: middle;
511 th.tableblock.valign-bottom, td.tableblock.valign-bottom {
512 vertical-align: bottom;
517 * manpage specific
519 * */
521 body.manpage h1 {
522 padding-top: 0.5em;
523 padding-bottom: 0.5em;
524 border-top: 2px solid silver;
525 border-bottom: 2px solid silver;
527 body.manpage h2 {
528 border-style: none;
530 body.manpage div.sectionbody {
531 margin-left: 3em;
534 @media print {
535 body.manpage div#toc { display: none; }
539 </style>
540 <script type="text/javascript">
541 /*<![CDATA[*/
542 var asciidoc = { // Namespace.
544 /////////////////////////////////////////////////////////////////////
545 // Table Of Contents generator
546 /////////////////////////////////////////////////////////////////////
548 /* Author: Mihai Bazon, September 2002
549 * http://students.infoiasi.ro/~mishoo
551 * Table Of Content generator
552 * Version: 0.4
554 * Feel free to use this script under the terms of the GNU General Public
555 * License, as long as you do not remove or alter this notice.
558 /* modified by Troy D. Hanson, September 2006. License: GPL */
559 /* modified by Stuart Rackham, 2006, 2009. License: GPL */
561 // toclevels = 1..4.
562 toc: function (toclevels) {
564 function getText(el) {
565 var text = "";
566 for (var i = el.firstChild; i != null; i = i.nextSibling) {
567 if (i.nodeType == 3 /* Node.TEXT_NODE */) // IE doesn't speak constants.
568 text += i.data;
569 else if (i.firstChild != null)
570 text += getText(i);
572 return text;
575 function TocEntry(el, text, toclevel) {
576 this.element = el;
577 this.text = text;
578 this.toclevel = toclevel;
581 function tocEntries(el, toclevels) {
582 var result = new Array;
583 var re = new RegExp('[hH]([1-'+(toclevels+1)+'])');
584 // Function that scans the DOM tree for header elements (the DOM2
585 // nodeIterator API would be a better technique but not supported by all
586 // browsers).
587 var iterate = function (el) {
588 for (var i = el.firstChild; i != null; i = i.nextSibling) {
589 if (i.nodeType == 1 /* Node.ELEMENT_NODE */) {
590 var mo = re.exec(i.tagName);
591 if (mo && (i.getAttribute("class") || i.getAttribute("className")) != "float") {
592 result[result.length] = new TocEntry(i, getText(i), mo[1]-1);
594 iterate(i);
598 iterate(el);
599 return result;
602 var toc = document.getElementById("toc");
603 if (!toc) {
604 return;
607 // Delete existing TOC entries in case we're reloading the TOC.
608 var tocEntriesToRemove = [];
609 var i;
610 for (i = 0; i < toc.childNodes.length; i++) {
611 var entry = toc.childNodes[i];
612 if (entry.nodeName.toLowerCase() == 'div'
613 && entry.getAttribute("class")
614 && entry.getAttribute("class").match(/^toclevel/))
615 tocEntriesToRemove.push(entry);
617 for (i = 0; i < tocEntriesToRemove.length; i++) {
618 toc.removeChild(tocEntriesToRemove[i]);
621 // Rebuild TOC entries.
622 var entries = tocEntries(document.getElementById("content"), toclevels);
623 for (var i = 0; i < entries.length; ++i) {
624 var entry = entries[i];
625 if (entry.element.id == "")
626 entry.element.id = "_toc_" + i;
627 var a = document.createElement("a");
628 a.href = "#" + entry.element.id;
629 a.appendChild(document.createTextNode(entry.text));
630 var div = document.createElement("div");
631 div.appendChild(a);
632 div.className = "toclevel" + entry.toclevel;
633 toc.appendChild(div);
635 if (entries.length == 0)
636 toc.parentNode.removeChild(toc);
640 /////////////////////////////////////////////////////////////////////
641 // Footnotes generator
642 /////////////////////////////////////////////////////////////////////
644 /* Based on footnote generation code from:
645 * http://www.brandspankingnew.net/archive/2005/07/format_footnote.html
648 footnotes: function () {
649 // Delete existing footnote entries in case we're reloading the footnodes.
650 var i;
651 var noteholder = document.getElementById("footnotes");
652 if (!noteholder) {
653 return;
655 var entriesToRemove = [];
656 for (i = 0; i < noteholder.childNodes.length; i++) {
657 var entry = noteholder.childNodes[i];
658 if (entry.nodeName.toLowerCase() == 'div' && entry.getAttribute("class") == "footnote")
659 entriesToRemove.push(entry);
661 for (i = 0; i < entriesToRemove.length; i++) {
662 noteholder.removeChild(entriesToRemove[i]);
665 // Rebuild footnote entries.
666 var cont = document.getElementById("content");
667 var spans = cont.getElementsByTagName("span");
668 var refs = {};
669 var n = 0;
670 for (i=0; i<spans.length; i++) {
671 if (spans[i].className == "footnote") {
672 n++;
673 var note = spans[i].getAttribute("data-note");
674 if (!note) {
675 // Use [\s\S] in place of . so multi-line matches work.
676 // Because JavaScript has no s (dotall) regex flag.
677 note = spans[i].innerHTML.match(/\s*\[([\s\S]*)]\s*/)[1];
678 spans[i].innerHTML =
679 "[<a id='_footnoteref_" + n + "' href='#_footnote_" + n +
680 "' title='View footnote' class='footnote'>" + n + "</a>]";
681 spans[i].setAttribute("data-note", note);
683 noteholder.innerHTML +=
684 "<div class='footnote' id='_footnote_" + n + "'>" +
685 "<a href='#_footnoteref_" + n + "' title='Return to text'>" +
686 n + "</a>. " + note + "</div>";
687 var id =spans[i].getAttribute("id");
688 if (id != null) refs["#"+id] = n;
691 if (n == 0)
692 noteholder.parentNode.removeChild(noteholder);
693 else {
694 // Process footnoterefs.
695 for (i=0; i<spans.length; i++) {
696 if (spans[i].className == "footnoteref") {
697 var href = spans[i].getElementsByTagName("a")[0].getAttribute("href");
698 href = href.match(/#.*/)[0]; // Because IE return full URL.
699 n = refs[href];
700 spans[i].innerHTML =
701 "[<a href='#_footnote_" + n +
702 "' title='View footnote' class='footnote'>" + n + "</a>]";
708 install: function(toclevels) {
709 var timerId;
711 function reinstall() {
712 asciidoc.footnotes();
713 if (toclevels) {
714 asciidoc.toc(toclevels);
718 function reinstallAndRemoveTimer() {
719 clearInterval(timerId);
720 reinstall();
723 timerId = setInterval(reinstall, 500);
724 if (document.addEventListener)
725 document.addEventListener("DOMContentLoaded", reinstallAndRemoveTimer, false);
726 else
727 window.onload = reinstallAndRemoveTimer;
731 asciidoc.install();
732 /*]]>*/
733 </script>
734 </head>
735 <body class="article">
736 <div id="header">
737 </div>
738 <div id="content">
739 <div class="sect1">
740 <h2 id="_reftable">reftable</h2>
741 <div class="sectionbody">
742 <div class="sect2">
743 <h3 id="_overview">Overview</h3>
744 <div class="sect3">
745 <h4 id="_problem_statement">Problem statement</h4>
746 <div class="paragraph"><p>Some repositories contain a lot of references (e.g. android at 866k,
747 rails at 31k). The existing packed-refs format takes up a lot of space
748 (e.g. 62M), and does not scale with additional references. Lookup of a
749 single reference requires linearly scanning the file.</p></div>
750 <div class="paragraph"><p>Atomic pushes modifying multiple references require copying the entire
751 packed-refs file, which can be a considerable amount of data moved
752 (e.g. 62M in, 62M out) for even small transactions (2 refs modified).</p></div>
753 <div class="paragraph"><p>Repositories with many loose references occupy a large number of disk
754 blocks from the local file system, as each reference is its own file
755 storing 41 bytes (and another file for the corresponding reflog). This
756 negatively affects the number of inodes available when a large number of
757 repositories are stored on the same filesystem. Readers can be penalized
758 due to the larger number of syscalls required to traverse and read the
759 <code>$GIT_DIR/refs</code> directory.</p></div>
760 </div>
761 <div class="sect3">
762 <h4 id="_objectives">Objectives</h4>
763 <div class="ulist"><ul>
764 <li>
766 Near constant time lookup for any single reference, even when the
767 repository is cold and not in process or kernel cache.
768 </p>
769 </li>
770 <li>
772 Near constant time verification if an object name is referred to by at least
773 one reference (for allow-tip-sha1-in-want).
774 </p>
775 </li>
776 <li>
778 Efficient enumeration of an entire namespace, such as <code>refs/tags/</code>.
779 </p>
780 </li>
781 <li>
783 Support atomic push with <code>O(size_of_update)</code> operations.
784 </p>
785 </li>
786 <li>
788 Combine reflog storage with ref storage for small transactions.
789 </p>
790 </li>
791 <li>
793 Separate reflog storage for base refs and historical logs.
794 </p>
795 </li>
796 </ul></div>
797 </div>
798 <div class="sect3">
799 <h4 id="_description">Description</h4>
800 <div class="paragraph"><p>A reftable file is a portable binary file format customized for
801 reference storage. References are sorted, enabling linear scans, binary
802 search lookup, and range scans.</p></div>
803 <div class="paragraph"><p>Storage in the file is organized into variable sized blocks. Prefix
804 compression is used within a single block to reduce disk space. Block
805 size and alignment is tunable by the writer.</p></div>
806 </div>
807 <div class="sect3">
808 <h4 id="_performance">Performance</h4>
809 <div class="paragraph"><p>Space used, packed-refs vs. reftable:</p></div>
810 <div class="tableblock">
811 <table rules="all"
812 width="100%"
813 frame="border"
814 cellspacing="0" cellpadding="4">
815 <col width="16%" />
816 <col width="16%" />
817 <col width="16%" />
818 <col width="16%" />
819 <col width="16%" />
820 <col width="16%" />
821 <thead>
822 <tr>
823 <th align="left" valign="top">repository </th>
824 <th align="right" valign="top">packed-refs </th>
825 <th align="right" valign="top">reftable </th>
826 <th align="right" valign="top">% original </th>
827 <th align="right" valign="top">avg ref </th>
828 <th align="right" valign="top">avg obj</th>
829 </tr>
830 </thead>
831 <tbody>
832 <tr>
833 <td align="left" valign="top"><p class="table">android</p></td>
834 <td align="right" valign="top"><p class="table">62.2 M</p></td>
835 <td align="right" valign="top"><p class="table">36.1 M</p></td>
836 <td align="right" valign="top"><p class="table">58.0%</p></td>
837 <td align="right" valign="top"><p class="table">33 bytes</p></td>
838 <td align="right" valign="top"><p class="table">5 bytes</p></td>
839 </tr>
840 <tr>
841 <td align="left" valign="top"><p class="table">rails</p></td>
842 <td align="right" valign="top"><p class="table">1.8 M</p></td>
843 <td align="right" valign="top"><p class="table">1.1 M</p></td>
844 <td align="right" valign="top"><p class="table">57.7%</p></td>
845 <td align="right" valign="top"><p class="table">29 bytes</p></td>
846 <td align="right" valign="top"><p class="table">4 bytes</p></td>
847 </tr>
848 <tr>
849 <td align="left" valign="top"><p class="table">git</p></td>
850 <td align="right" valign="top"><p class="table">78.7 K</p></td>
851 <td align="right" valign="top"><p class="table">48.1 K</p></td>
852 <td align="right" valign="top"><p class="table">61.0%</p></td>
853 <td align="right" valign="top"><p class="table">50 bytes</p></td>
854 <td align="right" valign="top"><p class="table">4 bytes</p></td>
855 </tr>
856 <tr>
857 <td align="left" valign="top"><p class="table">git (heads)</p></td>
858 <td align="right" valign="top"><p class="table">332 b</p></td>
859 <td align="right" valign="top"><p class="table">269 b</p></td>
860 <td align="right" valign="top"><p class="table">81.0%</p></td>
861 <td align="right" valign="top"><p class="table">33 bytes</p></td>
862 <td align="right" valign="top"><p class="table">0 bytes</p></td>
863 </tr>
864 </tbody>
865 </table>
866 </div>
867 <div class="paragraph"><p>Scan (read 866k refs), by reference name lookup (single ref from 866k
868 refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):</p></div>
869 <div class="tableblock">
870 <table rules="all"
871 width="100%"
872 frame="border"
873 cellspacing="0" cellpadding="4">
874 <col width="20%" />
875 <col width="20%" />
876 <col width="20%" />
877 <col width="20%" />
878 <col width="20%" />
879 <thead>
880 <tr>
881 <th align="left" valign="top">format </th>
882 <th align="right" valign="top">cache </th>
883 <th align="right" valign="top">scan </th>
884 <th align="right" valign="top">by name </th>
885 <th align="right" valign="top">by SHA-1</th>
886 </tr>
887 </thead>
888 <tbody>
889 <tr>
890 <td align="left" valign="top"><p class="table">packed-refs</p></td>
891 <td align="right" valign="top"><p class="table">cold</p></td>
892 <td align="right" valign="top"><p class="table">402 ms</p></td>
893 <td align="right" valign="top"><p class="table">409,660.1 usec</p></td>
894 <td align="right" valign="top"><p class="table">412,535.8 usec</p></td>
895 </tr>
896 <tr>
897 <td align="left" valign="top"><p class="table">packed-refs</p></td>
898 <td align="right" valign="top"><p class="table">hot</p></td>
899 <td align="right" valign="top"><p class="table"></p></td>
900 <td align="right" valign="top"><p class="table">6,844.6 usec</p></td>
901 <td align="right" valign="top"><p class="table">20,110.1 usec</p></td>
902 </tr>
903 <tr>
904 <td align="left" valign="top"><p class="table">reftable</p></td>
905 <td align="right" valign="top"><p class="table">cold</p></td>
906 <td align="right" valign="top"><p class="table">112 ms</p></td>
907 <td align="right" valign="top"><p class="table">33.9 usec</p></td>
908 <td align="right" valign="top"><p class="table">323.2 usec</p></td>
909 </tr>
910 <tr>
911 <td align="left" valign="top"><p class="table">reftable</p></td>
912 <td align="right" valign="top"><p class="table">hot</p></td>
913 <td align="right" valign="top"><p class="table"></p></td>
914 <td align="right" valign="top"><p class="table">20.2 usec</p></td>
915 <td align="right" valign="top"><p class="table">320.8 usec</p></td>
916 </tr>
917 </tbody>
918 </table>
919 </div>
920 <div class="paragraph"><p>Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:</p></div>
921 <div class="tableblock">
922 <table rules="all"
923 width="100%"
924 frame="border"
925 cellspacing="0" cellpadding="4">
926 <col width="33%" />
927 <col width="33%" />
928 <col width="33%" />
929 <thead>
930 <tr>
931 <th align="left" valign="top">format </th>
932 <th align="right" valign="top">size </th>
933 <th align="right" valign="top">avg entry</th>
934 </tr>
935 </thead>
936 <tbody>
937 <tr>
938 <td align="left" valign="top"><p class="table">$GIT_DIR/logs</p></td>
939 <td align="right" valign="top"><p class="table">173 M</p></td>
940 <td align="right" valign="top"><p class="table">1209 bytes</p></td>
941 </tr>
942 <tr>
943 <td align="left" valign="top"><p class="table">reftable</p></td>
944 <td align="right" valign="top"><p class="table">5 M</p></td>
945 <td align="right" valign="top"><p class="table">37 bytes</p></td>
946 </tr>
947 </tbody>
948 </table>
949 </div>
950 </div>
951 </div>
952 <div class="sect2">
953 <h3 id="_details">Details</h3>
954 <div class="sect3">
955 <h4 id="_peeling">Peeling</h4>
956 <div class="paragraph"><p>References stored in a reftable are peeled, a record for an annotated
957 (or signed) tag records both the tag object, and the object it refers
958 to. This is analogous to storage in the packed-refs format.</p></div>
959 </div>
960 <div class="sect3">
961 <h4 id="_reference_name_encoding">Reference name encoding</h4>
962 <div class="paragraph"><p>Reference names are an uninterpreted sequence of bytes that must pass
963 <a href="../git-check-ref-format.html">git-check-ref-format(1)</a> as a valid reference name.</p></div>
964 </div>
965 <div class="sect3">
966 <h4 id="_key_unicity">Key unicity</h4>
967 <div class="paragraph"><p>Each entry must have a unique key; repeated keys are disallowed.</p></div>
968 </div>
969 <div class="sect3">
970 <h4 id="_network_byte_order">Network byte order</h4>
971 <div class="paragraph"><p>All multi-byte, fixed width fields are in network byte order.</p></div>
972 </div>
973 <div class="sect3">
974 <h4 id="_varint_encoding">Varint encoding</h4>
975 <div class="paragraph"><p>Varint encoding is identical to the ofs-delta encoding method used
976 within pack files.</p></div>
977 <div class="paragraph"><p>Decoder works such as:</p></div>
978 <div class="literalblock">
979 <div class="content">
980 <pre><code>val = buf[ptr] &amp; 0x7f
981 while (buf[ptr] &amp; 0x80) {
982 ptr++
983 val = ((val + 1) &lt;&lt; 7) | (buf[ptr] &amp; 0x7f)
984 }</code></pre>
985 </div></div>
986 </div>
987 <div class="sect3">
988 <h4 id="_ordering">Ordering</h4>
989 <div class="paragraph"><p>Blocks are lexicographically ordered by their first reference.</p></div>
990 </div>
991 <div class="sect3">
992 <h4 id="_directory_file_conflicts">Directory/file conflicts</h4>
993 <div class="paragraph"><p>The reftable format accepts both <code>refs/heads/foo</code> and
994 <code>refs/heads/foo/bar</code> as distinct references.</p></div>
995 <div class="paragraph"><p>This property is useful for retaining log records in reftable, but may
996 confuse versions of Git using <code>$GIT_DIR/refs</code> directory tree to maintain
997 references. Users of reftable may choose to continue to reject <code>foo</code> and
998 <code>foo/bar</code> type conflicts to prevent problems for peers.</p></div>
999 </div>
1000 </div>
1001 <div class="sect2">
1002 <h3 id="_file_format">File format</h3>
1003 <div class="sect3">
1004 <h4 id="_structure">Structure</h4>
1005 <div class="paragraph"><p>A reftable file has the following high-level structure:</p></div>
1006 <div class="literalblock">
1007 <div class="content">
1008 <pre><code>first_block {
1009 header
1010 first_ref_block
1012 ref_block*
1013 ref_index*
1014 obj_block*
1015 obj_index*
1016 log_block*
1017 log_index*
1018 footer</code></pre>
1019 </div></div>
1020 <div class="paragraph"><p>A log-only file omits the <code>ref_block</code>, <code>ref_index</code>, <code>obj_block</code> and
1021 <code>obj_index</code> sections, containing only the file header and log block:</p></div>
1022 <div class="literalblock">
1023 <div class="content">
1024 <pre><code>first_block {
1025 header
1027 log_block*
1028 log_index*
1029 footer</code></pre>
1030 </div></div>
1031 <div class="paragraph"><p>in a log-only file the first log block immediately follows the file
1032 header, without padding to block alignment.</p></div>
1033 </div>
1034 <div class="sect3">
1035 <h4 id="_block_size">Block size</h4>
1036 <div class="paragraph"><p>The file&#8217;s block size is arbitrarily determined by the writer, and does
1037 not have to be a power of 2. The block size must be larger than the
1038 longest reference name or log entry used in the repository, as
1039 references cannot span blocks.</p></div>
1040 <div class="paragraph"><p>Powers of two that are friendly to the virtual memory system or
1041 filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
1042 yield better compression, with a possible increased cost incurred by
1043 readers during access.</p></div>
1044 <div class="paragraph"><p>The largest block size is <code>16777215</code> bytes (15.99 MiB).</p></div>
1045 </div>
1046 <div class="sect3">
1047 <h4 id="_block_alignment">Block alignment</h4>
1048 <div class="paragraph"><p>Writers may choose to align blocks at multiples of the block size by
1049 including <code>padding</code> filled with NUL bytes at the end of a block to round
1050 out to the chosen alignment. When alignment is used, writers must
1051 specify the alignment with the file header&#8217;s <code>block_size</code> field.</p></div>
1052 <div class="paragraph"><p>Block alignment is not required by the file format. Unaligned files must
1053 set <code>block_size = 0</code> in the file header, and omit <code>padding</code>. Unaligned
1054 files with more than one ref block must include the <a href="#Ref-index">ref
1055 index</a> to support fast lookup. Readers must be able to read both aligned
1056 and non-aligned files.</p></div>
1057 <div class="paragraph"><p>Very small files (e.g. a single ref block) may omit <code>padding</code> and the ref
1058 index to reduce total file size.</p></div>
1059 </div>
1060 <div class="sect3">
1061 <h4 id="_header_version_1">Header (version 1)</h4>
1062 <div class="paragraph"><p>A 24-byte header appears at the beginning of the file:</p></div>
1063 <div class="literalblock">
1064 <div class="content">
1065 <pre><code>'REFT'
1066 uint8( version_number = 1 )
1067 uint24( block_size )
1068 uint64( min_update_index )
1069 uint64( max_update_index )</code></pre>
1070 </div></div>
1071 <div class="paragraph"><p>Aligned files must specify <code>block_size</code> to configure readers with the
1072 expected block alignment. Unaligned files must set <code>block_size = 0</code>.</p></div>
1073 <div class="paragraph"><p>The <code>min_update_index</code> and <code>max_update_index</code> describe bounds for the
1074 <code>update_index</code> field of all log records in this file. When reftables are
1075 used in a stack for <a href="#Update-transactions">transactions</a>, these
1076 fields can order the files such that the prior file&#8217;s
1077 <code>max_update_index + 1</code> is the next file&#8217;s <code>min_update_index</code>.</p></div>
1078 </div>
1079 <div class="sect3">
1080 <h4 id="_header_version_2">Header (version 2)</h4>
1081 <div class="paragraph"><p>A 28-byte header appears at the beginning of the file:</p></div>
1082 <div class="literalblock">
1083 <div class="content">
1084 <pre><code>'REFT'
1085 uint8( version_number = 2 )
1086 uint24( block_size )
1087 uint64( min_update_index )
1088 uint64( max_update_index )
1089 uint32( hash_id )</code></pre>
1090 </div></div>
1091 <div class="paragraph"><p>The header is identical to <code>version_number=1</code>, with the 4-byte hash ID
1092 ("sha1" for SHA1 and "s256" for SHA-256) append to the header.</p></div>
1093 <div class="paragraph"><p>For maximum backward compatibility, it is recommended to use version 1 when
1094 writing SHA1 reftables.</p></div>
1095 </div>
1096 <div class="sect3">
1097 <h4 id="_first_ref_block">First ref block</h4>
1098 <div class="paragraph"><p>The first ref block shares the same block as the file header, and is 24
1099 bytes smaller than all other blocks in the file. The first block
1100 immediately begins after the file header, at position 24.</p></div>
1101 <div class="paragraph"><p>If the first block is a log block (a log-only file), its block header
1102 begins immediately at position 24.</p></div>
1103 </div>
1104 <div class="sect3">
1105 <h4 id="_ref_block_format">Ref block format</h4>
1106 <div class="paragraph"><p>A ref block is written as:</p></div>
1107 <div class="literalblock">
1108 <div class="content">
1109 <pre><code>'r'
1110 uint24( block_len )
1111 ref_record+
1112 uint24( restart_offset )+
1113 uint16( restart_count )
1115 padding?</code></pre>
1116 </div></div>
1117 <div class="paragraph"><p>Blocks begin with <code>block_type = 'r'</code> and a 3-byte <code>block_len</code> which
1118 encodes the number of bytes in the block up to, but not including the
1119 optional <code>padding</code>. This is always less than or equal to the file&#8217;s
1120 block size. In the first ref block, <code>block_len</code> includes 24 bytes for
1121 the file header.</p></div>
1122 <div class="paragraph"><p>The 2-byte <code>restart_count</code> stores the number of entries in the
1123 <code>restart_offset</code> list, which must not be empty. Readers can use
1124 <code>restart_count</code> to binary search between restarts before starting a
1125 linear scan.</p></div>
1126 <div class="paragraph"><p>Exactly <code>restart_count</code> 3-byte <code>restart_offset</code> values precedes the
1127 <code>restart_count</code>. Offsets are relative to the start of the block and
1128 refer to the first byte of any <code>ref_record</code> whose name has not been
1129 prefix compressed. Entries in the <code>restart_offset</code> list must be sorted,
1130 ascending. Readers can start linear scans from any of these records.</p></div>
1131 <div class="paragraph"><p>A variable number of <code>ref_record</code> fill the middle of the block,
1132 describing reference names and values. The format is described below.</p></div>
1133 <div class="paragraph"><p>As the first ref block shares the first file block with the file header,
1134 all <code>restart_offset</code> in the first block are relative to the start of the
1135 file (position 0), and include the file header. This forces the first
1136 <code>restart_offset</code> to be <code>28</code>.</p></div>
1137 <div class="sect4">
1138 <h5 id="_ref_record">ref record</h5>
1139 <div class="paragraph"><p>A <code>ref_record</code> describes a single reference, storing both the name and
1140 its value(s). Records are formatted as:</p></div>
1141 <div class="literalblock">
1142 <div class="content">
1143 <pre><code>varint( prefix_length )
1144 varint( (suffix_length &lt;&lt; 3) | value_type )
1145 suffix
1146 varint( update_index_delta )
1147 value?</code></pre>
1148 </div></div>
1149 <div class="paragraph"><p>The <code>prefix_length</code> field specifies how many leading bytes of the prior
1150 reference record&#8217;s name should be copied to obtain this reference&#8217;s
1151 name. This must be 0 for the first reference in any block, and also must
1152 be 0 for any <code>ref_record</code> whose offset is listed in the <code>restart_offset</code>
1153 table at the end of the block.</p></div>
1154 <div class="paragraph"><p>Recovering a reference name from any <code>ref_record</code> is a simple concat:</p></div>
1155 <div class="literalblock">
1156 <div class="content">
1157 <pre><code>this_name = prior_name[0..prefix_length] + suffix</code></pre>
1158 </div></div>
1159 <div class="paragraph"><p>The <code>suffix_length</code> value provides the number of bytes available in
1160 <code>suffix</code> to copy from <code>suffix</code> to complete the reference name.</p></div>
1161 <div class="paragraph"><p>The <code>update_index</code> that last modified the reference can be obtained by
1162 adding <code>update_index_delta</code> to the <code>min_update_index</code> from the file
1163 header: <code>min_update_index + update_index_delta</code>.</p></div>
1164 <div class="paragraph"><p>The <code>value</code> follows. Its format is determined by <code>value_type</code>, one of
1165 the following:</p></div>
1166 <div class="ulist"><ul>
1167 <li>
1169 <code>0x0</code>: deletion; no value data (see transactions, below)
1170 </p>
1171 </li>
1172 <li>
1174 <code>0x1</code>: one object name; value of the ref
1175 </p>
1176 </li>
1177 <li>
1179 <code>0x2</code>: two object names; value of the ref, peeled target
1180 </p>
1181 </li>
1182 <li>
1184 <code>0x3</code>: symbolic reference: <code>varint( target_len ) target</code>
1185 </p>
1186 </li>
1187 </ul></div>
1188 <div class="paragraph"><p>Symbolic references use <code>0x3</code>, followed by the complete name of the
1189 reference target. No compression is applied to the target name.</p></div>
1190 <div class="paragraph"><p>Types <code>0x4..0x7</code> are reserved for future use.</p></div>
1191 </div>
1192 </div>
1193 <div class="sect3">
1194 <h4 id="_ref_index">Ref index</h4>
1195 <div class="paragraph"><p>The ref index stores the name of the last reference from every ref block
1196 in the file, enabling reduced disk seeks for lookups. Any reference can
1197 be found by searching the index, identifying the containing block, and
1198 searching within that block.</p></div>
1199 <div class="paragraph"><p>The index may be organized into a multi-level index, where the 1st level
1200 index block points to additional ref index blocks (2nd level), which may
1201 in turn point to either additional index blocks (e.g. 3rd level) or ref
1202 blocks (leaf level). Disk reads required to access a ref go up with
1203 higher index levels. Multi-level indexes may be required to ensure no
1204 single index block exceeds the file format&#8217;s max block size of
1205 <code>16777215</code> bytes (15.99 MiB). To achieve constant O(1) disk seeks for
1206 lookups the index must be a single level, which is permitted to exceed
1207 the file&#8217;s configured block size, but not the format&#8217;s max block size of
1208 15.99 MiB.</p></div>
1209 <div class="paragraph"><p>If present, the ref index block(s) appears after the last ref block.</p></div>
1210 <div class="paragraph"><p>If there are at least 4 ref blocks, a ref index block should be written
1211 to improve lookup times. Cold reads using the index require 2 disk reads
1212 (read index, read block), and binary searching &lt; 4 blocks also requires
1213 &#8656; 2 reads. Omitting the index block from smaller files saves space.</p></div>
1214 <div class="paragraph"><p>If the file is unaligned and contains more than one ref block, the ref
1215 index must be written.</p></div>
1216 <div class="paragraph"><p>Index block format:</p></div>
1217 <div class="literalblock">
1218 <div class="content">
1219 <pre><code>'i'
1220 uint24( block_len )
1221 index_record+
1222 uint24( restart_offset )+
1223 uint16( restart_count )
1225 padding?</code></pre>
1226 </div></div>
1227 <div class="paragraph"><p>The index blocks begin with <code>block_type = 'i'</code> and a 3-byte <code>block_len</code>
1228 which encodes the number of bytes in the block, up to but not including
1229 the optional <code>padding</code>.</p></div>
1230 <div class="paragraph"><p>The <code>restart_offset</code> and <code>restart_count</code> fields are identical in format,
1231 meaning and usage as in ref blocks.</p></div>
1232 <div class="paragraph"><p>To reduce the number of reads required for random access in very large
1233 files the index block may be larger than other blocks. However, readers
1234 must hold the entire index in memory to benefit from this, so it&#8217;s a
1235 time-space tradeoff in both file size and reader memory.</p></div>
1236 <div class="paragraph"><p>Increasing the file&#8217;s block size decreases the index size. Alternatively
1237 a multi-level index may be used, keeping index blocks within the file&#8217;s
1238 block size, but increasing the number of blocks that need to be
1239 accessed.</p></div>
1240 <div class="sect4">
1241 <h5 id="_index_record">index record</h5>
1242 <div class="paragraph"><p>An index record describes the last entry in another block. Index records
1243 are written as:</p></div>
1244 <div class="literalblock">
1245 <div class="content">
1246 <pre><code>varint( prefix_length )
1247 varint( (suffix_length &lt;&lt; 3) | 0 )
1248 suffix
1249 varint( block_position )</code></pre>
1250 </div></div>
1251 <div class="paragraph"><p>Index records use prefix compression exactly like <code>ref_record</code>.</p></div>
1252 <div class="paragraph"><p>Index records store <code>block_position</code> after the suffix, specifying the
1253 absolute position in bytes (from the start of the file) of the block
1254 that ends with this reference. Readers can seek to <code>block_position</code> to
1255 begin reading the block header.</p></div>
1256 <div class="paragraph"><p>Readers must examine the block header at <code>block_position</code> to determine
1257 if the next block is another level index block, or the leaf-level ref
1258 block.</p></div>
1259 </div>
1260 <div class="sect4">
1261 <h5 id="_reading_the_index">Reading the index</h5>
1262 <div class="paragraph"><p>Readers loading the ref index must first read the footer (below) to
1263 obtain <code>ref_index_position</code>. If not present, the position will be 0. The
1264 <code>ref_index_position</code> is for the 1st level root of the ref index.</p></div>
1265 </div>
1266 </div>
1267 <div class="sect3">
1268 <h4 id="_obj_block_format">Obj block format</h4>
1269 <div class="paragraph"><p>Object blocks are optional. Writers may choose to omit object blocks,
1270 especially if readers will not use the object name to ref mapping.</p></div>
1271 <div class="paragraph"><p>Object blocks use unique, abbreviated 2-31 byte object name keys, mapping to
1272 ref blocks containing references pointing to that object directly, or as
1273 the peeled value of an annotated tag. Like ref blocks, object blocks use
1274 the file&#8217;s standard block size. The abbreviation length is available in
1275 the footer as <code>obj_id_len</code>.</p></div>
1276 <div class="paragraph"><p>To save space in small files, object blocks may be omitted if the ref
1277 index is not present, as brute force search will only need to read a few
1278 ref blocks. When missing, readers should brute force a linear search of
1279 all references to lookup by object name.</p></div>
1280 <div class="paragraph"><p>An object block is written as:</p></div>
1281 <div class="literalblock">
1282 <div class="content">
1283 <pre><code>'o'
1284 uint24( block_len )
1285 obj_record+
1286 uint24( restart_offset )+
1287 uint16( restart_count )
1289 padding?</code></pre>
1290 </div></div>
1291 <div class="paragraph"><p>Fields are identical to ref block. Binary search using the restart table
1292 works the same as in reference blocks.</p></div>
1293 <div class="paragraph"><p>Because object names are abbreviated by writers to the shortest unique
1294 abbreviation within the reftable, obj key lengths have a variable length. Their
1295 length must be at least 2 bytes. Readers must compare only for common prefix
1296 match within an obj block or obj index.</p></div>
1297 <div class="sect4">
1298 <h5 id="_obj_record">obj record</h5>
1299 <div class="paragraph"><p>An <code>obj_record</code> describes a single object abbreviation, and the blocks
1300 containing references using that unique abbreviation:</p></div>
1301 <div class="literalblock">
1302 <div class="content">
1303 <pre><code>varint( prefix_length )
1304 varint( (suffix_length &lt;&lt; 3) | cnt_3 )
1305 suffix
1306 varint( cnt_large )?
1307 varint( position_delta )*</code></pre>
1308 </div></div>
1309 <div class="paragraph"><p>Like in reference blocks, abbreviations are prefix compressed within an
1310 obj block. On large reftables with many unique objects, higher block
1311 sizes (64k), and higher restart interval (128), a <code>prefix_length</code> of 2
1312 or 3 and <code>suffix_length</code> of 3 may be common in obj records (unique
1313 abbreviation of 5-6 raw bytes, 10-12 hex digits).</p></div>
1314 <div class="paragraph"><p>Each record contains <code>position_count</code> number of positions for matching
1315 ref blocks. For 1-7 positions the count is stored in <code>cnt_3</code>. When
1316 <code>cnt_3 = 0</code> the actual count follows in a varint, <code>cnt_large</code>.</p></div>
1317 <div class="paragraph"><p>The use of <code>cnt_3</code> bets most objects are pointed to by only a single
1318 reference, some may be pointed to by a couple of references, and very
1319 few (if any) are pointed to by more than 7 references.</p></div>
1320 <div class="paragraph"><p>A special case exists when <code>cnt_3 = 0</code> and <code>cnt_large = 0</code>: there are no
1321 <code>position_delta</code>, but at least one reference starts with this
1322 abbreviation. A reader that needs exact reference names must scan all
1323 references to find which specific references have the desired object.
1324 Writers should use this format when the <code>position_delta</code> list would have
1325 overflowed the file&#8217;s block size due to a high number of references
1326 pointing to the same object.</p></div>
1327 <div class="paragraph"><p>The first <code>position_delta</code> is the position from the start of the file.
1328 Additional <code>position_delta</code> entries are sorted ascending and relative to
1329 the prior entry, e.g. a reader would perform:</p></div>
1330 <div class="literalblock">
1331 <div class="content">
1332 <pre><code>pos = position_delta[0]
1333 prior = pos
1334 for (j = 1; j &lt; position_count; j++) {
1335 pos = prior + position_delta[j]
1336 prior = pos
1337 }</code></pre>
1338 </div></div>
1339 <div class="paragraph"><p>With a position in hand, a reader must linearly scan the ref block,
1340 starting from the first <code>ref_record</code>, testing each reference&#8217;s object names
1341 (for <code>value_type = 0x1</code> or <code>0x2</code>) for full equality. Faster searching by
1342 object name within a single ref block is not supported by the reftable format.
1343 Smaller block sizes reduce the number of candidates this step must
1344 consider.</p></div>
1345 </div>
1346 </div>
1347 <div class="sect3">
1348 <h4 id="_obj_index">Obj index</h4>
1349 <div class="paragraph"><p>The obj index stores the abbreviation from the last entry for every obj
1350 block in the file, enabling reduced disk seeks for all lookups. It is
1351 formatted exactly the same as the ref index, but refers to obj blocks.</p></div>
1352 <div class="paragraph"><p>The obj index should be present if obj blocks are present, as obj blocks
1353 should only be written in larger files.</p></div>
1354 <div class="paragraph"><p>Readers loading the obj index must first read the footer (below) to
1355 obtain <code>obj_index_position</code>. If not present, the position will be 0.</p></div>
1356 </div>
1357 <div class="sect3">
1358 <h4 id="_log_block_format">Log block format</h4>
1359 <div class="paragraph"><p>Unlike ref and obj blocks, log blocks are always unaligned.</p></div>
1360 <div class="paragraph"><p>Log blocks are variable in size, and do not match the <code>block_size</code>
1361 specified in the file header or footer. Writers should choose an
1362 appropriate buffer size to prepare a log block for deflation, such as
1363 <code>2 * block_size</code>.</p></div>
1364 <div class="paragraph"><p>A log block is written as:</p></div>
1365 <div class="literalblock">
1366 <div class="content">
1367 <pre><code>'g'
1368 uint24( block_len )
1369 zlib_deflate {
1370 log_record+
1371 uint24( restart_offset )+
1372 uint16( restart_count )
1373 }</code></pre>
1374 </div></div>
1375 <div class="paragraph"><p>Log blocks look similar to ref blocks, except <code>block_type = 'g'</code>.</p></div>
1376 <div class="paragraph"><p>The 4-byte block header is followed by the deflated block contents using
1377 zlib deflate. The <code>block_len</code> in the header is the inflated size
1378 (including 4-byte block header), and should be used by readers to
1379 preallocate the inflation output buffer. A log block&#8217;s <code>block_len</code> may
1380 exceed the file&#8217;s block size.</p></div>
1381 <div class="paragraph"><p>Offsets within the log block (e.g. <code>restart_offset</code>) still include the
1382 4-byte header. Readers may prefer prefixing the inflation output buffer
1383 with the 4-byte header.</p></div>
1384 <div class="paragraph"><p>Within the deflate container, a variable number of <code>log_record</code> describe
1385 reference changes. The log record format is described below. See ref
1386 block format (above) for a description of <code>restart_offset</code> and
1387 <code>restart_count</code>.</p></div>
1388 <div class="paragraph"><p>Because log blocks have no alignment or padding between blocks, readers
1389 must keep track of the bytes consumed by the inflater to know where the
1390 next log block begins.</p></div>
1391 <div class="sect4">
1392 <h5 id="_log_record">log record</h5>
1393 <div class="paragraph"><p>Log record keys are structured as:</p></div>
1394 <div class="literalblock">
1395 <div class="content">
1396 <pre><code>ref_name '\0' reverse_int64( update_index )</code></pre>
1397 </div></div>
1398 <div class="paragraph"><p>where <code>update_index</code> is the unique transaction identifier. The
1399 <code>update_index</code> field must be unique within the scope of a <code>ref_name</code>.
1400 See the update transactions section below for further details.</p></div>
1401 <div class="paragraph"><p>The <code>reverse_int64</code> function inverses the value so lexicographical
1402 ordering the network byte order encoding sorts the more recent records
1403 with higher <code>update_index</code> values first:</p></div>
1404 <div class="literalblock">
1405 <div class="content">
1406 <pre><code>reverse_int64(int64 t) {
1407 return 0xffffffffffffffff - t;
1408 }</code></pre>
1409 </div></div>
1410 <div class="paragraph"><p>Log records have a similar starting structure to ref and index records,
1411 utilizing the same prefix compression scheme applied to the log record
1412 key described above.</p></div>
1413 <div class="literalblock">
1414 <div class="content">
1415 <pre><code> varint( prefix_length )
1416 varint( (suffix_length &lt;&lt; 3) | log_type )
1417 suffix
1418 log_data {
1419 old_id
1420 new_id
1421 varint( name_length ) name
1422 varint( email_length ) email
1423 varint( time_seconds )
1424 sint16( tz_offset )
1425 varint( message_length ) message
1426 }?</code></pre>
1427 </div></div>
1428 <div class="paragraph"><p>Log record entries use <code>log_type</code> to indicate what follows:</p></div>
1429 <div class="ulist"><ul>
1430 <li>
1432 <code>0x0</code>: deletion; no log data.
1433 </p>
1434 </li>
1435 <li>
1437 <code>0x1</code>: standard git reflog data using <code>log_data</code> above.
1438 </p>
1439 </li>
1440 </ul></div>
1441 <div class="paragraph"><p>The <code>log_type = 0x0</code> is mostly useful for <code>git stash drop</code>, removing an
1442 entry from the reflog of <code>refs/stash</code> in a transaction file (below),
1443 without needing to rewrite larger files. Readers reading a stack of
1444 reflogs must treat this as a deletion.</p></div>
1445 <div class="paragraph"><p>For <code>log_type = 0x1</code>, the <code>log_data</code> section follows
1446 <a href="../git-update-ref.html">git-update-ref(1)</a> logging and includes:</p></div>
1447 <div class="ulist"><ul>
1448 <li>
1450 two object names (old id, new id)
1451 </p>
1452 </li>
1453 <li>
1455 varint string of committer&#8217;s name
1456 </p>
1457 </li>
1458 <li>
1460 varint string of committer&#8217;s email
1461 </p>
1462 </li>
1463 <li>
1465 varint time in seconds since epoch (Jan 1, 1970)
1466 </p>
1467 </li>
1468 <li>
1470 2-byte timezone offset in minutes (signed)
1471 </p>
1472 </li>
1473 <li>
1475 varint string of message
1476 </p>
1477 </li>
1478 </ul></div>
1479 <div class="paragraph"><p><code>tz_offset</code> is the absolute number of minutes from GMT the committer was
1480 at the time of the update. For example <code>GMT-0800</code> is encoded in reftable
1481 as <code>sint16(-480)</code> and <code>GMT+0230</code> is <code>sint16(150)</code>.</p></div>
1482 <div class="paragraph"><p>The committer email does not contain <code>&lt;</code> or <code>&gt;</code>, it&#8217;s the value normally
1483 found between the <code>&lt;&gt;</code> in a git commit object header.</p></div>
1484 <div class="paragraph"><p>The <code>message_length</code> may be 0, in which case there was no message
1485 supplied for the update.</p></div>
1486 <div class="paragraph"><p>Contrary to traditional reflog (which is a file), renames are encoded as
1487 a combination of ref deletion and ref creation. A deletion is a log
1488 record with a zero new_id, and a creation is a log record with a zero old_id.</p></div>
1489 </div>
1490 <div class="sect4">
1491 <h5 id="_reading_the_log">Reading the log</h5>
1492 <div class="paragraph"><p>Readers accessing the log must first read the footer (below) to
1493 determine the <code>log_position</code>. The first block of the log begins at
1494 <code>log_position</code> bytes since the start of the file. The <code>log_position</code> is
1495 not block aligned.</p></div>
1496 </div>
1497 <div class="sect4">
1498 <h5 id="_importing_logs">Importing logs</h5>
1499 <div class="paragraph"><p>When importing from <code>$GIT_DIR/logs</code> writers should globally order all
1500 log records roughly by timestamp while preserving file order, and assign
1501 unique, increasing <code>update_index</code> values for each log line. Newer log
1502 records get higher <code>update_index</code> values.</p></div>
1503 <div class="paragraph"><p>Although an import may write only a single reftable file, the reftable
1504 file must span many unique <code>update_index</code>, as each log line requires its
1505 own <code>update_index</code> to preserve semantics.</p></div>
1506 </div>
1507 </div>
1508 <div class="sect3">
1509 <h4 id="_log_index">Log index</h4>
1510 <div class="paragraph"><p>The log index stores the log key
1511 (<code>refname \0 reverse_int64(update_index)</code>) for the last log record of
1512 every log block in the file, supporting bounded-time lookup.</p></div>
1513 <div class="paragraph"><p>A log index block must be written if 2 or more log blocks are written to
1514 the file. If present, the log index appears after the last log block.
1515 There is no padding used to align the log index to block alignment.</p></div>
1516 <div class="paragraph"><p>Log index format is identical to ref index, except the keys are 9 bytes
1517 longer to include <code>'\0'</code> and the 8-byte <code>reverse_int64(update_index)</code>.
1518 Records use <code>block_position</code> to refer to the start of a log block.</p></div>
1519 <div class="sect4">
1520 <h5 id="_reading_the_index_2">Reading the index</h5>
1521 <div class="paragraph"><p>Readers loading the log index must first read the footer (below) to
1522 obtain <code>log_index_position</code>. If not present, the position will be 0.</p></div>
1523 </div>
1524 </div>
1525 <div class="sect3">
1526 <h4 id="_footer">Footer</h4>
1527 <div class="paragraph"><p>After the last block of the file, a file footer is written. It begins
1528 like the file header, but is extended with additional data.</p></div>
1529 <div class="literalblock">
1530 <div class="content">
1531 <pre><code> HEADER
1533 uint64( ref_index_position )
1534 uint64( (obj_position &lt;&lt; 5) | obj_id_len )
1535 uint64( obj_index_position )
1537 uint64( log_position )
1538 uint64( log_index_position )
1540 uint32( CRC-32 of above )</code></pre>
1541 </div></div>
1542 <div class="paragraph"><p>If a section is missing (e.g. ref index) the corresponding position
1543 field (e.g. <code>ref_index_position</code>) will be 0.</p></div>
1544 <div class="ulist"><ul>
1545 <li>
1547 <code>obj_position</code>: byte position for the first obj block.
1548 </p>
1549 </li>
1550 <li>
1552 <code>obj_id_len</code>: number of bytes used to abbreviate object names in
1553 obj blocks.
1554 </p>
1555 </li>
1556 <li>
1558 <code>log_position</code>: byte position for the first log block.
1559 </p>
1560 </li>
1561 <li>
1563 <code>ref_index_position</code>: byte position for the start of the ref index.
1564 </p>
1565 </li>
1566 <li>
1568 <code>obj_index_position</code>: byte position for the start of the obj index.
1569 </p>
1570 </li>
1571 <li>
1573 <code>log_index_position</code>: byte position for the start of the log index.
1574 </p>
1575 </li>
1576 </ul></div>
1577 <div class="paragraph"><p>The size of the footer is 68 bytes for version 1, and 72 bytes for
1578 version 2.</p></div>
1579 <div class="sect4">
1580 <h5 id="_reading_the_footer">Reading the footer</h5>
1581 <div class="paragraph"><p>Readers must first read the file start to determine the version
1582 number. Then they seek to <code>file_length - FOOTER_LENGTH</code> to access the
1583 footer. A trusted external source (such as <code>stat(2)</code>) is necessary to
1584 obtain <code>file_length</code>. When reading the footer, readers must verify:</p></div>
1585 <div class="ulist"><ul>
1586 <li>
1588 4-byte magic is correct
1589 </p>
1590 </li>
1591 <li>
1593 1-byte version number is recognized
1594 </p>
1595 </li>
1596 <li>
1598 4-byte CRC-32 matches the other 64 bytes (including magic, and
1599 version)
1600 </p>
1601 </li>
1602 </ul></div>
1603 <div class="paragraph"><p>Once verified, the other fields of the footer can be accessed.</p></div>
1604 </div>
1605 <div class="sect4">
1606 <h5 id="_empty_tables">Empty tables</h5>
1607 <div class="paragraph"><p>A reftable may be empty. In this case, the file starts with a header
1608 and is immediately followed by a footer.</p></div>
1609 </div>
1610 </div>
1611 <div class="sect3">
1612 <h4 id="_binary_search">Binary search</h4>
1613 <div class="paragraph"><p>Binary search within a block is supported by the <code>restart_offset</code> fields
1614 at the end of the block. Readers can binary search through the restart
1615 table to locate between which two restart points the sought reference or
1616 key should appear.</p></div>
1617 <div class="paragraph"><p>Each record identified by a <code>restart_offset</code> stores the complete key in
1618 the <code>suffix</code> field of the record, making the compare operation during
1619 binary search straightforward.</p></div>
1620 <div class="paragraph"><p>Once a restart point lexicographically before the sought reference has
1621 been identified, readers can linearly scan through the following record
1622 entries to locate the sought record, terminating if the current record
1623 sorts after (and therefore the sought key is not present).</p></div>
1624 <div class="sect4">
1625 <h5 id="_restart_point_selection">Restart point selection</h5>
1626 <div class="paragraph"><p>Writers determine the restart points at file creation. The process is
1627 arbitrary, but every 16 or 64 records is recommended. Every 16 may be
1628 more suitable for smaller block sizes (4k or 8k), every 64 for larger
1629 block sizes (64k).</p></div>
1630 <div class="paragraph"><p>More frequent restart points reduces prefix compression and increases
1631 space consumed by the restart table, both of which increase file size.</p></div>
1632 <div class="paragraph"><p>Less frequent restart points makes prefix compression more effective,
1633 decreasing overall file size, with increased penalties for readers
1634 walking through more records after the binary search step.</p></div>
1635 <div class="paragraph"><p>A maximum of <code>65535</code> restart points per block is supported.</p></div>
1636 </div>
1637 </div>
1638 </div>
1639 <div class="sect2">
1640 <h3 id="_considerations">Considerations</h3>
1641 <div class="sect3">
1642 <h4 id="_lightweight_refs_dominate">Lightweight refs dominate</h4>
1643 <div class="paragraph"><p>The reftable format assumes the vast majority of references are single
1644 object names valued with common prefixes, such as Gerrit Code Review&#8217;s
1645 <code>refs/changes/</code> namespace, GitHub&#8217;s <code>refs/pulls/</code> namespace, or many
1646 lightweight tags in the <code>refs/tags/</code> namespace.</p></div>
1647 <div class="paragraph"><p>Annotated tags storing the peeled object cost an additional object name per
1648 reference.</p></div>
1649 </div>
1650 <div class="sect3">
1651 <h4 id="_low_overhead">Low overhead</h4>
1652 <div class="paragraph"><p>A reftable with very few references (e.g. git.git with 5 heads) is 269
1653 bytes for reftable, vs. 332 bytes for packed-refs. This supports
1654 reftable scaling down for transaction logs (below).</p></div>
1655 </div>
1656 <div class="sect3">
1657 <h4 id="_block_size_2">Block size</h4>
1658 <div class="paragraph"><p>For a Gerrit Code Review type repository with many change refs, larger
1659 block sizes (64 KiB) and less frequent restart points (every 64) yield
1660 better compression due to more references within the block compressing
1661 against the prior reference.</p></div>
1662 <div class="paragraph"><p>Larger block sizes reduce the index size, as the reftable will require
1663 fewer blocks to store the same number of references.</p></div>
1664 </div>
1665 <div class="sect3">
1666 <h4 id="_minimal_disk_seeks">Minimal disk seeks</h4>
1667 <div class="paragraph"><p>Assuming the index block has been loaded into memory, binary searching
1668 for any single reference requires exactly 1 disk seek to load the
1669 containing block.</p></div>
1670 </div>
1671 <div class="sect3">
1672 <h4 id="_scans_and_lookups_dominate">Scans and lookups dominate</h4>
1673 <div class="paragraph"><p>Scanning all references and lookup by name (or namespace such as
1674 <code>refs/heads/</code>) are the most common activities performed on repositories.
1675 Object names are stored directly with references to optimize this use case.</p></div>
1676 </div>
1677 <div class="sect3">
1678 <h4 id="_logs_are_infrequently_read">Logs are infrequently read</h4>
1679 <div class="paragraph"><p>Logs are infrequently accessed, but can be large. Deflating log blocks
1680 saves disk space, with some increased penalty at read time.</p></div>
1681 <div class="paragraph"><p>Logs are stored in an isolated section from refs, reducing the burden on
1682 reference readers that want to ignore logs. Further, historical logs can
1683 be isolated into log-only files.</p></div>
1684 </div>
1685 <div class="sect3">
1686 <h4 id="_logs_are_read_backwards">Logs are read backwards</h4>
1687 <div class="paragraph"><p>Logs are frequently accessed backwards (most recent N records for master
1688 to answer <code>master@{4}</code>), so log records are grouped by reference, and
1689 sorted descending by update index.</p></div>
1690 </div>
1691 </div>
1692 <div class="sect2">
1693 <h3 id="_repository_format">Repository format</h3>
1694 <div class="sect3">
1695 <h4 id="_version_1">Version 1</h4>
1696 <div class="paragraph"><p>A repository must set its <code>$GIT_DIR/config</code> to configure reftable:</p></div>
1697 <div class="literalblock">
1698 <div class="content">
1699 <pre><code>[core]
1700 repositoryformatversion = 1
1701 [extensions]
1702 refStorage = reftable</code></pre>
1703 </div></div>
1704 </div>
1705 <div class="sect3">
1706 <h4 id="_layout">Layout</h4>
1707 <div class="paragraph"><p>A collection of reftable files are stored in the <code>$GIT_DIR/reftable/</code> directory.
1708 Their names should have a random element, such that each filename is globally
1709 unique; this helps avoid spurious failures on Windows, where open files cannot
1710 be removed or overwritten. It suggested to use
1711 <code>${min_update_index}-${max_update_index}-${random}.ref</code> as a naming convention.</p></div>
1712 <div class="paragraph"><p>Log-only files use the <code>.log</code> extension, while ref-only and mixed ref
1713 and log files use <code>.ref</code>. extension.</p></div>
1714 <div class="paragraph"><p>The stack ordering file is <code>$GIT_DIR/reftable/tables.list</code> and lists the
1715 current files, one per line, in order, from oldest (base) to newest
1716 (most recent):</p></div>
1717 <div class="literalblock">
1718 <div class="content">
1719 <pre><code>$ cat .git/reftable/tables.list
1720 00000001-00000001-RANDOM1.log
1721 00000002-00000002-RANDOM2.ref
1722 00000003-00000003-RANDOM3.ref</code></pre>
1723 </div></div>
1724 <div class="paragraph"><p>Readers must read <code>$GIT_DIR/reftable/tables.list</code> to determine which
1725 files are relevant right now, and search through the stack in reverse
1726 order (last reftable is examined first).</p></div>
1727 <div class="paragraph"><p>Reftable files not listed in <code>tables.list</code> may be new (and about to be
1728 added to the stack by the active writer), or ancient and ready to be
1729 pruned.</p></div>
1730 </div>
1731 <div class="sect3">
1732 <h4 id="_backward_compatibility">Backward compatibility</h4>
1733 <div class="paragraph"><p>Older clients should continue to recognize the directory as a git
1734 repository so they don&#8217;t look for an enclosing repository in parent
1735 directories. To this end, a reftable-enabled repository must contain the
1736 following dummy files</p></div>
1737 <div class="ulist"><ul>
1738 <li>
1740 <code>.git/HEAD</code>, a regular file containing <code>ref: refs/heads/.invalid</code>.
1741 </p>
1742 </li>
1743 <li>
1745 <code>.git/refs/</code>, a directory
1746 </p>
1747 </li>
1748 <li>
1750 <code>.git/refs/heads</code>, a regular file
1751 </p>
1752 </li>
1753 </ul></div>
1754 </div>
1755 <div class="sect3">
1756 <h4 id="_readers">Readers</h4>
1757 <div class="paragraph"><p>Readers can obtain a consistent snapshot of the reference space by
1758 following:</p></div>
1759 <div class="olist arabic"><ol class="arabic">
1760 <li>
1762 Open and read the <code>tables.list</code> file.
1763 </p>
1764 </li>
1765 <li>
1767 Open each of the reftable files that it mentions.
1768 </p>
1769 </li>
1770 <li>
1772 If any of the files is missing, goto 1.
1773 </p>
1774 </li>
1775 <li>
1777 Read from the now-open files as long as necessary.
1778 </p>
1779 </li>
1780 </ol></div>
1781 </div>
1782 <div class="sect3">
1783 <h4 id="_update_transactions">Update transactions</h4>
1784 <div class="paragraph"><p>Although reftables are immutable, mutations are supported by writing a
1785 new reftable and atomically appending it to the stack:</p></div>
1786 <div class="olist arabic"><ol class="arabic">
1787 <li>
1789 Acquire <code>tables.list.lock</code>.
1790 </p>
1791 </li>
1792 <li>
1794 Read <code>tables.list</code> to determine current reftables.
1795 </p>
1796 </li>
1797 <li>
1799 Select <code>update_index</code> to be most recent file&#8217;s
1800 <code>max_update_index + 1</code>.
1801 </p>
1802 </li>
1803 <li>
1805 Prepare temp reftable <code>tmp_XXXXXX</code>, including log entries.
1806 </p>
1807 </li>
1808 <li>
1810 Rename <code>tmp_XXXXXX</code> to <code>${update_index}-${update_index}-${random}.ref</code>.
1811 </p>
1812 </li>
1813 <li>
1815 Copy <code>tables.list</code> to <code>tables.list.lock</code>, appending file from (5).
1816 </p>
1817 </li>
1818 <li>
1820 Rename <code>tables.list.lock</code> to <code>tables.list</code>.
1821 </p>
1822 </li>
1823 </ol></div>
1824 <div class="paragraph"><p>During step 4 the new file&#8217;s <code>min_update_index</code> and <code>max_update_index</code>
1825 are both set to the <code>update_index</code> selected by step 3. All log records
1826 for the transaction use the same <code>update_index</code> in their keys. This
1827 enables later correlation of which references were updated by the same
1828 transaction.</p></div>
1829 <div class="paragraph"><p>Because a single <code>tables.list.lock</code> file is used to manage locking, the
1830 repository is single-threaded for writers. Writers may have to busy-spin
1831 (with backoff) around creating <code>tables.list.lock</code>, for up to an
1832 acceptable wait period, aborting if the repository is too busy to
1833 mutate. Application servers wrapped around repositories (e.g. Gerrit
1834 Code Review) can layer their own lock/wait queue to improve fairness to
1835 writers.</p></div>
1836 </div>
1837 <div class="sect3">
1838 <h4 id="_reference_deletions">Reference deletions</h4>
1839 <div class="paragraph"><p>Deletion of any reference can be explicitly stored by setting the <code>type</code>
1840 to <code>0x0</code> and omitting the <code>value</code> field of the <code>ref_record</code>. This serves
1841 as a tombstone, overriding any assertions about the existence of the
1842 reference from earlier files in the stack.</p></div>
1843 </div>
1844 <div class="sect3">
1845 <h4 id="_compaction">Compaction</h4>
1846 <div class="paragraph"><p>A partial stack of reftables can be compacted by merging references
1847 using a straightforward merge join across reftables, selecting the most
1848 recent value for output, and omitting deleted references that do not
1849 appear in remaining, lower reftables.</p></div>
1850 <div class="paragraph"><p>A compacted reftable should set its &#8216;min_update_index` to the smallest
1851 of the input files&#8217; <code>min_update_index</code>, and its <code>max_update_index</code>
1852 likewise to the largest input <code>max_update_index</code>.</p></div>
1853 <div class="paragraph"><p>For sake of illustration, assume the stack currently consists of
1854 reftable files (from oldest to newest): A, B, C, and D. The compactor is
1855 going to compact B and C, leaving A and D alone.</p></div>
1856 <div class="olist arabic"><ol class="arabic">
1857 <li>
1859 Obtain lock <code>tables.list.lock</code> and read the <code>tables.list</code> file.
1860 </p>
1861 </li>
1862 <li>
1864 Obtain locks <code>B.lock</code> and <code>C.lock</code>. Ownership of these locks
1865 prevents other processes from trying to compact these files.
1866 </p>
1867 </li>
1868 <li>
1870 Release <code>tables.list.lock</code>.
1871 </p>
1872 </li>
1873 <li>
1875 Compact <code>B</code> and <code>C</code> into a temp file
1876 <code>${min_update_index}-${max_update_index}_XXXXXX</code>.
1877 </p>
1878 </li>
1879 <li>
1881 Reacquire lock <code>tables.list.lock</code>.
1882 </p>
1883 </li>
1884 <li>
1886 Verify that <code>B</code> and <code>C</code> are still in the stack, in that order. This
1887 should always be the case, assuming that other processes are adhering to
1888 the locking protocol.
1889 </p>
1890 </li>
1891 <li>
1893 Rename <code>${min_update_index}-${max_update_index}_XXXXXX</code> to
1894 <code>${min_update_index}-${max_update_index}-${random}.ref</code>.
1895 </p>
1896 </li>
1897 <li>
1899 Write the new stack to <code>tables.list.lock</code>, replacing <code>B</code> and <code>C</code>
1900 with the file from (4).
1901 </p>
1902 </li>
1903 <li>
1905 Rename <code>tables.list.lock</code> to <code>tables.list</code>.
1906 </p>
1907 </li>
1908 <li>
1910 Delete <code>B</code> and <code>C</code>, perhaps after a short sleep to avoid forcing
1911 readers to backtrack.
1912 </p>
1913 </li>
1914 </ol></div>
1915 <div class="paragraph"><p>This strategy permits compactions to proceed independently of updates.</p></div>
1916 <div class="paragraph"><p>Each reftable (compacted or not) is uniquely identified by its name, so
1917 open reftables can be cached by their name.</p></div>
1918 </div>
1919 <div class="sect3">
1920 <h4 id="_windows">Windows</h4>
1921 <div class="paragraph"><p>On windows, and other systems that do not allow deleting or renaming to open
1922 files, compaction may succeed, but other readers may prevent obsolete tables
1923 from being deleted.</p></div>
1924 <div class="paragraph"><p>On these platforms, the following strategy can be followed: on closing a
1925 reftable stack, reload <code>tables.list</code>, and delete any tables no longer mentioned
1926 in <code>tables.list</code>.</p></div>
1927 <div class="paragraph"><p>Irregular program exit may still leave about unused files. In this case, a
1928 cleanup operation should proceed as follows:</p></div>
1929 <div class="ulist"><ul>
1930 <li>
1932 take a lock <code>tables.list.lock</code> to prevent concurrent modifications
1933 </p>
1934 </li>
1935 <li>
1937 refresh the reftable stack, by reading <code>tables.list</code>
1938 </p>
1939 </li>
1940 <li>
1942 for each <code>*.ref</code> file, remove it if
1943 </p>
1944 <div class="ulist"><ul>
1945 <li>
1947 it is not mentioned in <code>tables.list</code>, and
1948 </p>
1949 </li>
1950 <li>
1952 its max update_index is not beyond the max update_index of the stack
1953 </p>
1954 </li>
1955 </ul></div>
1956 </li>
1957 </ul></div>
1958 </div>
1959 </div>
1960 <div class="sect2">
1961 <h3 id="_alternatives_considered">Alternatives considered</h3>
1962 <div class="sect3">
1963 <h4 id="_bzip_packed_refs">bzip packed-refs</h4>
1964 <div class="paragraph"><p><code>bzip2</code> can significantly shrink a large packed-refs file (e.g. 62 MiB
1965 compresses to 23 MiB, 37%). However the bzip format does not support
1966 random access to a single reference. Readers must inflate and discard
1967 while performing a linear scan.</p></div>
1968 <div class="paragraph"><p>Breaking packed-refs into chunks (individually compressing each chunk)
1969 would reduce the amount of data a reader must inflate, but still leaves
1970 the problem of indexing chunks to support readers efficiently locating
1971 the correct chunk.</p></div>
1972 <div class="paragraph"><p>Given the compression achieved by reftable&#8217;s encoding, it does not seem
1973 necessary to add the complexity of bzip/gzip/zlib.</p></div>
1974 </div>
1975 <div class="sect3">
1976 <h4 id="_michael_haggerty_8217_s_alternate_format">Michael Haggerty&#8217;s alternate format</h4>
1977 <div class="paragraph"><p>Michael Haggerty proposed
1978 <a href="https://lore.kernel.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe%2BwGvEJ7A%40mail.gmail.com/">an
1979 alternate</a> format to reftable on the Git mailing list. This format uses
1980 smaller chunks, without the restart table, and avoids block alignment
1981 with padding. Reflog entries immediately follow each ref, and are thus
1982 interleaved between refs.</p></div>
1983 <div class="paragraph"><p>Performance testing indicates reftable is faster for lookups (51%
1984 faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
1985 larger file (+ ~3.2%, 28.3M vs 29.2M):</p></div>
1986 <div class="tableblock">
1987 <table rules="all"
1988 width="100%"
1989 frame="border"
1990 cellspacing="0" cellpadding="4">
1991 <col width="25%" />
1992 <col width="25%" />
1993 <col width="25%" />
1994 <col width="25%" />
1995 <thead>
1996 <tr>
1997 <th align="right" valign="top">format </th>
1998 <th align="right" valign="top">size </th>
1999 <th align="right" valign="top">seek cold </th>
2000 <th align="right" valign="top">seek hot</th>
2001 </tr>
2002 </thead>
2003 <tbody>
2004 <tr>
2005 <td align="right" valign="top"><p class="table">mh-alt</p></td>
2006 <td align="right" valign="top"><p class="table">28.3 M</p></td>
2007 <td align="right" valign="top"><p class="table">23.4 usec</p></td>
2008 <td align="right" valign="top"><p class="table">11.2 usec</p></td>
2009 </tr>
2010 <tr>
2011 <td align="right" valign="top"><p class="table">reftable</p></td>
2012 <td align="right" valign="top"><p class="table">29.2 M</p></td>
2013 <td align="right" valign="top"><p class="table">19.9 usec</p></td>
2014 <td align="right" valign="top"><p class="table">5.4 usec</p></td>
2015 </tr>
2016 </tbody>
2017 </table>
2018 </div>
2019 </div>
2020 <div class="sect3">
2021 <h4 id="_jgit_ketch_reftree">JGit Ketch RefTree</h4>
2022 <div class="paragraph"><p><a href="https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg03073.html">JGit Ketch</a>
2023 proposed
2024 <a href="https://lore.kernel.org/git/CAJo%3DhJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg%40mail.gmail.com/">RefTree</a>,
2025 an encoding of references inside Git tree objects stored as part of the
2026 repository&#8217;s object database.</p></div>
2027 <div class="paragraph"><p>The RefTree format adds additional load on the object database storage
2028 layer (more loose objects, more objects in packs), and relies heavily on
2029 the packer&#8217;s delta compression to save space. Namespaces which are flat
2030 (e.g. thousands of tags in refs/tags) initially create very large loose
2031 objects, and so RefTree does not address the problem of copying many
2032 references to modify a handful.</p></div>
2033 <div class="paragraph"><p>Flat namespaces are not efficiently searchable in RefTree, as tree
2034 objects in canonical formatting cannot be binary searched. This fails
2035 the need to handle a large number of references in a single namespace,
2036 such as GitHub&#8217;s <code>refs/pulls</code>, or a project with many tags.</p></div>
2037 </div>
2038 <div class="sect3">
2039 <h4 id="_lmdb">LMDB</h4>
2040 <div class="paragraph"><p>David Turner proposed
2041 <a href="https://lore.kernel.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/">using
2042 LMDB</a>, as LMDB is lightweight (64k of runtime code) and GPL-compatible
2043 license.</p></div>
2044 <div class="paragraph"><p>A downside of LMDB is its reliance on a single C implementation. This
2045 makes embedding inside JGit (a popular reimplementation of Git)
2046 difficult, and hoisting onto virtual storage (for JGit DFS) virtually
2047 impossible.</p></div>
2048 <div class="paragraph"><p>A common format that can be supported by all major Git implementations
2049 (git-core, JGit, libgit2) is strongly preferred.</p></div>
2050 </div>
2051 </div>
2052 </div>
2053 </div>
2054 </div>
2055 <div id="footnotes"><hr /></div>
2056 <div id="footer">
2057 <div id="footer-text">
2058 Last updated
2059 2022-03-09 14:17:38 PST
2060 </div>
2061 </div>
2062 </body>
2063 </html>