<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
<meta name="generator" content="AsciiDoc 10.2.0" />
<title>Bundle URIs</title>
<style type="text/css">
/* Shared CSS for AsciiDoc xhtml11 and html5 backends */
font-family: Georgia,serif;
h1, h2, h3, h4, h5, h6,
div.title, caption.title,
thead, p.table.header,
#author, #revnumber, #revdate, #revremark,
font-family: Arial,Helvetica,sans-serif;
margin: 1em 5% 1em 5%;
text-decoration: underline;
h1, h2, h3, h4, h5, h6 {
border-bottom: 2px solid silver;
border: 1px solid silver;
ul > li { color: #aaa; }
ul > li > * { color: black; }
.monospaced, code, pre {
font-family: "Courier New", Courier, monospace;
white-space: pre-wrap;
#revnumber, #revdate, #revremark {
border-top: 2px solid silver;
padding-bottom: 0.5em;
padding-bottom: 0.5em;
margin-bottom: 1.5em;
div.imageblock, div.exampleblock, div.verseblock,
div.quoteblock, div.literalblock, div.listingblock, div.sidebarblock,
div.admonitionblock {
margin-bottom: 1.5em;
div.admonitionblock {
margin-bottom: 2.0em;
div.content { /* Block element content. */
/* Block element titles. */
div.title, caption.title {
margin-bottom: 0.5em;
td div.title:first-child {
div.content div.title:first-child {
div.content + div.title {
div.sidebarblock > div.content {
border: 1px solid #dddddd;
border-left: 4px solid #f0f0f0;
div.listingblock > div.content {
border: 1px solid #dddddd;
border-left: 5px solid #f0f0f0;
div.quoteblock, div.verseblock {
border-left: 5px solid #f0f0f0;
div.quoteblock > div.attribution {
div.verseblock > pre.content {
font-family: inherit;
div.verseblock > div.attribution {
/* DEPRECATED: Pre version 8.2.7 verse style literal block. */
div.verseblock + div.attribution {
div.admonitionblock .icon {
text-decoration: underline;
padding-right: 0.5em;
div.admonitionblock td.content {
border-left: 3px solid #dddddd;
div.exampleblock > div.content {
border-left: 3px solid #dddddd;
div.imageblock div.content { padding-left: 0; }
span.image img { border-style: none; vertical-align: text-bottom; }
a.image:visited { color: white; }
margin-bottom: 0.8em;
list-style-position: outside;
list-style-type: decimal;
list-style-type: lower-alpha;
list-style-type: upper-alpha;
list-style-type: lower-roman;
list-style-type: upper-roman;
div.compact ul, div.compact ol,
div.compact p, div.compact p,
div.compact div, div.compact div {
margin-bottom: 0.1em;
margin-bottom: 0.8em;
padding-bottom: 15px;
dt.hdlist1.strong, td.hdlist1.strong {
padding-right: 0.8em;
div.hdlist.compact tr {
.footnote, .footnoteref {
span.footnote, span.footnoteref {
vertical-align: super;
margin: 20px 0 20px 0;
#footnotes div.footnote {
border-top: 1px solid silver;
padding-right: 0.5em;
padding-bottom: 0.3em;
#footer-badges { display: none; }
margin-bottom: 2.5em;
margin-bottom: 0.1em;
div.toclevel0, div.toclevel1, div.toclevel2, div.toclevel3, div.toclevel4 {
span.aqua { color: aqua; }
span.black { color: black; }
span.blue { color: blue; }
span.fuchsia { color: fuchsia; }
span.gray { color: gray; }
span.green { color: green; }
span.lime { color: lime; }
span.maroon { color: maroon; }
span.navy { color: navy; }
span.olive { color: olive; }
span.purple { color: purple; }
span.red { color: red; }
span.silver { color: silver; }
span.teal { color: teal; }
span.white { color: white; }
span.yellow { color: yellow; }
span.aqua-background { background: aqua; }
span.black-background { background: black; }
span.blue-background { background: blue; }
span.fuchsia-background { background: fuchsia; }
span.gray-background { background: gray; }
span.green-background { background: green; }
span.lime-background { background: lime; }
span.maroon-background { background: maroon; }
span.navy-background { background: navy; }
span.olive-background { background: olive; }
span.purple-background { background: purple; }
span.red-background { background: red; }
span.silver-background { background: silver; }
span.teal-background { background: teal; }
span.white-background { background: white; }
span.yellow-background { background: yellow; }
span.big { font-size: 2em; }
span.small { font-size: 0.6em; }
span.underline { text-decoration: underline; }
span.overline { text-decoration: overline; }
span.line-through { text-decoration: line-through; }
div.unbreakable { page-break-inside: avoid; }
margin-bottom: 1.5em;
div.tableblock > table {
border: 3px solid #527bbd;
thead, p.table.header {
/* Because the table frame attribute is overridden by CSS in most browsers. */
div.tableblock > table[frame="void"] {
div.tableblock > table[frame="hsides"] {
border-left-style: none;
border-right-style: none;
div.tableblock > table[frame="vsides"] {
border-top-style: none;
border-bottom-style: none;
margin-bottom: 1.5em;
thead, p.tableblock.header {
border-color: #527bbd;
border-collapse: collapse;
th.tableblock, td.tableblock {
border-color: #527bbd;
table.tableblock.frame-topbot {
border-left-style: hidden;
border-right-style: hidden;
table.tableblock.frame-sides {
border-top-style: hidden;
border-bottom-style: hidden;
table.tableblock.frame-none {
border-style: hidden;
th.tableblock.halign-left, td.tableblock.halign-left {
th.tableblock.halign-center, td.tableblock.halign-center {
th.tableblock.halign-right, td.tableblock.halign-right {
th.tableblock.valign-top, td.tableblock.valign-top {
th.tableblock.valign-middle, td.tableblock.valign-middle {
vertical-align: middle;
th.tableblock.valign-bottom, td.tableblock.valign-bottom {
vertical-align: bottom;
padding-bottom: 0.5em;
border-top: 2px solid silver;
border-bottom: 2px solid silver;
body.manpage div.sectionbody {
body.manpage div#toc { display: none; }
<script type="text/javascript">
var asciidoc = {  // Namespace.
/////////////////////////////////////////////////////////////////////
// Table Of Contents generator
/////////////////////////////////////////////////////////////////////
/* Author: Mihai Bazon, September 2002
 * http://students.infoiasi.ro/~mishoo
 * Table Of Content generator
 * Feel free to use this script under the terms of the GNU General Public
 * License, as long as you do not remove or alter this notice.
/* modified by Troy D. Hanson, September 2006. License: GPL */
/* modified by Stuart Rackham, 2006, 2009. License: GPL */
toc: function (toclevels) {
function getText(el) {
for (var i = el.firstChild; i != null; i = i.nextSibling) {
if (i.nodeType == 3 /* Node.TEXT_NODE */) // IE doesn't speak constants.
else if (i.firstChild != null)
function TocEntry(el, text, toclevel) {
this.toclevel = toclevel;
function tocEntries(el, toclevels) {
var result = new Array;
var re = new RegExp('[hH]([1-'+(toclevels+1)+'])');
// Function that scans the DOM tree for header elements (the DOM2
// nodeIterator API would be a better technique but not supported by all
var iterate = function (el) {
for (var i = el.firstChild; i != null; i = i.nextSibling) {
if (i.nodeType == 1 /* Node.ELEMENT_NODE */) {
var mo = re.exec(i.tagName);
if (mo && (i.getAttribute("class") || i.getAttribute("className")) != "float") {
result[result.length] = new TocEntry(i, getText(i), mo[1]-1);
var toc = document.getElementById("toc");
// Delete existing TOC entries in case we're reloading the TOC.
var tocEntriesToRemove = [];
for (i = 0; i < toc.childNodes.length; i++) {
var entry = toc.childNodes[i];
if (entry.nodeName.toLowerCase() == 'div'
 && entry.getAttribute("class")
 && entry.getAttribute("class").match(/^toclevel/))
tocEntriesToRemove.push(entry);
for (i = 0; i < tocEntriesToRemove.length; i++) {
toc.removeChild(tocEntriesToRemove[i]);
// Rebuild TOC entries.
var entries = tocEntries(document.getElementById("content"), toclevels);
for (var i = 0; i < entries.length; ++i) {
var entry = entries[i];
if (entry.element.id == "")
entry.element.id = "_toc_" + i;
var a = document.createElement("a");
a.href = "#" + entry.element.id;
a.appendChild(document.createTextNode(entry.text));
var div = document.createElement("div");
div.className = "toclevel" + entry.toclevel;
toc.appendChild(div);
if (entries.length == 0)
toc.parentNode.removeChild(toc);
/////////////////////////////////////////////////////////////////////
// Footnotes generator
/////////////////////////////////////////////////////////////////////
/* Based on footnote generation code from:
 * http://www.brandspankingnew.net/archive/2005/07/format_footnote.html
footnotes: function () {
// Delete existing footnote entries in case we're reloading the footnotes.
var noteholder = document.getElementById("footnotes");
var entriesToRemove = [];
for (i = 0; i < noteholder.childNodes.length; i++) {
var entry = noteholder.childNodes[i];
if (entry.nodeName.toLowerCase() == 'div' && entry.getAttribute("class") == "footnote")
entriesToRemove.push(entry);
for (i = 0; i < entriesToRemove.length; i++) {
noteholder.removeChild(entriesToRemove[i]);
// Rebuild footnote entries.
var cont = document.getElementById("content");
var spans = cont.getElementsByTagName("span");
for (i=0; i<spans.length; i++) {
if (spans[i].className == "footnote") {
var note = spans[i].getAttribute("data-note");
// Use [\s\S] in place of . so multi-line matches work.
// Because JavaScript has no s (dotall) regex flag.
note = spans[i].innerHTML.match(/\s*\[([\s\S]*)]\s*/)[1];
"[<a id='_footnoteref_" + n + "' href='#_footnote_" + n +
"' title='View footnote' class='footnote'>" + n + "</a>]";
spans[i].setAttribute("data-note", note);
noteholder.innerHTML +=
"<div class='footnote' id='_footnote_" + n + "'>" +
"<a href='#_footnoteref_" + n + "' title='Return to text'>" +
n + "</a>. " + note + "</div>";
var id =spans[i].getAttribute("id");
if (id != null) refs["#"+id] = n;
noteholder.parentNode.removeChild(noteholder);
// Process footnoterefs.
for (i=0; i<spans.length; i++) {
if (spans[i].className == "footnoteref") {
var href = spans[i].getElementsByTagName("a")[0].getAttribute("href");
href = href.match(/#.*/)[0];  // Because IE returns the full URL.
"[<a href='#_footnote_" + n +
"' title='View footnote' class='footnote'>" + n + "</a>]";
install: function(toclevels) {
function reinstall() {
asciidoc.footnotes();
asciidoc.toc(toclevels);
function reinstallAndRemoveTimer() {
clearInterval(timerId);
timerId = setInterval(reinstall, 500);
if (document.addEventListener)
document.addEventListener("DOMContentLoaded", reinstallAndRemoveTimer, false);
window.onload = reinstallAndRemoveTimer;
<body class="article">
<span id="revdate">2024-05-31</span>
<div class="sectionbody">
<div class="paragraph"><p>Git bundles are files that store a pack-file along with some extra metadata,
including a set of refs and a (possibly empty) set of necessary commits. See
<a href="../git-bundle.html">git-bundle(1)</a> and
<a href="../gitformat-bundle.html">gitformat-bundle(5)</a> for more information.</p></div>
<div class="paragraph"><p>Bundle URIs are locations where Git can download one or more bundles in
order to bootstrap the object database in advance of fetching the remaining
objects from a remote.</p></div>
<div class="paragraph"><p>One goal is to speed up clones and fetches for users with poor network
connectivity to the origin server. Another benefit is to allow heavy users,
such as CI build farms, to use local resources for the majority of Git data,
thereby reducing the load on the origin server.</p></div>
<div class="paragraph"><p>To enable the bundle URI feature, users can specify a bundle URI using
command-line options or the origin server can advertise one or more URIs
via a protocol v2 capability.</p></div>
<h2 id="_design_goals">Design Goals</h2>
<div class="sectionbody">
<div class="paragraph"><p>The bundle URI standard aims to be flexible enough to satisfy multiple
workloads. The bundle provider and the Git client have several choices in
how they create and consume bundle URIs.</p></div>
<div class="ulist"><ul>
Bundles can have whatever name the server desires. This name could refer
to immutable data by using a hash of the bundle contents. However, this
means that a new URI will be needed after every update of the content.
This might be acceptable if the server is advertising the URI (and the
server is aware of new bundles being generated) but would not be
ergonomic for users who specify the URI with the command-line option.
The bundles could be organized specifically for bootstrapping full
clones, but could also be organized with the intention of bootstrapping
incremental fetches. The bundle provider must decide on one of several
organization schemes to minimize client downloads during incremental
fetches, but the Git client can also choose whether to use bundles for
either of these operations.
The bundle provider can choose to support full clones, partial clones,
or both. The client can detect which bundles are appropriate for the
repository’s partial clone filter, if any.
The bundle provider can use a single bundle (for clones only), or a
list of bundles. When using a list of bundles, the provider can specify
whether or not the client needs <em>all</em> of the bundle URIs for a full
clone, or if <em>any</em> one of the bundle URIs is sufficient. This allows the
bundle provider to use different URIs for different geographies.
The bundle provider can organize the bundles using heuristics, such as
creation tokens, to help the client avoid downloading bundles it does
not need. When the bundle provider does not provide these heuristics,
the client can use optimizations to minimize how much of the data is
downloaded.
The bundle provider does not need to be associated with the Git server.
The client can choose to use the bundle provider without it being
advertised by the Git server.
The client can choose to discover bundle providers that are advertised
by the Git server. This could happen during <code>git clone</code>, during
<code>git fetch</code>, both, or neither. The user can choose which combination
The client can choose to configure a bundle provider manually at any
time. The client can also choose to specify a bundle provider manually
as a command-line option to <code>git clone</code>.
<div class="paragraph"><p>Each repository is different and every Git server has different needs.
Hopefully the bundle URI feature is flexible enough to satisfy all needs.
If not, then the feature can be extended through its versioning mechanism.</p></div>
<h2 id="_server_requirements">Server requirements</h2>
<div class="sectionbody">
<div class="paragraph"><p>A server-side implementation of a bundle server does not require any other
part of the Git protocol. This allows server maintainers to use static
content solutions such as CDNs to serve the bundle files.</p></div>
<div class="paragraph"><p>At the current scope of the bundle URI feature, all URIs are expected to
be HTTP(S) URLs where content is downloaded to a local file using a <code>GET</code>
request to that URL. The server could include authentication requirements
to those requests with the aim of triggering the configured credential
helper for secure access. (Future extensions could use "file://" URIs or
<div class="paragraph"><p>Assuming a <code>200 OK</code> response from the server, the content at the URL is
inspected. First, Git attempts to parse the file as a bundle file of
version 2 or higher. If the file is not a bundle, then the file is parsed
as a plain-text file using Git’s config parser. The key-value pairs in
that config file are expected to describe a list of bundle URIs. If
neither of these parse attempts succeeds, then Git reports an error to
the user that the bundle URI provided erroneous data.</p></div>
<div class="paragraph"><p>Any other data provided by the server is considered erroneous.</p></div>
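<div class="paragraph"><p>The detection order described above can be sketched as follows. This is a minimal Python illustration, not Git's implementation: it assumes that a bundle file starts with the signature line <code># v2 git bundle</code> or <code># v3 git bundle</code>, and <code>parse_config_subset()</code> and <code>classify()</code> are hypothetical helpers that understand only a small subset of the Git config format (no escapes, includes, or case folding).</p></div>

```python
# Minimal sketch, not Git's implementation: classify the body of a "200 OK"
# response as a bundle or as a bundle list. parse_config_subset() is a
# hypothetical, simplified stand-in for Git's config parser.
import re

SIGNATURE = re.compile(rb"# v(\d+) git bundle\n")
SECTION = re.compile(r'\[(\w+)(?:\s+"([^"]*)")?\]')


def parse_config_subset(text: str) -> dict:
    """Return {"section.subsection.key": value} for [section "sub"] / key = value lines."""
    values = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        header = SECTION.fullmatch(line)
        if header:
            name, sub = header.groups()
            section = name if sub is None else f"{name}.{sub}"
        elif section is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            values[f"{section}.{key}"] = value
        else:
            raise ValueError(f"unparsable line: {raw!r}")
    return values


def classify(content: bytes):
    """Return ("bundle", version) or ("bundle-list", values); raise ValueError otherwise."""
    match = SIGNATURE.match(content)
    if match and int(match.group(1)) >= 2:
        return ("bundle", int(match.group(1)))
    # Not a bundle: fall back to the config-style bundle list format.
    try:
        values = parse_config_subset(content.decode("utf-8"))
    except (UnicodeDecodeError, ValueError) as err:
        raise ValueError("bundle URI provided erroneous data") from err
    if not values:
        raise ValueError("bundle URI provided erroneous data")
    return ("bundle-list", values)
```

<div class="paragraph"><p>Trying the bundle signature first is cheap because it only needs the first line; the config parse is the fallback, and failure of both becomes a single "erroneous data" error.</p></div>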
<h2 id="_bundle_lists">Bundle Lists</h2>
<div class="sectionbody">
<div class="paragraph"><p>The Git server can advertise bundle URIs using a set of <code>key=value</code> pairs.
A bundle URI can also serve a plain-text file in the Git config format
containing these same <code>key=value</code> pairs. In both cases, we consider this
to be a <em>bundle list</em>. The pairs specify information about the bundles
that the client can use to decide which bundles to download and which to
ignore.</p></div>
<div class="paragraph"><p>A few keys focus on properties of the list itself.</p></div>
<div class="dlist"><dl>
(Required) This value provides a version number for the bundle
list. If a future Git change enables a feature that needs the Git
client to react to a new key in the bundle list file, then this version
will increment. The only current version number is 1, and if any other
value is specified then Git will fail to use this file.
(Required) This key has one of two values: <code>all</code> and <code>any</code>. When <code>all</code>
is specified, then the client should expect to need all of the listed
bundle URIs that match their repository’s requirements. When <code>any</code> is
specified, then the client should expect that any one of the bundle URIs
that match their repository’s requirements will suffice. Typically, the
<code>any</code> option is used to list a number of different bundle servers
located in different geographies.
If this string-valued key exists, then the bundle list is designed to
work well with incremental <code>git fetch</code> commands. The heuristic signals
that there are additional keys available for each bundle that help
determine which subset of bundles the client should download. The only
heuristic currently planned is <code>creationToken</code>.
<div class="paragraph"><p>The remaining keys include an <code>&lt;id&gt;</code> segment which is a server-designated
name for each available bundle. The <code>&lt;id&gt;</code> must contain only alphanumeric
and <code>-</code> characters.</p></div>
<div class="dlist"><dl>
bundle.&lt;id&gt;.uri
(Required) This string value is the URI for downloading bundle <code>&lt;id&gt;</code>.
If the URI begins with a protocol (<code>http://</code> or <code>https://</code>) then the URI
is absolute. Otherwise, the URI is interpreted as relative to the URI
used for the bundle list. If the URI begins with <code>/</code>, then that relative
path is relative to the domain name used for the bundle list. (This use
of relative paths is intended to make it easier to distribute a set of
bundles across a large number of servers or CDNs with different domain
names.)
bundle.&lt;id&gt;.filter
This string value represents an object filter that should also appear in
the header of this bundle. The server uses this value to distinguish
different kinds of bundles, from which the client can choose those that
match its object filter.
bundle.&lt;id&gt;.creationToken
This value is a nonnegative 64-bit integer used for sorting the bundle
list. This is used to download a subset of bundles during a fetch when
<code>bundle.heuristic=creationToken</code>.
bundle.&lt;id&gt;.location
This string value advertises a real-world location from where the bundle
URI is served. This can be used to present the user with an option for
which bundle URI to use or simply as an informative indicator of which
bundle URI was selected by Git. This is only valuable when
<code>bundle.mode</code> is <code>any</code>.
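<div class="paragraph"><p>The constraints on the keys above can be checked with a small validator. This Python sketch assumes the list has already been flattened into a dictionary keyed by strings such as <code>bundle.&lt;id&gt;.uri</code>; <code>validate_bundle_list()</code> is a hypothetical helper, and it compares key names case-sensitively, which simplifies Git's case-insensitive config variable names.</p></div>

```python
# Minimal sketch, assuming the bundle list is already a flat
# {"bundle.ID.key": value} mapping (for example, from a config parser).
# validate_bundle_list() is a hypothetical helper, not part of Git.
import re

BUNDLE_ID = re.compile(r"[A-Za-z0-9-]+")  # alphanumerics and "-" only


def validate_bundle_list(flat: dict) -> dict:
    """Check list-level keys and group per-bundle keys by bundle id."""
    if flat.get("bundle.version") != "1":
        raise ValueError("bundle.version is missing or not 1")
    mode = flat.get("bundle.mode")
    if mode not in ("all", "any"):
        raise ValueError("bundle.mode must be 'all' or 'any'")

    bundles = {}
    for key, value in flat.items():
        section, _, rest = key.partition(".")
        bundle_id, dot, field = rest.rpartition(".")
        if section != "bundle" or not dot:
            continue  # list-level keys such as bundle.version and bundle.mode
        if not BUNDLE_ID.fullmatch(bundle_id):
            raise ValueError(f"invalid bundle id {bundle_id!r}")
        bundles.setdefault(bundle_id, {})[field] = value

    for bundle_id, fields in bundles.items():
        if "uri" not in fields:
            raise ValueError(f"bundle {bundle_id!r} has no uri")
        token = fields.get("creationToken")
        # creationToken must be a nonnegative 64-bit integer.
        if token is not None and int(token) not in range(2**64):
            raise ValueError(f"bundle {bundle_id!r} has an out-of-range creationToken")

    return {"mode": mode, "heuristic": flat.get("bundle.heuristic"), "bundles": bundles}
```

<div class="paragraph"><p>Splitting each key at its first and last dot keeps the bundle id intact, so an id containing a forbidden character such as <code>.</code> or <code>_</code> is rejected instead of being silently misparsed.</p></div>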
<div class="paragraph"><p>Here is an example bundle list using the Git config format:</p></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle]
	version = 1
	mode = all
	heuristic = creationToken</code></pre>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "2022-02-09-1644442601-daily"]
	uri = https://bundles.example.com/git/git/2022-02-09-1644442601-daily.bundle
	creationToken = 1644442601</code></pre>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "2022-02-02-1643842562"]
	uri = https://bundles.example.com/git/git/2022-02-02-1643842562.bundle
	creationToken = 1643842562</code></pre>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "2022-02-09-1644442631-daily-blobless"]
	uri = 2022-02-09-1644442631-daily-blobless.bundle
	creationToken = 1644442631
	filter = blob:none
</code></pre>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "2022-02-02-1643842568-blobless"]
	uri = /git/git/2022-02-02-1643842568-blobless.bundle
	creationToken = 1643842568
	filter = blob:none
</code></pre>
<div class="paragraph"><p>This example uses <code>bundle.mode=all</code> as well as the
<code>bundle.&lt;id&gt;.creationToken</code> heuristic. It also uses the <code>bundle.&lt;id&gt;.filter</code>
options to present two parallel sets of bundles: one for full clones and
another for blobless partial clones.</p></div>
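<div class="paragraph"><p>Choosing between the two parallel sets comes down to matching <code>bundle.&lt;id&gt;.filter</code> against the client's own partial clone filter. A minimal Python sketch, assuming filters are compared as plain strings (Git may normalize filter specifications first); <code>select_for_filter()</code> and <code>EXAMPLE_BUNDLES</code> are illustrative names built from the example list.</p></div>

```python
# Minimal sketch: select the bundles whose filter matches the client's
# partial clone filter (None for a full clone). Filters are compared as
# plain strings, an assumption. EXAMPLE_BUNDLES mirrors the example list.
EXAMPLE_BUNDLES = {
    "2022-02-09-1644442601-daily": {"creationToken": 1644442601},
    "2022-02-02-1643842562": {"creationToken": 1643842562},
    "2022-02-09-1644442631-daily-blobless": {"creationToken": 1644442631, "filter": "blob:none"},
    "2022-02-02-1643842568-blobless": {"creationToken": 1643842568, "filter": "blob:none"},
}


def select_for_filter(bundles: dict, client_filter=None) -> list:
    """Return matching bundle ids, newest creation token first."""
    matching = [bid for bid, fields in bundles.items() if fields.get("filter") == client_filter]
    return sorted(matching, key=lambda bid: bundles[bid]["creationToken"], reverse=True)
```

<div class="paragraph"><p>A full clone (no filter) selects the two unfiltered bundles, while a <code>blob:none</code> partial clone selects the two blobless ones; a client with any other filter matches nothing and would fall back to the origin server.</p></div>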
<div class="paragraph"><p>Suppose that this bundle list was found at the URI
<code>https://bundles.example.com/git/git/</code>. Then the two blobless bundles have
the following fully expanded URIs:</p></div>
<div class="ulist"><ul>
<code>https://bundles.example.com/git/git/2022-02-09-1644442631-daily-blobless.bundle</code>
<code>https://bundles.example.com/git/git/2022-02-02-1643842568-blobless.bundle</code>
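<div class="paragraph"><p>The expansions above are consistent with standard relative-reference resolution. The following Python sketch assumes RFC 3986 semantics as implemented by <code>urllib.parse.urljoin</code>; it reproduces the two URIs listed above but is not taken from Git's source, and <code>resolve_bundle_uri()</code> is a hypothetical name.</p></div>

```python
# Minimal sketch, assuming relative bundle URIs are resolved with standard
# RFC 3986 reference resolution (urllib.parse.urljoin) against the URI the
# bundle list was fetched from. Illustration only, not Git's code.
from urllib.parse import urljoin


def resolve_bundle_uri(list_uri: str, value: str) -> str:
    if value.startswith(("http://", "https://")):
        return value  # absolute URI: use as-is
    # "name.bundle" resolves next to the list; "/path" resolves from the host root.
    return urljoin(list_uri, value)


if __name__ == "__main__":
    base = "https://bundles.example.com/git/git/"
    for value in ("2022-02-09-1644442631-daily-blobless.bundle",
                  "/git/git/2022-02-02-1643842568-blobless.bundle"):
        print(resolve_bundle_uri(base, value))
```

<div class="paragraph"><p>Under this assumption the trailing slash on the list URI matters: resolving <code>x.bundle</code> against <code>https://bundles.example.com/git/git</code> (no trailing slash) yields <code>https://bundles.example.com/git/x.bundle</code>.</p></div>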
<h2 id="_advertising_bundle_uris">Advertising Bundle URIs</h2>
<div class="sectionbody">
<div class="paragraph"><p>If a user knows a bundle URI for the repository they are cloning, then
they can specify that URI manually through a command-line option. However,
a Git host may want to advertise bundle URIs during the clone operation,
to help users who are unaware of the feature.</p></div>
<div class="paragraph"><p>The only thing required for this feature is that the server can advertise
one or more bundle URIs. This advertisement takes the form of a new
protocol v2 capability specifically for discovering bundle URIs.</p></div>
<div class="paragraph"><p>The client could choose an arbitrary bundle URI as an option <em>or</em> select
the URI with the best performance, as determined by some exploratory checks.
It is up to the bundle provider to decide if having multiple URIs is
preferable to a single URI that is geodistributed through server-side
infrastructure.</p></div>
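<div class="paragraph"><p>How the client ranks the advertised URIs is left open. One possible shape, sketched in Python under stated assumptions: the caller passes a preferred URI (for example, from a latency probe that is not shown), and <code>download</code> is a hypothetical callable that raises <code>OSError</code> on failure. If every URI fails, the sketch returns <code>(None, None)</code> so the caller can proceed as if no bundle URI had been given, in line with the Error Conditions section.</p></div>

```python
# Minimal sketch of choosing among bundle.mode=any URIs: try a preferred URI
# first, then fall back to the others. download() is a caller-supplied,
# hypothetical function that raises OSError on failure.
from typing import Callable, Optional, Sequence, Tuple


def fetch_from_any(
    uris: Sequence[str],
    download: Callable[[str], bytes],
    preferred: Optional[str] = None,
) -> Tuple[Optional[str], Optional[bytes]]:
    order = list(uris)
    if preferred in order:
        order.remove(preferred)
        order.insert(0, preferred)
    for uri in order:
        try:
            return uri, download(uri)
        except OSError:
            continue  # connection or HTTP failure: try the next URI
    # Every URI failed: behave as if no bundle URI was given.
    return None, None
```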
<h2 id="_cloning_with_bundle_uris">Cloning with Bundle URIs</h2>
<div class="sectionbody">
<div class="paragraph"><p>The primary need for bundle URIs is to speed up clones. The Git client
will interact with bundle URIs according to the following flow:</p></div>
<div class="olist arabic"><ol class="arabic">
The user specifies a bundle URI with the <code>--bundle-uri</code> command-line
option <em>or</em> the client discovers a bundle list advertised by the
Git server.
If the downloaded data from a bundle URI is a bundle, then the client
inspects the bundle headers to check that the prerequisite commit OIDs
are present in the client repository. If some are missing, then the
client delays unbundling until other bundles have been unbundled,
making those OIDs present. When all required OIDs are present, the
client unbundles that data using a refspec. The default refspec is
<code>+refs/heads/*:refs/bundles/*</code>, but this can be configured. These refs
are stored so that later <code>git fetch</code> negotiations can communicate each
bundled ref as a <code>have</code>, reducing the size of the fetch over the Git
protocol. To allow pruning refs from this ref namespace, Git may
introduce a numbered namespace (such as <code>refs/bundles/&lt;i&gt;/*</code>) such that
stale bundle refs can be deleted.
If the file is instead a bundle list, then the client inspects the
<code>bundle.mode</code> to see if the list is of the <code>all</code> or <code>any</code> form.
<div class="olist loweralpha"><ol class="loweralpha">
If <code>bundle.mode=all</code>, then the client considers all bundle
URIs. The list is reduced based on the <code>bundle.&lt;id&gt;.filter</code> options
matching the client repository’s partial clone filter. Then, all
bundle URIs are requested. If the <code>bundle.&lt;id&gt;.creationToken</code>
heuristic is provided, then the bundles are downloaded in decreasing
order by the creation token, stopping when a bundle has all required
OIDs. The bundles can then be unbundled in increasing creation token
order. The client stores the latest creation token as a heuristic
for avoiding future downloads if the bundle list does not advertise
bundles with larger creation tokens.
If <code>bundle.mode=any</code>, then the client can choose any one of the
bundle URIs to inspect. The client can use a variety of ways to
choose among these URIs. The client can also fall back to another URI
if the initial choice fails to return a result.
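<div class="paragraph"><p>The <code>bundle.mode=all</code> flow with the creation-token heuristic can be modeled on a toy object graph. In this Python sketch each hypothetical <code>Bundle</code> records its prerequisite OIDs and the OIDs it provides, and downloads are simulated. The stopping rule is interpreted as stopping at the first bundle (newest first) whose prerequisites are already present locally; that interpretation is an assumption, not a statement of Git's exact logic.</p></div>

```python
# Minimal sketch of the bundle.mode=all flow with the creationToken heuristic
# on a toy model. Bundle and plan_all_mode() are hypothetical names.
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    bundle_id: str
    creation_token: int
    prerequisites: frozenset = frozenset()
    provides: frozenset = frozenset()


def plan_all_mode(bundles, have):
    """Return (bundle ids in unbundling order, latest creation token downloaded)."""
    downloaded = []
    # Download newest first; stop at the first bundle whose prerequisites are
    # already in the local object database, since older bundles are not needed.
    for bundle in sorted(bundles, key=lambda b: b.creation_token, reverse=True):
        downloaded.append(bundle)
        if bundle.prerequisites.issubset(have):
            break
    # Unbundle in increasing creation-token order so that each bundle's
    # prerequisites were provided by an earlier bundle (or already existed).
    objects = set(have)
    applied = []
    for bundle in reversed(downloaded):
        if not bundle.prerequisites.issubset(objects):
            raise ValueError(f"missing prerequisites for {bundle.bundle_id}")
        objects |= bundle.provides
        applied.append(bundle.bundle_id)
    latest = max((b.creation_token for b in downloaded), default=None)
    return applied, latest
```

<div class="paragraph"><p>If no downloaded bundle can be applied, the sketch raises <code>ValueError</code>, which corresponds to the error condition in which prerequisite commits are missing and no more bundles remain to download.</p></div>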
<div class="paragraph"><p>Note that during a clone we expect that all bundles will be required, and
heuristics such as <code>bundle.&lt;id&gt;.creationToken</code> can be used to download
bundles in chronological order or in parallel.</p></div>
<div class="paragraph"><p>If a given bundle URI is a bundle list with a <code>bundle.heuristic</code>
value, then the client can choose to store that URI as its chosen bundle
URI. The client can then navigate directly to that URI during later
<code>git fetch</code> calls.</p></div>
<div class="paragraph"><p>When downloading bundle URIs, the client can choose to inspect the initial
content before committing to downloading the entire content. This may
provide enough information to determine if the URI is a bundle list or
a bundle. In the case of a bundle, the client may inspect the bundle
header to determine that all advertised tips are already in the client
repository and cancel the remaining download.</p></div>
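<div class="paragraph"><p>Inspecting only the bundle header is cheap because the header is line-oriented and ends at the first empty line. This Python sketch assumes the v2/v3 header layout documented in gitformat-bundle(5): a signature line, <code>@</code> capability lines (v3 only), <code>-</code> prerequisite lines, and <code>OID refname</code> reference lines. The set <code>have</code> is a hypothetical stand-in for an object database lookup, and both function names are illustrative.</p></div>

```python
# Minimal sketch: read only the header of a v2/v3 bundle from a stream and
# decide whether the rest of the download can be cancelled because every
# advertised tip is already present locally.
import io


def read_bundle_header(stream):
    """Return (prerequisite OIDs, {refname: OID}, capabilities); leave stream at the packfile."""
    if stream.readline() not in (b"# v2 git bundle\n", b"# v3 git bundle\n"):
        raise ValueError("not a v2 or v3 bundle")
    prerequisites, refs, capabilities = set(), {}, {}
    for line in iter(stream.readline, b""):
        if line == b"\n":
            break  # an empty line ends the header; the packfile follows
        text = line.decode("utf-8").rstrip("\n")
        if text.startswith("@"):  # v3 capability, e.g. "@object-format=sha1"
            key, _, value = text[1:].partition("=")
            capabilities[key] = value
        elif text.startswith("-"):  # prerequisite: "-OID optional comment"
            prerequisites.add(text[1:].split(" ", 1)[0])
        else:  # reference: "OID refname"
            oid, refname = text.split(" ", 1)
            refs[refname] = oid
    return prerequisites, refs, capabilities


def can_skip_download(stream, have: set) -> bool:
    """True if every ref tip in the bundle is already in the local object set."""
    _, refs, _ = read_bundle_header(stream)
    return set(refs.values()).issubset(have)


if __name__ == "__main__":
    sample = io.BytesIO(b"# v2 git bundle\n" + b"c" * 40 + b" refs/heads/main\n\nPACK")
    print(can_skip_download(sample, {"c" * 40}))
```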
<h2 id="_fetching_with_bundle_uris">Fetching with Bundle URIs</h2>
<div class="sectionbody">
<div class="paragraph"><p>When the client fetches new data, it can decide to fetch from bundle
servers before fetching from the origin remote. This could be done via a
command-line option, but a config value, such as the one specified during
the clone, is likely to be more useful.</p></div>
<div class="paragraph"><p>The fetch operation follows the same procedure to download bundles from a
bundle list (although we do <em>not</em> want to use parallel downloads here). We
expect that the process will end when all prerequisite commit OIDs in a
thin bundle are already in the object database.</p></div>
<div class="paragraph"><p>When using the <code>creationToken</code> heuristic, the client can avoid downloading
any bundles if their creation tokens are not larger than the stored
creation token. After fetching new bundles, Git updates this local
creation token.</p></div>
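<div class="paragraph"><p>The creation-token check amounts to a comparison against the stored value, followed by an update once new bundles are applied. A minimal Python sketch; <code>bundles_newer_than()</code> and <code>next_stored_token()</code> are hypothetical names, and where Git actually persists the token is outside its scope.</p></div>

```python
# Minimal sketch of the creationToken check during fetch. tokens_by_id maps
# bundle ids to creation tokens from the bundle list; stored_token is the
# value remembered from the last successful bundle download (None if none).
from typing import Dict, Iterable, List, Optional


def bundles_newer_than(tokens_by_id: Dict[str, int], stored_token: Optional[int]) -> List[str]:
    """Bundle ids whose token is larger than the stored one, newest first."""
    floor = -1 if stored_token is None else stored_token  # tokens are nonnegative
    newer = [bid for bid, token in tokens_by_id.items() if token > floor]
    return sorted(newer, key=tokens_by_id.get, reverse=True)


def next_stored_token(stored_token: Optional[int], applied_tokens: Iterable[int]) -> Optional[int]:
    """After unbundling, remember the largest creation token seen so far."""
    candidates = list(applied_tokens)
    if stored_token is not None:
        candidates.append(stored_token)
    return max(candidates, default=None)
```

<div class="paragraph"><p>When the list advertises nothing newer than the stored token, the first function returns an empty list and the fetch proceeds directly against the origin server.</p></div>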
<div class="paragraph"><p>If the bundle provider does not provide a heuristic, then the client
should attempt to inspect the bundle headers before downloading the full
bundle data in case the bundle tips already exist in the client
repository.</p></div>
<h2 id="_error_conditions">Error Conditions</h2>
<div class="sectionbody">
<div class="paragraph"><p>If the Git client discovers something unexpected while downloading
information according to a bundle URI or the bundle list found at that
location, then Git can ignore that data and continue as if it were not
given a bundle URI. The remote Git server is the ultimate source of truth,
not the bundle URI.</p></div>
<div class="paragraph"><p>Here are a few example error conditions:</p></div>
<div class="ulist"><ul>
The client fails to connect with a server at the given URI or a connection
is lost without any chance to recover.
The client receives a 400-level response (such as <code>404 Not Found</code> or
<code>401 Unauthorized</code>). The client should use the credential helper to
find and provide a credential for the URI, but match the semantics of
Git’s other HTTP protocols in terms of handling specific 400-level
responses.
The server reports any other failure response.
The client receives data that is not parsable as a bundle or bundle list.
A bundle includes a filter that does not match expectations.
The client cannot unbundle the bundles because the prerequisite commit OIDs
are not in the object database and there are no more bundles to download.
<div class="paragraph"><p>There are also situations that could be seen as wasteful, but are not
error conditions:</p></div>
<div class="ulist"><ul>
The downloaded bundles contain more information than is requested by
the clone or fetch request. A primary example is if the user requests
a clone with <code>--single-branch</code> but downloads bundles that store every
reachable commit from all <code>refs/heads/*</code> references. This might be
initially wasteful, but perhaps these objects will become reachable by
a later ref update that the client cares about.
A bundle download during a <code>git fetch</code> contains objects already in the
object database. This is probably unavoidable if we are using bundles
for fetches, since the client will almost always be slightly ahead of
the bundle servers after performing its "catch-up" fetch to the remote
server. This extra work is most wasteful when the client is fetching
much more frequently than the server is computing bundles, such as if
the client is using hourly prefetches with background maintenance, but
the server is computing bundles weekly. For this reason, the client
should not use bundle URIs for fetch unless the server has explicitly
recommended it through a <code>bundle.heuristic</code> value.
<h2 id="_example_bundle_provider_organization">Example Bundle Provider organization</h2>
<div class="sectionbody">
<div class="paragraph"><p>The bundle URI feature is intentionally designed to be flexible enough to
accommodate the different ways a bundle provider may want to organize the
object data. However, it can be helpful to have a complete organization
model described here so providers can start from that base.</p></div>
<div class="paragraph"><p>This example organization is a simplified model of what is used by the
GVFS Cache Servers (see the section near the end of this document), which have
been beneficial in speeding up clones and fetches for very large
repositories, although they rely on extra software outside of Git.</p></div>
<div class="paragraph"><p>The bundle provider deploys servers across multiple geographies. Each
server manages its own bundle set. The server can track a number of Git
repositories, but provides a bundle list for each based on a pattern. For
example, when mirroring a repository at <code>https://&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;</code>,
the bundle server could have its bundle list available at
<code>https://&lt;server-url&gt;/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;</code>. The origin Git server can
list all of these servers under the "any" mode:</p></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle]
	version = 1
	mode = any</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "eastus"]
	uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "europe"]
	uri = https://europe.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "apac"]
	uri = https://apac.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;</code></pre>
</div></div>
<div class="paragraph"><p>This "list of lists" is static and only changes if a bundle server is
added or removed.</p></div>
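<div class="paragraph"><p>The URL pattern above can be applied mechanically to produce the static "list of lists". The following Python sketch is illustrative only: <code>bundle_list_url</code> and <code>list_of_lists</code> are hypothetical helpers, the server host names are taken from the example, and it assumes the bundle list path is simply the origin host and path appended to the server URL. The <code>version = 1</code> header value is also an assumption of this sketch.</p></div>
<div class="listingblock">
<div class="content">
<pre><code>from urllib.parse import urlsplit

# Hypothetical bundle servers, one per geography (host names from the example).
SERVERS = {
    "eastus": "https://eastus.example.com",
    "europe": "https://europe.example.com",
    "apac": "https://apac.example.com",
}


def bundle_list_url(server_url, origin_url):
    """Map https://DOMAIN/ORG/REPO to SERVER_URL/DOMAIN/ORG/REPO (assumed pattern)."""
    parts = urlsplit(origin_url)
    path = parts.path.strip("/")
    return server_url.rstrip("/") + "/" + parts.netloc + "/" + path


def list_of_lists(origin_url, servers=SERVERS):
    """Render the static "any" mode list that the origin server advertises."""
    # Header values are assumptions for this sketch.
    lines = ["[bundle]", "\tversion = 1", "\tmode = any"]
    for name, server_url in servers.items():
        lines.append("")
        lines.append('[bundle "%s"]' % name)
        lines.append("\turi = " + bundle_list_url(server_url, origin_url))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(list_of_lists("https://example.org/myorg/myrepo"))</code></pre>
</div></div>
<div class="paragraph"><p>With the origin URL <code>https://example.org/myorg/myrepo</code>, the <code>eastus</code> entry points at <code>https://eastus.example.com/example.org/myorg/myrepo</code>. Because the output depends only on the origin URL and the fixed set of servers, it changes only when a server is added or removed.</p></div>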
<div class="paragraph"><p>Each bundle server manages its own set of bundles. The initial bundle list
contains only a single bundle, containing all of the objects received from
cloning the repository from the origin server. The list uses the
<code>creationToken</code> heuristic, and a <code>creationToken</code> is made for the bundle
based on the server’s timestamp.</p></div>
<div class="paragraph"><p>The bundle server runs regularly scheduled updates for the bundle list,
such as once a day. During this task, the server fetches the latest
contents from the origin server and generates a bundle containing the
objects reachable from the latest origin refs, but not contained in a
previously computed bundle. This bundle is added to the list, taking care
that its <code>creationToken</code> is strictly greater than the previous maximum
<code>creationToken</code>.</p></div>
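<div class="paragraph"><p>Basing the token on a timestamp can violate the "strictly greater" rule if the server clock moves backwards or two updates land in the same second. A minimal Python sketch, assuming tokens are integer Unix timestamps as in the example list below; <code>next_creation_token</code> is a hypothetical helper, not part of Git.</p></div>
<div class="listingblock">
<div class="content">
<pre><code>import time


def next_creation_token(previous_max=None, now=None):
    """Pick a creationToken for a new bundle (hypothetical helper).

    Uses the server's Unix timestamp, but bumps it when needed so the
    result is strictly greater than the previous maximum token.
    """
    if now is None:
        now = int(time.time())
    if previous_max is None:
        # First bundle in the list: the timestamp alone is enough.
        return now
    return max(now, previous_max + 1)</code></pre>
</div></div>
<div class="paragraph"><p>For example, if the clock reads <code>1644442601</code> while the previous maximum is <code>1644770820</code>, the sketch returns <code>1644770821</code> instead of reusing an older value.</p></div>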
<div class="paragraph"><p>When the bundle list grows too large, say more than 30 bundles, the
oldest "<em>N</em> minus 30" bundles are combined into a single bundle. This
bundle’s <code>creationToken</code> is equal to the maximum <code>creationToken</code> among the
merged bundles.</p></div>
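<div class="paragraph"><p>The rollup rule can be sketched in Python by modeling a bundle as its <code>creationToken</code> plus a set of object IDs. The <code>Bundle</code> class and <code>roll_up</code> function are hypothetical, and writing the actual merged bundle file is out of scope. The sketch applies the rule exactly as stated, so a list of <em>N</em> bundles (with <em>N</em> over 30) ends up with 31 entries: the merged bundle plus the 30 newest.</p></div>
<div class="listingblock">
<div class="content">
<pre><code>from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    """Hypothetical stand-in for a bundle: its token and the object IDs it stores."""

    creation_token: int
    objects: frozenset


def roll_up(bundles, limit=30):
    """Merge the oldest N - limit bundles once the list holds N bundles, N over limit."""
    ordered = sorted(bundles, key=lambda b: b.creation_token)
    excess = max(0, len(ordered) - limit)
    if excess == 0:
        # The list is still small enough; nothing to merge.
        return ordered
    old, newer = ordered[:excess], ordered[excess:]
    merged = Bundle(
        # The merged bundle keeps the largest token of the bundles it replaces.
        creation_token=max(b.creation_token for b in old),
        # Absolute union of the old bundles' contents.
        objects=frozenset().union(*(b.objects for b in old)),
    )
    return [merged] + newer</code></pre>
</div></div>
<div class="paragraph"><p>The union of object sets corresponds to the simple merge; the more careful merge described next would keep fewer objects.</p></div>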
<div class="paragraph"><p>An example bundle list is provided here, although it only has two daily
bundles and not a full list of 30:</p></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle]
	version = 1
	mode = all
	heuristic = creationToken</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "2022-02-13-1644770820-daily"]
	uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/2022-02-13-1644770820-daily.bundle
	creationToken = 1644770820</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "2022-02-09-1644442601-daily"]
	uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/2022-02-09-1644442601-daily.bundle
	creationToken = 1644442601</code></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><code>[bundle "2022-02-02-1643842562"]
	uri = https://eastus.example.com/&lt;domain&gt;/&lt;org&gt;/&lt;repo&gt;/2022-02-02-1643842562.bundle
	creationToken = 1643842562</code></pre>
</div></div>
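<div class="paragraph"><p>Because each <code>creationToken</code> in this example is a Unix timestamp, the date in each bundle name can be derived from the token itself. The following Python sketch renders a list in this shape; the helper names, the use of UTC dates, and the <code>version = 1</code> and <code>mode = all</code> header values are assumptions for illustration.</p></div>
<div class="listingblock">
<div class="content">
<pre><code>from datetime import datetime, timezone


def bundle_id(creation_token, suffix="daily"):
    """Name a bundle after the UTC date of its creationToken (hypothetical scheme)."""
    day = datetime.fromtimestamp(creation_token, tz=timezone.utc).strftime("%Y-%m-%d")
    if suffix:
        return "%s-%d-%s" % (day, creation_token, suffix)
    return "%s-%d" % (day, creation_token)


def render_bundle_list(base_url, entries):
    """Render a creationToken-based bundle list, newest bundle first.

    `entries` is a list of (creation_token, suffix) pairs.
    """
    # Header values are assumptions chosen to match the example list.
    lines = ["[bundle]", "\tversion = 1", "\tmode = all", "\theuristic = creationToken"]
    for token, suffix in sorted(entries, reverse=True):
        name = bundle_id(token, suffix)
        lines += [
            "",
            '[bundle "%s"]' % name,
            "\turi = %s/%s.bundle" % (base_url.rstrip("/"), name),
            "\tcreationToken = %d" % token,
        ]
    return "\n".join(lines) + "\n"</code></pre>
</div></div>
<div class="paragraph"><p>For token <code>1644770820</code>, <code>bundle_id</code> returns <code>2022-02-13-1644770820-daily</code>, since that timestamp falls on 13 February 2022 (UTC).</p></div>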
<div class="paragraph"><p>To avoid storing and serving object data in perpetuity after it becomes
unreachable in the origin server, this bundle merge can be more careful.
Instead of taking an absolute union of the old bundles, the merged bundle
can be created by looking at the newer bundles and ensuring that their
necessary commits are all available in this merged bundle (or in another
one of the newer bundles). This allows "expiring" object data that is not
being used by new commits in this window of time. That data could be
reintroduced by a later push.</p></div>
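<div class="paragraph"><p>The more careful merge amounts to a reachability walk from the commits the newer bundles depend on. In this Python sketch, commits are plain strings, <code>parents</code> maps each commit to its parent commits, and trees and blobs are ignored for brevity; <code>required_history</code> and <code>careful_merge</code> are hypothetical helpers, not Git functions.</p></div>
<div class="listingblock">
<div class="content">
<pre><code>def required_history(parents, roots):
    """Return every commit reachable from `roots` by following parent links."""
    seen = set()
    stack = list(roots)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def careful_merge(old_commits, newer_prerequisites, newer_commits, parents):
    """Choose which commits from the old bundles survive the merge (hypothetical).

    Only history needed by the newer bundles' prerequisites is kept, minus
    commits the newer bundles already contain. Everything else in
    `old_commits` is "expired" and could be reintroduced by a later push.
    """
    needed = required_history(parents, newer_prerequisites)
    return (needed - set(newer_commits)).intersection(old_commits)</code></pre>
</div></div>
<div class="paragraph"><p>For example, if the old bundles hold commits <code>a</code>, <code>b</code>, <code>c</code> and <code>x</code>, but the newer bundles only build on <code>b</code> (whose parent is <code>a</code>), then <code>c</code> and <code>x</code> are left out of the merged bundle.</p></div>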
<div class="paragraph"><p>This data organization has two main goals. First, initial
clones of the repository become faster by downloading precomputed object
data from a closer source. Second, <code>git fetch</code> commands can be faster,
especially if the client has not fetched for a few days. However, if a
client does not fetch for 30 days, then the bundle list organization would
cause it to redownload a large amount of object data.</p></div>
<div class="paragraph"><p>One way to make this organization more useful to users who fetch frequently
is to have more frequent bundle creation. For example, bundles could be
created every hour, and then once a day those "hourly" bundles could be
merged into a "daily" bundle. The daily bundles are merged into the
oldest bundle after 30 days.</p></div>
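<div class="paragraph"><p>The daily merge of hourly bundles can be sketched by grouping hourly <code>creationToken</code> values by calendar day. In this Python sketch, <code>roll_up_hourly</code> is a hypothetical helper, tokens are assumed to be Unix timestamps, and "day" is assumed to mean a UTC calendar day. Each daily bundle takes the largest token of the hourly bundles it replaces, so tokens keep increasing across the list.</p></div>
<div class="listingblock">
<div class="content">
<pre><code>from collections import defaultdict
from datetime import datetime, timezone


def roll_up_hourly(hourly_tokens):
    """Group hourly bundle tokens by UTC day (hypothetical helper, not part of Git).

    Returns a dict mapping "YYYY-MM-DD" to a pair: the daily bundle's token
    (the largest hourly token of that day) and the sorted hourly tokens it replaces.
    """
    by_day = defaultdict(list)
    for token in hourly_tokens:
        # Assumes tokens are Unix timestamps and days are UTC calendar days.
        day = datetime.fromtimestamp(token, tz=timezone.utc).strftime("%Y-%m-%d")
        by_day[day].append(token)
    return {day: (max(tokens), sorted(tokens)) for day, tokens in by_day.items()}</code></pre>
</div></div>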
<div class="paragraph"><p>It is recommended that this bundle strategy be repeated with the <code>blob:none</code>
filter if clients of this repository are expecting to use blobless partial
clones. These blobless bundles stay in the same list as the full
bundles, but use the <code>bundle.&lt;id&gt;.filter</code> key to separate the two groups.
For very large repositories, the bundle provider may want to <em>only</em> provide
blobless bundles.</p></div>
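<div class="paragraph"><p>A client planning a blobless partial clone only needs the bundles whose filter matches its own. This Python sketch splits the key-value pairs of a single bundle list by their <code>bundle.&lt;id&gt;.filter</code> value; <code>group_by_filter</code> is a hypothetical helper, and treating bundles without a <code>filter</code> key as the full (unfiltered) group is an assumption.</p></div>
<div class="listingblock">
<div class="content">
<pre><code>from collections import defaultdict


def group_by_filter(pairs):
    """Split bundle-list key-value pairs into groups keyed by filter (hypothetical).

    `pairs` are (key, value) tuples such as ("bundle.daily.uri", "https://...").
    Bundles without a bundle.ID.filter key land in the "" (full) group.
    """
    bundles = defaultdict(dict)
    for key, value in pairs:
        if not key.startswith("bundle."):
            continue
        rest = key[len("bundle."):]
        # Split on the last dot so bundle IDs may themselves contain dots.
        bundle_id, dot, field = rest.rpartition(".")
        if not dot:
            continue  # a list-level key such as bundle.mode
        bundles[bundle_id][field] = value
    groups = defaultdict(dict)
    for bundle_id, fields in bundles.items():
        groups[fields.get("filter", "")][bundle_id] = fields
    return dict(groups)</code></pre>
</div></div>
<div class="paragraph"><p>A blobless client would then read only <code>groups["blob:none"]</code>, while a full clone would use the <code>""</code> group.</p></div>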
<h2 id="_implementation_plan">Implementation Plan</h2>
<div class="sectionbody">
<div class="paragraph"><p>This design document is being submitted on its own as an aspirational
document, with the goal of implementing all of the mentioned client
features over the course of several patch series. Here is a potential
outline for submitting these features:</p></div>
<div class="olist arabic"><ol class="arabic">
<li>
<p>
Integrate bundle URIs into <code>git clone</code> with a <code>--bundle-uri</code> option.
This will include a new <code>git fetch --bundle-uri</code> mode for use as the
implementation underneath <code>git clone</code>. The initial version here will
expect a single bundle at the given URI.
</p>
</li>
<li>
<p>
Implement the ability to parse a bundle list from a bundle URI and
update the <code>git fetch --bundle-uri</code> logic to properly distinguish
between <code>bundle.mode</code> options. Specifically design the feature so
that the config format parsing feeds a list of key-value pairs into the
bundle list logic.
</p>
</li>
<li>
<p>
Create the <code>bundle-uri</code> protocol v2 command so Git servers can advertise
bundle URIs using the key-value pairs. Plug into the existing key-value
input to the bundle list logic. Allow <code>git clone</code> to discover these
bundle URIs and bootstrap the client repository from the bundle data.
(This choice is an opt-in via a config option and a command-line
option.)
</p>
</li>
<li>
<p>
Allow the client to understand the <code>bundle.heuristic</code> configuration key
and the <code>bundle.&lt;id&gt;.creationToken</code> heuristic. When <code>git clone</code>
discovers a bundle URI with <code>bundle.heuristic</code>, it configures the client
repository to check that bundle URI during later <code>git fetch &lt;remote&gt;</code>
commands.
</p>
</li>
<li>
<p>
Allow clients to discover bundle URIs during <code>git fetch</code> and configure
a bundle URI for later fetches if <code>bundle.heuristic</code> is set.
</p>
</li>
<li>
<p>
Implement the "inspect headers" heuristic to reduce data downloads when
the <code>bundle.&lt;id&gt;.creationToken</code> heuristic is not available.
</p>
</li>
</ol></div>
<div class="paragraph"><p>As these features are reviewed, this plan might be updated. We also expect
that new designs will be discovered and implemented as this feature
matures and becomes used in real-world scenarios.</p></div>
<h2 id="_related_work_packfile_uris">Related Work: Packfile URIs</h2>
<div class="sectionbody">
<div class="paragraph"><p>The Git protocol already has a capability where the Git server can list
a set of URLs along with the packfile response when serving a client
request. The client is then expected to download the packfiles at those
locations in order to have a complete understanding of the response.</p></div>
<div class="paragraph"><p>This mechanism is used by the Gerrit server (implemented with JGit) and
has been effective at reducing CPU load and improving user performance for
clones.</p></div>
<div class="paragraph"><p>A major downside to this mechanism is that the origin server needs to know
<em>exactly</em> what is in those packfiles, and the packfiles need to be available
to the user for some time after the server has responded. This coupling
between the origin and the packfile data is difficult to manage.</p></div>
<div class="paragraph"><p>Further, this implementation is extremely hard to make work with fetches.</p></div>
<h2 id="_related_work_gvfs_cache_servers">Related Work: GVFS Cache Servers</h2>
<div class="sectionbody">
<div class="paragraph"><p>The GVFS Protocol [2] is a set of HTTP endpoints designed independently of
the Git project before Git’s partial clone was created. One feature of this
protocol is the idea of a "cache server" which can be colocated with build
machines or developer offices to transfer Git data without overloading the
central server.</p></div>
<div class="paragraph"><p>The endpoint that VFS for Git is famous for is the <code>GET /gvfs/objects/{oid}</code>
endpoint, which allows downloading an object on-demand. This is a critical
piece of the filesystem virtualization of that product.</p></div>
<div class="paragraph"><p>However, a more subtle need is the <code>GET /gvfs/prefetch?lastPackTimestamp=&lt;t&gt;</code>
endpoint. Given an optional timestamp, the cache server responds with a list
of precomputed packfiles containing the commits and trees that were introduced
in those time intervals.</p></div>
<div class="paragraph"><p>The cache server computes these "prefetch" packfiles using the following
schedule:</p></div>
<div class="olist arabic"><ol class="arabic">
<li>
<p>
Every hour, an "hourly" pack is generated with a given timestamp.
</p>
</li>
<li>
<p>
Nightly, the previous 24 hourly packs are rolled up into a "daily" pack.
</p>
</li>
<li>
<p>
Nightly, all prefetch packs more than 30 days old are rolled up into
one pack.
</p>
</li>
</ol></div>
<div class="paragraph"><p>When a user runs <code>gvfs clone</code> or <code>scalar clone</code> against a repo with cache
servers, the client requests all prefetch packfiles, which is at most
<code>24 + 30 + 1</code> packfiles containing only commits and trees. The client
then follows with a request to the origin server for the references, and
attempts to check out that tip reference. (There is an extra endpoint that
helps get all reachable trees from a given commit, in case that commit
was not already in a prefetch packfile.)</p></div>
<div class="paragraph"><p>During a <code>git fetch</code>, a hook requests the prefetch endpoint using the
most recent timestamp from a previously downloaded prefetch packfile.
Only the packfiles with later timestamps are downloaded. Most
users fetch hourly, so they get at most one hourly prefetch pack. Users
whose machines have been off or otherwise have not fetched in over 30 days
might redownload all prefetch packfiles. This is rare.</p></div>
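<div class="paragraph"><p>The client-side selection during <code>git fetch</code> is a search over sorted timestamps. This Python sketch is not the GVFS client's code: <code>packs_to_download</code> is a hypothetical helper, and representing each prefetch pack by a bare integer timestamp is an assumption. <code>bisect_right</code> finds the first pack whose timestamp is strictly later than the last one the client already has.</p></div>
<div class="listingblock">
<div class="content">
<pre><code>from bisect import bisect_right


def packs_to_download(pack_timestamps, last_pack_timestamp=None):
    """Select prefetch packs strictly newer than the last one downloaded (hypothetical).

    With no timestamp (as during a fresh clone), every pack is returned,
    mirroring the optional lastPackTimestamp parameter of the endpoint.
    """
    ordered = sorted(pack_timestamps)
    if last_pack_timestamp is None:
        return ordered
    return ordered[bisect_right(ordered, last_pack_timestamp):]</code></pre>
</div></div>
<div class="paragraph"><p>A client that fetches every hour typically passes the timestamp of the newest hourly pack it has, so the result is usually empty or a single pack.</p></div>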
<div class="paragraph"><p>It is important to note that the clients always contact the origin server
for the refs advertisement, so the refs are frequently "ahead" of the
prefetched pack data. The missing objects are downloaded on demand using
<code>GET /gvfs/objects/{oid}</code> requests when needed by a command such as
<code>git checkout</code> or <code>git log</code>. Some Git optimizations disable checks that
would cause these on-demand downloads to be too aggressive.</p></div>
<h2 id="_see_also">See Also</h2>
<div class="sectionbody">
<div class="paragraph"><p>[1] <a href="https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/">https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/</a>
An earlier RFC for a bundle URI feature.</p></div>
<div class="paragraph"><p>[2] <a href="https://github.com/microsoft/VFSForGit/blob/master/Protocol.md">https://github.com/microsoft/VFSForGit/blob/master/Protocol.md</a>
The GVFS Protocol</p></div>
<div id="footnotes"><hr /></div>
<div id="footer-text">
2023-02-22 15:29:29 PST
</div>