man5/gitformat-chunk.5

   1 '\" t
   2 .\"     Title: gitformat-chunk
   3 .\"    Author: [FIXME: author] [see http://www.docbook.org/tdg5/en/html/author]
   4 .\" Generator: DocBook XSL Stylesheets vsnapshot <http://docbook.sf.net/>
   5 .\"      Date: 10/27/2022
   6 .\"    Manual: Git Manual
   7 .\"    Source: Git 2.38.1.220.g9388e93f00
   8 .\"  Language: English
   9 .\"
  10 .TH "GITFORMAT\-CHUNK" "5" "10/27/2022" "Git 2\&.38\&.1\&.220\&.g9388e9" "Git Manual"
  11 .\" -----------------------------------------------------------------
  12 .\" * Define some portability stuff
  13 .\" -----------------------------------------------------------------
  14 .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  15 .\" http://bugs.debian.org/507673
  16 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
  17 .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  18 .ie \n(.g .ds Aq \(aq
  19 .el       .ds Aq '
  20 .\" -----------------------------------------------------------------
  21 .\" * set default formatting
  22 .\" -----------------------------------------------------------------
  23 .\" disable hyphenation
  24 .nh
  25 .\" disable justification (adjust text to left margin only)
  26 .ad l
  27 .\" -----------------------------------------------------------------
  28 .\" * MAIN CONTENT STARTS HERE *
  29 .\" -----------------------------------------------------------------
  30 .SH "NAME"
  31 gitformat-chunk \- Chunk\-based file formats
  32 .SH "SYNOPSIS"
  33 .sp
  34 Used by \fBgitformat-commit-graph\fR(5) and the "MIDX" format (see the pack format documentation in \fBgitformat-pack\fR(5))\&.
  35 .SH "DESCRIPTION"
  36 .sp
  37 Some file formats in Git use a common concept of "chunks" to describe sections of the file\&. This allows structured access to a large file by scanning a small "table of contents" for the remaining data\&. This common format is used by the \fBcommit\-graph\fR and \fBmulti\-pack\-index\fR files\&. See the \fBmulti\-pack\-index\fR format in \fBgitformat-pack\fR(5) and the \fBcommit\-graph\fR format in \fBgitformat-commit-graph\fR(5) for how they use the chunks to describe structured data\&.
  38 .sp
   39 A chunk\-based file format begins with some header information custom to that format\&. That header should include enough information to identify the file type, format version, and number of chunks in the file\&. From this information, a reader can determine the start of the chunk\-based region\&.
  40 .sp
  41 The chunk\-based region starts with a table of contents describing where each chunk starts and ends\&. This consists of (C+1) rows of 12 bytes each, where C is the number of chunks\&. Consider the following table:
  42 .sp
  43 .if n \{\
  44 .RS 4
  45 .\}
  46 .nf
  47 | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
  48 |\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-|\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-|
  49 | ID[0]              | OFFSET[0]              |
  50 | \&.\&.\&.                | \&.\&.\&.                    |
  51 | ID[C]              | OFFSET[C]              |
  52 | 0x0000             | OFFSET[C+1]            |
  53 .fi
  54 .if n \{\
  55 .RE
  56 .\}
  57 .sp
  58 Each row consists of a 4\-byte chunk identifier (ID) and an 8\-byte offset\&. Each integer is stored in network\-byte order\&.
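.sp
As a minimal sketch of the row layout described above, the following C fragment decodes one 12\-byte row\&. The names \fBload_be32()\fR, \fBload_be64()\fR, \fBstruct toc_row\fR, and \fBdecode_toc_row()\fR are hypothetical helpers introduced for illustration only; they are not declared in \fBchunk\-format\&.h\fR\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of one table-of-contents row: 4-byte ID plus 8-byte offset. */
#define TOC_ROW_SIZE 12

/* Hypothetical helper: read a network-order (big-endian) 32-bit integer. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: read a network-order (big-endian) 64-bit integer. */
static uint64_t load_be64(const unsigned char *p)
{
    return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}

/* One decoded row of the table of contents. */
struct toc_row {
    uint32_t id;
    uint64_t offset;
};

/* Decode row number i of a table of contents that starts at toc. */
static struct toc_row decode_toc_row(const unsigned char *toc, size_t i)
{
    const unsigned char *row;
    struct toc_row r;

    assert(toc != NULL);
    row = toc + i * TOC_ROW_SIZE;
    r.id = load_be32(row);          /* bytes 0..3 of the row */
    r.offset = load_be64(row + 4);  /* bytes 4..11 of the row */
    return r;
}
```
.fi
.if n \{\
.RE
.\}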
  59 .sp
   60 The chunk identifier \fBID[i]\fR is a label for the data stored within this file from \fBOFFSET[i]\fR (inclusive) to \fBOFFSET[i+1]\fR (exclusive)\&. Thus, the size of the \fBi\fRth chunk is equal to the difference between \fBOFFSET[i+1]\fR and \fBOFFSET[i]\fR\&. This requires that the chunk data appears contiguously in the same order as the table of contents\&.
  61 .sp
  62 The final entry in the table of contents must be four zero bytes\&. This confirms that the table of contents is ending and provides the offset for the end of the chunk\-based data\&.
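.sp
The size rule and the terminating row can be sketched together as a lookup over the table of contents\&. This is an illustration under stated assumptions, not the code Git uses: it assumes the table holds \fBnr_chunks\fR chunk rows followed by one terminating row, and \fBfind_chunk()\fR, \fBload_be32()\fR, and \fBload_be64()\fR are hypothetical names\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOC_ROW_SIZE 12

/* Hypothetical helper: read a network-order 32-bit integer. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: read a network-order 64-bit integer. */
static uint64_t load_be64(const unsigned char *p)
{
    return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}

/*
 * Find the chunk labeled id. On success, store its start offset and
 * size and return 0. Return -1 if the terminating row is missing, if
 * an offset decreases before the chunk is found, or if no row has
 * the requested ID.
 */
static int find_chunk(const unsigned char *toc, size_t nr_chunks,
                      uint32_t id, uint64_t *start, uint64_t *size)
{
    size_t i;

    assert(toc && start && size);

    /* The row after the last chunk must carry four zero bytes as ID. */
    if (load_be32(toc + nr_chunks * TOC_ROW_SIZE) != 0)
        return -1;

    for (i = 0; i < nr_chunks; i++) {
        const unsigned char *row = toc + i * TOC_ROW_SIZE;
        uint64_t this_off = load_be64(row + 4);
        uint64_t next_off = load_be64(row + TOC_ROW_SIZE + 4);

        /* Chunks are contiguous and ordered, so offsets never decrease. */
        if (next_off < this_off)
            return -1;
        if (load_be32(row) == id) {
            *start = this_off;
            *size = next_off - this_off;  /* OFFSET[i+1] - OFFSET[i] */
            return 0;
        }
    }
    return -1;
}
```
.fi
.if n \{\
.RE
.\}
.sp
Because the last chunk has no following chunk row, its size comes from the offset stored in the terminating row\&.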
  63 .sp
  64 Note: The chunk\-based format expects that the file contains \fIat least\fR a trailing hash after \fBOFFSET[C+1]\fR\&.
  65 .sp
   66 Functions for working with chunk\-based file formats are declared in \fBchunk\-format\&.h\fR\&. Using these functions provides extra checks that assist developers when creating new file formats\&.
  67 .SH "WRITING CHUNK\-BASED FILE FORMATS"
  68 .sp
   69 To write a chunk\-based file format, create a \fBstruct chunkfile\fR by calling \fBinit_chunkfile()\fR and passing a \fBstruct hashfile\fR pointer\&. The caller is responsible for opening the \fBhashfile\fR and writing header information so the file format is identifiable before the chunk\-based format begins\&.
  70 .sp
   71 Then, call \fBadd_chunk()\fR for each chunk that should be written\&. This populates the \fBchunkfile\fR with information about the order and size of each chunk to write\&. Provide a \fBchunk_write_fn\fR function pointer to perform the write of the chunk data upon request\&.
  72 .sp
  73 Call \fBwrite_chunkfile()\fR to write the table of contents to the \fBhashfile\fR followed by each of the chunks\&. This will verify that each chunk wrote the expected amount of data so the table of contents is correct\&.
  74 .sp
  75 Finally, call \fBfree_chunkfile()\fR to clear the \fBstruct chunkfile\fR data\&. The caller is responsible for finalizing the \fBhashfile\fR by writing the trailing hash and closing the file\&.
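.sp
The register\-then\-write flow above can be modeled in a few lines of standalone C\&. This is a toy model under stated assumptions and not Git\(aqs implementation: every \fBtoy_\fR name is hypothetical, a fixed in\-memory buffer stands in for the \fBhashfile\fR, and a size mismatch is reported with a return value of \-1, which is a choice made only for this sketch\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOC_ROW_SIZE 12
#define MAX_CHUNKS 8

/* In-memory output buffer standing in for the hashfile. */
struct toy_out {
    unsigned char buf[256];
    size_t len;
};

/* Callback that writes the data of one chunk (hypothetical type). */
typedef int (*toy_write_fn)(struct toy_out *out, void *data);

struct toy_chunkfile {
    struct { uint32_t id; size_t size; toy_write_fn fn; } chunks[MAX_CHUNKS];
    size_t nr;
};

/* Append n raw bytes to the output. */
static void toy_put(struct toy_out *out, const void *src, size_t n)
{
    assert(out->len + n <= sizeof(out->buf));
    memcpy(out->buf + out->len, src, n);
    out->len += n;
}

/* Append the low `bytes` bytes of v in network byte order. */
static void toy_put_be(struct toy_out *out, uint64_t v, int bytes)
{
    while (bytes-- > 0) {
        unsigned char b = (unsigned char)(v >> (8 * bytes));
        toy_put(out, &b, 1);
    }
}

/* Register a chunk and its expected size; nothing is written yet. */
static void toy_add_chunk(struct toy_chunkfile *cf, uint32_t id,
                          size_t size, toy_write_fn fn)
{
    assert(cf->nr < MAX_CHUNKS);
    cf->chunks[cf->nr].id = id;
    cf->chunks[cf->nr].size = size;
    cf->chunks[cf->nr].fn = fn;
    cf->nr++;
}

/*
 * Write the table of contents, then each chunk in registration order.
 * Return -1 if a callback fails or writes a different number of bytes
 * than was registered for its chunk.
 */
static int toy_write_chunkfile(struct toy_chunkfile *cf,
                               struct toy_out *out, void *data)
{
    /* The first chunk starts right after the (nr + 1)-row table. */
    uint64_t offset = out->len + (cf->nr + 1) * TOC_ROW_SIZE;
    size_t i;

    for (i = 0; i < cf->nr; i++) {
        toy_put_be(out, cf->chunks[i].id, 4);
        toy_put_be(out, offset, 8);
        offset += cf->chunks[i].size;
    }
    /* Terminating row: zero ID plus the end of the chunk data. */
    toy_put_be(out, 0, 4);
    toy_put_be(out, offset, 8);

    for (i = 0; i < cf->nr; i++) {
        size_t before = out->len;

        if (cf->chunks[i].fn(out, data) < 0)
            return -1;
        if (out->len - before != cf->chunks[i].size)
            return -1;
    }
    return 0;
}

/* Example callback: write the C string that data points to. */
static int toy_write_text(struct toy_out *out, void *data)
{
    const char *s = data;

    toy_put(out, s, strlen(s));
    return 0;
}
```
.fi
.if n \{\
.RE
.\}
.sp
Recording the sizes before any data is written is what lets the table of contents precede the chunks in a single pass over the output\&.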
  76 .SH "READING CHUNK\-BASED FILE FORMATS"
  77 .sp
  78 To read a chunk\-based file format, the file must be opened as a memory\-mapped region\&. The chunk\-format API expects that the entire file is mapped as a contiguous memory region\&.
  79 .sp
  80 Initialize a \fBstruct chunkfile\fR pointer with \fBinit_chunkfile(NULL)\fR\&.
  81 .sp
  82 After reading the header information from the beginning of the file, including the chunk count, call \fBread_table_of_contents()\fR to populate the \fBstruct chunkfile\fR with the list of chunks, their offsets, and their sizes\&.
  83 .sp
   84 Extract the data for each chunk using \fBpair_chunk()\fR or \fBread_chunk()\fR:
  85 .sp
  86 .RS 4
  87 .ie n \{\
  88 \h'-04'\(bu\h'+03'\c
  89 .\}
  90 .el \{\
  91 .sp -1
  92 .IP \(bu 2.3
  93 .\}
  94 \fBpair_chunk()\fR
   95 sets a given pointer to the location inside the memory\-mapped file corresponding to that chunk\(cqs offset\&. If the chunk does not exist, then the pointer is not modified\&.
  96 .RE
  97 .sp
  98 .RS 4
  99 .ie n \{\
 100 \h'-04'\(bu\h'+03'\c
 101 .\}
 102 .el \{\
 103 .sp -1
 104 .IP \(bu 2.3
 105 .\}
 106 \fBread_chunk()\fR
 107 takes a
 108 \fBchunk_read_fn\fR
 109 function pointer and calls it with the appropriate initial pointer and size information\&. The function is not called if the chunk does not exist\&. Use this method to read chunks if you need to perform immediate parsing or if you need to execute logic based on the size of the chunk\&.
 110 .RE
 111 .sp
  112 After calling these functions, call \fBfree_chunkfile()\fR to clear the \fBstruct chunkfile\fR data\&. This will not close the memory\-mapped region\&. Callers are expected to own that data for as long as the pointers into the region are needed\&.
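.sp
The difference between the two lookup styles can be sketched with a toy reader over an in\-memory buffer\&. This is a model of the behavior described above, not the Git API: every \fBtoy_\fR name, the \fBload_be32()\fR and \fBload_be64()\fR helpers, and the return value of \-1 for a missing chunk are assumptions made for this sketch\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOC_ROW_SIZE 12

/* Hypothetical helper: read a network-order 32-bit integer. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: read a network-order 64-bit integer. */
static uint64_t load_be64(const unsigned char *p)
{
    return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}

/* View of a mapped file; the caller keeps ownership of the mapping. */
struct toy_reader {
    const unsigned char *file;  /* start of the mapped file */
    const unsigned char *toc;   /* first table-of-contents row */
    size_t nr_chunks;           /* chunk rows, excluding the terminator */
};

/* Callback that receives the start and size of one chunk. */
typedef int (*toy_read_fn)(const unsigned char *start, size_t size,
                           void *data);

/* Return the row labeled id, or NULL when the chunk is absent. */
static const unsigned char *toy_find_row(const struct toy_reader *r,
                                         uint32_t id)
{
    size_t i;

    for (i = 0; i < r->nr_chunks; i++) {
        const unsigned char *row = r->toc + i * TOC_ROW_SIZE;

        if (load_be32(row) == id)
            return row;
    }
    return NULL;
}

/* Point *p at the chunk data; leave *p untouched if it is absent. */
static int toy_pair_chunk(const struct toy_reader *r, uint32_t id,
                          const unsigned char **p)
{
    const unsigned char *row = toy_find_row(r, id);

    if (!row)
        return -1;
    *p = r->file + load_be64(row + 4);
    return 0;
}

/* Call fn with the chunk start and size; do not call it if absent. */
static int toy_read_chunk(const struct toy_reader *r, uint32_t id,
                          toy_read_fn fn, void *data)
{
    const unsigned char *row = toy_find_row(r, id);
    uint64_t start, end;

    if (!row)
        return -1;
    start = load_be64(row + 4);
    end = load_be64(row + TOC_ROW_SIZE + 4);
    assert(end >= start);
    return fn(r->file + start, (size_t)(end - start), data);
}

/* Example callback: store the chunk size in the size_t behind data. */
static int toy_record_size(const unsigned char *start, size_t size,
                           void *data)
{
    (void)start;
    *(size_t *)data = size;
    return 0;
}
```
.fi
.if n \{\
.RE
.\}
.sp
The pointer\-pairing style suits chunks whose size is implied by other data, while the callback style hands the size to the parser so it can validate the chunk immediately\&.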
 113 .SH "EXAMPLES"
 114 .sp
 115 These file formats use the chunk\-format API, and can be used as examples for future formats:
 116 .sp
 117 .RS 4
 118 .ie n \{\
 119 \h'-04'\(bu\h'+03'\c
 120 .\}
 121 .el \{\
 122 .sp -1
 123 .IP \(bu 2.3
 124 .\}
 125 \fBcommit\-graph:\fR
 126 see
 127 \fBwrite_commit_graph_file()\fR
 128 and
 129 \fBparse_commit_graph()\fR
 130 in
 131 \fBcommit\-graph\&.c\fR
  132 for how the chunk\-format API is used to write and parse the commit\-graph file format documented in
 133 \fBgitformat-commit-graph\fR(5)\&.
 134 .RE
 135 .sp
 136 .RS 4
 137 .ie n \{\
 138 \h'-04'\(bu\h'+03'\c
 139 .\}
 140 .el \{\
 141 .sp -1
 142 .IP \(bu 2.3
 143 .\}
 144 \fBmulti\-pack\-index:\fR
 145 see
 146 \fBwrite_midx_internal()\fR
 147 and
 148 \fBload_multi_pack_index()\fR
 149 in
 150 \fBmidx\&.c\fR
  151 for how the chunk\-format API is used to write and parse the multi\-pack\-index file format documented in the multi\-pack\-index section of
 152 \fBgitformat-pack\fR(5)\&.
 153 .RE
 154 .SH "GIT"
 155 .sp
 156 Part of the \fBgit\fR(1) suite