.. SPDX-License-Identifier: GPL-2.0

In an ext4 filesystem, a directory is more or less a flat file that maps
an arbitrary byte string (usually ASCII) to an inode number on the
filesystem. There can be many directory entries across the filesystem
that reference the same inode number--these are known as hard links, and
that is why hard links cannot reference files on other filesystems.
Directory entries are found by reading the data block(s) associated with
a directory file and searching them for the desired directory entry.

15 Linear (Classic) Directories
16 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By default, each directory lists its entries in an “almost-linear”
array. I write “almost” because it's not a linear array in the memory
sense: directory entries are not split across filesystem blocks.
Therefore, it is more accurate to say that a directory is a series of
data blocks and that each block contains a linear array of directory
entries. The end of each per-block array is signified by reaching the
end of the block; the last entry in the block has a record length that
takes it all the way to the end of the block. The end of the entire
directory is of course signified by reaching the end of the file. Unused
directory entries are signified by inode = 0. By default the filesystem
uses ``struct ext4_dir_entry_2`` for directory entries unless the
“filetype” feature flag is not set, in which case it uses
``struct ext4_dir_entry``.

The original directory entry format is ``struct ext4_dir_entry``, which
is at most 263 bytes long, though on disk you'll need to reference
``dirent.rec_len`` to know for sure.

- inode: Number of the inode that this directory entry points to.
- rec\_len: Length of this directory entry. Must be a multiple of 4.
- name\_len: Length of the file name.
- name[EXT4\_NAME\_LEN]

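The 263-byte figure and the multiple-of-4 rule can be checked with a
little arithmetic. The following is a minimal Python sketch, assuming an
8-byte fixed header (4-byte inode, 2-byte rec\_len, 2-byte name\_len) in
front of the name; ``min_rec_len`` is a made-up helper name, not a
kernel function.

```python
EXT4_NAME_LEN = 255   # longest file name, in bytes
DIRENT_HEADER = 8     # assumed: inode (4) + rec_len (2) + name_len (2)


def min_rec_len(name_len: int) -> int:
    """Smallest valid record length for a name of name_len bytes."""
    if not 0 <= name_len <= EXT4_NAME_LEN:
        raise ValueError("name_len out of range")
    # Round header + name up to the next multiple of 4.
    return (DIRENT_HEADER + name_len + 3) & ~3


print(DIRENT_HEADER + EXT4_NAME_LEN)  # unpadded maximum
print(min_rec_len(1))                 # a one-byte name such as '.'
print(min_rec_len(EXT4_NAME_LEN))     # longest name, padded
```

The on-disk ``rec_len`` can be larger than this minimum, because the
last entry in a block is stretched to reach the end of the block.
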
Since file names cannot be longer than 255 bytes, the new directory
entry format shortens the name\_len field and uses the space for a file
type flag, probably to avoid having to load every inode during directory
tree traversal. This format is ``ext4_dir_entry_2``, which is at most
263 bytes long, though on disk you'll need to reference
``dirent.rec_len`` to know for sure.

- inode: Number of the inode that this directory entry points to.
- rec\_len: Length of this directory entry.
- name\_len: Length of the file name.
- file\_type: File type code, see ftype_ table below.
- name[EXT4\_NAME\_LEN]

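Walking one directory block with this format can be sketched as follows.
This is an illustration rather than the kernel's code: it assumes
little-endian fields laid out as a 4-byte inode, 2-byte rec\_len, 1-byte
name\_len and 1-byte file type, and a block size below 64 KiB so that
rec\_len can be read literally. ``iter_dirents`` and ``make_dirent`` are
hypothetical helper names.

```python
import struct

DIRENT_HEADER = struct.Struct("<IHBB")  # inode, rec_len, name_len, file_type


def iter_dirents(block: bytes):
    """Yield (inode, file_type, name) for every in-use entry of one block."""
    offset = 0
    while offset < len(block):
        if offset + DIRENT_HEADER.size > len(block):
            raise ValueError(f"truncated entry at offset {offset}")
        inode, rec_len, name_len, file_type = DIRENT_HEADER.unpack_from(
            block, offset)
        # rec_len must cover header + name, be 4-byte aligned, and fit.
        if (rec_len < DIRENT_HEADER.size + name_len or rec_len % 4
                or offset + rec_len > len(block)):
            raise ValueError(f"bad rec_len {rec_len} at offset {offset}")
        if inode != 0:  # inode 0 marks an unused entry
            start = offset + DIRENT_HEADER.size
            yield inode, file_type, block[start:start + name_len]
        offset += rec_len  # the last rec_len lands exactly on the block end


def make_dirent(inode: int, rec_len: int, file_type: int, name: bytes) -> bytes:
    """Illustration helper: build one zero-padded on-disk entry."""
    header = DIRENT_HEADER.pack(inode, rec_len, len(name), file_type)
    return (header + name).ljust(rec_len, b"\0")


block = (make_dirent(2, 12, 2, b".")
         + make_dirent(2, 12, 2, b"..")
         + make_dirent(0, 16, 0, b"")             # unused slot, skipped
         + make_dirent(12, 1024 - 40, 1, b"hello.txt"))
for inode, file_type, name in iter_dirents(block):
    print(inode, file_type, name)
```

The loop never inspects an end-of-array marker: it stops only because
the record lengths add up to the block size.
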
.. _ftype:

The directory file type is one of the following values:

- Character device file.

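For reference when decoding entries, the file type codes can be kept in
a small lookup table. The numeric values below are the ones commonly
given for the ext4 on-disk format and should be treated as an
assumption here, since this text itself only states 0x2 (directory) and
the 0xDE marker used by the checksum tail. ``file_type_name`` is a
hypothetical helper.

```python
EXT4_FILE_TYPES = {
    0x0: "unknown",
    0x1: "regular file",
    0x2: "directory",
    0x3: "character device file",
    0x4: "block device file",
    0x5: "FIFO",
    0x6: "socket",
    0x7: "symbolic link",
}


def file_type_name(code: int) -> str:
    """Map a file type code to a readable name, tolerating unknown codes."""
    return EXT4_FILE_TYPES.get(code, f"unrecognized (0x{code:X})")


print(file_type_name(0x2))
print(file_type_name(0xDE))
```
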
In order to add checksums to these classic directory blocks, a phony
``struct ext4_dir_entry`` is placed at the end of each leaf block to
hold the checksum. The directory entry is 12 bytes long. The inode
number and name\_len fields are set to zero to fool old software into
ignoring an apparently empty directory entry, and the checksum is stored
in the place where the name normally goes. The structure is
``struct ext4_dir_entry_tail``:

- det\_reserved\_zero1: Inode number, which must be zero.
- Length of this directory entry, which must be 12.
- det\_reserved\_zero2: Length of the file name, which must be zero.
- File type, which must be 0xDE.
- Directory leaf block checksum.

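Recognizing the tail is a matter of checking those fixed values in the
last 12 bytes of a block. A minimal sketch, assuming the same
little-endian layout as an ordinary entry (4-byte inode, 2-byte record
length, 1-byte name length, 1-byte file type) followed by a 4-byte
checksum; ``read_dir_tail_checksum`` is a hypothetical name, and the
function only extracts the stored value without verifying it.

```python
import struct

TAIL_SIZE = 12
TAIL_FILE_TYPE = 0xDE
TAIL = struct.Struct("<IHBBI")  # inode, rec_len, name_len, file_type, checksum


def read_dir_tail_checksum(block: bytes):
    """Return the stored checksum if the block ends with a tail, else None."""
    if len(block) < TAIL_SIZE:
        return None
    inode, rec_len, name_len, file_type, checksum = TAIL.unpack_from(
        block, len(block) - TAIL_SIZE)
    if (inode, rec_len, name_len, file_type) != (0, TAIL_SIZE, 0, TAIL_FILE_TYPE):
        return None
    return checksum


with_tail = bytes(1012) + TAIL.pack(0, 12, 0, 0xDE, 0xCAFEBABE)
print(hex(read_dir_tail_checksum(with_tail)))
print(read_dir_tail_checksum(bytes(1024)))
```
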
The leaf directory block checksum is calculated against the FS UUID, the
directory's inode number, the directory's inode generation number, and
the entire directory entry block up to (but not including) the fake
directory entry.

166 Hash Tree Directories
167 ~~~~~~~~~~~~~~~~~~~~~
A linear array of directory entries isn't great for performance, so a
new feature was added to ext3 to provide a faster (but peculiar)
balanced tree keyed off a hash of the directory entry name. If the
EXT4\_INDEX\_FL (0x1000) flag is set in the inode, this directory uses a
hashed btree (htree) to organize and find directory entries. For
backwards read-only compatibility with ext2, this tree is actually
hidden inside the directory file, masquerading as “empty” directory data
blocks! It was stated previously that unused directory entries are
signified by inode = 0; this is (ab)used, together with a record length
that spans the hidden data, to fool the old linear-scan algorithm into
thinking that the rest of the directory block is empty so that it moves
on.

The root of the tree always lives in the first data block of the
directory. By ext2 custom, the '.' and '..' entries must appear at the
beginning of this first block, so they are put here as two
``struct ext4_dir_entry_2``\ s and not stored in the tree. The rest of
the root node contains metadata about the tree and finally a hash->block
map to find nodes that are lower in the htree. If
``dx_root.info.indirect_levels`` is non-zero then the htree has two
levels; the data block pointed to by the root node's map is an interior
node, which is indexed by a minor hash. Interior nodes in this tree
contain a zeroed out ``struct ext4_dir_entry_2`` followed by a
minor\_hash->block map to find leaf nodes. Leaf nodes contain a linear
array of all ``struct ext4_dir_entry_2``; all of these entries
(presumably) hash to the same value. If there is an overflow, the
entries simply overflow into the next leaf node, and the
least-significant bit of the hash (in the interior node map) that gets
us to this next leaf node is set.

To traverse the directory as an htree, the code calculates the hash of
the desired file name and uses it to find the corresponding block
number. If the tree is flat, the block is a linear array of directory
entries that can be searched; otherwise, the minor hash of the file name
is computed and used against this second block to find the corresponding
third block number. That third block number will be a linear array of
directory entries.

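The lookup within a single hash->block map can be sketched as a binary
search. This sketch assumes (it is not stated above) that each map is
sorted by ascending hash and that an entry covers every hash from its
own value up to, but not including, the next entry's value; the first
slot is treated as covering hash 0. ``find_block`` is a hypothetical
name, and the hash function itself is out of scope here.

```python
from bisect import bisect_right


def find_block(entries, target_hash):
    """Return the block for target_hash.

    entries is a list of (hash, block) pairs sorted by hash, whose first
    pair stands for the slot that covers hash 0.
    """
    hashes = [h for h, _ in entries]
    # Pick the last entry whose hash is <= target_hash.
    index = bisect_right(hashes, target_hash) - 1
    if index < 0:
        raise ValueError("map must start with an entry covering hash 0")
    return entries[index][1]


root_map = [(0x00000000, 1), (0x40000000, 7), (0x90000000, 4)]
print(find_block(root_map, 0x3FFFFFFF))
print(find_block(root_map, 0x40000000))
print(find_block(root_map, 0xFFFFFFFE))
```

In a deeper tree the same search would be repeated on the map of the
block it returns, until a leaf block of ordinary directory entries is
reached.
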
To traverse the directory as a linear array (such as the old code does),
the code simply reads every data block in the directory. The blocks used
for the htree will appear to have no entries (aside from '.' and '..')
and so only the leaf nodes will appear to have any interesting content.

The root of the htree is in ``struct dx_root``, which is the full length
of a data block:

- inode number of this directory.
- Length of this record, 12.
- Length of the name, 1.
- File type of this entry, 0x2 (directory) (if the feature flag is set).
- inode number of parent directory.
- block\_size - 12. The record length is long enough to cover all htree
  data.
- Length of the name, 2.
- File type of this entry, 0x2 (directory) (if the feature flag is set).
- struct dx\_root\_info.reserved\_zero
- struct dx\_root\_info.hash\_version: Hash type, see dirhash_ table
  below.
- struct dx\_root\_info.info\_length: Length of the tree information,
  0x8.
- struct dx\_root\_info.indirect\_levels: Depth of the htree. Cannot be
  larger than 3 if the INCOMPAT\_LARGEDIR feature is set; cannot be
  larger than 2 otherwise.
- struct dx\_root\_info.unused\_flags
- Maximum number of dx\_entries that can follow this header, plus 1 for
  the header itself.
- Actual number of dx\_entries that follow this header, plus 1 for the
  header itself.
- The block number (within the directory file) that goes with hash=0.
- As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.

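Decoding the root block can be sketched as below. The byte offsets are
assumptions derived from the sizes given in this section: '.' takes 12
bytes, the '..' header plus its padded name takes another 12, the 8-byte
tree information follows at byte 24, and the limit/count/block slot sits
at byte 32 with the ``dx_entry`` array after it. All fields are taken to
be little-endian and the block size is assumed to be below 64 KiB.
``parse_dx_root`` is a hypothetical name.

```python
import struct


def parse_dx_root(block: bytes) -> dict:
    """Decode the htree root stored in the first block of a directory."""
    # '.' occupies bytes 0..11 and '..' starts at byte 12.
    _, dot_rec_len, dot_name_len, _ = struct.unpack_from("<IHBB", block, 0)
    _, dd_rec_len, dd_name_len, _ = struct.unpack_from("<IHBB", block, 12)
    if (dot_rec_len, dot_name_len, dd_name_len) != (12, 1, 2):
        raise ValueError("block does not start with '.' and '..'")
    if dd_rec_len != len(block) - 12:
        raise ValueError("'..' does not cover the htree data")
    # Tree information follows the padded '..' name, at byte 24.
    _, hash_version, info_length, indirect_levels, _ = struct.unpack_from(
        "<IBBBB", block, 24)
    if info_length != 8:
        raise ValueError("unexpected tree information length")
    # limit, count, and the block for hash=0 share one 8-byte slot.
    limit, count, hash0_block = struct.unpack_from("<HHI", block, 32)
    if not 1 <= count <= limit:
        raise ValueError("count out of range")
    # count includes the header slot, so count - 1 (hash, block) pairs follow.
    entries = [struct.unpack_from("<II", block, 40 + 8 * i)
               for i in range(count - 1)]
    return {"hash_version": hash_version, "indirect_levels": indirect_levels,
            "limit": limit, "count": count,
            "hash0_block": hash0_block, "entries": entries}


sample = (struct.pack("<IHBB4s", 2, 12, 1, 2, b".")
          + struct.pack("<IHBB4s", 2, 1012, 2, 2, b"..")
          + struct.pack("<IBBBB", 0, 1, 8, 0, 0)
          + struct.pack("<HHI", 124, 2, 1)
          + struct.pack("<II", 0x80000000, 2)).ljust(1024, b"\0")
print(parse_dx_root(sample))
```
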
.. _dirhash:

The directory hash is one of the following values:

- Half MD4, unsigned.

Interior nodes of an htree are recorded as ``struct dx_node``, which is
also the full length of a data block:

- Zero, to make it look like this entry is not in use.
- The size of the block, in order to hide all of the dx\_node data.
- Zero. There is no name for this “unused” directory entry.
- Zero. There is no file type for this “unused” directory entry.
- Maximum number of dx\_entries that can follow this header, plus 1 for
  the header itself.
- Actual number of dx\_entries that follow this header, plus 1 for the
  header itself.
- The block number (within the directory file) that goes with the lowest
  hash value of this block. This value is stored in the parent block.
- As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.

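An interior node can be decoded in the same spirit. The sketch below
assumes an 8-byte fake directory entry header (4-byte inode, 2-byte
record length, 1-byte name length, 1-byte file type), then 2-byte limit
and count fields, a 4-byte block number, and the ``dx_entry`` array, all
little-endian, with a block size below 64 KiB. ``parse_dx_node`` is a
hypothetical name.

```python
import struct


def parse_dx_node(block: bytes) -> dict:
    """Decode an htree interior node that poses as one unused dirent."""
    inode, rec_len, name_len, file_type = struct.unpack_from("<IHBB", block, 0)
    # A linear scan sees a single unused entry spanning the whole block.
    if (inode, rec_len, name_len, file_type) != (0, len(block), 0, 0):
        raise ValueError("not a dx_node: fake dirent does not hide the block")
    limit, count, first_block = struct.unpack_from("<HHI", block, 8)
    if not 1 <= count <= limit:
        raise ValueError("count out of range")
    # count includes the header slot, so count - 1 (hash, block) pairs follow.
    entries = [struct.unpack_from("<II", block, 16 + 8 * i)
               for i in range(count - 1)]
    return {"limit": limit, "count": count,
            "first_block": first_block, "entries": entries}


node = (struct.pack("<IHBB", 0, 1024, 0, 0)
        + struct.pack("<HHI", 127, 3, 10)
        + struct.pack("<II", 0x100, 11)
        + struct.pack("<II", 0x200, 12)).ljust(1024, b"\0")
print(parse_dx_node(node))
```
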
The hash maps that exist in both ``struct dx_root`` and
``struct dx_node`` are recorded as ``struct dx_entry``, which is 8 bytes
long:

- Block number (within the directory file, not filesystem blocks) of the
  next node in the htree.

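A single map entry can be decoded as below. The sketch assumes the
8 bytes are a little-endian 4-byte hash followed by a 4-byte block
number, and it exposes the least-significant hash bit as the overflow
marker described earlier for entries that spill into the next leaf
node. ``decode_dx_entry`` is a hypothetical name.

```python
import struct


def decode_dx_entry(raw: bytes) -> dict:
    """Split one 8-byte map entry into hash, overflow flag, and block."""
    if len(raw) != 8:
        raise ValueError("a dx_entry is exactly 8 bytes")
    hash_value, block = struct.unpack("<II", raw)
    return {
        "hash": hash_value,
        # Low bit set: entries with the previous hash continue in this block.
        "continued": bool(hash_value & 1),
        "block": block,  # index into the directory file, not the filesystem
    }


print(decode_dx_entry(struct.pack("<II", 0x6A3C0001, 9)))
```
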
(If you think this is all quite clever and peculiar, so does the
author.)

If metadata checksums are enabled, the last 8 bytes of the directory
block (precisely the length of one dx\_entry) are used to store a
``struct dx_tail``, which contains the checksum. The ``limit`` and
``count`` entries in the dx\_root/dx\_node structures are adjusted as
necessary to fit the dx\_tail into the block. If there is no space for
the dx\_tail, the user is notified to run e2fsck -D to rebuild the
directory index (which will ensure that there's space for the checksum).
The dx\_tail structure is 8 bytes long and looks like this:

- Checksum of the htree directory block.

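The effect of the tail on ``limit`` can be worked out from the sizes
given in this section. The sketch assumes that the root block spends 32
bytes before its map ('.' at 12 bytes, '..' at 12 bytes, tree
information at 8 bytes), that an interior node spends 8 bytes on its
fake directory entry, and that the slot holding limit and count is
itself counted in ``limit``. ``dx_limit`` is a hypothetical name.

```python
DX_ENTRY_SIZE = 8
DX_TAIL_SIZE = 8


def dx_limit(block_size: int, is_root: bool, has_checksum: bool) -> int:
    """Number of 8-byte map slots that fit in an index block."""
    header = 32 if is_root else 8   # bytes in front of the map
    space = block_size - header
    if has_checksum:
        space -= DX_TAIL_SIZE       # the tail takes exactly one slot
    return space // DX_ENTRY_SIZE


for root in (True, False):
    for csum in (False, True):
        print(root, csum, dx_limit(4096, root, csum))
```

With checksums enabled, each index block therefore holds exactly one
map slot fewer than it would otherwise.
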
The checksum is calculated against the FS UUID, the htree index header
(dx\_root or dx\_node), all of the htree indices (dx\_entry) that are in
use, and the tail block (dx\_tail).