.. SPDX-License-Identifier: GPL-2.0

In a regular UNIX filesystem, the inode stores all the metadata
pertaining to the file (time stamps, block maps, extended attributes,
etc.), not the directory entry. To find the information associated with a
file, one must traverse the directory files to find the directory entry
associated with the file, then load the inode to find the metadata for
that file. ext4 appears to cheat (for performance reasons) a little bit
by storing a copy of the file type (normally stored in the inode) in the
directory entry. (Compare all this to FAT, which stores all the file
information directly in the directory entry, but does not support hard
links and is in general more seek-happy than ext4 due to its simpler
block allocator and extensive use of linked lists.)

The inode table is a linear array of ``struct ext4_inode``. The table is
sized to have enough blocks to store at least
``sb.s_inode_size * sb.s_inodes_per_group`` bytes. The number of the
block group containing an inode can be calculated as
``(inode_number - 1) / sb.s_inodes_per_group``, and the offset into the
group's table is ``(inode_number - 1) % sb.s_inodes_per_group``. There
is no inode 0.

The inode checksum is calculated against the FS UUID, the inode number,
and the inode structure itself.
The inode table entry is laid out in ``struct ext4_inode``.

- File mode. See the table i_mode_ below.
- Lower 16-bits of Owner UID.
- Lower 32-bits of size in bytes.
- Last access time, in seconds since the epoch. However, if the EA\_INODE
  inode flag is set, this inode stores an extended attribute value and
  this field contains the checksum of the value.
- Last inode change time, in seconds since the epoch. However, if the
  EA\_INODE inode flag is set, this inode stores an extended attribute
  value and this field contains the lower 32 bits of the attribute value's
  reference count.
- Last data modification time, in seconds since the epoch. However, if the
  EA\_INODE inode flag is set, this inode stores an extended attribute
  value and this field contains the number of the inode that owns the
  extended attribute.
- Deletion Time, in seconds since the epoch.
- Lower 16-bits of GID.
- Hard link count. Normally, ext4 does not permit an inode to have more
  than 65,000 hard links. This applies to files as well as directories,
  which means that there cannot be more than 64,998 subdirectories in a
  directory (each subdirectory's '..' entry counts as a hard link, as does
  the '.' entry in the directory itself). With the DIR\_NLINK feature
  enabled, ext4 supports more than 64,998 subdirectories by setting this
  field to 1 to indicate that the number of hard links is not known.
- Lower 32-bits of “block” count. If the huge\_file feature flag is not
  set on the filesystem, the file consumes ``i_blocks_lo`` 512-byte blocks
  on disk. If huge\_file is set and EXT4\_HUGE\_FILE\_FL is NOT set in
  ``inode.i_flags``, then the file consumes
  ``i_blocks_lo + (i_blocks_hi << 32)`` 512-byte blocks on disk. If
  huge\_file is set and EXT4\_HUGE\_FILE\_FL IS set in ``inode.i_flags``,
  then this file consumes ``i_blocks_lo + (i_blocks_hi << 32)``
  filesystem blocks on disk.
- Inode flags. See the table i_flags_ below.
- See the table i_osd1_ for more details.
- i\_block[EXT4\_N\_BLOCKS=15]: Block map or extent tree. See the section
  “The Contents of inode.i\_block”.
- File version (for NFS).
- Lower 32-bits of extended attribute block. ACLs are of course one of
  many possible extended attributes; the name of this field is probably a
  result of the first use of extended attributes being for ACLs.
- i\_size\_high / i\_dir\_acl: Upper 32-bits of file/directory size. In
  ext2/3 this field was named i\_dir\_acl, though it was usually set to
  zero and never used.
- (Obsolete) fragment address.
- See the table i_osd2_ for more details.
- Size of this inode - 128. Alternatively, the size of the extended inode
  fields beyond the original ext2 inode, including this field.
- Upper 16-bits of the inode checksum.
- Extra change time bits. This provides sub-second precision. See the
  Inode Timestamps section.
- Extra modification time bits. This provides sub-second precision.
- Extra access time bits. This provides sub-second precision.
- File creation time, in seconds since the epoch.
- Extra file creation time bits. This provides sub-second precision.
- Upper 32-bits for version number.
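The three cases for the block count can be folded into one helper. The
following is a minimal sketch rather than kernel code: the function name, its
parameters, and the boolean ``huge_file_feature`` (standing in for the
huge\_file feature flag in the superblock) are hypothetical, and the value used
for EXT4\_HUGE\_FILE\_FL is assumed from the kernel header ``fs/ext4/ext4.h``
and should be checked against the tree in use.

```python
EXT4_HUGE_FILE_FL = 0x40000  # assumed value; verify against fs/ext4/ext4.h


def disk_usage_bytes(i_blocks_lo, i_blocks_hi, i_flags,
                     huge_file_feature, fs_block_size):
    """Return the number of bytes the inode occupies on disk."""
    if not huge_file_feature:
        # Without huge_file only the lower 32 bits count, in 512-byte units.
        return i_blocks_lo * 512
    blocks = i_blocks_lo + (i_blocks_hi << 32)
    if i_flags & EXT4_HUGE_FILE_FL:
        # The count is expressed in filesystem blocks.
        return blocks * fs_block_size
    # The count is still expressed in 512-byte units.
    return blocks * 512
```

For example, with a 4096-byte filesystem block size, a count of 2 means 1024
bytes when EXT4\_HUGE\_FILE\_FL is clear and 8192 bytes when it is set.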
The ``i_mode`` value is a combination of the following flags:

- S\_IXOTH (Others may execute)
- S\_IWOTH (Others may write)
- S\_IROTH (Others may read)
- S\_IXGRP (Group members may execute)
- S\_IWGRP (Group members may write)
- S\_IRGRP (Group members may read)
- S\_IXUSR (Owner may execute)
- S\_IWUSR (Owner may write)
- S\_IRUSR (Owner may read)
- S\_ISVTX (Sticky bit)

These are mutually-exclusive file types:

- S\_IFCHR (Character device)
- S\_IFDIR (Directory)
- S\_IFBLK (Block device)
- S\_IFREG (Regular file)
- S\_IFLNK (Symbolic link)
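Because the file type occupies its own group of bits while the permission
flags are independent, an ``i_mode`` value can be split into the two parts.
The sketch below assumes that the on-disk values match the constants exposed
by Python's standard ``stat`` module (true for the usual Linux ``st_mode``
encoding); ``describe_mode`` is a hypothetical helper name.

```python
import stat


def describe_mode(i_mode):
    """Split an i_mode value into (file type name, permission bits)."""
    file_types = {
        stat.S_IFCHR: "character device",
        stat.S_IFDIR: "directory",
        stat.S_IFBLK: "block device",
        stat.S_IFREG: "regular file",
        stat.S_IFLNK: "symbolic link",
    }
    fmt = stat.S_IFMT(i_mode)      # mutually exclusive file type bits
    perms = stat.S_IMODE(i_mode)   # rwx bits plus sticky/setuid/setgid
    return file_types.get(fmt, "other"), perms
```

A mode of ``0o100644`` decodes to a regular file with permission bits
``0o644``.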
The ``i_flags`` field is a combination of these values:

- This file requires secure deletion (EXT4\_SECRM\_FL). (not implemented)
- This file should be preserved, should undeletion be desired
  (EXT4\_UNRM\_FL). (not implemented)
- File is compressed (EXT4\_COMPR\_FL). (not really implemented)
- All writes to the file must be synchronous (EXT4\_SYNC\_FL).
- File is immutable (EXT4\_IMMUTABLE\_FL).
- File can only be appended (EXT4\_APPEND\_FL).
- The dump(1) utility should not dump this file (EXT4\_NODUMP\_FL).
- Do not update access time (EXT4\_NOATIME\_FL).
- Dirty compressed file (EXT4\_DIRTY\_FL). (not used)
- File has one or more compressed clusters (EXT4\_COMPRBLK\_FL). (not used)
- Do not compress file (EXT4\_NOCOMPR\_FL). (not used)
- Encrypted inode (EXT4\_ENCRYPT\_FL). This bit value previously was
  EXT4\_ECOMPR\_FL (compression error), which was never used.
- Directory has hashed indexes (EXT4\_INDEX\_FL).
- AFS magic directory (EXT4\_IMAGIC\_FL).
- File data must always be written through the journal
  (EXT4\_JOURNAL\_DATA\_FL).
- File tail should not be merged (EXT4\_NOTAIL\_FL). (not used by ext4)
- All directory entry data should be written synchronously (see
  ``dirsync``) (EXT4\_DIRSYNC\_FL).
- Top of directory hierarchy (EXT4\_TOPDIR\_FL).
- This is a huge file (EXT4\_HUGE\_FILE\_FL).
- Inode uses extents (EXT4\_EXTENTS\_FL).
- Verity protected file (EXT4\_VERITY\_FL).
- Inode stores a large extended attribute value in its data blocks
  (EXT4\_EA\_INODE\_FL).
- This file has blocks allocated past EOF (EXT4\_EOFBLOCKS\_FL).
- Inode is a snapshot (``EXT4_SNAPFILE_FL``). (not in mainline)
- Snapshot is being deleted (``EXT4_SNAPFILE_DELETED_FL``). (not in
  mainline)
- Snapshot shrink has completed (``EXT4_SNAPFILE_SHRUNK_FL``). (not in
  mainline)
- Inode has inline data (EXT4\_INLINE\_DATA\_FL).
- Create children with the same project ID (EXT4\_PROJINHERIT\_FL).
- Reserved for ext4 library (EXT4\_RESERVED\_FL).
- User-visible flags.
- User-modifiable flags. Note that while EXT4\_JOURNAL\_DATA\_FL and
  EXT4\_EXTENTS\_FL can be set with setattr, they are not in the kernel's
  EXT4\_FL\_USER\_MODIFIABLE mask, since the kernel needs to handle the
  setting of these flags in a special manner and they are masked out of
  the set of flags that are saved directly to i\_flags.
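Since ``i_flags`` is a plain bit mask, the flags that are set can be listed by
testing each bit in turn. The sketch below covers only a handful of flags; the
numeric values are assumptions taken from the kernel header
``fs/ext4/ext4.h`` and should be verified there, and ``InodeFlags`` and
``flag_names`` are hypothetical names used for illustration.

```python
from enum import IntFlag


class InodeFlags(IntFlag):
    # Partial list; values assumed from fs/ext4/ext4.h.
    SYNC = 0x8
    IMMUTABLE = 0x10
    APPEND = 0x20
    NOATIME = 0x80
    INDEX = 0x1000
    HUGE_FILE = 0x40000
    EXTENTS = 0x80000
    INLINE_DATA = 0x10000000


def flag_names(i_flags):
    """Return the names of the known flags that are set in i_flags."""
    return [flag.name for flag in InodeFlags if i_flags & flag]
```

Bits that are not in the partial list are ignored, so a complete decoder would
need the full set of values.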
The ``osd1`` field has multiple meanings depending on the creator:

- Inode version. However, if the EA\_INODE inode flag is set, this inode
  stores an extended attribute value and this field contains the upper 32
  bits of the attribute value's reference count.

The ``osd2`` field has multiple meanings depending on the filesystem creator:

- Upper 16-bits of the block count. Please see the note attached to
  i\_blocks\_lo.
- l\_i\_file\_acl\_high: Upper 16-bits of the extended attribute block
  (historically, the file ACL location). See the Extended Attributes
  section below.
- Upper 16-bits of the Owner UID.
- Upper 16-bits of the GID.
- Lower 16-bits of the inode checksum.
- Upper 16-bits of the file mode.
- Upper 16-bits of the Owner UID.
- Upper 16-bits of the GID.
- m\_i\_file\_acl\_high: Upper 16-bits of the extended attribute block
  (historically, the file ACL location).
In ext2 and ext3, the inode structure size was fixed at 128 bytes
(``EXT2_GOOD_OLD_INODE_SIZE``) and each inode had a disk record size of
128 bytes. Starting with ext4, it is possible to allocate a larger
on-disk inode at format time for all inodes in the filesystem to provide
space beyond the end of the original ext2 inode. The on-disk inode
record size is recorded in the superblock as ``s_inode_size``. The
number of bytes actually used by struct ext4\_inode beyond the original
128-byte ext2 inode is recorded in the ``i_extra_isize`` field for each
inode, which allows struct ext4\_inode to grow for a new kernel without
having to upgrade all of the on-disk inodes. Access to fields beyond
EXT2\_GOOD\_OLD\_INODE\_SIZE should be verified to be within
``i_extra_isize``. By default, ext4 inode records are 256 bytes, and (as
of August 2019) the inode structure is 160 bytes
(``i_extra_isize = 32``). The extra space between the end of the inode
structure and the end of the inode record can be used to store extended
attributes. Each inode record can be as large as the filesystem block
size, though this is not terribly efficient.
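The bounds check on fields beyond the original 128 bytes can be written as a
single comparison. This is a minimal sketch, not the kernel's own macro:
``fits_in_inode`` is a hypothetical helper, and the caller is assumed to know
the byte offset and size of the field from the structure definition.

```python
EXT2_GOOD_OLD_INODE_SIZE = 128


def fits_in_inode(field_offset, field_size, i_extra_isize):
    """True if the field lies wholly inside the in-use inode structure."""
    end_of_structure = EXT2_GOOD_OLD_INODE_SIZE + i_extra_isize
    return field_offset + field_size <= end_of_structure
```

With ``i_extra_isize = 32`` the structure ends at byte 160, so a 4-byte field
at offset 156 is usable, while the same field is out of bounds for an inode
whose ``i_extra_isize`` is 4.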
Each block group contains ``sb->s_inodes_per_group`` inodes. Because
inode 0 is defined not to exist, this formula can be used to find the
block group that an inode lives in:
``bg = (inode_num - 1) / sb->s_inodes_per_group``. The particular inode
can be found within the block group's inode table at
``index = (inode_num - 1) % sb->s_inodes_per_group``. To get the byte
address within the inode table, use
``offset = index * sb->s_inode_size``.
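The three formulas combine into one short function. The sketch below is
illustrative only; ``locate_inode`` and its parameter names are hypothetical,
with ``inodes_per_group`` and ``inode_size`` standing in for the superblock
fields ``s_inodes_per_group`` and ``s_inode_size``.

```python
def locate_inode(inode_num, inodes_per_group, inode_size):
    """Return (block group, index in group, byte offset in the group's table)."""
    if inode_num < 1:
        raise ValueError("inode 0 does not exist")
    bg = (inode_num - 1) // inodes_per_group
    index = (inode_num - 1) % inodes_per_group
    offset = index * inode_size
    return bg, index, offset
```

For 8192 inodes per group and 256-byte records, inode 12 is entry 11 of block
group 0, at byte offset 2816 of that group's inode table.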
Four timestamps are recorded in the lower 128 bytes of the inode
structure -- inode change time (ctime), access time (atime), data
modification time (mtime), and deletion time (dtime). The four fields
are 32-bit signed integers that represent seconds since the Unix epoch
(1970-01-01 00:00:00 GMT), which means that the fields will overflow in
January 2038. For inodes that are not linked from any directory but are
still open (orphan inodes), the dtime field is overloaded for use with
the orphan list. The superblock field ``s_last_orphan`` points to the
first inode in the orphan list; dtime is then the number of the next
orphaned inode, or zero if there are no more orphans.
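The orphan list is therefore a singly linked list threaded through the dtime
fields. A minimal sketch of walking it follows; ``walk_orphan_list`` and the
``dtime_of`` callable (which returns the dtime value stored in a given inode)
are hypothetical, and the loop check is a defensive addition rather than
something the on-disk format requires.

```python
def walk_orphan_list(s_last_orphan, dtime_of):
    """Yield orphan inode numbers, following dtime as the 'next' pointer."""
    seen = set()
    ino = s_last_orphan
    while ino != 0:
        if ino in seen:
            # A corrupted list could loop forever; stop instead.
            raise ValueError("orphan list loops back on inode %d" % ino)
        seen.add(ino)
        yield ino
        ino = dtime_of(ino)
```

Given a mapping ``{12: 40, 40: 7, 7: 0}`` of inode number to dtime and
``s_last_orphan = 12``, the walk visits inodes 12, 40 and 7 in that order.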
If the inode record size ``sb->s_inode_size`` is larger than 128
bytes and the ``i_extra_isize`` field is large enough to encompass the
respective ``i_[cma]time_extra`` field, the ctime, atime, and mtime
inode fields are widened to 64 bits. Within this “extra” 32-bit field,
the lower two bits are used to extend the 32-bit seconds field to be 34
bits wide; the upper 30 bits are used to provide nanosecond timestamp
accuracy. Therefore, timestamps should not overflow until May 2446.
dtime was not widened. There is also a fifth timestamp to record inode
creation time (crtime); this field is 64-bits wide and decoded in the
same manner as 64-bit [cma]time. Neither crtime nor dtime is accessible
through the regular stat() interface, though debugfs will report them.

We use the 32-bit signed time value plus (2^32 \* (extra epoch bits)).
.. list-table::
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - Extra epoch bits
     - MSB of 32-bit time
     - Adjustment for signed 32-bit to 64-bit tv\_sec
     - Decoded 64-bit tv\_sec
     - Valid time range
   * - 0 0
     - 1
     - 0
     - ``-0x80000000 - -0x00000001``
     - 1901-12-13 to 1969-12-31
   * - 0 0
     - 0
     - 0
     - ``0x000000000 - 0x07fffffff``
     - 1970-01-01 to 2038-01-19
   * - 0 1
     - 1
     - 0x100000000
     - ``0x080000000 - 0x0ffffffff``
     - 2038-01-19 to 2106-02-07
   * - 0 1
     - 0
     - 0x100000000
     - ``0x100000000 - 0x17fffffff``
     - 2106-02-07 to 2174-02-25
   * - 1 0
     - 1
     - 0x200000000
     - ``0x180000000 - 0x1ffffffff``
     - 2174-02-25 to 2242-03-16
   * - 1 0
     - 0
     - 0x200000000
     - ``0x200000000 - 0x27fffffff``
     - 2242-03-16 to 2310-04-04
   * - 1 1
     - 1
     - 0x300000000
     - ``0x280000000 - 0x2ffffffff``
     - 2310-04-04 to 2378-04-22
   * - 1 1
     - 0
     - 0x300000000
     - ``0x300000000 - 0x37fffffff``
     - 2378-04-22 to 2446-05-10
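The decoding rule (signed 32-bit seconds plus 2^32 times the two epoch bits,
with the remaining 30 bits holding nanoseconds) can be sketched as follows.
The function and constant names are hypothetical, and both inputs are assumed
to be the raw unsigned 32-bit values read from disk.

```python
EPOCH_BITS = 2
EPOCH_MASK = (1 << EPOCH_BITS) - 1


def decode_timestamp(raw_seconds, extra):
    """Decode a 32-bit seconds field and its 32-bit "extra" field.

    Returns (tv_sec, tv_nsec).
    """
    # Reinterpret the raw 32-bit value as a signed integer.
    if raw_seconds & 0x80000000:
        seconds = raw_seconds - (1 << 32)
    else:
        seconds = raw_seconds
    # The lower two bits of "extra" extend the seconds field to 34 bits.
    tv_sec = seconds + ((extra & EPOCH_MASK) << 32)
    # The upper 30 bits of "extra" hold the nanoseconds.
    tv_nsec = extra >> EPOCH_BITS
    return tv_sec, tv_nsec
```

A raw seconds value of ``0x80000000`` with epoch bits ``0 1`` decodes to
``0x080000000``, the first second after the 32-bit signed range ends in
January 2038.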
This is a somewhat odd encoding since there are effectively seven times
as many positive values as negative values. There have also been
long-standing bugs decoding and encoding dates beyond 2038, which don't
seem to be fixed as of kernel 3.12 and e2fsprogs 1.42.8. 64-bit kernels
incorrectly use the extra epoch bits 1,1 for dates between 1901 and
1970. At some point the kernel will be fixed and e2fsck will fix this
situation, assuming that it is run before 2310.