.. SPDX-License-Identifier: GPL-2.0

===================================
File management in the Linux kernel
===================================

This document describes how locking works for files (struct file)
and for the file descriptor table (struct files_struct).

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag, typically
POSIX threads. As with the common refcounting model in the
kernel, the last task doing a put_files_struct() frees the
file descriptor (fd) table. The files (struct file) themselves
are protected using a reference count (->f_count).

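
The sharing rule above (one reference per task sharing the table, and the last put frees it) can be illustrated with a minimal userspace sketch. It is a model under stated assumptions, not kernel code: files_model, files_model_get() and files_model_put() are hypothetical names, C11 atomics stand in for the kernel's own atomic helpers, and the contents of the fd table are not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct files_struct: only the sharing
 * count is modelled, not the fd table contents. */
struct files_model {
	atomic_int count;		/* number of tasks sharing the table */
};

static struct files_model *files_model_alloc(void)
{
	struct files_model *files = malloc(sizeof(*files));

	if (files)
		atomic_init(&files->count, 1);	/* creator holds one reference */
	return files;
}

/* Analogue of sharing the table with a task cloned with CLONE_FILES. */
static void files_model_get(struct files_model *files)
{
	atomic_fetch_add_explicit(&files->count, 1, memory_order_relaxed);
}

/* Analogue of put_files_struct(): returns true if this call freed it. */
static bool files_model_put(struct files_model *files)
{
	int old = atomic_fetch_sub_explicit(&files->count, 1, memory_order_acq_rel);

	assert(old > 0);		/* a put without a matching get is a bug */
	if (old == 1) {
		free(files);		/* the last user frees the table */
		return true;
	}
	return false;
}
```

The acq_rel ordering on the decrement makes every earlier use of the table by other holders visible to the task that ends up freeing it, which is the usual requirement for a "last put frees" refcount.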
In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are kept in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion, a new fdtable structure is allocated and
files->fdt is switched to point to the new structure. The old
fdtable structure is freed with RCU, and lock-free readers see
either the old fdtable or the new one, which makes the update
appear atomic. Here are the locking rules for
the fdtable structure:

1. All references to the fdtable must be made through
   the files_fdtable() macro::

	rcu_read_lock();
	fdt = files_fdtable(files);
	if (n <= fdt->max_fds)
		....
	rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the lookup_fdget_rcu() or the files_lookup_fdget_rcu()
   API. These take care of the barrier requirements of lock-free lookup.

   An example::

	rcu_read_lock();
	file = lookup_fdget_rcu(fd);
	rcu_read_unlock();
	/* a non-NULL file carries a reference: fput(file) when done */

5. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and lookup_fdget_rcu()/files_lookup_fdget_rcu(), which take care of these
   issues.

6. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread may expand the file table, creating a new
   fdtable and making the earlier fdtable pointer stale.

   For example::

	spin_lock(&files->file_lock);
	fd = locate_fd(files, file, start);
	if (fd >= 0) {
		/* locate_fd() may have expanded fdtable, load the ptr */
		fdt = files_fdtable(files);
		__set_open_fd(fd, fdt);
		__clear_close_on_exec(fd, fdt);
	}
	spin_unlock(&files->file_lock);

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().

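
Rules 1, 5 and 6 can be modelled together in a small userspace sketch. It is illustrative only and rests on several assumptions: every name is hypothetical, C11 acquire loads and release stores stand in for rcu_dereference() and rcu_assign_pointer(), a spinning atomic_bool stands in for files->file_lock, and a replaced table is never freed instead of being freed after an RCU grace period. model_expand() plays the part of the expansion that locate_fd() may trigger. Unlike the kernel it never drops the lock, but model_install() still reloads fdt after calling it, which is the point of rule 6.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_NR_DEFAULT 4	/* hypothetical size of the embedded table */

struct fdtable_model {				/* stand-in for struct fdtable */
	unsigned int max_fds;
	_Atomic(void *) *fd;			/* array of "file" pointers */
};

struct files_model {				/* stand-in for files_struct */
	atomic_bool file_lock;			/* stand-in for files->file_lock */
	_Atomic(struct fdtable_model *) fdt;	/* current table */
	struct fdtable_model fdtab;		/* embedded initial table */
	_Atomic(void *) fd_array[MODEL_NR_DEFAULT];
};

static void model_lock(struct files_model *files)
{
	while (atomic_exchange_explicit(&files->file_lock, true, memory_order_acquire))
		;				/* spin until the lock is free */
}

static void model_unlock(struct files_model *files)
{
	atomic_store_explicit(&files->file_lock, false, memory_order_release);
}

static void model_init(struct files_model *files)
{
	atomic_init(&files->file_lock, false);
	for (unsigned int i = 0; i < MODEL_NR_DEFAULT; i++)
		atomic_init(&files->fd_array[i], NULL);
	files->fdtab.max_fds = MODEL_NR_DEFAULT;
	files->fdtab.fd = files->fd_array;
	atomic_init(&files->fdt, &files->fdtab);	/* start with the embedded table */
}

/* Lock-free reader: the acquire loads play the role of rcu_dereference(). */
static void *model_lookup(struct files_model *files, unsigned int fd)
{
	struct fdtable_model *fdt = atomic_load_explicit(&files->fdt, memory_order_acquire);

	if (fd >= fdt->max_fds)
		return NULL;
	return atomic_load_explicit(&fdt->fd[fd], memory_order_acquire);
}

/* Called with file_lock held: replace the table if slot nr does not fit. */
static int model_expand(struct files_model *files, unsigned int nr)
{
	struct fdtable_model *old = atomic_load_explicit(&files->fdt, memory_order_relaxed);
	struct fdtable_model *nfdt;
	unsigned int size = old->max_fds;

	if (nr < old->max_fds)
		return 0;
	while (size <= nr)
		size *= 2;
	nfdt = malloc(sizeof(*nfdt));
	if (!nfdt || !(nfdt->fd = calloc(size, sizeof(*nfdt->fd)))) {
		free(nfdt);
		return -1;
	}
	nfdt->max_fds = size;
	for (unsigned int i = 0; i < size; i++) {
		void *f = NULL;

		if (i < old->max_fds)
			f = atomic_load_explicit(&old->fd[i], memory_order_relaxed);
		atomic_init(&nfdt->fd[i], f);
	}
	/* Publish the fully built table (rcu_assign_pointer() analogue). The
	 * kernel frees "old" after a grace period; this sketch never frees it. */
	atomic_store_explicit(&files->fdt, nfdt, memory_order_release);
	return 1;
}

/* Updater following rule 6: reload fdt after a possible expansion. */
static int model_install(struct files_model *files, void *file, unsigned int start)
{
	struct fdtable_model *fdt;
	unsigned int fd;

	model_lock(files);
	fdt = atomic_load_explicit(&files->fdt, memory_order_relaxed);
	for (fd = start; fd < fdt->max_fds; fd++)
		if (!atomic_load_explicit(&fdt->fd[fd], memory_order_relaxed))
			break;
	if (model_expand(files, fd) < 0) {
		model_unlock(files);
		return -1;
	}
	fdt = atomic_load_explicit(&files->fdt, memory_order_relaxed);	/* reload */
	atomic_store_explicit(&fdt->fd[fd], file, memory_order_release);
	model_unlock(files);
	return (int)fd;
}
```

Installing at fd 9 in a fresh model forces the embedded 4-slot table to be replaced by a 16-slot one. If model_install() wrote through the fdt it loaded before model_expand(), the store would land in the old table, which readers no longer see.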
On newer kernels, RCU-based file lookup has been switched to rely on
SLAB_TYPESAFE_BY_RCU instead of call_rcu(). It is no longer sufficient
to just acquire a reference to the file in question under RCU using
atomic_long_inc_not_zero(), since the file might already have been
recycled and someone else might have bumped the reference. In other
words, callers might see reference count bumps from newer users. For
this reason it is necessary to verify that the pointer is the same
before and after the reference count increment. This pattern can be seen
in get_file_rcu() and __files_get_rcu().

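
The load, increment and reload-and-compare pattern can be sketched in userspace C11. This is a hypothetical model, not the kernel's get_file_rcu(): file_model, model_inc_not_zero() and model_try_get_file() are invented names. Objects are never freed here, which stands in for the SLAB_TYPESAFE_BY_RCU guarantee that a stale pointer still refers to memory of the same type, and the retry policy is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; only the refcount is modelled. */
struct file_model {
	atomic_long count;			/* analogue of ->f_count */
};

enum get_result { GET_OK, GET_EMPTY, GET_RETRY };

/* Analogue of atomic_long_inc_not_zero(): fails once the count hit 0.
 * The CAS is seq_cst, i.e. fully ordered on success. */
static bool model_inc_not_zero(atomic_long *count)
{
	long old = atomic_load_explicit(count, memory_order_relaxed);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(count, &old, old + 1));
	return true;
}

/* One attempt: load the slot, take a reference, then reload the slot
 * and check that it still points at the object we pinned. */
static enum get_result model_try_get_file(_Atomic(struct file_model *) *slot,
					  struct file_model **out)
{
	struct file_model *file = atomic_load(slot);

	if (!file)
		return GET_EMPTY;
	if (!model_inc_not_zero(&file->count))
		return GET_RETRY;		/* object is being torn down */
	if (file != atomic_load(slot)) {
		/* Slot changed: the count we bumped may belong to a
		 * recycled object, so drop it and let the caller retry. */
		atomic_fetch_sub(&file->count, 1);
		return GET_RETRY;
	}
	*out = file;				/* the reference is now ours */
	return GET_OK;
}
```

A caller that gets GET_RETRY simply tries again. Only GET_OK hands back a reference, which the caller must later drop, and only after that may it inspect the object's other fields. The accompanying single-threaded check exercises the empty, dying and successful paths but cannot reach the recycled-slot branch.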
In addition, it isn't possible to access or check fields in struct file
without first acquiring a reference on it under RCU lookup. Not doing
that was always very dodgy, and it was only usable for non-pointer data
in struct file. With SLAB_TYPESAFE_BY_RCU, callers must either first
acquire a reference or hold files->file_lock.