.. SPDX-License-Identifier: GPL-2.0

=================================
Network Filesystem Helper Library
=================================

.. Contents:

 - Overview.
 - Per-inode context.
   - Inode context helper functions.
 - Buffered read helpers.
   - Read helper functions.
   - Read helper structures.
   - Read helper operations.
   - Read helper procedure.
   - Read helper cache API.

Overview
========

The network filesystem helper library is a set of functions designed to aid a
network filesystem in implementing VM/VFS operations. For the moment, that
just includes turning various VM buffered read operations into requests to read
from the server. The helper library, however, can also interpose other
services, such as local caching or local data encryption.

Note that the library module doesn't link against local caching directly, so
access must be provided by the netfs.

Per-Inode Context
=================

The network filesystem helper library needs a place to store a bit of state for
its use on each netfs inode it is helping to manage. To this end, a context
structure is defined::

	struct netfs_inode {
		struct inode inode;
		const struct netfs_request_ops *ops;
		struct fscache_cookie *cache;
	};

A network filesystem that wants to use netfslib must place one of these in its
inode wrapper struct instead of the VFS ``struct inode``. This can be done in
a way similar to the following::

	struct my_inode {
		struct netfs_inode netfs; /* Netfslib context and vfs inode */
		...
	};

This allows netfslib to find its state by using ``container_of()`` from the
inode pointer, thereby allowing the netfslib helper functions to be pointed to
directly by the VFS/VM operation tables.

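The ``container_of()`` lookup can be illustrated with a small userspace model.
This is only a sketch: the structures are cut-down stand-ins for the kernel
types, and ``to_netfs_inode()`` and ``to_my_inode()`` are hypothetical helper
names used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types; only the layout matters here. */
struct inode { unsigned long i_ino; };
struct netfs_request_ops;	/* opaque in this sketch */

struct netfs_inode {
	struct inode inode;			/* the VFS inode */
	const struct netfs_request_ops *ops;	/* netfs operations table */
};

/* Hypothetical filesystem inode wrapper. */
struct my_inode {
	struct netfs_inode netfs;	/* netfslib context and VFS inode */
	int my_private_state;
};

/* Simplified userspace equivalent of the kernel macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the netfslib context from a VFS inode pointer. */
struct netfs_inode *to_netfs_inode(struct inode *inode)
{
	assert(inode != NULL);
	return container_of(inode, struct netfs_inode, inode);
}

/* Recover the filesystem's own wrapper from a VFS inode pointer. */
struct my_inode *to_my_inode(struct inode *inode)
{
	return container_of(to_netfs_inode(inode), struct my_inode, netfs);
}
```

Because the VFS only ever hands out the embedded ``struct inode`` pointer, no
separate lookup table is needed to reach either the netfslib state or the
filesystem's private state.
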
The structure contains the following fields:

 * ``inode``

   The VFS inode structure.

 * ``ops``

   The set of operations provided by the network filesystem to netfslib.

 * ``cache``

   Local caching cookie, or NULL if no caching is enabled. This field does not
   exist if fscache is disabled.


Inode Context Helper Functions
------------------------------

To help deal with the per-inode context, a number of helper functions are
provided. Firstly, a function to perform basic initialisation on a context and
set the operations table pointer::

	void netfs_inode_init(struct netfs_inode *ctx,
			      const struct netfs_request_ops *ops);

then a function to cast from the VFS inode structure to the netfs context::

	struct netfs_inode *netfs_inode(struct inode *inode);

and finally, a function to get the cache cookie pointer from the context
attached to an inode (or NULL if fscache is disabled)::

	struct fscache_cookie *netfs_i_cookie(struct netfs_inode *ctx);


Buffered Read Helpers
=====================

The library provides a set of read helpers that handle the ->read_folio(),
->readahead() and much of the ->write_begin() VM operations and translate them
into a common call framework.

The following services are provided:

 * Handle folios that span multiple pages.

 * Insulate the netfs from VM interface changes.

 * Allow the netfs to arbitrarily split reads up into pieces, even ones that
   don't match folio sizes or folio alignments and that may cross folios.

 * Allow the netfs to expand a readahead request in both directions to meet its
   needs.

 * Allow the netfs to partially fulfil a read, which will then be resubmitted.

 * Handle local caching, allowing cached data and server-read data to be
   interleaved for a single request.

 * Handle clearing of bufferage that isn't on the server.

 * Handle retrying of reads that failed, switching reads from the cache to the
   server as necessary.

 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.

From the network filesystem, the helpers require a table of operations. This
includes a mandatory method to issue a read operation along with a number of
optional methods.


Read Helper Functions
---------------------

Three read helpers are provided::

	void netfs_readahead(struct readahead_control *ractl);
	int netfs_read_folio(struct file *file,
			     struct folio *folio);
	int netfs_write_begin(struct netfs_inode *ctx,
			      struct file *file,
			      struct address_space *mapping,
			      loff_t pos,
			      unsigned int len,
			      struct folio **_folio,
			      void **_fsdata);

Each corresponds to a VM address space operation. These operations use the
state in the per-inode context.

For ->readahead() and ->read_folio(), the network filesystem just points
directly at the corresponding read helper; whereas for ->write_begin(), it may
be a little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired folio if
an error occurs after calling the helper.

The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations. Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.

If an error occurs, ->free_request() will be called to clean up the
netfs_io_request struct allocated. If some parts of the request are in
progress when an error occurs, the request will get partially completed if
sufficient data is read.

Additionally, there is::

	void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
				     ssize_t transferred_or_error,
				     bool was_async);

which should be called to complete a read subrequest. This is given the number
of bytes transferred or a negative error code, plus a flag indicating whether
the operation was asynchronous (i.e. whether the follow-on processing can be
done in the current context, given this may involve sleeping).


Read Helper Structures
----------------------

The read helpers make use of a couple of structures to maintain the state of
the read. The first is a structure that manages a read request as a whole::

	struct netfs_io_request {
		struct inode		*inode;
		struct address_space	*mapping;
		struct netfs_cache_resources cache_resources;
		void			*netfs_priv;
		loff_t			start;
		size_t			len;
		loff_t			i_size;
		const struct netfs_request_ops *netfs_ops;
		unsigned int		debug_id;
		...
	};

The above fields are the ones the netfs can use. They are:

 * ``inode``
 * ``mapping``

   The inode and the address space of the file being read from. The mapping
   may or may not point to inode->i_data.

 * ``cache_resources``

   Resources for the local cache to use, if present.

 * ``netfs_priv``

   The network filesystem's private data. The value for this can be passed in
   to the helper functions or set during the request.

 * ``start``
 * ``len``

   The file position of the start of the read request and the length. These
   may be altered by the ->expand_readahead() op.

 * ``i_size``

   The size of the file at the start of the request.

 * ``netfs_ops``

   A pointer to the operation table. The value for this is passed into the
   helper function.

 * ``debug_id``

   A number allocated to this operation that can be displayed in trace lines
   for reference.


The second structure is used to manage individual slices of the overall read
request::

	struct netfs_io_subrequest {
		struct netfs_io_request *rreq;
		loff_t			start;
		size_t			len;
		size_t			transferred;
		unsigned long		flags;
		unsigned short		debug_index;
		...
	};

Each subrequest is expected to access a single source, though the helpers will
handle falling back from one source type to another. The members are:

 * ``rreq``

   A pointer to the read request.

 * ``start``
 * ``len``

   The file position of the start of this slice of the read request and the
   length.

 * ``transferred``

   The amount of data transferred so far of the length of this slice. The
   network filesystem or cache should start the operation this far into the
   slice. If a short read occurs, the helpers will call again, having updated
   this to reflect the amount read so far.

 * ``flags``

   Flags pertaining to the read. There are two of interest to the filesystem
   or cache:

   * ``NETFS_SREQ_CLEAR_TAIL``

     This can be set to indicate that the remainder of the slice, from
     transferred to len, should be cleared.

   * ``NETFS_SREQ_SEEK_DATA_READ``

     This is a hint to the cache that it might want to try skipping ahead to
     the next data (i.e. using SEEK_DATA).

 * ``debug_index``

   A number allocated to this slice that can be displayed in trace lines for
   reference.

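The interaction between ``transferred``, short reads and
``NETFS_SREQ_CLEAR_TAIL`` can be sketched with a userspace model. This is a
minimal sketch, not the helpers' implementation: ``struct model_subreq``,
``SREQ_CLEAR_TAIL`` and ``model_account()`` are hypothetical names, and the
policy of failing a read that makes no progress is an assumption taken from
the procedure described in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

#define SREQ_CLEAR_TAIL 0x1UL	/* stand-in for NETFS_SREQ_CLEAR_TAIL */

/* Cut-down model of a subrequest; not the kernel structure. */
struct model_subreq {
	size_t len;		/* length of the slice */
	size_t transferred;	/* bytes completed so far */
	unsigned long flags;
	unsigned char *buf;	/* destination buffer for the whole slice */
};

enum model_outcome { MODEL_DONE, MODEL_REISSUE, MODEL_FAILED };

/*
 * Account for one completed read attempt.  A short read that made progress
 * is reissued from the new ->transferred offset, unless the tail is to be
 * cleared instead.
 */
enum model_outcome model_account(struct model_subreq *s,
				 ssize_t transferred_or_error)
{
	if (transferred_or_error < 0)
		return MODEL_FAILED;

	s->transferred += (size_t)transferred_or_error;
	assert(s->transferred <= s->len);

	if (s->transferred == s->len)
		return MODEL_DONE;

	if (s->flags & SREQ_CLEAR_TAIL) {
		/* Zero from ->transferred to ->len rather than reading again. */
		memset(s->buf + s->transferred, 0, s->len - s->transferred);
		s->transferred = s->len;
		return MODEL_DONE;
	}

	/* No forward progress: give up rather than loop forever. */
	return transferred_or_error > 0 ? MODEL_REISSUE : MODEL_FAILED;
}
```

In this model a reissued read starts ``transferred`` bytes into the slice,
which is why the field must be kept up to date between attempts.
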

Read Helper Operations
----------------------

The network filesystem must provide the read helpers with a table of operations
through which it can issue requests and negotiate::

	struct netfs_request_ops {
		void (*init_request)(struct netfs_io_request *rreq, struct file *file);
		void (*free_request)(struct netfs_io_request *rreq);
		void (*expand_readahead)(struct netfs_io_request *rreq);
		bool (*clamp_length)(struct netfs_io_subrequest *subreq);
		void (*issue_read)(struct netfs_io_subrequest *subreq);
		bool (*is_still_valid)(struct netfs_io_request *rreq);
		int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
					 struct folio **foliop, void **_fsdata);
		void (*done)(struct netfs_io_request *rreq);
	};

The operations are as follows:

 * ``init_request()``

   [Optional] This is called to initialise the request structure. It is given
   the file for reference.

 * ``free_request()``

   [Optional] This is called as the request is being deallocated so that the
   filesystem can clean up any state it has attached there.

 * ``expand_readahead()``

   [Optional] This is called to allow the filesystem to expand the size of a
   readahead read request. The filesystem gets to expand the request in both
   directions, though it's not permitted to reduce it as the numbers may
   represent an allocation already made. If local caching is enabled, it gets
   to expand the request first.

   Expansion is communicated by changing ->start and ->len in the request
   structure. Note that if any change is made, ->len must be increased by at
   least as much as ->start is reduced.

 * ``clamp_length()``

   [Optional] This is called to allow the filesystem to reduce the size of a
   subrequest. The filesystem can use this, for example, to chop up a request
   that has to be split across multiple servers or to put multiple reads in
   flight.

   This should return true on success and false on failure.

 * ``issue_read()``

   [Required] The helpers use this to dispatch a subrequest to the server for
   reading. In the subrequest, ->start, ->len and ->transferred indicate what
   data should be read from the server.

   There is no return value; the netfs_subreq_terminated() function should be
   called to indicate whether or not the operation succeeded and how much data
   it transferred. The filesystem also should not deal with setting folios
   uptodate, unlocking them or dropping their refs - the helpers need to deal
   with this as they have to coordinate with copying to the local cache.

   Note that the helpers have the folios locked, but not pinned. It is
   possible to use the ITER_XARRAY iov iterator to refer to the range of the
   inode that is being operated upon without the need to allocate large bvec
   tables.

 * ``is_still_valid()``

   [Optional] This is called to find out if the data just read from the local
   cache is still valid. It should return true if it is still valid and false
   if not. If it's not still valid, it will be reread from the server.

 * ``check_write_begin()``

   [Optional] This is called from the netfs_write_begin() helper once it has
   allocated/grabbed the folio to be modified to allow the filesystem to flush
   conflicting state before allowing it to be modified.

   It may unlock and discard the folio it was given and set the caller's folio
   pointer to NULL. It should return 0 either if everything is now fine
   (``*foliop`` left set) or if the op should be retried (``*foliop`` cleared),
   and any other error code to abort the operation.

 * ``done()``

   [Optional] This is called after the folios in the request have all been
   unlocked (and marked uptodate if applicable).

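The rules for ``expand_readahead()`` and ``clamp_length()`` can be sketched
in userspace as follows. This is an illustrative model under stated
assumptions, not filesystem code: the structures carry only ``start`` and
``len``, and ``MODEL_GRANULE``, ``MODEL_RSIZE`` and the ``model_*`` function
names are hypothetical. The sketch also assumes a non-negative file position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins carrying only the fields the text mentions. */
struct model_rreq   { long long start; size_t len; };
struct model_subreq { long long start; size_t len; };

#define MODEL_GRANULE 4096u	/* hypothetical alignment the netfs wants */
#define MODEL_RSIZE   65536u	/* hypothetical maximum size of one read */

/* Widen the request outwards to granule boundaries; never shrink it. */
void model_expand_readahead(struct model_rreq *rreq)
{
	long long old_start = rreq->start;
	long long old_end = rreq->start + (long long)rreq->len;
	long long new_start = old_start - (old_start % MODEL_GRANULE);
	long long new_end = (old_end + MODEL_GRANULE - 1) /
			    MODEL_GRANULE * MODEL_GRANULE;

	rreq->start = new_start;
	rreq->len = (size_t)(new_end - new_start);

	/* ->len must grow by at least as much as ->start was reduced. */
	assert(rreq->len >= (size_t)(old_end - old_start) +
			    (size_t)(old_start - new_start));
}

/* Cap a slice at the maximum read size; returns false for an empty slice. */
bool model_clamp_length(struct model_subreq *subreq)
{
	if (subreq->len > MODEL_RSIZE)
		subreq->len = MODEL_RSIZE;
	return subreq->len > 0;
}
```

Rounding the start down and the end up guarantees the original range stays
covered, which is what the "must not reduce" rule requires.
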

Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request. This is then proposed to the VM.
   If the VM cannot fully perform the expansion, a partially expanded read will
   be performed, though this may not get written to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source. This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.

   * The next slice begins at the end of the last one.

   * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of reissuing.

 * Once the data is read, the folios that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any folios that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the folios will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.

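The slicing loop in the procedure above can be sketched as a small userspace
model. It assumes a deliberately simple, hypothetical cache layout (everything
below ``cached_end`` is cached) and a hypothetical ``MODEL_RSIZE`` limit; the
``model_*`` names are illustrative and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

enum model_source {
	MODEL_FILL_WITH_ZEROES,
	MODEL_DOWNLOAD_FROM_SERVER,
	MODEL_READ_FROM_CACHE,
};

struct model_slice {
	long long start;
	size_t len;
	enum model_source source;
};

#define MODEL_RSIZE 8192u	/* hypothetical per-read limit set by the netfs */

/*
 * Hypothetical policy: bytes below cached_end are in the cache, bytes from
 * there up to i_size come from the server and anything beyond i_size is
 * cleared.  The length is trimmed so a slice never spans two sources.
 */
static enum model_source model_prepare(long long start, size_t *len,
				       long long cached_end, long long i_size)
{
	long long end = start + (long long)*len;
	long long limit;
	enum model_source source;

	if (start < cached_end) {
		source = MODEL_READ_FROM_CACHE;
		limit = cached_end;
	} else if (start < i_size) {
		source = MODEL_DOWNLOAD_FROM_SERVER;
		limit = i_size;
	} else {
		source = MODEL_FILL_WITH_ZEROES;
		limit = end;
	}

	if (end > limit)
		*len = (size_t)(limit - start);
	/* The netfs only clamps slices for which it is the source. */
	if (source == MODEL_DOWNLOAD_FROM_SERVER && *len > MODEL_RSIZE)
		*len = MODEL_RSIZE;
	return source;
}

/* Slice [start, start + len) into at most max slices; returns the count. */
size_t model_slice_request(long long start, size_t len,
			   long long cached_end, long long i_size,
			   struct model_slice *out, size_t max)
{
	size_t n = 0;

	while (len > 0 && n < max) {
		size_t slice_len = len;
		enum model_source source =
			model_prepare(start, &slice_len, cached_end, i_size);

		assert(slice_len > 0 && slice_len <= len);
		out[n++] = (struct model_slice){ start, slice_len, source };
		start += slice_len;	/* next slice begins where this ended */
		len -= slice_len;
	}
	return n;
}
```

For a 32768-byte request on a 20000-byte file with the first 4096 bytes
cached, this model produces one cache slice, two server slices and one
cleared slice.
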

Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

To begin a cache operation on an fscache object, the following function is
called::

	int fscache_begin_read_operation(struct netfs_io_request *rreq,
					 struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file. This
fills in the cache resources mentioned below.

The netfs_io_request object contains a place for the cache to hang its
state::

	struct netfs_cache_resources {
		const struct netfs_cache_ops	*ops;
		void				*cache_priv;
		void				*cache_priv2;
	};

This contains an operations table pointer and two private pointers. The
operation table looks like the following::

	struct netfs_cache_ops {
		void (*end_operation)(struct netfs_cache_resources *cres);

		void (*expand_readahead)(struct netfs_cache_resources *cres,
					 loff_t *_start, size_t *_len, loff_t i_size);

		enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
						     loff_t i_size);

		int (*read)(struct netfs_cache_resources *cres,
			    loff_t start_pos,
			    struct iov_iter *iter,
			    bool seek_data,
			    netfs_io_terminated_t term_func,
			    void *term_func_priv);

		int (*prepare_write)(struct netfs_cache_resources *cres,
				     loff_t *_start, size_t *_len, loff_t i_size,
				     bool no_space_allocated_yet);

		int (*write)(struct netfs_cache_resources *cres,
			     loff_t start_pos,
			     struct iov_iter *iter,
			     netfs_io_terminated_t term_func,
			     void *term_func_priv);

		int (*query_occupancy)(struct netfs_cache_resources *cres,
				       loff_t start, size_t len, size_t granularity,
				       loff_t *_data_start, size_t *_data_len);
	};

With a termination handler function pointer::

	typedef void (*netfs_io_terminated_t)(void *priv,
					      ssize_t transferred_or_error,
					      bool was_async);

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction. This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request. ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.
   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.

 * ``read()``

   [Required] Called to read from the cache. The start file offset is given
   along with an iterator to read to, which gives the length also. It can be
   given a hint requesting that it seek forward from that start position for
   data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function. The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``prepare_write()``

   [Required] Called to prepare a write to the cache to take place. This
   involves checking to see whether the cache has sufficient space to honour
   the write. ``*_start`` and ``*_len`` indicate the region to be written; the
   region can be shrunk or it can be expanded to a page boundary either way as
   necessary to align for direct I/O. i_size holds the size of the object and
   is provided for reference. no_space_allocated_yet is set to true if the
   caller is certain that no data has been written to that region - for example
   if it tried to do a read from there already.

 * ``write()``

   [Required] Called to write to the cache. The start file offset is given
   along with an iterator to write from, which gives the length also.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function. The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``query_occupancy()``

   [Required] Called to find out where the next piece of data is within a
   particular region of the cache. The start and length of the region to be
   queried are passed in, along with the granularity to which the answer needs
   to be aligned. The function passes back the start and length of the data,
   if any, available within that region. Note that there may be a hole at the
   front.

   It returns 0 if some data was found, -ENODATA if there was no usable data
   within the region or -ENOBUFS if there is no caching on this file.

Note that these methods are passed a pointer to the cache resource structure,
not the read request structure, as they could also be used in other situations
where there isn't a read request structure, such as writing dirty data to the
cache.

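The ``query_occupancy()`` contract described above can be sketched with a
userspace model. This is only an illustration under a strong assumption: the
hypothetical ``struct model_cache`` holds a single contiguous run of data,
whereas a real cache tracks many. ``model_query_occupancy()`` is an
illustrative name, and the -ENOBUFS case is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cache content: one contiguous run of data per object. */
struct model_cache {
	long long data_start;	/* first cached byte */
	long long data_end;	/* one past the last cached byte */
};

/*
 * Report the granule-aligned run of cached data inside
 * [start, start + len).  Returns 0 and fills in the outputs if data was
 * found, or -ENODATA if the region holds no usable data.
 */
int model_query_occupancy(const struct model_cache *c,
			  long long start, size_t len, size_t granularity,
			  long long *_data_start, size_t *_data_len)
{
	long long g = (long long)granularity;
	long long end = start + (long long)len;
	long long lo = c->data_start > start ? c->data_start : start;
	long long hi = c->data_end < end ? c->data_end : end;

	assert(g > 0);
	lo = (lo + g - 1) / g * g;	/* round the start up to a granule */
	hi = hi / g * g;		/* round the end down to a granule */
	if (lo >= hi)
		return -ENODATA;

	*_data_start = lo;
	*_data_len = (size_t)(hi - lo);
	return 0;
}
```

The returned start may be later than the queried start, which corresponds to
the hole at the front mentioned in the method description.
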

API Function Reference
======================

.. kernel-doc:: include/linux/netfs.h
.. kernel-doc:: fs/netfs/buffered_read.c