===============================
FS-CACHE NETWORK FILESYSTEM API
===============================

There's an API by which a network filesystem can make use of the FS-Cache
facilities. This is based around a number of principles:

 (1) Caches can store a number of different object types. There are two main
     object types: indices and files. The first is a special type used by
     FS-Cache to make finding objects faster and to make retiring of groups of
     objects easier.

 (2) Every index, file or other object is represented by a cookie. This cookie
     may or may not have anything associated with it, but the netfs doesn't

 (3) Barring the top-level index (one entry per cached netfs), the index
     hierarchy for each netfs is structured according to the whim of the netfs.

This API is declared in <linux/fscache.h>.

This document contains the following sections:

	 (1) Network filesystem definition
	 (4) Network filesystem (un)registration
	 (6) Index registration
	 (7) Data file registration
	 (8) Miscellaneous object registration
	 (9) Setting the data file size
	(10) Page read/alloc/write
	(12) Index and data file update
	(13) Miscellaneous cookie operations
	(14) Cookie unregistration
	(15) Index invalidation
	(16) Data file invalidation
	(17) FS-Cache specific page flag


=============================
NETWORK FILESYSTEM DEFINITION
=============================

FS-Cache needs a description of the network filesystem. This is specified
using a record of the following structure:

	struct fscache_netfs {
		struct fscache_cookie *primary_index;

The first two fields should be filled in before registration, and the third
will be filled in by the registration function; any other fields should just be
ignored and are for internal use only.

 (1) The name of the netfs (used as the key in the toplevel index).

 (2) The version of the netfs (if the name matches but the version doesn't, the
     entire in-cache hierarchy for this netfs will be scrapped and begun
     afresh).

 (3) The cookie representing the primary index will be allocated according to
     another parameter passed into the registration function.

For example, kAFS (linux/fs/afs/) uses the following definitions to describe
itself:

	struct fscache_netfs afs_cache_netfs = {

Indices are used for two purposes:

 (1) To aid the finding of a file based on a series of keys (such as AFS's
     "cell", "volume ID", "vnode ID").

 (2) To make it easier to discard a subset of all the files cached based around
     a particular key - for instance to mirror the removal of an AFS volume.

However, since it's unlikely that any two netfs's are going to want to define
their index hierarchies in quite the same way, FS-Cache tries to impose as few
restraints as possible on how an index is structured and where it is placed in
the tree. The netfs can even mix indices and data files at the same level, but

Each index entry consists of a key of indeterminate length plus some auxiliary
data, also of indeterminate length.

There are some limits on indices:

 (1) Any index containing non-index objects should be restricted to a single
     cache. Any such objects created within an index will be created in the
     first cache only. The cache in which an index is created can be
     controlled by cache tags (see below).

 (2) The entry data must be atomically journallable, so it is limited to about
     400 bytes at present. At least 400 bytes will be available.

 (3) The depth of the index tree should be judged with care as the search
     function is recursive. Too many layers will run the kernel out of stack.

To define an object, a structure of the following type should be filled out:

	struct fscache_cookie_def

		struct fscache_cache_tag *(*select_cache)(
			const void *parent_netfs_data,
			const void *cookie_netfs_data);

		uint16_t (*get_key)(const void *cookie_netfs_data,

		void (*get_attr)(const void *cookie_netfs_data,

		uint16_t (*get_aux)(const void *cookie_netfs_data,

		enum fscache_checkaux (*check_aux)(void *cookie_netfs_data,

		void (*get_context)(void *cookie_netfs_data, void *context);

		void (*put_context)(void *cookie_netfs_data, void *context);

		void (*mark_pages_cached)(void *cookie_netfs_data,
					  struct address_space *mapping,
					  struct pagevec *cached_pvec);

		void (*now_uncached)(void *cookie_netfs_data);

This has the following fields:

 (1) The type of the object [mandatory].

     This is one of the following values:

	(*) FSCACHE_COOKIE_TYPE_INDEX

	    This defines an index, which is a special FS-Cache type.

	(*) FSCACHE_COOKIE_TYPE_DATAFILE

	    This defines an ordinary data file.

	(*) Any other value between 2 and 255

	    This defines an extraordinary object such as an XATTR.

 (2) The name of the object type (NUL terminated unless all 16 chars are used)

 (3) A function to select the cache in which to store an index [optional].

     This function is invoked when an index needs to be instantiated in a cache
     during the instantiation of a non-index object. Only the immediate index
     parent for the non-index object will be queried. Any indices above that
     in the hierarchy may be stored in multiple caches. This function does not
     need to be supplied for any non-index object or any index that will only

     If this function is not supplied or if it returns NULL then the first
     cache in the parent's list will be chosen, or failing that, the first
     cache in the master list.

 (4) A function to retrieve an object's key from the netfs [mandatory].

     This function will be called with the netfs data that was passed to the
     cookie acquisition function and the maximum length of key data that it may
     provide. It should write the required key data into the given buffer and
     return the quantity it wrote.

 (5) A function to retrieve attribute data from the netfs [optional].

     This function will be called with the netfs data that was passed to the
     cookie acquisition function. It should return the size of the file if
     this is a data file. The size may be used to govern how much space must
     be reserved for this file in the cache.

     If the function is absent, a file size of 0 is assumed.

 (6) A function to retrieve auxiliary data from the netfs [optional].

     This function will be called with the netfs data that was passed to the
     cookie acquisition function and the maximum length of auxiliary data that
     it may provide. It should write the auxiliary data into the given buffer
     and return the quantity it wrote.

     If this function is absent, the auxiliary data length will be set to 0.

     The length of the auxiliary data buffer may be dependent on the key
     length. A netfs mustn't rely on being able to provide more than 400 bytes

 (7) A function to check the auxiliary data [optional].

     This function will be called to check that a match found in the cache for
     this object is valid. For instance with AFS it could check the auxiliary
     data against the data version number returned by the server to determine
     whether the index entry in a cache is still valid.

     If this function is absent, it will be assumed that matching objects in a
     cache are always valid.

     If present, the function should return one of the following values:

	(*) FSCACHE_CHECKAUX_OKAY		- the entry is okay as is
	(*) FSCACHE_CHECKAUX_NEEDS_UPDATE	- the entry requires update
	(*) FSCACHE_CHECKAUX_OBSOLETE		- the entry should be deleted

     This function can also be used to extract data from the auxiliary data in
     the cache and copy it into the netfs's structures.

 (8) A pair of functions to manage contexts for the completion callback

     The cache read/write functions are passed a context which is then passed
     to the I/O completion callback function. To ensure this context remains
     valid until after the I/O completion is called, two functions may be
     provided: one to get an extra reference on the context, and one to drop a

     If the context is not used or is a type of object that won't go out of
     scope, then these functions are not required. These functions are not
     required for indices as indices may not contain data. These functions may
     be called in interrupt context and so may not sleep.

 (9) A function to mark a page as retaining cache metadata [optional].

     This is called by the cache to indicate that it is retaining in-memory
     information for this page and that the netfs should uncache the page when
     it has finished. This does not indicate whether there's data on the disk
     or not. Note that several pages at once may be presented for marking.

     The PG_fscache bit is set on the pages before this function would be
     called, so the function need not be provided if this is sufficient.

     This function is not required for indices as they're not permitted data.

(10) A function to unmark all the pages retaining cache metadata [mandatory].

     This is called by FS-Cache to indicate that a backing store is being
     unbound from a cookie and that all the marks on the pages should be
     cleared to prevent confusion. Note that the cache will have torn down all
     its tracking information so that the pages don't need to be explicitly

     This function is not required for indices as they're not permitted data.

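The key and auxiliary data operations described in (4), (6) and (7) might be
sketched as follows. This is a userspace illustration only, not kernel code:
struct demo_vnode and the demo_*() functions are hypothetical, the enum is a
stand-in for the one in <linux/fscache.h>, and the buffer and length
parameters following cookie_netfs_data are assumed from the descriptions
above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the kernel's enum, so the sketch builds alone. */
enum fscache_checkaux {
	FSCACHE_CHECKAUX_OKAY,
	FSCACHE_CHECKAUX_NEEDS_UPDATE,
	FSCACHE_CHECKAUX_OBSOLETE,
};

/* Hypothetical netfs record passed as netfs data at cookie acquisition. */
struct demo_vnode {
	uint32_t vnode_id;	/* used as the index key */
	uint64_t data_version;	/* kept in the cache as auxiliary data */
};

/* get_key: write the key into the buffer and return the length written.
 * Returning 0 when the key does not fit is this sketch's own choice. */
uint16_t demo_get_key(const void *cookie_netfs_data,
		      void *buffer, uint16_t bufmax)
{
	const struct demo_vnode *vnode = cookie_netfs_data;

	if (bufmax < sizeof(vnode->vnode_id))
		return 0;
	memcpy(buffer, &vnode->vnode_id, sizeof(vnode->vnode_id));
	return sizeof(vnode->vnode_id);
}

/* get_aux: same contract as get_key, but for the auxiliary data. */
uint16_t demo_get_aux(const void *cookie_netfs_data,
		      void *buffer, uint16_t bufmax)
{
	const struct demo_vnode *vnode = cookie_netfs_data;

	if (bufmax < sizeof(vnode->data_version))
		return 0;
	memcpy(buffer, &vnode->data_version, sizeof(vnode->data_version));
	return sizeof(vnode->data_version);
}

/* check_aux: compare the cached auxiliary data with the current version.
 * The (data, datalen) parameters are an assumption of this sketch. */
enum fscache_checkaux demo_check_aux(void *cookie_netfs_data,
				     const void *data, uint16_t datalen)
{
	const struct demo_vnode *vnode = cookie_netfs_data;
	uint64_t cached_version;

	if (datalen != sizeof(cached_version))
		return FSCACHE_CHECKAUX_OBSOLETE;
	memcpy(&cached_version, data, sizeof(cached_version));
	return cached_version == vnode->data_version ?
		FSCACHE_CHECKAUX_OKAY : FSCACHE_CHECKAUX_OBSOLETE;
}
```

Treating any version mismatch as obsolete is the simplest policy; a netfs that
can repair an entry in place might return FSCACHE_CHECKAUX_NEEDS_UPDATE
instead.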

===================================
NETWORK FILESYSTEM (UN)REGISTRATION
===================================

The first step is to declare the network filesystem to the cache. This also
involves specifying the layout of the primary index (for AFS, this would be the

The registration function is:

	int fscache_register_netfs(struct fscache_netfs *netfs);

It just takes a pointer to the netfs definition. It returns 0 or an error as

For kAFS, registration is done as follows:

	ret = fscache_register_netfs(&afs_cache_netfs);

The last step is, of course, unregistration:

	void fscache_unregister_netfs(struct fscache_netfs *netfs);

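The pairing of the two calls might look like the sketch below. It is a
userspace illustration: the two fscache functions are stubs carrying the
prototypes given above, the layout of struct fscache_netfs (field names and
types) is assumed, and "demofs" and the demo_*() functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins; field names and types are assumptions. */
struct fscache_cookie { int unused; };
struct fscache_netfs {
	uint32_t version;
	const char *name;
	struct fscache_cookie *primary_index;
};

static struct fscache_cookie stub_primary_index;
int stub_register_error;	/* set to a negative errno to make the stub fail */

/* Stub: fills in primary_index on success, as the text describes. */
int fscache_register_netfs(struct fscache_netfs *netfs)
{
	if (stub_register_error)
		return stub_register_error;
	netfs->primary_index = &stub_primary_index;
	return 0;
}

/* Stub: clearing primary_index is stub behaviour only. */
void fscache_unregister_netfs(struct fscache_netfs *netfs)
{
	netfs->primary_index = NULL;
}

/* Hypothetical netfs definition: only name and version are filled in. */
struct fscache_netfs demo_cache_netfs = {
	.name = "demofs",
	.version = 0,
};

/* Would be called from the netfs's initialisation path. */
int demo_cache_init(void)
{
	return fscache_register_netfs(&demo_cache_netfs);
}

/* Would be called once the netfs has finished with the cache. */
void demo_cache_exit(void)
{
	fscache_unregister_netfs(&demo_cache_netfs);
}
```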

FS-Cache permits the use of more than one cache. To permit particular index
subtrees to be bound to particular caches, the second step is to look up cache
representation tags. This step is optional; it can be left entirely up to
FS-Cache as to which cache should be used. The problem with doing that is that
FS-Cache will always pick the first cache that was registered.

To get the representation for a named tag:

	struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name);

This takes a text string as the name and returns a representation of a tag. It
will never return an error. It may return a dummy tag, however, if it runs out
of memory; this will inhibit caching with this tag.

Any representation so obtained must be released by passing it to this function:

	void fscache_release_cache_tag(struct fscache_cache_tag *tag);

The tag will be retrieved by FS-Cache when it calls the object definition
operation select_cache().

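One way to tie the tag lookup to select_cache() is sketched below. This is a
userspace illustration under stated assumptions: the two fscache functions are
stubs with the prototypes given above, the contents of struct
fscache_cache_tag are invented for the stub, and struct demo_volume and the
demo_*() functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in; the real structure is private to FS-Cache. */
struct fscache_cache_tag {
	const char *name;
	int usage;
};

static struct fscache_cache_tag stub_tag;

/* Stub lookup: never fails, hands out a single counted tag. */
struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name)
{
	stub_tag.name = name;
	stub_tag.usage++;
	return &stub_tag;
}

void fscache_release_cache_tag(struct fscache_cache_tag *tag)
{
	tag->usage--;
}

/* Hypothetical netfs record for an index that wants a particular cache. */
struct demo_volume {
	struct fscache_cache_tag *cache_tag;
};

void demo_volume_bind_cache(struct demo_volume *volume, const char *tag_name)
{
	volume->cache_tag = fscache_lookup_cache_tag(tag_name);
}

/* Every tag obtained by lookup must be released again. */
void demo_volume_unbind_cache(struct demo_volume *volume)
{
	if (volume->cache_tag) {
		fscache_release_cache_tag(volume->cache_tag);
		volume->cache_tag = NULL;
	}
}

/* select_cache: return the bound tag, or NULL to let FS-Cache choose. */
struct fscache_cache_tag *demo_select_cache(const void *parent_netfs_data,
					    const void *cookie_netfs_data)
{
	const struct demo_volume *volume = cookie_netfs_data;

	(void)parent_netfs_data;
	return volume->cache_tag;
}
```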

The third step is to inform FS-Cache about part of an index hierarchy that can
be used to locate files. This is done by requesting a cookie for each index in
the path to the file:

	struct fscache_cookie *
	fscache_acquire_cookie(struct fscache_cookie *parent,
			       const struct fscache_cookie_def *def,

This function creates an index entry in the index represented by parent,
filling in the index entry by calling the operations pointed to by def.

Note that this function never returns an error - all errors are handled
internally. It may, however, return NULL to indicate no cookie. It is quite
acceptable to pass this token back to this function as the parent to another
acquisition (or even to the relinquish cookie, read page and write page
functions - see below).

Note also that no indices are actually created in a cache until a non-index
object needs to be created somewhere down the hierarchy. Furthermore, an index
may be created in several different caches independently at different times.
This is all handled transparently, and the netfs doesn't see any of it.

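Because a NULL cookie may be passed straight back in as a parent, a chain of
acquisitions needs no error checks in between. The sketch below shows this in
userspace: fscache_acquire_cookie() is a stub, its third parameter (the netfs
data) and the contents of the two fscache structures are assumptions, and the
demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins; the structure contents are assumptions. */
struct fscache_cookie_def { const char *name; };
struct fscache_cookie {
	struct fscache_cookie *parent;
	const struct fscache_cookie_def *def;
	void *netfs_data;
};

/* Stub: returns NULL (no cookie) when there is no parent or no memory. */
struct fscache_cookie *
fscache_acquire_cookie(struct fscache_cookie *parent,
		       const struct fscache_cookie_def *def,
		       void *netfs_data)
{
	struct fscache_cookie *cookie;

	if (!parent)
		return NULL;
	cookie = calloc(1, sizeof(*cookie));
	if (!cookie)
		return NULL;
	cookie->parent = parent;
	cookie->def = def;
	cookie->netfs_data = netfs_data;
	return cookie;
}

/* Hypothetical two-level hierarchy: cell index -> volume index. */
struct demo_cell { struct fscache_cookie *cache; };
struct demo_volume {
	struct demo_cell *cell;
	struct fscache_cookie *cache;
};

const struct fscache_cookie_def demo_cell_index_def = { "demo.cell" };
const struct fscache_cookie_def demo_volume_index_def = { "demo.volume" };

void demo_cell_get_cookie(struct demo_cell *cell,
			  struct fscache_cookie *primary_index)
{
	cell->cache = fscache_acquire_cookie(primary_index,
					     &demo_cell_index_def, cell);
}

void demo_volume_get_cookie(struct demo_volume *volume)
{
	/* cell->cache may be NULL; it is simply passed on as the parent. */
	volume->cache = fscache_acquire_cookie(volume->cell->cache,
					       &demo_volume_index_def, volume);
}
```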

For example, with AFS, a cell would be added to the primary index. This index
entry would have a dependent inode containing a volume location index for the
volume mappings within this cell:

		fscache_acquire_cookie(afs_cache_netfs.primary_index,
				       &afs_cell_cache_index_def,

Then when a volume location was accessed, it would be entered into the cell's
index and an inode would be allocated that acts as a volume type and hash chain

		fscache_acquire_cookie(cell->cache,
				       &afs_vlocation_cache_index_def,

And then a particular flavour of volume (R/O for example) could be added to
that index, creating another index for vnodes (AFS inode equivalents):

		fscache_acquire_cookie(vlocation->cache,
				       &afs_volume_cache_index_def,


======================
DATA FILE REGISTRATION
======================

The fourth step is to request a data file be created in the cache. This is
identical to index cookie acquisition. The only difference is that the type in
the object definition should be something other than index type.

		fscache_acquire_cookie(volume->cache,
				       &afs_vnode_cache_object_def,


=================================
MISCELLANEOUS OBJECT REGISTRATION
=================================

An optional step is to request an object of miscellaneous type be created in
the cache. This is almost identical to index cookie acquisition. The only
difference is that the type in the object definition should be something other
than index type. Whilst the parent object could be an index, it's more likely
it would be some other type of object such as a data file.

		fscache_acquire_cookie(vnode->cache,
				       &afs_xattr_cache_object_def,

Miscellaneous objects might be used to store extended attributes or directory


==========================
SETTING THE DATA FILE SIZE
==========================

The fifth step is to set the physical attributes of the file, such as its size.
This doesn't automatically reserve any space in the cache, but permits the
cache to adjust its metadata for data tracking appropriately:

	int fscache_attr_changed(struct fscache_cookie *cookie);

The cache will return -ENOBUFS if there is no backing cache or if there is no
space to allocate any extra metadata required in the cache. The attributes
will be accessed with the get_attr() cookie definition operation.

Note that attempts to read or write data pages in the cache over this size may
be rebuffed with -ENOBUFS.

This operation schedules an attribute adjustment to happen asynchronously at
some point in the future, and as such, it may happen after the function returns
to the caller. The attribute adjustment excludes read and write operations.


=====================
PAGE READ/ALLOC/WRITE
=====================

And the sixth step is to store and retrieve pages in the cache. There are
three functions that are used to do this.

 (1) A page should not be re-read or re-allocated without uncaching it first.

 (2) A read or allocated page must be uncached when the netfs page is released

 (3) A page should only be written to the cache if previously read or
     allocated.

This permits the cache to maintain its page tracking in proper order.

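The three rules amount to a small per-page state machine, sketched below in
plain C. The state and operation names are hypothetical and FS-Cache exposes
no such function; rejecting an uncache of an untracked page is this sketch's
own choice rather than a documented rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page tracking state as seen from the netfs. */
enum demo_page_state {
	DEMO_PAGE_UNTRACKED,	/* never read/allocated, or uncached since */
	DEMO_PAGE_TRACKED,	/* read or allocated and not yet uncached */
};

enum demo_page_op {
	DEMO_OP_READ_OR_ALLOC,
	DEMO_OP_WRITE,
	DEMO_OP_UNCACHE,
};

/* Returns true, updating *state, if the rules permit op in this state. */
bool demo_page_apply(enum demo_page_state *state, enum demo_page_op op)
{
	switch (op) {
	case DEMO_OP_READ_OR_ALLOC:
		/* Rule 1: no re-read or re-alloc without uncaching first. */
		if (*state != DEMO_PAGE_UNTRACKED)
			return false;
		*state = DEMO_PAGE_TRACKED;
		return true;
	case DEMO_OP_WRITE:
		/* Rule 3: only write a page that was read or allocated. */
		return *state == DEMO_PAGE_TRACKED;
	case DEMO_OP_UNCACHE:
		/* Rule 2: a tracked page is uncached before it goes away. */
		if (*state != DEMO_PAGE_TRACKED)
			return false;
		*state = DEMO_PAGE_UNTRACKED;
		return true;
	}
	return false;
}
```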

Firstly, the netfs should ask FS-Cache to examine the caches and read the
contents cached for a particular page of a particular file if present, or else
allocate space to store the contents if not:

	void (*fscache_rw_complete_t)(struct page *page,

	int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
				       fscache_rw_complete_t end_io_func,

The cookie argument must specify a cookie for an object that isn't an index,
the page specified will have the data loaded into it (and is also used to
specify the page number), and the gfp argument is used to control how any
memory allocations made are satisfied.

If the cookie indicates the inode is not cached:

 (1) The function will return -ENOBUFS.

Else if there's a copy of the page resident in the cache:

 (1) The mark_pages_cached() cookie operation will be called on that page.

 (2) The function will submit a request to read the data from the cache's
     backing device directly into the page specified.

 (3) The function will return 0.

 (4) When the read is complete, end_io_func() will be invoked with:

     (*) The netfs data supplied when the cookie was created.

     (*) The page descriptor.

     (*) The context argument passed to the above function. This will be
         maintained with the get_context/put_context functions mentioned above.

     (*) An argument that's 0 on success or negative for an error code.

     If an error occurs, it should be assumed that the page contains no usable

     end_io_func() will be called in process context if the read results in
     an error, but it might be called in interrupt context if the read is

Otherwise, if there's not a copy available in cache, but the cache may be able

 (1) The mark_pages_cached() cookie operation will be called on that page.

 (2) A block may be reserved in the cache and attached to the object at the

 (3) The function will return -ENODATA.

This function may also return -ENOMEM or -EINTR, in which case it won't have
read any data from the cache.

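A netfs readpage implementation has to branch on these return values. The
sketch below captures one plausible policy as a pure function; the enum, the
function name and the choice of what to do in each case are assumptions of
this example, not something FS-Cache mandates.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical follow-up actions for a netfs readpage implementation. */
enum demo_read_action {
	DEMO_WAIT_FOR_CACHE,	/* read submitted; end_io_func() will run */
	DEMO_FETCH_AND_STORE,	/* block allocated; fetch, then write to cache */
	DEMO_FETCH_ONLY,	/* not cacheable; fetch from the server only */
	DEMO_FAIL,		/* nothing was read; propagate the error */
};

/* Map a fscache_read_or_alloc_page() return value to a follow-up action. */
enum demo_read_action demo_classify_read(int ret)
{
	switch (ret) {
	case 0:
		return DEMO_WAIT_FOR_CACHE;
	case -ENODATA:
		return DEMO_FETCH_AND_STORE;
	case -ENOBUFS:
		return DEMO_FETCH_ONLY;
	default:
		/* -ENOMEM, -EINTR and anything unexpected. */
		return DEMO_FAIL;
	}
}
```

Keeping the classification separate from the I/O makes it easy to reuse for
the multi-page variant, which reports the same codes.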

Alternatively, if there's not expected to be any data in the cache for a page
because the file has been extended, a block can simply be allocated instead:

	int fscache_alloc_page(struct fscache_cookie *cookie,

This is similar to the fscache_read_or_alloc_page() function, except that it
never reads from the cache. It will return 0 if a block has been allocated,
rather than -ENODATA as the other would. One or the other must be performed
before writing to the cache.

The mark_pages_cached() cookie operation will be called on the page if


Secondly, if the netfs changes the contents of the page (either due to an
initial download or if a user performs a write), then the page should be
written back to the cache:

	int fscache_write_page(struct fscache_cookie *cookie,

The cookie argument must specify a data file cookie, the page specified should
contain the data to be written (and is also used to specify the page number),
and the gfp argument is used to control how any memory allocations made are
satisfied.

The page must have first been read or allocated successfully and must not have
been uncached before writing is performed.

If the cookie indicates the inode is not cached then:

 (1) The function will return -ENOBUFS.

Else if space can be allocated in the cache to hold this page:

 (1) PG_fscache_write will be set on the page.

 (2) The function will submit a request to write the data to the cache's
     backing device directly from the page specified.

 (3) The function will return 0.

 (4) When the write is complete PG_fscache_write is cleared on the page and
     anyone waiting for that bit will be woken up.

Else if there's no space available in the cache, -ENOBUFS will be returned. It
is also possible for the PG_fscache_write bit to be cleared when no write took
place if unforeseen circumstances arose (such as a disk error).

Writing takes place asynchronously.


A facility is provided to read several pages at once, as requested by the
readpages() address space operation:

	int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
					struct address_space *mapping,
					struct list_head *pages,
					fscache_rw_complete_t end_io_func,

This works in a similar way to fscache_read_or_alloc_page(), except:

 (1) Any page it can retrieve data for is removed from pages and nr_pages and
     dispatched for reading to the disk. Reads of adjacent pages on disk may
     be merged for greater efficiency.

 (2) The mark_pages_cached() cookie operation will be called on several pages
     at once if they're being read or allocated.

 (3) If there was a general error, then that error will be returned.

     Else if some pages couldn't be allocated or read, then -ENOBUFS will be
     returned.

     Else if some pages couldn't be read but were allocated, then -ENODATA will
     be returned.

     Otherwise, if all pages had reads dispatched, then 0 will be returned, the
     list will be empty and *nr_pages will be 0.

 (4) end_io_func will be called once for each page being read as the reads
     complete. It will be called in process context if error != 0, but it may
     be called in interrupt context if there is no error.

Note that a return of -ENODATA, -ENOBUFS or any other error does not preclude
some of the pages being read and some being allocated. Those pages will have
been marked appropriately and will need uncaching.


To uncache a page, this function should be called:

	void fscache_uncache_page(struct fscache_cookie *cookie,

This function permits the cache to release any in-memory representation it
might be holding for this netfs page. This function must be called once for
each page on which the read or write page functions above have been called to
make sure the cache's in-memory tracking information gets torn down.

Note that pages can't be explicitly deleted from a data file. The whole
data file must be retired (see the relinquish cookie function below).

Furthermore, note that this does not cancel the asynchronous read or write
operation started by the read/alloc and write functions, so the page
invalidation functions must use:

	bool fscache_check_page_write(struct fscache_cookie *cookie,

to see if a page is being written to the cache, and:

	void fscache_wait_on_page_write(struct fscache_cookie *cookie,

to wait for it to finish if it is.

When releasepage() is being implemented, a special FS-Cache function exists to
manage the heuristics of coping with vmscan trying to eject pages, which may
conflict with the cache trying to write pages to the cache (which may itself
need to allocate memory):

	bool fscache_maybe_release_page(struct fscache_cookie *cookie,

This takes the netfs cookie, and the page and gfp arguments as supplied to
releasepage(). It will return false if the page cannot be released yet for
some reason and if it returns true, the page has been uncached and can now be
released.

To make a page available for release, this function may wait for an outstanding
storage request to complete, or it may attempt to cancel the storage request -
in which case the page will not be stored in the cache this time.

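A releasepage() implementation built on this function might be shaped like the
sketch below. It is a userspace illustration: struct page, gfp_t and
fscache_maybe_release_page() are stand-ins, the stub's behaviour (refuse while
a write is in flight) is a simplification, its page and gfp parameters are
taken from the description above, and demo_releasepage() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel types. */
typedef unsigned int gfp_t;
struct fscache_cookie { int unused; };
struct page {
	bool fscache;		/* models PG_fscache */
	bool being_written;	/* models a write to the cache in flight */
};

/* Stub: refuses while a write is outstanding, otherwise uncaches. */
bool fscache_maybe_release_page(struct fscache_cookie *cookie,
				struct page *page, gfp_t gfp)
{
	(void)cookie;
	(void)gfp;
	if (page->being_written)
		return false;
	page->fscache = false;
	return true;
}

/* Hypothetical releasepage(): returns 1 if the page may be freed. */
int demo_releasepage(struct fscache_cookie *cookie,
		     struct page *page, gfp_t gfp)
{
	/* Only pages the cache knows about need its consent. */
	if (page->fscache && !fscache_maybe_release_page(cookie, page, gfp))
		return 0;
	return 1;
}
```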

BULK INODE PAGE UNCACHE
-----------------------

A convenience routine is provided to perform an uncache on all the pages
attached to an inode. This assumes that the pages on the inode correspond on a
1:1 basis with the pages in the cache.

	void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
					     struct inode *inode);

This takes the netfs cookie that the pages were cached with and the inode that
the pages are attached to. This function will wait for pages to finish being
written to the cache and for the cache to finish with the page generally. No


==========================
INDEX AND DATA FILE UPDATE
==========================

To request an update of the index data for an index or other object, the
following function should be called:

	void fscache_update_cookie(struct fscache_cookie *cookie);

This function will refer back to the netfs_data pointer stored in the cookie by
the acquisition function to obtain the data to write into each revised index
entry. The update method in the parent index definition will be called to

Note that partial updates may happen automatically at other times, such as when
data blocks are added to a data file object.


===============================
MISCELLANEOUS COOKIE OPERATIONS
===============================

There are a number of operations that can be used to control cookies:

	int fscache_pin_cookie(struct fscache_cookie *cookie);
	void fscache_unpin_cookie(struct fscache_cookie *cookie);

     These operations permit data cookies to be pinned into the cache and to
     have the pinning removed. They are not permitted on index cookies.

     The pinning function will return 0 if successful, -ENOBUFS if the cookie
     isn't backed by a cache, -EOPNOTSUPP if the cache doesn't support pinning,
     -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
     -EIO if there's any other problem.

 (*) Data space reservation:

	int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);

     This permits a netfs to request cache space be reserved to store up to the
     given amount of a file. It is permitted to ask for more than the current
     size of the file to allow for future file expansion.

     If size is given as zero then the reservation will be cancelled.

     The function will return 0 if successful, -ENOBUFS if the cookie isn't
     backed by a cache, -EOPNOTSUPP if the cache doesn't support reservations,
     -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
     -EIO if there's any other problem.

     Note that this doesn't pin an object in a cache; it can still be culled to
     make space if it's not in use.


=====================
COOKIE UNREGISTRATION
=====================

To get rid of a cookie, this function should be called:

	void fscache_relinquish_cookie(struct fscache_cookie *cookie,

If retire is non-zero, then the object will be marked for recycling, and all
copies of it will be removed from all active caches in which it is present.
Not only that but all child objects will also be retired.

If retire is zero, then the object may be available again when next the
acquisition function is called. Retirement here will overrule the pinning on a

One very important note - relinquish must NOT be called for a cookie unless all
the cookies for "child" indices, objects and pages have been relinquished

There is no direct way to invalidate an index subtree. To do this, the caller
should relinquish and retire the cookie they have, and then acquire a new one.

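The relinquish-and-reacquire sequence might be sketched as follows. This is a
userspace illustration: both fscache functions are stubs, the retire parameter
is assumed to be an int as the text above implies, the netfs data parameter of
the acquisition function is assumed, and the demo_* names and the stub's id
counter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins; the structure contents are assumptions. */
struct fscache_cookie_def { const char *name; };
struct fscache_cookie {
	unsigned int id;	/* stub-only: tells old and new cookies apart */
	struct fscache_cookie *parent;
};

static unsigned int stub_next_id = 1;
unsigned int stub_retired;	/* cookies relinquished with retire set */

struct fscache_cookie *
fscache_acquire_cookie(struct fscache_cookie *parent,
		       const struct fscache_cookie_def *def,
		       void *netfs_data)
{
	struct fscache_cookie *cookie;

	(void)def;
	(void)netfs_data;
	if (!parent)
		return NULL;
	cookie = calloc(1, sizeof(*cookie));
	if (!cookie)
		return NULL;
	cookie->id = stub_next_id++;
	cookie->parent = parent;
	return cookie;
}

void fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire)
{
	if (!cookie)
		return;
	if (retire)
		stub_retired++;
	free(cookie);
}

struct demo_volume { struct fscache_cookie *cache; };
const struct fscache_cookie_def demo_volume_index_def = { "demo.volume" };

/* Invalidate the subtree under a volume index. The caller must already
 * have relinquished every cookie below volume->cache. */
void demo_volume_invalidate(struct demo_volume *volume,
			    struct fscache_cookie *parent)
{
	fscache_relinquish_cookie(volume->cache, 1);
	volume->cache = fscache_acquire_cookie(parent,
					       &demo_volume_index_def, volume);
}
```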

======================
DATA FILE INVALIDATION
======================

Sometimes it will be necessary to invalidate an object that contains data.
Typically this will be necessary when the server tells the netfs of a foreign
change - at which point the netfs has to throw away all the state it had for an
inode and reload from the server.

To indicate that a cache object should be invalidated, the following function
should be called:

	void fscache_invalidate(struct fscache_cookie *cookie);

This can be called with spinlocks held as it defers the work to a thread pool.
All extant storage, retrieval and attribute change ops at this point are
cancelled and discarded. Some future operations will be rejected until the
cache has had a chance to insert a barrier in the operations queue. After
that, operations will be queued again behind the invalidation operation.

The invalidation operation will perform an attribute change operation and an
auxiliary data update operation as it is very likely these will have changed.

Using the following function, the netfs can wait for the invalidation operation
to have reached a point at which it can start submitting ordinary operations

	void fscache_wait_on_invalidate(struct fscache_cookie *cookie);


===========================
FS-CACHE SPECIFIC PAGE FLAG
===========================

FS-Cache makes use of a page flag, PG_private_2, for its own purpose. This is
given the alternative name PG_fscache.

PG_fscache is used to indicate that the page is known by the cache, and that
the cache must be informed if the page is going to go away. It's an indication
to the netfs that the cache has an interest in this page, where an interest may
be a pointer to it, resources allocated or reserved for it, or I/O in progress

The netfs can use this information in methods such as releasepage() to
determine whether it needs to uncache a page or update it.

Furthermore, if this bit is set, releasepage() and invalidatepage() operations
will be called on a page to get rid of it, even if PG_private is not set. This
allows caching to be attempted on a page before read_cache_pages() is called
after fscache_read_or_alloc_pages(), as the former will try to release pages it
was given under certain circumstances.

This bit does not overlap with other flags such as PG_private. This means that
FS-Cache can be used with a filesystem that uses the block buffering code.

There are a number of operations defined on this flag:

	int PageFsCache(struct page *page);
	void SetPageFsCache(struct page *page);
	void ClearPageFsCache(struct page *page);
	int TestSetPageFsCache(struct page *page);
	int TestClearPageFsCache(struct page *page);

These functions are bit test, bit set, bit clear, bit test and set and bit
test and clear operations on PG_fscache.

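The semantics of the five operations can be modelled in userspace as below.
This is only a model: struct page is a stand-in, the bit position
DEMO_PG_FSCACHE is hypothetical, and unlike real page flag operations these
helpers are not atomic.

```c
#include <assert.h>

/* Hypothetical bit position; the kernel uses PG_private_2 for PG_fscache. */
#define DEMO_PG_FSCACHE (1UL << 0)

/* Userspace stand-in for struct page. */
struct page {
	unsigned long flags;
};

/* Bit test. */
int PageFsCache(struct page *page)
{
	return (page->flags & DEMO_PG_FSCACHE) != 0;
}

/* Bit set. */
void SetPageFsCache(struct page *page)
{
	page->flags |= DEMO_PG_FSCACHE;
}

/* Bit clear. */
void ClearPageFsCache(struct page *page)
{
	page->flags &= ~DEMO_PG_FSCACHE;
}

/* Bit test and set: returns the previous value of the bit. */
int TestSetPageFsCache(struct page *page)
{
	int was_set = PageFsCache(page);

	SetPageFsCache(page);
	return was_set;
}

/* Bit test and clear: returns the previous value of the bit. */
int TestClearPageFsCache(struct page *page)
{
	int was_set = PageFsCache(page);

	ClearPageFsCache(page);
	return was_set;
}
```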