docs/user/drivers/fs_modules.dox

   1 /*
   2  * Copyright 2007 Haiku Inc. All rights reserved.
   3  * Distributed under the terms of the MIT License.
   4  *
   5  * Authors:
   6  *   Ingo Weinhold
   7  */
   8
   9  /*!
  10         \page fs_modules File System Modules
  11
  12         To support a particular file system (FS), a kernel module implementing a
  13         special interface (\c file_system_module_info defined in \c <fs_interface.h>)
  14         has to be provided. As with any other module, the \c std_ops() hook is
  15         invoked with \c B_MODULE_INIT directly after the FS module has been loaded
  16         by the kernel, and with \c B_MODULE_UNINIT before it is unloaded, thus
  17         providing a simple mechanism for one-time module initialization. The same
  18         module is used for accessing any volume of that FS type.
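
The init/uninit protocol described above can be sketched as follows. This is
a minimal illustration, not the real kernel interface: the typedefs and
constant values are stand-ins so that the snippet compiles on its own, the
hook takes only the operation code, and \c sModuleReady is a hypothetical
piece of module-wide state.

```c
#include <stdint.h>

// Stand-ins for declarations that normally come from the kernel headers.
// The typedef and the values are placeholders for illustration only.
typedef int32_t status_t;
enum { B_OK = 0, B_ERROR = -1 };
enum { B_MODULE_INIT = 1, B_MODULE_UNINIT = 2 };

// Hypothetical module-wide state that is set up once per module load.
static int sModuleReady = 0;

// Invoked with B_MODULE_INIT right after the module has been loaded and
// with B_MODULE_UNINIT right before it is unloaded.
static status_t
example_std_ops(int32_t op)
{
    switch (op) {
        case B_MODULE_INIT:
            sModuleReady = 1;
            return B_OK;
        case B_MODULE_UNINIT:
            sModuleReady = 0;
            return B_OK;
        default:
            return B_ERROR;
    }
}
```

Anything that is specific to a single volume does not belong here, since the
same module instance serves every volume of that FS type.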
  19
  20         \section objects File System Objects
  21
  22         There are several types of objects a FS module has to deal with directly or
  23         indirectly:
  24
  25         - A \em volume is an instance of a file system. For a disk-based file
  26           system it corresponds to a disk, partition, or disk image file. When
  27           mounting a volume, the virtual file system layer (VFS) assigns it a unique
  28           number (ID, of type \c dev_t), and the file system provides a handle (type
  29           \c void*) for it. The VFS creates an instance of struct \c fs_volume that
  30           stores these two, an operation vector (\c fs_volume_ops), and other
  31           volume-related items.
  32           Whenever the FS is asked to perform an operation the \c fs_volume object
  33           is supplied, and whenever the FS requests a volume-related service from
  34           the kernel, it also has to pass the \c fs_volume object or, in some cases,
  35           just the volume ID.
  36           Normally the handle is a pointer to a data structure the FS allocates to
  37           associate data with the volume.
  38
  39         - A \em node is contained by a volume. It can be of type file, directory, or
  40           symbolic link (symlink). Just like volumes, nodes are associated with an
  41           ID (type \c ino_t) and, if in use, also with a handle (type \c void*).
  42           As for volumes, the VFS creates an instance of a structure (\c fs_vnode)
  43           for each node in use, storing the FS's handle for the node and an
  44           operation vector (\c fs_vnode_ops).
  45           Unlike the volume ID, the node ID is defined by the FS.
  46           It often has a meaning to the FS, e.g. file systems using inodes might
  47           choose the inode number corresponding to the node. As long as the volume
  48           is mounted and the node is known to the VFS, its node ID must not change.
  49           The node handle is again a pointer to a data structure allocated by the
  50           FS.
  51
  52         - A \em vnode (VFS node) is the VFS representation of a node. A volume may
  53           contain a great number of nodes, but at any given time only a few are
  54           represented by vnodes, usually only those that are currently in use
  55           (sometimes a few more).
  56
  57         - An \em entry (directory entry) belongs to a directory, has a name, and
  58           refers to a node. It is important to understand the difference between
  59           entries and nodes: A node doesn't have a name; only the entries that refer
  60           to it do. If a FS allows more than one entry to refer to a single node, it
  61           is also said to support "hard links". It is possible that no entry refers
  62           to a node. This happens when a node (e.g. a file) is still open, but the
  63           last entry referring to it has been removed (the node will be deleted
  64           when it is closed). While entries are to be understood as
  65           independent entities, the FS interface does not use IDs or handles to
  66           refer to them; it always uses directory and entry name pairs to do that.
  67
  68         - An \em attribute is a named and typed data container belonging to a node.
  69           A node may have any number of attributes; they are organized in a
  70           (depending on the FS, virtual or actually existing) attribute directory,
  71           through which one can iterate.
  72
  73         - An \em index is supposed to provide fast searching capabilities for
  74           attributes with a certain name. A volume's index directory allows for
  75           iterating through the indices.
  76
  77         - A \em query is a fully virtual object for searching for entries via an
  78           expression matching entry name, node size, node modification date, and/or
  79           node attributes. The mechanism of retrieving the entries found by a query
  80           is similar to that for reading a directory's contents. A query can be
  81           live, in which case the FS notifies the creator of the query whenever an
  82           entry starts or stops matching the query expression.
  83
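The distinction between entries and nodes can be made concrete with a small
model. The sketch below is purely illustrative: all \c example_* names are
hypothetical and none of them is part of the FS interface. It shows a node
that carries an ID but no name, entries identified by a directory and a name,
and a node that survives the removal of its last entry until it is closed.

```c
#include <stdint.h>

typedef int64_t example_ino_t;  // stand-in for ino_t

// A node has an ID but no name.
struct example_node {
    example_ino_t id;
    int link_count;  // number of entries referring to the node
    int open_count;  // number of times the node is currently open
    int deleted;     // set once the node's storage has been released
};

// An entry is identified by (directory ID, name) and refers to a node.
struct example_entry {
    example_ino_t directory;
    const char* name;
    struct example_node* node;
};

// Release the node once no entry refers to it and nobody has it open.
static void
example_maybe_delete(struct example_node* node)
{
    if (node->link_count == 0 && node->open_count == 0)
        node->deleted = 1;
}

// Creating a second entry for the same node is what a hard link is.
static void
example_add_entry(struct example_entry* entry, example_ino_t directory,
    const char* name, struct example_node* node)
{
    entry->directory = directory;
    entry->name = name;
    entry->node = node;
    node->link_count++;
}

static void
example_remove_entry(struct example_entry* entry)
{
    entry->node->link_count--;
    example_maybe_delete(entry->node);
}

static void
example_close_node(struct example_node* node)
{
    node->open_count--;
    example_maybe_delete(node);
}
```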
  84
  85         \section concepts Generic Concepts
  86
  87         A FS module has to (or can) provide quite a lot of hook functions. There are
  88         a few concepts that apply to several groups of them:
  89
  90         - <em>Opening, Closing, and Cookies</em>: Many FS objects can be opened and
  91           closed, namely nodes in general, directories, attribute directories,
  92           attributes, the index directory, and queries. In each case there are three
  93           hook functions: <tt>open*()</tt>, <tt>close*()</tt>, and
  94           <tt>free*_cookie()</tt>. The <tt>open*()</tt> hook is passed all that is
  95           needed to identify the object to be opened and, in some cases, additional
  96           parameters, e.g. specifying a particular opening mode. The implementation
  97           is required to return a cookie (type \c void*), usually a pointer to a
  98           data structure the FS allocates. In some cases (e.g.
  99           when an iteration state is associated with the cookie) a new cookie must
 100           be allocated for each instance of opening the object. The cookie is passed
 101           to all hooks that operate on the opened object. The <tt>close*()</tt>
 102           hook is invoked to signal that the cookie is to be closed. At this point
 103           the cookie might still be in use. Blocking FS hooks (e.g. blocking
 104           read/write operations) using the same cookie have to be unblocked. When
 105           the cookie stops being in use the <tt>free*_cookie()</tt> hook is called;
 106           it has to free the cookie.
 107
 108         - <em>Entry Iteration</em>: For the FS objects serving as containers for
 109           other objects, i.e. directories, attribute directories, the index
 110           directory, and queries, the cookie mechanism is used for a stateful
 111           iteration through the contained objects. The <tt>read_*()</tt> hook reads
 112           the next one or more entries into a <tt>struct dirent</tt> buffer. The
 113           <tt>rewind_*()</tt> hook resets the iteration state to the first entry.
 114
 115         - <em>Stat Information</em>: In the case of nodes, attributes, and indices,
 116           detailed information about an object is requested via a
 117           <tt>read*_stat()</tt> hook and must be written into a <tt>struct stat</tt>
 118           buffer.
 119
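The cookie life cycle and the stateful iteration described above might look
roughly like this for a directory. It is a simplified sketch under stated
assumptions: the hooks use reduced, hypothetical signatures, the directory is
a fixed in-memory list of names, and the read hook copies a plain name instead
of filling in a <tt>struct dirent</tt>.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical in-memory directory: a fixed list of entry names.
struct example_dir {
    const char* const* names;
    int count;
};

// Per-open iteration state; one is allocated for every open_dir() call.
struct example_dir_cookie {
    int next;  // index of the next entry to return
};

// open_dir(): allocate a fresh cookie so that each opener iterates
// independently. Returns 0 on success, -1 on allocation failure.
static int
example_open_dir(struct example_dir* dir, void** _cookie)
{
    struct example_dir_cookie* cookie = malloc(sizeof(*cookie));
    (void)dir;
    if (cookie == NULL)
        return -1;
    cookie->next = 0;
    *_cookie = cookie;
    return 0;
}

// read_dir(): copy the next name into the buffer. Sets *_count to the
// number of entries returned (0 at the end of the directory).
static int
example_read_dir(struct example_dir* dir, void* _cookie, char* buffer,
    size_t bufferSize, int* _count)
{
    struct example_dir_cookie* cookie = _cookie;
    if (cookie->next >= dir->count) {
        *_count = 0;
        return 0;
    }
    const char* name = dir->names[cookie->next];
    if (strlen(name) + 1 > bufferSize)
        return -1;
    strcpy(buffer, name);
    cookie->next++;
    *_count = 1;
    return 0;
}

// rewind_dir(): reset the iteration state to the first entry.
static int
example_rewind_dir(void* _cookie)
{
    ((struct example_dir_cookie*)_cookie)->next = 0;
    return 0;
}

// close_dir(): the cookie may still be in use, so nothing is freed here.
static int
example_close_dir(void* _cookie)
{
    (void)_cookie;
    return 0;
}

// free_dir_cookie(): the cookie is no longer in use, so release it.
static int
example_free_dir_cookie(void* _cookie)
{
    free(_cookie);
    return 0;
}
```

Keeping the iteration index in the cookie rather than in the directory object
is what allows two openers to walk the same directory without disturbing each
other.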
 120
 121         \section vnodes VNodes
 122
 123         A vnode is the VFS representation of a node. As soon as an access to a node
 124         is requested, the VFS creates a corresponding vnode. The requesting entity
 125         gets a reference to the vnode for the time it works with the vnode and
 126         releases the reference when done. When the last reference to a vnode has
 127         been surrendered, the vnode is unused and the VFS can decide to destroy it
 128         (usually it is cached for a while longer).
 129
 130         When the VFS creates a vnode, it invokes the volume's
 131         \link fs_volume_ops::get_vnode get_vnode() \endlink
 132         hook to let it create the respective node handle (unless the FS requests the
 133         creation of the vnode explicitly by calling publish_vnode()). That's the
 134         only hook that specifies a node by ID; all other node-related hooks are
 135         defined in the respective node's operation vector and they are passed the
 136         respective \c fs_vnode object. When the VFS deletes the vnode, it invokes
 137         the node's \link fs_vnode_ops::put_vnode put_vnode() \endlink
 138         hook or, if the node was marked removed,
 139         \link fs_vnode_ops::remove_vnode remove_vnode() \endlink.
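
The reference counting and the choice between the two destruction hooks can
be modeled as below. This is a hypothetical, heavily simplified model and not
the VFS implementation: the \c example_* names are invented, the hook
stand-ins only count their invocations, and the vnode is destroyed as soon as
it becomes unused instead of being cached for a while.

```c
#include <stdbool.h>

// Hypothetical, heavily simplified model of a vnode.
struct example_vnode {
    int ref_count;     // references held by users of the vnode
    bool removed;      // node was marked removed while in use
    int put_calls;     // how often the put_vnode() stand-in ran
    int remove_calls;  // how often the remove_vnode() stand-in ran
};

// Stand-ins for the FS hooks invoked when the vnode is destroyed.
static void
example_put_vnode(struct example_vnode* vnode)
{
    vnode->put_calls++;
}

static void
example_remove_vnode(struct example_vnode* vnode)
{
    vnode->remove_calls++;
}

// A requesting entity gets a reference for the time it works with the vnode.
static void
example_acquire(struct example_vnode* vnode)
{
    vnode->ref_count++;
}

// Drop one reference. When the last one is gone, this model destroys the
// vnode right away and picks the hook based on the removed flag.
static void
example_release(struct example_vnode* vnode)
{
    if (--vnode->ref_count > 0)
        return;
    if (vnode->removed)
        example_remove_vnode(vnode);
    else
        example_put_vnode(vnode);
}
```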
 140
 141         There are only four FS hooks through which the VFS gains knowledge of the
 142         existence of a node. The first one is the
 143         \link file_system_module_info::mount mount() \endlink
 144         hook. It is supposed to call \c publish_vnode() for the root node of the
 145         volume and return its ID. The second one is the
 146         \link fs_vnode_ops::lookup lookup() \endlink
 147         hook. Given a \c fs_vnode object of a directory and an entry name, it is
 148         supposed to call \c get_vnode() for the node the entry refers to and return
 149         the node ID.
 150         The remaining two hooks,
 151         \link fs_vnode_ops::read_dir read_dir() \endlink and
 152         \link fs_volume_ops::read_query read_query() \endlink,
 153         both return entries in a <tt>struct dirent</tt> structure, which also
 154         contains the ID of the node the entry refers to.
 155
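A \c lookup() implementation following the description above could be
structured like this. The sketch assumes a hypothetical in-memory directory
and uses reduced signatures; \c example_get_vnode() is only a counting
stand-in for the kernel's \c get_vnode() and does not reproduce its real
parameters.

```c
#include <stdint.h>
#include <string.h>

typedef int64_t example_ino_t;  // stand-in for ino_t

struct example_entry {
    const char* name;
    example_ino_t node_id;
};

struct example_directory {
    const struct example_entry* entries;
    int count;
};

// Stand-in for the kernel's get_vnode(); it only counts invocations here.
static int sGetVnodeCalls = 0;

static int
example_get_vnode(example_ino_t id)
{
    (void)id;
    sGetVnodeCalls++;
    return 0;
}

// lookup(): resolve a name within a directory. On success a reference to
// the node's vnode has been acquired and its ID is returned in *_id.
static int
example_lookup(const struct example_directory* dir, const char* name,
    example_ino_t* _id)
{
    for (int i = 0; i < dir->count; i++) {
        if (strcmp(dir->entries[i].name, name) != 0)
            continue;
        if (example_get_vnode(dir->entries[i].node_id) != 0)
            return -1;
        *_id = dir->entries[i].node_id;
        return 0;
    }
    return -1;  // no such entry
}
```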
 156
 157         \section mandatory_hooks Mandatory Hooks
 158
 159         Which hooks a FS module should provide mainly depends on what functionality
 160         it features. E.g. a FS without support for attributes, indices, and/or
 161         queries can omit the respective hooks (i.e. set them to \c NULL in the
 162         module, \c fs_volume_ops, and \c fs_vnode_ops structures). Some hooks are
 163         mandatory, though. A minimal read-only FS module must implement:
 164
 165         - \link file_system_module_info::mount mount() \endlink and
 166           \link fs_volume_ops::unmount unmount() \endlink:
 167           Mounting and unmounting a volume is required for pretty obvious reasons.
 168
 169         - \link fs_vnode_ops::lookup lookup() \endlink:
 170           The VFS uses this hook to resolve path names. It is probably one of the
 171           most frequently invoked hooks.
 172
 173         - \link fs_volume_ops::get_vnode get_vnode() \endlink and
 174           \link fs_vnode_ops::put_vnode put_vnode() \endlink:
 175           Create or destroy, respectively, the FS's private node handle when
 176           the VFS creates/deletes the vnode for a particular node.
 177
 178         - \link fs_vnode_ops::read_stat read_stat() \endlink:
 179           Return a <tt>struct stat</tt> info for the given node, consisting of the
 180           type and size of the node, its owner and access permissions, as well as
 181           certain access times.
 182
 183         - \link fs_vnode_ops::open open() \endlink,
 184           \link fs_vnode_ops::close close() \endlink, and
 185           \link fs_vnode_ops::free_cookie free_cookie() \endlink:
 186           Open and close a node as explained in \ref concepts.
 187
 188         - \link fs_vnode_ops::read read() \endlink:
 189           Read data from an opened node (file). Even if the FS does not feature
 190           files, the hook has to be present anyway; it should return an error in
 191           this case.
 192
 193         - \link fs_vnode_ops::open_dir open_dir() \endlink,
 194           \link fs_vnode_ops::close_dir close_dir() \endlink, and
 195           \link fs_vnode_ops::free_dir_cookie free_dir_cookie() \endlink:
 196           Open and close a directory for entry iteration as explained in
 197           \ref concepts.
 198
 199         - \link fs_vnode_ops::read_dir read_dir() \endlink and
 200           \link fs_vnode_ops::rewind_dir rewind_dir() \endlink:
 201           Read the next entry/entries from a directory, or reset the iterator to
 202           the first entry, respectively, as explained in \ref concepts.
 203
 204         Although not strictly mandatory, a FS should additionally implement the
 205         following hooks:
 206
 207         - \link fs_volume_ops::read_fs_info read_fs_info() \endlink:
 208           Return general information about the volume, e.g. total and free size, and
 209           what special features (attributes, MIME types, queries) the volume/FS
 210           supports.
 211
 212         - \link fs_vnode_ops::read_symlink read_symlink() \endlink:
 213           Read the value of a symbolic link. Needed only if the FS and volume
 214           support symbolic links at all. If absent, symbolic links stored on the
 215           volume won't be interpreted.
 216
 217         - \link fs_vnode_ops::access access() \endlink:
 218           Return whether the current user has the given access permissions for a
 219           node. If the hook is absent, the user is considered to have all
 220           permissions.
 221
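Leaving optional hooks unset can be pictured with a cut-down operation
vector. The structure below is hypothetical and much smaller than the real
\c fs_vnode_ops; the hook signatures are reduced for illustration. The helper
shows how a caller might treat a missing \c access() hook, matching the rule
that the user is then considered to have all permissions.

```c
#include <stddef.h>

// Hypothetical, cut-down operation vector with three hooks.
struct example_vnode_ops {
    int (*read_stat)(void* node);
    int (*access)(void* node, int mode);
    int (*read_symlink)(void* node, char* buffer, size_t size);
};

static int
example_read_stat(void* node)
{
    (void)node;
    return 0;
}

// A minimal read-only FS: optional hooks are simply left NULL.
static struct example_vnode_ops sExampleOps = {
    .read_stat = example_read_stat,
    .access = NULL,
    .read_symlink = NULL
};

// How a caller might handle a missing access() hook: without the hook the
// user is considered to have all permissions.
static int
example_check_access(const struct example_vnode_ops* ops, void* node,
    int mode)
{
    if (ops->access == NULL)
        return 0;
    return ops->access(node, mode);
}
```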
 222
 223         \section permissions Checking Access Permission
 224
 225         While there is the \link fs_vnode_ops::access access() \endlink hook
 226         that explicitly checks access permission for a node, it is not used by the
 227         VFS to check access permissions for the other hooks. There are two reasons:
 228         It could be cheaper for the FS to do that in the respective hook (at least
 229         it's not more expensive), and the FS can make sure that there are no race
 230         conditions between the check and the start of the operation for the hook.
 231         The downside is that in most hooks the FS has to check those permissions.
 232         It is possible to simplify things a bit, though:
 233
 234         - For operations that require the file system object in question (node,
 235           directory, index, attribute, attribute directory, query) to be open, most
 236           of the checks can already be done in the respective <tt>open*()</tt> hook.
 237           E.g. in fs_vnode_ops::read() or fs_vnode_ops::write() one only has to
 238           check whether the file has been opened for reading/writing, not whether
 239           the current process has the respective permissions.
 240
 241         - The core of the fs_vnode_ops::access() hook can be moved into a private
 242           function that can be easily reused in other hooks to check the permissions
 243           for the respective operations. In most cases this will reduce permission
 244           checking to one or two additional "if"s in the hooks where it is required.
 245
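Both simplifications can be combined as in the following sketch. It assumes
POSIX-style owner/group/other permission bits and invented \c example_*
names; treating user ID 0 as allowed to do everything is a simplification of
this sketch, not a statement about how a real FS must behave.

```c
#include <stdint.h>

// Access modes; the values match the usual R_OK/W_OK/X_OK constants.
enum { EXAMPLE_R = 4, EXAMPLE_W = 2, EXAMPLE_X = 1 };

struct example_node {
    uint32_t mode;  // permission bits, e.g. 0644
    uint32_t uid;
    uint32_t gid;
};

// Private helper holding the core of the access() hook, so that other
// hooks can reuse it. Returns 0 if access is granted, -1 otherwise.
static int
example_check_permissions(const struct example_node* node, uint32_t uid,
    uint32_t gid, int accessMode)
{
    uint32_t bits;
    if (uid == 0)
        bits = 7;  // simplification: the superuser may do everything
    else if (uid == node->uid)
        bits = (node->mode >> 6) & 7;
    else if (gid == node->gid)
        bits = (node->mode >> 3) & 7;
    else
        bits = node->mode & 7;
    return ((uint32_t)accessMode & ~bits) == 0 ? 0 : -1;
}

// Cookie created by open(); it remembers how the node was opened.
struct example_cookie {
    int open_mode;  // combination of EXAMPLE_R and EXAMPLE_W
};

// open(): perform the full permission check once.
static int
example_open(const struct example_node* node, uint32_t uid, uint32_t gid,
    int openMode, struct example_cookie* cookie)
{
    if (example_check_permissions(node, uid, gid, openMode) != 0)
        return -1;
    cookie->open_mode = openMode;
    return 0;
}

// write(): only needs to check how the node was opened.
static int
example_write(const struct example_cookie* cookie)
{
    return (cookie->open_mode & EXAMPLE_W) != 0 ? 0 : -1;
}
```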
 246
 247         \section node_monitoring Node Monitoring
 248
 249         One of the nice features of Haiku's API is an easy way to monitor
 250         directories or nodes for changes. That is, one can register to watch a
 251         given node for certain modification events and will get a notification
 252         message whenever one of those events occurs. While other parts of the
 253         operating system do the actual notification message delivery, it is the
 254         responsibility of each file system to announce changes. It has to use the
 255         following functions to do that:
 256
 257         - notify_entry_created(): A directory entry has been created.
 258
 259         - notify_entry_removed(): A directory entry has been removed.
 260
 261         - notify_entry_moved(): A directory entry has been renamed and/or moved
 262           to another directory.
 263
 264         - notify_stat_changed(): One or more members of a node's stat data have
 265           changed. E.g. the \c st_size member changes when the file is truncated or
 266           data have been written to it beyond its former size. The modification time
 267           (\c st_mtime) changes whenever a node is write-accessed. To avoid a flood
 268           of messages for small and frequent write operations on an open file the
 269           file system can limit the number of notifications and mark them with the
 270           B_WATCH_INTERIM_STAT flag. When closing a modified file a notification
 271           without that flag should be issued.
 272
 273
 274         - notify_attribute_changed(): An attribute of a node has been added,
 275           removed, or changed.
 276
 277         If the file system supports queries, it needs to call the following
 278         functions to make live queries work:
 279
 280         - notify_query_entry_created(): A change caused an entry that didn't match
 281           the query predicate before to match now.
 282
 283         - notify_query_entry_removed(): A change caused an entry that matched
 284           the query predicate before to no longer match.
 285
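Limiting stat notifications for frequent writes might be done along these
lines. Everything here is an assumption made for illustration: the flag value
is a placeholder rather than the real \c B_WATCH_INTERIM_STAT constant,
\c example_notify_stat_changed() merely records what would be sent and does
not mirror the parameters of notify_stat_changed(), and the interval is an
arbitrary choice.

```c
#include <stdbool.h>
#include <stdint.h>

// Placeholder flag; the real B_WATCH_INTERIM_STAT value comes from the
// system headers and is not reproduced here.
enum { EXAMPLE_INTERIM_STAT = 0x1000 };

// Stand-in for notify_stat_changed(); it records what would be sent.
static int sNotifications = 0;
static uint32_t sLastFlags = 0;

static void
example_notify_stat_changed(uint32_t flags)
{
    sNotifications++;
    sLastFlags = flags;
}

struct example_file {
    int64_t last_notification;  // time of the last interim notification
    bool modified;              // set by writes, cleared when closed
};

// Hypothetical minimum interval between interim notifications.
static const int64_t kExampleInterval = 1000000;

// Called after each write; sends at most one interim notification per
// interval.
static void
example_file_written(struct example_file* file, int64_t now)
{
    file->modified = true;
    if (now - file->last_notification < kExampleInterval)
        return;
    file->last_notification = now;
    example_notify_stat_changed(EXAMPLE_INTERIM_STAT);
}

// Called when the file is closed; a modified file gets one final
// notification without the interim flag.
static void
example_file_closed(struct example_file* file)
{
    if (!file->modified)
        return;
    file->modified = false;
    example_notify_stat_changed(0);
}
```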
 286
 287         \section caches Caches
 288
 289         The Haiku kernel provides three kinds of caches that can be used by a
 290         file system implementation to speed up file system operations:
 291
 292         - <em>Block cache</em>: Interesting for disk-based file systems. The device
 293           the file system volume is located on is considered to be divided into
 294           equally sized blocks of data that can be accessed via the block cache API
 295           (e.g. block_cache_get() and block_cache_put()). As long as the system has
 296           enough memory the block cache will keep all blocks that have been accessed
 297           in memory, thus allowing further accesses to be very fast.
 298           The block cache also has transaction support, which is of interest for
 299           journaled file systems.
 300
 301         - <em>File cache</em>: Stores file contents. The FS can decide to create
 302           a file cache for any of its files. The fs_vnode_ops::read() and
 303           fs_vnode_ops::write() hooks can then simply be implemented by calling the
 304           file_cache_read() or file_cache_write() function, respectively, which will
 305           read the data from/write the data to the file cache. For reading uncached
 306           data or writing back cached data to the file, the file cache will invoke
 307           the fs_vnode_ops::io() hook.
 308           Only files for which the file cache is used can be memory mapped (cf.
 309           mmap()).
 310
 311         - <em>Entry cache</em>: Can be used to speed up resolving paths. Normally
 312           the VFS will call the fs_vnode_ops::lookup() hook for each element of the
 313           path to be resolved, which, depending on the file system, can be more or
 314           less expensive. When the FS uses the entry cache, those calls will be
 315           avoided most of the time. All the file system has to do is invoke the
 316           entry_cache_add() function when it encounters an entry that might not yet
 317           be known to the entry cache and entry_cache_remove() when a directory
 318           entry has been removed.
 319           The entry cache can also be used for negative caching. If the file system
 320           determines that the requested entry is not present during a lookup, it can
 321           cache this lookup failure by calling entry_cache_add_missing(). Further
 322           calls to fs_vnode_ops::lookup() for the missing entry will then be
 323           avoided.
 324           Note that it is safe to call entry_cache_add() and
 325           entry_cache_add_missing() with the same directory/name pair previously
 326           given to either function to update a cache entry, without needing to call
 327           entry_cache_remove() first. It is also safe to call entry_cache_remove()
 328           for pairs that have never been added to the cache.
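
The update and removal semantics of the entry cache, including negative
caching, can be illustrated with a toy model. This is not the kernel's entry
cache and it does not use the real entry_cache_add(),
entry_cache_add_missing(), or entry_cache_remove() signatures: the
\c example_* functions, the fixed table size, and the name length limit are
all assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef int64_t example_ino_t;  // stand-in for ino_t

enum { EXAMPLE_CACHE_SIZE = 8 };

struct example_cache_entry {
    bool used;
    bool missing;  // true: the name is known not to exist
    example_ino_t directory;
    char name[32];
    example_ino_t node;
};

static struct example_cache_entry sCache[EXAMPLE_CACHE_SIZE];

static struct example_cache_entry*
example_find(example_ino_t directory, const char* name)
{
    for (int i = 0; i < EXAMPLE_CACHE_SIZE; i++) {
        if (sCache[i].used && sCache[i].directory == directory
            && strcmp(sCache[i].name, name) == 0)
            return &sCache[i];
    }
    return NULL;
}

// Insert or update the entry for (directory, name). Pass missing = true
// to cache a failed lookup. No prior removal is needed for an update.
static bool
example_store(example_ino_t directory, const char* name, example_ino_t node,
    bool missing)
{
    struct example_cache_entry* entry = example_find(directory, name);
    for (int i = 0; entry == NULL && i < EXAMPLE_CACHE_SIZE; i++) {
        if (!sCache[i].used)
            entry = &sCache[i];
    }
    if (entry == NULL || strlen(name) >= sizeof(entry->name))
        return false;  // cache full or name too long; caching is optional
    entry->used = true;
    entry->missing = missing;
    entry->directory = directory;
    strcpy(entry->name, name);
    entry->node = node;
    return true;
}

// Removing a pair that was never added is harmless.
static void
example_remove(example_ino_t directory, const char* name)
{
    struct example_cache_entry* entry = example_find(directory, name);
    if (entry != NULL)
        entry->used = false;
}

// Returns 1 for a cached hit, 0 for a cached miss, and -1 if nothing is
// known, in which case the lookup() hook would have to be asked.
static int
example_cached_lookup(example_ino_t directory, const char* name,
    example_ino_t* _node)
{
    struct example_cache_entry* entry = example_find(directory, name);
    if (entry == NULL)
        return -1;
    if (entry->missing)
        return 0;
    *_node = entry->node;
    return 1;
}
```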
 329 */
 330
 331 // TODO:
 332 //      * FS layers