<html>
<body bgcolor=white>
<h1>Node Monitoring</h1>
<h6>
Creation Date: January 16, 2003<br>
Author(s): Axel D&ouml;rfler
</h6>
<p>
This document describes the BeOS kernel's ability to monitor nodes. First,
it explains what kind of functionality we have to reproduce (along with the
higher-level API), then it presents the implementation in OpenBeOS.
</p>
<h2>Requirements - Exported Functionality in BeOS</h2>

<p>
From user level, BeOS exports the following API, as found in the
storage/NodeMonitor.h header file:
</p>
<pre>
status_t watch_node(const node_ref *node,
            uint32 flags,
            BMessenger target);

status_t watch_node(const node_ref *node,
            uint32 flags,
            const BHandler *handler,
            const BLooper *looper = NULL);

status_t stop_watching(BMessenger target);

status_t stop_watching(const BHandler *handler,
            const BLooper *looper = NULL);
</pre>
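<p>
To illustrate the user-level API, here is a minimal sketch of starting and stopping
a watch on a single file; the path and the use of be_app_messenger are just
examples, not taken from the original implementation:
</p>

<pre>
#include &lt;Application.h&gt;
#include &lt;Entry.h&gt;
#include &lt;NodeMonitor.h&gt;

/* Sketch: watch one file for stat changes; notifications arrive as
   BMessages at the application's main looper. */
status_t
StartWatchingExample()
{
    BEntry entry("/boot/home/example-file");
    node_ref nodeRef;
    status_t status = entry.GetNodeRef(&amp;nodeRef);
    if (status != B_OK)
        return status;

    return watch_node(&amp;nodeRef, B_WATCH_STAT, be_app_messenger);
}

/* Later, all watching by that target can be cancelled at once: */
status_t
StopWatchingExample()
{
    return stop_watching(be_app_messenger);
}
</pre>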
<p>
The kernel also exports two other functions to be used from file system add-ons
that cause the kernel to send out notification messages:
</p>
<pre>
int notify_listener(int op, nspace_id nsid,
            vnode_id vnida, vnode_id vnidb,
            vnode_id vnidc, const char *name);

int send_notification(port_id port, long token,
            ulong what, long op, nspace_id nsida,
            nspace_id nsidb, vnode_id vnida,
            vnode_id vnidb, vnode_id vnidc,
            const char *name);
</pre>
<p>
The latter is only used for live query updates, but is obviously called by
the former. The port/token pair identifies a unique BLooper/BHandler pair, and
is used internally to address those high-level objects from the kernel.
</p>
<p>
When a file system calls the <code>notify_listener()</code> function, the kernel
checks whether there are monitors for that node which meet the specified constraints,
and it calls <code>send_notification()</code> for every single message to be sent.
</p>
<p>
Each of the parameters <code>vnida - vnidc</code> has a dedicated meaning:
</p>
<ul>
<li><b>vnida:</b> the parent directory of the "main" node</li>
<li><b>vnidb:</b> the target parent directory for a move</li>
<li><b>vnidc:</b> the node that has triggered the notification to be sent</li>
</ul>
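<p>
As an example, a file system's rename hook could announce a move as follows; this
is only a sketch, and <code>nsid</code>, <code>oldDirID</code>, <code>newDirID</code>
and <code>movedID</code> are hypothetical variables of the calling file system:
</p>

<pre>
/* Sketch: "name" was moved from one directory to another. */
notify_listener(B_ENTRY_MOVED, nsid,
    oldDirID,   /* vnida: the old parent directory */
    newDirID,   /* vnidb: the new parent directory */
    movedID,    /* vnidc: the node that was moved */
    "name");
</pre>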
<p>
The flags parameter in <code>watch_node()</code> understands the following constants:
</p>
<ul>
<li><b>B_STOP_WATCHING</b><br>
watch_node() will stop watching the specified node.</li>
<li><b>B_WATCH_NAME</b><br>
name changes are notified through a B_ENTRY_MOVED opcode.</li>
<li><b>B_WATCH_STAT</b><br>
changes to the node's stat structure are notified with a B_STAT_CHANGED code.</li>
<li><b>B_WATCH_ATTR</b><br>
attribute changes will cause a B_ATTR_CHANGED message to be sent.</li>
<li><b>B_WATCH_DIRECTORY</b><br>
notifies on changes made to the specified directory, i.e. B_ENTRY_REMOVED, B_ENTRY_CREATED</li>
<li><b>B_WATCH_ALL</b><br>
is a short-hand for all of the watch flags above.</li>
<li><b>B_WATCH_MOUNT</b><br>
causes B_DEVICE_MOUNTED and B_DEVICE_UNMOUNTED notifications to be sent.</li>
</ul>
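<p>
On the receiving side, the notifications arrive as B_NODE_MONITOR messages whose
"opcode" field contains one of the constants mentioned above. A sketch of a
handler follows; <code>MyHandler</code> is, of course, hypothetical:
</p>

<pre>
void
MyHandler::MessageReceived(BMessage *message)
{
    if (message->what == B_NODE_MONITOR) {
        int32 opcode;
        if (message->FindInt32("opcode", &amp;opcode) == B_OK) {
            switch (opcode) {
                case B_ENTRY_MOVED:
                    // a watched entry was renamed or moved
                    break;
                case B_STAT_CHANGED:
                    // e.g. the size or modification time changed
                    break;
                case B_ATTR_CHANGED:
                    // an attribute was written or removed
                    break;
            }
        }
        return;
    }
    BHandler::MessageReceived(message);
}
</pre>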
<p>
Node monitors are maintained per team - every team can have up to 4096 monitors,
although there is a private kernel call to raise this limit (which, for example,
Tracker uses intensively).
</p>
<p>
The kernel is able to send the BMessages directly to the specified BLooper and BHandler;
it achieves this using the application kit's token mechanism. The message is constructed
manually in the kernel; it doesn't use any application kit services.
</p>
<br>
<h2>Meeting the Requirements in an Optimal Way - Implementation in OpenBeOS</h2>
<p>
If you assume that every file operation could trigger a notification message to be sent,
it's clear that the node monitoring system must be optimized for sending messages. For
every call to <code>notify_listener()</code>, the kernel must check if there are any
monitors for the node that was updated.
</p>
<p>
Those monitors are put into a hash table which has the device number and the vnode ID
as keys. Each of the monitors maintains a list of listeners which specify which port/token
pair should be notified for what change. Since the vnodes are created and deleted as
needed by the kernel, the node monitor is maintained independently from them; a simple
pointer from a vnode to its monitor is not possible.
</p>
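<p>
A sketch of what the hash key computation over such a device/vnode pair could look
like follows; the actual function used by the kernel may well differ:
</p>

<pre>
/* Sketch only: fold the 64-bit vnode ID and mix in the device number. */
static uint32
monitor_hash(mount_id device, vnode_id node)
{
    return (uint32)(node ^ (node >> 32)) ^ (uint32)device;
}
</pre>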
<p>
The main structures that are involved in providing the node monitoring functionality
look like this:
</p>
<pre>
struct monitor_listener {
    monitor_listener *next;
    monitor_listener *prev;
    list_link       monitor_link;
    port_id         port;
    int32           token;
    uint32          flags;
    node_monitor    *monitor;
};

struct node_monitor {
    node_monitor    *next;
    mount_id        device;
    vnode_id        node;
    struct list     listeners;
};
</pre>
<p>
The relevant part of the I/O context structure is this:
</p>
<pre>
struct io_context {
    ...
    struct list     node_monitors;
    uint32          num_monitors;
    uint32          max_monitors;
};
</pre>
<p>
If you call <code>watch_node()</code> on a file with a flags parameter other than
B_STOP_WATCHING, the following will happen in the node monitor:
</p>
<ol>
<li>The <code>add_node_monitor()</code> function does a hash lookup for the
    device/vnode pair. If there is no <code>node_monitor</code> yet for this pair,
    a new one will be created.</li>
<li>The list of listeners is scanned for the provided port/token pair (the
    BLooper/BHandler pointer will already have been translated in user space), and
    the new flags are OR'd into the old flags field, or a new <code>monitor_listener</code>
    is created if necessary - in the latter case, the team's node monitor
    counter is incremented. A simplified sketch of this logic follows below.</li>
</ol>
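<p>
In code, the logic of these two steps might look roughly like this; the helper
functions <code>get_monitor_for()</code> and <code>find_listener()</code> are
illustrative names, not the real kernel symbols:
</p>

<pre>
static status_t
add_node_monitor(io_context *context, mount_id device, vnode_id node,
    uint32 flags, port_id port, int32 token)
{
    // 1. hash lookup; creates and inserts a new monitor if needed
    node_monitor *monitor = get_monitor_for(device, node);
    if (monitor == NULL)
        return B_NO_MEMORY;

    // 2. merge the flags if this port/token pair already listens
    monitor_listener *listener = find_listener(monitor, port, token);
    if (listener != NULL) {
        listener->flags |= flags;
        return B_OK;
    }

    // otherwise create a new listener and count it against the team
    if (context->num_monitors >= context->max_monitors)
        return B_ERROR;

    /* allocate the listener, link it into both the monitor's and the
       I/O context's list, and increment context->num_monitors */
    ...
    return B_OK;
}
</pre>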
<p>
If it's called with B_STOP_WATCHING, the reverse operation takes effect, and the
<code>monitor</code> field is used to check if the monitor doesn't have any listeners
left, in which case it will be removed.
</p>
<p>
Note the presence of the <code>max_monitors</code> field - the kernel itself does not
expose a hard limit to userland applications; the listeners are simply maintained in
a doubly-linked list.
</p>
<p>
If a team is shut down, all listeners from its I/O context will be removed - since every
listener stores a pointer to its monitor, determining the monitors that can be removed
because of this operation is very cheap.
</p>
<p>
The <code>notify_listener()</code> function also only does a hash lookup for the
device/node pair it got from the file system, and sends out as many notifications as
specified by the listeners of the monitor that belongs to that node.
</p>
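<p>
A sketch of that notification path follows; helper names like <code>hash_lookup()</code>,
<code>next_listener()</code>, and <code>flags_for_op()</code> are again illustrative,
and the parameter meanings follow <code>notify_listener()</code> above:
</p>

<pre>
static void
deliver_notifications(int op, nspace_id nsid, vnode_id vnida,
    vnode_id vnidb, vnode_id vnidc, const char *name)
{
    // one hash lookup for the device/node pair affected by "op"
    node_monitor *monitor = hash_lookup(nsid, vnida);
    if (monitor == NULL)
        return;

    // walk the monitor's listeners; each one names a port/token target
    monitor_listener *listener = NULL;
    while ((listener = next_listener(monitor, listener)) != NULL) {
        if ((listener->flags &amp; flags_for_op(op)) != 0) {
            send_notification(listener->port, listener->token,
                B_NODE_MONITOR, op, nsid, 0, vnida, vnidb, vnidc, name);
        }
    }
}
</pre>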
<p>
If a node is deleted from the disk, the corresponding <code>node_monitor</code> and its
listeners will be removed as well, to prevent watching a new file that accidentally
happens to get the same device/node pair (as is possible with BFS, for example).
</p>
<br>
<h2>Differences Between Both Implementations</h2>
<p>
Although the aim was to create a completely compatible monitoring implementation,
there are some notable differences between the two.
</p>
<p>
BeOS reserves a certain number of slots for calls to <code>watch_node()</code> - each
call to that function will use one slot, even if you call it twice for the same node.
OpenBeOS, however, will always use one slot per node - you could call <code>watch_node()</code>
several times, but you would waste only one slot.
</p>
<p>
While this is an implementation detail, it also causes a change in behaviour for
applications: in BeOS, applications will get one message for every <code>watch_node()</code>
call; in OpenBeOS, you'll get only one message per node. If an application relies
on this strange behaviour of the BeOS kernel, it will no longer work correctly.
</p>
<p>
The other difference is that OpenBeOS exports its node monitoring functionality to
kernel modules as well, and provides an extra plain C API for them to use.
</p>
<br>
<h2>And Beyond?</h2>
<p>
The current implementation directly iterates over all listeners and sends out
notifications as required, synchronously, in the context of the thread that triggered
the notification to be sent.
</p>
<p>
If a node monitor needs to send out several messages, this could theoretically greatly
decrease file system performance. To optimize for this case, the required data of a
notification could be put into a queue and be sent by a dedicated worker thread. Since
this requires an additional copy operation and reserved address space for the queue,
this optimization could be more expensive than the current implementation, depending
on the usage pattern of the node monitoring mechanism.
</p>
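<p>
A sketch of what the data kept in such a queue could look like - all names are
hypothetical, as this variant has not been implemented:
</p>

<pre>
struct queued_notification {
    queued_notification *next;
    port_id     port;       // the listener to notify
    int32       token;
    int32       op;         // what happened
    nspace_id   device;
    vnode_id    nodes[3];   // vnida - vnidc
    char        name[B_FILE_NAME_LENGTH];
        // copied, since the file system's buffer may go away
};
</pre>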
<p>
With BFS, it would be possible to introduce the ability to automatically watch all
files in a specified directory. While this would be very convenient at application level,
it comes with several disadvantages:
</p>
<ol>
<li>This feature might not be easily achievable for many file systems; a file system
    must be able to retrieve a node by ID only - it might not be feasible to find
    out about the parent directory for many file systems.</li>
<li>Although it could potentially save node monitors, it might cause the kernel to
    send out a lot more messages to the application than it needs. Given the restriction
    the kernel imposes on the number of watched nodes for a team, the application's
    designer might try to be much stricter with the number of monitors his application
    will consume.</li>
</ol>
<p>
While 1.) might be a real show stopper, 2.) is almost invalidated by Tracker's
usage of node monitors; it consumes a monitor for every entry it displays, which might
be several thousand. Implementing this feature would not only greatly speed up maintaining
this massive number of monitors, and cut down memory usage, but would also ease the
implementation at application level.
</p>
<p>
Even 1.) could be solved if the kernel could ask a file system whether it supports
this particular feature; it could then automatically monitor all files in that directory
without adding complexity to the application using this feature. Of course,
the effort to provide this functionality is much larger in that case - but for
applications like Tracker, the complexity would be removed from the application
without extra cost.
</p>
<p>
However, none of the feature extensions discussed here have been implemented for the
currently developed version R1 of OpenBeOS.
</p>
</body>
</html>