On Tue, Nov 06, 2007 at 02:33:53AM -0800, akpm@linux-foundation.org wrote:
[mmotm.git] / fs / reiser4 / znode.c
blob4ff97147c710fb14d5228b89af00aefbf0c95832
1 /* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
2 * reiser4/README */
3 /* Znode manipulation functions. */
4 /* Znode is the in-memory header for a tree node. It is stored
5 separately from the node itself so that it does not get written to
6 disk. In this respect znode is like buffer head or page head. We
7 also use znodes for additional reiser4 specific purposes:
9 . they are organized into tree structure which is a part of whole
10 reiser4 tree.
11 . they are used to implement node grained locking
12 . they are used to keep additional state associated with a
13 node
14 . they contain links to lists used by the transaction manager
16 Znode is attached to some variable "block number" which is instance of
17 fs/reiser4/tree.h:reiser4_block_nr type. Znode can exist without
18 appropriate node being actually loaded in memory. Existence of znode itself
19 is regulated by reference count (->x_count) in it. Each time thread
20 acquires reference to znode through call to zget(), ->x_count is
21 incremented and decremented on call to zput(). Data (content of node) are
22 brought in memory through call to zload(), which also increments ->d_count
23 reference counter. zload can block waiting on IO. Call to zrelse()
24 decreases this counter. Also, ->c_count keeps track of number of child
25 znodes and prevents parent znode from being recycled until all of its
26 children are. ->c_count is decremented whenever child goes out of existence
27 (being actually recycled in zdestroy()) which can be some time after last
28 reference to this child dies if we support some form of LRU cache for
29 znodes.
32 /* EVERY ZNODE'S STORY
34 1. His infancy.
36 Once upon a time, the znode was born deep inside of zget() by call to
37 zalloc(). At the return from zget() znode had:
39 . reference counter (x_count) of 1
40 . assigned block number, marked as used in bitmap
41 . pointer to parent znode. Root znode parent pointer points
42 to its father: "fake" znode. This, in turn, has NULL parent pointer.
43 . hash table linkage
44 . no data loaded from disk
45 . no node plugin
46 . no sibling linkage
48 2. His childhood
50 Each node is either brought into memory as a result of tree traversal, or
51 created afresh, creation of the root being a special case of the latter. In
52 either case it's inserted into sibling list. This will typically require
53 some ancillary tree traversing, but ultimately both sibling pointers will
54 exist and JNODE_LEFT_CONNECTED and JNODE_RIGHT_CONNECTED will be true in
55 zjnode.state.
57 3. His youth.
59 If znode is bound to already existing node in a tree, its content is read
60 from the disk by call to zload(). At that moment, JNODE_LOADED bit is set
61 in zjnode.state and zdata() function starts to return non null for this
62 znode. zload() further calls zparse() that determines which node layout
63 this node is rendered in, and sets ->nplug on success.
65 If znode is for new node just created, memory for it is allocated and
66 zinit_new() function is called to initialise data, according to selected
67 node layout.
69 4. His maturity.
71 After this point, znode lingers in memory for some time. Threads can
72 acquire references to znode either by blocknr through call to zget(), or by
73 following a pointer to unallocated znode from internal item. Each time
74 reference to znode is obtained, x_count is increased. Thread can read/write
75 lock znode. Znode data can be loaded through calls to zload(), d_count will
76 be increased appropriately. If all references to znode are released
77 (x_count drops to 0), znode is not recycled immediately. Rather, it is
78 still cached in the hash table in the hope that it will be accessed
79 shortly.
81 There are two ways in which znode existence can be terminated:
83 . sudden death: node bound to this znode is removed from the tree
84 . overpopulation: znode is purged out of memory due to memory pressure
86 5. His death.
88 Death is complex process.
90 When we irrevocably commit ourselves to decision to remove node from the
91 tree, JNODE_HEARD_BANSHEE bit is set in zjnode.state of corresponding
92 znode. This is done either in ->kill_hook() of internal item or in
93 reiser4_kill_root() function when tree root is removed.
95 At this moment znode still has:
97 . locks held on it, necessary write ones
98 . references to it
99 . disk block assigned to it
100 . data loaded from the disk
101 . pending requests for lock
103 But once JNODE_HEARD_BANSHEE bit set, last call to unlock_znode() does node
104 deletion. Node deletion includes two phases. First all ways to get
105 references to that znode (sibling and parent links and hash lookup using
106 block number stored in parent node) should be deleted -- it is done through
107 sibling_list_remove(), also we assume that nobody uses down link from
108 parent node due to its nonexistence or proper parent node locking and
109 nobody uses parent pointers from children due to absence of them. Second we
110 invalidate all pending lock requests which still are on znode's lock
111 request queue, this is done by reiser4_invalidate_lock(). Another
112 JNODE_IS_DYING znode status bit is used to invalidate pending lock requests.
113 Once it set all requesters are forced to return -EINVAL from
114 longterm_lock_znode(). Future locking attempts are not possible because all
115 ways to get references to that znode are removed already. Last, node is
116 uncaptured from transaction.
118 When last reference to the dying znode is just about to be released,
119 block number for this lock is released and znode is removed from the
120 hash table.
122 Now znode can be recycled.
124 [it's possible to free bitmap block and remove znode from the hash
125 table when last lock is released. This will result in having
126 referenced but completely orphaned znode]
128 6. Limbo
130 As have been mentioned above znodes with reference counter 0 are
131 still cached in a hash table. Once memory pressure increases they are
132 purged out of there [this requires something like LRU list for
133 efficient implementation. LRU list would also greatly simplify
134 implementation of coord cache that would in this case morph to just
135 scanning some initial segment of LRU list]. Data loaded into
136 unreferenced znode are flushed back to the durable storage if
137 necessary and memory is freed. Znodes themselves can be recycled at
138 this point too.
142 #include "debug.h"
143 #include "dformat.h"
144 #include "key.h"
145 #include "coord.h"
146 #include "plugin/plugin_header.h"
147 #include "plugin/node/node.h"
148 #include "plugin/plugin.h"
149 #include "txnmgr.h"
150 #include "jnode.h"
151 #include "znode.h"
152 #include "block_alloc.h"
153 #include "tree.h"
154 #include "tree_walk.h"
155 #include "super.h"
156 #include "reiser4.h"
158 #include <linux/pagemap.h>
159 #include <linux/spinlock.h>
160 #include <linux/slab.h>
161 #include <linux/err.h>
163 static z_hash_table *get_htable(reiser4_tree *,
164 const reiser4_block_nr * const blocknr);
165 static z_hash_table *znode_get_htable(const znode *);
166 static void zdrop(znode *);
168 /* hash table support */
170 /* compare two block numbers for equality. Used by hash-table macros */
171 static inline int
172 blknreq(const reiser4_block_nr * b1, const reiser4_block_nr * b2)
174 assert("nikita-534", b1 != NULL);
175 assert("nikita-535", b2 != NULL);
177 return *b1 == *b2;
180 /* Hash znode by block number. Used by hash-table macros */
181 /* Audited by: umka (2002.06.11) */
182 static inline __u32
183 blknrhashfn(z_hash_table * table, const reiser4_block_nr * b)
185 assert("nikita-536", b != NULL);
187 return *b & (REISER4_ZNODE_HASH_TABLE_SIZE - 1);
190 /* The hash table definition */
191 #define KMALLOC(size) kmalloc((size), reiser4_ctx_gfp_mask_get())
192 #define KFREE(ptr, size) kfree(ptr)
193 TYPE_SAFE_HASH_DEFINE(z, znode, reiser4_block_nr, zjnode.key.z, zjnode.link.z,
194 blknrhashfn, blknreq);
195 #undef KFREE
196 #undef KMALLOC
198 /* slab for znodes */
199 static struct kmem_cache *znode_cache;
201 int znode_shift_order;
204 * init_znodes - create znode cache
206 * Initializes slab cache of znodes. It is part of reiser4 module initialization.
208 int init_znodes(void)
210 znode_cache = kmem_cache_create("znode", sizeof(znode), 0,
211 SLAB_HWCACHE_ALIGN |
212 SLAB_RECLAIM_ACCOUNT, NULL);
213 if (znode_cache == NULL)
214 return RETERR(-ENOMEM);
216 for (znode_shift_order = 0; (1 << znode_shift_order) < sizeof(znode);
217 ++znode_shift_order);
218 --znode_shift_order;
219 return 0;
223 * done_znodes - delete znode cache
225 * This is called on reiser4 module unloading or system shutdown.
227 void done_znodes(void)
229 destroy_reiser4_cache(&znode_cache);
232 /* call this to initialise tree of znodes */
233 int znodes_tree_init(reiser4_tree * tree /* tree to initialise znodes for */ )
235 int result;
236 assert("umka-050", tree != NULL);
238 rwlock_init(&tree->dk_lock);
240 result = z_hash_init(&tree->zhash_table, REISER4_ZNODE_HASH_TABLE_SIZE);
241 if (result != 0)
242 return result;
243 result = z_hash_init(&tree->zfake_table, REISER4_ZNODE_HASH_TABLE_SIZE);
244 return result;
247 /* free this znode */
248 void zfree(znode * node /* znode to free */ )
250 assert("nikita-465", node != NULL);
251 assert("nikita-2120", znode_page(node) == NULL);
252 assert("nikita-2301", list_empty_careful(&node->lock.owners));
253 assert("nikita-2302", list_empty_careful(&node->lock.requestors));
254 assert("nikita-2663", (list_empty_careful(&ZJNODE(node)->capture_link) &&
255 NODE_LIST(ZJNODE(node)) == NOT_CAPTURED));
256 assert("nikita-3220", list_empty(&ZJNODE(node)->jnodes));
257 assert("nikita-3293", !znode_is_right_connected(node));
258 assert("nikita-3294", !znode_is_left_connected(node));
259 assert("nikita-3295", node->left == NULL);
260 assert("nikita-3296", node->right == NULL);
262 /* not yet phash_jnode_destroy(ZJNODE(node)); */
264 kmem_cache_free(znode_cache, node);
267 /* call this to free tree of znodes */
268 void znodes_tree_done(reiser4_tree * tree /* tree to finish with znodes of */ )
270 znode *node;
271 znode *next;
272 z_hash_table *ztable;
274 /* scan znode hash-tables and kill all znodes, then free hash tables
275 * themselves. */
277 assert("nikita-795", tree != NULL);
279 ztable = &tree->zhash_table;
281 if (ztable->_table != NULL) {
282 for_all_in_htable(ztable, z, node, next) {
283 node->c_count = 0;
284 node->in_parent.node = NULL;
285 assert("nikita-2179", atomic_read(&ZJNODE(node)->x_count) == 0);
286 zdrop(node);
289 z_hash_done(&tree->zhash_table);
292 ztable = &tree->zfake_table;
294 if (ztable->_table != NULL) {
295 for_all_in_htable(ztable, z, node, next) {
296 node->c_count = 0;
297 node->in_parent.node = NULL;
298 assert("nikita-2179", atomic_read(&ZJNODE(node)->x_count) == 0);
299 zdrop(node);
302 z_hash_done(&tree->zfake_table);
306 /* ZNODE STRUCTURES */
308 /* allocate fresh znode */
309 znode *zalloc(gfp_t gfp_flag /* allocation flag */ )
311 znode *node;
313 node = kmem_cache_alloc(znode_cache, gfp_flag);
314 return node;
317 /* Initialize fields of znode
318 @node: znode to initialize;
319 @parent: parent znode;
320 @tree: tree we are in. */
321 void zinit(znode * node, const znode * parent, reiser4_tree * tree)
323 assert("nikita-466", node != NULL);
324 assert("umka-268", current_tree != NULL);
326 memset(node, 0, sizeof *node);
328 assert("umka-051", tree != NULL);
330 jnode_init(&node->zjnode, tree, JNODE_FORMATTED_BLOCK);
331 reiser4_init_lock(&node->lock);
332 init_parent_coord(&node->in_parent, parent);
336 * remove znode from indices. This is called jput() when last reference on
337 * znode is released.
339 void znode_remove(znode * node /* znode to remove */ , reiser4_tree * tree)
341 assert("nikita-2108", node != NULL);
342 assert("nikita-470", node->c_count == 0);
343 assert_rw_write_locked(&(tree->tree_lock));
345 /* remove reference to this znode from cbk cache */
346 cbk_cache_invalidate(node, tree);
348 /* update c_count of parent */
349 if (znode_parent(node) != NULL) {
350 assert("nikita-472", znode_parent(node)->c_count > 0);
351 /* father, onto your hands I forward my spirit... */
352 znode_parent(node)->c_count--;
353 node->in_parent.node = NULL;
354 } else {
355 /* orphaned znode?! Root? */
358 /* remove znode from hash-table */
359 z_hash_remove_rcu(znode_get_htable(node), node);
362 /* zdrop() -- Remove znode from the tree.
364 This is called when znode is removed from the memory. */
365 static void zdrop(znode * node /* znode to finish with */ )
367 jdrop(ZJNODE(node));
371 * put znode into right place in the hash table. This is called by relocate
372 * code.
374 int znode_rehash(znode * node /* node to rehash */ ,
375 const reiser4_block_nr * new_block_nr /* new block number */ )
377 z_hash_table *oldtable;
378 z_hash_table *newtable;
379 reiser4_tree *tree;
381 assert("nikita-2018", node != NULL);
383 tree = znode_get_tree(node);
384 oldtable = znode_get_htable(node);
385 newtable = get_htable(tree, new_block_nr);
387 write_lock_tree(tree);
388 /* remove znode from hash-table */
389 z_hash_remove_rcu(oldtable, node);
391 /* assertion no longer valid due to RCU */
392 /* assert("nikita-2019", z_hash_find(newtable, new_block_nr) == NULL); */
394 /* update blocknr */
395 znode_set_block(node, new_block_nr);
396 node->zjnode.key.z = *new_block_nr;
398 /* insert it into hash */
399 z_hash_insert_rcu(newtable, node);
400 write_unlock_tree(tree);
401 return 0;
404 /* ZNODE LOOKUP, GET, PUT */
406 /* zlook() - get znode with given block_nr in a hash table or return NULL
408 If result is non-NULL then the znode's x_count is incremented. Internal version
409 accepts pre-computed hash index. The hash table is accessed under caller's
410 tree->hash_lock.
412 znode *zlook(reiser4_tree * tree, const reiser4_block_nr * const blocknr)
414 znode *result;
415 __u32 hash;
416 z_hash_table *htable;
418 assert("jmacd-506", tree != NULL);
419 assert("jmacd-507", blocknr != NULL);
421 htable = get_htable(tree, blocknr);
422 hash = blknrhashfn(htable, blocknr);
424 rcu_read_lock();
425 result = z_hash_find_index(htable, hash, blocknr);
427 if (result != NULL) {
428 add_x_ref(ZJNODE(result));
429 result = znode_rip_check(tree, result);
431 rcu_read_unlock();
433 return result;
436 /* return hash table where znode with block @blocknr is (or should be)
437 * stored */
438 static z_hash_table *get_htable(reiser4_tree * tree,
439 const reiser4_block_nr * const blocknr)
441 z_hash_table *table;
442 if (is_disk_addr_unallocated(blocknr))
443 table = &tree->zfake_table;
444 else
445 table = &tree->zhash_table;
446 return table;
449 /* return hash table where znode @node is (or should be) stored */
450 static z_hash_table *znode_get_htable(const znode * node)
452 return get_htable(znode_get_tree(node), znode_get_block(node));
455 /* zget() - get znode from hash table, allocating it if necessary.
457 First a call to zlook, locating a x-referenced znode if one
458 exists. If znode is not found, allocate new one and return. Result
459 is returned with x_count reference increased.
461 LOCKS TAKEN: TREE_LOCK, ZNODE_LOCK
462 LOCK ORDERING: NONE
464 znode *zget(reiser4_tree * tree,
465 const reiser4_block_nr * const blocknr,
466 znode * parent, tree_level level, gfp_t gfp_flag)
468 znode *result;
469 __u32 hashi;
471 z_hash_table *zth;
473 assert("jmacd-512", tree != NULL);
474 assert("jmacd-513", blocknr != NULL);
475 assert("jmacd-514", level < REISER4_MAX_ZTREE_HEIGHT);
477 zth = get_htable(tree, blocknr);
478 hashi = blknrhashfn(zth, blocknr);
480 /* NOTE-NIKITA address-as-unallocated-blocknr still is not
481 implemented. */
483 z_hash_prefetch_bucket(zth, hashi);
485 rcu_read_lock();
486 /* Find a matching BLOCKNR in the hash table. If the znode is found,
487 we obtain an reference (x_count) but the znode remains unlocked.
488 Have to worry about race conditions later. */
489 result = z_hash_find_index(zth, hashi, blocknr);
490 /* According to the current design, the hash table lock protects new
491 znode references. */
492 if (result != NULL) {
493 add_x_ref(ZJNODE(result));
494 /* NOTE-NIKITA it should be so, but special case during
495 creation of new root makes such assertion highly
496 complicated. */
497 assert("nikita-2131", 1 || znode_parent(result) == parent ||
498 (ZF_ISSET(result, JNODE_ORPHAN)
499 && (znode_parent(result) == NULL)));
500 result = znode_rip_check(tree, result);
503 rcu_read_unlock();
505 if (!result) {
506 znode *shadow;
508 result = zalloc(gfp_flag);
509 if (!result) {
510 return ERR_PTR(RETERR(-ENOMEM));
513 zinit(result, parent, tree);
514 ZJNODE(result)->blocknr = *blocknr;
515 ZJNODE(result)->key.z = *blocknr;
516 result->level = level;
518 write_lock_tree(tree);
520 shadow = z_hash_find_index(zth, hashi, blocknr);
521 if (unlikely(shadow != NULL && !ZF_ISSET(shadow, JNODE_RIP))) {
522 jnode_list_remove(ZJNODE(result));
523 zfree(result);
524 result = shadow;
525 } else {
526 result->version = znode_build_version(tree);
527 z_hash_insert_index_rcu(zth, hashi, result);
529 if (parent != NULL)
530 ++parent->c_count;
533 add_x_ref(ZJNODE(result));
535 write_unlock_tree(tree);
537 #if REISER4_DEBUG
538 if (!reiser4_blocknr_is_fake(blocknr) && *blocknr != 0)
539 reiser4_check_block(blocknr, 1);
540 #endif
541 /* Check for invalid tree level, return -EIO */
542 if (unlikely(znode_get_level(result) != level)) {
543 warning("jmacd-504",
544 "Wrong level for cached block %llu: %i expecting %i",
545 (unsigned long long)(*blocknr), znode_get_level(result),
546 level);
547 zput(result);
548 return ERR_PTR(RETERR(-EIO));
551 assert("nikita-1227", znode_invariant(result));
553 return result;
556 /* ZNODE PLUGINS/DATA */
558 /* "guess" plugin for node loaded from the disk. Plugin id of node plugin is
559 stored at the fixed offset from the beginning of the node. */
560 static node_plugin *znode_guess_plugin(const znode * node /* znode to guess
561 * plugin of */ )
563 reiser4_tree *tree;
565 assert("nikita-1053", node != NULL);
566 assert("nikita-1055", zdata(node) != NULL);
568 tree = znode_get_tree(node);
569 assert("umka-053", tree != NULL);
571 if (reiser4_is_set(tree->super, REISER4_ONE_NODE_PLUGIN)) {
572 return tree->nplug;
573 } else {
574 return node_plugin_by_disk_id
575 (tree, &((common_node_header *) zdata(node))->plugin_id);
576 #ifdef GUESS_EXISTS
577 reiser4_plugin *plugin;
579 /* NOTE-NIKITA add locking here when dynamic plugins will be
580 * implemented */
581 for_all_plugins(REISER4_NODE_PLUGIN_TYPE, plugin) {
582 if ((plugin->u.node.guess != NULL)
583 && plugin->u.node.guess(node))
584 return plugin;
586 warning("nikita-1057", "Cannot guess node plugin");
587 print_znode("node", node);
588 return NULL;
589 #endif
593 /* parse node header and install ->node_plugin */
594 int zparse(znode * node /* znode to parse */ )
596 int result;
598 assert("nikita-1233", node != NULL);
599 assert("nikita-2370", zdata(node) != NULL);
601 if (node->nplug == NULL) {
602 node_plugin *nplug;
604 nplug = znode_guess_plugin(node);
605 if (likely(nplug != NULL)) {
606 result = nplug->parse(node);
607 if (likely(result == 0))
608 node->nplug = nplug;
609 } else {
610 result = RETERR(-EIO);
612 } else
613 result = 0;
614 return result;
617 /* zload with readahead */
618 int zload_ra(znode * node /* znode to load */ , ra_info_t * info)
620 int result;
622 assert("nikita-484", node != NULL);
623 assert("nikita-1377", znode_invariant(node));
624 assert("jmacd-7771", !znode_above_root(node));
625 assert("nikita-2125", atomic_read(&ZJNODE(node)->x_count) > 0);
626 assert("nikita-3016", reiser4_schedulable());
628 if (info)
629 formatted_readahead(node, info);
631 result = jload(ZJNODE(node));
632 assert("nikita-1378", znode_invariant(node));
633 return result;
636 /* load content of node into memory */
637 int zload(znode * node)
639 return zload_ra(node, NULL);
642 /* call node plugin to initialise newly allocated node. */
643 int zinit_new(znode * node /* znode to initialise */ , gfp_t gfp_flags)
645 return jinit_new(ZJNODE(node), gfp_flags);
648 /* drop reference to node data. When last reference is dropped, data are
649 unloaded. */
650 void zrelse(znode * node /* znode to release references to */ )
652 assert("nikita-1381", znode_invariant(node));
654 jrelse(ZJNODE(node));
657 /* returns free space in node */
658 unsigned znode_free_space(znode * node /* znode to query */ )
660 assert("nikita-852", node != NULL);
661 return node_plugin_by_node(node)->free_space(node);
664 /* left delimiting key of znode */
665 reiser4_key *znode_get_rd_key(znode * node /* znode to query */ )
667 assert("nikita-958", node != NULL);
668 assert_rw_locked(&(znode_get_tree(node)->dk_lock));
669 assert("nikita-3067", LOCK_CNT_GTZ(rw_locked_dk));
670 assert("nikita-30671", node->rd_key_version != 0);
671 return &node->rd_key;
674 /* right delimiting key of znode */
675 reiser4_key *znode_get_ld_key(znode * node /* znode to query */ )
677 assert("nikita-974", node != NULL);
678 assert_rw_locked(&(znode_get_tree(node)->dk_lock));
679 assert("nikita-3068", LOCK_CNT_GTZ(rw_locked_dk));
680 assert("nikita-30681", node->ld_key_version != 0);
681 return &node->ld_key;
684 ON_DEBUG(atomic_t delim_key_version = ATOMIC_INIT(0);
687 /* update right-delimiting key of @node */
688 reiser4_key *znode_set_rd_key(znode * node, const reiser4_key * key)
690 assert("nikita-2937", node != NULL);
691 assert("nikita-2939", key != NULL);
692 assert_rw_write_locked(&(znode_get_tree(node)->dk_lock));
693 assert("nikita-3069", LOCK_CNT_GTZ(write_locked_dk));
694 assert("nikita-2944",
695 znode_is_any_locked(node) ||
696 znode_get_level(node) != LEAF_LEVEL ||
697 keyge(key, &node->rd_key) ||
698 keyeq(&node->rd_key, reiser4_min_key()) ||
699 ZF_ISSET(node, JNODE_HEARD_BANSHEE));
701 node->rd_key = *key;
702 ON_DEBUG(node->rd_key_version = atomic_inc_return(&delim_key_version));
703 return &node->rd_key;
706 /* update left-delimiting key of @node */
707 reiser4_key *znode_set_ld_key(znode * node, const reiser4_key * key)
709 assert("nikita-2940", node != NULL);
710 assert("nikita-2941", key != NULL);
711 assert_rw_write_locked(&(znode_get_tree(node)->dk_lock));
712 assert("nikita-3070", LOCK_CNT_GTZ(write_locked_dk));
713 assert("nikita-2943",
714 znode_is_any_locked(node) || keyeq(&node->ld_key,
715 reiser4_min_key()));
717 node->ld_key = *key;
718 ON_DEBUG(node->ld_key_version = atomic_inc_return(&delim_key_version));
719 return &node->ld_key;
722 /* true if @key is inside key range for @node */
723 int znode_contains_key(znode * node /* znode to look in */ ,
724 const reiser4_key * key /* key to look for */ )
726 assert("nikita-1237", node != NULL);
727 assert("nikita-1238", key != NULL);
729 /* left_delimiting_key <= key <= right_delimiting_key */
730 return keyle(znode_get_ld_key(node), key)
731 && keyle(key, znode_get_rd_key(node));
734 /* same as znode_contains_key(), but lock dk lock */
735 int znode_contains_key_lock(znode * node /* znode to look in */ ,
736 const reiser4_key * key /* key to look for */ )
738 int result;
740 assert("umka-056", node != NULL);
741 assert("umka-057", key != NULL);
743 read_lock_dk(znode_get_tree(node));
744 result = znode_contains_key(node, key);
745 read_unlock_dk(znode_get_tree(node));
746 return result;
749 /* get parent pointer, assuming tree is not locked */
750 znode *znode_parent_nolock(const znode * node /* child znode */ )
752 assert("nikita-1444", node != NULL);
753 return node->in_parent.node;
756 /* get parent pointer of znode */
757 znode *znode_parent(const znode * node /* child znode */ )
759 assert("nikita-1226", node != NULL);
760 assert("nikita-1406", LOCK_CNT_GTZ(rw_locked_tree));
761 return znode_parent_nolock(node);
764 /* detect uber znode used to protect in-superblock tree root pointer */
765 int znode_above_root(const znode * node /* znode to query */ )
767 assert("umka-059", node != NULL);
769 return disk_addr_eq(&ZJNODE(node)->blocknr, &UBER_TREE_ADDR);
772 /* check that @node is root---that its block number is recorder in the tree as
773 that of root node */
774 #if REISER4_DEBUG
775 static int znode_is_true_root(const znode * node /* znode to query */ )
777 assert("umka-060", node != NULL);
778 assert("umka-061", current_tree != NULL);
780 return disk_addr_eq(znode_get_block(node),
781 &znode_get_tree(node)->root_block);
783 #endif
785 /* check that @node is root */
786 int znode_is_root(const znode * node /* znode to query */ )
788 assert("nikita-1206", node != NULL);
790 return znode_get_level(node) == znode_get_tree(node)->height;
793 /* Returns true is @node was just created by zget() and wasn't ever loaded
794 into memory. */
795 /* NIKITA-HANS: yes */
796 int znode_just_created(const znode * node)
798 assert("nikita-2188", node != NULL);
799 return (znode_page(node) == NULL);
802 /* obtain updated ->znode_epoch. See seal.c for description. */
803 __u64 znode_build_version(reiser4_tree * tree)
805 __u64 result;
807 spin_lock(&tree->epoch_lock);
808 result = ++tree->znode_epoch;
809 spin_unlock(&tree->epoch_lock);
810 return result;
813 void init_load_count(load_count * dh)
815 assert("nikita-2105", dh != NULL);
816 memset(dh, 0, sizeof *dh);
819 void done_load_count(load_count * dh)
821 assert("nikita-2106", dh != NULL);
822 if (dh->node != NULL) {
823 for (; dh->d_ref > 0; --dh->d_ref)
824 zrelse(dh->node);
825 dh->node = NULL;
829 static int incr_load_count(load_count * dh)
831 int result;
833 assert("nikita-2110", dh != NULL);
834 assert("nikita-2111", dh->node != NULL);
836 result = zload(dh->node);
837 if (result == 0)
838 ++dh->d_ref;
839 return result;
842 int incr_load_count_znode(load_count * dh, znode * node)
844 assert("nikita-2107", dh != NULL);
845 assert("nikita-2158", node != NULL);
846 assert("nikita-2109",
847 ergo(dh->node != NULL, (dh->node == node) || (dh->d_ref == 0)));
849 dh->node = node;
850 return incr_load_count(dh);
853 int incr_load_count_jnode(load_count * dh, jnode * node)
855 if (jnode_is_znode(node)) {
856 return incr_load_count_znode(dh, JZNODE(node));
858 return 0;
861 void copy_load_count(load_count * new, load_count * old)
863 int ret = 0;
864 done_load_count(new);
865 new->node = old->node;
866 new->d_ref = 0;
868 while ((new->d_ref < old->d_ref) && (ret = incr_load_count(new)) == 0) {
871 assert("jmacd-87589", ret == 0);
874 void move_load_count(load_count * new, load_count * old)
876 done_load_count(new);
877 new->node = old->node;
878 new->d_ref = old->d_ref;
879 old->node = NULL;
880 old->d_ref = 0;
883 /* convert parent pointer into coord */
884 void parent_coord_to_coord(const parent_coord_t * pcoord, coord_t * coord)
886 assert("nikita-3204", pcoord != NULL);
887 assert("nikita-3205", coord != NULL);
889 coord_init_first_unit_nocheck(coord, pcoord->node);
890 coord_set_item_pos(coord, pcoord->item_pos);
891 coord->between = AT_UNIT;
894 /* pack coord into parent_coord_t */
895 void coord_to_parent_coord(const coord_t * coord, parent_coord_t * pcoord)
897 assert("nikita-3206", pcoord != NULL);
898 assert("nikita-3207", coord != NULL);
900 pcoord->node = coord->node;
901 pcoord->item_pos = coord->item_pos;
904 /* Initialize a parent hint pointer. (parent hint pointer is a field in znode,
905 look for comments there) */
906 void init_parent_coord(parent_coord_t * pcoord, const znode * node)
908 pcoord->node = (znode *) node;
909 pcoord->item_pos = (unsigned short)~0;
912 #if REISER4_DEBUG
914 /* debugging aid: znode invariant */
915 static int znode_invariant_f(const znode * node /* znode to check */ ,
916 char const **msg /* where to store error
917 * message, if any */ )
919 #define _ergo(ant, con) \
920 ((*msg) = "{" #ant "} ergo {" #con "}", ergo((ant), (con)))
922 #define _equi(e1, e2) \
923 ((*msg) = "{" #e1 "} <=> {" #e2 "}", equi((e1), (e2)))
925 #define _check(exp) ((*msg) = #exp, (exp))
927 return jnode_invariant_f(ZJNODE(node), msg) &&
928 /* [znode-fake] invariant */
929 /* fake znode doesn't have a parent, and */
930 _ergo(znode_get_level(node) == 0, znode_parent(node) == NULL) &&
931 /* there is another way to express this very check, and */
932 _ergo(znode_above_root(node), znode_parent(node) == NULL) &&
933 /* it has special block number, and */
934 _ergo(znode_get_level(node) == 0,
935 disk_addr_eq(znode_get_block(node), &UBER_TREE_ADDR)) &&
936 /* it is the only znode with such block number, and */
937 _ergo(!znode_above_root(node) && znode_is_loaded(node),
938 !disk_addr_eq(znode_get_block(node), &UBER_TREE_ADDR)) &&
939 /* it is parent of the tree root node */
940 _ergo(znode_is_true_root(node),
941 znode_above_root(znode_parent(node))) &&
942 /* [znode-level] invariant */
943 /* level of parent znode is one larger than that of child,
944 except for the fake znode, and */
945 _ergo(znode_parent(node) && !znode_above_root(znode_parent(node)),
946 znode_get_level(znode_parent(node)) ==
947 znode_get_level(node) + 1) &&
948 /* left neighbor is at the same level, and */
949 _ergo(znode_is_left_connected(node) && node->left != NULL,
950 znode_get_level(node) == znode_get_level(node->left)) &&
951 /* right neighbor is at the same level */
952 _ergo(znode_is_right_connected(node) && node->right != NULL,
953 znode_get_level(node) == znode_get_level(node->right)) &&
954 /* [znode-connected] invariant */
955 _ergo(node->left != NULL, znode_is_left_connected(node)) &&
956 _ergo(node->right != NULL, znode_is_right_connected(node)) &&
957 _ergo(!znode_is_root(node) && node->left != NULL,
958 znode_is_right_connected(node->left) &&
959 node->left->right == node) &&
960 _ergo(!znode_is_root(node) && node->right != NULL,
961 znode_is_left_connected(node->right) &&
962 node->right->left == node) &&
963 /* [znode-c_count] invariant */
964 /* for any znode, c_count of its parent is greater than 0 */
965 _ergo(znode_parent(node) != NULL &&
966 !znode_above_root(znode_parent(node)),
967 znode_parent(node)->c_count > 0) &&
968 /* leaves don't have children */
969 _ergo(znode_get_level(node) == LEAF_LEVEL,
970 node->c_count == 0) &&
971 _check(node->zjnode.jnodes.prev != NULL) &&
972 _check(node->zjnode.jnodes.next != NULL) &&
973 /* orphan doesn't have a parent */
974 _ergo(ZF_ISSET(node, JNODE_ORPHAN), znode_parent(node) == 0) &&
975 /* [znode-modify] invariant */
976 /* if znode is not write-locked, its checksum remains
977 * invariant */
978 /* unfortunately, zlock is unordered w.r.t. jnode_lock, so we
979 * cannot check this. */
980 /* [znode-refs] invariant */
981 /* only referenced znode can be long-term locked */
982 _ergo(znode_is_locked(node),
983 atomic_read(&ZJNODE(node)->x_count) != 0);
986 /* debugging aid: check znode invariant and panic if it doesn't hold */
987 int znode_invariant(znode * node /* znode to check */ )
989 char const *failed_msg;
990 int result;
992 assert("umka-063", node != NULL);
993 assert("umka-064", current_tree != NULL);
995 spin_lock_znode(node);
996 read_lock_tree(znode_get_tree(node));
997 result = znode_invariant_f(node, &failed_msg);
998 if (!result) {
999 /* print_znode("corrupted node", node); */
1000 warning("jmacd-555", "Condition %s failed", failed_msg);
1002 read_unlock_tree(znode_get_tree(node));
1003 spin_unlock_znode(node);
1004 return result;
1007 /* return non-0 iff data are loaded into znode */
1008 int znode_is_loaded(const znode * node /* znode to query */ )
1010 assert("nikita-497", node != NULL);
1011 return jnode_is_loaded(ZJNODE(node));
1014 unsigned long znode_times_locked(const znode * z)
1016 return z->times_locked;
1019 #endif /* REISER4_DEBUG */
1021 /* Make Linus happy.
1022 Local variables:
1023 c-indentation-style: "K&R"
1024 mode-name: "LC"
1025 c-basic-offset: 8
1026 tab-width: 8
1027 fill-column: 120
1028 End: