FreeBSD ZFS
The Zettabyte File System

zfs_rlock.c File Reference

File Range Locking for ZFS.

#include <sys/zfs_rlock.h>

Functions

static void zfs_range_lock_writer (znode_t *zp, rl_t *new)
 Check if a write lock can be grabbed, or wait and recheck until available.
static rl_t *zfs_range_proxify (avl_tree_t *tree, rl_t *rl)
 If this is an original (non-proxy) lock then replace it by a proxy and return the proxy.
static rl_t *zfs_range_split (avl_tree_t *tree, rl_t *rl, uint64_t off)
 Split the range lock at the supplied offset returning the *front* proxy.
static void zfs_range_new_proxy (avl_tree_t *tree, uint64_t off, uint64_t len)
 Create and add a new proxy range lock for the supplied range.
static void zfs_range_add_reader (avl_tree_t *tree, rl_t *new, rl_t *prev, avl_index_t where)
static void zfs_range_lock_reader (znode_t *zp, rl_t *new)
 Check if a reader lock can be grabbed, or wait and recheck until available.
rl_t *zfs_range_lock (znode_t *zp, uint64_t off, uint64_t len, rl_type_t type)
 Lock an object range.
static void zfs_range_unlock_reader (znode_t *zp, rl_t *remove)
 Unlock a reader lock.
void zfs_range_unlock (rl_t *rl)
 Unlock range and destroy range lock structure.
void zfs_range_reduce (rl_t *rl, uint64_t off, uint64_t len)
 Reduce range locked as RL_WRITER from whole file to specified range.
int zfs_range_compare (const void *arg1, const void *arg2)
 AVL comparison function used to order range locks. Locks are ordered on the start offset of the range.

Detailed Description

File Range Locking for ZFS.

This file contains the code to implement file range locking in ZFS, although there isn't much specific to ZFS (all that comes to mind is support for growing the block size).

Interface
---------
Defined in zfs_rlock.h but essentially:

	rl = zfs_range_lock(zp, off, len, lock_type);
	zfs_range_unlock(rl);
	zfs_range_reduce(rl, off, len);

AVL tree
--------
An AVL tree is used to maintain the state of the existing ranges that are locked for exclusive (writer) or shared (reader) use. The starting range offset is used for searching and sorting the tree.
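As a rough user-space illustration of the ordering rule, the comparator below sorts ranges on start offset only and is compatible with qsort(3). range_t and range_compare() are hypothetical simplifications of rl_t and zfs_range_compare(), keeping just an offset and a length; in ZFS the comparator is used by the AVL tree rather than by qsort.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for rl_t: only the fields this sketch needs. */
typedef struct range {
	uint64_t r_off;	/* start offset of the locked range */
	uint64_t r_len;	/* length of the locked range */
} range_t;

/*
 * Order ranges on start offset only, returning -1, 0 or 1.
 * The length is ignored, so two ranges with the same start compare equal.
 */
int
range_compare(const void *arg1, const void *arg2)
{
	const range_t *rl1 = arg1;
	const range_t *rl2 = arg2;

	if (rl1->r_off > rl2->r_off)
		return (1);
	if (rl1->r_off < rl2->r_off)
		return (-1);
	return (0);
}
```

Because two ranges with the same start compare equal, a tree keyed this way can hold only one node per offset, which is the constraint the proxy scheme described below works around.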

Common case
-----------
The (hopefully) usual case is of no overlaps or contention for locks. On entry to zfs_range_lock() an rl_t is allocated; the tree is searched and no overlap is found, so *this* rl_t is placed in the tree.
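The overlap check only has to look at the neighbours of the new range, because the entries in the tree are disjoint. The sketch below is a hypothetical user-space model: it uses a sorted array with binary search as a stand-in for avl_find() and avl_nearest(), range_t and range_conflicts() are illustrative names, and off + len is assumed not to overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rl_t. */
typedef struct range {
	uint64_t r_off;
	uint64_t r_len;
} range_t;

/*
 * Return true if [off, off + len) overlaps any entry in locks[], which
 * must be sorted by r_off and pairwise disjoint. Only the successor (or
 * exact match) and the predecessor need checking.
 */
bool
range_conflicts(const range_t *locks, size_t n, uint64_t off, uint64_t len)
{
	size_t lo = 0, hi = n;

	/* Binary search: lo ends up at the first entry with r_off >= off. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (locks[mid].r_off < off)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* Successor (or exact match) starts before the new range ends. */
	if (lo < n && locks[lo].r_off < off + len)
		return (true);
	/* Predecessor extends past the start of the new range. */
	if (lo > 0 && locks[lo - 1].r_off + locks[lo - 1].r_len > off)
		return (true);
	return (false);
}
```

An empty tree (n == 0) never conflicts, which corresponds to the common case of inserting the caller's own rl_t directly.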

Overlaps/Reference counting/Proxy locks
---------------------------------------
The AVL code only allows one node at a particular offset. Also it's very inefficient to search through all previous entries looking for overlaps (because the very first entry in the ordered list might be at offset 0 but cover the whole file). So this implementation uses reference counts and proxy range locks. Firstly, only reader locks use reference counts and proxy locks, because writer locks are exclusive. When a reader lock overlaps with another, a proxy lock is created for that range and replaces the original lock. If the overlap is exact, the reference count of the proxy is simply incremented. Otherwise, the proxy lock is split into smaller lock ranges and new proxy locks are created for the non-overlapping ranges. The reference counts are adjusted accordingly. Meanwhile, the original lock is kept around (this is the caller's handle) and its offset and length are used when releasing the lock.
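The splitting and reference counting can be modelled in user space by treating the tree as a sorted array of disjoint pieces, each carrying a count of the readers that cover it. This is a hypothetical sketch (seg_t, segtree_t, seg_add_reader() and the fixed MAXSEG capacity are all assumptions): it rebuilds the array instead of splitting AVL nodes in place, and it tracks only the proxy pieces, not the caller's original rl_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	MAXSEG	32	/* capacity of this sketch, not a ZFS limit */

typedef struct seg {
	uint64_t off;
	uint64_t len;
	uint32_t cnt;	/* number of readers covering this piece */
} seg_t;

typedef struct segtree {
	seg_t s[MAXSEG];	/* sorted by off, pairwise disjoint */
	size_t n;
} segtree_t;

/* Append the piece [off, end) with the given count to out[]. */
static void
emit(seg_t *out, size_t *n, uint64_t off, uint64_t end, uint32_t cnt)
{
	assert(*n < MAXSEG);
	out[*n].off = off;
	out[*n].len = end - off;
	out[*n].cnt = cnt;
	(*n)++;
}

/* Add a reader for [off, off + len), splitting pieces at its edges. */
void
seg_add_reader(segtree_t *t, uint64_t off, uint64_t len)
{
	seg_t out[MAXSEG];
	size_t n = 0;
	uint64_t end = off + len, cur = off;	/* cur: next uncovered offset */

	for (size_t i = 0; i < t->n; i++) {
		const seg_t *s = &t->s[i];
		uint64_t s_end = s->off + s->len;

		if (s_end <= off || s->off >= end) {
			/* No overlap; fill any gap left before this piece. */
			if (s->off >= end && cur < end) {
				emit(out, &n, cur, end, 1);
				cur = end;
			}
			emit(out, &n, s->off, s_end, s->cnt);
			continue;
		}
		if (s->off > cur)	/* uncovered gap: new piece */
			emit(out, &n, cur, s->off, 1);
		if (s->off < off)	/* split off the front */
			emit(out, &n, s->off, off, s->cnt);
		uint64_t o_start = s->off > off ? s->off : off;
		uint64_t o_end = s_end < end ? s_end : end;
		emit(out, &n, o_start, o_end, s->cnt + 1);	/* shared part */
		if (s_end > end)	/* split off the back */
			emit(out, &n, end, s_end, s->cnt);
		cur = o_end;
	}
	if (cur < end)		/* tail not covered by any existing piece */
		emit(out, &n, cur, end, 1);
	memcpy(t->s, out, n * sizeof (seg_t));
	t->n = n;
}
```

Adding [0, 100) twice leaves one piece with a count of 2 (the exact-overlap case); adding [50, 150) then splits it into [0, 50) with count 2 and [50, 100) with count 3, plus a new [100, 150) with count 1.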

Thread coordination
-------------------
In order to make wakeups efficient and to ensure multiple continuous readers on a range don't starve a writer for the same range lock, two condition variables are allocated in each rl_t. If a writer (or reader) can't get a range, it initialises the writer (or reader) cv, sets a flag saying there's a writer (or reader) waiting, and waits on that cv. When a thread unlocks that range, it wakes up all writers and then all readers before destroying the lock.
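The wait/wake protocol can be sketched with POSIX threads. This is a hypothetical model of a single held range, not the kernel code: range_wait_t, waiter(), range_release() and demo_writer_wakeup() are illustrative names, pthread mutexes and condition variables stand in for the kernel primitives, and the condition variables are initialised up front instead of on first use.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical wait state for one locked range (not the real rl_t). */
typedef struct range_wait {
	pthread_mutex_t mtx;	/* plays the role of the range-lock mutex */
	pthread_cond_t wr_cv;	/* writers blocked on this range */
	pthread_cond_t rd_cv;	/* readers blocked on this range */
	bool held;		/* range currently locked */
	bool write_wanted;	/* set by a blocked writer */
	bool read_wanted;	/* set by a blocked reader */
} range_wait_t;

/* A blocked writer: flag interest, wait, and recheck after each wakeup. */
static void *
waiter(void *arg)
{
	range_wait_t *w = arg;

	pthread_mutex_lock(&w->mtx);
	while (w->held) {
		w->write_wanted = true;
		pthread_cond_wait(&w->wr_cv, &w->mtx);
	}
	pthread_mutex_unlock(&w->mtx);
	return (NULL);
}

/* Release the range, waking all waiting writers, then all waiting readers. */
static void
range_release(range_wait_t *w)
{
	pthread_mutex_lock(&w->mtx);
	w->held = false;
	if (w->write_wanted) {
		w->write_wanted = false;
		pthread_cond_broadcast(&w->wr_cv);
	}
	if (w->read_wanted) {
		w->read_wanted = false;
		pthread_cond_broadcast(&w->rd_cv);
	}
	pthread_mutex_unlock(&w->mtx);
}

/* Block one writer on a held range, release it, and check the flags. */
bool
demo_writer_wakeup(void)
{
	range_wait_t w = { .held = true };
	pthread_t tid;
	bool seen = false;

	pthread_mutex_init(&w.mtx, NULL);
	pthread_cond_init(&w.wr_cv, NULL);
	pthread_cond_init(&w.rd_cv, NULL);
	pthread_create(&tid, NULL, waiter, &w);

	/* Spin until the writer has flagged itself and is waiting. */
	while (!seen) {
		pthread_mutex_lock(&w.mtx);
		seen = w.write_wanted;
		pthread_mutex_unlock(&w.mtx);
		if (!seen)
			sched_yield();
	}
	range_release(&w);
	pthread_join(tid, NULL);	/* returns only once the writer woke */

	bool ok = !w.held && !w.write_wanted && !w.read_wanted;
	pthread_cond_destroy(&w.rd_cv);
	pthread_cond_destroy(&w.wr_cv);
	pthread_mutex_destroy(&w.mtx);
	return (ok);
}
```

Keeping separate writer and reader condition variables lets the releasing thread wake each class explicitly, rather than having every waiter share one queue.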

Append mode writes
------------------
Append mode writes need to lock a range at the end of a file. The offset of the end of the file is determined under the range-locking mutex; the lock type is then converted from RL_APPEND to RL_WRITER and the range is locked.
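A minimal sketch of the append conversion, assuming a hypothetical range_req_t request record and a local copy of the RL_* type names; in ZFS the file size is read while holding the range-locking mutex, which this single-threaded sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the lock types named in this document. */
typedef enum { RL_READER, RL_WRITER, RL_APPEND } rl_type_t;

/* Hypothetical lock request: the range and type the caller asked for. */
typedef struct range_req {
	uint64_t off;
	uint64_t len;
	rl_type_t type;
} range_req_t;

/*
 * Resolve an append request against the current file size: the range
 * starts at end of file and the request is then treated as a writer.
 * Other lock types are left unchanged.
 */
void
resolve_append(range_req_t *r, uint64_t file_size)
{
	if (r->type == RL_APPEND) {
		r->off = file_size;
		r->type = RL_WRITER;
	}
}
```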

Grow block handling
-------------------
ZFS supports multiple block sizes, currently up to 128K. A file starts with the smallest block size, which is grown as needed. During this growth all other writers and readers must be excluded. So if the block size needs to be grown, the whole file is exclusively locked; later the caller reduces the lock range to just the range to be written using zfs_range_reduce().
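The grow-then-reduce sequence can be sketched as follows. The growth condition used here (the write would take the file past one block while the block size is below a maximum) is a simplified assumption rather than the exact ZFS test, and wlock_t, widen_for_grow() and reduce() are hypothetical names. reduce() mirrors the documented assertion of zfs_range_reduce() that the whole file is locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical writer lock: the range currently held. */
typedef struct wlock {
	uint64_t off;
	uint64_t len;	/* this sketch uses UINT64_MAX to mean "whole file" */
} wlock_t;

/*
 * If writing [off, off + len) would need a bigger block size (simplified
 * test), widen the lock to the whole file and return true.
 */
bool
widen_for_grow(wlock_t *l, uint64_t file_size, uint64_t blksz,
    uint64_t max_blksz)
{
	uint64_t end = l->off + l->len;
	uint64_t end_size = file_size > end ? file_size : end;

	if (end_size > blksz && blksz < max_blksz) {
		l->off = 0;
		l->len = UINT64_MAX;
		return (true);
	}
	return (false);
}

/* Shrink a whole-file writer lock back to the range actually written. */
void
reduce(wlock_t *l, uint64_t off, uint64_t len)
{
	assert(l->off == 0 && l->len == UINT64_MAX);
	l->off = off;
	l->len = len;
}
```

With a 512-byte block size and a 128K maximum, a 200-byte write at offset 1000 widens the lock to the whole file, and the caller can then reduce it back to the 200 bytes starting at offset 1000 once the block size has been grown.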

Definition in file zfs_rlock.c.


Function Documentation

static void zfs_range_add_reader (avl_tree_t *tree, rl_t *new, rl_t *prev, avl_index_t where) [static]

Definition at line 274 of file zfs_rlock.c.

int zfs_range_compare (const void *arg1, const void *arg2)

AVL comparison function used to order range locks. Locks are ordered on the start offset of the range.

Definition at line 585 of file zfs_rlock.c.

rl_t *zfs_range_lock (znode_t *zp, uint64_t off, uint64_t len, rl_type_t type)

Lock an object range.

Parameters:
off:   Offset into the file that begins the range.
len:   Length of the range to lock.
type:  Either shared (RL_READER) or exclusive (RL_WRITER or RL_APPEND). RL_APPEND is a special type that is converted to RL_WRITER and locks a range starting at the current end of file.
Returns:
The range lock structure, for later unlocking or for reducing the range (if the entire file was previously locked as RL_WRITER).

Definition at line 423 of file zfs_rlock.c.

static void zfs_range_lock_reader (znode_t *zp, rl_t *new) [static]

Check if a reader lock can be grabbed, or wait and recheck until available.

Definition at line 359 of file zfs_rlock.c.

static void zfs_range_lock_writer (znode_t *zp, rl_t *new) [static]

Check if a write lock can be grabbed, or wait and recheck until available.

Definition at line 107 of file zfs_rlock.c.

static void zfs_range_new_proxy (avl_tree_t *tree, uint64_t off, uint64_t len) [static]

Create and add a new proxy range lock for the supplied range.

Definition at line 257 of file zfs_rlock.c.

static rl_t *zfs_range_proxify (avl_tree_t *tree, rl_t *rl) [static]

If this is an original (non-proxy) lock then replace it by a proxy and return the proxy.

Definition at line 194 of file zfs_rlock.c.

void zfs_range_reduce (rl_t *rl, uint64_t off, uint64_t len)

Reduce range locked as RL_WRITER from whole file to specified range.

Asserts the whole file is exclusively locked and so there's only one entry in the tree.

Definition at line 562 of file zfs_rlock.c.

static rl_t *zfs_range_split (avl_tree_t *tree, rl_t *rl, uint64_t off) [static]

Split the range lock at the supplied offset returning the *front* proxy.

Definition at line 226 of file zfs_rlock.c.

void zfs_range_unlock (rl_t *rl)

Unlock range and destroy range lock structure.

Definition at line 524 of file zfs_rlock.c.

static void zfs_range_unlock_reader (znode_t *zp, rl_t *remove) [static]

Unlock a reader lock.

Definition at line 460 of file zfs_rlock.c.
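Releasing a reader can be modelled by keeping the tree as sorted, disjoint pieces with reader counts, decrementing every piece the released range covers, and dropping pieces whose count reaches zero. This is a hypothetical user-space sketch (seg_t, segtree_t, MAXSEG and seg_unlock_reader() are assumptions); it relies on piece boundaries always falling on the edges of ranges that were added, which holds when each added reader splits pieces at its own edges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	MAXSEG	32	/* capacity of this sketch, not a ZFS limit */

typedef struct seg {
	uint64_t off;
	uint64_t len;
	uint32_t cnt;	/* number of readers covering this piece */
} seg_t;

typedef struct segtree {
	seg_t s[MAXSEG];	/* sorted by off, pairwise disjoint */
	size_t n;
} segtree_t;

/*
 * Release a reader that held [off, off + len). Each piece is either
 * wholly inside that range (count decremented) or wholly outside it
 * (left alone); pieces whose count drops to zero are removed.
 */
void
seg_unlock_reader(segtree_t *t, uint64_t off, uint64_t len)
{
	uint64_t end = off + len;
	size_t j = 0;

	for (size_t i = 0; i < t->n; i++) {
		seg_t s = t->s[i];

		if (s.off >= off && s.off + s.len <= end) {
			assert(s.cnt > 0);
			if (--s.cnt == 0)
				continue;	/* last reader gone: drop piece */
		}
		t->s[j++] = s;	/* compact the surviving pieces */
	}
	t->n = j;
}
```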
