Transactions and Concurrency
TokuDB uses two-phase locking in its transactional concurrency control algorithm. A transaction acquires locks while it executes. This is the lock expansion phase. A transaction releases its locks when it commits or aborts. This is the retiring phase. Within a single transaction, the phases never overlap. This algorithm and its variants have been proven to provide serializability: the result of running transactions concurrently is equivalent to running them one at a time in some order.
TokuDB uses two types of locks.
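The two phases can be illustrated with a minimal Python sketch. This is a toy model, not TokuDB code: the class name `TwoPhaseTransaction` and its methods are hypothetical, and keys are plain strings.

```python
class TwoPhaseTransaction:
    """Toy transaction that keeps the two locking phases separate."""

    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.finished = False

    def acquire(self, key):
        # Expansion phase: locks are only added while the transaction runs.
        if self.finished:
            raise RuntimeError("cannot acquire locks after commit or abort")
        self.locks.add(key)

    def _release_all(self):
        # Retiring phase: every lock is released together at the end.
        released = sorted(self.locks)
        self.locks.clear()
        self.finished = True
        return released

    def commit(self):
        return self._release_all()

    def abort(self):
        return self._release_all()


txn = TwoPhaseTransaction("t1")
txn.acquire("a")
txn.acquire("b")
released = txn.commit()  # both locks are released here, never earlier
```

Because no lock is released before commit or abort, and none can be acquired afterwards, the two phases never overlap within a transaction.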
- Read locks are exclusive locks (yes, exclusive locks). This may limit concurrency in some cases.
- Write locks are exclusive locks.
TokuDB stores locks in an in-memory data structure called the lock tree. The lock tree contains the set of locks currently owned by each live transaction. In addition, the lock tree contains the set of transactions that are waiting for a lock owned by another transaction.
The lock tree stores lock ranges which have a left and right edge. All of the space between these edges is locked.
A point lock is a range lock where the left and right edges are the same.
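As a rough illustration of these ideas, the sketch below models range locks, point locks, and the set of waiting transactions. It is not TokuDB's implementation: the names `RangeLock` and `LockTree` are hypothetical, keys are assumed to be integers, both edges are assumed to be inclusive, and a flat list stands in for the real tree structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeLock:
    owner: str  # hypothetical transaction id
    left: int   # left edge (assumed inclusive)
    right: int  # right edge (assumed inclusive)

    def overlaps(self, other):
        return self.left <= other.right and other.left <= self.right


class LockTree:
    def __init__(self):
        self.owned = []    # locks currently held by live transactions
        self.waiting = []  # (waiting transaction, owner it waits for) pairs

    def request(self, owner, left, right):
        wanted = RangeLock(owner, left, right)
        for held in self.owned:
            # Both lock types are exclusive, so any overlap with a lock
            # owned by another transaction blocks the request.
            if held.owner != owner and held.overlaps(wanted):
                self.waiting.append((owner, held.owner))
                return False
        self.owned.append(wanted)
        return True


tree = LockTree()
granted_range = tree.request("t1", 10, 20)  # range lock
granted_point = tree.request("t2", 30, 30)  # point lock: both edges equal
blocked = tree.request("t2", 15, 15)        # falls inside t1's range
```

In this sketch the third request is not granted, and `tree.waiting` records that `t2` waits on `t1`.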
Since the lock tree is an in-memory data structure with a limit on its size, what happens when the limit is reached? When this happens, lock escalation runs. The goal of lock escalation is to shrink the memory footprint of the locks. It does this by merging adjacent range locks that are owned by the same transaction into a single larger range lock. The merged lock also covers the gap between the original range locks.
The lock tree tries to run lock escalation when a big transaction hits the lock tree memory limit. Small transactions continue to run while lock escalation is running.
MySQL supports several transaction isolation levels. Each of these transaction isolation levels uses a different locking strategy.
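The merge step of lock escalation can be sketched as follows. This is a simplified model under stated assumptions, not the TokuDB algorithm itself: the function name `escalate` is hypothetical, locks are `(left, right, owner)` tuples with integer keys, and "adjacent" is taken to mean neighbouring in key order with no other transaction's lock in between.

```python
def escalate(locks):
    """Merge neighbouring range locks that share an owner.

    `locks` is a list of (left, right, owner) tuples. Ranges owned by
    different transactions are assumed not to overlap.
    """
    merged = []
    for left, right, owner in sorted(locks):
        if merged and merged[-1][2] == owner:
            # Same owner as the previous lock in key order: extend that
            # lock so it also covers the gap and the current range.
            prev_left, prev_right, _ = merged[-1]
            merged[-1] = (prev_left, max(prev_right, right), owner)
        else:
            merged.append((left, right, owner))
    return merged


before = [(1, 1, "a"), (5, 8, "a"), (10, 10, "b"), (20, 25, "a"), (30, 30, "a")]
after = escalate(before)
# after == [(1, 8, "a"), (10, 10, "b"), (20, 30, "a")]
```

Five locks shrink to three. The cost is that transaction `a` now also holds the gaps (keys 2 to 4 and 26 to 29), which it never touched, so other transactions can no longer lock keys there.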
TokuDB implements locking strategies similar to InnoDB's locking strategies.
- All isolation levels
  - Write lock points written

- Serializable
- Repeatable read
  - Read lock ranges read
  - Write lock points written
- Read committed
  - Write lock points written
- Read uncommitted

- Serializable
- Repeatable read
  - Read lock ranges read
  - Write lock target table
- Read committed
  - Write lock target table
- Read uncommitted

- Serializable
  - Read lock ranges read
- Repeatable read
  - Snapshot read, no locks
- Read committed
  - Snapshot read, no locks
- Read uncommitted

- All isolation levels
  - Read lock ranges read
  - Write lock points written

- All isolation levels
  - Read lock ranges read
  - Write lock points written