Recovery Systems
Failures are classified into three types:
- Transaction Failures - Can be due to erroneous logic in queries or system errors causing deadlocks
- System Crashes
- Disk Failures
There are two main kinds of blocks:
- Physical Blocks are the blocks residing on the disk
- Buffer Blocks are temporary blocks in main memory
input(B)
transfers a physical block to the main memory, whereas output(B)
transfers a block from the buffer to the disk. We assume each data item to fit in a single physical block.
Each transaction $T_i$ has its own copy of a data item $X$ called $X_i$. read(X)
by the transaction assigns $X$ to $X_i$ and write(X)
by the transaction would send $X_i$ to the buffer. The system can call output(B)
to transfer this buffer block to main memory when it deems fit.
Log Based Recovery
Log is a sequence of log records which store information about the activities made by a transaction in a database. Consider the transaction $T_i$.
Activity | Log |
---|---|
Transaction start | $<T_i, \text{start}>$ |
write(X) executed |
$<T_i, X,V_{before}, V_{after}>$ |
Last statement of $T_i$ done | $<T_i, \text{commit}>$ |
Immediate Modification scheme lets the updates of a transaction to be sent to the buffer / disk before the transaction commits. Note that log record must be written before execution of query! On the other hand, Deferred-modification scheme performs the updates to disk only at the time of commit.
If $T_i$ modifies an item $X$, no other transaction is allowed to modify $X$ until $T_i$ either commits or aborts
undo(T_i)
restores original values of data items that were updated by $T_i$. For every write $<T_i, X, V_1, V_2>$ we add a special log record $<T_i, X, V_1>$. $<T_i, \text{abort}>$ is written out once the abort it completed.
redo(T_i)
on the other hand simply executes all the records sequentially.
Checkpoints
Periodically save the state of the database to streamline recovery procedure.
- Output all log records currently in memory to stable memory
- Output all modified buffer blocks to disk
- Write a record $<\text{checkpoint}, L>$ onto the stable memory where $L$ is a list of all active transactions at that time
- All updates are stopped while checkpointing! (Fuzzy checkpoints try to fix this)
Recovery Algorithm
There are two main phases to the recovery algorithm; redo and undo phases.
Redo Phase
- Find the last checkpoint $<\text{checkpoint},L>$ and set
undo_list
to be $L$ - Scan forward and redo all write operations. Add $T_i$ to
undo_list
when $<T_i, \text{start}>$ is found and remove $T_i$ fromundo_list
when either $<T_i, \text{commit}>$ or $<T_i, \text{abort}>$ are found.
Undo Phase
- Scan the log backwards from the end
- Undo write operations $<T_i,X,V_1,V_2>$ by adding $<T_i, X, V_1>$ where $T_i$ belongs to
undo_list
- Remove $T_i$ from
undo_list
and add log line $<T_i, \text{abort}>$ upon encountering $<T_i, \text{start}>$ - Stop when
undo_list
is empty
Buffering
Log records are generally buffered by storing them in memory instead of sending them to stable memory to improve I/O. They are flushed when a block is either full or log_force
is called. log_force
is usually called when a transaction commits.
Write-Ahead-Logging (WAL): Before a block of data in main memory is output to database, the corresponding log block must be flushed to the stable memory
Database also has a buffer, which has three behaviors based on how the blocks are pushed to database. No-force policy is when the updated blocks need not be pushed when a transaction commits, force policy is when the blocks are pushed when a transaction commits (not before) and steal policy is when blocks can be pushed even before a transaction commits.
- Acquire a latch on the block
- Flush logs
- Output block to disk
- Release latch on the block
Fuzzy Checkpoints
All updates have to be stopped during checkpointing, which is inefficient. Fuzzy checkpointing aims to solve this issue.
- Temporarily stop updates
- Write $<\text{checkpoint},L>$ and force logs to stable storage
- Note the list $M$ of modified blocks
- Allow transactions’ actions to resume
- Output all blocks in $M$ to disk following WAL
- Store a pointer to checkpoint in a fixed memory on disk called
last_checkpoint
Remote Backup Systems
“Heartbeat” messages are sent between the primary and the backup sites so they know the other’s status. To transfer control to the backup, use its copy of the database and the logs generated by the primary to recover data. The transactions that were incomplete are rolled back.
To reduce recovery time, the backup periodically processes redo logs so its database is kept up-to-date with the primary site. Hot-spare configuration lets the backup continually process log files as they come in.
Durability is of three levels;
- One-safe: Commit as soon has transaction’s commit log is written in primary
- Two-very-safe: Commit when transaction’s log is written in both primary and backup
- Two-safe: Proceed as two-very-safe when both are up, and as one-safe if only primary is active