
Read "Designing Data-intensive applications" 7, Transactions


All chapters

  1. Reliable, Scalable, and Maintainable Applications
  2. Data Models and Query Languages
  3. Storage and Retrieval
  4. Encoding and Evolution
  5. Replication
  6. Partitioning
  7. Transactions
  8. Distributed System Troubles
  9. Consistency and Consensus
  10. Batch Processing
  11. Stream Processing
  12. The Future of Data Systems


  • A transaction is a way for an application to group several reads and writes together into a logical unit.
  • Conceptually, all the reads and writes in a transaction are executed as one operation: either the entire transaction succeeds (commit) or it fails (abort, rollback).
    • If it fails, the application can safely retry.
    • Don’t need to worry about partial failure.
  • Not every application needs transactions, and sometimes there are advantages to weakening transactional guarantees or abandoning them entirely.
    • For example, to achieve higher performance or higher availability.
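For illustration, a minimal sketch of commit/rollback semantics using Python's sqlite3 (the accounts schema and the mid-transaction fault are invented for the example):

```python
import sqlite3

# Illustrative sketch: several writes grouped into one atomic unit.
# Either both account updates commit, or neither does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        # Simulate a fault in the middle of the transaction.
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()  # discard all writes made so far in this transaction
        raise
```

If the fault fires, the rollback undoes the partial debit, so the application can safely retry.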

The Slippery concept of a transaction

  • Most database transactions follow the style introduced by IBM System R, the first SQL database.
  • NoSQL database
    • Improvement
      • New data model
      • Replication
      • Partitioning
    • Transactions are abandoned or adapted to a much weaker set of guarantees.
  • Trade-offs
    • One popular belief holds that transactions are the antithesis of scalability, and that any large-scale system must abandon transactions in order to maintain good performance and high availability.
    • On the other hand, transactional guarantees are sometimes presented by database vendors as an essential requirement for “serious applications” with “valuable data”.

The meaning of ACID

The safety guarantees provided by transactions are often described by the well-known acronym ACID, which stands for Atomicity, Consistency, Isolation, and Durability. But the concepts are quite ambiguous.


Atomicity

  • If the writes are grouped together into an atomic transaction, and the transaction cannot be completed (committed) due to a fault, then the transaction is aborted and the database must discard or undo any writes it has made so far in that transaction.
  • Safe to retry the transaction.


Consistency

  • Certain statements about your data (invariants) must always be true.
    • e.g., in an accounting system, credits and debits across all accounts must always be balanced.
  • If a transaction starts with a database that is valid according to these invariants, and any writes during the transaction preserve the validity, then you can be sure that the invariants are always satisfied.
  • This idea of consistency depends on the application’s notion of invariants, and it’s the application’s responsibility to define its transactions correctly so that they preserve consistency.
    • Not something the database itself can guarantee.
  • Consistency is a property of the application; atomicity, isolation, and durability are properties of the database.


Isolation

  • Concurrently executing transactions are isolated from each other.
    • For example, if one transaction makes several writes, then another transaction should see either all or none of those writes, but not some subset.
  • The database ensures that when the transactions have committed, the result is the same as if they had run serially (one after another), even though in reality they may have run concurrently.
  • Serializable isolation is rarely used in practice because it carries a performance penalty.
  • Weaker guarantees are common instead: snapshot isolation, etc.


Durability

  • Durability is the promise that once a transaction has committed successfully, any data it has written will not be forgotten, even if there is a hardware fault or the database crashes.
  • Operation
    • Single-node: data has been written to nonvolatile storage.
      • May involve a write-ahead log.
    • Replicated database: data has been successfully copied to some number of nodes.
      • The database must wait until these writes are complete before reporting the transaction as successfully committed.
  • Perfect durability doesn’t exist.

Single-object and Multi-Object operations

In ACID, atomicity and isolation describe what the database should do if a client makes several writes within the same transaction.

  • Single-object operations: on a single object, like a k-v pair.
    • Atomicity: use a log for crash recovery.
    • Isolation: lock on each object.
  • Multi-object: modify several objects (rows, documents, records) at once.
    • Relational databases: group multiple operations between BEGIN TRANSACTION and COMMIT.
    • Many non-relational databases don’t have such transactions.

Retrying an aborted transaction is simple and effective but not perfect.

  • If the transaction actually succeeded but the acknowledgment was lost, retrying causes it to be performed twice.
  • If the error is due to overload, retrying the transaction will make the problem worse, not better.
  • Retry is pointless after a permanent error.
  • If the transaction also has side effects outside of the database, those side effects may happen even if the transaction is aborted.
  • If the client process fails while retrying, any data it was trying to write to the database is lost.
  • Popular object-relational mapping (ORM) frameworks such as Rails’s ActiveRecord and Django don’t retry aborted transactions.
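A hedged sketch of a retry wrapper that handles some of these caveats; the error classes, parameter names, and backoff policy are assumptions for illustration, not from the book:

```python
import random
import time

# Stand-ins for whatever the database driver actually raises.
class TransientError(Exception): pass
class PermanentError(Exception): pass

def run_with_retries(txn, max_retries=3, base_delay=0.01):
    """Run a transaction callable, retrying only transient aborts."""
    for attempt in range(max_retries + 1):
        try:
            return txn()
        except PermanentError:
            raise                      # retrying is pointless (e.g. constraint violation)
        except TransientError:
            if attempt == max_retries:
                raise                  # give up; surface the error to the caller
            # Jittered exponential backoff so retries don't pile
            # more load onto an already overloaded database.
            time.sleep(base_delay * (2 ** attempt) * random.random())
```

Note this still doesn't solve the lost-ACK or external-side-effect problems, which need deduplication or two-phase commit beyond the database.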

Weak isolation levels

Concurrency issues (race conditions) only come into play when one transaction reads data that is concurrently modified by another transaction, or when two transactions try to simultaneously modify the same data.

  • Databases try to hide concurrency issues from application developers by providing transaction isolation.
  • Serializable isolation has a performance cost.
  • Weaker isolation levels are much harder to understand and can lead to subtle bugs.
    • But used in practice.

Some race conditions:

  • Dirty writes
  • Dirty reads
  • Lost updates
  • Read skews
  • Write skews and Phantom reads

Read committed

Two guarantees:

  • No dirty reads: When reading from the database, you will only see data that has been committed.
  • No dirty writes: When writing to the database, you will only overwrite data that has been committed.


Implementation of read committed

  • Writes: row-level locks prevent dirty writes.
  • Reads: using read locks would slow down read operations, so instead:
    • The database remembers both the old committed value and the new value set by the transaction that currently holds the write lock.
    • While the transaction is ongoing, any other transactions that read the object are simply given the old value.
    • Only when the new value is committed do transactions switch over to reading the new value.
  • Default in: Oracle 11g, PostgreSQL, SQL Server 2012, MemSQL

Snapshot isolation and repeatable read

Read skew

  • One operation sees different parts of the data as of different points in time, so its view is temporarily inconsistent.
    • Can happen in Read committed.
  • Cannot be tolerated for
    • Backups
    • Analytic queries and integrity checks
  • Snapshot isolation is the most common solution to this problem.
    • Called serializable in Oracle; repeatable read in MySQL and PostgreSQL
  • The idea is that each transaction reads from a consistent snapshot of the database; that is, the transaction sees all the data that was committed in the database at the start of the transaction.
  • Even if the data is subsequently changed by another transaction, each transaction sees only the old data from that particular point in time.
  • Supported in: PostgreSQL, MySQL with the InnoDB storage engine, Oracle, SQL Server


Implementation of snapshot isolation

  • Use write locks to prevent dirty writes.
  • But readers never block writers, and writers never block readers.
  • The database must potentially keep several different committed versions of an object, because various in-progress transactions may need to see the state of the database at different points in time.
    • This technique is known as multi-version concurrency control (MVCC).
    • Read committed can be implemented with a separate snapshot for each query (only two versions of an object are needed), whereas snapshot isolation uses the same snapshot for an entire transaction.
    • e.g., tag row versions with the transaction ID of the writer.

Visibility rules for observing a consistent snapshot

By carefully defining visibility rules, the database can present a consistent snapshot of the database to the application.

  1. At the start of each transaction, the database makes a list of all the other transactions that are in progress (not yet committed or aborted) at that time. Any writes that those transactions have made are ignored, even if the transactions subsequently commit.
  2. Any writes made by aborted transactions are ignored.
  3. Any writes made by transactions with a later transaction ID (i.e., which started after the current transaction started) are ignored, regardless of whether those transactions have committed.
  4. All other writes are visible to the application’s queries.
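The four visibility rules can be sketched as a single check; this assumes txids are assigned in increasing order, and all names are made up for illustration:

```python
def is_visible(write_txid, reader_txid, in_progress_at_start, aborted):
    """Decide whether a write (tagged with its creator's txid) is
    visible to a reader, given the reader's snapshot state."""
    if write_txid in in_progress_at_start:
        return False   # rule 1: in progress when the reader started
    if write_txid in aborted:
        return False   # rule 2: made by an aborted transaction
    if write_txid > reader_txid:
        return False   # rule 3: started after the reader
    return True        # rule 4: everything else is visible
```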


Indexes and snapshot isolation

  • Have the index simply point to all versions of an object and require an index query to filter out any non-visible object versions.
  • Or, use append-only/copy-on-write B-trees.
    • Create a new copy of each modified page; don’t overwrite the old page.
    • Parent pages (up to the root of the tree) are copied to point to the new children (only the pages affected by this write are copied).
    • Every write transaction creates a new B-tree root, and each root is a consistent snapshot of the database.
    • Compaction and garbage collection are needed.

Preventing lost updates

The lost update problem can occur if an application reads some value from the database, modifies it, and writes back the modified value (a read-modify-write cycle). If two transactions do this concurrently, one of the modifications can be lost, because the second write does not include the first modification.

  • Atomic write operations
    • Many databases provide these operations.
    • Best solution if application code can be expressed in terms of these operations.
    • Implemented with locks or by forcing all such operations to execute on a single thread.
  • Explicit locking
    • By application code
    • Prone to make mistakes
  • Automatically detecting lost updates
    • Allow read-modify-write cycles to execute in parallel and, if the transaction manager detects a lost update, abort the transaction and force it to retry its read-modify-write cycle.
    • Databases can perform this check efficiently in conjunction with snapshot isolation.
    • Happens automatically and is thus less error-prone.
  • Compare-and-set
    • Atomic compare-and-set operation provided by some databases.
    • The purpose of this operation is to avoid lost updates by allowing an update to happen only if the value has not changed since you last read it.
    • If the current value does not match what you previously read, the update has no effect, and the read-modify-write cycle must be retried.
  • Conflict resolution and replication
    • In replicated databases, there are copies of the data on multiple nodes, and the data can potentially be modified concurrently on different nodes.
    • Can use application code or special data structures to resolve and merge conflicting versions of data by concurrent writes.
    • Atomic operations can work well in a replicated context, especially if they are commutative (i.e., you can apply them in a different order on different replicas, and still get the same result).
    • The last write wins (LWW) conflict resolution method is prone to lost updates.
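As one concrete illustration, compare-and-set can be expressed as a conditional UPDATE; a sketch with Python's sqlite3 (the `pages` schema is invented for the example):

```python
import sqlite3

# CAS via conditional UPDATE: the write takes effect only if the value
# is still what we last read; otherwise the read-modify-write cycle
# must be retried.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO pages VALUES (1, 'old text')")
conn.commit()

def cas_update(conn, page_id, expected, new):
    cur = conn.execute(
        "UPDATE pages SET content = ? WHERE id = ? AND content = ?",
        (new, page_id, expected))
    conn.commit()
    return cur.rowcount == 1   # False: someone else changed it first, retry
```

Note that whether this is safe depends on the database: under some snapshot-isolation implementations the WHERE clause may read from an old snapshot.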

Write skew and phantoms

Write skew: Two transactions run almost at the same time; both read the same old snapshot value and take actions based on it, and as a result some invariant is violated.

  • Write skew can occur if two transactions read the same objects, and then update some of those objects (different transactions may update different objects).
  • Wouldn’t happen if the two transactions had run one after another.
  • e.g.,
    • Two doctors want to take leave, but the hospital requires that at least one stay on call.
    • They submitted the leave application at the same time.
    • Read snapshot version, both saw 2 doctors are on duty.
    • Approve both cases.
    • No doctor left.

Options for this case:

  • Atomic single-object operations don’t help, as multiple objects are involved.
  • The automatic detection of lost updates doesn’t help.
    • Write skew is not automatically detected.
    • Automatically preventing write skew requires true serializable isolation.
  • Most databases don’t support multi-object constraints.
  • Can explicitly lock rows that the transaction depends on.
    • Both doctors are locked while updating one.
  • Materializing conflicts
    • The root problem with phantoms is that there is no object to which locks can be attached.
    • Introduce lock objects into the database (e.g., one row per doctor per time slot).
    • It is error-prone and ugly to let a concurrency control mechanism leak into the application data model.
    • It should be a last resort; even a serializable isolation level is usually preferable.
Explicit locking in the doctor example:

SELECT * FROM doctors
    WHERE on_call = true
    AND shift_id = 1234
    FOR UPDATE;

UPDATE doctors
    SET on_call = false
    WHERE name = 'Alice'
    AND shift_id = 1234;

The pattern that causes write skew is as follows. This effect, where a write in one transaction changes the result of a search query in another transaction, is called a phantom.

  1. A SELECT query checks whether some requirement is satisfied by searching for rows that match some search condition.
  2. Depending on the result of the first query, the application code decides how to continue.
  3. If the application decides to go ahead, it makes a write (INSERT, UPDATE, or DELETE) to the database and commits the transaction.
    • If you were to repeat the SELECT query from step 1 after committing the write, you would get a different result.


Serializability

Problems with weaker isolation levels:

  • Isolation levels are hard to understand and inconsistently implemented in different databases.
  • Difficult to tell if the application code is safe at an isolation level.
  • No good tools to detect race conditions.

Serializable isolation is usually regarded as the strongest isolation level. It guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed one at a time, serially, without any concurrency.

Three implementations:

  • Actual serial execution
  • Two-phase locking
  • Serializable snapshot isolation

Actual serial execution

  • Execute only one transaction at a time, in serial order, on a single thread.
    • Feasible because:
      • RAM is cheap enough to keep the entire active dataset in memory.
      • OLTP transactions are usually short and only make a small number of reads and writes.
    • Throughput is limited to a single CPU core.
    • Used by VoltDB/H-Store, Redis and Datomic.
  • Interactive multi-statement transactions
    • Traditionally, it’s necessary to process multiple transactions concurrently in order to get reasonable performance.
    • An application makes a query, reads the result, and makes another query depending on that result; a lot of time is spent in network communication between application and database.
    • Single-threaded transaction processing doesn’t allow interactive multi-statement transactions.
      • Stored procedures are the workaround: the application must submit the entire transaction code to the database ahead of time.
  • Stored procedure
    • Cons
      • Traditional stored-procedure languages are outdated.
      • Code running in a database is difficult to manage.
      • Database is much more performance-sensitive than application server. Bad stored procedure can cause much more trouble.
    • Pros
      • Modern implementations use more general-purpose languages.
        • Redis uses Lua; VoltDB uses Java or Groovy.
      • Executing all transactions on a single thread is feasible.
        • Don’t need to wait for IO.
        • Avoid concurrency overhead.
      • Can be used for replication.
        • Execute the stored procedure on each replica.
        • VoltDB
  • Partitioning data
    • For applications with high write throughput, the single-threaded transaction processor can become a serious bottleneck.
    • In order to scale to multiple CPU cores, and multiple nodes, you can potentially partition your data.
    • If a transaction may touch multiple partitions, a lock across all the partitions involved is needed.
    • Cross-partition transactions are vastly slower than single-partition transactions.
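A minimal sketch of the single-threaded, stored-procedure-style execution model described above; the class and method names are my own:

```python
import queue
import threading

class SerialExecutor:
    """Transactions are submitted as whole callables (the stored-procedure
    style) and executed strictly one at a time on a single thread, so no
    locks or other concurrency control are needed inside a transaction."""
    def __init__(self):
        self.tasks = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            txn, done = self.tasks.get()
            done.append(txn())          # runs alone: no concurrency at all
            self.tasks.task_done()

    def submit(self, txn):
        done = []
        self.tasks.put((txn, done))
        self.tasks.join()               # wait for the worker to finish it
        return done[0]
```

Throughput is bounded by that one thread, which is exactly the limitation partitioning tries to address.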

Two-phase Locking (2PL)

Transactions are allowed to concurrently read the same object as long as nobody is writing to it. But as soon as anyone wants to write (modify or delete) an object, exclusive access is required:

  • If transaction A has read an object and transaction B wants to write to that object, B must wait until A commits or aborts before it can continue.
  • If transaction A has written an object and transaction B wants to read that object, B must wait until A commits or aborts before it can continue.
    • Reading an old value is impossible.
    • Writers can block both writers and readers.


Implementation of 2PL

The blocking of readers and writers is implemented by having a lock on each object in the database. The lock can be either in shared mode or in exclusive mode.

  • If a transaction wants to read an object, it must first acquire the lock in shared mode. Several transactions are allowed to hold the lock in shared mode simultaneously, but if another transaction already has an exclusive lock on the object, these transactions must wait.
  • If a transaction wants to write to an object, it must first acquire the lock in exclusive mode. No other transaction may hold the lock at the same time (either in shared or in exclusive mode), so if there is any existing lock on the object, the transaction must wait.
  • If a transaction first reads and then writes an object, it may upgrade its shared lock to an exclusive lock. The upgrade works the same as getting an exclusive lock directly.
    • Lock upgrades might be problematic. (Tang)
      • e.g., initial value 0, transaction A does +1 and transaction B does +2; if both read under shared locks first, neither can upgrade to an exclusive lock while the other still holds its shared lock.
      • May need compare-and-set to avoid writing based on a stale read.
  • After a transaction has acquired the lock, it must continue to hold the lock until the end of the transaction (commit or abort). This is where the name “two-phase” comes from: the first phase (while the transaction is executing) is when the locks are acquired, and the second phase (at the end of the transaction) is when all the locks are released.
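The shared/exclusive rules above can be sketched as a toy lock manager; this is a try-lock illustration that returns False where a real database would block the transaction or run deadlock detection:

```python
class LockManager:
    """Toy two-phase locking: locks are acquired in shared ("S") or
    exclusive ("X") mode as objects are touched, and released all at
    once at commit or abort (the two phases)."""
    def __init__(self):
        self.locks = {}   # obj -> (mode, set of holder txids)
        self.held = {}    # txid -> set of objs it has locked

    def acquire(self, txid, obj, mode):
        cur_mode, owners = self.locks.get(obj, ("S", set()))
        others = owners - {txid}
        if mode == "S":
            if cur_mode == "X" and others:
                return False                  # a writer blocks readers
            self.locks[obj] = (cur_mode, owners | {txid})
        else:  # exclusive; also covers upgrading our own shared lock
            if others:
                return False                  # any other holder blocks a writer
            self.locks[obj] = ("X", {txid})
        self.held.setdefault(txid, set()).add(obj)
        return True

    def release_all(self, txid):
        """Phase two: at commit or abort, release every lock at once."""
        for obj in self.held.pop(txid, set()):
            mode, owners = self.locks[obj]
            owners.discard(txid)
            if owners:
                self.locks[obj] = ("S", owners)
            else:
                del self.locks[obj]
```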

Deadlocks may happen since there are so many locks. The database automatically detects deadlocks between transactions and aborts one of them so that the others can make progress. The aborted transaction needs to be retried by the application.

Performance issues:

  • Throughput and response time are significantly worse than weak isolation.
  • Problems:
    • Overhead of locks
    • Concurrency is reduced.
    • Unstable latencies
      • A transaction may need to wait for a lot of others to be completed.
      • Very slow at high percentiles.
    • Deadlocks happen frequently.
      • Wasted work due to aborted transactions.

Predicate locks address write skew and phantoms by introducing a lock that applies to all objects matching some search condition, including objects that do not yet exist in the database.

  • If transaction A wants to read objects matching some condition, it must acquire a shared-mode predicate lock on the conditions of the query. If another transaction B currently has an exclusive lock on any object matching those conditions, A must wait until B releases its lock before it is allowed to make its query.
  • If transaction A wants to insert, update, or delete any object, it must first check whether either the old or the new value matches any existing predicate lock. If there is a matching predicate lock held by transaction B, then A must wait until B has committed or aborted before it can continue.
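A sketch of the matching check in the second rule; predicates are simplified to field-equality conditions, and all names are invented for illustration:

```python
def matches(predicate, row):
    """A row matches if every field in the predicate equals its value."""
    return row is not None and all(row.get(k) == v for k, v in predicate.items())

def write_conflicts(old_row, new_row, predicate_locks, writer_txid):
    """predicate_locks: list of (txid, predicate) held in shared mode.
    A write conflicts if its old OR new value matches a predicate held
    by some other transaction (catching updates into or out of the set,
    and inserts of rows that don't exist yet)."""
    return any(
        txid != writer_txid and (matches(pred, old_row) or matches(pred, new_row))
        for txid, pred in predicate_locks)
```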

Index-range locks

  • Predicate locks do not perform well: if there are many locks by active transactions, checking for matching locks becomes time-consuming.
  • An index-range lock is an approximation of the precise search condition of a predicate lock.
  • Much lower overheads.

Serializable snapshot isolation (SSI)

  • Good compromise
    • Provide full serializability.
    • Only a small performance penalty compared to snapshot isolation.
  • Optimistic concurrency control
    • Instead of blocking if something potentially dangerous happens, transactions continue anyway, in the hope that everything will turn out all right.
    • When a transaction wants to commit, the database checks whether anything bad happened. If so, the transaction is aborted and has to be retried. Only transactions that executed serializably are allowed to commit.
    • Disadvantages
      • It performs badly if there is high contention (many transactions trying to access the same objects), as this leads to a high proportion of transactions needing to abort.
      • If the system is already close to its maximum throughput, the additional transaction load from retried transactions can make performance worse.
    • Good if
      • Spare capacity is enough.
      • Contention between transactions is not too high.
  • SSI
    • An optimistic method
    • All reads within a transaction are made from a consistent snapshot of the database.
    • An algorithm detecting serialization conflicts among writes and determining which transactions to abort.

A transaction takes an action based on a premise (a fact that was true at the beginning of the transaction). Later, when the transaction wants to commit, the original data may have changed and the premise may no longer be true. The database must detect situations in which a transaction has acted on an outdated premise and abort that transaction.

The database detects two cases in which a query result might have changed:

  • Detecting reads of a stale MVCC object version. (uncommitted write occurred before the read)
    • The database needs to track when a transaction ignores another transaction’s writes due to MVCC visibility rules.
    • When the transaction wants to commit, the database checks whether any of the ignored writes have now been committed. If so, the transaction must be aborted.
  • Detecting writes that affect prior reads. (the write occurs after the read)
    • Whether another transaction modifies data after it has been read.
    • Use index-range locks as in 2PL, but instead of blocking other transactions, simply notify them that the data they read may be outdated.
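A toy sketch of this second detection case — record what each transaction read, notify rather than block when a concurrent write commits, and abort the affected reader at its own commit time (class and method names are my own):

```python
class SSITracker:
    """Minimal illustration of detecting writes that affect prior reads."""
    def __init__(self):
        self.reads = {}       # txid -> set of objects read
        self.flagged = set()  # txids whose premise may be outdated

    def record_read(self, txid, obj):
        self.reads.setdefault(txid, set()).add(obj)

    def commit_write(self, writer, obj):
        # Don't block anyone: just notify concurrent readers that the
        # data they read may no longer be current.
        for txid, objs in self.reads.items():
            if txid != writer and obj in objs:
                self.flagged.add(txid)

    def try_commit(self, txid):
        self.reads.pop(txid, None)
        return txid not in self.flagged   # False means: abort and retry
```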


Performance of SSI

  • No blocking, compared with 2PL.
    • Much more predictable and less variable query latency.
    • Fast read-only queries, good for read-heavy workloads.
  • Not limited to the single CPU core compared with serial execution.
    • Can scale to very high throughput.
  • SSI requires that read-write transactions be fairly short.
    • Long read-only queries may be OK.

Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2019. All rights reserved by melonskin. Powered by Jekyll.