
TAMU Operating System 4 Synchronization

2018-04-27

Introduction

  • A critical section is a code region in which an inter-task invariant may be temporarily violated
  • Execution of critical section by threads must be mutually exclusive
  • Need a protocol to enforce mutual exclusion
    • enter CS
    • execute
    • exit CS
  • Code can have multiple unrelated critical sections. They don’t block each other
  • Can be implemented by lock
    • lock()
    • unlock()
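The enter/execute/exit protocol above is exactly what a lock provides. A minimal sketch in C++ using std::mutex (the counter and function names are illustrative, not from the notes):

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex m;
long counter = 0; // shared state; invariant: counter == total increments

void worker(int iters) {
    for (int i = 0; i < iters; ++i) {
        m.lock();   // enter CS
        ++counter;  // execute: the invariant is briefly broken here
        m.unlock(); // exit CS
    }
}

long run(int nthreads, int iters) {
    counter = 0;
    std::vector<std::thread> ts;
    for (int t = 0; t < nthreads; ++t) ts.emplace_back(worker, iters);
    for (auto & t : ts) t.join();
    return counter;
}
```

Without the lock/unlock pair, the increments would race and the final count would be unpredictable.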

Critical section, software solution

Faulty implementations

For two threads,

  1. By turn
    • If one thread exits (or stops entering the CS), the other may never get its turn again
  2. Check and set
    • If preemption happens right after thread 1's check but before its set, thread 2 can also pass its check, and both threads enter the critical section
  3. Set and check
    • Use busy[2]; set busy[self] first, then check: while (busy[other]);
    • Causes deadlock when both threads set their flags before checking

Peterson solution

Use turn and busy[2].

class Lock {
    private int turn;
    private bool busy[2]; /* initialize to false */

    public void lock() {
        busy[self] = true; /* ‘self’ is current thread */
        int other = 1 - self ; /* ‘other’ is other thread */
        turn = other;
        while(busy[other] && turn != self); /* busy loop */
    }

    public void unlock() {
        busy[self] = false; /* mark lock not-busy */
    }
};
  • Pros
    • Provides Mutual Exclusion: at most 1 thread in CS.
    • Provides Progress: Only threads wanting to enter CS participate in selection of who gets to enter next. Selection cannot be indefinitely postponed.
    • Provides Bounded Waiting: A thread never has to wait for more than one turn to enter CS
  • Cons
    • Works for 2 threads only
      • Alternatives: Filter lock, Lamport's bakery algorithm
    • Busy waiting
      • Burns CPU cycles and memory bandwidth
      • Can immediately yield CPU instead
    • Does not work on most modern CPUs, due to out-of-order memory operations
      • Compiler may rearrange or optimize memory operations
      • Modern CPU may execute operations out of order
      • A memory fence solves this: it forces pending memory operations to complete before continuing. In GCC: __sync_synchronize();
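The fence caveat above can be addressed with C++11 atomics, whose default sequentially consistent ordering supplies the needed barriers. A sketch of Peterson's lock for two threads (ids 0 and 1; the harness names are illustrative):

```cpp
#include <atomic>
#include <thread>

// Peterson's lock for exactly two threads, using std::atomic so the
// compiler and CPU cannot reorder the critical stores and loads.
std::atomic<bool> busy[2];
std::atomic<int> turn{0};

void lock(int self) {
    int other = 1 - self;
    busy[self].store(true); // announce interest
    turn.store(other);      // let the other thread go first
    while (busy[other].load() && turn.load() != self) {} // busy wait
}

void unlock(int self) {
    busy[self].store(false);
}

long counter = 0; // protected by the Peterson lock

long run(int iters) {
    counter = 0;
    busy[0].store(false);
    busy[1].store(false);
    auto work = [iters](int self) {
        for (int i = 0; i < iters; ++i) {
            lock(self);
            ++counter;
            unlock(self);
        }
    };
    std::thread t0(work, 0), t1(work, 1);
    t0.join();
    t1.join();
    return counter;
}
```

With plain (non-atomic) bool and int fields, this same code can fail on modern hardware, which is exactly the con listed above.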

Critical section, hardware solution

Disable interrupts

By disabling interrupts, the current thread cannot be preempted. This effectively turns the whole execution into one big critical section.

  • Pros
    • Simple
  • Cons
    • Increases interrupt service latency
    • Protects only one global critical section
    • Does not work on multiprocessors (disabling interrupts on one core does not stop the others)

Atomic operations

Atomic operations: operations that check and modify memory areas in a single step (i.e., the operation cannot be interrupted)

  • Test-And-Set
  • Fetch-And-Add
  • Exchange, Compare-And-Swap
  • Load-Link/Store Conditional

In a multiprocessor, atomicity is achieved by locking the bus for the length of the operation. In this way, the memory locations cannot be accessed by other threads on the current or on other processors.

Test-And-Set (TAS)

The function bool TAS(bool * var) atomically stores true in the given memory location and returns previous value.

Pseudo hardware code:

bool TAS(bool * var) { /* Test-And-Set */
    bool temp;
    temp = *var;
    *var = true;
    return temp;
}

Lock implementation:

class TASLock {
    private bool locked = false; /* locked? */
    public void lock() {
        while (TAS(&locked)); /* loop until not locked */
    }
    public void unlock() {
        locked = false; /* mark lock as free */
    }
};

The problem is that TAS locks the bus on every spin iteration, so other threads (even ones not contending for this lock) are slowed down. It is also hard to back off or yield the CPU while spinning. Performance is poor.
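In C++, TAS is available directly as std::atomic_flag::test_and_set. A spinlock sketch along the lines of TASLock above (harness names are illustrative):

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic_flag locked = ATOMIC_FLAG_INIT;

// Spin until test_and_set observes the flag was previously clear.
void lock()   { while (locked.test_and_set(std::memory_order_acquire)) {} }
void unlock() { locked.clear(std::memory_order_release); }

long counter = 0;

long run(int nthreads, int iters) {
    counter = 0;
    std::vector<std::thread> ts;
    for (int t = 0; t < nthreads; ++t)
        ts.emplace_back([iters] {
            for (int i = 0; i < iters; ++i) { lock(); ++counter; unlock(); }
        });
    for (auto & t : ts) t.join();
    return counter;
}
```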

Locks with Test-and-Test-and-Set

With test-and-test-and-set, locked is read from the local cache, so the bus is not locked during the inner while loop. The cached value may be stale, but that is not a problem: the cache-coherence protocol eventually delivers the updated value, and the TAS confirms the state atomically.

class TTASLock {
    private bool locked = false; /* locked? */
    public void lock() {
        for(;;) {
            while (locked){}; /* spin while it looks locked */
            if (!TAS(&locked)) break; /* appears free; confirm atomically */
        }
    }
    public void unlock() {
        locked = false; /* mark lock as free */
    }
};
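A C++ sketch of the same idea: spin on a plain load first, and only attempt the atomic read-modify-write when the lock appears free (harness names are illustrative):

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<bool> locked{false};

void lock() {
    for (;;) {
        // Spin on a plain load: this hits the local cache and avoids
        // locking the bus on every iteration.
        while (locked.load(std::memory_order_relaxed)) {}
        // Lock looks free; confirm with an atomic exchange (the TAS step).
        if (!locked.exchange(true, std::memory_order_acquire)) break;
    }
}

void unlock() { locked.store(false, std::memory_order_release); }

long counter = 0;

long run(int nthreads, int iters) {
    counter = 0;
    std::vector<std::thread> ts;
    for (int t = 0; t < nthreads; ++t)
        ts.emplace_back([iters] {
            for (int i = 0; i < iters; ++i) { lock(); ++counter; unlock(); }
        });
    for (auto & t : ts) t.join();
    return counter;
}
```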

Exchange (XCHG)

The function void XCHG(bool * a, bool * b) atomically exchanges the values stored in locations a and b.

Pseudo hardware code:

void XCHG(bool * a, bool * b) { /* Exchange */
    bool temp;
    temp = *a;
    *a = *b;
    *b = temp;
}

Mimic TAS:

class XCHGLock {
    private bool locked = false; /* locked? */
    public void lock() {
        bool dummy = true;
        do XCHG(&locked, &dummy);
        while (dummy); /* loop until not locked */
    }
    public void unlock() {
        locked = false; /* mark lock as free */
    }
};

Fetch-And-Add (FAA)

The function int FAA(int * addr) atomically increments the value stored in location addr and returns the value before the increment.

Pseudo hardware code:

int FAA(int * addr) { /* Fetch-And-Add */
    int value = *addr;
    *addr = *addr + 1;
    return value;
}

Lock implementation:

class FAALock {
    private int ticketnumber = 0;
    private int turn = 0;
    public void lock() {
        int myturn = FAA(&ticketnumber); /* draw a ticket */
        while (turn != myturn); /* wait for our turn */
    }
    public void unlock() {
        FAA(&turn); /* call the next ticket */
    }
};
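The FAA-based ticket lock maps directly onto std::atomic::fetch_add. A sketch (harness names are illustrative):

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> ticket{0}, turn{0};

void lock() {
    int myturn = ticket.fetch_add(1); // FAA: draw a ticket
    while (turn.load() != myturn) {}  // wait until our number is called
}

void unlock() { turn.fetch_add(1); }  // FAA: call the next ticket

long counter = 0;

long run(int nthreads, int iters) {
    counter = 0; ticket = 0; turn = 0;
    std::vector<std::thread> ts;
    for (int t = 0; t < nthreads; ++t)
        ts.emplace_back([iters] {
            for (int i = 0; i < iters; ++i) { lock(); ++counter; unlock(); }
        });
    for (auto & t : ts) t.join();
    return counter;
}
```

A nice property of the ticket lock is first-come-first-served fairness: tickets are granted in FAA order.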

Compare-And-Swap (CAS)

The function bool CAS(T old, T new, T * addr) atomically compares the value stored in location addr with old. If they are equal, it assigns new to location addr and returns true; otherwise it returns false.

Pseudo hardware code:

bool CAS(Type old, Type new, Type * addr) { /* Compare-And-Swap */
    if (*addr == old) {
        *addr = new;
        return true;
    }
    else {
        return false;
    }
}

Lock implementation:

class CASLock {
    private bool locked = false; /* locked? */
    public void lock() {
        while (!CAS(false, true, &locked));
        /* loop until not locked */
    }
    public void unlock() {
        locked = false;
    }
};
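CAS corresponds to std::atomic::compare_exchange_strong in C++ (which takes the expected value by reference and overwrites it on failure). A sketch of the same lock, with a small demo helper added for illustration:

```cpp
#include <atomic>

std::atomic<bool> locked{false};

void lock() {
    bool expected = false;
    // Loop until we atomically flip false -> true. On failure,
    // compare_exchange_strong overwrites `expected` with the current
    // value, so we must reset it before retrying.
    while (!locked.compare_exchange_strong(expected, true))
        expected = false;
}

void unlock() { locked.store(false); }

// Demo: the lock is held after lock() and free after unlock().
bool demo() {
    lock();
    bool held = locked.load();
    unlock();
    return held && !locked.load();
}
```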

Locking and data structures

An example using locks for a concurrent linked list.

An initial, non-thread-safe version

Coarse-Grained locking

  • Lock entire add()
  • Lock entire remove()
  • Entire list is locked; other threads cannot modify it even when their operations touch disjoint parts
class List {
private:
    Node * head;
    Lock * lock;
public:
    List();
    bool add(T * item);
    bool remove(T * item);
};

// init with two dummy nodes as head and tail
List::List() {
    head = new Node(MIN_VALUE);
    head->next = new Node(MAX_VALUE);
}

// add
bool List::add(T * item) {
    lock->lock();
    Node * pred = head;
    Node * curr = pred->next;
    while (curr->key < item->key) {
        pred = curr;
        curr = curr->next;
    }
    bool success = false;
    if (item->key != curr->key) {
        Node * node = new Node(item);
        node->next = curr;
        pred->next = node;
        success = true;
    }
    lock->unlock();
    return success;
}

// remove
bool List::remove(T * item) {
    lock->lock();
    Node * pred = head;
    Node * curr = pred->next;
    while (curr->key < item->key) {
        pred = curr;
        curr = curr->next;
    }
    bool success = false;
    if (item->key == curr->key) {
        pred->next = curr->next;
        success = true;
    }
    lock->unlock();
    return success;
}
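For comparison, a coarse-grained container in modern C++ can use one std::mutex guarding the whole structure, with RAII lock_guard instead of explicit lock/unlock (std::set stands in for the hand-rolled list; names are illustrative):

```cpp
#include <mutex>
#include <set>

class CoarseSet {
    std::set<int> items;
    std::mutex m; // one lock guards the entire structure
public:
    bool add(int key) {
        std::lock_guard<std::mutex> g(m); // lock entire add()
        return items.insert(key).second;  // false if already present
    }
    bool remove(int key) {
        std::lock_guard<std::mutex> g(m); // lock entire remove()
        return items.erase(key) == 1;
    }
    bool contains(int key) {
        std::lock_guard<std::mutex> g(m);
        return items.count(key) == 1;
    }
};
```

The lock_guard releases the mutex on every exit path, which avoids the unlock-before-return bookkeeping in the hand-written version.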

Fine-Grained locking

  • Lock two nodes pred and cur at add() and remove()
  • Node-based locking
  • Many lock/unlock operations are performed as we traverse the list (hand-over-hand locking)
bool FineList::add(T * item) {
    Node * pred = head;
    pred->lock();
    Node * curr = pred->next;
    curr->lock();
    while (curr->key < item->key) {
        pred->unlock();
        pred = curr;
        curr = curr->next;
        curr->lock();
    }
    bool success = false;
    if (item->key != curr->key) {
        Node * node = new Node(item);
        node->next = curr;
        pred->next = node;
        success = true;
    }
    curr->unlock();
    pred->unlock();
    return success;
}


bool FineList::remove(T * item) {
    Node * pred = head;
    pred->lock();
    Node * curr = pred->next;
    curr->lock();
    while (curr->key < item->key) {
        pred->unlock();
        pred = curr;
        curr = curr->next;
        curr->lock();
    }
    bool success = false;
    if (item->key == curr->key) {
        pred->next = curr->next;
        success = true;
    }
    curr->unlock();
    pred->unlock();
    return success;
}

Optimistic list

  • Add a validate() function, check
    • pred points to curr
    • pred is reachable from head
  • Reduce lock operations
  • But needs to traverse the list for validation
bool OptimisticList::validate(Node *pred, Node *curr) {
    Node * node = head;
    while (node->key <= pred->key) {
        if (node == pred)
            return pred->next == curr;
        node = node->next;
    }
    return false;
}

bool OptimisticList::add(T * item) { 
    bool success = false, done = false;
    while (!done) {
        Node * pred = head;
        Node * curr = pred->next;
        while (curr->key < item->key) {
            pred = curr;
            curr = curr->next;
        }
        pred->lock(); curr->lock();
        if (validate(pred, curr)) {
            done = true;
            if (item->key != curr->key) {
                Node * node = new Node(item);
                node->next = curr;
                pred->next = node;
                success = true;
            }
        }
        curr->unlock(); pred->unlock();
    }
    return success;
}

bool OptimisticList::remove(T * item) { 
    bool success = false, done = false;
    while (!done) {
        Node * pred = head;
        Node * curr = pred->next;
        while (curr->key < item->key) {
            pred = curr;
            curr = curr->next;
        }
        pred->lock(); curr->lock();
        if (validate(pred, curr)) {
            done = true;
            if (item->key == curr->key) {
                pred->next = curr->next;
                success = true;
            }
        }
        curr->unlock(); pred->unlock();
    }
    return success;
}

Lazy list

  • Each node has a mark for deleted
  • add() is the same as OptimisticList
  • Efficient implementation for contains()
bool LazyList::validate(Node * pred, Node * curr) {
    return !pred->marked && !curr->marked && pred->next == curr;
}

bool LazyList::remove(T * item) { 
    bool success = false, done = false;
    while (!done) {
        Node * pred = head;
        Node * curr = pred->next;
        while (curr->key < item->key) {
            pred = curr;
            curr = curr->next;
        }
        pred->lock(); curr->lock();
        if (validate(pred, curr)) {
            done = true;
            if (item->key == curr->key) {
                curr->marked = true;
                pred->next = curr->next;
                success = true;
            }
        }
        curr->unlock(); pred->unlock();
    }
    return success;
}
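The bullet above claims contains() becomes efficient in the lazy list: it can traverse without any locks and just check the deletion mark. A single-threaded sketch of that idea, with assumed Node fields (key, next, marked):

```cpp
struct Node {
    int key;
    Node * next;
    bool marked; // true once the node is logically deleted
};

// Wait-free membership test: traverse without locking; a key counts
// as present only if its node is reachable and not marked deleted.
bool contains(Node * head, int key) {
    Node * curr = head;
    while (curr && curr->key < key)
        curr = curr->next;
    return curr && curr->key == key && !curr->marked;
}

// Demo list: 3 -> 5(marked deleted) -> 7
bool demo() {
    Node c{7, nullptr, false};
    Node b{5, &c, true};
    Node a{3, &b, false};
    return contains(&a, 3) && !contains(&a, 5)
        && contains(&a, 7) && !contains(&a, 4);
}
```

In the real concurrent version the next pointers would be atomic; this sketch only shows why the mark makes an unlocked traversal safe to interpret.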

Vanilla lock-free list

  • Only add(), no remove()
  • Use atomic operation CAS
bool LockFreeList::add(T * item) {
    while(true) {
        Node * pred = head;
        Node * curr = pred->next;
        while (curr->key < item->key) {
            pred = curr;
            curr = curr->next;
        }
        if (item->key == curr->key) {
            return false;
        }
        else {
            Node * new_node = new Node(item);
            new_node->next = curr;
            if (CAS(curr, new_node, &pred->next))
                return true;
        }
    }
}
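The CAS in the pseudocode maps to compare_exchange_strong on an atomic next pointer. A runnable sketch with sentinel head/tail nodes (structure and names are assumptions for illustration):

```cpp
#include <atomic>
#include <climits>

struct Node {
    int key;
    std::atomic<Node*> next;
    Node(int k, Node * n = nullptr) : key(k), next(n) {}
};

// Insert into a sorted list bracketed by INT_MIN/INT_MAX sentinels.
// If the CAS on pred->next fails (another thread changed the list),
// retry the whole traversal.
bool add(Node * head, int key) {
    while (true) {
        Node * pred = head;
        Node * curr = pred->next.load();
        while (curr->key < key) {
            pred = curr;
            curr = curr->next.load();
        }
        if (curr->key == key) return false; // already present
        Node * node = new Node(key, curr);
        if (pred->next.compare_exchange_strong(curr, node))
            return true;                    // linked in atomically
        delete node;                        // CAS failed; retry
    }
}

bool demo() {
    Node * tail = new Node(INT_MAX);
    Node * head = new Node(INT_MIN, tail);
    bool ok = add(head, 5) && !add(head, 5) && add(head, 3);
    return ok && head->next.load()->key == 3;
}
```

Note that without remove() there is no ABA problem to worry about, which is why the vanilla version stays this simple.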

Atomic transactions

Problems with locks

  • Reduce concurrency
  • Cause deadlock
  • Cause priority inversion
    • e.g., Mars Pathfinder Mission
    • A low-priority thread locked mutex
    • High-priority thread blocked by mutex
    • System is busy executing other intermediate-priority threads
    • High-priority thread is blocked for a long time
    • Solution: priority inheritance — the mutex temporarily boosts the lock holder's priority to that of the highest-priority blocked thread
  • Lead to convoying
    • One thread locked a mutex
    • This may block a lot of other threads
    • After the lock is released, it takes a while to drain the queue of waiting threads
  • Lack soft composability
    • Composability is difficult
    • Not easy to use components without knowing details
  • Lack hard composability
    • Composability is impossible
    • Components cannot be combined into large system even knowing all details

Transactions

Transaction is a group of operations executed by a thread

begin_transaction
<operation>
<operation>
throw new TransAbortEx()
<operation>
end_transaction

Serializability and atomicity

  • Serial execution: Operations are scheduled one transaction at a time. One transaction executes after another.
  • Serializable execution: Transactions appear to execute serially, one transaction at a time.

Basic idea:

  • Execute transaction without attention to serializability.
  • Keep track of accessed objects.
  • At commit point, check for conflicts with other transactions.
  • Abort if conflicts occurred.

Approach:

  • Assign timestamp to each transaction.
    • Can be done by FAA or other atomic CPU instructions
  • Make sure that execution has the same effect of a serial execution in order of assigned timestamps

Each object carries a read timestamp (TSr) and a write timestamp (TSw). In the following history, transaction A (timestamp 5) must abort, because it writes a value that transaction B (timestamp 6) has already read:

value  TSr   TSw
0      -     -
0      6(B)  -
1      6(B)  5(A)

In the next history, transaction A (timestamp 5) tries to write at the last step. The write can be safely ignored rather than aborted, because TSw = 6 is already larger than 5; A can continue:

value  TSr   TSw
2      -     6(B)
2      8(C)  6(B)
2      8(C)  6(B)

Optimistic concurrency control

Objects are tagged with read-time and write-time

  1. A transaction cannot read an object if its current value was written after the transaction's timestamp.
    • Transaction with T.S. t1 cannot read item with write-time t2 if t2 > t1. (abort and try with new timestamp)
  2. A transaction cannot write an object whose value has already been read at a later timestamp.
    • Transaction with T.S. t1 cannot write item with read-time t2 if t2 > t1. (abort and try with new timestamp)
  3. Perform the operation X
    • if X == READ and t >= tw
    • or if X == WRITE and t >= tr and t >= tw
      • if X == READ : set tr = t if t > tr
      • if X == WRITE: set tw = t if t > tw
  4. Do nothing
    • if X == WRITE and tr <= t < tw
  5. Abort transaction
    • if X == READ and t < tw
    • or X == WRITE and t < tr
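Rules 3-5 above can be collapsed into a single decision function. A sketch with assumed names (the enums and signature are not from the notes):

```cpp
enum Op { READ, WRITE };
enum Action { PERFORM, SKIP, ABORT };

// Decide what a transaction with timestamp t may do to an object whose
// current read-time is tr and write-time is tw (timestamp ordering).
Action decide(Op x, int t, int tr, int tw) {
    if (x == READ) {
        if (t >= tw) return PERFORM; // rule 3: then set tr = max(tr, t)
        return ABORT;                // rule 5: would read a future write
    }
    // x == WRITE
    if (t >= tr && t >= tw) return PERFORM; // rule 3: then set tw = max(tw, t)
    if (t >= tr && t < tw)  return SKIP;    // rule 4: outdated write, ignore
    return ABORT;                           // rule 5: a later read exists
}
```

The two histories above come out as expected: A's read-after-future-write and write-after-future-read both abort, while the write with tw already ahead is skipped.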

Aborting transactions

How to maintain information for not-yet committed transactions: “Prepare for aborts”

  • transactions use a private workspace
  • “read set” of versions of objects that have been read.
  • “write set” of tentatively written objects.
  • threads commit object values into memory when transaction is done.
  • if transaction aborts, thread releases read and write sets.

Reference

This is my class notes while taking CSCE 611 at TAMU. Credit to the instructor Dr. Bettati.
