Introduction
- A critical section is a code region where an inter-task invariant may be temporarily violated
- Execution of critical sections by threads must be mutually exclusive
- Need a protocol to enforce mutual exclusion:
  - enter CS
  - execute
  - exit CS
- Code can have multiple unrelated critical sections; they don't block each other
- Can be implemented by a lock with `lock()` / `unlock()`
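The enter/execute/exit protocol can be sketched with C++'s `std::mutex` (a minimal sketch; the counter and thread counts are illustrative):

```cpp
#include <mutex>
#include <thread>
#include <vector>

// enter-CS / execute / exit-CS protocol with a lock.
// Several threads increment a shared counter; the mutex makes the
// read-modify-write inside the critical section mutually exclusive.
std::mutex cs_lock;   // the lock guarding the critical section
int counter = 0;      // shared state protected by cs_lock

int run_increments(int nthreads, int iters) {
    counter = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([iters] {
            for (int j = 0; j < iters; ++j) {
                cs_lock.lock();     // enter CS
                ++counter;          // execute
                cs_lock.unlock();   // exit CS
            }
        });
    for (auto & t : threads) t.join();
    return counter;
}
```

Without the lock the increments would race and the final count could fall short.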
Critical section, software solution
Faulty implementations
For two threads:
- By turn
  - If one thread stops entering the CS, the other can never take another turn (progress is violated)
- Check and set
  - If preemption happens right after thread 1's check and before its set, thread 2 can also pass its check, and both threads enter the critical section
- Set and check
  - Use `busy[2]`; set `busy[self]` first, then check `while (busy[other]);`
  - Causes deadlock: both threads can set their flags and then wait on each other forever
- Peterson's solution
  - Use `turn` and `busy[2]`:
class Lock {
    private int turn;
    private bool busy[2];  /* initialize to false */
    public void lock() {
        busy[self] = true;     /* 'self' is current thread */
        int other = 1 - self;  /* 'other' is other thread */
        turn = other;
        while (busy[other] && turn != self); /* busy loop */
    }
    public void unlock() {
        busy[self] = false;    /* mark lock not-busy */
    }
};
- Pros
- Provides Mutual Exclusion: at most 1 thread in CS.
- Provides Progress: Only threads wanting to enter CS participate in selection of who gets to enter next. Selection cannot be indefinitely postponed.
- Provides Bounded Waiting: A thread never has to wait for more than one turn to enter CS
- Cons
- Works for 2 threads only
- Alternatives for n threads: Filter algorithm, Lamport's Bakery algorithm
- Busy waiting
  - Burns CPU cycles and memory bandwidth
  - Could yield the CPU immediately instead
- Does not work on most modern CPUs, due to out-of-order memory operations
  - Compiler may rearrange or optimize memory operations
  - Modern CPUs may execute memory operations out of order
  - Can be solved with a memory fence, which forces pending memory operations to complete before continuing: `__sync_synchronize();` in gcc
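On modern CPUs, Peterson's algorithm can be made correct by using atomic variables, whose default sequentially consistent stores and loads insert the fences the plain-variable version lacks (a sketch; thread ids must be 0 or 1, and `peterson_demo` is illustrative):

```cpp
#include <atomic>
#include <thread>

// Peterson's lock with std::atomic: seq_cst operations provide the
// ordering guarantees that plain variables do not.
class PetersonLock {
    std::atomic<int> turn{0};
    std::atomic<bool> busy[2] = {false, false};
public:
    void lock(int self) {
        int other = 1 - self;
        busy[self].store(true);
        turn.store(other);
        while (busy[other].load() && turn.load() != self) {} /* busy loop */
    }
    void unlock(int self) {
        busy[self].store(false);
    }
};

int peterson_demo(int iters) {
    PetersonLock lk;
    int counter = 0;   // protected by lk
    auto work = [&](int self) {
        for (int i = 0; i < iters; ++i) {
            lk.lock(self);
            ++counter;
            lk.unlock(self);
        }
    };
    std::thread t0(work, 0), t1(work, 1);
    t0.join();
    t1.join();
    return counter;
}
```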
Critical section, hardware solution
Disable interrupts
By disabling interrupts, the current thread cannot be preempted, so the code between disabling and re-enabling interrupts effectively becomes one critical section.
- Pros
- Simple
- Cons
- Increases interrupt service latency
- Protects only one critical section at a time
- Does not work on multiprocessors: other CPUs keep running
Atomic operations
Atomic operations: Operations that check and modify memory areas in a single step (i.e. operation can not be interrupted)
- Test-And-Set
- Fetch-And-Add
- Exchange, Compare-And-Swap
- Load-Link/Store Conditional
In a multiprocessor, atomicity is achieved by locking the bus for the duration of the operation. This way, the memory locations cannot be accessed by other threads on the current or on other processors.
Test-And-Set (TAS)
The function `bool TAS(bool * var)` atomically stores true in the given memory location and returns the previous value.
Pseudo hardware code:
bool TAS(bool * var) { /* Test-And-Set */
    bool temp = *var;
    *var = true;
    return temp;
}
Lock implementation:
class TASLock {
    private bool locked = false; /* locked? */
    public void lock() {
        while (TAS(&locked)); /* loop until not locked */
    }
    public void unlock() {
        locked = false; /* mark lock as free */
    }
};
The problem is that TAS keeps locking the bus while spinning. Other threads, even ones not involved in this critical section, are slowed down. It is also hard to back off from acquiring the lock. Performance is poor.
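In C++, `std::atomic_flag::test_and_set` is exactly this primitive; a minimal TAS spinlock sketch (the demo function is illustrative):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// TAS spinlock: test_and_set atomically stores true and returns
// the previous value, just like TAS above.
class TASLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock()   { while (flag.test_and_set()) {} /* loop until free */ }
    void unlock() { flag.clear(); /* mark lock as free */ }
};

int tas_demo(int nthreads, int iters) {
    TASLock lk;
    int counter = 0;   // protected by lk
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([&] {
            for (int j = 0; j < iters; ++j) {
                lk.lock();
                ++counter;
                lk.unlock();
            }
        });
    for (auto & t : threads) t.join();
    return counter;
}
```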
Locks with Test-and-Test-and-Set
`locked` is spun on with an ordinary read served from the cache, so the bus is not locked during the while loop. The cached value may be stale, but that is not a problem: the cache coherence protocol propagates the new value, and TAS re-checks atomically before entering.
class TTASLock {
    private bool locked = false; /* locked? */
    public void lock() {
        for (;;) {
            while (locked) {}         /* certainly locked: spin */
            if (!TAS(&locked)) break; /* appears free; check! */
        }
    }
    public void unlock() {
        locked = false; /* mark lock as free */
    }
};
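A C++ sketch of the same idea with `std::atomic<bool>`: spin on a plain load first, and only attempt the expensive atomic exchange when the lock appears free (the demo function is illustrative):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// TTAS spinlock: the load in the inner while is served from cache,
// so the bus is not locked while spinning; exchange(true) plays the
// role of TAS and re-checks atomically.
class TTASLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        for (;;) {
            while (locked.load()) {}           /* certainly locked: spin */
            if (!locked.exchange(true)) break; /* appears free; check! */
        }
    }
    void unlock() { locked.store(false); }
};

int ttas_demo(int nthreads, int iters) {
    TTASLock lk;
    int counter = 0;   // protected by lk
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([&] {
            for (int j = 0; j < iters; ++j) {
                lk.lock();
                ++counter;
                lk.unlock();
            }
        });
    for (auto & t : threads) t.join();
    return counter;
}
```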
Exchange (XCHG)
The function `void XCHG(bool * a, bool * b)` atomically exchanges the values stored in locations a and b.
Pseudo hardware code:
void XCHG(bool * a, bool * b) { /* Exchange */
    bool temp = *a;
    *a = *b;
    *b = temp;
}
Mimic TAS:
class XCHGLock {
    private bool locked = false; /* locked? */
    public void lock() {
        bool dummy = true;
        do XCHG(&locked, &dummy);
        while (dummy);  /* loop until not locked */
    }
    public void unlock() {
        locked = false; /* mark lock as free */
    }
};
Fetch-And-Add (FAA)
The function `int FAA(int * addr)` atomically increments the value stored in location addr and returns the value before the increment.
Pseudo hardware code:
int FAA(int * addr) { /* Fetch-And-Add */
    int value = *addr;
    *addr = *addr + 1;
    return value;
}
Lock implementation:
class FAALock {
    private int ticketnumber = 0;
    private int turn = 0;
    public void lock() {
        int myturn = FAA(&ticketnumber); /* take a ticket */
        while (turn != myturn);          /* wait for our turn */
    }
    public void unlock() {
        FAA(&turn);                      /* pass to next ticket */
    }
};
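In C++, `fetch_add` is the FAA primitive; a ticket-lock sketch (the ticket order also gives FIFO fairness; the demo function is illustrative):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Ticket lock: FAA hands out tickets; threads enter in ticket order.
class TicketLock {
    std::atomic<int> ticketnumber{0};
    std::atomic<int> turn{0};
public:
    void lock() {
        int myturn = ticketnumber.fetch_add(1); /* FAA: take a ticket */
        while (turn.load() != myturn) {}        /* wait for our turn */
    }
    void unlock() {
        turn.fetch_add(1);                      /* FAA: next ticket */
    }
};

int ticket_demo(int nthreads, int iters) {
    TicketLock lk;
    int counter = 0;   // protected by lk
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([&] {
            for (int j = 0; j < iters; ++j) {
                lk.lock();
                ++counter;
                lk.unlock();
            }
        });
    for (auto & t : threads) t.join();
    return counter;
}
```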
Compare-And-Swap (CAS)
The function `bool CAS(T old, T new, T * addr)` atomically compares the value stored in location addr with old. If they are equal, it assigns new to location addr and returns true; otherwise it returns false.
Pseudo hardware code:
bool CAS(Type old, Type new, Type * addr) { /* Compare-And-Swap */
    if (*addr == old) {
        *addr = new;
        return true;
    }
    return false;
}
Lock implementation:
class CASLock {
    private bool locked = false; /* locked? */
    public void lock() {
        while (!CAS(false, true, &locked)); /* loop until not locked */
    }
    public void unlock() {
        locked = false;
    }
};
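C++ exposes CAS as `compare_exchange_strong`; note that on failure it writes the observed value back into `expected`, so `expected` must be reset each iteration (a sketch; the demo function is illustrative):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// CAS spinlock: acquire by atomically swinging locked from false to true.
class CASLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        bool expected = false;
        while (!locked.compare_exchange_strong(expected, true))
            expected = false; /* CAS failed and overwrote expected */
    }
    void unlock() { locked.store(false); }
};

int cas_demo(int nthreads, int iters) {
    CASLock lk;
    int counter = 0;   // protected by lk
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([&] {
            for (int j = 0; j < iters; ++j) {
                lk.lock();
                ++counter;
                lk.unlock();
            }
        });
    for (auto & t : threads) t.join();
    return counter;
}
```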
Locking and data structures
An example using locks for a concurrent linked list.
An initial, not-thread-safe version
Coarse-Grained locking
- Lock the entire `add()`
- Lock the entire `remove()`
- The entire list is locked, so other threads cannot modify it even when their operations touch non-conflicting parts
class CoarseList {
private:
    Node * head;
    Lock * lock;
public:
    CoarseList();
    bool add(T * item);
    bool remove(T * item);
};

// init with two dummy (sentinel) nodes as head and tail
CoarseList::CoarseList() {
    head = new Node(MIN_VALUE);
    head->next = new Node(MAX_VALUE);
    lock = new Lock();
}
// add
bool CoarseList::add(T * item) {
    lock->lock();
    Node * pred = head;
    Node * curr = pred->next;
    while (curr->key < item->key) {
        pred = curr;
        curr = curr->next;
    }
    bool success = false;
    if (item->key != curr->key) { /* not already in list */
        Node * node = new Node(item);
        node->next = curr;
        pred->next = node;
        success = true;
    }
    lock->unlock();
    return success;
}
// remove
bool CoarseList::remove(T * item) {
    lock->lock();
    Node * pred = head;
    Node * curr = pred->next;
    while (curr->key < item->key) {
        pred = curr;
        curr = curr->next;
    }
    bool success = false;
    if (item->key == curr->key) { /* found it */
        pred->next = curr->next;
        success = true;
    }
    lock->unlock();
    return success;
}
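A compilable C++ sketch of the coarse-grained list, with int keys and `std::mutex` standing in for `Lock` (names mirror the pseudocode above):

```cpp
#include <climits>
#include <mutex>

// Coarse-grained sorted list: one mutex guards every operation.
struct Node {
    int key;
    Node * next = nullptr;
    explicit Node(int k) : key(k) {}
};

class CoarseList {
    Node * head;
    std::mutex lock;
public:
    CoarseList() { /* two sentinel nodes */
        head = new Node(INT_MIN);
        head->next = new Node(INT_MAX);
    }
    bool add(int key) {
        std::lock_guard<std::mutex> g(lock);  /* lock entire add() */
        Node * pred = head;
        Node * curr = pred->next;
        while (curr->key < key) { pred = curr; curr = curr->next; }
        if (key == curr->key) return false;   /* already present */
        Node * node = new Node(key);
        node->next = curr;
        pred->next = node;
        return true;
    }
    bool remove(int key) {
        std::lock_guard<std::mutex> g(lock);  /* lock entire remove() */
        Node * pred = head;
        Node * curr = pred->next;
        while (curr->key < key) { pred = curr; curr = curr->next; }
        if (key != curr->key) return false;   /* not found */
        pred->next = curr->next;
        delete curr;
        return true;
    }
};
```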
Fine-Grained locking
- Lock the two nodes `pred` and `curr` in `add()` and `remove()`
- Node-based (per-node) locking
- Many locking operations are needed as we traverse the list (hand-over-hand locking)
bool FineList::add(T * item) {
    Node * pred = head;
    Node * curr = pred->next;
    pred->lock();
    curr->lock();
    while (curr->key < item->key) { /* hand-over-hand locking */
        pred->unlock();
        pred = curr;
        curr = curr->next;
        curr->lock();
    }
    bool success = false;
    if (item->key != curr->key) {
        Node * node = new Node(item);
        node->next = curr;
        pred->next = node;
        success = true;
    }
    curr->unlock();
    pred->unlock();
    return success;
}
bool FineList::remove(T * item) {
    Node * pred = head;
    Node * curr = pred->next;
    pred->lock();
    curr->lock();
    while (curr->key < item->key) { /* hand-over-hand locking */
        pred->unlock();
        pred = curr;
        curr = curr->next;
        curr->lock();
    }
    bool success = false;
    if (item->key == curr->key) { /* found it */
        pred->next = curr->next;
        success = true;
    }
    curr->unlock();
    pred->unlock();
    return success;
}
Optimistic list
- Add a `validate()` function that checks that `pred` still points to `curr` and that `pred` is still reachable from `head`
- Reduces lock operations: traverse without locking, then lock only `pred` and `curr`
- But needs to traverse the list again for validation
bool OptimisticList::validate(Node * pred, Node * curr) {
    Node * node = head;
    while (node->key <= pred->key) {
        if (node == pred)
            return pred->next == curr;
        node = node->next;
    }
    return false;
}
bool OptimisticList::add(T * item) {
    bool success = false, done = false;
    while (!done) {
        Node * pred = head;
        Node * curr = pred->next;
        while (curr->key < item->key) { /* traverse without locking */
            pred = curr;
            curr = curr->next;
        }
        pred->lock(); curr->lock();
        if (validate(pred, curr)) {
            done = true;
            if (item->key != curr->key) {
                Node * node = new Node(item);
                node->next = curr;
                pred->next = node;
                success = true;
            }
        }
        curr->unlock(); pred->unlock();
    }
    return success;
}
bool OptimisticList::remove(T * item) {
    bool success = false, done = false;
    while (!done) {
        Node * pred = head;
        Node * curr = pred->next;
        while (curr->key < item->key) { /* traverse without locking */
            pred = curr;
            curr = curr->next;
        }
        pred->lock(); curr->lock();
        if (validate(pred, curr)) {
            done = true;
            if (item->key == curr->key) {
                pred->next = curr->next;
                success = true;
            }
        }
        curr->unlock(); pred->unlock();
    }
    return success;
}
Lazy list
- Each node has a mark bit for "deleted"
- `add()` is the same as in `OptimisticList`
- `validate()` no longer traverses the list, which also allows an efficient implementation of `contains()`
bool LazyList::validate(Node * pred, Node * curr) {
return !pred->marked && !curr->marked && pred->next == curr;
}
bool LazyList::remove(T * item) {
    bool success = false, done = false;
    while (!done) {
        Node * pred = head;
        Node * curr = pred->next;
        while (curr->key < item->key) { /* traverse without locking */
            pred = curr;
            curr = curr->next;
        }
        pred->lock(); curr->lock();
        if (validate(pred, curr)) {
            done = true;
            if (item->key == curr->key) {
                curr->marked = true;     /* logical removal */
                pred->next = curr->next; /* physical removal */
                success = true;
            }
        }
        curr->unlock(); pred->unlock();
    }
    return success;
}
Vanilla lock-free list
- Only `add()`, no `remove()`
- Use the atomic operation CAS
bool LockFreeList::add(T * item) {
    while (true) {
        Node * pred = head;
        Node * curr = pred->next;
        while (curr->key < item->key) {
            pred = curr;
            curr = curr->next;
        }
        if (item->key == curr->key) {
            return false;
        }
        else {
            Node * new_node = new Node(item);
            new_node->next = curr;
            if (CAS(curr, new_node, &pred->next)) /* link in atomically */
                return true;
            /* CAS failed: another thread changed pred->next; retry */
        }
    }
}
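A compilable sketch of this vanilla lock-free `add()` with int keys, where `next` is a `std::atomic<Node*>` and `compare_exchange_strong` plays the role of `CAS(curr, new_node, &pred->next)`:

```cpp
#include <atomic>
#include <climits>

// Lock-free add() on a sorted list with sentinel head/tail nodes.
struct Node {
    int key;
    std::atomic<Node *> next{nullptr};
    explicit Node(int k) : key(k) {}
};

class LockFreeList {
    Node * head;
public:
    LockFreeList() {
        head = new Node(INT_MIN);
        head->next.store(new Node(INT_MAX));
    }
    bool add(int key) {
        while (true) {
            Node * pred = head;
            Node * curr = pred->next.load();
            while (curr->key < key) { pred = curr; curr = curr->next.load(); }
            if (curr->key == key) return false;   /* already present */
            Node * node = new Node(key);
            node->next.store(curr);
            /* CAS: swing pred->next from curr to node atomically */
            if (pred->next.compare_exchange_strong(curr, node)) return true;
            delete node;                          /* lost the race: retry */
        }
    }
    bool contains(int key) {
        Node * curr = head->next.load();
        while (curr->key < key) curr = curr->next.load();
        return curr->key == key;
    }
};
```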
Atomic transactions
Problems with locks
- Reduce concurrency
- Cause deadlock
- Cause priority inversion
- e.g., Mars Pathfinder Mission
- A low-priority thread locked mutex
- High-priority thread blocked by mutex
- System is busy executing other intermediate-priority threads
- High-priority thread is blocked for a long time
- Solution: priority inheritance, i.e., modify the mutex so the lock holder temporarily inherits the priority of the blocked high-priority thread
- Lead to convoying
  - A thread holding a mutex gets preempted or descheduled
  - This may block many other threads behind the mutex
  - After the lock is released, it takes a while to drain the queue of waiting threads
- Lack soft composability
  - Composition is difficult
  - Components cannot easily be used without knowing their internal locking details
- Lack hard composability
  - Composition is impossible
  - Components cannot be combined into a larger system even when all details are known
Transactions
A transaction is a group of operations executed by a thread:
begin_transaction
    <operation>
    <operation>
    throw new TransAbortEx()  /* aborts the transaction */
    <operation>
end_transaction
Serializability and atomicity
- Serial execution: Operations are scheduled one transaction at a time. One transaction executes after another.
- Serializable execution: Transactions appear to execute serially, one transaction at a time.
Basic idea:
- Execute transaction without attention to serializability.
- Keep track of accessed objects.
- At commit point, check for conflicts with other transactions.
- Abort if conflicts occurred.
Approach:
- Assign timestamp to each transaction.
- Can be done by FAA or other atomic CPU instructions
- Make sure that execution has the same effect of a serial execution in order of assigned timestamps
Keep a read timestamp (TSr) and a write timestamp (TSw) per object. In the following history, transaction A (timestamp 5) must be aborted because it writes after a future read: B (timestamp 6) has already read the value.

| value | TSr | TSw |
|---|---|---|
| 0 | - | - |
| 0 | 6 (B) | - |
| 1 | 6 (B) | 5 (A) |
In the following history, transaction A (timestamp 5) tries to write 1 in the last step. The write can be safely ignored because TSw = 6 is larger than 5: a younger transaction has already overwritten the value, so A's write is obsolete and A can continue.

| value | TSr | TSw |
|---|---|---|
| 2 | - | 6 (B) |
| 2 | 8 (C) | 6 (B) |
| 2 | 8 (C) | 6 (B) |
Optimistic concurrency control
Objects are tagged with a read-time (tr) and a write-time (tw).
- A transaction cannot read a value of an object if that value was written after the transaction's timestamp.
  - A transaction with timestamp t1 cannot read an item with write-time t2 if t2 > t1 (abort and retry with a new timestamp).
- A transaction cannot write an object if the object's value was read at a later time.
  - A transaction with timestamp t1 cannot write an item with read-time t2 if t2 > t1 (abort and retry with a new timestamp).

For an operation X by a transaction with timestamp t:
- Perform the operation
  - if X == READ and t >= tw
  - or if X == WRITE and t >= tr and t >= tw
  - then, if X == READ: set tr = t if t > tr; if X == WRITE: set tw = t if t > tw
- Do nothing (the write is obsolete)
  - if X == WRITE and tr <= t < tw
- Abort the transaction
  - if X == READ and t < tw
  - or if X == WRITE and t < tr
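The decision rules above can be sketched as a small function (the names `ts_decide`, `Op`, and `Decision` are illustrative, not from the notes):

```cpp
// Timestamp-ordering decision for an operation X by a transaction
// with timestamp t, on an object with read-time tr and write-time tw.
enum class Op { Read, Write };
enum class Decision { Perform, Skip, Abort };

Decision ts_decide(Op x, int t, int tr, int tw) {
    if (x == Op::Read)
        return (t >= tw) ? Decision::Perform : Decision::Abort;
    /* x == Op::Write */
    if (t >= tr && t >= tw) return Decision::Perform;
    if (t >= tr && t < tw)  return Decision::Skip;  /* obsolete write */
    return Decision::Abort;                         /* t < tr */
}
```

After a performed read the caller would set tr = max(tr, t); after a performed write, tw = max(tw, t).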
Aborting transactions
How to maintain information for not-yet-committed transactions ("prepare for aborts"):
- Transactions use a private workspace:
  - a "read set" of versions of objects that have been read
  - a "write set" of tentatively written objects
- Threads commit object values into memory when the transaction is done
- If the transaction aborts, the thread discards its read and write sets
Reference
This is my class notes while taking CSCE 611 at TAMU. Credit to the instructor Dr. Bettati.