
TAMU Operating System 3 Concurrency and Threading

2018-04-27

User-level view

Why concurrency

  1. Latency reduction
    • Apply parallel algorithms
  2. Latency hiding
    • Perform useful work while another operation is pending (multiprogramming)
  3. Throughput increase
    • Multiple concurrent executions to allow more simultaneous work

How

  • Serial
    • Complete one task, then start the next
    • Pros:
      • No potential for conflict
      • Inter-task invariants cannot be violated
    • Cons:
      • No multiprogramming
      • No multiprocessor parallelism
  • Cooperative
    • Voluntarily yield CPU at well-defined points in execution
    • Pros:
      • Allows for some controlled multiprogramming
      • Invariants only need to be ensured at yield points
    • Cons:
      • Invariants are not automatically enforced
  • Preemptive
    • Execution of tasks can interleave
    • A scheduler is needed
    • Pros:
      • Allows for high-level of multiprogramming
      • Parallelism
    • Cons:
      • Invariants must hold at all times
      • Requires a mechanism to ensure this: synchronization

Invariant: predicate that has to be true at all times

Threads

  • Scheduler manages threads
  • Each thread has its own registers and stack
  • POSIX thread model is widely used
    • Thread attributes control the behavior of threads
    • Functions:
      • pthread_create()
      • pthread_cancel()
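
A minimal example of this API (a sketch, assuming a POSIX system; compile with -lpthread). It creates a thread with default attributes, then waits for it with pthread_join(), which is part of the same POSIX interface even though it is not listed above:

#include <pthread.h>
#include <stdio.h>

/* Thread function: receives the argument passed to pthread_create(). */
static void *worker(void *arg)
{
    printf("worker running with arg %d\n", *(int *) arg);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    int arg = 42;

    /* NULL attributes => default stack size, scheduling policy, etc. */
    if (pthread_create(&tid, NULL, worker, &arg) != 0)
        return 1;

    /* Wait for the thread to finish before exiting. */
    pthread_join(tid, NULL);
    return 0;
}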

Low-level thread dispatching on x86

A dispatch stores the state of the currently executing thread and loads the state of the target thread onto the CPU.

Dispatching in 3 steps:

  1. Save state of current thread
    • processor state, such as registers
    • higher-level state can be handled by the system
  2. Load state of new thread
  3. Continue execution of new thread

Thread dispatching can be handled the same way exceptions are handled.

Handling exceptions:

  1. Push error code and interrupt number onto stack
  2. Call a function that:
    • pushes the remaining registers onto the stack (saving state)
    • calls the high-level exception handler, which has access to those saved CPU registers
    • restores the previously saved state from the stack into the registers
    • returns with iret

Steps for dispatching:

  1. Thread A calls the function dispatch_to(B)
  2. The current state of A is saved on A's stack
  3. The state of B is loaded from B's stack into the registers
  4. iret returns, and B takes control
  5. B returns from its earlier call to dispatch_to()
  6. Execution of B continues

Structure of a thread

Thread control block

  • thread id
  • stack: pointer to the stack
  • stackptr: pointer to the current top of the stack
  • priority
  • bookkeeping
  • etc
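
A sketch of such a TCB in C; the field names follow the list above, while the exact types and the ready-queue link are illustrative assumptions:

/* Thread control block: one per thread, kept by the scheduler. */
typedef struct TCB {
    unsigned long  thread_id;   /* thread id */
    char          *stack;       /* base address of the thread's stack */
    unsigned long *stackptr;    /* current top of the stack (saved stack pointer) */
    int            priority;    /* scheduling priority */
    struct TCB    *next;        /* bookkeeping, e.g. link in the ready queue */
} TCB;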

Details of dispatch

Steps:

  1. Fix the stack
    • put return_addr, KERNEL_CS, eflags and _thread on the stack
  2. Save state
    • push a fake error code and interrupt number
    • save the general-purpose registers
    • save the stack pointer in the TCB
  3. Locate the state of the new thread (from the function argument)
    • make the new thread current by updating the current-thread pointer
  4. Load the state of the new thread

Thread creation

A new thread is created by pushing onto its stack (see the sketch after this list):

  1. tshutdown
  2. _tfunction, thread function
  3. eflags, instruction pointer
  4. fake error code and interrupt number
  5. general purpose registers
  6. segment registers
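
A hedged C sketch of building that initial stack image, assuming a downward-growing 32-bit x86 stack. The names tshutdown, _tfunction and thread_start come from these notes; the helper function, its arguments, the eflags value and the zeroed register slots are illustrative assumptions, not the course's actual layout:

#include <stddef.h>

#define INITIAL_EFLAGS 0x200UL  /* interrupts enabled; illustrative value */

/* Build the initial stack image for a new thread; the push order mirrors
 * the numbered list above. Returns the value to store in the TCB's stackptr. */
static unsigned long *init_thread_stack(char *stack_base, size_t stack_size,
                                        void (*tfunction)(void),
                                        void (*tshutdown)(void),
                                        void (*thread_start)(void))
{
    unsigned long *sp = (unsigned long *) (stack_base + stack_size);

    *--sp = (unsigned long) tshutdown;     /* 1. runs when the thread function returns */
    *--sp = (unsigned long) tfunction;     /* 2. the thread function */
    *--sp = INITIAL_EFLAGS;                /* 3. eflags ... */
    *--sp = (unsigned long) thread_start;  /* ... and the instruction pointer */
    *--sp = 0;                             /* 4. fake error code */
    *--sp = 0;                             /*    fake interrupt number */
    for (int i = 0; i < 8; i++)            /* 5. general-purpose registers */
        *--sp = 0;                         /*    (a zeroed pusha image) */
    for (int i = 0; i < 4; i++)            /* 6. segment registers */
        *--sp = 0;
    return sp;                             /* the first dispatch pops from here */
}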

Execution life cycle of a thread

  1. Create a new thread
  2. Someone dispatches to this thread
  3. Load the state of the new thread
  4. Execute thread_start() at the top of the stack
  5. Execute thread function _tfunction
  6. Execute tshutdown()

CPU scheduling

State of a thread/process

  1. starting: being created; moves to 2
  2. ready: ready to execute; switches to running
  3. running: can be switched back to ready
  4. blocked: switched from running while waiting for an event
  5. suspended ready: swapped out due to memory shortage
  6. suspended blocked: swapped out due to memory shortage
  7. terminated: reached from running

Scheduler:

  • long-term (admission) scheduler
    • between states 1 and 2
  • short-term (CPU) scheduler
    • between states 2 and 3
    • for CPU bursts, threads compete for the CPU
  • medium-term (memory) scheduler
    • between states 2 and 5, and between 4 and 6

Decision points for scheduler:

  • 3 to 4
  • 3 to 7
  • 4 to 2
  • 3 to 2

Scheduler

There is a ready queue containing the TCBs of ready threads. The scheduler is responsible for picking a thread from it and dispatching to that thread.

Criteria:

  • User oriented
    • Response time: time between the start of a request and the first response
    • Normalized response time: ratio of response time to execution time
    • Wait time: sum of the periods spent waiting in the ready queue
  • System oriented
    • CPU utilization: percentage of time the CPU is busy
    • Throughput: number of jobs completed per time unit

FIFO

  • First come, first serve
  • very simple
  • long average and worst-case waiting time
  • poor dynamic behavior

Shortest Job First

  • When the CPU is idle, pick the thread with the shortest next CPU burst
  • minimizes average waiting time (see the comparison sketch after this list)
  • starvation of threads with long CPU bursts
  • difficult to determine the length of the next CPU burst
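
A small worked comparison of FIFO and SJF, under the assumption that all jobs arrive at time 0 and burst lengths are known. For bursts {6, 8, 7, 3}, FIFO in arrival order gives waits 0, 6, 14, 21 (average 10.25), while SJF runs the jobs as 3, 6, 7, 8 and gives waits 0, 3, 9, 16 (average 7):

#include <stdio.h>
#include <stdlib.h>

/* Average waiting time when jobs run in the given order (all arrive at t=0). */
static double avg_wait(const int *burst, int n)
{
    double total_wait = 0.0;
    int t = 0;
    for (int i = 0; i < n; i++) {
        total_wait += t;   /* job i waits until all jobs before it are done */
        t += burst[i];
    }
    return total_wait / n;
}

static int cmp_int(const void *a, const void *b)
{
    return *(const int *) a - *(const int *) b;
}

int main(void)
{
    int fifo[] = {6, 8, 7, 3};              /* arrival order */
    int sjf[]  = {6, 8, 7, 3};
    int n = 4;

    qsort(sjf, n, sizeof sjf[0], cmp_int);  /* SJF: shortest burst first */

    printf("FIFO avg wait: %.2f\n", avg_wait(fifo, n));  /* 10.25 */
    printf("SJF  avg wait: %.2f\n", avg_wait(sjf, n));   /* 7.00 */
    return 0;
}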

Fixed-priority

  • When CPU is idle, pick thread with highest priority
  • priority: thread/process class or urgency
  • Starvation of low-priority threads
  • Solution: aging, i.e. increase priority over time
    • Use an array of FIFO queues
    • Age by moving threads between queues

Round-Robin

  • FIFO with preemption after a time quantum
  • A job may run continuously for at most one time quantum
  • After its quantum expires, a job is moved back to the tail of the FIFO queue
  • Choice of time quantum (see the simulation sketch after this list)
    • large: degenerates to FIFO
    • small: approximates processor sharing, but context-switching overhead grows
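
A toy simulation of round-robin (a sketch, assuming all jobs arrive at time 0 and ignoring context-switch cost); it prints each job's completion time for a quantum of 2:

#include <stdio.h>

int main(void)
{
    int remaining[] = {6, 8, 7, 3};   /* remaining burst of each job */
    int n = 4, quantum = 2, t = 0, left = n;

    while (left > 0) {
        for (int i = 0; i < n; i++) {
            if (remaining[i] <= 0)
                continue;
            int slice = remaining[i] < quantum ? remaining[i] : quantum;
            t += slice;                 /* job i runs one quantum (or less) */
            remaining[i] -= slice;
            if (remaining[i] == 0) {    /* job finished during this slice */
                printf("job %d completes at t=%d\n", i, t);
                left--;
            }
        }
    }
    return 0;
}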

MLFQ, Multilevel Feedback Queue

  • An array of Round-robin queues with different quantum values
  • Queues have different priority
  • Aging is implemented
  • Demotion: preempted thread is added at the end of the next lower-priority queue

Multiprocessor scheduling

  • Better to have a single scheduler per CPU
    • Reduces context-switching overhead
  • A global thread-partitioning module assigns threads to CPUs

User-level vs. kernel-level threads

Kernel-level threads

  • threads managed by the OS
    • Each thread in user space corresponds to (is backed by) one kernel-level thread in kernel space
  • Pro: avoids system-integration issues
  • Con: too heavyweight
    • Performance cost of thread-management operations
      • Cross a protection boundary on every thread operation
      • System calls
    • Cost of generality: a single implementation must serve all applications
      • user-level libraries can instead be tuned to specific applications

User-level threads

  • managed by a runtime library
  • Management operations require no kernel intervention
    • Multiple user-level threads can be mapped onto one kernel thread (which is what actually enters the kernel on a system call)
    • Library can allocate multiple threads in kernel space as well
    • Pros
      • Low cost
      • Flexible (various API, POSIX, Actors, Java)
      • Implementation requires no change to OS
    • Cons
      • Performance issues due to the mapping onto OS threads
      • Difficult to use and to integrate with system services
        • kernel events are invisible at user level
          • when a kernel thread blocks, the user level loses that processor without knowing it
        • kernel threads are scheduled obliviously to user-level thread state
          • so user-level threads cannot be scheduled accordingly
          • this causes problems because the OS does not know user-level priorities

Three relevant functions from the standard C library:

int getcontext(ucontext_t *ucp);
// Saves the current thread's execution context in the structure pointed to by ucp.

int setcontext(const ucontext_t *ucp);
// Makes a previously saved thread context the current thread context. The current
// context is lost and setcontext() does not return; instead, execution continues
// in the context specified by ucp.

void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...);
// Modifies the context pointed to by ucp so that it will continue execution by
// invoking func() with the arguments provided. The context ucp must have
// previously been initialized by a call to getcontext().
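
A minimal sketch of a user-level "dispatch" built from these functions; it also uses swapcontext(), which saves the current context and switches to another in a single call:

#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, thread_ctx;

static void tfunction(void)
{
    printf("in user-level thread\n");
    /* falling off the end resumes uc_link, set to main_ctx below */
}

int main(void)
{
    static char stack[16384];                /* stack for the new context */

    getcontext(&thread_ctx);                 /* initialize the context */
    thread_ctx.uc_stack.ss_sp = stack;       /* give it its own stack */
    thread_ctx.uc_stack.ss_size = sizeof stack;
    thread_ctx.uc_link = &main_ctx;          /* where to go when tfunction returns */
    makecontext(&thread_ctx, tfunction, 0);  /* it will run tfunction() */

    swapcontext(&main_ctx, &thread_ctx);     /* save main, dispatch to the thread */
    printf("back in main\n");
    return 0;
}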

Solution:

  • To balance kernel-level (KL) and user-level (UL) threads, we can provide kernel support to user-level threads
  • The kernel can tell the user library about kernel thread state changes via upcalls
  • The UL thread system can request and release processors
  • The UL thread system has complete control over which thread runs on the allocated processors

Example:

  1. User thread UA is running on kernel thread KA
  2. KA blocks
  3. The kernel creates a new kernel thread KB and informs the user library
  4. The user library resumes user thread UB on KB
  5. KA unblocks
  6. The kernel preempts KB and informs the user library
  7. The user library preempts UB
  8. UA is resumed

Event-based programming

Threads are hard

  • Deadlock
  • Modules cannot be designed independently
    • composing locking modules can cause deadlock
  • Achieving good performance is hard
    • coarse-grained locks: low concurrency
    • fine-grained locks: locking overhead reduces performance
    • OS limits: scheduling, context switching
  • Not well supported
    • Hard to port threaded code
    • libraries are often not thread-safe
    • kernel calls are often not multi-threaded

Event-driven programming is a good approach to cooperative task management; a toy event loop is sketched after the list below.

  • Handlers “register” for events
  • Event loop waits for events and then invokes handlers (through function calls)
  • Handlers execute to completion (no preemption)
  • Handlers are typically short-lived.
  • Events can be initiated by devices as well
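
A toy version of such a loop, with invented event names and a canned event sequence standing in for a real blocking wait (e.g. select/poll):

#include <stdio.h>

#define MAX_EVENTS 8

typedef void (*handler_t)(int data);

/* Handler table: handlers "register" for an event type by index. */
static handler_t handlers[MAX_EVENTS];

static void register_handler(int event, handler_t h) { handlers[event] = h; }

static void on_timer(int data) { printf("timer fired: %d\n", data); }
static void on_input(int data) { printf("input ready: %d\n", data); }

int main(void)
{
    register_handler(0, on_timer);
    register_handler(1, on_input);

    /* A real loop would block waiting for events; here we replay a
     * canned sequence of (event, data) pairs. */
    int events[][2] = { {0, 100}, {1, 42}, {0, 200} };
    for (int i = 0; i < 3; i++) {
        handler_t h = handlers[events[i][0]];
        if (h)
            h(events[i][1]);   /* handler runs to completion, no preemption */
    }
    return 0;
}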

Advantages

  • Easier to use:
    • No concurrency, no preemption, no synchronization, no deadlocks.
  • Easier to debug:
    • Timing dependencies all generated by events, not by internal scheduling.
    • Problems easier to trace, e.g. slow event handling vs. corrupted memory.
  • Fast:
    • No context switching, no scheduling overhead.
  • Portable:
    • It’s just function calls, after all

Difficulties

  • Manual control flow management:
    • Control flow for single task is broken up across multiple procedures.
  • Manual stack management:
    • Have to manually carry local state across these procedures (see the sketch after this list).
  • Unexpected blocking:
    • What if an event handler blocks, e.g. on a page fault?
    • The entire event loop waits
  • No parallelism:
    • How can events run on a multiprocessor?
    • There is no natural way; handlers run one at a time
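
An illustration of the manual stack management problem: one logical task (start a read, then process the result) split across two handlers, with the would-be local variables carried by hand in a heap-allocated struct. All names here are invented for illustration:

#include <stdio.h>
#include <stdlib.h>

/* In a thread, 'filename' and 'offset' would simply be locals across a
 * blocking read. Split into event handlers, they must be carried by hand. */
struct read_state {
    const char *filename;
    long        offset;
};

static void on_read_complete(void *arg)   /* second half of the task */
{
    struct read_state *st = arg;
    printf("processing %s at offset %ld\n", st->filename, st->offset);
    free(st);                             /* manual cleanup, too */
}

static void start_read(const char *filename, long offset,
                       void (*done)(void *))
{
    struct read_state *st = malloc(sizeof *st);
    if (st == NULL)
        return;
    st->filename = filename;
    st->offset = offset;
    /* A real system would issue an asynchronous read and invoke 'done'
     * from the event loop when it finishes; here we call it directly. */
    done(st);
}

int main(void)
{
    start_read("data.txt", 128, on_read_complete);
    return 0;
}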

Reference

These are my class notes from taking CSCE 611 at TAMU. Credit to the instructor, Dr. Bettati.

