User level view
Why concurrency
- Latency reduction
- Apply a parallel algorithm
- Latency hiding
- Perform useful work while another operation is pending (multiprogramming)
- Throughput increase
- Multiple concurrent executions to allow more simultaneous work
How
- Serial
- Complete one task, then start the next
- Pros:
- No potential for conflict
- Inter-task invariants cannot be violated
- Cons:
- No multiprogramming
- No multiprocessor parallelism
- Cooperative
- Voluntarily yield CPU at well-defined points in execution
- Pros:
- Allows for some controlled multiprogramming
- Invariants only need to hold at the yield points
- Cons:
- Invariants are not automatically enforced
- Preemptive
- Execution of tasks can interleave
- A scheduler is needed
- Pros:
- Allows for a high level of multiprogramming
- Parallelism
- Cons:
- Invariants must hold at all times
- Mechanism: synchronization
Invariant: predicate that has to be true at all times
Threads
- Scheduler manages threads
- Each thread has its own registers and stack
- POSIX thread model is widely used
- Thread attributes control the behavior of threads
- Functions: pthread_create(), pthread_cancel(), …
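A minimal sketch of the POSIX calls in use; the worker function and its argument are made up for illustration, and pthread_join() (also part of the API) is used to wait for completion. Compile with -pthread:

#include <pthread.h>
#include <stdio.h>

// worker: hypothetical thread function, runs in the new thread
static void *worker(void *arg) {
    printf("worker running with arg %ld\n", (long)arg);
    return NULL;
}

int main(void) {
    pthread_t tid;
    // NULL attributes: default thread behavior
    if (pthread_create(&tid, NULL, worker, (void *)42L) != 0)
        return 1;
    pthread_join(tid, NULL);  // wait for the worker to finish
    return 0;
}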
Low-level thread dispatching on x86
A dispatch stores the state of the currently executing thread and loads the state of the target thread onto the CPU.
Dispatching in 3 steps:
- Save state of current thread
- processor state: registers, etc.
- higher-level state can be handled by the system
- Load state of new thread
- Continue execution of new thread
Thread dispatching can be handled the same way exceptions are handled.
Handling exceptions:
- Push the error code and interrupt number onto the stack
- Call a function to push the remaining registers onto the stack (saving the state)
- The high-level exception handler is called, with access to those saved CPU registers
- The previously saved state is loaded from the stack back into the registers
- Return with iret
Steps for dispatching:
- Thread A calls dispatch_to(B)
- A's current state is saved on A's stack
- B's state is loaded from B's stack into the registers
- iret returns; B takes control
- B returns from its own earlier dispatch_to() call
- Execution of B continues
Structure of a thread
Thread control block
- thread id
- stack: pointer to the stack
- stackptr: pointer to the current top of the stack
- priority
- bookkeeping
- etc
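A sketch of this control block as a C struct, using the fields above; the field types and the next pointer for ready-queue bookkeeping are assumptions:

typedef struct TCB {
    int         thread_id;  // thread id
    char       *stack;      // pointer to the base of the thread's stack
    char       *stackptr;   // pointer to the current top of the stack
    int         priority;   // scheduling priority
    struct TCB *next;       // bookkeeping, e.g. ready-queue linkage (assumed)
} TCB;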
Details of dispatch
Steps:
- fix the stack
- have return_addr, KERNEL_CS, eflags, and the _thread argument on the stack (the layout of an exception frame)
- save state
- push a fake error code and interrupt number
- save the general-purpose registers
- save the stack pointer in the TCB
- locate the state of the new thread (from the function argument)
- make the new thread current: change the current-thread pointer in Thread
- load the state of the new thread (see the user-space sketch below)
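The mechanics above are x86-specific, but the save-state/load-state pattern can be sketched portably in user space with the ucontext API covered later in these notes. Here swapcontext(), a companion libc call, stands in for the save + iret pair; the names ctx_a, ctx_b, and thread_b are illustrative only:

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

// Two "threads" switching between each other, dispatch_to()-style.
static ucontext_t ctx_a, ctx_b;

static void thread_b(void) {
    printf("B: running, dispatching back to A\n");
    swapcontext(&ctx_b, &ctx_a);  // save B's state, load A's
}

int main(void) {
    char *stack = malloc(64 * 1024);   // B's private stack
    getcontext(&ctx_b);                // initialize before makecontext()
    ctx_b.uc_stack.ss_sp = stack;
    ctx_b.uc_stack.ss_size = 64 * 1024;
    ctx_b.uc_link = &ctx_a;            // if thread_b ever returned, resume A
    makecontext(&ctx_b, thread_b, 0);

    printf("A: dispatching to B\n");
    swapcontext(&ctx_a, &ctx_b);       // save A's state, load B's
    printf("A: resumed\n");
    free(stack);
    return 0;
}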
Thread creation
A new thread is created by initializing its stack so that the first dispatch to it looks like a return from an exception. Push:
- tshutdown
- _tfunction, the thread function
- eflags
- the instruction pointer (initially thread_start())
- a fake error code and interrupt number
- general-purpose registers
- segment registers
Execution life cycle of a thread
- Create a new thread
- Someone dispatches to this thread
- The new thread's saved state is loaded
- thread_start() at the top of the stack executes
- The thread function _tfunction executes
- tshutdown() executes
CPU scheduling
State of a thread/process
1. starting: new thread, moves to ready (2)
2. ready: ready to execute, can switch to running
3. running: can be switched back to ready
4. blocked: entered from running
5. suspended ready: swapped out due to memory shortage
6. suspended blocked: swapped out due to memory shortage
7. terminated: entered from running
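For reference, the same states as a C enum, numbered to match the list above (the names are paraphrases of the list, not from any particular OS):

#include <stdio.h>

// Thread states, numbered to match the list above.
typedef enum {
    STARTING = 1, READY, RUNNING, BLOCKED,
    SUSPENDED_READY, SUSPENDED_BLOCKED, TERMINATED
} thread_state_t;

int main(void) {
    thread_state_t s = STARTING;
    s = READY;    // long-term scheduler admits the thread (1 -> 2)
    s = RUNNING;  // short-term scheduler dispatches it (2 -> 3)
    printf("state = %d\n", s);
    return 0;
}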
Scheduler:
- long-term (admission) scheduler
- controls the transition between starting (1) and ready (2)
- short-term (CPU) scheduler
- controls the transition between ready (2) and running (3)
- at each CPU burst, threads compete for the CPU
- medium-term (memory) scheduler
- controls the transitions between ready (2) and suspended ready (5), and between blocked (4) and suspended blocked (6)
Decision points for scheduler:
- running to blocked (3 to 4)
- running to terminated (3 to 7)
- blocked to ready (4 to 2)
- running to ready (3 to 2)
Scheduler
There is a ready queue containing the TCBs of ready threads. The scheduler picks a thread from this queue and dispatches to it.
Criteria:
- User oriented
- Response time, between start and first response
- Normalized response time, ratio of response time to execution time
- Wait time, sum of periods waiting in ready queue
- System oriented
- CPU utilization: percentage of time CPU is busy
- Throughput: number of jobs completed per time unit
FIFO
- First come, first served
- very simple
- long average and worst-case waiting time
- poor dynamic behavior
Shortest Job First
- When CPU is idle, pick thread with shortest next CPU burst
- minimize average waiting times
- starvation of threads with long CPU bursts
- difficult to determine the length of the next CPU burst in advance
Fixed-priority
- When CPU is idle, pick thread with highest priority
- priority: thread/process class or urgency
- Starvation of low-priority threads
- Solution: aging, increase priority over time
- Use an array of FIFO queues, one per priority
- Implement aging by moving threads between queues
Round-Robin
- FIFO with preemption after time quantum
- A job may execute continuously for at most one time quantum
- After its quantum expires, a job is moved back to the tail of the FIFO queue
- Choice of time quantum
- too large: degenerates to FIFO
- too small: approximates processor sharing, but context-switching overhead dominates
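A toy round-robin simulation over made-up burst lengths, showing the quantum-expiry and requeue behavior; an in-order array scan stands in for the circular FIFO queue:

#include <stdio.h>

#define NJOBS 3

int main(void) {
    int remaining[NJOBS] = {5, 2, 8};  // assumed CPU burst lengths, in ticks
    int quantum = 3, time = 0, left = NJOBS;

    while (left > 0) {
        // scan jobs in order, as if cycling through a FIFO ready queue
        for (int j = 0; j < NJOBS; j++) {
            if (remaining[j] == 0)
                continue;
            int slice = remaining[j] < quantum ? remaining[j] : quantum;
            printf("t=%2d: job %d runs %d tick(s)\n", time, j, slice);
            time += slice;
            remaining[j] -= slice;
            if (remaining[j] == 0) {   // completion times feed the criteria above
                printf("t=%2d: job %d completes\n", time, j);
                left--;
            }
        }
    }
    return 0;
}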
MLFQ, Multilevel Feedback Queue
- An array of Round-robin queues with different quantum values
- Queues have different priority
- Aging is implemented
- Demotion: preempted thread is added at the end of the next lower-priority queue
Multiprocessor scheduling
- Better to have a separate scheduler per CPU
- reduces context-switching overhead
- A global thread-partitioning module assigns threads to CPUs
User-level vs. kernel-level threads
Kernel-level threads
- threads managed by OS
- Each thread in user space corresponds to one thread in kernel space
- Thread operations transfer control into the kernel
- Pro: avoid system-integration issues
- Con: Too heavy-weight
- Performance cost of thread management operations
- cross a protection boundary on every thread operation
- system calls
- Cost of generality: a single implementation must serve all applications
- user-level libraries can be tuned to specific applications
User-level threads
- managed by runtime library
- Management operations require no kernel intervention
- Multiple user-level threads map onto one thread in kernel space (used when making system calls)
- Library can allocate multiple threads in kernel space as well
- Pros
- Low cost
- Flexible (various APIs: POSIX, Actors, Java)
- Implementation requires no change to OS
- Cons
- Performance issues due to mapping to OS
- Difficult to use and to integrate with system services
- kernel events are invisible to the user level
- if a kernel thread blocks, the user-level threads multiplexed on it are stalled
- kernel threads are scheduled without regard to user-level thread state
- user-level threads are scheduled on top of whatever the kernel decides
- this causes issues because the OS doesn't know user-level priorities
Three functions from the standard C library (ucontext.h):
int getcontext(ucontext_t *ucp);
// saves the current thread's execution context in the structure pointed to by ucp.
int setcontext(const ucontext_t *ucp);
// makes a previously saved thread context the current thread context. The current
// context is lost and setcontext() does not return.
// Instead, execution continues in the context specified by ucp.
void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...);
// modifies the context pointed to by ucp so that it will continue execution by invoking
// func() with the arguments provided. The context ucp must have previously been
// initialized by a call to getcontext().
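A minimal sketch using these three calls to run a function on its own stack and jump back; the names hello and switched and the 64 KB stack size are arbitrary. getcontext() effectively returns twice, once directly and once each time the saved context is resumed, hence the switched flag:

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, func_ctx;
static volatile int switched = 0;

static void hello(int id) {
    printf("user-level thread %d on its own stack\n", id);
    setcontext(&main_ctx);  // jump back to main; does not return
}

int main(void) {
    char *stack = malloc(64 * 1024);

    getcontext(&func_ctx);            // must initialize before makecontext()
    func_ctx.uc_stack.ss_sp = stack;
    func_ctx.uc_stack.ss_size = 64 * 1024;
    makecontext(&func_ctx, (void (*)(void))hello, 1, 7);

    getcontext(&main_ctx);            // returns again when hello() calls setcontext()
    if (!switched) {
        switched = 1;
        setcontext(&func_ctx);        // start hello(); control does not fall through
    }
    printf("back in main\n");
    free(stack);
    return 0;
}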
Solution:
- To balance KL and UL threads, we can provide kernel support for user-level threads
- The kernel can tell the user library about kernel thread states via upcalls
- UL thread system can request/release processors
- UL thread system has complete control over which thread to run on allocated processors
Example:
- User thread UA is running on kernel thread KA
- KA blocks
- The kernel creates a new kernel thread KB and informs the user library
- The user library resumes user thread UB on KB
- KA unblocks
- The kernel preempts KB and informs the user library
- The user library preempts UB
- UA is resumed
Event-based programming
Threads are hard
- Deadlock
- modules cannot be designed independently
- composing them can cause deadlock
- Achieving good performance is hard
- simple (coarse) locks give low concurrency
- fine-grained locks reduce performance
- OS limits: scheduling, context switching
- Not well supported
- hard to port threaded code
- libraries are often not thread-safe
- kernel calls are often not multi-threaded
Event-driven programming is a good approach to cooperative task management.
- Handlers “register” for events
- Event loop waits for events and then invokes handlers (through function calls)
- Handlers execute to completion (no preemption)
- Handlers are typically short-lived.
- Events can be initiated by devices as well
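A minimal sketch of this pattern in C; the event types and the synthetic event array stand in for real device or I/O event sources:

#include <stdio.h>

typedef void (*handler_t)(int data);

#define MAX_EVENT_TYPES 8
static handler_t handlers[MAX_EVENT_TYPES];

// Handlers "register" for an event type.
static void register_handler(int type, handler_t h) {
    handlers[type] = h;
}

static void on_timer(int data) { printf("timer fired: %d\n", data); }
static void on_input(int data) { printf("input ready: %d\n", data); }

int main(void) {
    register_handler(0, on_timer);
    register_handler(1, on_input);

    // Synthetic event stream; a real loop would block waiting for events.
    struct { int type; int data; } events[] = { {0, 100}, {1, 7}, {0, 200} };
    int nevents = sizeof events / sizeof events[0];

    // Event loop: fetch each event and invoke its handler via a plain
    // function call; the handler runs to completion, with no preemption.
    for (int i = 0; i < nevents; i++) {
        handler_t h = handlers[events[i].type];
        if (h)
            h(events[i].data);
    }
    return 0;
}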
Advantages
- Easier to use:
- No concurrency, no preemption, no synchronization, no deadlocks.
- Easier to debug:
- Timing dependencies all generated by events, not by internal scheduling.
- Problems easier to trace, e.g. slow event handling vs. corrupted memory.
- Fast:
- No context switching, no scheduling overhead.
- Portable:
- It's just function calls, after all
Difficulties
- Manual control flow management:
- Control flow for single task is broken up across multiple procedures.
- Manual stack management:
- Have to manually carry local state across these procedures.
- Unexpected blocking:
- What if an event handler blocks, e.g., on a page fault?
- The whole event loop waits
- No parallelism:
- How do we run handlers on a multiprocessor?
- Within this model, there is no way
Reference
These are my class notes from taking CSCE 611 at TAMU. Credit to the instructor, Dr. Bettati.