I/O management and device drivers
Structure of I/O system
- Applications
- System call interface
- Device interface
- Block I/O for disk, CD-ROM
- Streaming for keyboard, mouse, printer
- Network for Ethernet, Bluetooth
- Device Management layer
- Device driver
- SW/HW interface
- Device controller
- Device
Device controllers vs. Device drivers
Device controller:
- opcode register
- Driver writes the command to be executed
- operand register
- Stores parameters for the command
- busy register
- Read to check whether the operation has completed
- status register
- Read to check the status, e.g., an error code
- data buffer
- Data read from the device is available here
- Can also be used to transfer data to the device
Device driver:
- Writes the opcode and operand registers
- Reads the busy and status registers
- Reads/writes the data buffer
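A minimal C sketch of this software/hardware interface, assuming a hypothetical memory-mapped register layout (real controllers define their own offsets and command encodings):

```c
/* Hypothetical memory-mapped controller register layout; real devices
 * define their own offsets, widths, and command encodings. */
#include <stdint.h>

struct dev_regs {
    volatile uint32_t opcode;    /* driver writes the command to execute      */
    volatile uint32_t operand;   /* driver writes the command's parameters    */
    volatile uint32_t busy;      /* driver reads: non-zero while working      */
    volatile uint32_t status;    /* driver reads: error code after completion */
    volatile uint8_t  data[512]; /* data buffer transferred to/from device    */
};

/* Driver side of the interface: write operand, then opcode (which starts
 * the operation); busy/status/data are read back later. */
static void issue_command(struct dev_regs *dev, uint32_t op, uint32_t arg)
{
    dev->operand = arg;
    dev->opcode  = op;
}
```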
Programmed I/O, Polling, Interrupts
To access device registers:
- Explicit I/O instructions (port-mapped)
- Special instructions copy data between a CPU register and a device register
- Memory-mapped
- Map a portion of the address space onto the device registers
Polling:
- The device driver keeps checking the busy register until the operation completes
- The polling frequency needs to be chosen carefully
- Easy to implement
- CPU time spent on I/O is controlled by the CPU, not the device
- Better protected against malicious devices (which could otherwise flood the CPU with interrupts)
Interrupts:
- An interrupt is raised once the operation is done, so the CPU does not have to poll
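A polling sketch that reuses the hypothetical dev_regs layout and issue_command() from the sketch above; the opcode value and the meaning of the status register are also assumptions:

```c
/* OP_WRITE is an invented opcode; status == 0 is assumed to mean success. */
static int write_polled(struct dev_regs *dev, const uint8_t *buf, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
        dev->data[i] = buf[i];            /* fill the controller's data buffer */

    issue_command(dev, /* OP_WRITE = */ 1, len);

    while (dev->busy)                     /* poll the busy register             */
        ;                                 /* a real driver may sleep/yield here */

    return (int)dev->status;              /* check the status register          */
}
```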
Direct Memory Access (DMA)
- Letting the CPU copy data between the device buffer and memory is simple, but expensive in CPU time
- A DMA controller can perform the transfer over a DMA channel, freeing the CPU until a completion interrupt arrives
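A sketch of programming a DMA transfer; the dma_channel register names are invented for illustration and do not correspond to any particular controller:

```c
/* Invented DMA channel registers; the point is that the driver programs
 * (source, destination, length) and the hardware moves the data. */
#include <stdint.h>

struct dma_channel {
    volatile uint64_t src;    /* physical address of the device buffer */
    volatile uint64_t dst;    /* physical address of the memory buffer */
    volatile uint32_t len;    /* number of bytes to transfer           */
    volatile uint32_t start;  /* writing 1 starts the transfer         */
};

static void dma_device_to_memory(struct dma_channel *ch,
                                 uint64_t dev_buf, uint64_t mem_buf,
                                 uint32_t len)
{
    ch->src   = dev_buf;
    ch->dst   = mem_buf;
    ch->len   = len;
    ch->start = 1;   /* CPU is free to do other work; an interrupt signals completion */
}
```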
Modern I/O Architectures
- Under the I/O management layer, we have:
- File system driver
- Hard disk driver
- Once a request is made
- I/O management issues an IRB (I/O Request Block)
- File system driver figures out the location on disk and sends IRB to disk driver (through I/O layer)
- Disk driver updates the IRB and sends it back to the FS driver (through the I/O layer)
- FS driver returns the IRB to the I/O manager
- System call returns
- This structure can have a volume manager driver between the FS driver and the disk driver
- Easy to add power management and plug-and-play management on top of the I/O layer
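A hypothetical IRB layout to make this flow concrete; the field names are illustrative, not the request structure of any particular OS:

```c
/* Hypothetical IRB: the I/O manager fills in the request, the FS driver
 * resolves the on-disk location, and the disk driver records the outcome. */
#include <stdint.h>

struct irb {
    uint32_t  op;           /* READ or WRITE                       */
    uint64_t  file_offset;  /* set by the I/O manager              */
    uint64_t  disk_lba;     /* filled in by the file system driver */
    uint32_t  length;       /* bytes requested                     */
    int32_t   status;       /* filled in by the disk driver        */
    void     *buffer;       /* user data buffer                    */
};
```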
Structure of device drivers
- Top end
- Accepts requests (e.g., write) and issues commands to the device
- Bottom end, executed when the device interrupts (after I/O completion)
- interrupt service routine
- critical, executed immediately
- deferred procedure call (DPC) routine
- can be delayed
- Other parts
- Device management
- add-device routine
- initialization routine
- power management
- Interaction with I/O manager
- dispatch routines
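A sketch of the bottom end, assuming a hypothetical queue_dpc() service; the point is the split of work between the ISR and the DPC, not the API names:

```c
/* queue_dpc() stands in for whatever OS service schedules deferred work. */
extern void queue_dpc(void (*fn)(void *ctx), void *ctx);

/* DPC routine: the non-urgent part, may be delayed. */
static void my_dpc(void *ctx)
{
    /* copy data out of the controller, complete the request, start the next one */
    (void)ctx;
}

/* Interrupt service routine: only the critical part, executed immediately. */
static void my_isr(void *ctx)
{
    /* acknowledge/clear the device's interrupt status register */
    queue_dpc(my_dpc, ctx);   /* defer the rest */
}
```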
Disks
Structure of hard disk drive
- Platters, plates for data
- Spindle, in the middle of the platters, rotates them
- Head, reads/writes data, one per platter surface
- Arm, holds the heads
For a platter,
- Surface
- Track, one concentric circle on a surface
- Cylinder, the set of tracks at the same arm position across all surfaces
- can be read without moving the arm
- Sector, a part of a track
Performance modeling
- Seek latency, move arm
- time: $t_{seek} = s + c \cdot n$
- $c$, track traversal time (per track)
- $n$, number of tracks traversed
- $s$, startup time
- Rotational latency, rotate platter
- time: $t_{rot} = \frac{1}{2r}$ on average (half a revolution)
- $r$, number of revolutions per time unit
- Transfer latency, transfer data from disk to memory
- time: $t_{transfer} = \frac{b}{r N}$
- $b$, bytes to be transferred
- $N$, bytes on a track
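A worked example combining the three terms, using made-up drive parameters (2 ms startup, 0.1 ms per track, 100 tracks, 7200 RPM, 4 KB transfer, 512 KB per track):

```c
#include <stdio.h>

int main(void)
{
    double s = 2.0;               /* startup time, ms                   */
    double c = 0.1;               /* track traversal time, ms per track */
    double n = 100.0;             /* tracks traversed                   */
    double r = 7200.0 / 60000.0;  /* revolutions per ms (7200 RPM)      */
    double b = 4096.0;            /* bytes to transfer                  */
    double N = 512.0 * 1024.0;    /* bytes on a track                   */

    double seek     = s + c * n;       /* 12 ms                         */
    double rotation = 1.0 / (2.0 * r); /* ~4.17 ms (half a revolution)  */
    double transfer = b / (r * N);     /* ~0.07 ms                      */

    printf("access time = %.2f ms\n", seek + rotation + transfer);
    return 0;
}
```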
Disk scheduling
The goal is to minimize seek latency (in particular, the number of tracks traversed).
Algorithms:
- FIFO
- Shortest-Seek-Time-First (SSTF)
- Pros: short service times
- Cons: starvation of far requests
- Can be improved by aging
- Elevator (SCAN)
- Scan in one direction; when one end is reached, reverse direction
- Cons:
- Requests just behind the head wait the longest, since the head has only just passed them
- Tracks at the two ends are serviced less often than those in the middle
- Circular SCAN
- Only scans in one direction, e.g., left to right; after reaching the end, the head jumps back to the beginning
- Cons: wasted movement if there are no requests near the end
- LOOK, Circular-LOOK
- Improve on SCAN / C-SCAN
- The head only travels as far as the last pending request in each direction, instead of going all the way to the end
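A small SSTF sketch (no aging) on a sample request queue; it simply picks the closest pending track at each step:

```c
#include <stdio.h>
#include <stdlib.h>

/* Return the index of the pending request closest to the head, or -1. */
static int sstf_next(const int *track, const int *done, int n, int head)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (done[i])
            continue;
        if (best < 0 || abs(track[i] - head) < abs(track[best] - head))
            best = i;
    }
    return best;
}

int main(void)
{
    int track[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    int n = (int)(sizeof track / sizeof track[0]);
    int done[8] = { 0 };
    int head = 53, moved = 0;

    for (int k = 0; k < n; k++) {
        int i = sstf_next(track, done, n, head);
        moved += abs(track[i] - head);
        head = track[i];
        done[i] = 1;
        printf("service track %d\n", head);
    }
    printf("tracks traversed: %d\n", moved);
    return 0;
}
```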
RAID
RAID: Redundant Arrays of Independent Disks, Striping + redundancy
Common characteristics of RAID:
- Array of physical disks that are visible as single device to OS.
- Data is distributed across physical drives of array.
- Redundant disk capacity is used for error detection/correction.
- Pros:
- Improved I/O performance
- Enables incremental upgrade
- Cons: Reliability, more devices increase the probability of failure
- Solution: redundancy
RAID Levels
RAID is Striping + redundancy. For striping, we have:
- Block-level striping
- Whole blocks are the unit distributed across disks
- Bit-level striping
- Blocks are split into smaller parts and distributed across disks
Levels:
RAID 0
- Block-level striped set without parity (redundancy)
- Distribute blocks among disks
- Pros:
- Increase capacity
- Read/Write performance
- Cons:
- Increase failure rate (no tolerance)
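A sketch of the RAID 0 mapping, assuming simple round-robin striping of logical blocks across the disks of the array:

```c
#include <stdio.h>

int main(void)
{
    unsigned num_disks = 4;   /* illustrative array size */
    for (unsigned lba = 0; lba < 8; lba++)
        printf("logical block %u -> disk %u, block %u\n",
               lba, lba % num_disks, lba / num_disks);
    return 0;
}
```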
RAID 1
- Mirrored Set without parity
- Performance:
- Good for read
- Penalty for write
- Cons:
- Cost (100% redundancy)
RAID 2
- Memory-style error-correcting parity
- Split a block into small strips and distribute them across disks
- Heads and spindles synchronized
- An error-correcting code (Hamming) is calculated over corresponding bits of the data disks and stored on dedicated disks
- Appropriate only for environments with many disk errors
- Typically not implemented
RAID 3
- Bit-interleaved parity
- Bit-level striping
- Heads and spindles synchronized
- Simple parity bits stored in another disk (can be used to infer the correct bits for a failed disk)
RAID 4
- Block level parity
- Similar to RAID 3
- Disks not synchronized
- Each block on parity disk contains parity information for all data disks
- To update the parity bits, we don't need to read the other data disks, just the old values
- use $P_{new} = P_{old} \oplus D_{old} \oplus D_{new}$
- Problem: parity disk gets lots of loads
- Solution: RAID 5
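A sketch of this parity update as a byte-wise XOR over one strip; buffer handling and sizes are simplified for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* P_new = P_old XOR D_old XOR D_new, applied byte-wise over one strip. */
static void update_parity(uint8_t *parity, const uint8_t *old_data,
                          const uint8_t *new_data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}
```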
RAID 5
- Striped set with interleaved parity
- Same as RAID 4, but parity blocks are spread across all disks
- No synchronization across disks
- Large strips
- Problem: recovering from a failure may take a long time
- If a second failure happens during recovery, the system loses data
RAID 6
- Striped set with dual interleaved parity
- Stores two independent parity blocks per stripe
- Uses two independent parity functions, written to two different disks
- Tolerates two failures
RAID combinations
- RAID 1 + 0 = RAID 10
- RAID 0 striping built on top of RAID 1 mirrored pairs
Flash disks
NAND Flash
- Solid-state persistent storage technology with “disk-like” user interface.
- READs/WRITEs are sector-based (page-based)
- Suited for storage of sequential data.
- Random access can be “faked” by caching (“shadowing”) data in RAM main memory.
A block (e.g. 128KB) in NAND Flash is divided into several pages (e.g. 2KB).
Operations on NAND flash
Operations:
| Operation | Level | Function | Latency (µs) |
|---|---|---|---|
| READ | page | read a page | 75 |
| PROGRAM | page | write to an empty page | 220 |
| ERASE | block | erase a whole block | 500 |
Note:
- We cannot overwrite data on a page in place
- We cannot erase a single page, only a whole block
- Blocks start failing after about 100K PROGRAM/ERASE cycles
Flash translation layer (FTL)
- Use a log-structured FTL
- The FTL dynamically remaps logical page numbers to physical pages via a translation table
- It also keeps track of the state of each physical page: valid, empty, or invalid
- A pointer to the head of the empty-page array is maintained
- To overwrite a page (see the sketch after this list)
- write the data into the first empty page
- mark the old page as invalid
- map the logical page number to the new page in the translation table
- advance the empty-page pointer
- Once the number of empty pages reaches a low-water mark, garbage-collect invalid pages
- copy valid pages out of blocks that mix valid and invalid pages into remaining empty pages, and mark the old copies invalid
- erase blocks whose pages are all invalid
- update the page states and the empty-page pointer
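A sketch of the overwrite path of a log-structured FTL as described above; the sizes, the flash_program() stub, and the flat empty-page pointer are simplifications:

```c
#include <stdint.h>

#define PAGES   1024   /* physical pages (illustrative)        */
#define LOGICAL 512    /* logical pages exposed to the FS      */

enum page_state { EMPTY, VALID, INVALID };

static uint32_t        map[LOGICAL];   /* logical page -> physical page */
static enum page_state state[PAGES];   /* state of every physical page  */
static uint32_t        next_empty;     /* head of the empty-page array  */

/* Stub for the PROGRAM operation (about 220 us on real flash). */
static void flash_program(uint32_t phys, const void *data) { (void)phys; (void)data; }

static void ftl_init(void)
{
    for (uint32_t i = 0; i < LOGICAL; i++) { map[i] = i; state[i] = VALID; }
    next_empty = LOGICAL;               /* pages LOGICAL..PAGES-1 are empty */
}

/* Overwrite path: write to the first empty page, invalidate the old copy,
 * remap, and advance the empty-page pointer. */
static void ftl_write(uint32_t logical, const void *data)
{
    uint32_t new_phys = next_empty++;   /* 1. take the first empty page    */
    flash_program(new_phys, data);      /* 2. PROGRAM the data there       */
    state[map[logical]] = INVALID;      /* 3. mark the old page invalid    */
    map[logical]    = new_phys;         /* 4. update the translation table */
    state[new_phys] = VALID;
    /* when next_empty nears a low-water mark, garbage collection moves
     * valid pages, erases all-invalid blocks, and reclaims them           */
}
```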
Reference
These are my class notes from taking CSCE 611 at TAMU. Credit to the instructor, Dr. Bettati.