Read "Streaming Systems" 1&2, Streaming 101 Read "F1, a distributed SQL database that scales" Read "Zanzibar, Google’s Consistent, Global Authorization System" Read "Spanner, Google's Globally-Distributed Database" Read "Designing Data-intensive applications" 12, The Future of Data Systems IOS development with Swift Read "Designing Data-intensive applications" 10&11, Batch and Stream Processing Read "Designing Data-intensive applications" 9, Consistency and Consensus Read "Designing Data-intensive applications" 8, Distributed System Troubles Read "Designing Data-intensive applications" 7, Transactions Read "Designing Data-intensive applications" 6, Partitioning Read "Designing Data-intensive applications" 5, Replication Read "Designing Data-intensive applications" 3&4, Storage, Retrieval, Encoding Read "Designing Data-intensive applications" 1&2, Foundation of Data Systems Three cases of binary search TAMU Operating System 2 Memory Management TAMU Operating System 1 Introduction Overview in cloud computing 2 TAMU Operating System 7 Virtualization TAMU Operating System 6 File System TAMU Operating System 5 I/O and Disk Management TAMU Operating System 4 Synchronization TAMU Operating System 3 Concurrency and Threading TAMU Computer Networks 5 Data Link Layer TAMU Computer Networks 4 Network Layer TAMU Computer Networks 3 Transport Layer TAMU Computer Networks 2 Application Layer TAMU Computer Networks 1 Introduction Overview in distributed systems and cloud computing 1 A well-optimized Union-Find implementation, in Java A heap implementation supporting deletion TAMU Advanced Algorithms 3, Maximum Bandwidth Path (Dijkstra, MST, Linear) TAMU Advanced Algorithms 2, B+ tree and Segment Intersection TAMU Advanced Algorithms 1, BST, 2-3 Tree and Heap TAMU AI, Searching problems Factorization Machine and Field-aware Factorization Machine for CTR prediction TAMU Neural Network 10 Information-Theoretic Models TAMU Neural Network 9 Principal Component Analysis TAMU Neural Network 8 Neurodynamics TAMU Neural Network 7 Self-Organizing Maps TAMU Neural Network 6 Deep Learning Overview TAMU Neural Network 5 Radial-Basis Function Networks TAMU Neural Network 4 Multi-Layer Perceptrons TAMU Neural Network 3 Single-Layer Perceptrons Princeton Algorithms P1W6 Hash Tables & Symbol Table Applications Stanford ML 11 Application Example Photo OCR Stanford ML 10 Large Scale Machine Learning Stanford ML 9 Anomaly Detection and Recommender Systems Stanford ML 8 Clustering & Principal Component Analysis Princeton Algorithms P1W5 Balanced Search Trees TAMU Neural Network 2 Learning Processes TAMU Neural Network 1 Introduction Stanford ML 7 Support Vector Machine Stanford ML 6 Evaluate Algorithms Princeton Algorithms P1W4 Priority Queues and Symbol Tables Stanford ML 5 Neural Networks Learning Princeton Algorithms P1W3 Mergesort and Quicksort Stanford ML 4 Neural Networks Basics Princeton Algorithms P1W2 Stack and Queue, Basic Sorts Stanford ML 3 Classification Problems Stanford ML 2 Multivariate Regression and Normal Equation Princeton Algorithms P1W1 Union and Find Stanford ML 1 Introduction and Parameter Learning

TAMU Computer Networks 3 Transport Layer

2018-04-24

Transport-layer services

Logical communication between processes on different hosts

  • TCP
    • Congestion control
    • Flow control
    • Connection setup
  • UDP
    • Unreliable, unordered
  • No such services
    • Delay or loss guarantees
    • Bandwidth guarantees

Multiplexing and demultiplexing

  • Multiplexing at sender host
    • Gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
  • Demultiplexing at receiver host
    • Delivering received segments to correct socket

UDP socket

UDP socket is identified by two-tuple: dest IP address, dest port number. Receiver uses destination port number in transport header to direct the package to appropriate socket (if the socket is open).

TCP socket

TCP socket is identified by following 4-tuple. Receiver host uses all four values to direct segment to appropriate socket.

  • Source IP address
  • Source port number
  • Destination IP address
  • Destination port number

Server host may support many simultaneous TCP sockets on same port, e.g., web server on port 80.

Connectionless transport: UDP

  • Best-effort service
  • Connectionless
  • Often used for streaming multimedia
  • DNS

Advantages:

  • Less overhead, no connection or retransmission
  • Simplicity: no connection state at sender/receiver
  • Small segment header
  • No congestion control
    • For short transfers, this is completely unnecessary
    • In other cases, desirable to control rate directly from application

Packet format:

// 32 bits for a line
source port, dest port
length, checksum
// begin of message
data 

Principles of reliable data transfer

Stop-and-wait

  • rdt1.0
    • Assumption
      • No bit errors
      • No loss of packets
      • No reordering
    • Simple
    • Send and receive
  • rdt2.0
    • May have bit errors
    • Receiver sends ACKs or NAKs back
    • Sender may retransmit
    • Problem: ACKs and NAKs may corrupt
  • rdt2.1
    • Handles garbled ACK/ NAKs
    • Add sequence number to packet (0 and 1)
  • rdt2.2
    • Use ACKs only
    • ACKs contains the sequence number of packet being ACKed
  • rdt3.0
    • Add retransmit when timeout

The link utilization of rdt3.0 is low. We can improve it by pipelining, sending multiple unACK’ed packets at any time.

Go-back-N

  • Window of up to N consecutive unack’ed pkts allowed
  • A field in header that holds k unique seq numbers
    • Maximum window size is k - 1
  • ACK(n): ACKs all consecutive pkts up to & including seq # n (cumulative ACK)
    • Means packets 1…n have been delivered to application
  • Timer for the oldest unacknowledged pkt (send_base)
    • Upon timeout: retransmit all sent pkts in current window (start from send_base ); reset the timer
  • ACK-only: Receiver always sends ACK for correctly-received pkt with highest in-order seq #
    • Duplicate ACKs during loss
    • Need only remember expectedseqnum
  • Out-of-order pkt
    • Discard, no receiver buffering!
    • Re-ACK pkt with highest in-order seq #

Selective repeat

  • Receiver individually acknowledges all correctly received pkts
    • Buffers pkts, as needed, for eventual in-order delivery to upper layer
  • Sender only resends pkts for which ACK was not received
    • Separate timer for each unACKed pkt
  • Sender window
    • N consecutive packets in [snd_base, snd_base+N-1]
  • The maximum window size is half of the sequence number range

Connection-oriented transport: TCP

  • Point-to-point (unicast)
  • Reliable, in-order byte stream
    • Packet boundaries are not visible to the application
  • Pipelined
    • TCP congestion and flow control set window size
  • Send & receive buffers
  • Full duplex data
    • Bi-directional data flow in same connection
  • Connection-oriented
    • Handshaking
  • Flow control
    • Sender will not overwhelm receiver

TCP segment format is as below.

TCP segment format

  • Sequence number is the number of first byte of the segment’s data
  • ACK number is the number of next byte expected

TCP RTO

  • RTO (timeout value) is estimated based on history RTTs (round trip time).
  • SampleRTT: measured time from segment transmission until ACK receipt
    • Ignore retransmissions to avoid inaccurate measurements
  • Exponentially weighted moving average (EWMA)
    • Influence of past sample decreases exponentially fast
    • Typical value: ,

Estimated RTT:

Dev RTT:

Final RTO:

TCP Reliable Data Transfer

  • TCP creates rdt service on top of IP’s unreliable service
    • Hybrid of Go-back-N and Selective Repeat
  • Pipelined segments
  • Cumulative acks
  • TCP uses single retransmission timer
    • For the oldest unACK’ed packet
  • Retransmissions are triggered by
    • Timeout events
    • Duplicate acks for base (3 in row), fast retransmission
  • Nagle’s algorithm
    • For in-order arrival of packets, send ACKs for every pair of segments; if second segment of a pair not received in 500ms, ACK the first one alone
  • Fast retransmit
    • If sender receives 3 duplicate ACKs for its base, it assumes this packet was lost
    • Resend the base segment immediately (i.e., without waiting for RTO)
    • Reordering may trigger unnecessary retx
    • To combat this problem, modern routers avoid load-balancing packets of same flow along multiple paths

TCP flow control

Sender won’t overflow receiver buffer by transmitting too much, too fast

  • Receiver advertises spare room by including value of RcvWin in segments
    • RcvWin = RcvBuffer – [LastByteReceivedInOrder - LastByteDelivered]
  • Sender enforces seq < ACK + RcvWin
  • Combining both constraints (sender, receiver)
    • seq < min(sndBase+sndWin, ACK+RcvWin)

TCP connection

Purpose of connection establishment:

  • Exchange initial seq #s
    • Can be random picked for security
  • Exchange flow control info (i.e., RcvWin)
  • Negotiate options (SACK, large windows, etc.)

3-way handshake

  1. Client sends TCP SYN to server
    • Specifies initial seq # X and buffer size RcvWin
    • No data, ACK = 0
  2. Server gets SYN, replies with SYN+ACK
    • Sends server initial seq # Y and buffer size RcvWin
    • No data, ACK = X+1
  3. Client receives SYN+ACK, replies with ACK segment
    • Seq = X+1, ACK = Y+1
    • May contain regular data, but many servers will break
  4. regular packets
    • Seq = X+1, ACK = Y+1

4-way bye-bye

  • TCP initiates a close when it has all ACKs for the transmitted data
  • To close a connection like closesocket(sock);, birectional transfer means both sides must agree to close.
  1. Originator end system sends TCP FIN control segment to server
    • Server may issue FIN as well
  2. Other side receives FIN, replies with ACK. Connection in “closing” state. Then sends FIN
  3. Originator receives FIN, replies with ACK
    • Enters “timed wait” - will respond with ACK to received FINs
  4. Other side receives ACK; its connection considered closed
  5. after a timeout (TIME_WAIT state lasts 240 seconds), originator’s connection is closed as well

TCP congestion control

Congestion is the case when too many sources sending too much data too fast for the network to handle, not the receiver (in the case of flow control).

Manifestations:

  • Lost packets (buffer overflows)
  • Delays (queueing in routers)

Approaches Towards Congestion Control

  • End-to-end
    • No explicit feedback from network
    • Congestion inferred by end-systems from observed loss/delay
    • TCP
  • Network-assisted
    • Routers provide feedback to end systems
    • ATM (2 bits)
    • 3 states: Increase, decrease, hold steady
    • Use RM (resource management) packets, sent by sender

TCP congestion control

Various of algorithms have been developed over the years.

  • TCP Tahoe (1988)
  • TCP Reno (1990)
  • Linux 2.6.8 and later: BIC TCP (2004)
  • Vista and later: Compound TCP (2005)

Idea

  • Sender limits transmission by: .
  • is a function of network congestion.
  • The effective window is the minimum of CongWin, flow-control window and sender’s buffer space.
  • Loss event is either timeout or 3 duplicate ACKs.
  • TCP sender will reduce after lost event

Mechanisms

  • AIMD (congestion avoidance)
    • Add 1 packet to window size per RTT
  • Slow start
    • Add 1 packet to window size per ACK; double W per RTT
  • Conservative after timeout events

Refinement

TCP Tahoe only responds to timeouts.

  • Reset:
  • is set to 1 MSS
  • Slow start until is reached, then AIMD

TCP Reno:

  • Timeouts same as Tahoe
  • 3 dup ACKs:
    • cut in half
    • AMID

Initial slow start ends when either:

  • Loss occurs
  • Initial threshold is reached
    • Usually set to the receiver’s advertised window

Transfer rate

Rate is , is the current window size. Average rate is .

Between two losses, number of packets sent is:

Denote loss probability is , Therefore, we have:

The average rate can be calculated again to be:

TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of bandwidth C, each should have average rate of C/K.

TCP will converge to fairness eventually, in the case that flows have the same MSS/RTT.

Nothing prevents app from opening parallel flows between 2 hosts. Can use this strategy to obtain larger share of bandwidth. Web browsers do this.

Multimedia apps do not use TCP; they use UDP instead. Because they don’t want rate throttled by congestion control.

Reference

This is my class notes while taking CSCE 612 at TAMU. Credit to the instructor Dr. Loguinov.


Creative Commons License
Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2019. All rights reserved by melonskin. Powered by Jekyll.