TAMU Computer Networks 4 Network Layer

Introduction

Network layer protocols: IP, ICMP, IGMP.

IP
- Datagram format
- Addressing conventions
ICMP
- Error reporting
- Ping, traceroute
IGMP
- Multicast

Network Layer = IP Layer

Transports segments from sending to receiving host
On the sending side, encapsulates segments into datagrams
On the receiving side, delivers segments to transport layer
Network layer protocols in every host and router
Router examines header fields in all IP datagrams passing through it

Key Network-Layer Functions

Routing: determine the path taken by packets from source to dest
- Build a minimum-cost table at each router
- Table has next-hop neighbor for each possible destination
- send packet along the least-expensive path
Forwarding: move packets from a router’s input port to appropriate router output port (physical interface (NIC) inside router, not TCP/UDP port)
- Table lookup
- Port-to-port transfer
- Goal: efficiency
Connection setup in certain network architectures
- e.g., ATM (Asynchronous Transfer Mode)
- Before datagrams flow in such networks, two hosts and intermediate routers establish virtual circuit (VC)
  - Routers get involved to set up a path

Virtual circuit and datagram networks

Internet is preferred compared with ATM.

Driven by data exchange among computers
- “Elastic” service, no strict timing requirements
Smart end systems, complexity at edges
Simple network core

ATM:

Like Telephony
Strict timing, bandwidth requirements
Need for guaranteed service
Dumb end system
Complexity in network core

Virtual Circuits

VCs may create a path that behaves much like a telephone circuit (no congestion, low delay, no loss)
Call setup for each connection before data can flow
- Similar to TCP’s handshake, but involves routers
Each packet carries a VC tag instead of the 5-tuple
- 5-tuple: <src addr, dest addr, src port, dest port, proto>
Every router on source-dest path maintains “state” for each passing connection
- Mapping from tags to next-hop router
Fraction of router resources (bandwidth, buffers) are allocated to each VC

Datagram networks

No call setup at network layer
Routers: no state about end-to-end connections
- No network-level concept of “connection”
Packets forwarded using destination host address
- Packets between the same source-dest pair may take different paths (multi-path routing)

Inside a router

A router has several ports (interface that can send and receive data) and a switching hardware connecting them.

Key router functions:

Run routing algorithms
- RIP, OSPF, BGP
Forward datagrams from incoming to outgoing link
- Through switch fabric hardware
- Given datagram destination, look up output port using forwarding table in input port memory
- Queueing may happen at both input and output ports
  - Customer traffic: single FIFO drop-tail queue
  - ISP traffic: multiple queues with WRR or priority queuing

Switching methods:

Switching Via Memory
Switching Via a Bus
- A shared bus
Switching Via An Interconnection Network
- Overcomes bus bandwidth limitations
- Crossbar

IP, Internet Protocol

IP datagram

Overhead of IP is 20 bytes, same as TCP.

IP datagram format (some):

IP protocol version
TTL: max number remaining hops (decremented at each router)
source and destination IP addresses
16-bit ID, flags, fragment offset (for IP fragmentation)
upper layer protocol to deliver (TCP, ICMP, etc)
QoS (priority of packet)
checksum (header only)

IP fragmentation & reassembly

Network links have varying MTUs (maximum transmission units) – largest possible link-level frames
- Different link types, different MTUs (most common 1500)
Large IP datagram divided (“fragmented”) within network
- One datagram becomes several datagrams
- Reassembled only at final destination

For example, a large datagram can be divided into 3 smaller one.

length includes the 20 bytes header
offset in in 8-byte words
Last packet is identified by flag 0 and nonzero offset

// large
length 4000, id x, flag 0, offset 0

//small 1
length 1500, id x, flag 1, offset 0

//small 2
length 1500, id x, flag 1, offset 185

//small 3
length 1040, id x, flag 0, offset 370

IP Addressing

IP address: 32-bit identifier for host or router interface
Interface: connection between host/router and physical link
- Also called a port
- Routers typically have multiple interfaces
- Multihoming: Hosts have multiple interfaces

Subnet (LAN) is a network composed of devices with the same subnet prefix of IP address.

Subnet prefix: k bits
Host suffix: 32-k remaining bits
Can physically reach each other without intervening router
To determine the subnets, detach each interface from its host or router, creating islands of isolated networks
Subnet mask: 255.255.255.0 or /24

CIDR stands for Classless InterDomain Routing. It has the address format of 200.23.16.0/23.

A host can get an IP address by:

hard-coded by system admin in a file
- Windows: Control-panel–network–configuration–tcp/ip–properties
- Linux: /etc/rc.config
Or dynamically assigned by DHCP
- Plug-and-play

Hierarchical Addressing: allows efficient advertisement of routing information. ISPs can ask for different CIDR address range, like “Send me anything in 199.31.0.0/16 or 200.23.18.0/23”.

ISP get a block of addresses from ICANN (Internet Corporation for Assigned Names and Numbers). These registries process ISP and user requests for subnet space. Also manage DNS and resolve disputes.

ARIN (North/South America)
RIPE (Europe)
APNIC (Asia-Pacific)
AfriNIC (Africa)

NAT, Network Address Translation

Local network uses just one IP address as far as the outside world is concerned.

No need to be allocated a range of addresses from ISP
Just one IP address is used for all devices
Can change addresses of devices in local network without notifying outside world
Can change ISP without changing addresses of devices in local network
Devices inside local net not explicitly addressable or visible to outside world (a security plus)

NAT router has a translation table that translates between WAN side addr (ip, port) and LAN side addr. It will change datagram source addr while sending out and dest addr while receiving in.

With 16-bit port, a NAT allows 64K simultaneous connections with a single LAN-side address.
NAT router touches Transport layer (Router should process up to Network layer), which is controversial.
Makes inbound connections difficult
- Inbound connections needed in P2P and other applications
- May be overcome by UPnP or manually configuring NAT to route incoming connections to a particular host
Some believe that address shortage should instead be solved by IPv6

ICMP, Internet Control Message Protocol

Communicates network-level debug information
- Error reporting: unreachable host, network, port, protocol
- Echo request/reply (ping)
Network-layer above IP
- ICMP msgs carried in IP datagrams (“layer 3.5”)
ICMP error message
- Payload contains first 28 bytes of IP pkt causing error

Datagram format:

Type: 8 bits
Code: 8 bits
Checksum: 16 bits
ID: 16 bits
Sequence: 16 bits

Type	Code	description
0	0	echo reply(ping)
3	3	dest port unreachable
8	0	echo request (ping)
11	0	TTL expired

Traceroute

Source sends series of UDP segments to dest
- First with TTL = 1
- Second with TTL = 2
- Unlikely port number
When the n-th datagram arrives to the n-th router
- Router discards datagram
- Sends to source a TTL Expired (type 11, code 0)
- Message includes IP hdr from router & first 28 bytes of original packet
When ICMP message arrives, source calculates RTT
- Traceroute does this 3 times per hop
UDP segment eventually arrives at destination host
- Destination returns ICMP “port unreachable” packet (type 3, code 3)
- When source gets this ICMP, it stops

IPv6

Address length is 16 bytes.
Simpler header format helps speed up forwarding
Header changes to facilitate QoS (priority of packet) and extensions
Data format
- Fixed-length 40 byte header
- No fragmentation allowed
- No checksum
- Allow options, by “next header”
Tunneling: IPv6 can be carried as payload in IPv4 datagrams among IPv4 routers.

Routing algorithms

Global
- Routers have complete topology, link cost info
- Link state algorithms
Local (decentralized)
- Router knows physically-connected neighbors, link costs to neighbors
- Iterative process of computation, exchange of info with neighbors
- Distance vector algorithms

Link state

Dijkstra’s algorithm:

Entire network topology and link costs known
- Accomplished via “link state broadcast”
- Eventually, all nodes have same info
Iterative: after k iterations, know least-cost path to k closest destinations
Heap-based implementation: $O(n \log n)$
Oscillations possible, but only for traffic-dependent cost

Distance vector

Two metrics known to each node
- Estimate $D_x(y)$ of least cost from $x$ to $y$
- Link cost $c(x,v)$ to reach $x$ ’s immediate neighbors
Each node maintains a distance vector
- to neighbors $V$ : $D_x = \{D_x(y) : y \in V\}$
Node periodically asks its neighbors for their distance vectors
- Thus, $x$ has access to the following for each neighbor $v$
- for neighbor $v$ , $D_v = \{D_v(y) : y \in V\}$

Bellman-Ford algorithm

When a node $x$ receives new DV estimate from neighbor $v$ , it updates its own DV using the Bellman-Ford equation:

$D_x(y) = min\{ D_x(y), c(x, v) + D_v(y)\}, \forall y \in V$

Centralized Bellman Ford requires time
- Convergence of decentralized version depends on topology, link weights, update delays, and timing of events
Allow negative weights
Iterative, asynchronous. Each iteration caused by:
- Local link cost change
- DV update message from neighbor
Distributed:
- Each node notifies neighbors only when its DV changes

Link cost changes

Steps:

Node detects local link cost change
Recalculates distance vector, updates routing info if needed
If DV changes, notifies neighbors

Performance:

Good news travels fast
Bad news travels slow, “count to infinity” problem!
- Use poisoned reverse to solve it
  - If z routes through y to get to x:
  - z tells y that z’s distance to x is infinite.
  - So y won’t route to x via z.
  - May not work for complex topology.

Comparison of LS and DV

Message complexity:

LS with nodes, links
- Sent $O(nE)$ messages
DV exchange messages between neighbors only

Convergence time:

LS: plus time sending messages
- Oscillations (cost = congestion)
DV: convergence time varies
- May have routing loops
- Count-to-infinity problem

Robustness:

LS
- Node can advertise incorrect link cost
- Affects only a small portion of the graph
DS
- Node can advertise incorrect path cost
- Each node’s table used by others
- Error propagate thru network

Hierarchical routing

Limitations: Memory, CPU time, Message overhead with billions links. ISPs are not willing to share their topology with others.

Internet is network of networks. Network admins control routing in their own networks, export reachable subnets to outside world.

So the idea is:

Aggregate routers into regions called AS (Autonomous Systems)
Routers in the same AS run the same algorithm
- Accomplished via intra-AS routing protocols
ISPs gain flexibility
- Routers in different ASes can run different intra-AS protocols

Gateway routers:

Direct links to routers in other ASes
Exchange routing view of each AS using an inter-AS protocol

Interconnected ASes:

Intra-AS sets entries for all internal destinations
Inter-AS accepts external destinations from neighbor ASes
- E.g., a gateway in AS1 learns 128.194/16 is reachable via AS2
Inter-AS broadcasts pairs (subnet, exit router)
- E.g., a gateway in AS1 notifies all routers in AS1 that it can reach 128.194/16
Inter-AS protocol is also responsible to determine which path to take while multiple path.
- E.g., hot potato routing: use the closest exit point

Routing in the Internet

Intra-AS routing
- RIP: Routing Information Protocol (DV)
  - Distance metric: # of hops (max = 15)
  - DV exchanged among neighbors every 30 sec using advertisement messages
    - UDP port 520
  - If no advertisement heard after 180 sec –> neighbor/link declared dead
  - QoS only applies to specialized packets generated by ISP
  - RIP routing tables managed by an application-level process called routed (daemon)
- OSPF: Open Shortest Path First (LS)
  - Flood ads messages to entire AS
  - use IP datagram with protocol number 89, authenticated
  - Layer 3.5
  - Hierarchical OSPF in large domains
- IGRP: Interior Gateway Routing Protocol (Cisco, DV); EIGRP (Extended IGRP)
- IS-IS (Intermediate System to Intermediate System, LS, independent of the network layer)
Inter-AS, only one option
- BGP (Border Gateway Protocol)
- All ISPs must support it

BGP, Border Gateway Protocol

BGP allows a subnet to advertise its existence to the rest of the Internet: “I am here”. It also provides each AS a means to:

Obtain subnet reachability information from neighboring ASes
Propagate the reachability information to all routers internal to the AS
Determine “good” routes to subnets based on reachability information and policy

Basics:

Pairs of routers (BGP peers) exchange routing info over TCP (port 179) connections: BGP sessions.
- Note that BGP sessions do not correspond to physical links.
When AS2 advertises a prefix 128.194/16 to AS1, AS2 is promising it will forward any datagrams destined to that prefix towards the prefix
- AS2 can aggregate prefixes in its advertisement
eBGP session is between ASes
iBGP will distribute info to all routers inside an AS
Internal AS routers combine intra-AS data with iBGP broadcasts to set up actual forwarding tables

Path Attributes & BGP Routes:

When advertising an IP prefix (i.e., subnet), message includes BGP attributes
- Route: subnet + attributes
Two important attributes:
- AS-PATH: contains ASes through which the advert for the prefix passed (latest AS first)
- NEXT-HOP: indicates the router that should receive traffic (usually border router of the AS that advertised prefix; multiple values possible)

Route selection:

When gateway router receives route advert, it uses an import policy to accept/decline
- Filters and rules decide allowed/prohibited routes
Decide which route is better
- Multi-exit discriminator (MED) attribute: policy of foreign AS that assigns different weight to different incoming points
- Shortest AS-PATH
- Closest NEXT-HOP router: hot potato routing
- Local preference attribute: policy decision of accepting AS that assigns different weight to various exit points (only used in iBGP)
ISPs want to route mainly to/from their customers!

Broadcast and multicast routing

Broadcast: send a packet to all hosts in the network
Multicast: send to a certain subset of nodes
Unicast: one sender, one receiver

Broadcast

Controlled flooding: routers re-broadcast each received packet only once
- Must keep a table of all previously received pkts to avoid re-sending of the same data (not scalable)
Reverse-path forwarding (RPF)
- routers re-broadcast only packets received on the interface leading towards the source along their own shortest path
- Drawbacks of RPF: redundant packets are still transmitted and routing must be symmetric
Minimum Spanning Tree
- a tree subgraph of G that spans all network nodes and has the minimum cost of all such trees
- Kruskal’s and Prim’s algorithms build MST in $O(E \log E)$ time
- often impractical due to lack of global knowledge
Center-Based Spanning Tree: a “center” node is selected first (various methods exist)
- All other nodes asynchronously send join requests using unicast routing towards the center until intersection with tree

Multicast

Multicast involves a subset of routers. Goal is to create a tree between routers to which multicast group members are attached. (NP-complete)

Approach:

Source-based mcast forwarding tree
- tree of shortest path routes from source S to all receivers
- Dijkstra’s algorithm when S knows entire topology from LS algorithm like MOSPF
Source-specific RPF (default opt-in)
- Initially flood every router, not ideal
- Remove unrelated routers from the tree. Prune messages sent upstream by router with no downstream group members
Steiner Tree: minimum cost tree connecting all routers with attached group members
- NP-complete
- Even though heuristics exists, not used in practice
Center-Based Tree (CBT) (default opt-out)
- Single delivery tree shared by all
- One router identified as “center” of tree
- Join messages sent towards center until existing tree is met

DVMRP

DVMRP: Distance Vector Multicast Routing Protocol, RFC 1075 (1988)
Flood and prune (default opt-in): reverse path forwarding (RPF), tree rooted at source
IGMP broadcasts proceed between neighbor routers
Multicast IP addresses are in 224.0.0.0/4
- To join a particular group, use setsockopt with IP_ADD_MEMBERSHIP

Tunneling: Mcast datagram encapsulated inside “normal” (non-multicast-addressed) datagram.

Unicast IP datagram sent thru “tunnel” via regular IP unicast to receiving mcast router
Receiving mcast router decapsulates mcast datagrams

PIM: Protocol Independent Multicast

Dense (default opt-in)
Sparse (default opt-out)

Multicast future

Wide-area multicast deployment has been traditionally slow, now practically dead
- E.g., Mbone
One issue is scalability
- Flooding all Internet receivers is just insane
- Opens loopholes for DDoS attacks
ISP unwillingness to accept multicast traffic
- Who pays for a single packet being replicated 1M times?
Finally, multicast congestion control is hard

Reference

This is my class notes while taking CSCE 612 at TAMU. Credit to the instructor Dr. Loguinov.

AI 2

Algorithm 17

Amazon 1

Authorization 1

Blog 3

Bootstrap 1

C++ 1

CCpp 5

CSS 2

Cloud 3

Code 1

Crawler 1

DNS 1

Database 17

DeepLearning 1

Design 17

Development 1

Docker 1

English 1

Express 1

GDB 1

Go 3

Google 4

HTML 3

IOS 1

Java 17

Javascript 4

Jekyll 1

Linux 4

MacOS 2

MachineLearning 17

Markdown 4

Mobile 1

MongoDB 2

Multi-threading 3

NAS 1

Network 11

NeuralNetwork 10

Node 1

OS 8

Public-speaking 1

Python 15

RESTful 1

Rails 9

React 1

Redis 1

Ruby 6

Shell 2

Spring 2

System 17

TCP 1

TDD 1

Thread 2

Vim 1

awk 1

git 1

jQuery 1

media 1

network 1

php 1