TAMU Computer Networks 2 Application Layer

Principles

3 architectures:

Client-server
Peer-to-peer
- Example: Gnutella
- Highly scalable
- Difficult to be reliable
Hybrid
- Example: Napster, instant msg (search, auth part can be centralized)

Web and HTTP

Requests and responses

URL format:

scheme://[user:pass@]host[:port][/path][?query][#fragment]

The HTTP request line carries only the [/path][?query] part of the URL.
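
A minimal sketch of how a URL splits into these fields, using Python's standard urllib.parse. The URL and the helper name request_target are made up for illustration.

```python
from urllib.parse import urlsplit

def request_target(url: str) -> str:
    """Return the [/path][?query] part that goes on the HTTP request line."""
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path is requested as "/"
    return path + ("?" + parts.query if parts.query else "")

# Hypothetical URL, not from the notes.
url = "http://user:pass@example.com:8080/map.cgi?city=College+Station#top"
parts = urlsplit(url)
print(parts.scheme, parts.username, parts.hostname, parts.port, parts.fragment)
print(request_target(url))
```

The fragment never reaches the server; it is handled by the browser only.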

Example

An example of an HTTP request:

GET /courses/ HTTP/1.0\r\n
Host: irl.cs.tamu.edu\r\n
Connection: close\r\n
\r\n

HTTP response:

HTTP/1.0 200 OK\r\n
Cache-Control: private\r\n
Content-Type: text/html\r\n
Server: Microsoft-IIS/7.0\r\n
X-Powered-By: ASP.NET\r\n
MicrosoftOfficeWebServer: 5.0_Pub\r\n
MS-Author-Via: MS-FP/4.0\r\n
Date: Thu, 17 Jan 2013 09:22:34 GMT\r\n
Connection: close\r\n
Content-Length: 16367\r\n
\r\n
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">...
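
The request and response formats above can be sketched in code as follows. This is an illustration only: the function names are hypothetical, and a real client would also need to read from a socket until the connection closes.

```python
def build_request(host: str, target: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request; every line ends with CRLF."""
    lines = [
        f"GET {target} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",   # the empty line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_response(raw: bytes):
    """Split a raw response into (status code, header dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, headers, body

request = build_request("irl.cs.tamu.edu", "/courses/")
sample = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\nContent-Length: 6\r\n\r\n<html>"
print(request)
print(parse_response(sample))
```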

Overview

HTTP: HyperText Transfer Protocol

HTTP 1.0: RFC 1945 (1996)
HTTP 1.1: RFC 2068 (1997), RFC 2616 (1999)
HTTP 2: RFC 7540 (May 2015)

HTTP can be:

Non-persistent
- At most one object is sent over a TCP connection
- HTTP/1.0
Persistent
- Multiple objects sent over a single TCP connection
- HTTP/1.1
- “Connection: close” overrides this behavior

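Browsers can open parallel TCP connections to fetch referenced objects. Pipelining is a different technique: the client sends several requests over one persistent connection without waiting for each response.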

Methods

HTTP/1.0
- GET
- POST
- HEAD
HTTP/1.1
- GET, POST, HEAD
- PUT
- DELETE

Upload Input

POST
- Input is in the entity body, used for large amounts of data
URL method
- Use GET
- Input is encoded in the URL field, after ?
- GET /map.cgi?city=College+Station&zip=77843 HTTP/1.0
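
A small sketch of both upload styles, using urllib.parse.urlencode to produce the encoded input. The POST headers shown are the usual ones for form data and are an addition to what the notes list.

```python
from urllib.parse import urlencode, parse_qs

params = {"city": "College Station", "zip": "77843"}
query = urlencode(params)            # spaces become '+', pairs are joined by '&'

# URL method: the input rides in the request line after '?'.
get_request_line = f"GET /map.cgi?{query} HTTP/1.0"

# POST: the same encoded input goes into the entity body instead.
post_body = query.encode("ascii")
post_head = (
    "POST /map.cgi HTTP/1.0\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(post_body)}\r\n"
    "\r\n"
)

print(get_request_line)
print(post_head + post_body.decode("ascii"))
```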

Status code

200, OK
301, Moved Permanently
400, Bad Request
404, Not Found
505, HTTP Version Not Supported

Cookies

Cookies keep user-server states.

Cookie header, line in HTTP response
- Set-cookie: 1112
Cookie file, kept on host and managed by user’s browser
Cookie header, line in HTTP request
- Cookie: 1112
Back-end DB at websites

We can specify the path for a cookie in Set-Cookie: ...; path=/. Shared caching can be disallowed with Cache-Control: private.
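
A sketch of the cookie round trip with Python's http.cookies module. The cookie name "id" is an assumption; the notes only show the value 1112.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie response header.
# The name "id" is hypothetical; only the value 1112 comes from the notes.
jar = SimpleCookie()
jar["id"] = "1112"
jar["id"]["path"] = "/"
set_cookie_header = jar.output()          # a single "Set-Cookie: ..." line
print(set_cookie_header)

# Client side: parse the header value, store it, and echo it in later requests.
received = SimpleCookie()
received.load(set_cookie_header.split(":", 1)[1].strip())
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in received.items()
)
print(cookie_header)
```

Attributes such as path stay on the client; only the name=value pair is sent back.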

Web caches (proxy server)

Goal is to satisfy client request without involving origin server.

Browser sends requests via the cache, or the cache intercepts all outgoing HTTP traffic
- Object in cache: just return the object
- Else: cache requests object from origin server, then return it to client
Cache acts as both client and server
Installed by ISP, university or company

Purpose:

Reduce request time
Reduce traffic on access link
Reduce load on servers
Increase security, proxy server can scan objects
Filter URLs to prevent undesirable destinations

Conditional GET

Don’t send object if cache has up-to-date cached version. Server can also specify expiration by Expires: Sat, 01 Oct 2011 16:00:00 GMT.

If-modified-since: <date>
HTTP/1.0 304 Not Modified
- Else: HTTP/1.0 200 OK <data>
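
The server-side decision can be sketched as below, assuming the server knows the object's last modification time. The function name and the sample dates are illustrative.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_status(if_modified_since, last_modified):
    """Return 304 if the cached copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200                        # plain GET: always send the object
    cached_at = parsedate_to_datetime(if_modified_since)
    return 304 if last_modified <= cached_at else 200

header = "Sat, 01 Oct 2011 16:00:00 GMT"
unchanged = datetime(2011, 9, 1, tzinfo=timezone.utc)    # modified before the cached date
changed = datetime(2011, 10, 2, tzinfo=timezone.utc)     # modified after the cached date

print(conditional_status(header, unchanged))
print(conditional_status(header, changed))
```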

Robots.txt

/robots.txt is used by sites to protect some of their contents from web crawlers.

Crawl-delay specifies the # of seconds between visits
Sitemap points to an XML file that lists all available documents

User-agent: *
Disallow: /images
Disallow: /catalogs
Allow: /catalogs/about
Allow: /catalogs/p?
Disallow: /catalogues
User-agent: *
Disallow: /*.asp$
Disallow: /sdch/*.php
Crawl-delay: 64
Sitemap: http://www.google.com/sitemaps_webmasters.xml
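
A crawler can check these rules with Python's urllib.robotparser. The sketch below uses a reduced, hypothetical rule set rather than the full file above, because the standard-library parser may treat wildcard patterns such as /*.asp$ and Allow/Disallow precedence differently from the major crawlers.

```python
from urllib.robotparser import RobotFileParser

# Reduced rule set for illustration only.
rules = """\
User-agent: *
Crawl-delay: 64
Disallow: /images
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# "mybot" and example.com are made-up names.
print(parser.can_fetch("mybot", "http://example.com/images/logo.png"))
print(parser.can_fetch("mybot", "http://example.com/index.html"))
print(parser.crawl_delay("mybot"))
```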

FTP

Transfer file to/from remote host
RFC 959
TCP, Port: 21
Mode
- Active, server opens data connection to client
- Passive, client opens connection. Useful when client is behind a firewall

Command:

USER username
PASS password
PORT or PASV
LIST return list of files in current directory
RETR filename, retrieve file
STOR filename, put file onto remote host

Email

SMTP
POP3
IMAP

SMTP

SMTP transfers messages from user agents to mail servers and between mail servers.

A push protocol
Port: 25
3 phases
- SMTP handshake
- Transfer of messages
- Closure
Commands
- ASCII text separated by \r\n
Responses
- Status code and phrase (one line)
Non-pipelined persistent
Message must be in 7-bit ASCII.

A mail server has:

message queue of outgoing mails
mailbox for incoming mails for user

Command

HELO host
MAIL FROM:<sender_addr>
RCPT TO:<recv_addr>
DATA
Can type message now
End message with . by itself in a line
- UA will insert a dot in front of all lines already starting with a dot
QUIT
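
The dot rule in the DATA phase can be sketched as a pair of helpers: the sender doubles any leading dot, and the receiver strips it again after finding the lone "." terminator. The function names are hypothetical.

```python
def dot_stuff(body: str) -> bytes:
    """Prepare a message body for the DATA phase."""
    lines = body.splitlines()
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    # A line containing only "." tells the server the message is complete.
    return ("\r\n".join(stuffed) + "\r\n.\r\n").encode("ascii")

def dot_unstuff(wire: bytes) -> str:
    """Recover the original body on the receiving side."""
    lines = wire.decode("ascii").split("\r\n")
    end = lines.index(".")                      # first lone dot ends the message
    return "\n".join(l[1:] if l.startswith(".") else l for l in lines[:end])

body = "Hello\n.hidden line\nBye"
wire = dot_stuff(body)
print(wire)
print(dot_unstuff(wire) == body)
```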

Access protocols

POP3
- Port: 110
- auth then download
- stateless
- Commands: user, pass, list, retr(retrieve msg), dele, quit
- Responses: +OK, -ERR
IMAP
- Port: 143
- more features
- manipulation of stored messages on server
HTTP
- by Hotmail, Gmail, etc.

Message

Header lines
- To:
- From:
- Subject:
Body
- 7-bit ASCII

Format: MIME, Multipurpose Internet Mail Extensions, RFCs 2045, 2046

Message can be encoded
Multiple objects separated by a specific boundary

Additional header lines for MIME:

From: alice@crepes.fr
To: bob@hamburger.edu
Subject: vacation pics
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Type: image/jpeg
RSAxNjAxOTQvTiAxNC9UIDkyNzg0OS9
IIFsgNTcwIDQ2N10+Pg1lbmRvYmoNIC

Boundary:

Content-Type: multipart/mixed;
boundary="----=_NextPart_000_0074_01C6DB4C.731EBEB0"
This is a multi-part message in MIME format.
------=_NextPart_000_0074_01C6DB4C.731EBEB0
Content-Type: text/plain;charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Some text message here...
------=_NextPart_000_0074_01C6DB4C.731EBEB0
Content-Type: application/pdf;name="9-18-06.pdf"
Content-Transfer-Encoding: base64
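
Python's email package can generate this kind of multipart message. In the sketch below the addresses and subject come from the example above, while the attachment bytes and file name are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@crepes.fr"
msg["To"] = "bob@hamburger.edu"
msg["Subject"] = "vacation pics"
msg.set_content("Some text message here...")

# Placeholder bytes standing in for a real JPEG file.
fake_jpeg = b"\xff\xd8\xff\xe0 placeholder bytes, not a real image"
msg.add_attachment(fake_jpeg, maintype="image", subtype="jpeg", filename="pic.jpg")

# Serializing picks a boundary and base64-encodes the binary part.
print(msg.as_string())
```

Adding the attachment turns the message into multipart/mixed, with the text and the image as separate parts between boundary lines.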

DNS

DNS stands for Domain Name System.

Distributed database
- Hierarchy of many name servers
Application Layer
- Host communicates with name servers to resolve
- UDP, port 53
- Single-packet query & response

Service:

Forward lookup
- Hostname to IP
Reverse
- IP to hostname
Host aliasing (CNAME)
Mail server (MX)
Load distribution
- replicate web servers, set of IP addresses for one DNS name

Hierarchy:

Root server, addresses are hardwired into OS, 13 of them
- Top-level domain (TLD) server, com, org, edu, country etc.
  - company server, university, etc

Local name servers don’t belong to this hierarchy. A local name server can be any computer that accepts DNS requests and finds the answer by traversing the DNS tree. It performs an iterated query, asking the servers on the DNS tree path one by one.

Local server:

Set server to 127.0.0.1 if running BIND locally
Auto-configure via DHCP
Set to 8.8.8.8 (Google)

Records

Once (any) name server learns a mapping, it caches the mapping. Cache entries time out (disappear) after some time (TTL).

If a record comes from cache, it is called non-authoritative
If original DNS server is contacted, the record is authoritative

TLD servers are typically cached in local servers.

A record has the format (name, value, type, ttl).

Types:

A: host, IPv4
NS: domain, hostname of the authoritative name server for this domain
CNAME: host, host it’s aliased to
MX: domain, name of SMTP server for this domain

Protocol

Formats of query and reply message are the same. All numbers are in network byte order.

// | width of a line is 4 bytes |
TXID | flags
nQuestions | nAnswers
nAuthority | nAdditional
questions (variable size)
answers (variable size)
authority (variable size)
additional (variable size)

Transaction ID (TXID)
- 16-bit number for each query by client
- Echoed by server
Flags
- type of request and response status
Other 4 fields are counts of 4 sections
Question
- Queries contain only the question section
- Response packets always repeat the question
Authority
- carries NS records
- Used during iterative lookups to specify next DNS server to query

Flags has 16 bits:

QR(1)
- 0 for query, 1 for response
opcode(4)
- 0 for standard query
AA(1)
- authoritative answer
TC(1), truncated response
RD(1), recursion desired
RA(1), recursion available
reserved(3)
result(4)
- 0, success
- 1, format error
- 2, server failure
- 3, no DNS name
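
A sketch of unpacking the 16-bit flags word according to the layout above, with QR as the most significant bit. The function name is hypothetical.

```python
def decode_flags(flags: int) -> dict:
    """Split the 16-bit DNS flags word into its fields."""
    return {
        "QR": (flags >> 15) & 0x1,
        "opcode": (flags >> 11) & 0xF,
        "AA": (flags >> 10) & 0x1,
        "TC": (flags >> 9) & 0x1,
        "RD": (flags >> 8) & 0x1,
        "RA": (flags >> 7) & 0x1,
        "reserved": (flags >> 4) & 0x7,
        "result": flags & 0xF,
    }

print(decode_flags(0x0100))   # a query with only RD set
print(decode_flags(0x8183))   # a response with RD, RA and result code 3
```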

For query packet:

set RD = 1, all other fields 0
nQuestions = 1

To query DNS reversely for hostname, construct the query as:

Question: reverse IP address and append a suffix
- example: 128.194.135.65 to 65.135.194.128.in-addr.arpa
- IPv6: ip6.arpa
Type: PTR
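
The reverse-lookup name can be built by hand for IPv4, or with the standard ipaddress module, which also covers IPv6. The helper name is made up.

```python
import ipaddress

def reverse_name_v4(ip: str) -> str:
    """Reverse the four octets and append the in-addr.arpa suffix."""
    return ".".join(reversed(ip.split("."))) + ".in-addr.arpa"

print(reverse_name_v4("128.194.135.65"))

# The standard library produces the same name, and handles IPv6 (ip6.arpa) too.
print(ipaddress.ip_address("128.194.135.65").reverse_pointer)
print(ipaddress.ip_address("2001:db8::1").reverse_pointer)
```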

The question has the format of

str1_size + str1 + ... + strn_size + strn + 0 + query_type + query_class
// size is 1 byte
// query_type and class are 2 bytes
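
Putting the header and question formats together, a query packet can be assembled as in the sketch below. It assumes type A (1) and class IN (1) as defaults; the function names and the sample TXID are illustrative.

```python
import struct

def encode_name(hostname: str) -> bytes:
    """Encode a hostname as size-prefixed labels ending with a 0 byte."""
    out = b""
    for label in hostname.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError("each label must be 1 to 63 bytes")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(hostname: str, txid: int, qtype: int = 1, qclass: int = 1) -> bytes:
    """Build a query: fixed 12-byte header followed by one question."""
    flags = 0x0100                              # RD = 1, all other flag bits 0
    # "!" selects network byte order; counts are nQuestions=1, the rest 0.
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    question = encode_name(hostname) + struct.pack("!HH", qtype, qclass)
    return header + question

packet = build_query("www.google.com", txid=0x1234)
print(packet.hex())
```

Sending this packet over UDP to port 53 of a resolver is left out here.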

Packet parsing

Answers start with the name of the record, followed by a fixed DNS reply header, then the answer. 0x0 denotes the end of a string.

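Answers can be compressed: instead of repeating a name, the packet stores a pointer and the read cursor jumps to another position in the same packet. A pointer is marked by the 2 upper bits of the size byte being 11; the next 14 bits are the jump offset, counted from the start of the packet.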

// uncompressed
0x3 "www" 0x6 "google" 0x3 "com" 0x0 <DNSanswerHdr> <ANSWER>
//compressed
0xC0 0x0C <DNSanswerHdr> <ANSWER>

For type A, the answer is a 4-byte IP.

Caveats to be handled:

Jump outside of the packet
Infinite jump
- Detect it by counting jumps: more jumps than there are bytes in the packet means a loop.
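
A sketch of a name reader that follows compression jumps and guards against both caveats. The function name and error messages are hypothetical, and the sample packet is hand-built rather than a captured reply.

```python
def read_name(packet: bytes, offset: int):
    """Return (name, offset of the first byte after the name field)."""
    labels = []
    jumps = 0
    resume = None                     # where parsing continues after the first jump
    while True:
        if offset >= len(packet):
            raise ValueError("name runs past the end of the packet")
        size = packet[offset]
        if size == 0:                 # 0x0 ends the name
            offset += 1
            break
        if size & 0xC0 == 0xC0:       # two upper bits 11: compression pointer
            if offset + 1 >= len(packet):
                raise ValueError("truncated jump pointer")
            target = ((size & 0x3F) << 8) | packet[offset + 1]
            if resume is None:
                resume = offset + 2
            jumps += 1
            if jumps > len(packet):
                raise ValueError("jump loop detected")
            if target >= len(packet):
                raise ValueError("jump outside of the packet")
            offset = target
            continue
        end = offset + 1 + size
        if end > len(packet):
            raise ValueError("label runs past the end of the packet")
        labels.append(packet[offset + 1:end].decode("ascii"))
        offset = end
    return ".".join(labels), (resume if resume is not None else offset)

# 12 header bytes, an uncompressed name at offset 12, then a pointer back to it.
packet = bytes(12) + b"\x03www\x06google\x03com\x00" + b"\xc0\x0c"
print(read_name(packet, 12))
print(read_name(packet, 28))
```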

Vulnerabilities

IP spoofing

IP spoofing means sending packets with a fake source IP.

For spoofing to work, ISP network of attacker must allow such packets to depart
TCP spoofing is hard; UDP easy

In an amplification attack, the attacker transmits small packets (spoofed with the victim's source IP) to intermediate hosts, which then generate more traffic towards the victim. It is a kind of DDoS (Distributed Denial of Service) attack.

DNS can be used for amplification. 65 bytes (40 for header and 15 for question) to 512 bytes (max over UDP).

Large DNS reply:

DNS TXT queries
many A records
IPv6
DNS extensions

Remote TXID Guess attack

DNS responses cannot be verified for authenticity
Possible for attacker to send fake replies to fool local resolver
Attacker must send fake reply quicker than the authoritative server
- DNS servers use only the first reply they get, ignore all others

Attacker must know:

Local DNS server’s IP
Query string

Recursive DNS resolver rejects answers unless:

Source IP of reply matches that of the authoritative server
Local port number is correct
TXID in DNS header matches that of the query

Cache poisoning

Attacker must wait until the target's cached record expires, then pull off the attack just before the host gets cached again.

But NS records override cached versions if they come from an authoritative server.

With known client port number:

Local user issues a request for hash1.chase.com (not cached).
Attacker sends K spoofed packets to the local resolver (LR) with random TXIDs.
Spoofed packets have no answers, only NS and additional records for domain chase.com.
- NS points to the badguy name server
- Additionals contain the A records for the badguy name server
If the attack does not work, repeat with hash2.chase.com.
A successful response manages to overwrite existing NS entries!

To fix it:

Randomization of port numbers for each query (IIS, BIND)
Random capitalization of query strings (wWw.ChasE.coM) and case-sensitive comparison of answers (Pydig, Unbound)
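
A rough calculation of why these fixes help. It assumes the attacker's K spoofed replies carry distinct guesses, each equally likely to be right, and that every randomized bit is unpredictable; the 16 bits assumed for the port is an upper bound, since the usable port range is smaller in practice.

```python
def spoof_success_probability(k: int, txid_bits: int = 16, extra_bits: int = 0) -> float:
    """Chance that one of k distinct guesses matches all randomized bits."""
    space = 2 ** (txid_bits + extra_bits)
    return min(1.0, k / space)

k = 1000
print(spoof_success_probability(k))                   # TXID only, port known
print(spoof_success_probability(k, extra_bits=16))    # plus a random source port
# Random capitalization adds about one bit per letter in the query string;
# "www.chase.com" has 11 letters.
print(spoof_success_probability(k, extra_bits=16 + 11))
```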

Domain flux

Infected hosts are organized into botnets.
Botnet is under control of a botmaster.
Early botnets used Internet Relay Chat to send and receive commands
- Blocked by ISPs eventually.
- Easy to target IRC servers.
New generation of botnets uses dynamically changing C&C (command & control) points.
- C&C’s IPs rapidly change over time.

Fast flux:

Fast flux is a method for discovering the IP address of C&C and other resources the botnet may need.
- Botmaster registers a domain (say xyz.com) and controls the DNS server ns.xyz.com
Botnet contacts nameserver ns.xyz.com and obtains the current IP of the C&C (or multiple ones)
- Performs a type-A lookup on hash.xyz.com
Main defense against botnet traffic is blocking communication with the C&C
TLD servers auto-detect fast flux and block suspected domains in conjunction with the registrar

Domain flux:

Botnet constantly generates random domain names and tries to resolve them to find the C&C.

Current domain name stays in effect until it is blocked.
In reality, the botnet goes through thousands of failed lookup attempts until it finds an active domain.
In some cases, reverse engineering the random generator allows one to predict future domain names.

CDN

Content Distribution Networks (CDNs):

Push replicated content (files, video, images) towards edges
Distributed system of application-layer servers
Example: Akamai
Over 200K servers in 120 countries and 1500 networks

To get user the closest replica:

Akamai relies on DNS to bounce the user to the best server
The best server is chosen based on the location of the local resolver (e.g. using distance, load, latency, available bandwidth)
Often Akamai produces long redirect chains
- Usually through CNAMEs based on the IP of local resolver

Drawback:

distance from user to their local resolver is generally unknown
long resolution chains
- Caching helps with latency, but Akamai uses extremely small TTLs (e.g., 20 sec)

P2P

Evolution

Napster (1999)
- Centralized directory server
- Single point of failure
- Doesn’t scale
Gnutella/0.4 (2001)
- Fully distributed
- Overlay network: graph of all peers and edges
- Search by flooding to some depth
- Download from a single user
  - Unreliable, inefficient
KaZaA(2002), Gnutella/0.6
- Peer is either a group leader (supernode) or assigned to one
- Group leader tracks the content of all its children, acting like a mini-Napster
- Peers query their group leaders, which flood the supernode graph to search
- Parallel downloads

Other P2P

Seed: user holding a complete file.

BitTorrent(2001):

Let non-seeds grab chunks from each other.
Rarest chunk in torrent is replicated first.
Force peers to transfer chunks they have.
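
The rarest-first rule can be sketched as below. This is a simplified illustration, not the actual BitTorrent client logic: it assumes each peer's chunk set is known, and it breaks ties by lowest chunk index. All names are hypothetical.

```python
from collections import Counter

def pick_rarest_chunk(my_chunks: set, peer_chunks: dict):
    """Pick the chunk we lack that the fewest peers hold; None if nothing is missing."""
    counts = Counter()
    for chunks in peer_chunks.values():
        counts.update(chunks)
    candidates = [c for c in counts if c not in my_chunks]
    if not candidates:
        return None
    # Fewest copies first; ties go to the lowest chunk index.
    return min(candidates, key=lambda c: (counts[c], c))

peers = {"A": {0, 1, 2}, "B": {1, 2}, "C": {2, 3}}
print(pick_rarest_chunk({2}, peers))
```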

Tor(Onion Router):

Packets are sent through a random chain of P2P nodes
Extremely slow
Many exit points are known and blocked

Freenet:

Anonymous info exchange
hiding identities of communicating parties

Skype

Directly between users
Or relayed through non-firewalled peers

Distributed hash tables

General class of P2P systems that map information into a high-dimensional search space with guaranteed $\log(N)$ bounds on the delay to find content.
Example: Chord
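
A sketch of the key-to-node mapping on a hash ring, the idea Chord builds on: each key is stored at the first node whose ID is at or after the key's ID, wrapping around the ring. This is not Chord itself; it omits the finger tables that give the $\log(N)$ lookup bound, and the 16-bit ring size, class name, and node names are assumptions.

```python
import bisect
import hashlib

RING_BITS = 16   # small ring for illustration only

def ring_id(name: str) -> int:
    """Hash a node name or key onto the ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)

class Ring:
    def __init__(self, nodes):
        self.points = sorted((ring_id(n), n) for n in nodes)
        self.ids = [point[0] for point in self.points]

    def successor(self, key: str) -> str:
        """Return the node responsible for key."""
        i = bisect.bisect_left(self.ids, ring_id(key))
        return self.points[i % len(self.points)][1]   # wrap past the last node

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.successor("song.mp3"))
```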

Reference

These are my class notes from taking CSCE 612 at TAMU. Credit to the instructor, Dr. Loguinov.
