Principles
3 architectures:
- Client-server
- Peer-to-peer
- Example: Gnutella
- Highly scalable
- Difficult to be reliable
- Hybrid
- Example: Napster, instant msg (search, auth part can be centralized)
Web and HTTP
Requests and responses
URL format:
scheme://[user:pass@]host[:port][/path][?query][#fragment]
The HTTP request line carries only the [/path][?query] part.
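A minimal sketch of how the URL fields map onto the request line, using Python's standard urllib.parse. The URL is a made-up example (the host comes from the request shown later in these notes; the query and fragment are assumptions), and request_target is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def request_target(url: str) -> str:
    """Return the [/path][?query] part that goes into the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"          # an empty path is requested as "/"
    if parts.query:
        target += "?" + parts.query     # the #fragment is never sent to the server
    return target

url = "http://irl.cs.tamu.edu:80/courses/?term=spring#top"  # illustrative URL
parts = urlsplit(url)
print(parts.scheme, parts.hostname, parts.port)
print(request_target(url))
```

The scheme, host, and port decide where the TCP connection goes; only the path and query travel inside the request itself.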
Example
An example of an HTTP request:
GET /courses/ HTTP/1.0\r\n
Host: irl.cs.tamu.edu\r\n
Connection: close\r\n
\r\n
HTTP response:
HTTP/1.0 200 OK\r\n
Cache-Control: private\r\n
Content-Type: text/html\r\n
Server: Microsoft-IIS/7.0\r\n
X-Powered-By: ASP.NET\r\n
MicrosoftOfficeWebServer: 5.0_Pub\r\n
MS-Author-Via: MS-FP/4.0\r\n
Date: Thu, 17 Jan 2013 09:22:34 GMT\r\n
Connection: close\r\n
Content-Length: 16367\r\n
\r\n
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">...
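The response above has three parts: a status line, header lines, and a body after the first empty line. A rough sketch of splitting them apart, assuming the whole response is already in memory; parse_response is a hypothetical helper, not a library call, and the sample bytes are a shortened version of the response above.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (status code, headers dict, body bytes)."""
    # The header block ends at the first empty line (\r\n\r\n).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_parts = lines[0].split(" ", 2)   # e.g. ["HTTP/1.0", "200", "OK"]
    code = int(status_parts[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return code, headers, body

raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Connection: close\r\n"
       b"\r\n"
       b"<html>...")
status, headers, body = parse_response(raw)
print(status, headers["content-type"], len(body))
```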
Overview
HTTP: HyperText Transfer Protocol
- HTTP 1.0: RFC 1945 (1996)
- HTTP 1.1: RFC 2068 (1997), RFC 2616 (1999)
- HTTP 2: RFC 7540 (May 2015)
HTTP can be:
- Non-persistent
- At most one object is sent over a TCP connection
- HTTP/1.0
- Persistent
- Multiple objects sent over a single TCP connection
- HTTP/1.1
- “Connection: close” overrides this behavior
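Browsers can open parallel TCP connections to fetch referenced objects. Pipelining is a separate technique: the client sends several requests back to back on one persistent connection without waiting for each response.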
Methods
HTTP/1.0
- GET
- POST
- HEAD
HTTP/1.1
- GET, POST, HEAD
- PUT
- DELETE
Upload Input
- POST
- Input is in the entity body, used for large amount of data
- URL method
- Use GET
- Input is encoded in the URL field, after ?
GET /map.cgi?city=College+Station&zip=77843 HTTP/1.0
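The query string in the request above can be produced with Python's standard urllib.parse.urlencode, which encodes spaces as + by default. A small sketch using the same parameters:

```python
from urllib.parse import urlencode, parse_qs

params = {"city": "College Station", "zip": "77843"}
query = urlencode(params)                       # spaces become "+"
request_line = f"GET /map.cgi?{query} HTTP/1.0\r\n"
decoded = parse_qs(query)                       # what the server side would recover

print(request_line.strip())
print(decoded)
```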
Status code
- 200, OK
- 301, Moved Permanently
- 400, Bad Request
- 404, Not Found
- 505, HTTP Version Not Supported
Cookies
Cookies keep user-server state.
- Set-Cookie header line in HTTP response
Set-cookie: 1112
- Cookie file, kept on host and managed by user’s browser
- Cookie header, line in HTTP request
Cookie: 1112
- Back-end DB at websites
We can specify the path for a cookie with Set-Cookie: ...; path=/. Shared caching can be disallowed with Cache-Control: private.
Web caches (proxy server)
Goal is to satisfy client request without involving origin server.
- Browser sends requests via cache, or cache intercepts all outgoing HTTP traffic
- Object in cache: just return the object
- Else: cache requests object from origin server, then return it to client
- Cache acts as both client and server
- Installed by ISP, university or company
Purpose:
- Reduce request time
- Reduce traffic on access link
- Reduce load on servers
- Increase security, proxy server can scan objects
- Filter URLs to prevent undesirable destinations
Conditional GET
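Don’t send the object if the cache has an up-to-date cached version. The server can also specify an expiration time with Expires: Sat, 01 Oct 2011 16:00:00 GMT.
- Cache specifies the date of its cached copy in the request: If-modified-since: <date>
- If the object has not been modified since that date, the response contains no object: HTTP/1.0 304 Not Modified
- Else: HTTP/1.0 200 OK <data>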
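The server-side decision for a conditional GET can be sketched as below. This is only an illustration of the 304 versus 200 choice: conditional_get is a hypothetical function, and the dates are made-up values in the same format as the Expires example.

```python
from email.utils import parsedate_to_datetime
from typing import Optional

def conditional_get(last_modified: str, if_modified_since: Optional[str] = None) -> str:
    """Return the status line a server would send for a (conditional) GET."""
    if if_modified_since is not None:
        changed_at = parsedate_to_datetime(last_modified)
        cached_at = parsedate_to_datetime(if_modified_since)
        if changed_at <= cached_at:
            # Cached copy is still fresh: no object in the response.
            return "HTTP/1.0 304 Not Modified"
    # No condition given, or the object changed after the cached date.
    return "HTTP/1.0 200 OK"

last_modified = "Sat, 01 Oct 2011 16:00:00 GMT"
print(conditional_get(last_modified, "Sun, 02 Oct 2011 16:00:00 GMT"))
print(conditional_get(last_modified, "Fri, 30 Sep 2011 16:00:00 GMT"))
```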
Robots.txt
/robots.txt is used by sites to protect some of their contents from web crawlers.
- Crawl-delay specifies the # of seconds between visits
- Sitemap points to an XML file that lists all available documents
User-agent: *
Disallow: /images
Disallow: /catalogs
Allow: /catalogs/about
Allow: /catalogs/p?
Disallow: /catalogues
User-agent: *
Disallow: /*.asp$
Disallow: /sdch/*.php
Crawl-delay: 64
Sitemap: http://www.google.com/sitemaps_webmasters.xml
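A crawler can check these rules with Python's standard urllib.robotparser. The sketch below feeds it a reduced rule set (an assumption, kept to plain prefixes). As far as I know, the standard parser matches rules by simple path prefix in file order and does not interpret the * and $ wildcards, so the wildcard and Allow-override lines above are deliberately left out.

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /images",
    "Crawl-delay: 64",
]

parser = RobotFileParser()
parser.parse(rules)  # parse lines directly instead of fetching over the network

blocked = parser.can_fetch("*", "/images/logo.png")  # under a disallowed prefix
allowed = parser.can_fetch("*", "/index.html")       # no rule matches
delay = parser.crawl_delay("*")                      # seconds between visits

print(blocked, allowed, delay)
```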
FTP
- Transfer file to/from remote host
- RFC 959
- TCP, Port: 21
- Mode
- Active, server opens data connection to client
- Passive, client opens connection. Useful when client is behind a firewall
Commands:
- USER username
- PASS password
- PORT or PASV
- LIST, return list of files in current directory
- RETR filename, retrieve file
- STOR filename, put file onto remote host
Email
Protocols:
- SMTP
- POP3
- IMAP
SMTP
SMTP transfers messages from user agents to mail servers and between mail servers.
- A push protocol
- Port: 25
- 3 phases
- SMTP handshake
- Transfer of messages
- Closure
- Commands
- ASCII text separated by \r\n
- Responses
- Status code and phrase (one line)
- Non-pipelined persistent
- Messages must be in 7-bit ASCII.
A mail server has:
- message queue of outgoing mails
- mailbox for incoming mails for user
Commands:
- HELO host
- MAIL FROM:<sender_addr>
- RCPT TO:<recv_addr>
- DATA
- Can type message now
- End message with . by itself on a line
- UA will insert a dot in front of all lines already starting with a dot
- QUIT
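The dot rule for DATA can be sketched as follows: any body line that already starts with a dot gets an extra dot, and the message is closed by a line holding a single dot. dot_stuff is a hypothetical helper, and the body is assumed to use \r\n line endings already.

```python
def dot_stuff(body: str) -> str:
    """Prepare a message body for the SMTP DATA phase."""
    stuffed = []
    for line in body.split("\r\n"):
        # A leading dot would be mistaken for the end marker, so double it.
        stuffed.append("." + line if line.startswith(".") else line)
    # Terminate with "." by itself on a line.
    return "\r\n".join(stuffed) + "\r\n.\r\n"

wire = dot_stuff("Hi\r\n.hidden\r\nBye")
print(repr(wire))
```

The receiving side reverses this by removing one leading dot from every line that starts with one.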
Access protocols
- POP3
- Port: 110
- auth then download
- stateless
- Commands: user, pass, list, retr (retrieve msg), dele, quit
- Responses: +OK, -ERR
- IMAP
- Port: 143
- more features
- manipulation of stored messages on server
- HTTP
- by Hotmail, Gmail, etc.
Message
- Header lines
- To:
- From:
- Subject:
- Body
- 7-bit ASCII
Format: MIME, Multipurpose Internet Mail Extensions, RFCs 2045, 2046
- Message can be encoded
- Multiple objects separated by a specific boundary
Additional header lines for MIME:
From: alice@crepes.fr
To: bob@hamburger.edu
Subject: vacation pics
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Type: image/jpeg
RSAxNjAxOTQvTiAxNC9UIDkyNzg0OS9
IIFsgNTcwIDQ2N10+Pg1lbmRvYmoNIC
Boundary:
Content-Type: multipart/mixed;
boundary="----=_NextPart_000_0074_01C6DB4C.731EBEB0"
This is a multi-part message in MIME format.
------=_NextPart_000_0074_01C6DB4C.731EBEB0
Content-Type: text/plain;charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Some text message here...
------=_NextPart_000_0074_01C6DB4C.731EBEB0
Content-Type: application/pdf;name="9-18-06.pdf"
Content-Transfer-Encoding: base64
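A message like the one above can be built with Python's standard email package, which picks the boundary and the base64 encoding on its own. A minimal sketch: the addresses, subject, and file name are taken from the examples above, while the attachment bytes are placeholders, not a real PDF.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@crepes.fr"
msg["To"] = "bob@hamburger.edu"
msg["Subject"] = "vacation pics"
msg.set_content("Some text message here...")      # becomes the text/plain part

# Placeholder bytes stand in for the file; binary data is sent as base64.
msg.add_attachment(b"%PDF-1.4 placeholder bytes",
                   maintype="application", subtype="pdf",
                   filename="9-18-06.pdf")

text_part, pdf_part = msg.iter_parts()
wire = msg.as_string()                            # boundary is generated here
print(msg.get_content_type())
print(pdf_part["Content-Transfer-Encoding"])
```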
DNS
DNS stands for Domain Name System.
- Distributed database
- Hierarchy of many name servers
- Application Layer
- Hosts communicate with name servers to resolve names
- UDP, port 53
- Single-packet query & response
Service:
- Forward lookup
- Hostname to IP
- Reverse
- IP to hostname
- Host aliasing (CNAME)
- Mail server (MX)
- Load distribution
- replicate web servers, set of IP addresses for one DNS name
Hierarchy:
- Root server, addresses are hardwired into OS, 13 of them
- Top-level domain (TLD) server, com, org, edu, country etc.
- company server, university, etc
A local name server doesn’t belong to this hierarchy. It can be any computer that accepts DNS requests and finds the answer by traversing the DNS tree. It performs an iterative query, asking the servers on the DNS tree path one by one.
Local server:
- Set server to be 127.0.0.1 if running BIND
- Auto-configure via DHCP
- Set to 8.8.8.8 (Google)
Records
Once (any) name server learns a mapping, it caches the mapping. Cache entries time out (disappear) after some time (TTL).
- If a record comes from cache, it is called non-authoritative
- If original DNS server is contacted, the record is authoritative
TLD servers are typically cached in local servers.
A record has the format (name, value, type, ttl).
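Types (each entry lists what name and value hold):
- A: name is a hostname, value is its IPv4 address
- NS: name is a domain, value is the hostname of the authoritative name server for this domain
- CNAME: name is an alias hostname, value is the host it’s aliased to
- MX: name is a domain, value is the name of the SMTP server for this domain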
Protocol
Formats of query and reply message are the same. All numbers are in network byte order.
// | width of a line is 4 bytes |
TXID | flags
nQuestions | nAnswers
nAuthority | nAdditional
questions (variable size)
answers (variable size)
authority (variable size)
additional (variable size)
- Transaction ID (TXID)
- 16-bit number for each query by client
- Echoed by server
- Flags
- type of request and response status
- Other 4 fields are counts of 4 sections
- Question
- Queries contain only the question section
- Response packets always repeat the question
- Authority
- carries NS records
- Used during iterative lookups to specify next DNS server to query
Flags has 16 bits:
- QR(1)
- 0 for query, 1 for response
- opcode(4)
- 0 for standard query
- AA(1)
- authoritative answer
- TC(1), truncated response
- RD(1), recursion desired
- RA(1), recursion available
- reserved(3)
- result(4)
- 0, success
- 1, format error
- 2, server failure
- 3, no DNS name
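The 16-bit flags word can be unpacked with shifts and masks, following the bit widths listed above from the most significant bit down. decode_flags is a hypothetical helper; 0x8180 and 0x8183 are sample values chosen for illustration.

```python
def decode_flags(flags: int) -> dict:
    """Split the 16-bit DNS flags word into its fields."""
    return {
        "QR": (flags >> 15) & 0x1,        # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,    # 0 = standard query
        "AA": (flags >> 10) & 0x1,        # authoritative answer
        "TC": (flags >> 9) & 0x1,         # truncated response
        "RD": (flags >> 8) & 0x1,         # recursion desired
        "RA": (flags >> 7) & 0x1,         # recursion available
        "reserved": (flags >> 4) & 0x7,
        "result": flags & 0xF,            # 0 = success, 3 = no DNS name
    }

print(decode_flags(0x8180))
print(decode_flags(0x8183)["result"])
```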
For query packet:
- set RD = 1, all other fields 0
- nQuestions = 1
To do a reverse DNS lookup for a hostname, construct the query as:
- Question: reverse the IP address and append a suffix
- Example: 128.194.135.65 to 65.135.194.128.in-addr.arpa
- IPv6: ip6.arpa
- Type: PTR
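Python's standard ipaddress module can produce the reversed name directly through the reverse_pointer attribute. A short sketch; reverse_name is a hypothetical wrapper and the IPv6 address is an illustrative documentation address.

```python
import ipaddress

def reverse_name(ip: str) -> str:
    """Return the name to put in a PTR question for this address."""
    return ipaddress.ip_address(ip).reverse_pointer

print(reverse_name("128.194.135.65"))
print(reverse_name("2001:db8::1"))   # IPv6 is reversed digit by digit under ip6.arpa
```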
The question has the format of
str1_size + str1 + ... + strn_size + strn + 0 + query_type + query_class
// size is 1 byte
// query_type and class are 2 bytes
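Putting the header layout and the question format together, a query packet can be assembled with the standard struct module. This is a sketch under the rules above (RD = 1, nQuestions = 1, everything else 0). encode_name and build_query are hypothetical names; type 1 is A and class 1 is Internet.

```python
import struct

def encode_name(name: str) -> bytes:
    """Encode a dotted name as size-prefixed labels ending with a 0 byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError("each label must be 1 to 63 bytes")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name: str, txid: int, qtype: int = 1, qclass: int = 1) -> bytes:
    """Build a DNS query: fixed 12-byte header followed by one question."""
    flags = 0x0100  # RD = 1, all other flag bits 0
    # "!" selects network byte order; six 16-bit fields make up the header.
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!HH", qtype, qclass)

query = build_query("www.google.com", txid=0x1234)
print(len(query), query.hex())
```

For a reverse lookup the same function would be called with the in-addr.arpa name and qtype=12 (PTR).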
Packet parsing
Answers start with the name of the record, followed by a fixed DNS reply header, then the answer. 0x0 denotes the end of a string.
Names can be compressed, in which case the read cursor needs to jump to another position in this packet. A jump is marked by the 2 upper bits of the size field being 11; the next 14 bits are the jump offset, measured from the start of the packet.
// uncompressed
0x3 "www" 0x6 "google" 0x3 "com" 0x0 <DNSanswerHdr> <ANSWER>
//compressed
0xC0 0x0C <DNSanswerHdr> <ANSWER>
For type A, the answer is a 4-byte IP.
Caveats to be handled:
- Jump outside of the packet
- Infinite jump
- Detect it by counting jumps: more jumps than the number of bytes in the packet means a loop.
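A name reader that follows compression jumps and guards against both caveats might look like the sketch below. read_name is a hypothetical function; it returns the dotted name and the offset where parsing continues after the name. The sample packet is a zero-filled 12-byte header followed by the two encodings shown above.

```python
def read_name(packet: bytes, pos: int):
    """Return (dotted name, offset just past the name at the starting position)."""
    labels = []
    jumps = 0
    end = None  # where the caller resumes; fixed when the first jump is taken
    while True:
        if pos >= len(packet):
            raise ValueError("name or jump points outside the packet")
        size = packet[pos]
        if size & 0xC0 == 0xC0:                      # upper 2 bits are 11: a jump
            if pos + 1 >= len(packet):
                raise ValueError("truncated jump pointer")
            if end is None:
                end = pos + 2
            pos = ((size & 0x3F) << 8) | packet[pos + 1]  # 14-bit offset
            jumps += 1
            if jumps > len(packet):
                raise ValueError("jump loop detected")
            continue
        if size == 0:                                # 0x0 ends the string
            return ".".join(labels), (pos + 1 if end is None else end)
        start = pos + 1
        if start + size > len(packet):
            raise ValueError("label runs past the end of the packet")
        labels.append(packet[start:start + size].decode("ascii"))
        pos = start + size

packet = bytes(12) + b"\x03www\x06google\x03com\x00" + b"\xc0\x0c"
print(read_name(packet, 12))   # uncompressed name right after the header
print(read_name(packet, 28))   # compressed: jumps back to offset 12
```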
Vulnerabilities
IP spoofing
IP spoofing is sending packets with a fake source IP.
- For spoofing to work, ISP network of attacker must allow such packets to depart
- TCP spoofing is hard; UDP easy
In an amplification attack, the attacker transmits small packets to intermediate hosts, which then generate more traffic towards the victim. It is a kind of DDoS (Distributed Denial of Service) attack.
DNS can be used for amplification. A request of about 55 bytes (40 for the IP, UDP, and DNS headers and 15 for the question) can trigger a reply of up to 512 bytes (the maximum over UDP without extensions).
Large DNS reply:
- DNS TXT queries
- many A records
- IPv6
- DNS extensions
Remote TXID Guess attack
- DNS responses cannot be verified for authenticity
- Possible for attacker to send fake replies to fool local resolver
- Attacker must send fake reply quicker than the authoritative server
- DNS servers use only the first reply they get, ignore all others
Attacker must know:
- Local DNS server’s IP
- Query string
Recursive DNS resolver rejects answers unless:
- Source IP of reply matches that of the authoritative server
- Local port number is correct
- TXID in DNS header matches that of the query
Cache poisoning
The attacker must wait until the target record expires from the cache, then pull off the attack just before the host gets cached again.
But NS records override cached versions if they come from an authoritative server.
With known client port number:
- Local user issues request for hash1.chase.com (not cached).
- Attacker sends K spoofed packets to the local resolver (LR) with random TXIDs.
- Spoofed packets have no answers, only NS and additional records for domain chase.com.
- NS points to the badguy name server
- Additionals contain the A records for the badguy name server
- If attack does not work, repeat with hash2.chase.com.
- Response manages to overwrite existing NS entries!
To fix it:
- Randomization of port numbers for each query (IIS, BIND)
- Random capitalization of query strings (wWw.ChasE.coM) and case-sensitive comparison of answers (Pydig, Unbound)
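Some rough arithmetic on why port randomization helps: with a known port the attacker only has to hit one of 2^16 TXIDs, while a random port multiplies the search space. The sketch assumes each spoofed packet is an independent uniform guess and takes 2^16 as an upper bound on usable ports (real resolvers use fewer); K = 1000 is an arbitrary illustrative value.

```python
def hit_probability(k: int, search_space: int) -> float:
    """Chance that at least one of k independent random guesses is correct."""
    return 1.0 - (1.0 - 1.0 / search_space) ** k

TXIDS = 2 ** 16
PORTS = 2 ** 16   # assumption: upper bound on randomized source ports
K = 1000          # spoofed packets per attempt (illustrative)

p_known_port = hit_probability(K, TXIDS)
p_random_port = hit_probability(K, TXIDS * PORTS)
print(p_known_port, p_random_port)
```

With a known port one attempt succeeds about 1.5% of the time, so repeating with hash2, hash3, and so on soon works. With random ports the per-attempt chance drops by several orders of magnitude.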
Domain flux
- Infected hosts are organized into botnets.
- Botnet is under control of a botmaster.
- Early botnets used Internet Relay Chat to send and receive commands
- Blocked by ISPs eventually.
- Easy to target IRC servers.
- New generation of botnets uses dynamically changing C&C (command & control) points.
- C&C’s IPs rapidly change over time.
Fast flux:
- Fast flux is a method for discovering the IP address of C&C and other resources the botnet may need.
- Botmaster registers a domain (say xyz.com) and controls the DNS server ns.xyz.com
- Botnet contacts nameserver ns.xyz.com and obtains the current IP of the C&C (or multiple ones)
- Performs a type-A lookup on hash.xyz.com
- Main defense against botnet traffic is blocking communication with the C&C
- TLD servers auto-detect fast flux and block suspected domains in conjunction with the registrar
Domain flux:
Botnet constantly generates random domain names and tries to resolve them to find the C&C.
- Current domain name stays in effect until it is blocked.
- In reality, the botnet goes through thousands of failed lookup attempts until it finds an active domain.
- In some cases, reverse engineering the random generator allows one to predict future domain names.
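The last point works because the name generator is deterministic once its seed is known. The toy generator below is entirely hypothetical (it is not taken from any real botnet): it seeds a pseudo-random generator with a date-like number, so a defender who recovers the algorithm and the seed rule can list the same future names ahead of time and block them.

```python
import random
import string

def candidate_domains(seed: int, count: int) -> list:
    """Toy generator: the same seed always yields the same list of names."""
    rng = random.Random(seed)          # private generator, fully determined by seed
    names = []
    for _ in range(count):
        label = "".join(rng.choice(string.ascii_lowercase) for _ in range(12))
        names.append(label + ".example")   # reserved TLD, never resolves
    return names

bot_view = candidate_domains(20130117, 3)       # what an infected host would try
defender_view = candidate_domains(20130117, 3)  # what an analyst predicts
print(bot_view == defender_view)
```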
CDN
Content Distribution Networks (CDNs):
- Push replicated content (files, video, images) towards edges
- Distributed system of application-layer servers
- Example: Akamai
- Over 200K servers in 120 countries and 1500 networks
To direct a user to the closest replica:
- Akamai relies on DNS to bounce the user to the best server
- Based on location of local resolver to find best server (e.g. using distance, load, latency, available bandwidth)
- Often Akamai produces long redirect chains
- Usually through CNAMEs based on the IP of local resolver
Drawback:
- distance from user to their local resolver is generally unknown
- long resolution chains
- Caching helps with latency, but Akamai uses extremely small TTLs (e.g., 20 sec)
P2P
Evolution
- Napster (1999)
- Centralized directory server
- Single point of failure
- Doesn’t scale
- Gnutella/0.4 (2001)
- Fully distributed
- Overlay network: graph of all peers and edges
- Search by flooding to some depth
- Download from a single user
- Unreliable, inefficient
- KaZaA(2002), Gnutella/0.6
- Peer is either a group leader (supernode) or assigned to one
- Group leader tracks the content of all its children, acting like a mini-Napster
- Peers query their group leaders, which flood the supernode graph to search
- Parallel downloads
Other P2P
Seed: user holding a complete file.
BitTorrent(2001):
- Let non-seeds grab chunks from each other.
- Rarest chunk in torrent is replicated first.
- Force peers to transfer chunks they have.
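Rarest-first selection can be sketched as counting how many peers hold each chunk and requesting the least replicated ones first. rarest_first and the peer map are hypothetical; ties are broken by chunk number here just to keep the output deterministic.

```python
from collections import Counter

def rarest_first(my_chunks: set, peer_chunks: dict) -> list:
    """Order the chunks we still need from rarest to most common."""
    counts = Counter()
    for chunks in peer_chunks.values():
        counts.update(chunks)              # how many peers hold each chunk
    wanted = [c for c in counts if c not in my_chunks]
    return sorted(wanted, key=lambda c: (counts[c], c))

peers = {"p1": {0, 1, 2}, "p2": {1, 2}, "p3": {2, 3}}
order = rarest_first({2}, peers)
print(order)
```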
Tor(Onion Router):
- Packets are sent through a random chain of P2P nodes
- Extremely slow
- Many exit points are known and blocked
Freenet:
- Anonymous info exchange
- hiding identities of communicating parties
Skype
- Directly between users
- Or relayed through non-firewalled peers
Distributed hash tables
- General class of P2P systems that map information into high-dimensional search space with guaranteed bounds on delay to find content.
- Chord
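In Chord, nodes and keys are hashed onto the same identifier ring, and a key is stored at its successor: the first node whose ID is equal to or follows the key's ID, wrapping around at the end. The sketch below shows only this placement rule with a small 16-bit ring (an assumption for readability). It does not implement the finger tables that give Chord its logarithmic lookup bound, and ring_id and successor are hypothetical names.

```python
import bisect
import hashlib

def ring_id(name: str, bits: int = 16) -> int:
    """Hash a node address or key name onto a ring of 2**bits identifiers."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def successor(node_ids: list, key_id: int) -> int:
    """Return the ID of the node responsible for key_id."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_id)   # first node ID >= key_id
    return ids[i % len(ids)]              # wrap around past the largest ID

nodes = [10, 40, 200]
print(successor(nodes, 25), successor(nodes, 201))
print(successor(nodes, ring_id("song.mp3") % 256))
```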
Reference
These are my class notes from CSCE 612 at TAMU. Credit to the instructor, Dr. Loguinov.