
TAMU Computer Networks 2 Application Layer

2018-04-18

Principles

3 architectures:

  • Client-server
  • Peer-to-peer
    • Example: Gnutella
    • Highly scalable
    • Difficult to be reliable
  • Hybrid
    • Example: Napster, instant messaging (search and authentication can be centralized)

Web and HTTP

Requests and responses

URL format:

scheme://[user:pass@]host[:port][/path][?query][#fragment]

The HTTP request line carries only [/path][?query]; the scheme, host, and port are used to open the connection.
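
The split between URL components can be checked with Python's standard library. This is a minimal sketch; the URL itself (port, path, query, fragment) is made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only to exercise every component of the format.
url = "http://user:pass@irl.cs.tamu.edu:8080/courses/index.html?term=spring#top"
parts = urlsplit(url)

scheme = parts.scheme      # "http"
host = parts.hostname      # "irl.cs.tamu.edu"
port = parts.port          # 8080
path = parts.path          # "/courses/index.html"
query = parts.query        # "term=spring"
fragment = parts.fragment  # "top"

# Only path and query are placed on the request line.
request_target = path + ("?" + query if query else "")
print(request_target)
```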

Example

An example of an HTTP request:

GET /courses/ HTTP/1.0\r\n
Host: irl.cs.tamu.edu\r\n
Connection: close\r\n
\r\n

HTTP response:

HTTP/1.0 200 OK\r\n
Cache-Control: private\r\n
Content-Type: text/html\r\n
Server: Microsoft-IIS/7.0\r\n
X-Powered-By: ASP.NET\r\n
MicrosoftOfficeWebServer: 5.0_Pub\r\n
MS-Author-Via: MS-FP/4.0\r\n
Date: Thu, 17 Jan 2013 09:22:34 GMT\r\n
Connection: close\r\n
Content-Length: 16367\r\n
\r\n
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">...
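
The wire format above can be produced and parsed with a few lines of code. The sketch below is an illustration only: build_request and parse_response are hypothetical helper names, and the parser assumes a well-formed response with a complete header block.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request, one header per line."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",  # the empty line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, headers, body
```

Header names are lowercased on parsing because HTTP header names are case-insensitive.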

Overview

HTTP: HyperText Transfer Protocol

  • HTTP 1.0: RFC 1945 (1996)
  • HTTP 1.1: RFC 2068 (1997), RFC 2616 (1999)
  • HTTP 2: RFC 7540 (May 2015)

HTTP can be:

  • Non-persistent
    • At most one object is sent over a TCP connection
    • HTTP/1.0
  • Persistent
    • Multiple objects sent over a single TCP connection
    • HTTP/1.1
    • “Connection: close” overrides this behavior

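Browsers can open parallel TCP connections to fetch referenced objects. Pipelining is a separate technique: several requests are sent back-to-back on one persistent connection without waiting for each response.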

Methods

  • HTTP/1.0
    • GET
    • POST
    • HEAD
  • HTTP/1.1
    • GET, POST, HEAD
    • PUT
    • DELETE

Upload Input

  • POST
    • Input is in the entity body, used for large amount of data
  • URL method
    • Use GET
    • Input is encoded in the URL field, after ?
    • GET /map.cgi?city=College+Station&zip=77843 HTTP/1.0
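
The URL method relies on form encoding of the input. A small sketch with Python's standard library, reusing the city and zip values from the line above:

```python
from urllib.parse import urlencode, parse_qs

params = {"city": "College Station", "zip": "77843"}

# urlencode applies form encoding: spaces become '+', pairs are joined by '&'.
query = urlencode(params)
request_line = f"GET /map.cgi?{query} HTTP/1.0"

# The server side reverses the encoding; each value comes back as a list.
decoded = parse_qs(query)
print(request_line)
```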

Status code

  • 200, OK
  • 301, Moved Permanently
  • 400, Bad Request
  • 404, Not Found
  • 505, HTTP Version Not Supported

Cookies

Cookies keep user-server states.

  • Cookie header, line in HTTP response
    • Set-cookie: 1112
  • Cookie file, kept on host and managed by user’s browser
  • Cookie header, line in HTTP request
    • Cookie: 1112
  • Back-end DB at websites

The cookie's path can be set in Set-Cookie: ...; path=/. Shared caching can be disallowed with Cache-control: private.

Web caches (proxy server)

Goal is to satisfy client request without involving origin server.

  • Browser sends requests via the cache, or the cache intercepts all outgoing HTTP traffic
    • Object in cache: just return the object
    • Else: cache requests object from origin server, then return it to client
  • Cache acts as both client and server
  • Installed by ISP, university or company

Purpose:

  • Reduce request time
  • Reduce traffic on access link
  • Reduce load on servers
  • Increase security, proxy server can scan objects
  • Filter URLs to prevent undesirable destinations

Conditional GET

Don’t send the object if the cache has an up-to-date copy. The server can also specify an expiration time with Expires: Sat, 01 Oct 2011 16:00:00 GMT.

  1. Cache sends the request with If-modified-since: <date>
  2. Server replies HTTP/1.0 304 Not Modified if the object is unchanged
    • Else: HTTP/1.0 200 OK <data>
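
The server-side decision can be sketched as follows. The function name respond and its arguments are hypothetical; the sketch assumes the object's last-modified time is a timezone-aware datetime and that the header value ends in GMT.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def respond(if_modified_since, last_modified, body):
    """Return (status_line, body) the way an origin server might.

    if_modified_since: header value sent by the cache, or None if absent.
    last_modified: timezone-aware datetime of the object's last change.
    """
    if if_modified_since is not None:
        cached_at = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_at:
            # The cached copy is still current: send headers only.
            return "HTTP/1.0 304 Not Modified", b""
    return "HTTP/1.0 200 OK", body


changed = datetime(2011, 9, 1, tzinfo=timezone.utc)
print(respond("Sat, 01 Oct 2011 16:00:00 GMT", changed, b"<html>")[0])
```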

Robots.txt

Sites use /robots.txt to ask web crawlers to stay out of some of their content; compliance is up to the crawler.

  • Crawl-delay specifies the # of seconds between visits
  • Sitemap points to an XML file that lists all available documents
User-agent: *
Disallow: /images
Disallow: /catalogs
Allow: /catalogs/about
Allow: /catalogs/p?
Disallow: /catalogues
User-agent: *
Disallow: /*.asp$
Disallow: /sdch/*.php
Crawl-delay: 64
Sitemap: http://www.google.com/sitemaps_webmasters.xml
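
A crawler can consult the policy before fetching. Below is a minimal sketch with Python's standard urllib.robotparser, using a simplified policy rather than the full example above; the crawler name mybot and the paths are made up. Rule-matching details (such as wildcards and Allow/Disallow precedence) can differ between parsers, so those lines are left out here.

```python
from urllib.robotparser import RobotFileParser

# Simplified policy in the same spirit as the example above.
robots_txt = """\
User-agent: *
Disallow: /images
Crawl-delay: 64
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "mybot" is a hypothetical crawler name; it falls under the "*" group.
allowed_home = parser.can_fetch("mybot", "/index.html")
allowed_image = parser.can_fetch("mybot", "/images/logo.png")
delay = parser.crawl_delay("mybot")
print(allowed_home, allowed_image, delay)
```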

FTP

  • Transfer file to/from remote host
  • RFC 959
  • TCP, Port: 21
  • Mode
    • Active, server opens data connection to client
    • Passive, client opens connection. Useful when client is behind a firewall

Command:

  • USER username
  • PASS password
  • PORT or PASV
  • LIST, return list of files in current directory
  • RETR filename, retrieve file
  • STOR filename, put file onto remote host

Email

  • SMTP
  • POP3
  • IMAP

SMTP

SMTP transfers messages from user agents to mail servers and between mail servers.

  • A push protocol
  • Port: 25
  • 3 phases
    • SMTP handshake
    • Transfer of messages
    • Closure
  • Commands
    • ASCII text separated by \r\n
  • Responses
    • Status code and phrase (one line)
  • Non-pipelined persistent
  • Message must be in 7-bit ASCII.

A mail server has:

  • message queue of outgoing mails
  • mailbox for incoming mails for user

Command

  • HELO host
  • MAIL FROM:<sender_addr>
  • RCPT TO:<recv_addr>
  • DATA
  • Can type message now
  • End message with . by itself on a line
    • UA will insert a dot in front of all lines already starting with a dot
  • QUIT
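
The dot rule for the DATA phase can be sketched as a pair of functions. The names dot_stuff and dot_unstuff are hypothetical, and the sketch assumes the message already uses \r\n line endings.

```python
END = "\r\n.\r\n"  # a line containing only "." ends the message


def dot_stuff(message: str) -> str:
    """Sender side: protect lines that start with '.' and add the terminator."""
    lines = message.split("\r\n")
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    return "\r\n".join(stuffed) + END


def dot_unstuff(data: str) -> str:
    """Receiver side: drop the terminator and the extra leading dots."""
    if not data.endswith(END):
        raise ValueError("message is not terminated by a lone dot")
    lines = data[: -len(END)].split("\r\n")
    return "\r\n".join(line[1:] if line.startswith(".") else line for line in lines)
```

Without the extra dot, a body line consisting of a single . would end the message early.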

Access protocols

  • POP3
    • Port: 110
    • auth then download
    • stateless
    • Commands: user, pass, list, retr(retrieve msg), dele, quit
    • Responses: +OK, -ERR
  • IMAP
    • Port: 143
    • more features
    • manipulation of stored messages on server
  • HTTP
    • by Hotmail, Gmail, etc.

Message

  • Header lines
    • To:
    • From:
    • Subject:
  • Body
    • 7-bit ASCII

Format: MIME, Multipurpose Internet Mail Extensions, RFCs 2045, 2046

  • Message can be encoded
  • Multiple objects separated by a specific boundary

Additional header lines for MIME:

From: alice@crepes.fr
To: bob@hamburger.edu
Subject: vacation pics
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Type: image/jpeg
RSAxNjAxOTQvTiAxNC9UIDkyNzg0OS9
IIFsgNTcwIDQ2N10+Pg1lbmRvYmoNIC

Boundary:

Content-Type: multipart/mixed;
boundary="----=_NextPart_000_0074_01C6DB4C.731EBEB0"
This is a multi-part message in MIME format.
------=_NextPart_000_0074_01C6DB4C.731EBEB0
Content-Type: text/plain;charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Some text message here...
------=_NextPart_000_0074_01C6DB4C.731EBEB0
Content-Type: application/pdf;name="9-18-06.pdf"
Content-Transfer-Encoding: base64
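
A message of this shape can be assembled with Python's standard email package. This is a sketch only: the attachment bytes are a placeholder rather than a real PDF, and the boundary string is generated by the library, so it will not match the one shown above.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@crepes.fr"
msg["To"] = "bob@hamburger.edu"
msg["Subject"] = "vacation pics"
msg.set_content("Some text message here...")

# Placeholder bytes stand in for the contents of a real file.
msg.add_attachment(
    b"%PDF-1.4 placeholder",
    maintype="application",
    subtype="pdf",
    filename="9-18-06.pdf",
)

wire = msg.as_string()  # serializing also fixes the boundary string
attachment = next(msg.iter_attachments())
print(msg.get_content_type(), attachment["Content-Transfer-Encoding"])
```

Adding the attachment turns the message into multipart/mixed, and the binary part is carried as base64 so the whole message stays 7-bit ASCII.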

DNS

DNS stands for Domain Name System.

  • Distributed database
    • Hierarchy of many name servers
  • Application Layer
    • Host communicates with name servers to resolve
    • UDP, port 53
    • Single-packet query & response

Service:

  • Forward lookup
    • Hostname to IP
  • Reverse
    • IP to hostname
  • Host aliasing (CNAME)
  • Mail server (MX)
  • Load distribution
    • replicate web servers, set of IP addresses for one DNS name

Hierarchy:

  • Root server, addresses are hardwired into OS, 13 of them
    • Top-level domain (TLD) server, com, org, edu, country etc.
      • company server, university, etc

Local name servers don’t belong to this hierarchy. A local name server can be any computer that accepts DNS requests and finds the answer by traversing the DNS tree. It performs an iterated query, asking the servers on the DNS tree path one by one.

Local server:

  • Set server to 127.0.0.1 if running BIND locally
  • Auto-configure via DHCP
  • Set to 8.8.8.8 (Google)

Records

Once (any) name server learns a mapping, it caches the mapping. Cache entries time out (disappear) after some time (TTL).

  • If a record comes from cache, it is called non-authoritative
  • If original DNS server is contacted, the record is authoritative

TLD servers are typically cached in local servers.

A record has the format (name, value, type, ttl).

Types:

  • A: host, IPv4
  • NS: domain, hostname of the authoritative name server for this domain
  • CNAME: host, host it’s aliased to
  • MX: domain, name of SMTP server for this domain

Protocol

Formats of query and reply message are the same. All numbers are in network byte order.

// | width of a line is 4 bytes |
TXID | flags
nQuestions | nAnswers
nAuthority | nAdditional
questions (variable size)
answers (variable size)
authority (variable size)
additional (variable size)
  • Transaction ID (TXID)
    • 16-bit number for each query by client
    • Echoed by server
  • Flags
    • type of request and response status
  • Other 4 fields are counts of 4 sections
  • Question
    • Queries contain only the question section
    • Response packets always repeat the question
  • Authority
    • carries NS records
    • Used during iterative lookups to specify next DNS server to query

Flags has 16 bits:

  • QR(1)
    • 0 for query, 1 for response
  • opcode(4)
    • 0 for standard query
  • AA(1)
    • authoritative answer
  • TC(1), truncated response
  • RD(1), recursion desired
  • RA(1), recursion available
  • reserved(3)
  • result(4)
    • 0, success
    • 1, format error
    • 2, server failure
    • 3, no DNS name
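
The bit layout above maps directly onto shifts and masks. A minimal sketch (decode_flags is a hypothetical helper name), with QR in the most significant bit and the result code in the lowest four bits:

```python
def decode_flags(flags: int) -> dict:
    """Split the 16-bit DNS flags word into its fields."""
    return {
        "qr": (flags >> 15) & 0x1,
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 0x1,
        "tc": (flags >> 9) & 0x1,
        "rd": (flags >> 8) & 0x1,
        "ra": (flags >> 7) & 0x1,
        "reserved": (flags >> 4) & 0x7,
        "result": flags & 0xF,
    }


# 0x0100 is a standard query with RD set; 0x8183 is a response with result 3.
print(decode_flags(0x0100))
print(decode_flags(0x8183))
```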

For query packet:

  • set RD = 1, all other fields 0
  • nQuestions = 1

To query DNS reversely for hostname, construct the query as:

  • Question: reverse IP address and append a suffix
    • example: 128.194.135.65 to 65.135.194.128.in-addr.arpa
    • IPv6: ip6.arpa
  • Type: PTR
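
Constructing the reverse-lookup name is a simple string operation. The sketch below handles IPv4 only (reverse_name is a hypothetical helper) and compares the result with the reverse_pointer attribute of Python's ipaddress module.

```python
import ipaddress


def reverse_name(ip: str) -> str:
    """Reverse the four octets and append the in-addr.arpa suffix."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted IPv4 address")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


name = reverse_name("128.194.135.65")
same_from_stdlib = ipaddress.ip_address("128.194.135.65").reverse_pointer
print(name)
```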

The question has the format of

str1_size + str1 + ... + strn_size + strn + 0 + query_type + query_class
// size is 1 byte
// query_type and class are 2 bytes
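
Putting the header and question formats together, a query packet can be built as sketched below. The helper names encode_name and build_query are hypothetical; type 1 (A) and class 1 (Internet) are used as defaults, and all numbers are packed in network byte order.

```python
import struct


def encode_name(hostname: str) -> bytes:
    """Encode "www.google.com" as size-prefixed labels ending with a 0 byte."""
    out = b""
    for label in hostname.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError("each label must be 1 to 63 bytes long")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(hostname: str, txid: int, qtype: int = 1, qclass: int = 1) -> bytes:
    """Build a query: RD = 1, nQuestions = 1, every other header field 0."""
    flags = 0x0100  # only the RD bit is set
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    question = encode_name(hostname) + struct.pack("!HH", qtype, qclass)
    return header + question


packet = build_query("www.google.com", txid=0x1234)
print(len(packet), packet.hex())
```

The resulting bytes can be sent in a single UDP datagram to port 53 of a name server.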

Packet parsing

Answers start with the name of the record, followed by a fixed DNS reply header, then the answer. 0x0 denotes the end of a string.

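Names in answers can be compressed: instead of a label, the size field holds a pointer, and the read cursor jumps to another position in the same packet. A pointer is marked by the two upper bits of the size byte being 11; the remaining 14 bits are the jump offset from the start of the packet.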

// uncompressed
0x3 "www" 0x6 "google" 0x3 "com" 0x0 <DNSanswerHdr> <ANSWER>
//compressed
0xC0 0x0C <DNSanswerHdr> <ANSWER>

For type A, the answer is a 4-byte IP.

Caveats to be handled:

  • Jump outside of the packet
  • Infinite jump
    • Detect it by counting jumps: more jumps than the number of bytes in the packet means a loop.
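
A name reader that follows jumps and guards against both caveats might look like the sketch below. read_name is a hypothetical helper; it returns the decoded name and the offset just past the name at its original position, which is where the fixed answer header starts.

```python
def read_name(packet: bytes, offset: int):
    """Decode a possibly compressed name starting at offset."""
    labels = []
    jumps = 0
    end = None  # where parsing resumes; fixed at the first jump
    while True:
        if offset >= len(packet):
            raise ValueError("name runs past the end of the packet")
        size = packet[offset]
        if size & 0xC0 == 0xC0:  # two upper bits 11: compression pointer
            if offset + 1 >= len(packet):
                raise ValueError("truncated jump pointer")
            target = ((size & 0x3F) << 8) | packet[offset + 1]
            if target >= len(packet):
                raise ValueError("jump outside of the packet")
            jumps += 1
            if jumps > len(packet):
                raise ValueError("jump loop detected")
            if end is None:
                end = offset + 2
            offset = target
        elif size == 0:  # 0x0 terminates the name
            if end is None:
                end = offset + 1
            return ".".join(labels), end
        else:
            start = offset + 1
            if start + size > len(packet):
                raise ValueError("label runs past the end of the packet")
            labels.append(packet[start:start + size].decode("ascii"))
            offset = start + size
```

For example, with a 12-byte header followed by the question for www.google.com, the pointer 0xC0 0x0C sends the cursor back to offset 12, where the question name begins.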

Vulnerabilities

IP spoofing

IP spoofing means sending packets with a fake source IP.

  • For spoofing to work, ISP network of attacker must allow such packets to depart
  • TCP spoofing is hard; UDP easy

In an amplification attack, the attacker transmits small packets to intermediate hosts, which then generate more traffic towards the victim. It is a kind of DDoS (Distributed Denial of Service) attack.

DNS can be used for amplification: a query of about 65 bytes (40 bytes of headers plus the question) can draw a reply of up to 512 bytes (the maximum over plain UDP).

Large DNS reply:

  • DNS TXT queries
  • many A records
  • IPv6
  • DNS extensions

Remote TXID Guess attack

  • DNS responses cannot be verified for authenticity
  • Possible for attacker to send fake replies to fool local resolver
  • Attacker must send fake reply quicker than the authoritative server
    • DNS servers use only the first reply they get, ignore all others

Attacker must know:

  • Local DNS server’s IP
  • Query string

Recursive DNS resolver rejects answers unless:

  • Source IP of reply matches that of the authoritative server
  • Local port number is correct
  • TXID in DNS header matches that of the query

Cache poisoning

The attacker must wait until the target record expires from the cache, then pull off the attack just before the host gets cached again.

But NS records override cached versions if they come from an authoritative server. Here is a good video about this issue.

With known client port number:

  1. A local user issues a request for hash1.chase.com (not cached).
  2. The attacker sends K spoofed packets to the local resolver (LR) with random TXIDs.
  3. Spoofed packets have no answers, only NS and additional records for domain chase.com.
    • NS points to the badguy name server
    • Additionals contain the A records for the badguy name server
  4. If the attack does not work, repeat with hash2.chase.com.
  5. Once a spoofed response is accepted, it overwrites the existing NS entries!
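
A rough estimate shows why repeating the attack pays off. The sketch below is a back-of-the-envelope model, not a measurement: it assumes the port is known, that the K spoofed TXIDs in each round are distinct, and that all of them arrive before the genuine reply. The function name is hypothetical.

```python
def success_probability(k: int, rounds: int, txid_space: int = 2 ** 16) -> float:
    """Chance that at least one round hits the 16-bit TXID.

    Per round, k distinct guesses out of txid_space succeed with k / txid_space.
    """
    per_round = min(k, txid_space) / txid_space
    return 1.0 - (1.0 - per_round) ** rounds


# Illustrative numbers only: 100 spoofed packets per round.
for rounds in (1, 100, 1000):
    print(rounds, round(success_probability(100, rounds), 4))
```

Randomizing the source port multiplies the search space, which is the idea behind the first fix listed below.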

To fix it:

  • Randomization of port numbers for each query (IIS, BIND)
  • Random capitalization of query strings (wWw.ChasE.coM) and case-sensitive comparison of answers (Pydig, Unbound)
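
The capitalization defense can be sketched in a few lines. The helper names are hypothetical; the resolver randomizes the case of each letter in the query and accepts a reply only if the echoed question matches byte for byte.

```python
import random


def randomize_case(name: str, rng: random.Random) -> str:
    """Flip each character to upper or lower case at random."""
    return "".join(c.upper() if rng.random() < 0.5 else c.lower() for c in name)


def reply_matches(sent_name: str, echoed_name: str) -> bool:
    """Case-sensitive comparison on purpose: a blind spoofer must guess the case."""
    return sent_name == echoed_name


rng = random.Random()  # a real resolver would want an unpredictable source
sent = randomize_case("www.chase.com", rng)
print(sent, reply_matches(sent, sent), reply_matches(sent, sent.swapcase()))
```

Each letter in the name adds one bit the attacker has to guess on top of the TXID.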

Domain flux

  • Infected hosts are organized into botnets.
  • Botnet is under control of a botmaster.
  • Early botnets used Internet Relay Chat to send and receive commands
    • Blocked by ISPs eventually.
    • Easy to target IRC servers.
  • New generation of botnets uses dynamically changing C&C (command & control) points.
    • C&C’s IPs rapidly change over time.

Fast flux:

  • Fast flux is a method for discovering the IP address of C&C and other resources the botnet may need.
    • Botmaster registers a domain (say xyz.com) and controls the DNS server ns.xyz.com
  • Botnet contacts nameserver ns.xyz.com and obtains the current IP of the C&C (or multiple ones)
    • Performs a type-A lookup on hash.xyz.com
  • Main defense against botnet traffic is blocking communication with the C&C
  • TLD servers auto-detect fast flux and block suspected domains in conjunction with the registrar

Domain flux:

Botnet constantly generates random domain names and tries to resolve them to find the C&C.

  • Current domain name stays in effect until it is blocked.
  • In reality, the botnet goes through thousands of failed lookup attempts until it finds an active domain.
  • In some cases, reverse engineering the random generator allows one to predict future domain names.

CDN

Content Distribution Networks (CDNs):

  • Push replicated content (files, video, images) towards edges
  • Distributed system of application-layer servers
  • Example: Akamai
  • Over 200K servers in 120 countries and 1500 networks

To direct a user to the closest replica:

  • Akamai relies on DNS to bounce the user to the best server
  • Based on location of local resolver to find best server (e.g. using distance, load, latency, available bandwidth)
  • Often Akamai produces long redirect chains
    • Usually through CNAMEs based on the IP of local resolver

Drawback:

  • distance from user to their local resolver is generally unknown
  • long resolution chains
    • Caching helps with latency, but Akamai uses extremely small TTLs (e.g., 20 sec)

P2P

Evolution

  • Napster (1999)
    • Centralized directory server
    • Single point of failure
    • Doesn’t scale
  • Gnutella/0.4 (2001)
    • Fully distributed
    • Overlay network: graph of all peers and edges
    • Search by flooding to some depth
    • Download from a single user
      • Unreliable, inefficient
  • KaZaA(2002), Gnutella/0.6
    • Peer is either a group leader (supernode) or assigned to one
    • Group leader tracks the content of all its children, acting like a mini-Napster
    • Peers query their group leaders, which flood the supernode graph to search
    • Parallel downloads

Other P2P

Seed: user holding a complete file.

BitTorrent(2001):

  • Let non-seeds grab chunks from each other.
  • Rarest chunk in torrent is replicated first.
  • Force peers to transfer chunks they have.
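
The rarest-first rule can be illustrated with a small selection function. This is a simplified sketch, not BitTorrent's actual implementation: the function name and the peer map are made up, and ties are broken by the lowest chunk index.

```python
from collections import Counter


def rarest_first(my_chunks: set, peer_chunks: dict):
    """Pick the missing chunk held by the fewest peers, or None if nothing is missing."""
    counts = Counter()
    for chunks in peer_chunks.values():
        counts.update(chunks)
    candidates = [c for c in counts if c not in my_chunks]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (counts[c], c))


# Hypothetical swarm: chunk 0 is common, chunks 2 and 3 each have one holder.
peers = {"a": {0, 1, 2}, "b": {0, 1}, "c": {0, 3}}
print(rarest_first({0}, peers))
```

Fetching rare chunks first spreads them before their few holders leave the torrent.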

Tor(Onion Router):

  • Packets are sent through a random chain of P2P nodes
  • Extremely slow
  • Many exit points are known and blocked

Freenet:

  • Anonymous info exchange
  • hiding identities of communicating parties

Skype

  • Directly between users
  • Or relayed through non-firewalled peers

Distributed hash tables

  • General class of P2P systems that map information into high-dimensional search space with guaranteed bounds on delay to find content.
  • Chord

Reference

These are my class notes from CSCE 612 at TAMU. Credit to the instructor, Dr. Loguinov.


Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.