Read "Streaming Systems" 1&2, Streaming 101 Read "F1, a distributed SQL database that scales" Read "Zanzibar, Google’s Consistent, Global Authorization System" Read "Spanner, Google's Globally-Distributed Database" Read "Designing Data-intensive applications" 12, The Future of Data Systems IOS development with Swift Read "Designing Data-intensive applications" 10&11, Batch and Stream Processing Read "Designing Data-intensive applications" 9, Consistency and Consensus Read "Designing Data-intensive applications" 8, Distributed System Troubles Read "Designing Data-intensive applications" 7, Transactions Read "Designing Data-intensive applications" 6, Partitioning Read "Designing Data-intensive applications" 5, Replication Read "Designing Data-intensive applications" 3&4, Storage, Retrieval, Encoding Read "Designing Data-intensive applications" 1&2, Foundation of Data Systems Three cases of binary search TAMU Operating System 2 Memory Management TAMU Operating System 1 Introduction Overview in cloud computing 2 TAMU Operating System 7 Virtualization TAMU Operating System 6 File System TAMU Operating System 5 I/O and Disk Management TAMU Operating System 4 Synchronization TAMU Operating System 3 Concurrency and Threading TAMU Computer Networks 5 Data Link Layer TAMU Computer Networks 4 Network Layer TAMU Computer Networks 3 Transport Layer TAMU Computer Networks 2 Application Layer TAMU Computer Networks 1 Introduction Overview in distributed systems and cloud computing 1 A well-optimized Union-Find implementation, in Java A heap implementation supporting deletion TAMU Advanced Algorithms 3, Maximum Bandwidth Path (Dijkstra, MST, Linear) TAMU Advanced Algorithms 2, B+ tree and Segment Intersection TAMU Advanced Algorithms 1, BST, 2-3 Tree and Heap TAMU AI, Searching problems Factorization Machine and Field-aware Factorization Machine for CTR prediction TAMU Neural Network 10 Information-Theoretic Models TAMU Neural Network 9 Principal Component Analysis TAMU Neural Network 8 Neurodynamics TAMU Neural Network 7 Self-Organizing Maps TAMU Neural Network 6 Deep Learning Overview TAMU Neural Network 5 Radial-Basis Function Networks TAMU Neural Network 4 Multi-Layer Perceptrons TAMU Neural Network 3 Single-Layer Perceptrons Princeton Algorithms P1W6 Hash Tables & Symbol Table Applications Stanford ML 11 Application Example Photo OCR Stanford ML 10 Large Scale Machine Learning Stanford ML 9 Anomaly Detection and Recommender Systems Stanford ML 8 Clustering & Principal Component Analysis Princeton Algorithms P1W5 Balanced Search Trees TAMU Neural Network 2 Learning Processes TAMU Neural Network 1 Introduction Stanford ML 7 Support Vector Machine Stanford ML 6 Evaluate Algorithms Princeton Algorithms P1W4 Priority Queues and Symbol Tables Stanford ML 5 Neural Networks Learning Princeton Algorithms P1W3 Mergesort and Quicksort Stanford ML 4 Neural Networks Basics Princeton Algorithms P1W2 Stack and Queue, Basic Sorts Stanford ML 3 Classification Problems Stanford ML 2 Multivariate Regression and Normal Equation Princeton Algorithms P1W1 Union and Find Stanford ML 1 Introduction and Parameter Learning

Princeton Algorithms P1W4 Priority Queues and Symbol Tables

2017-01-11

Priority Queues

  1. Stack: Remove the item most recently added
  2. Queue: Remove the item least recently added
  3. Randomized queue: Remove a random item
  4. Priority queue: Remove the largest (or smallest) item

APIs and implementations

Requirement: Generic items are Comparable

public class MaxPQ<Key extends Comparable<Key>>
    MaxPQ()
    
    void insert (Key y)
    
    Key delMax()
    
    boolean isEmpty()
    
    Key max()
    
    int size()

Find the largest M items in a stream of N items. But no enough memory to store N items.

MinPQ<Transaction> pq = new MinPQ<Transaction>();

while (StdIn.hasNextLine())
{
    String line = StdIn.readLine();
    Transaction item = new Transaction(line);
    pq.insert(item);
    if (pq.size() > M)
    pq.delMin();
}
implementation time space
sort N log N N
elementary PQ M N M
binary heap N log M M
best in theory N M

unordered implementation

public class UnorderedMaxPQ<Key extends Comparable<Key>>
{
    private Key[] pq; // pq[i] = ith element on pq
    private int N; // number of elements on pq
    public UnorderedMaxPQ(int capacity)
    { pq = (Key[]) new Comparable[capacity]; }
    public boolean isEmpty()
    { return N == 0; }
    public void insert(Key x)
    { pq[N++] = x; }
    public Key delMax()
    {
        int max = 0;
        for (int i = 1; i < N; i++)
        if (less(max, i)) max = i;
        exch(max, N-1);
        return pq[--N];
    }
}
implementation insert del max max
unordered 1 N N
ordered N 1 1
goal log N log N log N

binary heaps

Complete binary tree:

  • Binary tree: empty or node with links to left and right binary trees
  • Complete tree: perfect balanced, except for bottom level
    • height of complete tree with N nodes is

Array representation:

  • parent’s key is no smaller than children’s keys
  • indices start at 1
  • take nodes in level order
  • no explicit links needed
  • largest key is a[1], which is root of binary tree
  • parent of node k is at k/2
  • children of node at k are at 2k and 2k+1

promotion in a heap

To avoid having larger child’s key than its parent’s key

  • exchange key in child with key in parent
  • repeat until heap order restored

insertion in a heap:

  • insert: add node at end, then swim it up
  • cost: at most compares
private void swim(int k)
{
    while (k > 1 && less(k/2, k))
    {
        exch(k/2,k);
        k = k/2;
    }
}
// insertion in a heap
public void insert(Key x)
{
    pq[++N] = x;
    swim[N];
}

demotion in a heap

Parent’s key becomes smaller than one or both of its children’s

  • exchange key in parent with key in larger child
  • repeat until heap order restored
private void sink(int k)
{
    while (2*k <= N)
    {
        int j = 2*k;
        if (j < N && less(j,j+1)) j++;
        if (!less(k, j)) break;
        exch(k,j);
        k = j;
    }
}

// del max
public Key delMax()
{
    Key max = pq[1];
    exch(1,N--);
    sink(1);
    pq[N+1] = null;     // prevent loitering  
    return max;
}

full implementation

public class MaxPQ<Key extends Comparable<Key>>
{
    private Key[] pq;
    private int N;
    public MaxPQ(int capacity)
    { pq = (Key[]) new Comparable[capacity+1]; }
    public boolean isEmpty()
    { return N == 0; }
    public void insert(Key key)
    public Key delMax()
    { /* see previous code */ }
    private void swim(int k)
    private void sink(int k)
    { /* see previous code */ }
    private boolean less(int i, int j)
    { return pq[i].compareTo(pq[j]) < 0; }
    private void exch(int i, int j)
    { Key t = pq[i]; pq[i] = pq[j]; pq[j] = t; }
}

considerations

  • immutability of keys
    • client does not change keys while they’re on the PQ
    • best practice: use immutable keys
  • underflow and overflow
    • throw exception if deleting from empty PQ
    • add no-arg constructor and use resizing array
  • minumum-oriented priority queue
    • replace less() with greater()
    • implement greater()
  • other operations (use sink() and swim())
    • remove an arbitrary item
    • change the priority of an item

immutability

Data type: set of values and operations on those values
immutable data type: can’t change the data type value once created
immutable: string, integer, double, color, vector,…
mutable: stringbuilder, stack, counter, java array

// immutability, use private, final
public final class Vector {
    private final int N;
    private final double[] data;
    
    public Vector(double[] data) {
        this.N = data.length;
        this.data = new double[N];
        for (int i = 0; i < N; i++)
            this.data[i] = data[i];
    }

}

heapsort

plan for the in-place sort

  • create max-heap with all N keys
  • repeatedly remove the maximum key
  1. start with array of keys in arbitrary order
  2. build a max-heap (in place)
  3. sorted result (in place)
// build heap using bottom-up method
for (int k = N/2; k >= 1; k--)
    sink(a, k, N);
// remove max
while (N>1)
{
    exch(a,1, N--);
    sink(a, 1, N);
}

complete implementation

public class Heap
{
    public static void sort(Comparable[] a)
    {
        int N = a.length;
        for (int k = N/2; k >= 1; k--)
            sink(a, k, N);
        while (N > 1)
        {
            exch(a, 1, N);
            sink(a, 1, --N);
        }
    }
    
    private static void sink(Comparable[] a, int k, int N)
    { /* as before */ }
    
    private static boolean less(Comparable[] a, int i, int j)
    { /* as before */ }
    
    private static void exch(Comparable[] a, int i, int j)
    { /* as before */ }
    
}

analysis

  • Heap construction uses compares and exchanges
  • Heapsort uses compares and exchanges

In-place sorting algorithm with worst-case

Bottom line: heapsort is optimal for both time and space but:

  • inner loop longer than quicksort’s
  • makes poor use of cache memory
  • not stable

event-driven simulation

dynamics of hard discs

goal: simulate the motion of N moving particles that behave according to the law of elastic collision

elementary symbol tables

symbol table api

symbol tables

key-value pair abstraction

  • insert a value with specified key
  • given a key, search for the corresponding value
public class ST<Key, Value>
{
    ST() create a symbol table
    void put(Key key, Value val) //put key-value pair into the table
    // (remove key from table if value is null)
    Value get(Key key) //value paired with key
    // (null if key is absent)
    void delete(Key key) // remove key (and its value) from table
    boolean contains(Key key) //is there a value paired with //key?
    boolean isEmpty() // is the table empty?
    int size() // number of key-value pairs in the table
    Iterable<Key> keys() // all the keys in the table
}

conventions

  • values are not null
  • method get() returns null if key not present
  • method put() overwrites old value with new value

keys and values

value type: any generic type

key type: several natural assumptions

  • keys are Comparable, use compareTo()
  • keys are any generic type, use equals() to test equality
  • keys are any generic type, use hashCode() to scramble key

elementary implementations

sequential search in a linked list

data structure: maintain an (unordered) linked list of key-value pairs

search: scan through all keys until find a match
insert: scan trhough all keys until find a match; if no match add to front

equals() is the key interface

binary search in an ordered array

maintain an ordered array of key-value pairs

compareTo() is the key interface

ordered operations

public class ST<Key extends Comparable<Key>, Value>
{
    ST() // create an ordered symbol table
    void put(Key key, Value val) // put key-value pair into // the table
    // (remove key from table if value is null)
    Value get(Key key) // value paired with key
    // (null if key is absent)
    void delete(Key key) // remove key (and its value) from // table
    boolean contains(Key key) // is there a value paired with // key?
    boolean isEmpty() // is the table empty?
    int size() // number of key-value pairs
    Key min() // smallest key
    Key max() // largest key
    Key floor(Key key) // largest key less than or equal to // key
    Key ceiling(Key key) // smallest key greater than or //equal to key
    int rank(Key key) // number of keys less than key
    Key select(int k) // key of rank k
    void deleteMin() // delete smallest key
    void deleteMax() // delete largest key
    int size(Key lo, Key hi) // number of keys in [lo..hi]
    Iterable<Key> keys(Key lo, Key hi) // keys in [lo..hi], // in sorted order
    Iterable<Key> keys() // all keys in the table, in sorted // order
}

binary search trees

Definition: A BST is a binary tree in symmetric order.

A BST is either:

  • empty
  • two disjoint binary trees (left and right)

symmetric order: each node has a key, and every node’s key is

  • larger than all keys in its left subtree
  • smaller than all keys in its right subtree

In java, A BST is a reference to a root node.

A node is comprised of four fields:

  • A key and a value
  • A reference to the left and right subtree

search and insert

compare keys: if less, go left; if greater, go right; if null, insert.

private class Node
{
    private Key key;
    private Value val;
    private Node left, right;
    public Node(Key key, Value val)
    {
        this.key = key;
        this.val = val;
    }
}
public Value get(Key key)
{
    Node x = root;
    while (x != null)
    {
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x = x.left;
        else if (cmp > 0) x = x.right;
        else if (cmp == 0) return x.val;
    }
    return null;
}

put:

public void put(Key key, Value val)
{ root = put(root, key, val); }
    private Node put(Node x, Key key, Value val)
    {
        if (x == null) return new Node(key, val);
        int cmp = key.compareTo(x.key);
        if (cmp < 0)
            x.left = put(x.left, key, val);
        else if (cmp > 0)
            x.right = put(x.right, key, val);
        else if (cmp == 0)
            x.val = val;
        return x;
}

ordered operations in BSTs

find minimum and maximum
find floor and ceiling

floor:

public Key floor(Key key)
{
    Node x = floor(root, key);
    if (x == null) return null;
    return x.key;
}
private Node floor(Node x, Key key)
{
    if (x == null) return null;
    int cmp = key.compareTo(x.key);
    if (cmp == 0) return x;
    if (cmp < 0) return floor(x.left, key);
    Node t = floor(x.right, key);
    if (t != null) return t;
    else return x;
}

rank (how many keys < k):

public int rank(Key key)
{ return rank(key, root); }
private int rank(Key key, Node x)
{
    if (x == null) return 0;
    int cmp = key.compareTo(x.key);
    if (cmp < 0) return rank(key, x.left);
    else if (cmp > 0) return 1 + size(x.left) + rank(key, x.right);
    else if (cmp == 0) return size(x.left);
}

deletions in BSTs

unsatisfactory to remove a node with a given key:

  • set its value to null
  • leave key in tree to guide searchs (but don’t consider it equal in search)

deleteMin() and deleteMax() are easy.

Hibbard deletion

Unsatisfactory solution: not symmetric, leading to height

If the node to be deleted has no child or only one child, deleting is easy.

node with 2 children:

  • find successor x of t (x has no left child, in the right tree of t)
  • delete the minimum x in t’s right tree
  • put x in t’s spot
public void delete(Key key)
{ root = delete(root, key); }
private Node delete(Node x, Key key) {
    if (x == null) return null;
    int cmp = key.compareTo(x.key);
    if (cmp < 0) x.left = delete(x.left, key);
    else if (cmp > 0) x.right = delete(x.right, key);
    else {
        if (x.right == null) return x.left;
        if (x.left == null) return x.right;
        Node t = x;
        x = min(t.right);
        x.right = deleteMin(t.right);
        x.left = t.left;
    }
    x.count = size(x.left) + size(x.right) + 1;
    return x;
}

Creative Commons License
Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2019. All rights reserved by melonskin. Powered by Jekyll.