Public speaking course notes Read "Dynamo, Amazon’s Highly Available Key-value Store" Read "Bigtable, A Distributed Storage System for Structured Data" Read "Streaming Systems" 3, Watermarks Read "Streaming Systems" 1&2, Streaming 101 Read "F1, a distributed SQL database that scales" Read "Zanzibar, Google’s Consistent, Global Authorization System" Read "Spanner, Google's Globally-Distributed Database" Read "Designing Data-intensive applications" 12, The Future of Data Systems IOS development with Swift Read "Designing Data-intensive applications" 10&11, Batch and Stream Processing Read "Designing Data-intensive applications" 9, Consistency and Consensus Read "Designing Data-intensive applications" 8, Distributed System Troubles Read "Designing Data-intensive applications" 7, Transactions Read "Designing Data-intensive applications" 6, Partitioning Read "Designing Data-intensive applications" 5, Replication Read "Designing Data-intensive applications" 3&4, Storage, Retrieval, Encoding Read "Designing Data-intensive applications" 1&2, Foundation of Data Systems Three cases of binary search TAMU Operating System 2 Memory Management TAMU Operating System 1 Introduction Overview in cloud computing 2 TAMU Operating System 7 Virtualization TAMU Operating System 6 File System TAMU Operating System 5 I/O and Disk Management TAMU Operating System 4 Synchronization TAMU Operating System 3 Concurrency and Threading TAMU Computer Networks 5 Data Link Layer TAMU Computer Networks 4 Network Layer TAMU Computer Networks 3 Transport Layer TAMU Computer Networks 2 Application Layer TAMU Computer Networks 1 Introduction Overview in distributed systems and cloud computing 1 A well-optimized Union-Find implementation, in Java A heap implementation supporting deletion TAMU Advanced Algorithms 3, Maximum Bandwidth Path (Dijkstra, MST, Linear) TAMU Advanced Algorithms 2, B+ tree and Segment Intersection TAMU Advanced Algorithms 1, BST, 2-3 Tree and Heap TAMU AI, Searching problems Factorization Machine and Field-aware Factorization Machine for CTR prediction TAMU Neural Network 10 Information-Theoretic Models TAMU Neural Network 9 Principal Component Analysis TAMU Neural Network 8 Neurodynamics TAMU Neural Network 7 Self-Organizing Maps TAMU Neural Network 6 Deep Learning Overview TAMU Neural Network 5 Radial-Basis Function Networks TAMU Neural Network 4 Multi-Layer Perceptrons TAMU Neural Network 3 Single-Layer Perceptrons Princeton Algorithms P1W6 Hash Tables & Symbol Table Applications Stanford ML 11 Application Example Photo OCR Stanford ML 10 Large Scale Machine Learning Stanford ML 9 Anomaly Detection and Recommender Systems Stanford ML 8 Clustering & Principal Component Analysis Princeton Algorithms P1W5 Balanced Search Trees TAMU Neural Network 2 Learning Processes TAMU Neural Network 1 Introduction Stanford ML 7 Support Vector Machine Stanford ML 6 Evaluate Algorithms Princeton Algorithms P1W4 Priority Queues and Symbol Tables Stanford ML 5 Neural Networks Learning Princeton Algorithms P1W3 Mergesort and Quicksort Stanford ML 4 Neural Networks Basics Princeton Algorithms P1W2 Stack and Queue, Basic Sorts Stanford ML 3 Classification Problems Stanford ML 2 Multivariate Regression and Normal Equation Princeton Algorithms P1W1 Union and Find Stanford ML 1 Introduction and Parameter Learning

Princeton Algorithms P1W1 Union and Find

2016-12-26

topic data structures and algorithms
data types stack, queue, bag, union-find, priority queue
sorting quicksort, mergesort, heapsort
searching BST, red-black BST, hash table
graphs BFS, DFS, Prim, Kruskal,DijKstra
strings radix sorts, tries, KMP, regexps, data compression
advanced B-tree, suffix array, maxflow

union-find

union: connect two objects
find/connected query: is there a path connecting the two objects

public class QuickFindUF {
    private int[] id;
    public QuickFindUF(int N) {
        id = new int[N];
        for (int i = 0, i < N, i++) 
            id[i] = i;
    }
    public boolean connected(int p, int q)
    {
        return (id[p] == id[q])
    }
    public void union (int p, int q)
    {
        pid = id[p];
        qid = id[q];
        for (int i = 0, i < N, i++) 
            if (id[i] == pid) id[i] = qid;
    }
}

if performing N union operation on N objects, the time is \(N^2\), which is too slow

  • union too expensive (N array accesses)
  • trees are flat, but too expensive to keep them flat

quick union

public class QuickFindUF {
    private int[] id;
    public QuickFindUF(int N) {
        id = new int[N];
        for (int i = 0, i < N, i++) 
            id[i] = i;
    }
    
    private int root(int i)
    {
        while (i != id[i]) i = id[i];
        return i;
    }
    
    public boolean connected(int p, int q)
    {
        return (root[p] == root[q])
    }
    public void union (int p, int q)
    {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }
}

quick union is also too slow.

  • trees can get tall
  • find too expensive ( could be N array accesses)

quick union improvement (weighted)

idea:

  • modify quick-uniton to avoid tall trees
  • keep track of size of each tree
  • balance by linking root of smaller tree to root of larger tree
public class QuickFindUF {
    private int[] id;
    public QuickFindUF(int N) {
        id = new int[N];
        sz = new int[N];
        for (int i = 0, i < N, i++) 
            id[i] = i;
            sz[i] = 1;
    }
    
    private int root(int i)
    {
        while (i != id[i]) i = id[i];
        return i;
    }
    
    public boolean connected(int p, int q)
    {
        return (root[p] == root[q])
    }
    public void union (int p, int q)
    {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) {id[i] = j; sz[j] += sz[i];}
        else               {id[j] = i; sz[i] += sz[j];}
    }
}

depth of any node x is at most \(log_2N\):
increase by 1 when tree T1 containing x is merged into another tree T2

  • size of the tree containing x at least doubles since size(T2)>size(T1)
  • size of the tree can double at most \(\log_2N\) \(\lg N\) times

compare

algorithm initialize union connected
quick-find N N 1
quick-union N N* N
weighted(QU) N lgN* lgN

*: includes cost of finding roots

improvement 2: path compression

two-pass implementation: add second loop to root() to set the id[ ] of each examined node to the root

simpler one-pass variant: make every other node in path point to its grandparent(thereby halving path length)

private int root(int i)
{
    while (i !=id[i])
    {
        id[i] = id[id[i]];
        i = id[i];
    }
    return i;
}

M union-find operation on a set of N objects

algorithm worst-case time
quick find M N
quick union M N
weighted QU N+MlogN
QU+pathcomp N+MlogN
weig +pathc N+Mlg*N

application

percolation

create 2 virtual sites (and connections to top and bottom)
connected if virtual top site is connected to the virtual bottom site

threshold: 0.592746

analysis of algorithms

plot the running time \(T\) vs size of the input \(N\).
The slope for log-log plot is \(s\), so \(T\) is propotional to \(N^s\)
\(T=aN^b\); \(a\) and \(b\) can be estimated by changing size \(N\)

mathematical model

total running time: sum of cost times frequency for all operations
pick an operation that is most expensive or often to compute the running time instead of going over all details

tilde notation

definition: \(f(N) \sim g(N)\) means \(\lim_{N\rightarrow \infty} \frac{f(N)}{g(N)}=1\)
ignore the lower order terms
e.g.: \(1/6N^3 +20N+16 \sim 1/6N^3\)

order-of-growth classifications

order: 1, \(\log N\),\(N\), \(N \log N\), \(N^2\), \(N^3\), \(2^N\)

better to have linear and linearithemic scales

order description example
1 statement add 2 numbers
logN divide in half binary search
N loop find the maximum
N log N divide and conquer mergesort
\(N^2\) double loop check all pairs

analysis of algorithms

estimate the upper bound \(O(N^n)\) and the lower bound \(\Omega(N^n)\) of the running time \(\Theta(N^n)\) of an algorithm

memory

memory usage by bytes

Type bytes
boolean 1
byte 1
char 2
int 4
float 4
long 8
double 8
char[] 2N+24
int[] 4N+24
double[] 8N+24

Creative Commons License
Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2024. All rights reserved by melonskin. Powered by Jekyll.