## Princeton Algorithms P1W1 Union-Find

2016-12-26

| topic | data structures and algorithms |
|---|---|
| data types | stack, queue, bag, union-find, priority queue |
| sorting | quicksort, mergesort, heapsort |
| searching | BST, red-black BST, hash table |
| graphs | BFS, DFS, Prim, Kruskal, Dijkstra |
| strings | radix sorts, tries, KMP, regexps, data compression |

# union-find

union: connect two objects
find/connected query: is there a path connecting the two objects

```java
public class QuickFindUF {
    private int[] id;    // id[p] = component identifier of p

    public QuickFindUF(int N) {
        id = new int[N];
        for (int i = 0; i < N; i++)
            id[i] = i;
    }

    public boolean connected(int p, int q) {
        return id[p] == id[q];
    }

    public void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        // change all entries with id[p] to id[q]
        for (int i = 0; i < id.length; i++)
            if (id[i] == pid) id[i] = qid;
    }
}
```


performing N union operations on N objects takes $$N^2$$ array accesses, which is too slow

• union too expensive (N array accesses)
• trees are flat, but too expensive to keep them flat

# quick union

```java
public class QuickUnionUF {
    private int[] id;    // id[i] = parent of i

    public QuickUnionUF(int N) {
        id = new int[N];
        for (int i = 0; i < N; i++)
            id[i] = i;
    }

    // chase parent pointers until reaching the root
    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    public void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        id[i] = j;    // make the root of p point to the root of q
    }
}
```


quick union is also too slow.

• trees can get tall
• find too expensive (could be N array accesses)

# quick union improvement (weighted)

idea:

• modify quick-union to avoid tall trees
• keep track of size of each tree
• balance by linking root of smaller tree to root of larger tree

```java
public class WeightedQuickUnionUF {
    private int[] id;    // id[i] = parent of i
    private int[] sz;    // sz[i] = number of objects in tree rooted at i

    public WeightedQuickUnionUF(int N) {
        id = new int[N];
        sz = new int[N];
        for (int i = 0; i < N; i++) {
            id[i] = i;
            sz[i] = 1;
        }
    }

    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    public void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        // link root of smaller tree to root of larger tree
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }
}
```


depth of any node x is at most $$\log_2 N$$:
the depth of x increases by 1 only when the tree T1 containing x is merged into another tree T2

• the size of the tree containing x at least doubles, since size(T2) ≥ size(T1)
• the size of a tree can double at most $$\lg N$$ times

compare (order of growth of the number of array accesses):

| algorithm | initialize | union | connected |
|---|---|---|---|
| quick-find | N | N | 1 |
| quick-union | N | N* | N |
| weighted QU | N | lg N* | lg N |

*: includes cost of finding roots

## improvement 2: path compression

two-pass implementation: add a second loop to root() that sets the id[] of each examined node to the root

simpler one-pass variant (path halving): make every other node on the path point to its grandparent, thereby halving the path length

```java
private int root(int i) {
    while (i != id[i]) {
        id[i] = id[id[i]];    // point i's parent to its grandparent
        i = id[i];
    }
    return i;
}
```
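Putting the two improvements together gives the standard implementation. A sketch combining the weighted merge with path halving (the class name is my own choice):

```java
public class WeightedQuickUnionPathCompressionUF {
    private int[] id;    // id[i] = parent of i
    private int[] sz;    // sz[i] = number of objects in tree rooted at i

    public WeightedQuickUnionPathCompressionUF(int N) {
        id = new int[N];
        sz = new int[N];
        for (int i = 0; i < N; i++) {
            id[i] = i;
            sz[i] = 1;
        }
    }

    private int root(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]];    // path compression by halving
            i = id[i];
        }
        return i;
    }

    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    public void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }
}
```

usage: `uf.union(4, 3); uf.union(3, 8);` then `uf.connected(4, 8)` returns true.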


M union-find operations on a set of N objects:

| algorithm | worst-case time |
|---|---|
| quick-find | M N |
| quick-union | M N |
| weighted QU | N + M log N |
| QU + path compression | N + M log N |
| weighted QU + path compression | N + M lg* N |

# application

## percolation

create 2 virtual sites, one connected to every site in the top row and one to every site in the bottom row
the system percolates if the virtual top site is connected to the virtual bottom site

percolation threshold: ≈ 0.592746
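The virtual-site trick can be sketched as below, with a minimal weighted quick-union embedded so the class is self-contained; the class and method names (Percolation, open, percolates) and the 0-based indexing are my own choices, not the course API:

```java
// Percolation on an N-by-N grid: site (row, col) maps to index row*N + col,
// plus two extra virtual sites at indices N*N (top) and N*N + 1 (bottom).
public class Percolation {
    private final int N;
    private final boolean[] isOpen;
    private final int top, bottom;    // the two virtual sites
    private final int[] id, sz;       // embedded weighted quick-union

    public Percolation(int N) {
        this.N = N;
        isOpen = new boolean[N * N];
        top = N * N;
        bottom = N * N + 1;
        id = new int[N * N + 2];
        sz = new int[N * N + 2];
        for (int i = 0; i < id.length; i++) { id[i] = i; sz[i] = 1; }
    }

    private int root(int i) {
        while (i != id[i]) { id[i] = id[id[i]]; i = id[i]; }
        return i;
    }

    private void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    public void open(int row, int col) {          // 0-based indices
        int site = row * N + col;
        isOpen[site] = true;
        if (row == 0)     union(site, top);       // top row -> virtual top
        if (row == N - 1) union(site, bottom);    // bottom row -> virtual bottom
        int[][] nbrs = {{row - 1, col}, {row + 1, col}, {row, col - 1}, {row, col + 1}};
        for (int[] nb : nbrs) {
            int r = nb[0], c = nb[1];
            if (r >= 0 && r < N && c >= 0 && c < N && isOpen[r * N + c])
                union(site, r * N + c);
        }
    }

    public boolean percolates() {
        return root(top) == root(bottom);
    }
}
```

With the virtual sites, percolates() is a single connected query instead of N² pairwise checks between top-row and bottom-row sites.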

# analysis of algorithms

plot the running time $$T(N)$$ vs the input size $$N$$.
on a log-log plot the points fall on a straight line with slope $$b$$, so $$T(N)$$ is proportional to $$N^b$$
power law: $$T(N) = a N^b$$; $$a$$ and $$b$$ can be estimated by running experiments of increasing size $$N$$
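A quick way to estimate $$b$$ is the doubling trick: run the program on inputs of size $$N$$ and $$2N$$; since $$T(2N)/T(N) = 2^b$$, the ratio of the two times gives $$b \approx \lg(T(2N)/T(N))$$. A sketch with hypothetical timing numbers (the values 0.1 and 0.8 are made up to illustrate a cubic algorithm):

```java
public class DoublingEstimate {
    // b ~ lg( T(2N) / T(N) ), since T(2N)/T(N) = 2^b for T(N) = a*N^b
    static double estimateB(double timeN, double time2N) {
        return Math.log(time2N / timeN) / Math.log(2);
    }

    public static void main(String[] args) {
        // hypothetical running times (seconds): doubling N made it 8x slower
        double tN = 0.1, t2N = 0.8;
        System.out.println(estimateB(tN, t2N));  // approximately 3.0 => T ~ a*N^3
    }
}
```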

## mathematical model

total running time: sum of cost × frequency over all operations
in practice, pick the most expensive or most frequent operation to estimate the running time instead of accounting for every detail

### tilde notation

definition: $$f(N) \sim g(N)$$ means $$\lim_{N\rightarrow \infty} \frac{f(N)}{g(N)}=1$$
ignore the lower-order terms
e.g.: $$\frac{1}{6}N^3 + 20N + 16 \sim \frac{1}{6}N^3$$
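The limit in the definition can be checked numerically for the example above; this sketch prints the ratio $$f(N)/g(N)$$ for growing $$N$$ and it visibly tends to 1:

```java
public class TildeCheck {
    public static void main(String[] args) {
        // f(N) = N^3/6 + 20N + 16 and g(N) = N^3/6: the ratio tends to 1
        for (long N = 10; N <= 1_000_000; N *= 100) {
            double f = (double) N * N * N / 6 + 20.0 * N + 16;
            double g = (double) N * N * N / 6;
            System.out.println(N + "  " + f / g);
        }
    }
}
```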

## order-of-growth classifications

order: $$1$$, $$\log N$$, $$N$$, $$N \log N$$, $$N^2$$, $$N^3$$, $$2^N$$

in practice, aim for linear or linearithmic running time

| order | description | example |
|---|---|---|
| $$\log N$$ | divide in half | binary search |
| $$N$$ | loop | find the maximum |
| $$N \log N$$ | divide and conquer | mergesort |
| $$N^2$$ | double loop | check all pairs |
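Binary search is the canonical $$\log N$$ example from the table above: each comparison halves the remaining interval, so a sorted array of size N needs at most about $$\lg N$$ comparisons. A minimal sketch:

```java
public class BinarySearchDemo {
    // returns the index of key in the sorted array a, or -1 if absent;
    // the interval [lo, hi] halves on every iteration => ~lg N comparisons
    static int search(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // avoids overflow of (lo + hi) / 2
            if      (key < a[mid]) hi = mid - 1;
            else if (key > a[mid]) lo = mid + 1;
            else return mid;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {3, 7, 11, 19, 23, 31};
        System.out.println(search(a, 19));  // 3
        System.out.println(search(a, 5));   // -1
    }
}
```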

# theory of algorithms

estimate an upper bound $$O(f(N))$$ and a lower bound $$\Omega(f(N))$$ on the running time of an algorithm; when they match, the order of growth is $$\Theta(f(N))$$

# memory

typical memory usage in bytes (Java)

| type | bytes |
|---|---|
| boolean | 1 |
| byte | 1 |
| char | 2 |
| int | 4 |
| float | 4 |
| long | 8 |
| double | 8 |
| char[] | 2N + 24 |
| int[] | 4N + 24 |
| double[] | 8N + 24 |
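As a worked example with the table above, the weighted quick-union structure holds two int arrays; assuming the usual 16 bytes of object overhead and 8 bytes per array reference on a 64-bit JVM, its memory is about:

```latex
\underbrace{16}_{\text{object overhead}}
+ \underbrace{2 \times 8}_{\text{two references}}
+ \underbrace{2\,(4N + 24)}_{\text{two int[] arrays}}
= 8N + 80 \sim 8N \text{ bytes}
```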