topic | data structures and algorithms |
---|---|
data types | stack, queue, bag, union-find, priority queue |
sorting | quicksort, mergesort, heapsort |
searching | BST, red-black BST, hash table |
graphs | BFS, DFS, Prim, Kruskal,DijKstra |
strings | radix sorts, tries, KMP, regexps, data compression |
advanced | B-tree, suffix array, maxflow |
union-find
union: connect two objects
find/connected query: is there a path connecting the two objects
public class QuickFindUF {
private int[] id;
public QuickFindUF(int N) {
id = new int[N];
for (int i = 0, i < N, i++)
id[i] = i;
}
public boolean connected(int p, int q)
{
return (id[p] == id[q])
}
public void union (int p, int q)
{
pid = id[p];
qid = id[q];
for (int i = 0, i < N, i++)
if (id[i] == pid) id[i] = qid;
}
}
if performing N union operation on N objects, the time is \(N^2\), which is too slow
- union too expensive (N array accesses)
- trees are flat, but too expensive to keep them flat
quick union
public class QuickFindUF {
private int[] id;
public QuickFindUF(int N) {
id = new int[N];
for (int i = 0, i < N, i++)
id[i] = i;
}
private int root(int i)
{
while (i != id[i]) i = id[i];
return i;
}
public boolean connected(int p, int q)
{
return (root[p] == root[q])
}
public void union (int p, int q)
{
int i = root(p);
int j = root(q);
id[i] = j;
}
}
quick union is also too slow.
- trees can get tall
- find too expensive ( could be N array accesses)
quick union improvement (weighted)
idea:
- modify quick-uniton to avoid tall trees
- keep track of size of each tree
- balance by linking root of smaller tree to root of larger tree
public class QuickFindUF {
private int[] id;
public QuickFindUF(int N) {
id = new int[N];
sz = new int[N];
for (int i = 0, i < N, i++)
id[i] = i;
sz[i] = 1;
}
private int root(int i)
{
while (i != id[i]) i = id[i];
return i;
}
public boolean connected(int p, int q)
{
return (root[p] == root[q])
}
public void union (int p, int q)
{
int i = root(p);
int j = root(q);
if (i == j) return;
if (sz[i] < sz[j]) {id[i] = j; sz[j] += sz[i];}
else {id[j] = i; sz[i] += sz[j];}
}
}
depth of any node x is at most \(log_2N\):
increase by 1 when tree T1 containing x is merged into another tree T2
- size of the tree containing x at least doubles since
size(T2)>size(T1)
- size of the tree can double at most \(\log_2N\) \(\lg N\) times
compare
algorithm | initialize | union | connected |
---|---|---|---|
quick-find | N | N | 1 |
quick-union | N | N* | N |
weighted(QU) | N | lgN* | lgN |
*: includes cost of finding roots
improvement 2: path compression
two-pass implementation: add second loop to root()
to set the id[ ] of each examined node to the root
simpler one-pass variant: make every other node in path point to its grandparent(thereby halving path length)
private int root(int i)
{
while (i !=id[i])
{
id[i] = id[id[i]];
i = id[i];
}
return i;
}
M union-find operation on a set of N objects
algorithm | worst-case time |
---|---|
quick find | M N |
quick union | M N |
weighted QU | N+MlogN |
QU+pathcomp | N+MlogN |
weig +pathc | N+Mlg*N |
application
percolation
create 2 virtual sites (and connections to top and bottom)
connected if virtual top site is connected to the virtual bottom site
threshold: 0.592746
analysis of algorithms
plot the running time \(T\) vs size of the input \(N\).
The slope for log-log plot is \(s\), so \(T\) is propotional to \(N^s\)
\(T=aN^b\); \(a\) and \(b\) can be estimated by changing size \(N\)
mathematical model
total running time: sum of cost times frequency for all operations
pick an operation that is most expensive or often to compute the running time instead of going over all details
tilde notation
definition: \(f(N) \sim g(N)\) means \(\lim_{N\rightarrow \infty} \frac{f(N)}{g(N)}=1\)
ignore the lower order terms
e.g.: \(1/6N^3 +20N+16 \sim 1/6N^3\)
order-of-growth classifications
order: 1, \(\log N\),\(N\), \(N \log N\), \(N^2\), \(N^3\), \(2^N\)
better to have linear and linearithemic scales
order | description | example |
---|---|---|
1 | statement | add 2 numbers |
logN | divide in half | binary search |
N | loop | find the maximum |
N log N | divide and conquer | mergesort |
\(N^2\) | double loop | check all pairs |
analysis of algorithms
estimate the upper bound \(O(N^n)\) and the lower bound \(\Omega(N^n)\) of the running time \(\Theta(N^n)\) of an algorithm
memory
memory usage by bytes
Type | bytes |
---|---|
boolean | 1 |
byte | 1 |
char | 2 |
int | 4 |
float | 4 |
long | 8 |
double | 8 |
char[] | 2N+24 |
int[] | 4N+24 |
double[] | 8N+24 |