Public speaking course notes Read "Dynamo, Amazon’s Highly Available Key-value Store" Read "Bigtable, A Distributed Storage System for Structured Data" Read "Streaming Systems" 3, Watermarks Read "Streaming Systems" 1&2, Streaming 101 Read "F1, a distributed SQL database that scales" Read "Zanzibar, Google’s Consistent, Global Authorization System" Read "Spanner, Google's Globally-Distributed Database" Read "Designing Data-intensive applications" 12, The Future of Data Systems IOS development with Swift Read "Designing Data-intensive applications" 10&11, Batch and Stream Processing Read "Designing Data-intensive applications" 9, Consistency and Consensus Read "Designing Data-intensive applications" 8, Distributed System Troubles Read "Designing Data-intensive applications" 7, Transactions Read "Designing Data-intensive applications" 6, Partitioning Read "Designing Data-intensive applications" 5, Replication Read "Designing Data-intensive applications" 3&4, Storage, Retrieval, Encoding Read "Designing Data-intensive applications" 1&2, Foundation of Data Systems Three cases of binary search TAMU Operating System 2 Memory Management TAMU Operating System 1 Introduction Overview in cloud computing 2 TAMU Operating System 7 Virtualization TAMU Operating System 6 File System TAMU Operating System 5 I/O and Disk Management TAMU Operating System 4 Synchronization TAMU Operating System 3 Concurrency and Threading TAMU Computer Networks 5 Data Link Layer TAMU Computer Networks 4 Network Layer TAMU Computer Networks 3 Transport Layer TAMU Computer Networks 2 Application Layer TAMU Computer Networks 1 Introduction Overview in distributed systems and cloud computing 1 A well-optimized Union-Find implementation, in Java A heap implementation supporting deletion TAMU Advanced Algorithms 3, Maximum Bandwidth Path (Dijkstra, MST, Linear) TAMU Advanced Algorithms 2, B+ tree and Segment Intersection TAMU Advanced Algorithms 1, BST, 2-3 Tree and Heap TAMU AI, Searching problems Factorization Machine and Field-aware Factorization Machine for CTR prediction TAMU Neural Network 10 Information-Theoretic Models TAMU Neural Network 9 Principal Component Analysis TAMU Neural Network 8 Neurodynamics TAMU Neural Network 7 Self-Organizing Maps TAMU Neural Network 6 Deep Learning Overview TAMU Neural Network 5 Radial-Basis Function Networks TAMU Neural Network 4 Multi-Layer Perceptrons TAMU Neural Network 3 Single-Layer Perceptrons Princeton Algorithms P1W6 Hash Tables & Symbol Table Applications Stanford ML 11 Application Example Photo OCR Stanford ML 10 Large Scale Machine Learning Stanford ML 9 Anomaly Detection and Recommender Systems Stanford ML 8 Clustering & Principal Component Analysis Princeton Algorithms P1W5 Balanced Search Trees TAMU Neural Network 2 Learning Processes TAMU Neural Network 1 Introduction Stanford ML 7 Support Vector Machine Stanford ML 6 Evaluate Algorithms Princeton Algorithms P1W4 Priority Queues and Symbol Tables Stanford ML 5 Neural Networks Learning Princeton Algorithms P1W3 Mergesort and Quicksort Stanford ML 4 Neural Networks Basics Princeton Algorithms P1W2 Stack and Queue, Basic Sorts Stanford ML 3 Classification Problems Stanford ML 2 Multivariate Regression and Normal Equation Princeton Algorithms P1W1 Union and Find Stanford ML 1 Introduction and Parameter Learning

TAMU Advanced Algorithms 3, Maximum Bandwidth Path (Dijkstra, MST, Linear)

2017-10-16

Maximum bandwidth path 0928

Given a weighted graph and 2 vertices s and t, find an s-t path of the largest bandwidth.

Def: Path bandwidth is the smallest edge bandwidth along the path.

Dijkstra

Idea is to expand, pick shortest (max bandwidth) path. It’s a greedy method and guaranteed to find optimal with all positive edges.

Dijkstra Max Bandwidth

status: in-tree, fringe, unseen

Algorithm Dijkstra(G, s, t)

  1. FOR EACH v DO status[v] = unseen
  2. status[s] = in-tree
  3. FOR EACH [s, v] DO

    status[v] = fringe
    dad[v] = s
    bw[v] = weight[s, v]; (INSERT(F,v) if using heap)

  4. WHILE status[t] in-tree DO

    4.1. pick a fringe v with the largest bw[v] (v = POP(F), TOP then DELETE)
    4.2. status[v] = in-tree
    FOR EACH edge [v, w] DO

    4.3. IF status[w] = unseen THEN

    status[w] = fringe; dad[w] = v; bw[w] = MIN(bw[v], weight[v, w]) (INSERT(F, w))

    4.4. ELSE IF status[w] = fringe and bw[w] < MIN(bw[v], weight[v, w]) THEN

    dad[w] = v; bw[w] = MIN(bw[v], weight[v, w]) (DELETE old then INSERT new)

  5. output dad

Correctness

Claim: This algorithm is correct.

Assume a better path exists. Any edges on this path is better than the smallest edge of the found path. When we try to pick this smallest edge from the fringe, we actually cannot get it. Instead, we will pick the edge on the better path. Therefore, we cannot find the found path. The assumption is invalid. This algorithm is correct.

Analysis

The time complexity of this algorithm is . is number of nodes. is number of edges. It can actually reduced to . Because for all connected graph, cannot be smaller than . is for initialization. For partially connected graph, we won’t touch isolated nodes. So it still holds.

  1. Step 1 takes
  2. Step 2 takes
  3. Step 3:
  4. Step 4 WHILE loop will be executed at most times. Total time is .
    • 4.1 takes .
    • 4.2 is
  5. The FOR loop in step 4 will be executed times if directed; if undirected. Total time is .
    • 4.3 takes
    • 4.4 is

Heap

In order to use heap as the data structure, it must support DELETE action. We need an extra array to record the positions of the heap elements. Delete that element, then move the last element into that position and update position to keep the heap ordered. So it can be done in .

Graph representation

  1. adjacent matrix: If nodes, the size is .
    • very sparse matrix
    • for each edge
  2. adjacent list: A length list. Every element v stores a linked list [w1, w2,…] with weights, which are actually edges (v, w1), (v, w2),…
    • In the worst case, still for each edge.

Max spanning tree 1004

is a graph, is a spanning tree for . If meets the following requirements:

  1. is a subgraph of .
  2. is a tree.
  3. contains all vertices of

Max spanning tree is a spanning tree whose sum of edge’s weight is the maximum.

Kruskal Algorithm

Algorithm Kruskal(G)

Sort the edges of in non-increasing order: (if “no-decreasing” than for min spanning tree)
FOR = 1 to

IF does not make a cycle (, if do not belong to the same group of ) THEN

DO

RETURN

Analysis

  1. Sorting the edges takes time of .
  2. The for loop takes time where runs times and each takes to check if the endpoints of the new edge belong to the same group of . We can use 2-3 tree to do it. trace from two endpoints to their roots; then check if the same root.

Since , . Therefore, the overall time is .

Correctness

It is a tree

Three elements for a tree: graph, all connected, no cycle. It is apparently a graph. In terms of the Kruskal algorithm, it cannot include cycles. How to prove it is connected?

Proof by contradiction:

Assume pieces and are not connected, but there must be edge connecting and , and must be considered in this algorithm. Since it will not lead to a cycle, it will be added. So, and must be connected.

It is a max spanning tree

Claim: At any point, is a part of a max .

Proof by mathematical induction:

  1. By induction on this number times of for loop in Kruskal algorithm.
  2. The claim holds for .
  3. For , assume the claim holds for , The max ST is .
  4. For the case . Assume after executions of for loop,
    • If the edge makes a cycle, we don’t add it, the claim holds.
    • If we then add an edge, which is part of . The claim holds.
    • Else, we add an edge which is not a part of . So, at a time point after , a edge which belongs to the will not be selected to avoid to form a cycle. Since this edge is after the in the , . So, what we get finally is a spanning tree with the sum of weights not smaller than the . According to the definition of max , the spanning tree is still a max spanning tree. Therefore, the claim holds at .

Maximum bandwidth path with MST 1010

Given a weighted graph and vertices and , find on path of the max bandwidth. if the graph is undirected, how to find it?

Idea is:

  1. construct a max spanning tree for
  2. find the path on
  3. return this path

Correctness

Theorem 1: If is the the max contains the path, is a max bandwidth path in the graph .

Proof

  1. Assume a path is the max bandwidth path
  2. Pick any edge in that is not in .
  3. and some of form a cycle . (Since a max contains all vertices, so there must be a cycle)
  4. Claim the bandwidth of is not larger than any other edges in the cycle. Otherwise, will be a part of the max spanning tree (thinner edge will be removed). will not be a max .
  5. With this claim, we replace the edge by the edges in that forms a cycle with . The result is still a max-bandwidth path from that has fewer edges not in (Compared to ). Repeat this process, you will get max-bandwidth path that is entirely in .

Analysis

Compared with Dijkstra algorithm

  1. It takes time only , with MST constructed.
    • Do DFS with the adjacent list
  2. If including MST construction, MST is generally 4 times faster than Dijkstra
  3. Dijkstra is single-source. MST can trace from any source.

Max bandwidth path divide and conquer 1012

Algorithm

max-bandwidth path

Remove the smaller half of edges
IF and are in the same piece
THEN recursively work on
ELSE

Shrink each pieces into a single vertex in the original graph . Connect two pieces using the possible largest edge in the smaller half(How?) crossing them. Get the new graph be .
Recursively work on

smaller half finding

To find the smaller half, there is a algorithm.

k-th smallest finding

Do it directly when ; RETURN

  1. partition into groups of 5 (a group with 5 elements)
  2. FOR EACH group, find the group medians
  3. let the set of group medians be , It contains
  4. Recursive call on to find the median of
  5. split by into and
  6. IF
    THEN recursively process and
    ELSE recursively process and
Analysis

Assume The algorithm takes .

  1. Step 1 takes
  2. Step 2 takes . For each group, 9 comparisons at most.
  3. Step 3 is trivial
  4. Step 4 takes
  5. Step 5 is
  6. Step 6 is
    • and satisfy

Therefore, we have:

Guess

The guess is valid. So

Reference

This is my class note of CSCE 629 in Texas A&M taught by Dr. Jianer Chen. I also get help from Yun He’s note on MST part. Credit to them.


Creative Commons License
Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2024. All rights reserved by melonskin. Powered by Jekyll.