
Algorithms solving maximum bandwidth path problem in network

2018-04-17

Introduction

Path bandwidth is the smallest edge bandwidth along the path. The problem to solve is: given a weighted graph and two vertices s and t, find an s-t path with the largest bandwidth. This project solves the maximum bandwidth path problem on random undirected graphs, implemented in Java with an object-oriented design. It includes the components below.

  1. A graph generator producing random undirected graphs in two ways: by target degree or by connection probability.
  2. A heap structure used to store fringe vertices while running Dijkstra’s algorithm.
  3. An optimized union-find structure used in the maximum spanning tree algorithm.
  4. 3 routing algorithms
    • Dijkstra’s with no heap
    • Dijkstra’s with heap
    • Maximum spanning tree with Kruskal’s algorithm

This report first introduces the implementation details, then shows the performance of the different algorithms on several random graphs. Finally, a discussion analyzes the results.

The code can be found at https://github.com/melonskin/networkOptimizationAlgorithm.

Implementation

Random Graph generation

Structure of the graph

In order to represent an undirected graph, this project uses a GraphNode class for vertices and an Edge class for edges. A GraphNode contains a set of outgoing edges connecting it with its neighbors, as well as a unique ID to distinguish it from other nodes. The Edge class stores both endpoints of the edge and its weight.
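A minimal sketch of what these two classes might look like, assuming the fields described above; the names and the helper method other are illustrative, not necessarily the exact code in the repository.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the vertex and edge classes described above.
class Edge {
    final GraphNode u;    // one endpoint
    final GraphNode v;    // the other endpoint
    final int weight;     // edge bandwidth

    Edge(GraphNode u, GraphNode v, int weight) {
        this.u = u;
        this.v = v;
        this.weight = weight;
    }

    // Given one endpoint, return the other one.
    GraphNode other(GraphNode from) {
        return from == u ? v : u;
    }
}

class GraphNode {
    final int id;                               // unique ID of this vertex
    final List<Edge> edges = new ArrayList<>(); // edges to its neighbors

    GraphNode(int id) {
        this.id = id;
    }
}
```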

Graph generator

As recommended by the project instruction, a big cycle containing all nodes is generated first to ensure all vertices are connected. After that, one of two generating strategies is applied (a sketch follows the list below). The generator returns a list of graph nodes as output.

  1. by degree: pick two distinct vertices at random and add an edge with a random weight between them, repeating until the target number of edges is reached (an average degree of 8 in the experiments, i.e. 4 times the number of vertices).
  2. by probability: go over all pairs of vertices and, with 20% probability, add an edge with a random weight between the pair.
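A rough sketch of the two strategies under the assumptions above; the class name GraphGenerator, the weight range, and the helper addEdge are illustrative, and the GraphNode/Edge classes are the ones sketched earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative generator; the real project code may be organized differently.
class GraphGenerator {
    private static final Random rand = new Random();
    private static final int MAX_WEIGHT = 10000; // assumed weight range

    // Create n vertices connected in one big cycle so the graph is connected.
    static List<GraphNode> baseCycle(int n) {
        List<GraphNode> nodes = new ArrayList<>();
        for (int i = 0; i < n; i++) nodes.add(new GraphNode(i));
        for (int i = 0; i < n; i++) addEdge(nodes.get(i), nodes.get((i + 1) % n));
        return nodes;
    }

    // Strategy 1: add random edges until the target average degree is reached.
    static List<GraphNode> byDegree(int n, int avgDegree) {
        List<GraphNode> nodes = baseCycle(n);
        long targetEdges = (long) n * avgDegree / 2;
        long edges = n; // the base cycle already contributes n edges
        while (edges < targetEdges) {
            GraphNode a = nodes.get(rand.nextInt(n));
            GraphNode b = nodes.get(rand.nextInt(n));
            if (a != b) {
                addEdge(a, b);
                edges++;
            }
        }
        return nodes;
    }

    // Strategy 2: for every pair of vertices, add an edge with probability p.
    static List<GraphNode> byProbability(int n, double p) {
        List<GraphNode> nodes = baseCycle(n);
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (rand.nextDouble() < p) addEdge(nodes.get(i), nodes.get(j));
        return nodes;
    }

    private static void addEdge(GraphNode a, GraphNode b) {
        Edge e = new Edge(a, b, 1 + rand.nextInt(MAX_WEIGHT));
        a.edges.add(e);
        b.edges.add(e);
    }
}
```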

Heap

This heap structure is used to store fringe vertices in Dijkstra’s algorithm. It differs from a typical heap in that elements inside the heap may need to be deleted, so a custom heap is implemented instead of using PriorityQueue. The idea is to assign every element an ID. The heap class consists of a position array and a data array; the position array records where a given ID currently sits in the data array.

I initially tried to implement the heap directly with a generic element type, but could not obtain the ID of a generic-type element. In the end, a HeapNode class is used as the element type in the heap, with wrappers for the different concrete data types.
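The sketch below illustrates the position-array idea with a max-heap keyed by an integer value and indexed by element ID; for brevity it uses primitive arrays instead of the HeapNode wrapper described above, so the names and layout are assumptions.

```java
import java.util.Arrays;

// Illustrative max-heap supporting deletion by element ID.
// pos[id] gives the index of that ID inside the heap array, or -1 if absent.
class IndexedMaxHeap {
    private final int[] heapIds; // heap array holding element IDs
    private final int[] pos;     // pos[id] = index of id in heapIds, -1 if absent
    private final int[] value;   // value[id] = key used for ordering
    private int size = 0;

    IndexedMaxHeap(int capacity) {
        heapIds = new int[capacity];
        pos = new int[capacity];
        value = new int[capacity];
        Arrays.fill(pos, -1);
    }

    void insert(int id, int val) {
        value[id] = val;
        heapIds[size] = id;
        pos[id] = size;
        siftUp(size);
        size++;
    }

    int peekMaxId() { return heapIds[0]; }

    // Delete an arbitrary element by its ID: swap it into the last slot,
    // shrink the heap, then restore heap order at the vacated position.
    void delete(int id) {
        int i = pos[id];
        size--;
        swap(i, size);
        pos[id] = -1;
        if (i < size) {
            siftUp(i);
            siftDown(i);
        }
    }

    private void siftUp(int i) {
        while (i > 0 && value[heapIds[i]] > value[heapIds[(i - 1) / 2]]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int largest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < size && value[heapIds[l]] > value[heapIds[largest]]) largest = l;
            if (r < size && value[heapIds[r]] > value[heapIds[largest]]) largest = r;
            if (largest == i) return;
            swap(i, largest);
            i = largest;
        }
    }

    private void swap(int i, int j) {
        int tmp = heapIds[i]; heapIds[i] = heapIds[j]; heapIds[j] = tmp;
        pos[heapIds[i]] = i;
        pos[heapIds[j]] = j;
    }
}
```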

Union Find

A well-optimized union-find structure is implemented for building the maximum spanning tree. Basically, the class contains two arrays, parents and ranks.

  1. parents stores the parent ID of a given ID.
  2. ranks stores the rank of roots.

It supports two operations, find and union, sketched after the list below.

  1. find(n1) traces back from the given ID to its parent, then the parent’s parent, until it reaches an element without a parent, i.e. a root.
    • As an optimization, every element on this tracing path is attached directly to the final root (path compression), so the height of the tree is reduced dramatically.
  2. union(n1, n2) first finds the roots of the two elements. If the roots are the same, it does nothing (which also makes it a useful tool for checking whether an edge between two nodes would create a cycle in a graph). Otherwise, based on the ranks of the two roots, one root is attached to the other so that the tree stays as balanced as possible (union by rank).
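A minimal sketch of such a union-find with path compression and union by rank; the method signatures are assumptions.

```java
// Illustrative union-find with path compression and union by rank.
class UnionFind {
    private final int[] parents; // parents[i] = parent ID of i, or i itself for a root
    private final int[] ranks;   // ranks[i] = rank of i when it is a root

    UnionFind(int n) {
        parents = new int[n];
        ranks = new int[n];
        for (int i = 0; i < n; i++) parents[i] = i;
    }

    // Follow parent links to the root, attaching every visited element
    // directly to the root on the way (path compression).
    int find(int n1) {
        int root = n1;
        while (parents[root] != root) root = parents[root];
        while (parents[n1] != root) {
            int next = parents[n1];
            parents[n1] = root;
            n1 = next;
        }
        return root;
    }

    // Attach the root of lower rank under the root of higher rank.
    // Returns false if n1 and n2 are already in the same set,
    // which doubles as the cycle check used in Kruskal's algorithm.
    boolean union(int n1, int n2) {
        int r1 = find(n1), r2 = find(n2);
        if (r1 == r2) return false;
        if (ranks[r1] < ranks[r2]) {
            parents[r1] = r2;
        } else if (ranks[r1] > ranks[r2]) {
            parents[r2] = r1;
        } else {
            parents[r2] = r1;
            ranks[r1]++;
        }
        return true;
    }
}
```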

Dijkstra

The idea is to grow the set of finished vertices greedily, always expanding from the fringe vertex whose bandwidth from the source is currently the largest; with all-positive edge weights this greedy choice is guaranteed to find the optimal path. This project implements two variants of Dijkstra’s algorithm, one without a heap and one with a heap. Since the main algorithm is the same in both cases, a base Dijkstra class containing the core logic is created first; two concrete classes then extend it with different methods for adding to and retrieving from the fringe.
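A minimal sketch of the max-bandwidth variant with a heap is shown below. For brevity it uses java.util.PriorityQueue with stale-entry skipping instead of the project’s own heap with in-place deletion, and it reuses the illustrative GraphNode/Edge classes sketched earlier.

```java
import java.util.*;

// Illustrative max-bandwidth Dijkstra; not the exact project code.
class MaxBandwidthDijkstra {

    // A fringe entry frozen at the bandwidth it had when enqueued.
    private static class Entry {
        final GraphNode node;
        final int bw;
        Entry(GraphNode node, int bw) { this.node = node; this.bw = bw; }
    }

    // Returns an s-t path of maximum bottleneck bandwidth, or null if t is unreachable.
    static List<GraphNode> maxBandwidthPath(GraphNode s, GraphNode t) {
        Map<GraphNode, Integer> bandwidth = new HashMap<>();  // best bottleneck so far
        Map<GraphNode, GraphNode> parent = new HashMap<>();
        Set<GraphNode> done = new HashSet<>();

        // Max-heap ordered by the bandwidth recorded in each entry.
        PriorityQueue<Entry> fringe =
                new PriorityQueue<>((a, b) -> Integer.compare(b.bw, a.bw));

        bandwidth.put(s, Integer.MAX_VALUE);
        fringe.add(new Entry(s, Integer.MAX_VALUE));

        while (!fringe.isEmpty()) {
            GraphNode u = fringe.poll().node;
            if (!done.add(u)) continue;   // stale entry, vertex already finalized
            if (u == t) break;            // greedy choice is optimal, stop early
            for (Edge e : u.edges) {
                GraphNode v = e.other(u);
                if (done.contains(v)) continue;
                // Bandwidth through u is limited by the smaller of the two values.
                int bw = Math.min(bandwidth.get(u), e.weight);
                if (bw > bandwidth.getOrDefault(v, 0)) {
                    bandwidth.put(v, bw);
                    parent.put(v, u);
                    fringe.add(new Entry(v, bw));
                }
            }
        }

        if (!done.contains(t)) return null;
        LinkedList<GraphNode> path = new LinkedList<>();
        for (GraphNode cur = t; cur != null; cur = parent.get(cur)) path.addFirst(cur);
        return path;
    }
}
```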

Maximum spanning tree

A maximum spanning tree is a spanning tree whose total edge weight is maximum. It also solves the maximum bandwidth problem, because the path between any two vertices inside the maximum spanning tree is a maximum bandwidth path. Its advantage is that once the MST is constructed, the path between any two endpoints can be found in linear time. This project implements this algorithm in 3 steps, sketched after the list below.

  1. Offer all unique edges into a max heap
  2. Run Kruskal’s algorithm with the edges stored in the heap to construct the MST
  3. For two specific endpoints, do DFS in the MST to find the path
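The three steps might be sketched as follows, reusing the illustrative GraphNode/Edge and UnionFind classes; a java.util.PriorityQueue stands in for the project’s max heap, and vertex IDs are assumed to run from 0 to n-1.

```java
import java.util.*;

// Illustrative Kruskal-based max spanning tree with DFS path lookup.
class MaxSpanningTreePath {
    private final Map<GraphNode, List<Edge>> tree = new HashMap<>(); // MST adjacency

    // Steps 1 and 2: offer all unique edges into a max heap, then run Kruskal's.
    MaxSpanningTreePath(List<GraphNode> nodes) {
        PriorityQueue<Edge> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(b.weight, a.weight));
        for (GraphNode n : nodes)
            for (Edge e : n.edges)
                if (e.u == n) heap.add(e);  // each undirected edge is added only once
        UnionFind uf = new UnionFind(nodes.size());
        for (GraphNode n : nodes) tree.put(n, new ArrayList<>());
        while (!heap.isEmpty()) {
            Edge e = heap.poll();
            if (uf.union(e.u.id, e.v.id)) { // skip edges that would form a cycle
                tree.get(e.u).add(e);
                tree.get(e.v).add(e);
            }
        }
    }

    // Step 3: iterative DFS inside the MST to recover the s-t path.
    List<GraphNode> findPath(GraphNode s, GraphNode t) {
        Map<GraphNode, GraphNode> parent = new HashMap<>();
        Set<GraphNode> visited = new HashSet<>();
        Deque<GraphNode> stack = new ArrayDeque<>();
        stack.push(s);
        visited.add(s);
        while (!stack.isEmpty()) {
            GraphNode u = stack.pop();
            if (u == t) break;
            for (Edge e : tree.get(u)) {
                GraphNode v = e.other(u);
                if (visited.add(v)) {
                    parent.put(v, u);
                    stack.push(v);
                }
            }
        }
        if (!visited.contains(t)) return null;
        LinkedList<GraphNode> path = new LinkedList<>();
        for (GraphNode cur = t; cur != null; cur = parent.get(cur)) path.addFirst(cur);
        return path;
    }
}
```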

Performance

Sample output

This section shows sample output of the 3 path-finding algorithms. The running time, the max bandwidth and the path found are displayed. We can see that the max bandwidth found by all 3 algorithms is the same.

Dijkstra without heap

The duration of Dijkstra without heap is: 53 ms.
Max bandwidth found by Dijkstra without heap is 7656
2179->2180->2085->1640->1805->326->47->48->120->119->1307->75->977->976->838->2441->
1214->1846->337->702->800->258->489->2177->708->1700->595->594->1348->2365->2192->
2193->592->1716->955->1713->368->3959->3960->1618->2035

Dijkstra with heap

The duration of Dijkstra with heap is: 3 ms.
Max bandwidth found by Dijkstra with heap is 7656
2179->2180->2085->1884->1123->1124->2058->2057->4505->2648->3518->4875->4529->873->
4594->3356->3128->3129->3130->2904->3162->1151->3600->3514->4481->3282->1308->999->
1077->4414->4413->1191->4035->1645->2481->2480->2479->1303->1302->1159->3524->4695->
431->4397->705->3636->1895->4223->4089->2491->498->4257->2035

MST

The duration for find path with MST is 3 ms.
Max bandwidth found by MST is 7656
2179->2180->2085->1640->1805->1750->4808->1210->2175->3119->4219->3059->3534->1824->
1823->1822->3373->4265->2529->2530->4098->2591->4302->1076->1075->396->2510->1045->
1505->1504->2935->425->147->4962->4961->3054->2396->2776->4003->490->701->2266->563->
261->4856->970->549->548->547->2181->3426->1748->4377->2912->1739->1738->1765->1544->
3191->4535->3598->3977->976->838->2441->1214->1846->1847->4967->4817->531->2965->
2260->4674->3945->3944->4113->4016->1827->122->123->1306->955->1713->368->3959->
3960->1618->2035

Graph with degree of 8

The running times for the 8-degree case are shown below, for 5 different random graphs, each with 5 pairs of randomly chosen endpoints.

Dijkstra without heap

Duration, ms: Pair1 Pair2 Pair3 Pair4 Pair5
Graph1 66 12 59 84 48
Graph2 57 89 74 107 69
Graph3 35 4 102 43 92
Graph4 39 41 9 95 62
Graph5 4 29 48 54 48

Dijkstra with heap

Duration, ms: Pair1 Pair2 Pair3 Pair4 Pair5
Graph1 19 6 12 10 1
Graph2 2 4 5 8 5
Graph3 0 6 11 0 6
Graph4 1 4 1 5 3
Graph5 5 4 4 3 3

MST

MST build time
Graph Graph1 Graph2 Graph3 Graph4 Graph5
Duration, ms: 95 77 86 39 32
MST find time
Duration, ms: Pair1 Pair2 Pair3 Pair4 Pair5
Graph1 9 6 5 6 6
Graph2 3 3 3 4 6
Graph3 2 4 4 4 3
Graph4 4 2 2 3 3
Graph5 3 3 4 3 3

Graph with 20% connection probability

The running times for the 20-percent-probability case are shown below, for 5 different random graphs, each with 5 pairs of randomly chosen endpoints.

Dijkstra without heap

Duration, ms: Pair1 Pair2 Pair3 Pair4 Pair5
Graph1 313 162 526 484 114
Graph2 754 157 467 83 365
Graph3 500 629 289 556 529
Graph4 324 579 403 367 445
Graph5 311 120 265 274 141

Dijkstra with heap

Duration, ms: Pair1 Pair2 Pair3 Pair4 Pair5
Graph1 211 71 379 39 190
Graph2 543 61 356 93 131
Graph3 360 457 220 443 346
Graph4 132 493 302 140 384
Graph5 258 61 409 341 6

MST

MST build time
Graph Graph1 Graph2 Graph3 Graph4 Graph5
Duration, ms: 23233 22521 22546 22787 21769
MST find time
Duration, ms: Pair1 Pair2 Pair3 Pair4 Pair5
Graph1 7 4 2 2 2
Graph2 2 2 3 2 2
Graph3 2 1 2 2 2
Graph4 2 2 2 3 2
Graph5 2 2 3 3 2

Discussion

Time complexity

The time complexities of the different algorithms used in this project are listed below, where n is the number of vertices and m is the number of edges.

Algorithm Time complexity
Dijkstra without heap O(n^2)
Dijkstra with heap O((n + m) log n)
MST build (Kruskal with heap) O(m log n)
MST find (DFS in the tree) O(n)

Case of degree of 8

In the case of degree of 8, the number of edges is proportional to the number of vertices: with an average degree of 8, m = 4n.

Dijkstra’s algorithm with and without heap

Based on the theoretical time complexities of the two variants, Dijkstra with a heap is much faster than Dijkstra without a heap, and this is confirmed by the performance tables. Insertion and deletion in the heap take O(log n) time, whereas without a heap, finding the fringe vertex with maximum bandwidth among all fringe vertices takes O(n) time. Therefore, the heap improves the performance dramatically.
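For a rough sense of scale (assuming around n = 5000 vertices, consistent with the node IDs in the sample output, and thus m = 4n = 20000 edges): the no-heap version performs on the order of n^2 = 25,000,000 fringe scans in the worst case, while the heap version performs on the order of (n + m) log n ≈ 25,000 × 13 ≈ 325,000 heap operations, which is consistent with the large speedup seen in the tables.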

MST

The time to build the MST is O(m log n) = O(n log n) in this case, the same order as Dijkstra with a heap. The performance tables show that building the MST is relatively cheap here compared with the dense-graph case later, but still slower than Dijkstra with a heap: building the MST has a larger constant factor, since it needs more operations to insert edges into and extract them from the heap. Finding a path in the MST is linear in time, but the path it returns can be much longer than the one Dijkstra’s finds (as observed during the experiments). And if we are lucky, Dijkstra’s with a heap can find the path very quickly, since it may only need to examine a few vertices and edges.

Case with 20% probability

In this case, the number of edges is m ≈ 0.2 · n(n−1)/2 = O(n^2), much more than in the first case. We can observe that both Dijkstra variants take longer here, because they may need to explore more vertices and edges. So although the worst-case time complexity of Dijkstra’s without heap, O(n^2), does not depend on m, it still takes longer on these denser random graphs.

Dijkstra’s algorithm with and without heap

We can observe that the performance of the two Dijkstra variants is similar here; the one with a heap might be slightly better. The reason is that with m = O(n^2), the heap version’s O((n + m) log n) = O(n^2 log n) is no longer necessarily smaller than the no-heap version’s O(n^2). Notice also that the performance of Dijkstra’s in this case is worse than in the 8-degree case, because there are many more edges to examine.

MST

Building the MST takes much longer than Dijkstra with a heap, and much longer than in the 8-degree MST case, because there are far more edges and all of them must be inserted into and polled from the heap, whereas Dijkstra may find the path very quickly after examining only a few vertices. The path-finding part, however, takes about the same time as in the 8-degree case: the tree contains exactly the same number of vertices and edges, and the DFS runs in linear time.

Conclusion and future work

The performance tables show that although the time complexities of Dijkstra’s algorithm and the MST approach are similar, the practical running times can differ widely, because the theoretical bounds describe the worst case while real random graphs behave quite differently. If we only need the max bandwidth path between one specific pair of vertices once, Dijkstra’s is the better choice, especially on a dense graph. But if we want to find paths for many different pairs of vertices, the MST approach is much better, since once the MST is constructed, each path can be found in linear time.

For further improvement, it seems unnecessary to keep duplicate edges in Kruskal’s algorithm: they require a duplicate check and double the size of the heap, even though half of the array inside the heap goes unused. The edge IDs could be organized so that only the first half of the edges, which contain no duplicates, needs to be considered. Another possible improvement is to store the MST in an adjacency list; currently this project uses a hash map.

