
A well-optimized Union-Find implementation, in Java

2017-11-14

Idea

Basically, the class contains two arrays, parents and ranks.

  1. parents stores the parent ID of each ID, or -1 if that ID is a root.
  2. ranks stores the rank of each root, which is an upper bound on the height of its tree.
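To make the two arrays concrete, here is a small hand-built state for the sets {0, 1, 2} and {3, 4}. It is only an illustrative sketch: the class name ParentsDemo and the helper rootOf are hypothetical, and -1 is assumed to mark a root.

```java
class ParentsDemo {
    // Follows parent links until reaching a root (marked by -1).
    static int rootOf(int[] parents, int id) {
        while (parents[id] != -1) {
            id = parents[id];
        }
        return id;
    }

    public static void main(String[] args) {
        // 1 and 2 hang under root 0; 4 hangs under root 3.
        int[] parents = {-1, 0, 0, -1, 3};
        // Only the entries of the roots (0 and 3) are meaningful.
        int[] ranks = {1, 0, 0, 1, 0};
        for (int id = 0; id < parents.length; id++) {
            int root = rootOf(parents, id);
            System.out.println(id + " -> root " + root + " (rank " + ranks[root] + ")");
        }
    }
}
```

Two IDs belong to the same set exactly when rootOf returns the same root for both.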

It supports two operations, find and union.

  1. For find(n1), it traces back from the given ID to its parent, then the parent’s parent, until reaching an element without a parent, i.e. a root.
    • An optimization here is attaching every element on this tracing path directly to the final root, so that the height of the tree is reduced dramatically.
  2. For union(n1, n2), we first find the roots of the two elements. If the two roots are the same, do nothing (this check is also a useful way to tell whether an edge between two nodes creates a cycle in a graph). Otherwise, based on the ranks of the two roots, we attach one to the other to keep the tree as balanced as possible.
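The cycle check mentioned in the union step can be sketched as follows. This is a minimal sketch under stated assumptions, not the class from this post: the names CycleDemo and hasCycle and the edge-array representation are hypothetical, and a compact union-find is inlined so the block stands alone.

```java
import java.util.Arrays;

class CycleDemo {
    // Returns true if the undirected graph on nodes 0..n-1 with the given edges contains a cycle.
    static boolean hasCycle(int n, int[][] edges) {
        int[] parents = new int[n];
        int[] ranks = new int[n];
        Arrays.fill(parents, -1); // -1 marks a root
        for (int[] edge : edges) {
            int root1 = find(parents, edge[0]);
            int root2 = find(parents, edge[1]);
            if (root1 == root2) {
                return true; // endpoints are already connected, so this edge closes a cycle
            }
            // Union by rank: attach the lower-rank root under the higher-rank one.
            if (ranks[root1] < ranks[root2]) {
                parents[root1] = root2;
            } else {
                if (ranks[root1] == ranks[root2]) {
                    ranks[root1]++;
                }
                parents[root2] = root1;
            }
        }
        return false;
    }

    // Finds the root of id and attaches every element on the path directly to it.
    static int find(int[] parents, int id) {
        int root = id;
        while (parents[root] != -1) {
            root = parents[root];
        }
        while (id != root) {
            int next = parents[id];
            parents[id] = root;
            id = next;
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(hasCycle(4, new int[][] {{0, 1}, {1, 2}, {2, 3}})); // path: false
        System.out.println(hasCycle(4, new int[][] {{0, 1}, {1, 2}, {2, 0}})); // triangle: true
    }
}
```

Each edge costs two find calls plus a constant amount of work, so the check scales well with the number of edges.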

Analysis

Union

Apart from the two find calls it makes, the union operation takes $O(1)$ time.

Find

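If we don’t do path shortening, the height of a tree is no more than $\log_2 n$, where $n$ is the number of elements in the tree. This follows from union by rank: a root’s rank only grows when two trees of equal rank are merged, so a tree whose root has rank $r$ holds at least $2^r$ elements. The time for a find operation is therefore $O(\log n)$.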
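The logarithmic height bound can be justified by a short induction on ranks. The following is a sketch, assuming union by rank as described in the Idea section and no path shortening; the notation $\mathrm{size}(x)$, $\mathrm{rank}(x)$ and $\mathrm{height}(x)$ is introduced here only for illustration.

```latex
\begin{aligned}
&\textbf{Claim: } \mathrm{size}(x) \ge 2^{\mathrm{rank}(x)} \text{ for every root } x. \\
&\text{Base: a fresh element has } \mathrm{rank} = 0 \text{ and } \mathrm{size} = 1 = 2^{0}. \\
&\text{Equal ranks } r\text{: } \mathrm{size}(x) + \mathrm{size}(y) \ge 2^{r} + 2^{r} = 2^{r+1}, \text{ and the new root gets rank } r + 1. \\
&\text{Different ranks: the surviving root keeps its rank and its size only grows.} \\
&\text{Without path shortening, } \mathrm{height}(x) = \mathrm{rank}(x) = h, \text{ so } n \ge 2^{h}, \text{ i.e. } h \le \log_2 n.
\end{aligned}
```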

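If we do the path shortening, there is a tricky proof that the time complexity for a sequence of $m$ operations is $O(m \log^* n)$, where $\log^*$ is the iterated logarithm. There is an even trickier one that lowers this upper bound to $O(m\,\alpha(n))$, where $\alpha$ is the inverse Ackermann function. I may post the proofs later.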
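The bound usually quoted for path shortening is written with the iterated logarithm $\log^* n$, the number of times $\log_2$ has to be applied before the value drops to 1 or below. To give a feel for how slowly it grows, here is a small sketch; the class name LogStarDemo and the helpers ceilLog2 and logStar are hypothetical and written only for this illustration.

```java
class LogStarDemo {
    // ceil(log2(n)) for n >= 2, using integer arithmetic only.
    static int ceilLog2(long n) {
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    // Iterated logarithm: how many times log2 must be applied before the value is <= 1.
    // Rounding up at each step does not change the count, because the thresholds
    // 2, 4, 16, 65536, ... are integers.
    static int logStar(long n) {
        int count = 0;
        while (n > 1) {
            n = ceilLog2(n);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long[] samples = {1, 2, 4, 16, 65536, Long.MAX_VALUE};
        for (long n : samples) {
            System.out.println("log*(" + n + ") = " + logStar(n));
        }
    }
}
```

For every value that fits in a long the result is at most 5, which is why the amortized cost per operation is often described as constant in practice.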

Implementation

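package unionFind;

import java.util.Arrays;

public class UnionFind {
    // parents[i] is the parent ID of i, or -1 if i is a root.
    private int[] parents;
    // ranks[i] is an upper bound on the height of the tree rooted at i.
    // It is only meaningful while i is a root.
    private int[] ranks;

    public UnionFind(int size) {
        parents = new int[size];
        Arrays.fill(parents, -1); // every element starts as its own root
        ranks = new int[size];
    }

    // Returns the root of the set containing curId and shortens the path on the way.
    public int find(int curId) {
        // First pass: walk up to the root.
        int root = curId;
        while (parents[root] != -1) {
            root = parents[root];
        }
        // Second pass: attach every element on the path directly to the root.
        while (curId != root) {
            int next = parents[curId];
            parents[curId] = root;
            curId = next;
        }
        return root;
    }

    // Merges the sets containing n1 and n2, using union by rank.
    public void union(int n1, int n2) {
        int root1 = find(n1);
        int root2 = find(n2);
        if (root1 == root2) {
            return; // already in the same set
        }
        if (ranks[root1] < ranks[root2]) {
            parents[root1] = root2;
        } else if (ranks[root2] < ranks[root1]) {
            parents[root2] = root1;
        } else {
            // Equal ranks: pick root1 as the new root and bump its rank.
            ranks[root1]++;
            parents[root2] = root1;
        }
    }
}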

Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2019. All rights reserved by melonskin. Powered by Jekyll.