
Princeton Algorithms P1W5 Balanced Search Trees

2017-01-28

balanced search trees

A 2-3 tree allows 1 or 2 keys per node.

  • 2-node: one key, two children
  • 3-node: two keys, three children

Perfect balance: every path from root to null link has same length.
Symmetric order: Inorder traversal yields keys in ascending order

If an insertion creates a temporary 4-node, move its middle key up into the parent.
Inserting into a 2-node converts it to a 3-node.

Both operations maintain symmetric order and perfect balance.

[Figure: 2-3 tree]

Direct implementation is complicated

red-black BSTs

Use “internal” left-leaning red links as “glue” for 3-nodes.

[Figure: red-black tree]

  • no node has two red links connected to it
  • every path from root to null link has the same number of black links
  • red links lean left

Search is the same as for an elementary BST (ignore color).
Most other operations, such as ceiling and selection, are also identical.

Encode the color of each link in the node it points to (each node stores the color of the link from its parent).

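private static final boolean RED = true;
private static final boolean BLACK = false;
private class Node
{
    Key key;
    Value val;
    Node left, right;
    boolean color; // color of the link from the parent

    // constructor used by put() below
    Node(Key key, Value val, boolean color)
    {
        this.key = key;
        this.val = val;
        this.color = color;
    }
}
private boolean isRed(Node x)
{
    if (x == null) return false; // null links are black
    return x.color == RED;
}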

rotate left: right child red, left child black

private Node rotateLeft(Node h)
{
    assert isRed(h.right);
    Node x = h.right;
    h.right = x.left;
    x.left = h;
    x.color = h.color;
    h.color = RED;
    return x;
}

rotate right: left child and left-left grandchild are both red

private Node rotateRight(Node h)
{
    assert isRed(h.left);
    Node x = h.left;
    h.left = x.right;
    x.right = h;
    x.color = h.color;
    h.color = RED;
    return x;
}

flip colors (when a node has two red child links)

private void flipColors(Node h)
{
    assert !isRed(h);
    assert isRed(h.left);
    assert isRed(h.right);
    h.color = RED;
    h.left.color = BLACK;
    h.right.color = BLACK;
}

insertion

private Node put(Node h, Key key, Value val)
{
    if (h == null) return new Node(key, val, RED);
    int cmp = key.compareTo(h.key);
    if (cmp < 0) h.left = put(h.left, key, val);
    else if (cmp > 0) h.right = put(h.right, key, val);
    else if (cmp == 0) h.val = val;
    if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
    if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
    if (isRed(h.left) && isRed(h.right)) flipColors(h);
    return h;
}

Height of tree is at most 2 lg N in the worst case.
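To see the rotations, color flips and insertion working together, here is a minimal self-contained sketch. It is not the course's code: the class name LlrbHeightDemo, the int keys (no values) and the helper heightAfterSortedInserts are assumptions made for illustration. It inserts keys in sorted order, which would degrade a plain BST into a linked list, and reports the resulting height next to 2 lg N.

```java
// Sketch only: LlrbHeightDemo and its int-keyed Node are hypothetical names for illustration.
final class LlrbHeightDemo {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private static final class Node {
        int key;
        Node left, right;
        boolean color; // color of the link from the parent
        Node(int key, boolean color) { this.key = key; this.color = color; }
    }

    private Node root;

    private static boolean isRed(Node x) { return x != null && x.color == RED; }

    private static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    private static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    private static void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    void put(int key) {
        root = put(root, key);
        root.color = BLACK; // the root has no parent link, so keep it black
    }

    private static Node put(Node h, int key) {
        if (h == null) return new Node(key, RED);
        if (key < h.key) h.left = put(h.left, key);
        else if (key > h.key) h.right = put(h.right, key);
        if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left) && isRed(h.right)) flipColors(h);
        return h;
    }

    // Height counted in links: an empty tree has height -1, a single node has height 0.
    int height() { return height(root); }

    private static int height(Node x) {
        return x == null ? -1 : 1 + Math.max(height(x.left), height(x.right));
    }

    // Builds a tree from the sorted keys 0..n-1, the worst case for a plain BST.
    static int heightAfterSortedInserts(int n) {
        LlrbHeightDemo t = new LlrbHeightDemo();
        for (int i = 0; i < n; i++) t.put(i);
        return t.height();
    }

    public static void main(String[] args) {
        int n = 1024;
        System.out.println("height = " + heightAfterSortedInserts(n)
                + ", 2 lg N = " + 2 * (Math.log(n) / Math.log(2)));
    }
}
```

Setting the root to black after every insertion is not in the put() snippet above; it is the usual wrapper step, added here so the sketch stays a valid red-black tree.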

summary for symbol tables

[Figure: summary of symbol table implementations]

B-trees

Generalize 2-3 trees by allowing up to M-1 key-link pairs per node.

Geometric applications of BSTs

extension of ordered symbol table

  • insert key-value pair
  • search for key k
  • delete key k
  • range search: find all keys between k1 and k2
  • range count: number of keys between k1 and k2

Geometric interpretation

  • keys are points on a line
  • find and count points in given 1-d interval
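As a quick illustration of 1d range search and range count, the sketch below leans on java.util.TreeSet (a red-black tree in the standard library) rather than a hand-written BST. The class and method names are made up for this example. Note that counting via subSet(...).size() takes time proportional to the number of matches; a BST that supports rank can count in logarithmic time instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch only: RangeSearch1D is a hypothetical helper, not part of the course code.
final class RangeSearch1D {
    // Returns all keys k with lo <= k <= hi, in ascending order.
    static List<Integer> rangeSearch(TreeSet<Integer> keys, int lo, int hi) {
        if (lo > hi) return new ArrayList<>(); // subSet would throw on an inverted range
        return new ArrayList<>(keys.subSet(lo, true, hi, true));
    }

    // Number of keys k with lo <= k <= hi.
    static int rangeCount(TreeSet<Integer> keys, int lo, int hi) {
        return rangeSearch(keys, lo, hi).size();
    }
}
```

For the keys 1, 3, 5, 7, 9, a range search on [3, 7] returns 3, 5 and 7.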

line segment intersection

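Sweep a vertical line from left to right across the orthogonal segments.

  • put x-coordinates on a PQ (or sort) to define the events
  • left endpoint of a horizontal segment: insert its y-coordinate into BST
  • right endpoint of a horizontal segment: delete its y-coordinate from BST
  • vertical segment: range search in BST for its interval of y-endpoints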
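The sweep can be sketched roughly as follows. This is an illustration under stated assumptions, not the course implementation: segments are given as int arrays, all horizontal segments have distinct y-coordinates (a TreeSet would otherwise merge them), and the names SegmentSweep and countIntersections are invented here. Events with the same x are processed as insert, then query, then remove, so that touching endpoints count as intersections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch only: counts intersections between horizontal and vertical segments.
final class SegmentSweep {
    // Event types; at equal x they are processed in this order.
    private static final int INSERT = 0, QUERY = 1, REMOVE = 2;

    // horizontals[i] = {x1, x2, y} with x1 <= x2; verticals[j] = {x, y1, y2} with y1 <= y2.
    // Assumes distinct y-coordinates among the horizontal segments.
    static int countIntersections(int[][] horizontals, int[][] verticals) {
        List<int[]> events = new ArrayList<>(); // each event is {x, type, a, b}
        for (int[] h : horizontals) {
            events.add(new int[] {h[0], INSERT, h[2], 0});
            events.add(new int[] {h[1], REMOVE, h[2], 0});
        }
        for (int[] v : verticals) {
            events.add(new int[] {v[0], QUERY, v[1], v[2]});
        }
        // Sort by x, breaking ties by event type.
        events.sort((a, b) -> a[0] != b[0]
                ? Integer.compare(a[0], b[0])
                : Integer.compare(a[1], b[1]));

        TreeSet<Integer> activeYs = new TreeSet<>(); // y's of horizontals crossing the sweep line
        int count = 0;
        for (int[] e : events) {
            if (e[1] == INSERT) activeYs.add(e[2]);
            else if (e[1] == REMOVE) activeYs.remove(e[2]);
            else count += activeYs.subSet(e[2], true, e[3], true).size(); // 1d range search
        }
        return count;
    }
}
```

Sorting the events up front plays the role of the priority queue; the TreeSet stands in for the BST of y-coordinates.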

Kd-tree

extension of ordered symbol-table to 2d keys

  • insert a 2d key
  • delete a 2d key
  • search for a 2d key
  • range search: find all keys that lie in a 2d range
  • range count: number of keys that lie in a 2d range

recursively partition plane into two halfplanes

[Figure: kd-tree]

works for k dimensions
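For the 2d case, the recursive partitioning might look like the sketch below: levels alternate between splitting on x and splitting on y, and range search only descends into a subtree when the query rectangle can overlap that side of the split. The class TwoDTree and its method signatures are assumptions for illustration; deletion and range count are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a minimal 2d-tree with insert and rectangular range search.
final class TwoDTree {
    private static final class Node {
        final double x, y;
        Node left, right; // left: smaller coordinate on this level's axis; right: the rest
        Node(double x, double y) { this.x = x; this.y = y; }
    }

    private Node root;

    void insert(double x, double y) { root = insert(root, x, y, true); }

    // vertical == true means this level splits on x; the next level splits on y.
    private Node insert(Node n, double x, double y, boolean vertical) {
        if (n == null) return new Node(x, y);
        if (n.x == x && n.y == y) return n; // ignore duplicate points
        boolean goLeft = vertical ? x < n.x : y < n.y;
        if (goLeft) n.left = insert(n.left, x, y, !vertical);
        else n.right = insert(n.right, x, y, !vertical);
        return n;
    }

    // All points inside the axis-aligned rectangle [xmin, xmax] x [ymin, ymax].
    List<double[]> range(double xmin, double xmax, double ymin, double ymax) {
        List<double[]> out = new ArrayList<>();
        range(root, true, xmin, xmax, ymin, ymax, out);
        return out;
    }

    private void range(Node n, boolean vertical, double xmin, double xmax,
                       double ymin, double ymax, List<double[]> out) {
        if (n == null) return;
        if (n.x >= xmin && n.x <= xmax && n.y >= ymin && n.y <= ymax) {
            out.add(new double[] {n.x, n.y});
        }
        double coord = vertical ? n.x : n.y;
        double lo = vertical ? xmin : ymin;
        double hi = vertical ? xmax : ymax;
        // Prune: the left subtree holds coordinates < coord, the right holds >= coord.
        if (lo < coord) range(n.left, !vertical, xmin, xmax, ymin, ymax, out);
        if (hi >= coord) range(n.right, !vertical, xmin, xmax, ymin, ymax, out);
    }
}
```

Generalizing to k dimensions would mean cycling through k axes instead of toggling a boolean.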

interval search trees

Data are intervals.

Find all intervals in the data structure that intersect a given interval.

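Use the left endpoint lo as the BST key.
Store the max endpoint of the subtree rooted at each node.

To search for any one interval that intersects the query interval (lo, hi):

  • if interval in node intersects query interval, return it
  • else if left subtree is null, go right
  • else if max endpoint in left subtree is less than the query's lo, go right
  • else go left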
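The search steps above can be sketched as below. Two simplifications are assumed: the tree is a plain, unbalanced BST (a red-black BST would be used to guarantee logarithmic height), and the search returns a single intersecting interval rather than all of them. The class name IntervalST and the int endpoints are illustrative choices. Intervals are treated as closed, so sharing an endpoint counts as intersecting.

```java
// Sketch only: an unbalanced interval search tree keyed on the left endpoint.
final class IntervalST {
    private static final class Node {
        final int lo, hi;
        int max; // largest right endpoint in the subtree rooted here
        Node left, right;
        Node(int lo, int hi) { this.lo = lo; this.hi = hi; this.max = hi; }
    }

    private Node root;

    void put(int lo, int hi) { root = put(root, lo, hi); }

    private Node put(Node x, int lo, int hi) {
        if (x == null) return new Node(lo, hi);
        if (lo < x.lo) x.left = put(x.left, lo, hi);
        else x.right = put(x.right, lo, hi);
        x.max = Math.max(x.max, hi); // keep the subtree maximum up to date on the way back up
        return x;
    }

    // Returns {lo, hi} of some stored interval intersecting [lo, hi], or null if none does.
    int[] anyIntersecting(int lo, int hi) {
        Node x = root;
        while (x != null) {
            if (x.lo <= hi && lo <= x.hi) return new int[] {x.lo, x.hi};
            else if (x.left == null) x = x.right;
            else if (x.left.max < lo) x = x.right; // nothing on the left reaches the query
            else x = x.left;
        }
        return null;
    }
}
```

Finding all intersecting intervals can be done by repeating the search and removing each hit, which needs a delete operation not shown here.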

rectangle intersection

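Sweep a vertical line from left to right, keeping the y-intervals of the rectangles that currently intersect the sweep line in an interval search tree.

  • left edge of a rectangle: interval search for its y-interval, then insert the y-interval
  • right edge of a rectangle: remove its y-interval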


Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2024. All rights reserved by melonskin. Powered by Jekyll.