Read "Streaming Systems" 1&2, Streaming 101 Read "F1, a distributed SQL database that scales" Read "Zanzibar, Google’s Consistent, Global Authorization System" Read "Spanner, Google's Globally-Distributed Database" Read "Designing Data-intensive applications" 12, The Future of Data Systems IOS development with Swift Read "Designing Data-intensive applications" 10&11, Batch and Stream Processing Read "Designing Data-intensive applications" 9, Consistency and Consensus Read "Designing Data-intensive applications" 8, Distributed System Troubles Read "Designing Data-intensive applications" 7, Transactions Read "Designing Data-intensive applications" 6, Partitioning Read "Designing Data-intensive applications" 5, Replication Read "Designing Data-intensive applications" 3&4, Storage, Retrieval, Encoding Read "Designing Data-intensive applications" 1&2, Foundation of Data Systems Three cases of binary search TAMU Operating System 2 Memory Management TAMU Operating System 1 Introduction Overview in cloud computing 2 TAMU Operating System 7 Virtualization TAMU Operating System 6 File System TAMU Operating System 5 I/O and Disk Management TAMU Operating System 4 Synchronization TAMU Operating System 3 Concurrency and Threading TAMU Computer Networks 5 Data Link Layer TAMU Computer Networks 4 Network Layer TAMU Computer Networks 3 Transport Layer TAMU Computer Networks 2 Application Layer TAMU Computer Networks 1 Introduction Overview in distributed systems and cloud computing 1 A well-optimized Union-Find implementation, in Java A heap implementation supporting deletion TAMU Advanced Algorithms 3, Maximum Bandwidth Path (Dijkstra, MST, Linear) TAMU Advanced Algorithms 2, B+ tree and Segment Intersection TAMU Advanced Algorithms 1, BST, 2-3 Tree and Heap TAMU AI, Searching problems Factorization Machine and Field-aware Factorization Machine for CTR prediction TAMU Neural Network 10 Information-Theoretic Models TAMU Neural Network 9 Principal Component Analysis TAMU Neural Network 8 Neurodynamics TAMU Neural Network 7 Self-Organizing Maps TAMU Neural Network 6 Deep Learning Overview TAMU Neural Network 5 Radial-Basis Function Networks TAMU Neural Network 4 Multi-Layer Perceptrons TAMU Neural Network 3 Single-Layer Perceptrons Princeton Algorithms P1W6 Hash Tables & Symbol Table Applications Stanford ML 11 Application Example Photo OCR Stanford ML 10 Large Scale Machine Learning Stanford ML 9 Anomaly Detection and Recommender Systems Stanford ML 8 Clustering & Principal Component Analysis Princeton Algorithms P1W5 Balanced Search Trees TAMU Neural Network 2 Learning Processes TAMU Neural Network 1 Introduction Stanford ML 7 Support Vector Machine Stanford ML 6 Evaluate Algorithms Princeton Algorithms P1W4 Priority Queues and Symbol Tables Stanford ML 5 Neural Networks Learning Princeton Algorithms P1W3 Mergesort and Quicksort Stanford ML 4 Neural Networks Basics Princeton Algorithms P1W2 Stack and Queue, Basic Sorts Stanford ML 3 Classification Problems Stanford ML 2 Multivariate Regression and Normal Equation Princeton Algorithms P1W1 Union and Find Stanford ML 1 Introduction and Parameter Learning

Princeton Algorithms P1W6 Hash Tables & Symbol Table Applications

2017-02-11

Hash tables

Save items in a key-indexed table (index is a function of the key).

Hash function: Method for computing array index from key.

All Java classes inherit a method hashcode(), which returns a 32-bit integer.
If x.equals(y), then x.hashCode() == y.hashCode() and vice versa.

Hash code: An int between and
Hash function: An int between 0 and M-1 (for user as array index)

private in hash(Key key)
{ return (key.hashCode() & 0x7fffffff) % M;} 
// get the absolute value first then remainder

Separate Chaining

Separate chaining ST using linked list

For every hash, make a linked list to store the key and value
Make constant time ops.

Linear Probing

Insert: Put at table index i if free; if not try i+1, i+2, etc.

Search: Search table index i; if occupied but no match, try i+1, i+2, etc.

Array size M must be greater than number of key-value pairs N.
Works well when size of the array is significantly bigger than the number of keys.

Hash Table Context

STimplement

  • Simple to code
  • Faster for simple keys
  • Better system support in Java for strings

java.util.HashMap
java.util.IdentityHashMap

Balanced search trees

  • Stronger performance guarantee
  • Support for ordered ST operations
  • Easier to implement compareTo() correctly than equals() and hashCode()

java.util.TreeMap
java.util.TreeSet

Applications

Mathematical Sets

Dictionary Clients

Indexing Clients

public class Concordance
{
   public static void main(String[] args)
   {
      In in = new In(args[0]);
      String[] words = in.readAllStrings();
      ST<String, SET<Integer>> st = new ST<String, SET<Integer>>();
      for (int i = 0; i < words.length; i++)
      {
         String s = words[i];
         if (!st.contains(s))
            st.put(s, new SET<Integer>());
         SET<Integer> set = st.get(s);
         set.add(i);
}
      while (!StdIn.isEmpty())
      {
         String query = StdIn.readString();
         SET<Integer> set = st.get(query);
         for (int k : set)
      }
  } 
}

Sparse Vectors

public class SparseVector
{
     private HashST<Integer, Double> v;
     public SparseVector()
    {  v = new HashST<Integer, Double>();  }
     public void put(int i, double x)
    {  v.put(i, x);  }
    public double get(int i)
    { if (!v.contains(i)) return 0.0;
      else return v.get(i);
    }
    public Iterable<Integer> indices()
    {  return v.keys();  }
    public double dot(double[] that)
    {
        double sum = 0.0;
        for (int i : indices())
            sum += that[i]*this.get(i);
        return sum;
    }
}

Creative Commons License
Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2019. All rights reserved by melonskin. Powered by Jekyll.