
Read "Streaming Systems" 3, Watermarks




  • Assumption: Any event in the streaming data has an associated logical event timestamp.
    • This can simply be the time of the original event’s occurrence.
  • Definition: The watermark is a monotonically increasing timestamp of the oldest work not yet completed.
    • Completeness: The watermark lets us know when it is correct to close a window.
      • If the watermark has passed a time T, its monotonicity guarantees that no more processing will occur for events with event times at or before T.
    • Visibility
      • The watermark cannot advance if an event is stuck in the pipeline.
      • When this happens, we can trace the stuck event to find the source of the problem.
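The definition above can be sketched in code. This is a minimal illustration, not any real system's API: the tracker holds the event timestamps of in-flight work, reports the oldest one as the watermark, enforces monotonicity with `max`, and uses the watermark to decide when a window may safely close.

```python
import heapq

class WatermarkTracker:
    """Illustrative sketch: the watermark is the timestamp of the oldest
    work not yet completed, and it only ever moves forward."""

    def __init__(self):
        self._pending = []               # min-heap of in-flight event timestamps
        self._watermark = float("-inf")

    def on_arrival(self, event_ts):
        heapq.heappush(self._pending, event_ts)
        self._advance()

    def on_completion(self, event_ts):
        self._pending.remove(event_ts)   # O(n); fine for a sketch
        heapq.heapify(self._pending)
        self._advance()

    def _advance(self):
        # oldest uncompleted work; +inf means everything has been processed
        oldest = self._pending[0] if self._pending else float("inf")
        # monotonic policy: the watermark never moves backwards
        self._watermark = max(self._watermark, oldest)

    def watermark(self):
        return self._watermark

    def can_close_window(self, window_end_ts):
        # completeness: safe to close once the watermark has passed the window's end
        return self._watermark > window_end_ts
```

Note that if an event with a timestamp older than the current watermark arrives, `max` keeps the watermark where it is: that event is exactly the "late data" discussed below.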

Source watermark creation

Watermarks can be perfect or heuristic.

  • Perfect watermarks account for all data.
  • Heuristic watermarks admit some late-data elements.

Perfect watermark creation

This requires perfect knowledge of the input, and is thus impractical for many real-world problems. The following examples can utilize perfect watermarks.

  • Ingress timestamping
    • Use the ingress times as the event times for data items.
  • Static sets of time-ordered logs
    • The watermark is simply the minimum timestamp of unprocessed records across all partitions.
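For the static-logs case, the computation can be sketched as follows (the data layout is illustrative: each partition is modeled as a list of ascending timestamps plus a cursor marking how far it has been processed):

```python
def perfect_watermark(partitions):
    """Perfect watermark over a static set of time-ordered logs (sketch):
    the minimum timestamp of the unprocessed records across all partitions.
    Each partition is (records, cursor), with records sorted ascending."""
    fronts = []
    for records, cursor in partitions:
        if cursor < len(records):           # partition still has unprocessed records
            fronts.append(records[cursor])  # next = minimum unprocessed timestamp
    # if every partition is fully drained, the watermark can pass all data
    return min(fronts) if fronts else float("inf")
```

Because each log is time-ordered, the head of each partition's unprocessed suffix is its minimum, so the global minimum over partition heads is exact — which is why this watermark is perfect.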

Heuristic watermark creation

Using heuristic watermarks can lead to late data, but it is still possible to build highly accurate heuristic watermarks. For example,

  • Dynamic sets of time-ordered logs
    • Track the minimum timestamp of unprocessed records in the known set, along with growth rates and external knowledge such as network topology and bandwidth.
  • Google Cloud Pub/Sub
    • There is no guarantee of in-order delivery, so the watermark must be heuristic.
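One very simple heuristic — not how the systems above do it, but a common "bounded out-of-orderness" idea — is to assume events arrive at most some fixed lateness out of order, and emit the maximum observed event time minus that slack:

```python
class BoundedLatenessWatermark:
    """Heuristic watermark sketch (illustrative names): assume events
    arrive at most `max_lateness` out of event-time order, and emit
    max(observed event time) - max_lateness. Events older than the
    emitted watermark become late data."""

    def __init__(self, max_lateness):
        self.max_lateness = max_lateness
        self.max_seen = float("-inf")

    def observe(self, event_ts):
        # track the newest event time seen so far
        self.max_seen = max(self.max_seen, event_ts)

    def watermark(self):
        return self.max_seen - self.max_lateness
```

The quality of such a heuristic is exactly the trade-off described here: a small `max_lateness` advances the watermark quickly but admits more late data, a large one is safer but slower.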

There is no one-size-fits-all solution. At minimum, watermarks reduce the problem of tracking completeness across a pipeline to the problem of creating a watermark at the source.

Watermark propagation

A pipeline may have multiple independent stages, so it is meaningful to track an individual watermark for each stage. Note that the watermarks of later stages are older than those of earlier stages, since later stages have seen fewer records.

For a stage, we define watermarks at its input and output boundaries, which lets us calculate the amount of event-time latency (lag) introduced by the stage.

  • Input watermark:
    • The minimum of the output watermarks of all its upstream sources and stages.
  • Output watermark:
    • The minimum of its input watermark and the event times of all non-late active messages within the stage.
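These two definitions are just minimums, which a short sketch makes concrete (function names are illustrative):

```python
def input_watermark(upstream_output_watermarks):
    """Input watermark of a stage: the minimum of the output
    watermarks of all its upstream sources/stages (sketch)."""
    return min(upstream_output_watermarks)

def output_watermark(input_wm, active_event_times):
    """Output watermark of a stage: the minimum of its input watermark
    and the event times of all non-late active messages in the stage."""
    return min([input_wm] + list(active_event_times))
```

Since the output watermark is a minimum that includes the input watermark, it can never be newer than the input watermark — which is the formal version of "later stages have older watermarks."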

Within a stage, processing is not monolithic: different components and buffers may exist, each with its own watermark. Therefore, the output watermark of the stage may be the minimum of:

  • Per-source watermark: for each sending stage.
  • Per-external input watermark.
  • Per-state component watermark: for each type of state that can be written.
  • Per-output buffer watermark: for each receiving stage.

Watermark propagation and output timestamps

Within a stage, processing can be divided into windows in the event-time domain. The output timestamp of each window advances the output watermark of the stage. We have several choices for the output timestamp of a window:

  • End of the window
    • allows the smoothest watermark progression.
  • Timestamp of the first non-late element
    • The most conservative choice.
    • Watermark progression is likely to be delayed.
      • The output watermark is held until that window’s processing is complete.
  • Timestamp of a specific element
    • For some special use cases, it may make sense.
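The effect of the first two choices on the output watermark can be illustrated with a small sketch (data layout is hypothetical): each open window holds the output watermark at its chosen output timestamp, and the stage's held watermark is the minimum over open windows.

```python
def held_output_watermark(windows, choice):
    """Sketch of how the output-timestamp choice holds a stage's output
    watermark. Each window is (start, end, element_event_times)."""
    holds = []
    for start, end, elems in windows:
        if choice == "end_of_window":
            holds.append(end)        # smoothest: hold only at the window's end
        elif choice == "first_element":
            holds.append(min(elems)) # most conservative: hold at the oldest element
    return min(holds)
```

With the same two open windows, end-of-window lets the watermark sit much further ahead than first-element, matching the smooth-vs-conservative trade-off above.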

The tricky case of overlapping windows

Suppose an element falls into three overlapping windows. When the first window completes in the first stage, the input watermark of the second stage is still held back by the output watermarks of the second and third windows in the first stage, since they contain that same element's (older) timestamp. This delay is unnecessary. Apache Beam has special logic for this case: the output timestamp of window N+1 is always greater than the end of window N.

Percentile Watermarks

Instead of using the minimum of the event timestamps of active messages, we can consider the entire distribution of event timestamps and use a percentile watermark.

  • A 90th-percentile watermark guarantees that we have processed 90% of all events with earlier timestamps.
  • This avoids the watermark being delayed by outliers.
  • A good way to tune the trade-off between latency of materializing results and precision of results.
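One plausible way to compute this (a sketch, not how any particular system implements it): sort the event timestamps of active messages and pick the timestamp at the requested percentile instead of the minimum. A percentile of 0 recovers the ordinary minimum watermark; higher percentiles skip old stragglers at the cost of treating them as late data.

```python
def percentile_watermark(active_event_times, pct):
    """Percentile watermark sketch: pick the timestamp at fraction `pct`
    of the sorted distribution of active event timestamps.
    pct = 0.0 degenerates to the ordinary minimum watermark."""
    ordered = sorted(active_event_times)
    # index of the element covering `pct` of the distribution
    idx = max(0, int(len(ordered) * pct) - 1)
    return ordered[idx]
```

With a single very old outlier among the active messages, the minimum watermark is stuck at the outlier while the 90th-percentile watermark advances well past it — the latency/precision trade-off described above.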

Processing-time watermarks

  • An event-time watermark cannot distinguish between a pipeline that is processing old (backlogged) data and one that is simply stuck.
  • The processing-time watermark is the processing-time timestamp of the oldest active message.
  • It can be used at the system-implementation level for tasks such as garbage collection.

Case studies

Watermarks in Google Cloud Dataflow

Source watermarks for Google Cloud Pub/Sub

Melon blog is created by melonskin. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2016-2020. All rights reserved by melonskin. Powered by Jekyll.