Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams

Notes from the Google Research paper "Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams" (PDF).

On "Unique Event_Id Generation":

An event id consists of three fields: ServerIP, ProcessID and Timestamp.
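
A minimal sketch of such an id in Python may make the three fields concrete. The names `EventId` and `EventIdGenerator`, the microsecond unit, and the rule of bumping the timestamp when the clock has not advanced are all assumptions made for illustration; the quoted passage only names the three fields.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EventId:
    """The three fields named in the paper; the types are assumptions."""
    server_ip: str
    process_id: int
    timestamp_us: int  # microseconds since the epoch (assumed unit)


class EventIdGenerator:
    """Hypothetical per-process generator of unique event ids."""

    def __init__(self, server_ip: str, process_id: int,
                 clock_us: Optional[Callable[[], int]] = None) -> None:
        self._server_ip = server_ip
        self._process_id = process_id
        # Default clock: wall time in microseconds.
        self._clock_us = clock_us or (lambda: time.time_ns() // 1000)
        self._last_us = -1
        self._lock = threading.Lock()

    def next_id(self) -> EventId:
        with self._lock:
            now = self._clock_us()
            # Assumption: keep timestamps strictly increasing within one
            # process so two events never share all three fields.
            if now <= self._last_us:
                now = self._last_us + 1
            self._last_us = now
            return EventId(self._server_ip, self._process_id, now)
```

With this layout, two ids can only collide if the same process on the same server reuses a timestamp, which the generator rules out locally.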

On timestamp global ordering:

Note that generating the timestamp for each event id on each server locally may be adversely impacted by clock skew on the local machine. To limit the skew to S seconds, we use the TrueTime API [8] that provides clock synchronization primitives. By using GPS and atomic clocks, the primitives guarantee an upper bound on the clock uncertainty. Each server generating event ids executes a background thread that sends an RPC to one of the TrueTime servers every S seconds to synchronize the local clock.
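
The periodic synchronization described above can be sketched roughly as follows. This is not the TrueTime API, which is internal to Google; `SyncedClock` and its `reference_now` callable are hypothetical stand-ins for "ask a time server", and the sketch ignores network round-trip delay and clock uncertainty bounds.

```python
import threading
import time
from typing import Callable


class SyncedClock:
    """Hypothetical local clock corrected by a periodically fetched offset."""

    def __init__(self, reference_now: Callable[[], float],
                 local_now: Callable[[], float] = time.time) -> None:
        self._reference_now = reference_now  # stand-in for a time-server RPC
        self._local_now = local_now
        self._offset = 0.0
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def sync_once(self) -> None:
        # Record how far the local clock is from the reference clock.
        offset = self._reference_now() - self._local_now()
        with self._lock:
            self._offset = offset

    def now(self) -> float:
        with self._lock:
            return self._local_now() + self._offset

    def start(self, interval_s: float) -> threading.Thread:
        """Resynchronize every interval_s seconds in a background thread."""
        def loop() -> None:
            while not self._stop.is_set():
                self.sync_once()
                self._stop.wait(interval_s)

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()
```

Calling `start(S)` corresponds to the background thread in the quote; event ids would then take their timestamp from `now()` rather than from the raw local clock.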

The node-local dispatcher acts as a filter: it sends an event to the "joiner" only if that event has not already been joined:

Before sending events to the joiner, the dispatcher looks up each event id in IdRegistry to make sure the event has not been joined. This optimization technique significantly improved performance as observed from the measured results (Section 4).
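
The lookup-before-send step might look like the sketch below. `LookupRegistry` is a hypothetical in-memory stand-in for the IdRegistry (which in the paper is a replicated service), and `events_to_dispatch` is an invented helper name.

```python
from typing import Iterable, Iterator, Set


class LookupRegistry:
    """Hypothetical in-memory stand-in for the IdRegistry."""

    def __init__(self) -> None:
        self._joined: Set[str] = set()

    def register(self, event_id: str) -> None:
        self._joined.add(event_id)

    def contains(self, event_id: str) -> bool:
        return event_id in self._joined


def events_to_dispatch(event_ids: Iterable[str],
                       registry: LookupRegistry) -> Iterator[str]:
    """Yield only the event ids that have not been joined yet."""
    for event_id in event_ids:
        if not registry.contains(event_id):
            yield event_id
```

The filter is only an optimization that saves work for the joiner; it does not by itself prevent duplicates, since another worker could register the same id between the lookup and the join.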

On transactional atomicity:

when a joiner commits a click id to the IdRegistry, it also sends a globally unique token to the IdRegistry (consisting of the joiner server address, the joiner process identifier, and a timestamp) along with the click id.
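
One way to read the purpose of the token: it lets the IdRegistry tell a retry by the same joiner apart from a competing commit by a different joiner. The sketch below illustrates that reading; `JoinerToken`, `TokenRegistry`, and the exact accept/reject rule are assumptions for illustration, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class JoinerToken:
    """The three token fields from the quote; the types are assumptions."""
    server_address: str
    process_id: int
    timestamp_us: int


class TokenRegistry:
    """Hypothetical in-memory IdRegistry that stores the token per click id."""

    def __init__(self) -> None:
        self._owner: Dict[str, JoinerToken] = {}

    def commit(self, click_id: str, token: JoinerToken) -> bool:
        existing = self._owner.get(click_id)
        if existing is None:
            # First commit for this click id: remember who made it.
            self._owner[click_id] = token
            return True
        # Same token: a retry after a lost acknowledgement, so succeed again.
        # Different token: another joiner already owns this click id.
        return existing == token
```

Under this rule a joiner whose acknowledgement was lost can safely resend the same commit, while a second joiner attempting the same click id is refused and the click is joined at most once.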