Synchronization
Motivation
- Synchronization of time is needed to coordinate events at a global level
Physical Clocks
- Basis is a mechanical or astronomical process
- have their limits with accuracy
- Problematic usage: Leap seconds, time zones, leap years, etc.
Example: SBB Clock
- Uhr selbst misst keine Zeit
- Jede Minute wird die Zeit synchronisiert
One-sided Update
- Problem: Network Latency creates an offset on the client
- Estimation of latency is difficult, we don't have enough data
NTP
- Tries to measure the network latency using the round-trip time
- Server sends back original request (
Cs) - Compute RTT with (Cr - Cs) - (Ss - Sr), client knows all of these
- Client can either update its time immediately or adjust the speed to gradually adjust to the correct time
- NTP adjusts gradually. Immediate update would lead to a "jump", mayble a scheduled task is in the gap and wouldn't be executed. (Time must be continous)
Logical Clocks
- Causal Order (Partial Order): Events are causally ordered, but can have no relation (unlike physical time)
- Total Order: Every event can be ordered (physical time, vector clocks)
Database Replication Example
- Problem: Lost updates
- Solution: Nodes must acknowledge updates, only perform update if it is acknowledged by all nodes
- Problem: Order of updates is not guaranteed
- Solution: Chronological Ordering with timestamps
- Problem: Time has to be synchronized between the nodes
- Solution: NTP
- Problem: Physical time has an arbitrary precision, updates could still have the same timestamp
Motivation
- It's often more important that processes agree on an order of events (causality) rather than an exact point in time (physical clock)
- Physical clocks have a limited, arbitrary resolution (accuracy)
Causality
- Event a happens before b -
- Can be reflexive (think DAG) -
- Concurrent Relation:
- and are not causally related
- Not enough for Database replication: We still need an (arbitrary) order for concurrent events, a strict total ordering -> Lamport's Logical Clock
Lamport's Logical Clock
- Each process maintains an internal counter for all events occuring in its process
- The current value of the counter is sent with each message
- The receiver updates its counter to the max of the sender and receiver-counter (respectively) and increments it
- Not totally ordered: There are identical counters on different processes, we can't define a "linked-list" order
- If we use the process id, we can order these concurrent events -> total order
- Use lexicographical ordering over
(eventid, processid)
- Lamport's Clock implementation is usually embedded in the middleware and provided by a library
- see also Lamport's paper: http://lamport.azurewebsites.net/pubs/time-clocks.pdf
- Problem: Not fault tolerant, a failure of a node would halt the distributed system
Vector Clocks
- Track causality exactly:
- Lamports clock only works in one direction, Vector clocks in both
- Each process maintains an internal Vector with the length = number of processes
- The dimensions correspond to the processes, first = P1, second = P2, etc.
- Update vector: When receiving a message, set each index to the max of the sender and receiver
- comparison: each index is equal or less than the other, and at least one is less than the other
- Problem: What if number of nodes in system changes? Vector dimension is the size of the network