Formal Definition of Chandy-Lamport Snapshot Algorithm
The Chandy-Lamport Snapshot Algorithm is a distributed algorithm used to record a consistent global state of a distributed system. It assumes FIFO communication channels and uses a special control message called a marker to distinguish between messages to be included in the snapshot and those that are not. The algorithm ensures that the combination of recorded process states and channel states forms a logically consistent snapshot, even in the absence of a global clock, without interfering with the normal execution of processes.
A global state of a distributed system is the collection of the local states of all processes and the states of all communication channels in the system at some instant.
A consistent global state is a global state in which:
-
Causality is preserved – if an event (such as a message send) is included in the state, then every event that causally precedes must also be included.
-
No message appears to be received without being sent – i.e., if a message receipt is recorded in the state, then the corresponding send must also be recorded.
FIFO Communication Channels in Distributed Systems – Formal Definition:
A communication channel between two processes and is said to be FIFO (First-In-First-Out) if messages are delivered in the same order as they were sent.
Snapshot Algorithms for FIFO Channels – Chandy-Lamport Algorithm
The Chandy-Lamport Algorithm is used in distributed systems to record a consistent global state of the system. It captures both the state of processes and the communication channels such that the combination is meaningful, even though no global clock exists. The algorithm assumes FIFO (First-In-First-Out) channels.
Assumptions
- No failure – all messages arrive intact and exactly once.
- Communication channels are unidirectional and FIFO ordered.
- There exists a communication channel between each pair of processes.
- Any process may initiate the snapshot by sending a marker message.
- The snapshot does not interfere with normal execution.
Concept of Marker
- The algorithm uses a special control message called a marker.
- A marker separates messages in a channel into:
- Those to be recorded in the snapshot.
- Those not to be recorded.
- After recording its state, a process sends a marker along all its outgoing channels before sending further messages.
- A process must record its state no later than when it receives the first marker on any of its incoming channels.
Algorithm Steps
- Initiator process records its local state.
- Marker Sending Rule:
- After recording its state, the process sends one marker message along each outgoing channel before sending further messages on those channels.
- Marker Receiving Rule:
- On receiving a marker along channel C:
- If the process has not yet recorded its state:
- Record its own state.
- Record the state of channel C as empty.
- Start recording messages arriving on all other incoming channels until their markers are received.
- If the process has already recorded its state:
- Record the state of channel C as the set of messages received after recording its state but before receiving the marker on that channel.
- If the process has not yet recorded its state:
- On receiving a marker along channel C:
Properties of the Recorded Global State
- The recorded global state may not correspond to any actual state that occurred during execution.
- Each process records its own local state, and channel states are recorded based on marker separation.
- Since there is no global clock, processes and channels are not recorded at the same instant.
- However, the recorded combination forms a logically consistent snapshot.
Correctness
- The algorithm guarantees conditions C1 and C2:
- C1: All messages received before recording a process’s state are included.
- C2: No message sent after a marker is recorded as part of the channel state.
- This ensures that the snapshot is consistent across the distributed system.
Complexity
- A single execution requires O(e) messages (where e = number of edges in the network).
- Time complexity is O(d), where d is the network diameter.