Gossip in distributed systems
Gossip in distributed systems !?, yes you heard it right, Gossip protocol is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about. The gossip process runs every second and exchanges state messages with up to three other nodes in the cluster.
The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster. A gossip message has a version associated with it, so that during a gossip exchange, older information is overwritten with the most current state for a particular node.
Background
In a large distributed environment where we do not have any central node that keeps track of all nodes to know if a node is down or not, how does a node know every other node’s current state? The simplest way to do this is to have every node maintain a heartbeat with every other node. Then, when a node goes down, it will stop sending out heartbeats, and everyone else will find out immediately. But, this means O(N2)O(N2)messages get sent every tick (NN being the total number of nodes), which is a ridiculously high amount and will consume a lot of network bandwidth, and thus, not feasible in any sizable cluster. So, is there any other option for monitoring the state of the cluster?
Definition
Each node keeps track of state information about other nodes in the cluster and gossip (i.e., share) this information to one other random node every second. This way, eventually, each node gets to know about the state of every other node in the cluster.
Solution
This is how Gossip protocol works if any node goes down. Let’s discuss with Gossip protocol definition. Gossip message has a version associated with it, so that during a Gossip exchange, other information is overwritten with the most current state for a particular node.
We can split when that we have a quorum of four nodes and two nodes are not to trouble. When the network heals and we have an again cluster of six nodes, so everything is connected. Then, we need to do something with the state that is a style in those two nodes. So, we need to merge those new states in a cluster of quorum with other nodes. And, then Gossip protocol will choose that newest state, so there is a possibility of losing state.
Example
Dynamo & Cassandra use gossip protocol which allows each node to keep track of state information about the other nodes in the cluster, like which nodes are reachable, what key ranges they are responsible for, etc.