Atomic broadcast

In fault-tolerant distributed computing, an atomic broadcast or total order broadcast is a broadcast where all correct processes in a system of multiple processes deliver the same set of messages in the same order; that is, the same sequence of messages. The broadcast is termed "atomic" because it either eventually completes correctly at all participants, or all participants abort without side effects. Atomic broadcasts are an important distributed computing primitive.

The following properties are usually required from an atomic broadcast protocol:

Rodrigues and Raynal and Schiper et al. define the integrity and validity properties of atomic broadcast slightly differently.

Note that total order is not equivalent to FIFO order, which requires that if a process sent message 1 before it sent message 2, then all participants must deliver message 1 before delivering message 2. It is also not equivalent to "causal order", where message 2 "depends on" or "occurs after" message 1 then all participants must deliver message 2 after delivering message 1. While a strong and useful condition, total order requires only that all participants deliver the messages in the same order, but does not place other constraints on that order.

Designing an algorithm for atomic broadcasts is relatively easy if it can be assumed that computers will not fail. For example, if there are no failures, atomic broadcast can be achieved simply by having all participants communicate with one "leader" which determines the order of the messages, with the other participants following the leader.

However, real computers are faulty; they fail and recover from failure at unpredictable, possibly inopportune, times. For example, in the follow-the-leader algorithm, what if the leader fails at the wrong time? In such an environment achieving atomic broadcasts is difficult. A number of protocols have been proposed for performing atomic broadcast, under various assumptions about the network, failure models, availability of hardware support for multicast, and so forth.

In order for the conditions for atomic broadcast to be satisfied, the participants must effectively "agree" on the order of delivery of the messages. Participants recovering from failure, after the other participants have "agreed" an order and started to deliver the messages, must be able to learn and comply with the agreed order. Such considerations indicate that in systems with crash failures, atomic broadcast and consensus are equivalent problems.

...
Wikipedia