RabbitMQ 3.8 Feature Focus - Quorum Queues

RabbitMQ 3.8 is coming this year, and it will bring four major new features. Perhaps the most significant is a new queue type called Quorum Queues, a replicated queue that provides high availability and data safety. The idea is to replicate a queue across multiple servers so that if a server crashes or is shut down, the queue remains available and no messages are lost.

RabbitMQ already has a solution for this, called Mirrored Queues or HA Queues. Mirrored Queues have been the de facto way of getting high availability and data redundancy for many years now, but the feature has some serious design flaws that have made it a less than ideal choice.

What is wrong with Mirrored Queues anyway?

The main problems are around the synchronization model and performance. Performance is slower than it should be because messages are replicated using a very inefficient algorithm. HA queue synchronization is a troublesome topic, and RabbitMQ administrators live in fear of it.

Mirrored queues work like this: there is a single leader queue and one or more mirror queues. All reads and writes go through the leader queue, and the leader then replicates all the commands (write, read, ack, nack etc.) to the mirrors. Once all the live mirrors have the message, the leader sends a confirm to the publisher. At this point, if the leader fails, a mirror gets promoted to leader and the queue remains available, with no data loss.

Fig 1 - Leader to mirror replication

When you have multiple mirrored queues, the leaders and mirrors get distributed around your cluster, so each broker can host multiple leaders and mirrors.

Fig 2 - Leaders and mirrors distributed across a cluster

The basic problem is that when a broker goes offline and comes back again, any data it had in mirrors gets discarded. This is critical design flaw #1. Now that the mirror is back online but empty, the administrators have a decision to make: to synchronize the mirror or not. "Synchronize" means replicating the current messages from the leader to the mirror.

That's where critical design flaw #2 comes in. Synchronization is blocking, causing the whole queue to become unavailable. Usually, if everything is going well, queues should be empty or have a small number of messages in them. This is the usual healthy state. Messages are getting published and consumed at the same rate, and messages remain in the queue for a very short time. But sometimes a queue can grow large, either by choice or because a downstream system is slow or offline. In the meantime, the system remains available but accumulates messages in its queues.

If you have no messages or a few thousand small messages, then the impact of synchronization is small. Synchronization will be quick, and publishers can resend any messages that don't get accepted by the broker while it is unavailable. But when a queue is large, the impact is much greater. It can take minutes, hours or in very extreme cases even days to synchronize (though in most of the cases where we see this, the server crashes before it finishes, and then it needs to start over and over again). Not only that, but synchronization has been known to cause memory-related issues on the cluster, sometimes even causing synchronization to get stuck and requiring a reboot.

Quorum Queues - The next generation

Quorum queues aim to resolve both the performance and the synchronization failings of mirrored queues. But they do so with a reduced set of features in their first release, and they introduce new headaches of their own. Unfortunately, we don't have an easy choice to make.

Quorum Queues use a variant of the Raft protocol, which has become the industry's de facto distributed consensus algorithm. Quorum Queues are both safer than mirrored queues and achieve higher throughput.

Message Replication with Raft

Each Quorum Queue is a replicated queue; it has a leader and multiple followers. A common term for these leaders and followers is replica. A quorum queue with a replication factor of five consists of five replicas: the leader and four followers. Each replica is hosted on a different node (broker).

Clients (publishers and consumers) always interact with the leader replica, which then replicates all the commands (write, read, ack etc.) to the followers. The followers do not interact with the clients at all; they exist only for redundancy, allowing availability when a RabbitMQ broker fails, is shut down or is rebooted. When a broker goes offline, a follower replica on another broker will be elected leader and service will continue.

Fig 3. Raft consensus

Quorum queues get their name because all operations (message replication and leader election) require a majority (known as a quorum) of the replicas to agree. When a publisher sends a message, the queue can only confirm it once a majority of replicas have written the message to disk. This means that a slow minority does not slow down the queue as a whole. Likewise, a leader can only be elected when a majority agrees to it, and this prevents two leaders from accepting messages when a network partition occurs. So quorum queues are oriented towards consistency over availability.
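The majority rule is easy to put into numbers. The sketch below is only an illustration of the arithmetic, not RabbitMQ's implementation, and the function names are made up for this example.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def can_confirm(acks: int, replicas: int) -> bool:
    """True once a majority of replicas has written the message to disk."""
    return acks >= quorum_size(replicas)


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be offline while a majority is still reachable."""
    return replicas - quorum_size(replicas)


for n in (3, 5):
    print(f"replicas={n} quorum={quorum_size(n)} tolerated failures={tolerated_failures(n)}")
```

With five replicas, three acknowledgements are enough to confirm a message, so the two slowest replicas never hold up the publisher.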

Quorum Queues - The Good Parts

Firstly

Clients don’t need to change how they publish and subscribe; the queue type is not a concern to those operations. The only difference is at declaration time: the queue must be declared as a quorum queue. So if you rely on a client to do queue declaration, you’ll need it to add the necessary properties.
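As a rough sketch of what the declaration looks like from a client: the queue type is selected with the x-queue-type queue argument, and quorum queues must be durable. The helper name and the queue name "orders" below are made up for illustration, and the channel is assumed to behave like a channel from the Python pika client, whose queue_declare accepts queue, durable and arguments.

```python
def declare_quorum_queue(channel, name):
    """Declare `name` as a quorum queue on an AMQP 0-9-1 channel.

    `channel` is assumed to offer pika's
    queue_declare(queue=..., durable=..., arguments=...) method.
    """
    return channel.queue_declare(
        queue=name,
        durable=True,  # quorum queues are always durable
        arguments={"x-queue-type": "quorum"},
    )


# Usage, assuming the pika package is installed and a broker runs on localhost:
#   import pika
#   connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
#   declare_quorum_queue(connection.channel(), "orders")
```

Publishing and consuming code stays exactly as it was; only this declaration changes.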

Secondly

The issues of synchronization are gone. When brokers come back online, they do not discard their data. All messages remain on disk, and the leader simply resumes replicating messages from where it left off.

Replication of messages to a returning follower is non-blocking, so queues are not heavily impacted by new or rejoining followers. The only impact can be on network utilization.

Finally

Raft is more efficient than the mirrored queue algorithm and should provide better throughput.

So far this all adds up to better throughput, better data safety and easier rolling server upgrades (like OS patches). But let's start looking at the downsides of Quorum Queues.

The Not So Shiny Parts

Fewer Features

Certain features will not be available in the first release, and some may never be. The following features are not available with Quorum Queues:

Non-durable messages
Queue Exclusivity
Queue/message TTL
Some policies are not available. Only DLX and length limit are available.
Priorities
Lazy queues
Global QoS

Disk Usage - Write Amplification

Quorum queues have a different disk and memory profile to normal queues.

Normal queues have a shared storage model where a message is stored once and all queues that it gets delivered to simply get a reference to it. This means that in a publish-subscribe model, the fact that a message will be delivered to multiple queues does not cause the on-disk storage size to grow linearly with the number of queues.

Let's take the example of a fanout with 10 bound queues.

Suppose each of the ten queues is a mirrored queue with a replication factor of 5.

We end up with 5 copies stored across the cluster for each message sent to the fanout exchange: a write amplification of x5.

Quorum Queues, on the other hand, only have a shared model in memory. On disk, each message is stored separately for every queue, so publish-subscribe creates a write amplification that may make Quorum Queues infeasible, or at best require higher-end disks.

Now suppose each of the ten queues is a quorum queue with a replication factor of 5.

We end up with 50 copies stored across the cluster for each message sent to the fanout exchange: a write amplification of x50.
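The two figures come from simple multiplication. Here is a small sketch of that arithmetic, following the simplified storage model described above; the function name and the shared_store flag are made up for this example.

```python
def stored_copies(bound_queues: int, replication_factor: int, shared_store: bool) -> int:
    """On-disk copies created per message published to a fanout exchange.

    shared_store=True models queues that share one stored message per broker,
    so only replication multiplies the copies.
    shared_store=False models queues that each keep their own copy on disk.
    """
    if shared_store:
        return replication_factor
    return bound_queues * replication_factor


print(stored_copies(10, 5, shared_store=True))   # mirrored queues: 5
print(stored_copies(10, 5, shared_store=False))  # quorum queues: 50
```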

Fan-out is not well suited to Quorum Queues and massive fanout probably just isn't possible at all.

Memory Usage - All messages in-memory all the time

The fact that all messages in Quorum Queues are kept in memory at all times also increases memory usage, to the point that you can end up causing unavailability of your cluster. If unchecked, a growing queue could cause all ingress to cease until messages get consumed and removed from memory. This is why, when using Quorum Queues, it is vital that Length Limit policies are applied and that messages are offloaded to lazy queues via a dead letter exchange.
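To make that concrete, here is a minimal sketch of the two policy definitions involved, built as plain dictionaries. The policy keys max-length, dead-letter-exchange and queue-mode are RabbitMQ's own; the helper names, the limit of 100,000 messages and the exchange name "overflow.dlx" are placeholders chosen for this example.

```python
import json


def length_limit_definition(max_length: int, dead_letter_exchange: str) -> dict:
    """Policy definition for quorum queues: cap the length, dead-letter the overflow."""
    return {
        "max-length": max_length,
        "dead-letter-exchange": dead_letter_exchange,
    }


def lazy_queue_definition() -> dict:
    """Policy definition for the classic queue that receives the overflow."""
    return {"queue-mode": "lazy"}


# Placeholder values: tune the limit to your own memory budget.
quorum_policy = length_limit_definition(100_000, "overflow.dlx")
print(json.dumps(quorum_policy))
print(json.dumps(lazy_queue_definition()))
```

Each JSON string can then be supplied as the definition when creating a policy, for example with rabbitmqctl set_policy, using a pattern that matches the intended queues. The lazy queue would be a normal queue bound to the dead letter exchange.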

This makes planning and monitoring ever more important. A downstream outage or slowdown could cause multiple queues to grow and you need to plan accordingly. How many quorum queues do you have, what is the expected ingress velocity, what other queues could be impacted if the cluster reaches its memory limit?
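A back-of-the-envelope estimate can help answer those questions. The sketch below is a deliberately crude model: it assumes consumers have stopped completely, that every queue receives the same ingress, and that memory grows only by the message payload size, ignoring any per-message overhead in the broker. All names and numbers are made up for illustration.

```python
def seconds_until_memory_limit(
    free_bytes: float,
    queues: int,
    ingress_per_second: float,
    avg_message_bytes: float,
) -> float:
    """Rough time until the free memory is used up if nothing is consumed."""
    growth_per_second = queues * ingress_per_second * avg_message_bytes
    if growth_per_second <= 0:
        return float("inf")  # nothing is being published, so memory never fills
    return free_bytes / growth_per_second


# Example: 4 GiB of headroom, 5 quorum queues, 2,000 msg/s each, 1 KiB messages.
headroom = 4 * 1024**3
estimate = seconds_until_memory_limit(headroom, 5, 2000, 1024)
print(f"about {estimate / 60:.1f} minutes until the memory limit")
```

Even as a rough figure, this shows how quickly a downstream outage can turn into a cluster-wide problem, and how long you have to react.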

Permanent Loss of a Majority = Lost Queue

If a majority of a queue's replicas is permanently lost, meaning their data is gone forever, then even though a minority remains, the queue cannot be recovered and must be force deleted. This is an unlikely scenario, but the risk is there. Use reliable disks, and prefer a replication factor of 5 over 3.

Latency

While throughput is better, latency may be higher. This comes down to the use of Raft. We don't get non-durable messages and all messages are always persisted to disk across all replicas. Safety is the primary goal of Quorum Queues.

Only the Beginning

Quorum queues are still in beta right now, but later this year they will be included in the 3.8 release, ready for production usage. You can start playing with the beta version now, which is pretty stable. You can find the latest 3.7 and 3.8 releases on GitHub.

The first release of Quorum Queues aims for minimal features, concentrating on reliability and performance. The RabbitMQ team has plans to improve many aspects, including memory usage. So while not a silver bullet by any means, for certain use cases Quorum Queues offer a better alternative to mirrored queues. Read up more for yourself on the Next RabbitMQ page.

Please send us an email at contact@cloudamqp.com if you have any questions or feedback on this blog post.

RabbitMQ 3.8 Feature Focus - Quorum Queues - CloudAMQP (2024)