The reasons you should switch to Quorum Queues - CloudAMQP (2024)

And why we are updating our plan structure to include Quorum Queues.

We just published our new plans!There is a huge difference in them compared to earlier plans, the maindifference being our decision to switch away from classic mirrored queues. This is the story of the reasonswe are switching away from classic mirrored queues in favor of using quorum queues.

Perhaps one of the most significant changes in RabbitMQ 3.8 was the new queue type called Quorum Queues.This is a replicated queue to provide high availability and data safety. Soon after introducing quorumqueues, the RabbitMQ team wrote this on their website:

“In many cases, quorum queues would be a superior option to classic queue mirroring.Readers are encouraged to get familiar with quorum queues and consider them instead of classic mirrored queues.”

Obviously, we had to investigate, and here is what we found.

Mirrored Queues inefficient algorithm

The main problems revolve around the synchronization model and performance. Performance is slower than it should bebecause messages are replicated using a very inefficient algorithm. HA queue synchronization is a troublesome topicthat RabbitMQ administrators often have to learn the hard way to work with.

Mirrored queues work with a single leader queue and one or more mirror queues. All reads and writes go throughthe leader queue, which replicates all the commands (write, read, ack, nack, etc.) to the mirrors. Once allthe live mirrors have the message, the leader will send a confirmation to the publisher. At this point, ifthe leader failed, a mirror would get promoted to leader and the queue would remain available, with no dataloss. This pointed out a couple of design flaws.

Design flaw #1 - When a broker goes offline and comes back again

The basic problem is that when a broker goes offline and comes back again, any data it had in mirrors getsdiscarded. This is critical design flaw #1. Now that the mirror is back online but empty, the administratorshave a decision to make: to synchronize the mirror or not. "Synchronize" means to replicate the currentmessages from the leader to the mirror.

Design flaw #2 - Synchronization blocking

That's where critical design flaw #2 comes in. Synchronization blocks the whole queue, causing it to become unavailable.That's not normally a problem if the queues are short and the messages in them are semi small. But if the queues arevery long (>1M msgs) or very large (>1GB) this can take a very long time. Messages get published and consumed at thesame rate and messages remain in the queue for a very short time. But sometimes a queue can grow large, either bychoice or because a downstream system is slow or offline. In the meantime, the system remains available butaccumulating messages in its queues.

If you have no messages or a few thousand small messages, then the impact of synchronization is small. Synchronizationwill be quick and publishers can resend any messages that don't get accepted by the broker while it is unavailable.But when a queue is large, the impact is much greater. It can take minutes, hours, or in very extreme cases, evendays to synchronize (though in most of the cases when we see this, the server crashes before it finishes, and thenit needs to start over again). Not only that, but synchronization has been known to cause memory-related issues onthe cluster, sometimes even causing synchronization to get stuck requiring a reboot.

All mirrored queues synchronize automatically, by default, but sometimes administrators simply choose not to synchronizea mirror. All new messages would get replicated but any existing messages would not, causing reduced redundancy andexposing the cluster to a greater chance of message loss.

These issues also made rolling upgrades problematic as a rebooted broker would discard all its data and requiresynchronization to recover full data redundancy.

Quorum Queues

Quorum queues aim to resolve both the performance and the synchronization failings of mirrored queues.Using a variant of theRaft protocolwhich has become the industry de facto distributed consensus algorithm,quorum queues are both safer and achieve higher throughput than mirrored queues.So, what does that mean?

Each quorum queue is a replicated queue; it has a leader and multiple followers. A quorum queue with a replicationfactor of five will consist of five replicated queues: the leader and four followers. Each replicated queue willbe hosted on a different node (broker).

Clients (publishers and consumers) always interact with the leader, which then replicates all the commands(write, read, ack, etc.) to the followers. The followers do not interact with the clients at all; they existonly for redundancy, allowing availability when a RabbitMQ broker fails, is shut down or gets rebooted. Whena broker goes offline, a follower replica on another broker will be elected leader and service will continue.

Quorum queues have their name because all operations (message replication and leader election) require a majority(known as a quorum) of the replicas to agree. When a publisher sends a message, the queue can only confirm it oncea majority of replicas have written the message to disk. This means that a slow minority does not slow down thequeue as a whole. Likewise, a leader can only be elected when a majority agrees to it, and this prevents two leadersfrom accepting messages when a network partition occurs. So quorum queues are oriented towards consistency over availability.

CloudAMQP plan Specification

1/3/5 nodes: Cold Standby setup

Monitoring tools automatically detect if a node goes down, and if that happens, a new instance (a cold standby system) isavailable within seconds and the data is mounted on the new instance. The data located on EBS drives are replicated 3 timesfor each EBS drive (i.e 15 times for 5 node cluster).

You get a single URL to the instance. Failover and load balancing between nodes is done over DNS, with a low TTL (30s).Users connecting overAWS PrivateLink.are load-balanced over elastic load balancing (ELB) instead of DNS load balancer.

Multi-AZ on AWS and GCE

Multi-Availability Zone (AZ) means that the nodes are located in different availability zones. An Availability Zone isan isolated location within a region and a region is a separated geographic area (like EU-West or US-East). Multi-AZis a standard setting for all clusters (with at least three nodes) for CloudAMQP instances hosted in AWS and GCE.

Availability set on Azure

CloudAMQP Instances in Azure were placed in an availability sets until December 21, 2022. Azure makes sure that all nodes within an availabilityset run across multiple physical servers etc. Availability zones in Azure are supported by CloudAMQP for all clusters created since December 21, 2022.

Poison Message Handling

Quorum queues support the handling of poison messages, which are messages that cause consumers to requeue messages dueto a failure of some kind. Poison messages are never consumed completely and therefore never positively acknowledgedso that they can be marked for deletion by RabbitMQ. Quorum queues keep track of unsuccessful delivery attempts andinclude the information in the “x-delivery-count” header added to any redelivered message.

MQTT and quorum queues

MQTT is using raft for client tracking. The protocol is not working on 2 nodes (it requires a quorum). We thereforerecommend users on a two node cluster using MQTT to upgrade to the new plans.

1 node: Best performance, cold standby for high availability, always consistent

One node plans are fastest and simple, all data that is written to disk is safe, but if you use transient messages ornon-durable queues you might lose messages if there's a hardware failure and the node has to be restarted.

In order to avoid losing messages in a single node broker, you should be prepared for broker restarts, broker hardwarefailure, or broker crashes. To ensure that messages and broker definitions survive restarts, ensure that they are onthe disk. Messages, exchanges, and queues that are not durable and persistent will be lost during a broker restart.

If you cannot afford to lose any messages, make sure that your queue is declared as “durable” and that messages aresent with delivery mode "persistent". The messages will be saved to disk and everything will be intact when the nodecomes back up again.

3 nodes cluster: Data replicated over 3 nodes in 2 AZs

A CloudAMQP cluster with three nodes gives you three RabbitMQ servers. These servers are placed in different zones(availability zones in AWS) in all data centers with support for zones.

Classic mirrored queues

The queues are automatically mirrored between nodes. Each mirrored queue consists of one leader and one or more followers.Messages published to the queue are replicated to all followers. RabbitMQ HA is used to handle this via a default policy.

Classic queues used in CloudAMQP, in a 3 node cluster, are be defaultmirrored into two nodes.

Quorum Queues

The default quorum queue cluster size is set to the same value as the number of nodes in the cluster. This argumentcan be overridden with queue the argument quorum_cluster_size.

Read more about configuration options here:https://www.rabbitmq.com/quorum-queues.html#configuration

5 nodes: Data replicated over 5 nodes in 3 AZs

A CloudAMQP cluster with five nodes gives you five RabbitMQ servers. The servers are divided into differentavailability zones in data centers with support for zones. Two nodes will be allocated in one zone, two inthe next, and the final node in a third zone.

Classic mirrored queues

The queues are automatically mirrored between nodes. Each mirrored queue consists of one leader and one or more followers.Messages published to the queue are replicated to all followers. RabbitMQ HA is used to handle this via a default policy.

Classic queues in a 5 node cluster are mirrored into two nodes.

Quorum Queues

The default quorum queue cluster size is set to the same value, as the number of nodes in the cluster. This argument can beoverridden with queue arguments:https://www.rabbitmq.com/quorum-queues.html#configuration

Frequently Asked Questions

Q: Are mirrored queues still supported?

A:Yes, mirrored queues will still behave in the same way as before, with the difference that CloudAMQP mirrored queues are only mirroredto two nodes by default. This can be overridden to 3, 4 or 5 nodes but is not advisable by CloudAMQP due to a lot of intra cluster traffic.

Q: Do I need to change anything in my code to use quorum queues?

A:Yes, the queue has to be declared as a quorum queue using a queue declare argument, but apart from that you publish and consume from it like normally.With publish confirm and manual acknowledgement of course.

Certain features are not available for Quorum Queues. Check if your current setup requires any of the following features, and if it does,reach out to us for help so that we can discuss your setup.

These features arenotavailable with Quorum Queues:

  • Non-durable messages
  • Queue Exclusivity
  • Some policies are not available. Only dead letter exchange and length limits are available.
  • Priorities

Q: Why do you not set up a plan with 7 or more nodes?

A:Performance can be affected when quorum queue node sizes larger than 5 are in place.Also, AMQP doesn't support client steering, so it's hard to scale out a rabbitmq cluster horizontally.

Read more

  • Quorum Queues - RabbitMQ.com
  • RabbitMQ 3.8 Feature Focus - Quorum Queues
  • Quorum Queues Internals - A Deep Dive
  • [Video] Lifting the lid on Quorum Queues (25 min)

Enjoy this article? Don't forget to share it with others. 😉

The reasons you should switch to Quorum Queues - CloudAMQP (2024)
Top Articles
Revitalize Plants With Used Tea Leaves - Green Packs
Development of a Premium Tea-Picking Robot Incorporating Deep Learning and Computer Vision for Leaf Detection
Wmaz 13
Consignment Shops Milford Ct
Maricopa County Property Assessor Search
Ohio State Football Wiki
Ann Taylor Assembly Row
104 Whiley Road Lancaster Ohio
Aces Charting Ehr
Indicafans
Real Estate Transfers Erie Pa
Pokemon Infinite Fusion Good Rod
Integrations | Information Technology
Georgia Vehicle Registration Fees Calculator
Walgreens Dupont Tonkel
Yoga With Thick Stepmom
Mchoul Funeral Home Of Fishkill Inc. Services
New Jersey Map | Map of New Jersey | NJ Map
Ghostbusters Afterlife 123Movies
Ihub Fnma Message Board
Dr Seuss Star Bellied Sneetches Pdf
Hartford Healthcare Employee Tools
Sunset On November 5 2023
Craigslist Eugene Motorcycles
Kahoot Spamming Bots
Black Adam Showtimes Near Linden Boulevard Multiplex Cinemas
Highway 420 East Bremerton
Black Boobs Oiled
Caliber Near Me
Littleton U Pull Inventory
Conner Westbury Funeral Home Griffin Ga Obituaries
Kobe Express Bayside Lakes Photos
85085 1" Drive Electronic Torque Wrench 150-1000 ft/lbs. - Gearwrench
Megan Hall Bikini
Hyb Urban Dictionary
Ridgid Pro Tool Storage System
Demetrius Meach Nicole Zavala
Tani Ahrefs
Cooktopcove Com
Mvsu Canvas
Meg 2: The Trench Showtimes Near Phoenix Theatres Laurel Park
Craigslist Ft Meyers
20|21 Art: The Chicago Edition 2023-01-25 Auction - 146 Price Results - Wright in IL
Hr Central Luxottica Benefits
Eurorack Cases & Skiffs
18006548818
Greenville Sc Greyhound
5613192063
Before Trump, neo-Nazis pushed false claims about Haitians as part of hate campaign
Fayetteville Arkansas Craigslist
Evalue Mizzou
Ceton Village Diggy
Latest Posts
Article information

Author: Manual Maggio

Last Updated:

Views: 6319

Rating: 4.9 / 5 (49 voted)

Reviews: 80% of readers found this page helpful

Author information

Name: Manual Maggio

Birthday: 1998-01-20

Address: 359 Kelvin Stream, Lake Eldonview, MT 33517-1242

Phone: +577037762465

Job: Product Hospitality Supervisor

Hobby: Gardening, Web surfing, Video gaming, Amateur radio, Flag Football, Reading, Table tennis

Introduction: My name is Manual Maggio, I am a thankful, tender, adventurous, delightful, fantastic, proud, graceful person who loves writing and wants to share my knowledge and understanding with you.