RabbitMQ is an open source message broker created in 2007 and now managed by GoPivotal.
Most operations are performed in memory. RabbitMQ is not “disk-oriented”: messages are received by brokers via an exchange (i.e. a logical entry point that decides, based on some criteria, in which queue(s) the broker should place a message) and then pushed to the registered consumers. Although each queue is kept in FIFO order, delivery order is not guaranteed once several consumers share a queue or messages are requeued, so consumers should treat the messages they receive as unordered. Consumers do not need to remember anything about the queue state: messages are pushed by the brokers, and consumers do not and cannot fetch specific messages on their own. Messages are paged out to disk only if there is no more memory available, or if they are explicitly marked to be stored.
RabbitMQ features producers that generate messages and send them to an exchange. Exchanges apply routing rules to each message (possibly based on a routing key that the producer sets on the message) to decide whether the message should be delivered to zero, one or more queues (duplicating the message when several queues match). Consumers keep a permanent connection with the broker, which therefore knows which consumers are available and to which queues they subscribed. The broker pushes messages to the consumers whenever possible (that is, until the prefetch count is reached, or the consumer refuses messages).
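The push model with a prefetch limit can be illustrated with a small in-memory sketch. This is not RabbitMQ client code: `Consumer`, `PushQueue` and their methods are hypothetical names, and the order in which consumers are served is simplified. It only shows that the broker keeps pushing until every consumer holds as many unacknowledged messages as its prefetch count allows.

```python
from collections import deque


class Consumer:
    """Hypothetical consumer tracking messages it has not acknowledged yet."""

    def __init__(self, name, prefetch):
        self.name = name
        self.prefetch = prefetch  # maximum number of unacknowledged messages
        self.unacked = []

    def can_accept(self):
        return len(self.unacked) < self.prefetch


class PushQueue:
    """Toy queue that pushes messages instead of waiting to be polled."""

    def __init__(self, consumers):
        self.messages = deque()
        self.consumers = consumers

    def publish(self, body):
        self.messages.append(body)
        self.dispatch()

    def dispatch(self):
        # Keep pushing while at least one consumer is below its prefetch limit.
        progress = True
        while self.messages and progress:
            progress = False
            for consumer in self.consumers:
                if self.messages and consumer.can_accept():
                    consumer.unacked.append(self.messages.popleft())
                    progress = True

    def ack(self, consumer, body):
        consumer.unacked.remove(body)
        self.dispatch()  # the freed slot lets the broker push again


a = Consumer("a", prefetch=2)
b = Consumer("b", prefetch=1)
queue = PushQueue([a, b])
for i in range(5):
    queue.publish(f"m{i}")
# Both consumers are now saturated, so m3 and m4 wait in the queue.
queue.ack(a, "m0")  # frees one slot on consumer a, which then receives m3
```

Messages beyond the prefetch limits simply stay queued on the broker side until an acknowledgement frees a slot.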
One main difference from Apache Kafka is that messages (as well as queues and exchanges) are not persistent by default. With the default settings, none of the elements that the brokers manage survive a broker restart or failure. Everything is kept in memory, which is a fundamental difference from the way Kafka works. Fortunately, RabbitMQ offers convenient settings to make both queues and messages durable. Producers can tag each message as durable or not, so they can choose on a per-message basis not to persist some non-critical messages sent to a durable queue.
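The interplay between durable queues and durable messages can be sketched as follows. `RestartableBroker` is a hypothetical in-memory model, not a RabbitMQ API; it assumes that a restart keeps only durable queues and, inside them, only the messages that producers tagged as durable.

```python
class RestartableBroker:
    """Toy broker used to reason about what survives a restart."""

    def __init__(self):
        # queue name -> {"durable": bool, "messages": [(body, durable_flag)]}
        self.queues = {}

    def declare_queue(self, name, durable=False):
        self.queues.setdefault(name, {"durable": durable, "messages": []})

    def publish(self, queue, body, durable=False):
        self.queues[queue]["messages"].append((body, durable))

    def restart(self):
        survivors = {}
        for name, queue in self.queues.items():
            if not queue["durable"]:
                continue  # a non-durable queue disappears with its contents
            kept = [m for m in queue["messages"] if m[1]]
            survivors[name] = {"durable": True, "messages": kept}
        self.queues = survivors


broker = RestartableBroker()
broker.declare_queue("orders", durable=True)
broker.declare_queue("metrics", durable=False)
broker.publish("orders", "order-1", durable=True)
broker.publish("orders", "heartbeat", durable=False)  # non-critical message
broker.publish("metrics", "cpu=42", durable=True)     # lost with its queue
broker.restart()
```

After the restart only `orders` remains, holding `order-1` alone: the durable flag on a message has no effect if the queue itself is not durable.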
- Exchanges and bindings
Constitutive parts of the AMQP protocol, exchanges and bindings dispatch messages that come from the producer to queues. A producer sends a message to a specific exchange (and not a queue), which in turn will decide whether it should redirect, duplicate or discard the message. Different types of exchanges are standardized and RabbitMQ supports all of them.
These routing choices can depend on the value of the message’s routing key, which the producers set on the message. The simplest example of an exchange is the direct exchange, as depicted in the following figure:
A direct exchange routes each message via the bindings whose routing key is the same as the message’s. In the example in the figure above, the presented exchange is also the default exchange, to which producers can send messages by specifying an empty string as the destination exchange. The default exchange automatically binds every queue created in the system to itself, with a routing key equal to the name of the queue. Therefore, sending a message to the default exchange with the routing key “appgrid” will route the message to the queue named “appgrid”, if it exists.
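The behaviour of the default exchange can be mimicked with a short in-memory model. The names `DirectExchange` and `DefaultExchangeBroker` are hypothetical and only serve the illustration; the sketch assumes that declaring a queue binds it to the default exchange under its own name, as described above.

```python
class DirectExchange:
    """Routes a message to the queues bound with exactly its routing key."""

    def __init__(self):
        self.bindings = {}  # routing key -> list of queue names

    def bind(self, queue, routing_key):
        self.bindings.setdefault(routing_key, []).append(queue)

    def route(self, routing_key):
        return list(self.bindings.get(routing_key, []))


class DefaultExchangeBroker:
    """Toy broker whose default exchange is addressed by an empty name."""

    def __init__(self):
        self.exchanges = {"": DirectExchange()}
        self.queues = {}

    def declare_queue(self, name):
        if name not in self.queues:
            self.queues[name] = []
            # Automatic binding: the routing key is the queue name.
            self.exchanges[""].bind(name, routing_key=name)

    def publish(self, exchange, routing_key, body):
        targets = self.exchanges[exchange].route(routing_key)
        for name in targets:
            self.queues[name].append(body)
        return len(targets)  # 0 means the message was dropped


broker = DefaultExchangeBroker()
broker.declare_queue("appgrid")
delivered = broker.publish("", "appgrid", "hello")
dropped = broker.publish("", "missing-queue", "lost")
```

Here `delivered` counts one target queue, while the second message matches no binding and is discarded.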
Another interesting exchange type is the fanout exchange. As shown in the following figure, it simply duplicates the messages it receives and puts a copy in every queue bound to the exchange. This is especially useful for a publish/subscribe distribution scheme, where multiple consumers should perform different tasks on the same messages. A message placed in a RabbitMQ queue is consumed by one and only one of the consumers listening to that queue. Duplicating messages is thus the only way to have a message processed multiple times by different consumers.
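A fanout exchange reduces to very little logic, as the following sketch suggests. `FanoutExchange` is a hypothetical in-memory class, and plain Python lists stand in for queues.

```python
class FanoutExchange:
    """Copies every message to all bound queues, ignoring routing keys."""

    def __init__(self):
        self.queues = []

    def bind(self, queue):
        self.queues.append(queue)

    def publish(self, body):
        for queue in self.queues:
            queue.append(body)  # each bound queue receives its own copy


audit_queue, billing_queue = [], []
exchange = FanoutExchange()
exchange.bind(audit_queue)
exchange.bind(billing_queue)
exchange.publish("order-42")
```

Each queue now holds one copy of `order-42`, so an auditing consumer and a billing consumer can process the same message independently.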
More complex routing can be achieved using other exchange types (headers and topic exchanges), which will not be explained in this document. The reader may find additional information in the references and the glossary.
Queues in RabbitMQ are the endpoints for all messages that were successfully routed (i.e. that were not dropped by an exchange). A message in a queue will (ideally) be delivered only once, to only one of the consumers subscribed to the queue. This design implies that, to mimic the consumer group concept that Kafka features, messages must be duplicated, with each “false” consumer group bound to its own queue.
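The “one queue per consumer group” idea can be sketched in a few lines. The function `deliver`, the group names and the round-robin policy are assumptions made for the illustration; the point is only that each group sees every message once, while members of a group share them.

```python
from itertools import cycle


def deliver(queue, consumers):
    """Hand each queued message to exactly one consumer, round-robin."""
    received = {name: [] for name in consumers}
    turn = cycle(consumers)
    while queue:
        received[next(turn)].append(queue.pop(0))
    return received


messages = ["m1", "m2", "m3", "m4"]
# One queue per "false" consumer group, each holding its own copy.
group_queues = {"indexers": list(messages), "mailers": list(messages)}

indexers = deliver(group_queues["indexers"], ["idx-a", "idx-b"])
mailers = deliver(group_queues["mailers"], ["mail-a"])
```

The two indexers split the four messages between them, while the single mailer receives all four.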
Channels are a low-level component of AMQP, implemented in RabbitMQ, used to multiplex several logical connections between the same producer or consumer and the same broker, for instance when they are used by different threads. Instead of establishing multiple TCP connections between the peer and the broker, only one is instantiated and kept active, and all the channels share this single TCP connection.
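Multiplexing can be pictured as tagging every frame with a channel number before it crosses the shared connection. The sketch below is a toy model with hypothetical names (`SharedConnection`, `Channel`, `demultiplex`); a list stands in for the TCP socket.

```python
from collections import defaultdict


class SharedConnection:
    """Stand-in for one TCP connection carrying frames of many channels."""

    def __init__(self):
        self.wire = []  # frames in the order they would cross the socket
        self._next_id = 1

    def channel(self):
        channel = Channel(self, self._next_id)
        self._next_id += 1
        return channel


class Channel:
    def __init__(self, connection, channel_id):
        self.connection = connection
        self.id = channel_id

    def send(self, payload):
        # Every frame carries the id of the channel that emitted it.
        self.connection.wire.append((self.id, payload))


def demultiplex(wire):
    """Receiving side: regroup the interleaved frames per channel."""
    per_channel = defaultdict(list)
    for channel_id, payload in wire:
        per_channel[channel_id].append(payload)
    return dict(per_channel)


connection = SharedConnection()
publisher_thread = connection.channel()
consumer_thread = connection.channel()
publisher_thread.send("publish a1")
consumer_thread.send("ack b1")
publisher_thread.send("publish a2")
streams = demultiplex(connection.wire)
```

Frames from both channels are interleaved on the wire, yet each channel's stream is recovered in order on the other side.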
RabbitMQ is a fairly standard message queuing system, and shares most of the general concepts of the AMQP standard. However, even though it follows the AMQP standard (which in itself features some “unusual” functionalities), the development team added some extensions to the protocol that can be useful in the studied use case:
- In-memory storage and optional (limited) durability
As already mentioned in the previous section, RabbitMQ stores all the elements it deals with in memory, while providing flags that can be put on queues and messages to mark them durable. However, the achieved durability is not comparable to that of Apache Kafka, in that it is only temporary: messages are written to disk only if there is no consumer that can process them right away. Messages (no matter their durable flag value) can also be paged out to disk if the memory gets exhausted on the broker. Once a message has been delivered (or once its consumption has been confirmed, if confirmations are enabled), it is forgotten, even if it was tagged as durable. RabbitMQ does not provide a way to enforce permanent storage in a similar fashion to what Apache Kafka does, and as such does not allow consumers to replay old messages.
- Reliable delivery and message rejection
Messages delivered to consumers can be required to be acknowledged: this ensures that messages have been successfully consumed before they are removed from the broker’s memory. By default, no acknowledgements are needed, and a successful delivery to one of the registered consumers of a queue is enough for the broker to forget about the message. Consumption confirmations help reduce the number of unprocessed messages, while also giving consumer processes the opportunity to notify the broker that a message could not be processed. They may indeed reject a message, and potentially queue it again for redelivery. For some use cases (e.g. a database that becomes temporarily unavailable and therefore prevents a message’s content from being written to it, while the connection with RabbitMQ does not encounter any failure), this might be a convenient feature.
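The acknowledge/reject cycle, applied to the unavailable-database scenario above, can be sketched with a toy queue. `AckQueue` and `handle` are hypothetical names, and putting a rejected message back at the head of the queue is a simplifying assumption of this model.

```python
from collections import deque


class AckQueue:
    """Toy queue that keeps delivered messages until they are acknowledged."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.unacked = {}  # delivery tag -> message awaiting confirmation
        self._tag = 0

    def deliver(self):
        if not self.ready:
            return None
        self._tag += 1
        self.unacked[self._tag] = self.ready.popleft()
        return self._tag, self.unacked[self._tag]

    def ack(self, tag):
        del self.unacked[tag]  # only now is the message forgotten

    def reject(self, tag, requeue=True):
        message = self.unacked.pop(tag)
        if requeue:
            self.ready.appendleft(message)  # made available for redelivery


def handle(queue, database_up, stored):
    """Consume one message; reject it if the database cannot store it."""
    delivery = queue.deliver()
    if delivery is None:
        return False
    tag, message = delivery
    if database_up:
        stored.append(message)
        queue.ack(tag)
    else:
        queue.reject(tag, requeue=True)
    return True


queue = AckQueue(["row-1", "row-2"])
stored = []
handle(queue, database_up=False, stored=stored)  # row-1 rejected and requeued
while handle(queue, database_up=True, stored=stored):
    pass
```

No message is lost during the outage: `row-1` is redelivered once the database is back, and both rows end up stored.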
- Dynamically declared queues and exchanges
While the administrator of the broker can create them in advance, new queues and exchanges can also be declared directly in the code of producers and consumers. It is in fact good practice to declare them from the code before use, to ensure that messages will not be pushed to nonexistent queues and exchanges.
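Declaring from code is safe when declaration is idempotent, which the sketch below models. `QueueRegistry` and `DeclareError` are hypothetical names; the model assumes that declaring an existing queue with the same attributes is a no-op, while conflicting attributes are reported as an error.

```python
class DeclareError(Exception):
    """Raised when a queue is redeclared with different attributes."""


class QueueRegistry:
    def __init__(self):
        self.queues = {}  # queue name -> attributes

    def declare_queue(self, name, durable=False):
        existing = self.queues.get(name)
        if existing is None:
            self.queues[name] = {"durable": durable}
            return "created"
        if existing["durable"] != durable:
            raise DeclareError(f"queue {name!r} exists with other attributes")
        return "exists"  # same attributes: nothing changes


registry = QueueRegistry()
producer_result = registry.declare_queue("appgrid", durable=True)
consumer_result = registry.declare_queue("appgrid", durable=True)
```

Both the producer and the consumer can therefore declare the queue at start-up, whichever runs first.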
- Permission management and SSL support
RabbitMQ requires producers, consumers and administrators to authenticate before they can perform any operation on the broker. Different permissions can be given to different users, thus allowing the enforcement of efficient and fine-grained access control. SSL is also supported and can be used to authenticate the peers and encrypt the data transmitted between the brokers and the producers and consumers.
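Fine-grained access control can be pictured as a per-user table of patterns checked before each operation. The sketch below is an assumption-laden model: the user names, the `PERMISSIONS` table, the `allowed` helper and the full-match rule are all hypothetical, and the split into configure, write and read operations is taken as a working assumption rather than from the text above.

```python
import re

# Hypothetical permission table: one pattern per kind of operation.
# None means that the user may not perform that operation at all.
PERMISSIONS = {
    "producer-app": {"configure": None, "write": r"orders\..*", "read": None},
    "consumer-app": {"configure": None, "write": None, "read": r"orders\..*"},
}


def allowed(user, operation, resource):
    """Return True if `user` may perform `operation` on `resource`."""
    pattern = PERMISSIONS.get(user, {}).get(operation)
    if pattern is None:
        return False
    return re.fullmatch(pattern, resource) is not None


can_publish = allowed("producer-app", "write", "orders.eu")
can_consume = allowed("producer-app", "read", "orders.eu")
stranger = allowed("unknown-user", "read", "orders.eu")
```

In this model the producer may write to resources named `orders.*` but not read from them, and an unknown user is denied everything.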
- Multiple load balancing and replication alternatives
Thanks to the distribution capabilities available to an application written in Erlang, RabbitMQ features different ways to replicate configurations and queues’ contents. One of them, as depicted in the figure above, is to create a RabbitMQ cluster.
In a cluster, individual nodes can fail, as the configuration (exchanges, queues, bindings) is permanently and fully replicated between all the members. A cluster is made of two types of nodes, disk nodes and RAM nodes. A disk node writes its content to disk (exchanges, queues and their contents, etc.), while a RAM node keeps everything in memory (except for the queue contents, if they are explicitly marked as durable or are too large). RAM nodes are therefore usually faster, and disk nodes allow the cluster to recover after a complete shutdown (at least one is required). Peers can push data to and pull data from any of the nodes, at any time, provided that the node to which the messages were pushed is still up and running: messages are stored only on the node that was their destination, and are not replicated across nodes by default. Please note however that the links between the nodes that make up the cluster must be highly reliable and have low latency for it to work properly, and that a bidirectional connection must be maintained between all the nodes.
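The consequence of storing messages on a single node can be shown with a toy cluster. `ToyCluster` and the node names are hypothetical; the model assumes that any node can relay a request, but that the queue contents live only on the queue's home node.

```python
class ToyCluster:
    """Cluster without mirroring: each queue lives on exactly one node."""

    def __init__(self, nodes):
        self.up = {node: True for node in nodes}
        self.home = {}      # queue name -> node that stores its messages
        self.contents = {}  # queue name -> list of message bodies

    def declare_queue(self, queue, node):
        self.home[queue] = node
        self.contents[queue] = []

    def reachable(self, via_node, queue):
        # Both the contacted node and the queue's home node must be up.
        return self.up[via_node] and self.up[self.home[queue]]

    def publish(self, via_node, queue, body):
        if not self.reachable(via_node, queue):
            return False
        self.contents[queue].append(body)
        return True

    def consume(self, via_node, queue):
        if not self.reachable(via_node, queue) or not self.contents[queue]:
            return None
        return self.contents[queue].pop(0)


cluster = ToyCluster(["disk1", "ram1", "ram2"])
cluster.declare_queue("jobs", node="ram1")
published = cluster.publish("disk1", "jobs", "job-1")  # relayed to ram1
cluster.up["ram1"] = False                              # the home node fails
during_outage = cluster.consume("ram2", "jobs")         # nothing to serve
rejected = cluster.publish("disk1", "jobs", "job-2")
```

While `ram1` is down, the other nodes are healthy but can neither serve nor accept messages for the `jobs` queue.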
Because messages are not replicated across the replica set, a plain cluster is of little use for high availability. Thanks to the mirroring capabilities offered through RabbitMQ’s policies, however, queue contents can be replicated across none, a fixed number, or all of the members of a replica set, with automatic master election and failover. New nodes can be added on the fly, either catching up naturally (no pre-provisioning: as the queues are emptied on the old nodes, new nodes will eventually have the same contents as the other nodes) or being provisioned with the queue data before the node is put into service. This provisioning may however imply some unavailability, as the node must end up with exactly the same contents as the others. Note also that this mirroring feature does not perform load balancing by itself: peers publish to and consume from the master node only, the replicas being used only in case of failure.
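Mirroring, natural catch-up and failover can be modelled together in a short sketch. `MirroredQueue` is a hypothetical class, and promoting the oldest remaining replica on failure is an assumption of this model rather than a statement about the actual election rule.

```python
class MirroredQueue:
    """Toy mirrored queue: one master, the other replicas are mirrors."""

    def __init__(self, nodes):
        self.replicas = {node: [] for node in nodes}  # insertion order = age
        self.master = nodes[0]

    def publish(self, body):
        for contents in self.replicas.values():
            contents.append(body)  # replicated to every current member

    def consume(self):
        body = self.replicas[self.master].pop(0)  # served by the master only
        for node, contents in self.replicas.items():
            if node != self.master and body in contents:
                contents.remove(body)
        return body

    def add_mirror(self, node, synchronise=False):
        # Without synchronisation the new mirror starts empty and catches up
        # naturally as older messages are consumed.
        master_copy = self.replicas[self.master]
        self.replicas[node] = list(master_copy) if synchronise else []

    def fail(self, node):
        del self.replicas[node]
        if node == self.master:
            if not self.replicas:
                raise RuntimeError("no replica left to promote")
            self.master = next(iter(self.replicas))  # oldest remaining one


queue = MirroredQueue(["n1", "n2"])
queue.publish("a")
queue.publish("b")
queue.add_mirror("n3")  # joins empty, no pre-provisioning
queue.publish("c")
first, second = queue.consume(), queue.consume()
caught_up = queue.replicas["n3"] == queue.replicas["n1"]
queue.fail("n1")        # the master dies, n2 takes over
after_failover = queue.consume()
```

Once `a` and `b` are consumed, the late mirror `n3` holds the same contents as the others, and the failure of `n1` does not lose `c`.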
RabbitMQ also provides two other alternatives to share the load across multiple brokers. Namely, federation and shovel can be used over unreliable network links to share or copy messages from one broker to another. Messages can be buffered at the entry point if the network link fails, and sent when it comes back up. These alternatives allow for a partial load balancing approach: one may federate only some of the queues, and keep others local. More information on these approaches can be found in the references.
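The buffering behaviour over an unreliable link can be sketched as follows. `ToyShovel` is a hypothetical in-memory model, with a Python list standing in for the remote broker and a boolean for the state of the link.

```python
from collections import deque


class ToyShovel:
    """Forwards messages to a remote broker, buffering while the link is down."""

    def __init__(self, destination):
        self.destination = destination  # list standing in for the remote broker
        self.buffer = deque()
        self.link_up = True

    def forward(self, body):
        self.buffer.append(body)
        self.flush()

    def flush(self):
        # Drain the buffer in order for as long as the link is available.
        while self.link_up and self.buffer:
            self.destination.append(self.buffer.popleft())


remote = []
shovel = ToyShovel(remote)
shovel.forward("a")
shovel.link_up = False  # the network link fails
shovel.forward("b")
shovel.forward("c")
buffered = list(shovel.buffer)
shovel.link_up = True   # the link comes back up
shovel.flush()
```

Messages published during the outage wait at the entry point and reach the remote broker, in order, once the link is restored.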