The underlying use case

This is a typical pub/sub use case. We have M news sources and N subscribers; each subscriber subscribes to the news sources they are interested in and wants to receive news updates from them. However, we want these updates to end up in MongoDB, which essentially maintains the most recent 'k' updates (so that they can be indexed, searched, etc.). We want to design for M scaling up to a million publishers and N scaling to a few million subscribers.

Subscribers' updates are ultimately received and stored on more than one host, each with its own local MongoDB.

Modeling in RabbitMQ

RabbitMQ will be used to persist the mappings (who subscribes to which news source).

I have set up a pub/sub system this way: we create one publisher exchange of type 'fanout' for each news source.

For modeling subscribers, there are two options.

In the first option, we have one queue for each subscriber, bound to the relevant publisher exchanges. The client process consumes from all of these subscriber queues and receives the updates (and persists them to MongoDB). Note that in this option, when the client is restarted, it has to manage the list of all subscribers and start consuming again from every subscriber queue it is responsible for.
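The first option could be declared roughly as follows. This is only a sketch: `channel` is assumed to behave like a pika `BlockingChannel`, and the `source.<id>` / `subscriber.<id>` naming scheme and the shape of `subscriptions` are made up for illustration.

```python
def declare_option_one(channel, source_ids, subscriptions):
    """Declare one fanout exchange per news source and one queue per subscriber.

    channel       -- assumed to behave like a pika BlockingChannel
    source_ids    -- iterable of news source ids (hypothetical)
    subscriptions -- dict: subscriber id -> iterable of followed source ids
    """
    for source_id in source_ids:
        channel.exchange_declare(
            exchange=f"source.{source_id}",
            exchange_type="fanout",
            durable=True,  # keep the exchange across broker restarts
        )
    for subscriber_id, followed in subscriptions.items():
        queue = f"subscriber.{subscriber_id}"
        channel.queue_declare(queue=queue, durable=True)
        for source_id in followed:
            # Fanout exchanges ignore the routing key, so none is given.
            channel.queue_bind(queue=queue, exchange=f"source.{source_id}")


# Usage against a real broker (needs pika installed and RabbitMQ running):
#   import pika
#   conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
#   declare_option_one(conn.channel(), ["bbc"], {"alice": ["bbc"]})
```

With this layout the bindings live in the broker, but the client still has to know every subscriber queue name in order to consume from it.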

In the second option, we want to remove the overhead of explicitly consuming from every subscriber queue on startup. Instead, we want each client host to listen to only one queue, which represents all the subscribers whose updates are sent to that host.

To achieve this, we first create one exchange for each subscriber and bind it to the publisher exchange(s) that the subscriber follows. We then create a single queue for each client and bind the subscriber exchange to that queue (type=direct) if the subscriber belongs to that client.
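A possible declaration of this second layout is sketched below. It relies on exchange-to-exchange bindings, which are a RabbitMQ extension to AMQP 0-9-1 (exposed in pika as `exchange_bind`). The naming scheme, the single fixed routing key, and the reading of "type=direct" as the type of the subscriber exchange are all assumptions of this sketch, and the publisher exchanges are assumed to exist already.

```python
ROUTING_KEY = "update"  # assumption: every publisher uses this one routing key


def declare_option_two(channel, client_id, subscriptions):
    """Declare one queue for the client and one direct exchange per subscriber.

    channel       -- assumed to behave like a pika BlockingChannel
    subscriptions -- dict: subscriber id -> iterable of followed source ids,
                     restricted to the subscribers handled by this client
    """
    client_queue = f"client.{client_id}"
    channel.queue_declare(queue=client_queue, durable=True)
    for subscriber_id, followed in subscriptions.items():
        sub_exchange = f"subscriber.{subscriber_id}"
        channel.exchange_declare(
            exchange=sub_exchange, exchange_type="direct", durable=True
        )
        for source_id in followed:
            # Exchange-to-exchange binding: publisher exchange -> subscriber exchange.
            channel.exchange_bind(
                destination=sub_exchange, source=f"source.{source_id}"
            )
        # A direct exchange only forwards messages whose routing key matches.
        channel.queue_bind(
            queue=client_queue, exchange=sub_exchange, routing_key=ROUTING_KEY
        )
    return client_queue
```

On restart the client then only needs the one name returned by `declare_option_two`; the per-subscriber mappings stay in the broker as bindings.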

Once the client receives an update message, it needs to know which subscriber exchange the message came through. Only then can we add it to MongoDB for the relevant subscriber. Presumably the subscriber exchange would have to add this information as a new header on the message.

As per the RabbitMQ docs, I believe there is no way to achieve this (or, more specifically, to get a 'delivery path' property from the delivered message, from which we could derive this information).
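To make the gap concrete, here is a small sketch of what a consumer callback can read from a delivery. In AMQP 0-9-1 the delivery carries the exchange the message was originally published to and its routing key, plus whatever headers the publisher set; intermediate exchanges are not listed. The helper name is made up, and `method` / `properties` are assumed to look like the objects pika passes to a consumer callback.

```python
def delivery_metadata(method, properties):
    """Collect the routing-related information available to a consumer.

    method     -- assumed to look like pika's Basic.Deliver (has .exchange
                  and .routing_key)
    properties -- assumed to look like pika's BasicProperties (has .headers)
    """
    return {
        # Exchange the message was originally published to, i.e. the
        # publisher exchange, not the subscriber exchange in between.
        "exchange": method.exchange,
        "routing_key": method.routing_key,
        # Headers are whatever the publisher attached; may be None.
        "headers": dict(properties.headers or {}),
    }


# Usage inside a pika consumer callback:
#   def on_message(channel, method, properties, body):
#       info = delivery_metadata(method, properties)
```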

My questions:

  • Is it possible to add a new header to a message as it passes through an exchange?
  • If this is not possible, can we achieve it through a custom exchange and a suitable plugin? Is there any plugin that I can readily use for this purpose?
  • I am curious as to why RabbitMQ does not provide a delivery path property as an optional configuration.
  • Is there any other way I can achieve the same thing? (See the PubSubHubbub note below.)

PubSubHubbub

The use case is very similar to what the PubSubHubbub protocol provides, and there is a RabbitMQ plugin for it called RabbitHub. However, ours will be a closed system, and I believe the protocol's webhook approach would add too much overhead (including from a performance perspective) compared to listening on a single queue.

uniwalker
  • I don't quite get your model. Why can't you have multiple subscriber queues? – Wiktor Zychla Nov 23 '13 at 20:23
  • The amqp client will use one queue, consume the messages to store them in database - on behalf of those (thousands of) subscribers. Mappings are all persisted in rabbitmq. So if the system is restarted, the client listens to only one queue rather than opening thousands of connections to all individual subscriber queues. – uniwalker Nov 24 '13 at 02:13
  • Can you describe the basic idea behind your model a little more clearly, i.e. what you want to achieve, not how you want to do it right now? I am with Wiktor: something here doesn't make sense. – robthewolf Nov 24 '13 at 10:13
  • Edited the question to specify my use case. – uniwalker Nov 25 '13 at 04:11

1 Answer

The producer (an RMQ client) should add all the required headers (including the originator's identity) to the message before publishing it to RMQ. These headers are used for routing.
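As a rough illustration of a producer attaching its identity, the sketch below publishes an update with a `source_id` header. The header name, the exchange naming, and the routing key are hypothetical; `channel` is assumed to behave like a pika `BlockingChannel`, and `make_properties` is expected to be `pika.BasicProperties` in real use (it is a parameter only so the function does not need a broker or pika to be exercised).

```python
def publish_update(channel, source_id, body, make_properties):
    """Publish one update with the originator's identity in a header.

    channel         -- assumed to behave like a pika BlockingChannel
    make_properties -- expected to be pika.BasicProperties in real use
    """
    properties = make_properties(
        headers={"source_id": source_id},  # hypothetical header name
        delivery_mode=2,                   # 2 = persistent message
    )
    channel.basic_publish(
        exchange=f"source.{source_id}",    # hypothetical naming scheme
        routing_key="update",              # assumed fixed routing key
        body=body,
        properties=properties,
    )


# Usage against a real broker (needs pika installed and RabbitMQ running):
#   import pika
#   conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
#   publish_update(conn.channel(), "bbc", b"headline", pika.BasicProperties)
```

Note that this only identifies the news source; it cannot name the subscriber, because the producer does not know who follows it.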

If the message (including its headers) needs to be transformed while in transit (e.g. by adding new headers), it needs to be sent to a transformer (another RMQ client). This transformer essentially becomes the new publisher.

The actual consumer should receive its intended messages (those to which it has subscribed) through a single queue. The routing of all its subscribed messages should be arranged on the RMQ exchange.

Managing the last 'K' updates should be the responsibility of neither the producer nor the consumer, so it should be done in the transformer. Producers' messages should be routed to this transformer (for storage) before being re-routed to the exchange(s) from which consumers consume.
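A transformer of this kind might look like the sketch below: it consumes a message, copies its headers, adds some more, and republishes to another exchange. Which headers to add and where to republish are left to the caller; the function names are made up, `channel` is assumed to behave like a pika `BlockingChannel`, and `make_properties` is expected to be `pika.BasicProperties` in real use.

```python
def merge_headers(original, extra):
    """Return a new header dict: the original headers plus the extra ones."""
    merged = dict(original or {})  # headers may be None on a plain message
    merged.update(extra)
    return merged


def make_transformer(out_exchange, extra_headers, make_properties):
    """Build a consumer callback that republishes with added headers."""
    def on_message(channel, method, properties, body):
        new_properties = make_properties(
            headers=merge_headers(properties.headers, extra_headers)
        )
        channel.basic_publish(
            exchange=out_exchange,
            routing_key=method.routing_key,
            body=body,
            properties=new_properties,
        )
        # Acknowledge the original only after the copy has been handed off.
        channel.basic_ack(delivery_tag=method.delivery_tag)
    return on_message


# Usage with pika (queue and exchange names are hypothetical):
#   callback = make_transformer("enriched", {"seen_by": "t1"}, pika.BasicProperties)
#   channel.basic_consume(queue="transformer.in", on_message_callback=callback)
#   channel.start_consuming()
```

The cost of this approach is an extra hop and an extra publish per message, which is worth weighing against the scale described in the question.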
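For the storage step, one way to keep only the most recent K updates in MongoDB is `$push` combined with `$each` and a negative `$slice`, which trims the array to its last K elements on every write. The sketch assumes one document per subscriber with an `updates` array; that schema, the field names, and the function names are illustrative assumptions, and `collection` is assumed to behave like a pymongo `Collection`.

```python
def last_k_update(update, k):
    """Build a MongoDB update document that appends and keeps the last k items."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        "$push": {
            "updates": {          # hypothetical array field
                "$each": [update],
                "$slice": -k,     # negative slice keeps the last k elements
            }
        }
    }


def store_update(collection, subscriber_id, update, k):
    """Append one update to a subscriber's document, creating it if needed.

    collection -- assumed to behave like a pymongo Collection
    """
    collection.update_one(
        {"_id": subscriber_id},
        last_k_update(update, k),
        upsert=True,
    )


# Usage with pymongo (database and collection names are hypothetical):
#   from pymongo import MongoClient
#   coll = MongoClient()["news"]["subscriber_updates"]
#   store_update(coll, "alice", {"source_id": "bbc", "text": "headline"}, k=50)
```

Because the trimming happens inside a single update operation, the writer does not need to read the document first.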

  • In a pub-sub setup (i.e. fanout type exchange), the message is replicated to all the subscribers. So isn't it nice to add a header to message as to which exchange or queue it got replicated to - as a config option? Are there reasons (from architecture point of view) as to why this is not recommended even as an extension? – uniwalker Nov 27 '13 at 02:04