Is there existing, production-ready software for a low-latency distributed log? The idea is to persist a service's input messages in the log before they are sent to the service itself. When the service starts up, it loads the most recent snapshot of its state and replays the input messages recorded after that snapshot. If the service is deterministic, multiple instances of it could run at once, which would give both high availability and zero-downtime deployments.
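
To make the snapshot-and-replay idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `InputLog` is an in-memory stand-in for whichever distributed log ends up being used, and `CounterService` is a toy deterministic service whose state is just a running total. It only illustrates the recovery flow, not a real client API.

```python
from dataclasses import dataclass, field


@dataclass
class InputLog:
    """Hypothetical in-memory stand-in for the distributed log."""
    entries: list = field(default_factory=list)

    def append(self, message):
        # Store the message and return the offset it was written at.
        self.entries.append(message)
        return len(self.entries) - 1

    def read_from(self, offset):
        # Yield (offset, message) pairs starting at the given offset.
        for i in range(offset, len(self.entries)):
            yield i, self.entries[i]


@dataclass
class CounterService:
    """Toy deterministic service: its state is a running total."""
    total: int = 0
    next_offset: int = 0  # first log offset that has not been applied yet

    def apply(self, offset, message):
        self.total += message
        self.next_offset = offset + 1

    def snapshot(self):
        # The snapshot records the state and the log position it covers.
        return {"total": self.total, "next_offset": self.next_offset}

    @classmethod
    def recover(cls, snapshot, log):
        # Start from the snapshot (or from empty state), then replay
        # every message logged after the snapshot was taken.
        service = cls(**snapshot) if snapshot else cls()
        for offset, message in log.read_from(service.next_offset):
            service.apply(offset, message)
        return service


log = InputLog()
primary = CounterService()

# Messages are written to the log before the service processes them.
for value in (5, 7):
    primary.apply(log.append(value), value)
snap = primary.snapshot()
for value in (11, 13):
    primary.apply(log.append(value), value)

# A second instance recovers from the snapshot plus the tail of the log;
# a third one replays the whole log with no snapshot at all.
replica = CounterService.recover(snap, log)
cold = CounterService.recover(None, log)
```

Because `apply` is deterministic, `replica` and `cold` end up in the same state as `primary`, which is what would allow several instances to run side by side.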

Many logs are available, but some have high latency, some are not really distributed or clusterable, and some are not yet production-ready.

Available distributed log software

  • Kafka

  • NATS Streaming

  • TANK

  • DistributedLog

  • Pulsar

  • RocketMQ

  • Liftbridge

  • Jocko

  • LogDevice

Requirements

  • Message/event persistence (in memory only, or on disk as well)

  • Message ordering within a topic/partition

  • At-least-once delivery: Message acknowledgements between publisher and server (for publish operations) and between subscriber and server (to confirm message delivery)

  • Historical message replay by subject: New subscriptions may specify a start position in the stream of messages stored for the subscribed subject's channel.

  • High availability: Should have multiple clustered nodes, with replication between them

  • Low latency: If a message has to reach two nodes before it is passed on to the service, that replication step must be fast: ideally a few milliseconds, and no more than tens of milliseconds. This is the main reason Kafka seems unsuitable here.
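
As a rough model of the latency requirement, the sketch below compares two ways of collecting acknowledgements from replica nodes. It is plain arithmetic, not a measurement of any real system: the function names and the round-trip times are made up for illustration. If the publisher sends to all nodes in parallel, the wait for `k` acknowledgements is the `k`-th fastest round trip; if the acknowledgements were collected one after another, the wait would be the sum of the `k` fastest round trips.

```python
def parallel_ack_latency_us(round_trips_us, required_acks):
    """Latency when the message is sent to every node at the same time.

    The publisher is done once the required number of acks has arrived,
    i.e. after the k-th fastest round trip.
    """
    if not 1 <= required_acks <= len(round_trips_us):
        raise ValueError("required_acks must be between 1 and the node count")
    return sorted(round_trips_us)[required_acks - 1]


def sequential_ack_latency_us(round_trips_us, required_acks):
    """Latency if the acks were collected one node after another.

    Hypothetical worst-case contrast: the round trips add up.
    """
    if not 1 <= required_acks <= len(round_trips_us):
        raise ValueError("required_acks must be between 1 and the node count")
    return sum(sorted(round_trips_us)[:required_acks])


# Made-up round-trip times to three nodes, in microseconds.
round_trips_us = [1200, 800, 3500]

parallel = parallel_ack_latency_us(round_trips_us, required_acks=2)
sequential = sequential_ack_latency_us(round_trips_us, required_acks=2)
print(parallel, sequential)  # 1200 2000
```

Under this model, two acknowledgements gathered in parallel cost roughly one network round trip, which is the behaviour the latency requirement is asking for.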

Are there any options I have missed?

veiph
  • You aren't required to wait for 2 (or more) acknowledgements in Kafka – OneCricketeer May 05 '19 at 17:14
  • True, but if you don't wait for any acknowledgements then you have no idea if the message will be persisted. Two acknowledgements is probably the least you can do to be pretty sure it will be, as two node failures at the same time are unlikely. In a system designed for low latency, you would expect it to send the message to both nodes at the same time and to get the acks at roughly the same time, roughly the time it takes to do two network hops (to a node and back). Kafka seems to take much longer when configured like this, more than 50ms. – veiph May 05 '19 at 20:11
