
I've found some information in the docs and online threads describing how to define a failover node for a Flume Sink, but what about a Flume Source? Is there a way to define a failover for a Source, or to have a Source operate over an array of nodes rather than a single node, so that if my Flume Agent node dies my Source can still receive events?

josiah
  • If I understand your question correctly, you can run multiple Flume instances with the same agent configuration on different nodes. – gorros Jun 16 '17 at 09:09
  • @gorros Doesn't that duplicate the events? Unless I set something like a DNS Anycast or something, a service like a Webhook would just post messages to all of my Flume Agents, right? – josiah Jun 16 '17 at 18:04
  • I don't really know what your setup is, but you would probably need a load balancer or a message queue. I usually have Kafka as the source or sink. – gorros Jun 16 '17 at 19:50
  • I would suggest following the advice from [http://shop.oreilly.com/product/0636920033196.do](http://shop.oreilly.com/product/0636920033196.do): `By default, Flume uses the groupId flume when reading from Kafka. Adding multiple Flume sources with the same groupId will mean that each Flume agent will get a subset of the messages and can increase throughput. In addition, if one of the sources fails, the remaining sources will rebalance so that they can continue consuming all messages. Flume’s Kafka source is reliable and will not lose messages if a source, channel, sink, or agent fails.` – gorros Jun 19 '17 at 09:58
  • So, at the end of the day, the basic solution is "stand up Kafka in front of Flume". – josiah Jun 21 '17 at 15:49
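
The Kafka-source approach from the comments can be sketched as a Flume agent configuration like the one below. This is a minimal illustration, not a production config: the agent name `a1`, the broker address `kafka1:9092`, the topic `events`, and the HDFS path are all hypothetical placeholders. Running the same file on several nodes makes the agents share a Kafka consumer group, so partitions are split among them and rebalance automatically if one agent dies:

```properties
# Hypothetical agent named "a1"; run this same file on each Flume node.
a1.sources = kafkaSrc
a1.channels = memCh
a1.sinks = hdfsSink

# Kafka source: agents configured with the same kafka.consumer.group.id
# form one consumer group, so messages are divided among them and the
# group rebalances when an agent fails.
a1.sources.kafkaSrc.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafkaSrc.kafka.bootstrap.servers = kafka1:9092
a1.sources.kafkaSrc.kafka.topics = events
a1.sources.kafkaSrc.kafka.consumer.group.id = flume
a1.sources.kafkaSrc.channels = memCh

# In-memory channel between source and sink (example capacity).
a1.channels.memCh.type = memory
a1.channels.memCh.capacity = 10000

# Example HDFS sink; any sink would work here.
a1.sinks.hdfsSink.type = hdfs
a1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
a1.sinks.hdfsSink.channel = memCh
```

For the scale-out to help, the Kafka topic needs at least as many partitions as there are Flume agents; with fewer partitions, the extra agents sit idle and act only as hot standbys.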

0 Answers