Hi Everyone,
I have a requirement to read streaming data from Azure Event Hub and dump it to a blob location. For cost-optimization reasons I cannot use either Stream Analytics or Spark Streaming; I can only go with a Spark batch job, so I need to work out how to read data from Azure Event Hub in batch mode (preferably the previous day's data) and write it to blob storage. My Event Hub retains 4 days of data, so I need to make sure I avoid duplicates every time I read from it.
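For context, here is a minimal sketch of what I have in mind, using the `azure-eventhubs-spark` connector's `eventhubs.startingPosition` / `eventhubs.endingPosition` options to bound the batch read to yesterday's enqueued-time window. The output path and connection string are placeholders, and `read_previous_day` assumes the connector jar is on the cluster:

```python
import json
from datetime import datetime, timedelta, timezone

def previous_day_positions(now=None):
    """Build the connector's starting/ending EventPosition JSON for
    yesterday's enqueued-time window (midnight to midnight, UTC)."""
    now = now or datetime.now(timezone.utc)
    end = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    starting = {"offset": None, "seqNo": -1,
                "enqueuedTime": start.strftime(fmt), "isInclusive": True}
    ending = {"offset": None, "seqNo": -1,
              "enqueuedTime": end.strftime(fmt), "isInclusive": True}
    return json.dumps(starting), json.dumps(ending)

def read_previous_day(spark, connection_string, output_path):
    """Batch-read yesterday's events and append them to blob storage.
    `output_path` (e.g. a wasbs:// or abfss:// URI) is a placeholder."""
    starting, ending = previous_day_positions()
    # The connector expects the connection string to be encrypted:
    sc = spark.sparkContext
    conn = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string)
    eh_conf = {
        "eventhubs.connectionString": conn,
        "eventhubs.startingPosition": starting,
        "eventhubs.endingPosition": ending,
    }
    df = spark.read.format("eventhubs").options(**eh_conf).load()
    df.write.mode("append").parquet(output_path)
```

If the enqueued-time window lines up exactly day over day (inclusive start, exclusive end), the window itself already prevents most duplication across daily runs.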
I'm planning to read the data from Azure Event Hub once a day using Spark. Is there a way I can maintain some sequence (or offset/checkpoint) between reads so that I can avoid duplicates?
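The kind of "sequence" I mean is sketched below: persist the last processed sequence number per partition between runs, and filter out anything at or below it on the next read. The checkpoint file path is a placeholder (in practice it would live in blob storage), and I'm assuming each event carries `partition` and `sequenceNumber`, which the Spark connector exposes as columns:

```python
import json
import os

CHECKPOINT_FILE = "eventhub_checkpoint.json"  # placeholder; would be a blob path

def load_checkpoint(path=CHECKPOINT_FILE):
    """Last processed sequence number per partition, e.g. {"0": 1041, "1": 998}."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def filter_new_events(events, checkpoint):
    """Keep only events whose sequence number is beyond the checkpoint
    for their partition; unseen partitions default to -1 (keep all)."""
    return [e for e in events
            if e["sequenceNumber"] > checkpoint.get(str(e["partition"]), -1)]

def save_checkpoint(events, checkpoint, path=CHECKPOINT_FILE):
    """Advance the checkpoint to the highest sequence number seen per partition."""
    for e in events:
        p = str(e["partition"])
        checkpoint[p] = max(checkpoint.get(p, -1), e["sequenceNumber"])
    with open(path, "w") as f:
        json.dump(checkpoint, f)
    return checkpoint
```

The same idea would translate to a DataFrame filter on the `sequenceNumber` column, or to setting `eventhubs.startingPosition` from the saved per-partition sequence numbers on the next run.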
Any help would be greatly appreciated.