I have a legacy system that streams records into a queue (Azure Event Hubs) as they are changed, and every 24h another process reads all records and dumps them into the stream again. This mechanism lets any consumer recreate the data by reading the last 24h+ of the stream.
I'm using Spark to read this stream and recreate a view of the original data (unfortunately, I can't read the source directly). This view will then be joined against by other Spark jobs, both batch and streaming. Roughly, the ingestion side looks like the sketch below.
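This is a minimal sketch of the read side, assuming the Event Hub is consumed through its Kafka-compatible endpoint and assuming a hypothetical record schema with a `record_id` business key and a `changed_at` change timestamp (the namespace, hub name, and connection string are placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("rebuild-view").getOrCreate()

# Hypothetical shape of each change record: a business key plus a change timestamp.
record_schema = StructType([
    StructField("record_id", StringType()),
    StructField("changed_at", TimestampType()),
    StructField("payload", StringType()),
])

# Read the Event Hub through its Kafka-compatible endpoint.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
       .option("subscribe", "<event-hub-name>")
       .option("kafka.security.protocol", "SASL_SSL")
       .option("kafka.sasl.mechanism", "PLAIN")
       .option("kafka.sasl.jaas.config",
               'org.apache.kafka.common.security.plain.PlainLoginModule required '
               'username="$ConnectionString" password="<connection-string>";')
       .option("startingOffsets", "earliest")
       .load())

# Each Kafka message value is one change record; keep the key and change time
# so the sink can later collapse the stream down to the latest version of each record.
changes = (raw
           .select(from_json(col("value").cast("string"), record_schema).alias("r"))
           .select("r.*"))
```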
What are my options for a suitable storage backend for this view?
Is a Delta table suitable for this kind of load, or should I use a NoSQL backend (e.g. MongoDB) instead?
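For context, this is the kind of Delta-based sink I'm considering (continuing from the sketch above; the table path, checkpoint location, and the `record_id`/`changed_at` columns are assumptions): a `foreachBatch` upsert that keeps only the newest version of each record.

```python
from delta.tables import DeltaTable
from pyspark.sql import Window
from pyspark.sql.functions import col, row_number

CURRENT_VIEW_PATH = "/mnt/lake/current_view"  # hypothetical Delta table location

def upsert_latest(batch_df, batch_id):
    # Within the micro-batch, keep only the newest change per record_id.
    latest = (batch_df
              .withColumn("rn", row_number().over(
                  Window.partitionBy("record_id").orderBy(col("changed_at").desc())))
              .filter(col("rn") == 1)
              .drop("rn"))
    # Merge into the current-state table: update existing records, insert new ones.
    target = DeltaTable.forPath(spark, CURRENT_VIEW_PATH)
    (target.alias("t")
        .merge(latest.alias("s"), "t.record_id = s.record_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

query = (changes.writeStream
         .foreachBatch(upsert_latest)
         .option("checkpointLocation", "/mnt/lake/_checkpoints/current_view")
         .start())
```

Since the daily full dump is replayed into the same stream, this upsert would also absorb the 24h re-dump as no-op updates, but I'm not sure whether that write pattern (constant merges) is a good fit for Delta compared to a NoSQL store.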