
I am looking for suggestions on how to implement a short-lived queue (topic) for an ETL job; once the ETL completes, the queue (topic) and its data are no longer needed.

Here is the scenario: when a particular job runs, it executes a query to extract data from a database (assume Teradata) and loads the results into a topic. A Spark job is then kicked off, processes all the records in that topic, and stops. After that, the topic and the data in it are no longer needed.
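To make the lifecycle I have in mind concrete, here is a minimal in-memory sketch of the pattern, with `extract_rows()` standing in for the Teradata query and `process_record()` standing in for the Spark job's per-record work (both are hypothetical placeholders, not real connectors):

```python
import queue
import threading

def extract_rows():
    # Placeholder for the Teradata query result set.
    return [{"id": i, "value": i * 10} for i in range(5)]

def process_record(record):
    # Placeholder for whatever transformation the Spark job performs.
    return record["value"]

def run_etl():
    # The short-lived "topic": created for this job run only.
    topic = queue.Queue()

    # Extract + load phase: push every row into the ephemeral queue.
    for row in extract_rows():
        topic.put(row)
    topic.put(None)  # sentinel marking end-of-data so the consumer can stop

    results = []

    def consumer():
        while True:
            record = topic.get()
            if record is None:  # end-of-data reached: stop consuming
                break
            results.append(process_record(record))

    worker = threading.Thread(target=consumer)
    worker.start()
    worker.join()

    # When the job finishes, the queue simply goes out of scope;
    # there is nothing to tear down. That throwaway property is what
    # I want from the real messaging system.
    return results

if __name__ == "__main__":
    print(run_etl())
```

The question is which real system (Kafka, Redis Streams, or something else) gives me this create-use-discard lifecycle with the least ceremony.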

For this I see Kafka and Redis Streams as the two options, and Redis Streams looks like the more appropriate tool because of how easy it is to create and destroy streams. With Kafka, it seems I would need additional custom handlers to create and then drop each topic, and I also don't want to overload Kafka with too many topics.

I am open and happy to hear about any alternative or better solution out there.

Varma
