
I'm a bit new to this world of batch vs. stream processing, and I'm going back and forth in making a call. In my case, we have an ELT tool that runs jobs both periodically (with intervals varying from 5 minutes to 1 year) and on demand. This tool generates data (such as customer orders, revenues, etc.) in the form of a JSON file, which needs to be stored in a data warehouse. Does it make sense to use a streaming service, say Kafka/Kinesis, to transport this kind of data, given that those jobs might run only once every 5-10 minutes? Or would that be overkill in this situation?
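For context, the current flow is roughly the following sketch (hypothetical names; the real ELT tool writes a JSON file per job run, which a loader then pushes into the warehouse as one batch):

```python
import json

# Hypothetical output of one ELT job run: a JSON file of records.
job_output = json.dumps([
    {"order_id": 1, "customer": "acme", "revenue": 120.0},
    {"order_id": 2, "customer": "globex", "revenue": 75.5},
])

def load_batch(raw_json):
    """Parse the whole file and hand every record to the warehouse at once.
    If anything fails mid-load, the entire batch has to be retried."""
    records = json.loads(raw_json)
    warehouse = []  # stand-in for the real warehouse insert
    warehouse.extend(records)
    return warehouse

loaded = load_batch(job_output)
```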

OneCricketeer · nerd
  • What would be your goal? Why are you considering using Kafka or Kinesis? Is there any requirement that makes you think this kind of service will help you? – Gerard Garcia Jun 15 '22 at 12:09
  • If you're just dumping into a data lake / warehouse, and nothing will query that data until the next 5-minute (or longer) interval, then getting data available in "near real time" doesn't matter. You would need to change your consumer model before you can consider changing how data is generated. The only other improvement could be: what if your load or transform steps fail? How many records would you lose? With a message queue, you can isolate that down to a single record, in some cases – OneCricketeer Jun 15 '22 at 13:49
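To make the comment's point about granular failure isolation concrete, here is a sketch (plain Python, no actual Kafka client, illustrative names only) of the per-record model a message queue enables: one bad record goes to a dead-letter path instead of forcing a retry of the whole file.

```python
import json

def process_per_record(raw_json, sink):
    """Treat each record as an independent message: a bad record
    goes to a dead-letter list instead of failing the whole batch."""
    dead_letter = []
    for record in json.loads(raw_json):
        try:
            if "revenue" not in record:  # stand-in for a transform/load failure
                raise ValueError("missing revenue")
            sink.append(record)          # stand-in for the warehouse insert
        except ValueError:
            dead_letter.append(record)   # retry or inspect later
    return dead_letter

sink = []
bad_batch = json.dumps([
    {"order_id": 1, "revenue": 10.0},
    {"order_id": 2},                     # malformed record
    {"order_id": 3, "revenue": 5.0},
])
failed = process_per_record(bad_batch, sink)
```

With a file-based batch load, the malformed record would typically fail or pollute the entire job; here only that one record is set aside while the other two land in the warehouse.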

0 Answers