-1

Our application receives a large dataset (e.g., 100 GB) from a third-party tool every night, and we need to process it and feed it into our system by end of day. We are free to use any infrastructure, but processing time should be as minimal as possible.

Can anyone please suggest an optimised approach?

monk_7
  • Not sure why it is a -ve vote. It is not that I haven't done any research before posting the question. I wanted to understand the standard procedure. – monk_7 Dec 03 '21 at 04:31

2 Answers

0

ETL (Extract, Transform, and Load) tools are common and well known. They are well optimized for parallelism and process tracking.
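The chunked extract-transform-load pattern this answer refers to can be sketched in plain Python. This is a minimal illustration, not a production pipeline: the file path, the `amount` column, and the in-memory `sink` are all hypothetical stand-ins; a real nightly job would load into a database or queue instead.

```python
import csv
from typing import Dict, Iterator, List

CHUNK_ROWS = 50_000  # tune to available memory; keeps RAM bounded for 100 GB inputs


def extract(path: str) -> Iterator[List[Dict[str, str]]]:
    """Yield the CSV in fixed-size chunks instead of reading it all at once."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        chunk: List[Dict[str, str]] = []
        for row in reader:
            chunk.append(row)
            if len(chunk) >= CHUNK_ROWS:
                yield chunk
                chunk = []
        if chunk:
            yield chunk


def transform(chunk: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Example transform: normalize a hypothetical 'amount' column to cents."""
    return [{**row, "amount_cents": int(float(row["amount"]) * 100)} for row in chunk]


def load(chunk: List[Dict[str, object]], sink: List[Dict[str, object]]) -> None:
    """Stand-in for a bulk insert into the target system."""
    sink.extend(chunk)


def run(path: str, sink: List[Dict[str, object]]) -> None:
    for chunk in extract(path):
        load(transform(chunk), sink)
```

Because each stage only ever holds `CHUNK_ROWS` rows, memory use stays constant regardless of input size, and independent chunks can later be farmed out to worker processes if one machine is not fast enough.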

Rob Conklin
-1

You could use Spark-based file processing. Spark distributes the work across a cluster, so a 100 GB file can be read and transformed in parallel rather than on a single machine.
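A minimal PySpark sketch of what that could look like, assuming the nightly dump arrives as a headered CSV. The paths, the `payload_bytes` column, and the bucketing logic are all hypothetical, chosen only to illustrate the read-transform-write shape.

```python
def bucket_size(num_bytes) -> str:
    """Pure helper used as a UDF below; illustrative classification logic."""
    return "large" if num_bytes is not None and int(num_bytes) >= 1_000_000 else "small"


def run(input_path: str, output_path: str) -> None:
    # PySpark imports live inside the function so the pure helper above
    # stays usable (and testable) without a Spark installation.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("nightly-ingest").getOrCreate()

    df = spark.read.option("header", True).csv(input_path)
    bucket = F.udf(bucket_size)
    out = df.withColumn("size_bucket", bucket(F.col("payload_bytes")))

    # Partitioned Parquet output lets the downstream system load in parallel.
    out.write.mode("overwrite").partitionBy("size_bucket").parquet(output_path)
    spark.stop()
```

Spark splits the input file into partitions and processes them concurrently across executors, which is why it tends to suit "process 100 GB before morning" workloads better than a single-threaded script.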

Kris
    Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Sep 30 '21 at 12:54