-1

Our application receives a large dataset (e.g., 100 GB) from a third-party tool every night, and we need to process it and feed it into our system by end of day. We are free to use any infrastructure, but processing time should be as minimal as possible.

Can anyone please suggest an optimised approach?

monk_7
  • Not sure why it is a -ve vote. It is not that I haven't done any research before posting the question. I wanted to understand the standard procedure. – monk_7 Dec 03 '21 at 04:31

2 Answers

0

ETL (Extract, Transform, and Load) tools are common and well known. They are well optimized for parallelism and process tracking.
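The chunked extract-transform-load pattern this answer refers to can be sketched in plain Python. This is a minimal illustration, not a production pipeline: the file path, the `amount` column, and the in-memory `sink` are all hypothetical stand-ins; a real nightly job would load into a database or queue instead.

```python
import csv
from typing import Dict, Iterator, List

CHUNK_ROWS = 50_000  # tune to available memory; keeps RAM bounded for 100 GB inputs


def extract(path: str) -> Iterator[List[Dict[str, str]]]:
    """Yield the CSV in fixed-size chunks instead of reading it all at once."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        chunk: List[Dict[str, str]] = []
        for row in reader:
            chunk.append(row)
            if len(chunk) >= CHUNK_ROWS:
                yield chunk
                chunk = []
        if chunk:
            yield chunk


def transform(chunk: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Example transform: normalize a hypothetical 'amount' column to cents."""
    return [{**row, "amount_cents": int(float(row["amount"]) * 100)} for row in chunk]


def load(chunk: List[Dict[str, object]], sink: List[Dict[str, object]]) -> None:
    """Stand-in for a bulk insert into the target system."""
    sink.extend(chunk)


def run(path: str, sink: List[Dict[str, object]]) -> None:
    for chunk in extract(path):
        load(transform(chunk), sink)
```

Because each stage only ever holds `CHUNK_ROWS` rows, memory use stays constant regardless of input size, and independent chunks can later be farmed out to worker processes if one machine is not fast enough.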

Rob Conklin
-1

You could use Spark-based file processing. Spark distributes the work across a cluster, so a 100 GB file can be read and transformed in parallel rather than on a single machine.
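A minimal PySpark sketch of what that could look like, assuming the nightly dump arrives as a headered CSV. The paths, the `payload_bytes` column, and the bucketing logic are all hypothetical, chosen only to illustrate the read-transform-write shape.

```python
def bucket_size(num_bytes) -> str:
    """Pure helper used as a UDF below; illustrative classification logic."""
    return "large" if num_bytes is not None and int(num_bytes) >= 1_000_000 else "small"


def run(input_path: str, output_path: str) -> None:
    # PySpark imports live inside the function so the pure helper above
    # stays usable (and testable) without a Spark installation.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("nightly-ingest").getOrCreate()

    df = spark.read.option("header", True).csv(input_path)
    bucket = F.udf(bucket_size)
    out = df.withColumn("size_bucket", bucket(F.col("payload_bytes")))

    # Partitioned Parquet output lets the downstream system load in parallel.
    out.write.mode("overwrite").partitionBy("size_bucket").parquet(output_path)
    spark.stop()
```

Spark splits the input file into partitions and processes them concurrently across executors, which is why it tends to suit "process 100 GB before morning" workloads better than a single-threaded script.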

Kris
    Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Sep 30 '21 at 12:54