Why Data Fusion? Because I need to run several more steps (running Dataproc clusters), insert the results into databases, and do all of it on a schedule. The data volume could also explode (tens of TB) or shrink (tens of GB).

Gaurav Taneja

1 Answer

Stacking several TB-sized files isn't a good idea. Cloud Storage limits each object to 5 TiB.
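As a rough sanity check, you can add up the sizes of the files you plan to stack and compare the total with the per-object limit. This is only a minimal sketch: the function name and the sample sizes are hypothetical, and it does no I/O against Cloud Storage.

```python
# Per-object size limit in Cloud Storage: 5 TiB.
MAX_OBJECT_BYTES = 5 * 1024**4

def fits_in_one_object(file_sizes_bytes):
    """Return True if the combined size stays within the per-object limit."""
    return sum(file_sizes_bytes) <= MAX_OBJECT_BYTES

# Hypothetical example: six files of 1 TiB each.
sizes = [1024**4] * 6
print(fits_in_one_object(sizes))
```

If the check fails, the stacked result cannot be stored as a single object, whatever tool produces it.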

I don't know why you need to stack the files.

Maybe BigQuery could be a solution: you can load your CSV files easily and then query a subset of the data for further processing. But querying tens of TB is expensive (about $5 per TB scanned with on-demand pricing)!
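To put that in numbers, here is a back-of-the-envelope estimate. It assumes a flat rate of $5 per TB scanned, the figure quoted in this answer, which may not match current pricing. The function name and sample volumes are made up for illustration.

```python
# Assumed on-demand rate in USD per TB scanned (figure from this answer,
# not necessarily the current price).
PRICE_PER_TB_USD = 5.0

def estimate_query_cost(tb_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Return the approximate cost in USD of scanning tb_scanned terabytes."""
    return tb_scanned * price_per_tb

# Hypothetical volumes: a small run, a 10 TB run and a 50 TB run.
for tb in (0.05, 10, 50):
    print(f"{tb} TB scanned: about ${estimate_query_cost(tb):.2f} per full scan")
```

The cost applies per query, so a scheduled job that rescans everything on each run multiplies it. Querying only the subset you need keeps the scanned volume down.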

For more help, add more detail about what you want to achieve.

guillaume blaquiere