Can Data Fusion process CSV files from GCS in batches?

I need to process multiple folders' worth of CSV files (with different structures) into BigQuery on my current project, and I am required to use Data Fusion. I tried simply connecting a GCS node pointed at the folder path (not at any individual file) to a Wrangler node (parse-as-csv :body ',' true followed by drop :body, nothing too complex) and connecting that to a BigQuery Multi Table sink, but that did not work ("BigQuery Multi Table has no outputs. Please check that the sink calls addOutput at some point"). From what I can see, the only way to do this is to build one big pipeline that manually connects every file to its own Wrangler and BigQuery sink, but something like that would be extremely time-consuming and tedious. Are there any batch processors I don't know about yet?
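For reference, the Wrangler recipe described above is just these two directives (:body is the default column holding each raw line read from GCS):

    parse-as-csv :body ',' true
    drop :body

The first directive splits each line on commas and treats the first row as the header; the second drops the original raw column.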
- Try connecting the GCS source to the BigQuery sink (not the BigQuery Multi Table sink). – user3126412 Nov 30 '21 at 18:24
- @user3126412 It ends up processing only the first file; after that the column structure changes and it breaks. – V. Kolom Dec 02 '21 at 09:37
1 Answer
Parse your CSVs in the GCS source itself by selecting the appropriate format, then connect it to the BigQuery sink. Be sure to specify the full output schema, since the BigQuery sink uses it; the schema should match the one you want to see in BigQuery.
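As an illustration of "full schema specified": in Data Fusion the output schema attached to a stage is an Avro-style JSON record (in the Studio UI it is usually entered field by field in the sink's Output Schema panel rather than as raw JSON). The record name and the two columns below are only placeholders, not anything from this pipeline:

    {
      "type": "record",
      "name": "etlSchemaBody",
      "fields": [
        { "name": "id", "type": "long" },
        { "name": "customer_name", "type": [ "string", "null" ] }
      ]
    }
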

– Віталій Тимчишин
- It works well for multiple files with the same structure, but each CSV file I need to process has a different structure. – V. Kolom Dec 10 '21 at 16:10
- Are you writing to existing BigQuery tables? Is it a single table or multiple tables with different structures? It's often more about the destination (sink) than about the source. – Віталій Тимчишин Dec 11 '21 at 17:17
- I need to create tables at runtime if they don't exist yet, and there are multiple tables with different structures. – V. Kolom Dec 13 '21 at 09:04