Adding "missing" columns to a CSV file so it can be correctly loaded into a BigQuery external table

Asked Oct 13 '22 at 21:57

Active Oct 13 '22 at 21:57

Viewed 251 times

I have dozens of CSV files that start in an S3 bucket, and then are transferred to a GCS bucket using the GCP storage transfer service. I can't do anything about the source files in S3.

Some files have 30 columns, while others will have fewer (as few as 4 or 5 columns). It's not that the columns are there, and are null. The columns simply don't exist in certain files. The total number of columns is 30. When I create an external table and set allow_jagged_rows=true, the files are loaded, but the columns are mismatched in the external table.

Is there any way to create this external table so that the columns are not mismatched? I'm assuming not, and so what is the best way to go about adding columns which aren't there? Some sort of cloud function? The difficulty is finding a simple way to determine which columns are missing, and then to simply insert a blank/null value for those.

asked Oct 13 '22 at 21:57

Sultan of Swing

Can you provide sample data for proper replication? – Scott B Oct 15 '22 at 11:30

Adding "missing" columns to a CSV file so it can be correctly loaded into a BigQuery external table

0 Answers0