1

I have dozens of CSV files that start in an S3 bucket, and then are transferred to a GCS bucket using the GCP storage transfer service. I can't do anything about the source files in S3.

Some files have 30 columns, while others will have fewer (as few as 4 or 5 columns). It's not that the columns are there, and are null. The columns simply don't exist in certain files. The total number of columns is 30. When I create an external table and set allow_jagged_rows=true, the files are loaded, but the columns are mismatched in the external table.

Is there any way to create this external table so that the columns are not mismatched? I'm assuming not, and so what is the best way to go about adding columns which aren't there? Some sort of cloud function? The difficulty is finding a simple way to determine which columns are missing, and then to simply insert a blank/null value for those.

Sultan of Swing
  • 430
  • 1
  • 6
  • 20

0 Answers0