
I have multiple .CSV files in Google Cloud Storage, but I am unable to load the data into a table because the number of columns differs between the .CSV files.

Let's say I have 10 .CSV files with columns A, B, C, D, and I have another 10 .CSV files with columns A, B, D.

When I load the .CSV files into the BigQuery table, I want column A mapped to A, B to B, C to C (with NULL in C for the other 10 .CSV files that don't have it), and D to D.

Suppose 10 of the .CSV files have these columns:

NAME,   DOB,         GENDER, MOBILE NO.
Dan,    12/08/1999,  MALE,   1234567889
Oliver, 17/03/1998,  MALE,   5267382736

The other 10 .CSV files have these columns:

NAME,   DOB,         MOBILE NO.
Akash,  12/02/1999,  1234567889
Ram,    17/09/1998,  5267382736

But in the BigQuery table I want this final result:

NAME,   DOB,         GENDER, MOBILE NO.
Dan,    12/08/1999,  MALE,   1234567889
Oliver, 17/03/1998,  MALE,   5267382736
Akash,  12/02/1999,          1234567889
Ram,    17/09/1998,          5267382736

Can anyone help me with this?

kakaji
  • Have you tried anything yet? Like, do you have code that you are currently working on? If so, can you include it as well? Can you provide the sample CSV files and your expected output table? Just so this can be easily understood and visualized by the community. – Ricco D Jan 13 '22 at 03:58
  • Can you further elaborate on your use case? Based on my understanding, you would like to load CSV files properly when the number of fields doesn't match. Is that correct? What do you mean by "making NULL for other 10 .CSV"? Can you elaborate more on this? – Ricco D Jan 13 '22 at 04:02
  • Are these external tables with a wildcard call? I would need to understand more, but one simple thing you could do is make two separate tables and UNION them together (a sketch of that is below the comments). – Mike Karp Jan 13 '22 at 05:28
  • @MikeKarp it will be difficult to separate them; we have thousands of files. Can BigQuery separate them itself? – kakaji Jan 13 '22 at 10:56
  • @RiccoD I have tried creating an external table (defining the schema in JSON) and a native table as well; it didn't work. – kakaji Jan 13 '22 at 10:58
  • @kakaji I don't think so, actually; I'll have to think on this. The best that comes to mind right now is creating a separate bucket of processed files and setting up a processing step to make sure the schemas match by adding a null field to the ones that don't. – Mike Karp Jan 14 '22 at 00:03
  • @kakaji how are you trying to load the data? Do you use the BigQuery API in Python? – Ricco D Jan 14 '22 at 01:43
  • How about you run through all your files, read only the first line of each (the header), segregate the files based on the number of columns, and build a list of files with the same number of columns. You can then create two separate tables and UNION them together (see the sketch after these comments). Processing only one row per file shouldn't take that much time, I think. – Ricco D Jan 14 '22 at 07:22

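And a sketch of the "two tables, then UNION" idea from the comments, run through the same Python client: it combines the two per-schema tables created above into one table, filling GENDER with NULL for rows that came from the 3-column files. The dataset and table names are the same placeholder assumptions as in the previous sketch.

```python
# Sketch: build the final combined table; GENDER is NULL for the files
# that never had that column.
from google.cloud import bigquery

bq_client = bigquery.Client()

sql = """
CREATE OR REPLACE TABLE my_dataset.people AS
SELECT NAME, DOB, GENDER, MOBILE_NO
FROM my_dataset.people_with_gender
UNION ALL
SELECT NAME, DOB, CAST(NULL AS STRING) AS GENDER, MOBILE_NO
FROM my_dataset.people_without_gender
"""
bq_client.query(sql).result()  # wait for the query job to finish
```
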
0 Answers