
I have a series of .txt files that I need to upload to Google Cloud Storage in order to create tables from them in BigQuery. The files are tables whose separator is a ';'.

For some reason BigQuery seems to have trouble identifying the columns (automatically or otherwise) when the files are in this shape, even when ';' is specified as the separator.

I generated a sample table, find-and-replaced the ';' with ',', and saved the file as .csv. Now BigQuery has no issue creating the table.

Question: Should I find-and-replace all the separators in all the tables, or am I missing something?

If yes, how can I do the separator substitution from the command line on macOS 10? (The files are large, and I have trouble substituting the character in a timely way even with UltraEdit.)

Best!
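(For reference, a sketch of the substitution with the built-in `sed` on macOS; the file name is hypothetical, and this naive replacement is only safe if no value itself contains a comma, as noted in the comments below:)

```shell
# Hypothetical sample file standing in for one of the real .txt tables.
printf 'id;name;value\n1;aaa;2\n' > sample.txt

# Replace every ';' with ',' and write a .csv copy.
# This is only safe if no field value contains a comma;
# otherwise values would need to be quoted per the CSV rules.
sed 's/;/,/g' sample.txt > sample.csv

# On macOS (BSD sed), an in-place edit needs an empty backup suffix:
#   sed -i '' 's/;/,/g' sample.txt
cat sample.csv
```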

Brian Tompsett - 汤莱恩
MCS
  • Which character Microsoft Access or Microsoft Excel interprets by default as the separator in CSV files depends on region settings. For Germany and Austria, as configured in the Windows __Region and Language__ settings on tab __Format__, the __list separator__ is `;` because the __decimal symbol__ is `,`, while for most other countries the __list separator__ is `,` because the __decimal symbol__ is `.`, as can be seen after clicking the button __Additional settings...__. – Mofi Nov 19 '18 at 15:41
  • It is of course no problem to run a __Replace in Files__ in UltraEdit to replace all `;` by `,` in the file(s), independent of their size. But if some values already contain one or more commas, it would additionally be necessary to enclose those values in `"`. And if a value contains `"`, that character must be escaped with an additional `"` once the value is enclosed in double quotes. For details about the CSV file format, see the Wikipedia article on [comma-separated values](https://en.wikipedia.org/wiki/Comma_separated_values). – Mofi Nov 19 '18 at 15:45
  • What is the command you are using to load the Cloud Storage files into BigQuery tables? It should respect the delimiter you pass in. Can you please post the command here? – khan Nov 19 '18 at 17:41
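(As the last comment suggests, the `bq` command-line tool does accept a custom column delimiter via `--field_delimiter`. A sketch, where the dataset, table, bucket, and file names are all placeholders:)

```shell
# Load a ';'-delimited text file from Cloud Storage into BigQuery.
# --field_delimiter tells BigQuery how to split columns;
# --autodetect infers the schema from the data.
# mydataset.mytable and gs://mybucket/myfile.txt are hypothetical names.
bq load \
  --source_format=CSV \
  --field_delimiter=';' \
  --autodetect \
  mydataset.mytable \
  gs://mybucket/myfile.txt
```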

1 Answer


To address this issue, you can upload such file(s) as having just one column, and then, once the data is in BigQuery, parse it using SQL into whatever schema you need. I think this is the most generic and easiest way to deal with this. Usually SPLIT(col, ';') would be used, as in the quite simplified example below; in more complex cases, REGEXP_EXTRACT_ALL().

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'aa,a;bbb;ccc' col
)
SELECT 
  SPLIT(col, ';')[SAFE_OFFSET(0)] col1, 
  SPLIT(col, ';')[SAFE_OFFSET(1)] col2, 
  SPLIT(col, ';')[SAFE_OFFSET(2)] col3
FROM `project.dataset.table`   

with result

Row col1    col2    col3     
1   aa,a    bbb     ccc  
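(To get the file into that single-STRING-column shape in the first place, one sketch is to load it with a delimiter that never appears in the data, e.g. a tab if the file contains none; all names below are placeholders:)

```shell
# Load the whole line of each record into one STRING column named "raw".
# The '\t' delimiter is assumed not to occur in the data, so no splitting
# happens; mydataset.raw_table and the gs:// path are hypothetical.
bq load \
  --source_format=CSV \
  --field_delimiter='\t' \
  mydataset.raw_table \
  gs://mybucket/myfile.txt \
  raw:STRING
```

The SPLIT() query above can then be run against `mydataset.raw_table` to materialize the final schema.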
Mikhail Berlyant