
I have a series of .txt files that I need to upload to Google Cloud Storage in order to create tables from them in BigQuery. The files are tables whose separator is a ';'.

For some reason BigQuery seems to have trouble identifying the columns (automatically or otherwise) when the files are in this shape, even when ';' is specified as the separator.

I generated a sample table, find-and-replaced the ';' with ',', and saved the file as .csv. Now BigQuery has no issue creating the table.

Question: Should I find-and-replace all the separators in all the tables, or am I missing something?

If yes, how can I do the separator substitution from the command line on macOS 10? (The files are large, and I have trouble substituting the character in a timely way even with UltraEdit.)

Best!
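(For reference, a sketch of the substitution with the built-in `sed` on macOS; the file name is hypothetical, and this naive replacement is only safe if no value itself contains a comma, as noted in the comments below:)

```shell
# Hypothetical sample file standing in for one of the real .txt tables.
printf 'id;name;value\n1;aaa;2\n' > sample.txt

# Replace every ';' with ',' and write a .csv copy.
# This is only safe if no field value contains a comma;
# otherwise values would need to be quoted per the CSV rules.
sed 's/;/,/g' sample.txt > sample.csv

# On macOS (BSD sed), an in-place edit needs an empty backup suffix:
#   sed -i '' 's/;/,/g' sample.txt
cat sample.csv
```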

Brian Tompsett - 汤莱恩
MCS
  • Which character Microsoft Access or Microsoft Excel interprets by default as the separator in CSV files depends on region settings. For Germany and Austria, as configured in the Windows __Region and Language__ settings on tab __Format__, the __list separator__ is `;` because the __decimal symbol__ is `,`, while for most other countries the __list separator__ is `,` because the __decimal symbol__ is `.`, as can be seen after clicking the button __Additional settings...__. – Mofi Nov 19 '18 at 15:41
  • It is of course no problem to run a __Replace in Files__ in UltraEdit to replace all `;` by `,` in the file(s), independent of their size. But if some values already contain one or more commas, it would additionally be necessary to enclose those values in `"`. And if a value contains `"`, that character must be escaped with an additional `"` once the value is enclosed in double quotes. For details about the CSV file format, see the Wikipedia article on [comma-separated values](https://en.wikipedia.org/wiki/Comma_separated_values). – Mofi Nov 19 '18 at 15:45
  • What is the command you are using to load the Cloud Storage files into BigQuery tables? It should respect the delimiter you pass in. Can you please post the command here? – khan Nov 19 '18 at 17:41
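(As the last comment suggests, the `bq` command-line tool does accept a custom column delimiter via `--field_delimiter`. A sketch, where the dataset, table, bucket, and file names are all placeholders:)

```shell
# Load a ';'-delimited text file from Cloud Storage into BigQuery.
# --field_delimiter tells BigQuery how to split columns;
# --autodetect infers the schema from the data.
# mydataset.mytable and gs://mybucket/myfile.txt are hypothetical names.
bq load \
  --source_format=CSV \
  --field_delimiter=';' \
  --autodetect \
  mydataset.mytable \
  gs://mybucket/myfile.txt
```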

1 Answer


To address this issue, you can upload such file(s) as having just one column, and then, once the data is in BigQuery, parse it using SQL into whatever schema you need. I think this is the most generic and easiest way to deal with this. Usually SPLIT(col, ';') would be used, as in the quite simplified example below; in more complex cases, REGEXP_EXTRACT_ALL().

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'aa,a;bbb;ccc' col
)
SELECT 
  SPLIT(col, ';')[SAFE_OFFSET(0)] col1, 
  SPLIT(col, ';')[SAFE_OFFSET(1)] col2, 
  SPLIT(col, ';')[SAFE_OFFSET(2)] col3
FROM `project.dataset.table`   

with result

Row col1    col2    col3     
1   aa,a    bbb     ccc  
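(To get the file into that single-STRING-column shape in the first place, one sketch is to load it with a delimiter that never appears in the data, e.g. a tab if the file contains none; all names below are placeholders:)

```shell
# Load the whole line of each record into one STRING column named "raw".
# The '\t' delimiter is assumed not to occur in the data, so no splitting
# happens; mydataset.raw_table and the gs:// path are hypothetical.
bq load \
  --source_format=CSV \
  --field_delimiter='\t' \
  mydataset.raw_table \
  gs://mybucket/myfile.txt \
  raw:STRING
```

The SPLIT() query above can then be run against `mydataset.raw_table` to materialize the final schema.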
Mikhail Berlyant