1

I have a table TabA in BigQuery which has a column ColA and the column ColA has the following structure

1038627|21514184

and the table TabA has more than a million records. I used this to split into multiple columns

SELECT ColA,FIRST(SPLIT(ColA, '/')) part1,
       NTH(2, SPLIT(ColA, '/')) part2
FROM TabA

But for some reason, after certain rows the split seems to be not working properly.

We are getting records like this,

     ColA            part1   part2
1038627|21507470    1038627 21507470     
1038627|21534857    1038627 21507470     
1038627|21546455    1038627 21507470     
1038627|21577167    1038627 21507470

It his happening on a random basis. Not sure where is there error.

SELECT COUNT(*) FROM TabA - returns say 1.7M records


SELECT ColA,FIRST(SPLIT(ColA, '|')) part1, NTH(2, SPLIT(ColA, '|')) part2 FROM TabA - returns 1.7M records with the wrong split


SELECT FIRST(SPLIT(ColA, '|')) part1, NTH(2, SPLIT(ColA, '|')) part2 FROM TabA - returns just 1.4L records with correct split

Don't know what exactly is happening...is it the problem with the data or the problem with the split ??

Any help would be greatly appreciated. Thanks in advance!!

Munagala
  • 69
  • 2
  • 12
  • Do you have to use the Legacy Version of SQL? I highly recommend you to use the [Standard Version](https://cloud.google.com/bigquery/docs/reference/standard-sql/) as solving this would be quite simple there. – Willian Fuks Jun 29 '17 at 15:16

1 Answers1

5

is it the problem with the data or the problem with the split ??

To help in troubleshooting - I would recommend running same logic in BigQuery Standard SQL

#standardSQL
SELECT 
  ColA,
  SPLIT(ColA, '|')[SAFE_OFFSET(0)] AS part1,
  SPLIT(ColA, '|')[SAFE_OFFSET(1)] AS part2
FROM TabA
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • The standard SQL syntax is throwing an error. I am new to standard SQL...Can you please help – Munagala Jul 06 '17 at 06:09
  • Thanks a lot @mikhail-berlyant for helping.I did few googling and came to know that the syntax I am using is wrong..later I corrected it. It is working now..but the problem is I am getting only 1.9 million records...where as the total no.of records is 17.99 million records....any idea why is this happening even in standard SQL...the split is working is fine...but the no.of records are way less than the actual. – Munagala Jul 13 '17 at 06:15