3

I have a table in BigQuery with millions of rows, and I want to split adx_catg_id column to multiple new columns. Please note that the adx_catg_id column contains an arbitrary number of words separated by space.

This example of Query below can split the adx_catg_id to multiple columns if the string contains only less than five words. I can extend it to support more number of words, but I need to automate it.

SELECT
  TS, str0, str2, str4, str6, str7
  from
  (select REGEXP_EXTRACT(str5, r'^(.*) .*') as str7
  from
  (select SUBSTR (str5, LENGTH(REGEXP_EXTRACT(str5, r'^(.*) .*')) + 2, LENGTH(str5)) as str6
  from
  (select REGEXP_EXTRACT(str3, r'^(.*) .*') as str5
  from
  (select SUBSTR (str3, LENGTH(REGEXP_EXTRACT(str3, r'^(.*) .*')) + 2, LENGTH(str3)) as str4
  from
  (select REGEXP_EXTRACT(str1, r'^(.*) .*') as str3
  from
  (select SUBSTR (str1, LENGTH(REGEXP_EXTRACT(str1, r'^(.*) .*')) + 2, LENGTH(str1)) as str2
  from
  (select REGEXP_EXTRACT(TS, r'^(.*) .*') as str1
  from
  (select SUBSTR(TS, LENGTH(REGEXP_EXTRACT(TS, r'^(.*) .*')) + 2,LENGTH(TS)) as str0
  from 
  (select adx_catg_id TS from [mydataset.conversions])
  ))))))))

How can I loop the above query to generate all words in new columns depending of string length?

gadhgadhi
  • 123
  • 1
  • 1
  • 7
  • Possible duplicate of [BigQuery: SPLIT() returns only one value](https://stackoverflow.com/questions/27060396/bigquery-split-returns-only-one-value) – marengaz Feb 28 '18 at 16:01

2 Answers2

5

Check this out...

SELECT  
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){0}([^\s]*)\s?') as Word0,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){1}([^\s]*)\s?') as Word1,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){2}([^\s]*)\s?') as Word2,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){3}([^\s]*)\s?') as Word3,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){4}([^\s]*)\s?') as Word4,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){5}([^\s]*)\s?') as Word5,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){6}([^\s]*)\s?') as Word6, 
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){7}([^\s]*)\s?') as Word7,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){8}([^\s]*)\s?') as Word8,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){9}([^\s]*)\s?') as Word9,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){10}([^\s]*)\s?') as Word10,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){11}([^\s]*)\s?') as Word11,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){12}([^\s]*)\s?') as Word12,
FROM
(SELECT 'arbitrary number of words separated by space.' as StringToParse)

Or if you want it in reverse order:

SELECT  
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){1}$') as Word1,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){2}$') as Word2,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){3}$') as Word3,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){4}$') as Word4,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){5}$') as Word5,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){6}$') as Word6, 
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){7}$') as Word7,
FROM
(SELECT 'arbitrary number of words separated by space.' as StringToParse)

Its still a fixed number of fields, but coding is simpler and more readable.

Hope this helps

N.N.
  • 3,094
  • 21
  • 41
  • Many thx N.N. for your query. It is more clear and readable, but it still works with fixed number and it can't print words by starting from the last word even using the total number of words. – gadhgadhi Mar 04 '14 at 18:28
  • I added a sample for parsing from the last word. – N.N. Mar 05 '14 at 07:50
  • Many thanks again N.N. Although it works fine only with fixed number, however your answer still always the best and it is very helpful! – gadhgadhi Mar 05 '14 at 13:09
0

Unfortunately there is no easy SPLIT() today in BigQuery - but it's a good feature request.

I like the answer you developed, I'll experiment more with it. For an alternative approach you can also try https://stackoverflow.com/a/18711812/132438.

The best way to automate this in the meantime, could be automating the query generation outside BigQuery.

Community
  • 1
  • 1
Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325