I'm new to BQ. I have a table which some of the columns are record-repeated. I'm trying to flat the table, so it would be kind of relational, and to insert the flatted data into new BigQuery table. Is it possible? How should I do it?
Asked
Active
Viewed 649 times
1 Answers
0
below is for BigQuery Standard SQL
Assume you have simple table like below
Row id repeated_record
--- -- ---------------
1 1 google
facebook
viant
2 2 dell
hp
You can easily mimic it with below query
#standardSQL
WITH `table-with-repeated-record` AS (
SELECT 1 AS id, ['google', 'facebook', 'viant'] AS repeated_record UNION ALL
SELECT 2, ['dell', 'hp']
)
SELECT *
FROM `table-with-repeated-record`
So now, to get it flatten - use below query
#standardSQL
WITH `table-with-repeated-record` AS (
SELECT 1 AS id, ['google', 'facebook', 'viant'] AS repeated_record UNION ALL
SELECT 2, ['dell', 'hp']
)
SELECT id, flatted_data
FROM `table-with-repeated-record`,
UNNEST(repeated_record) AS flatted_data
result will be as below
Row id flatted_data
--- -- ------------
1 1 google
2 1 facebook
3 1 viant
4 2 dell
5 2 hp
Below is another example
#standardSQL
WITH `table-with-repeated-record` AS (
SELECT 1 AS id, [STRUCT<line INT64, name STRING>(1, 'google'), (2, 'facebook'), (3, 'viant')] AS repeated_record UNION ALL
SELECT 2, [STRUCT<line INT64, name STRING>(5, 'dell'), (6, 'hp')]
)
SELECT *
FROM `table-with-repeated-record`
which mimic below table
Row id repeated_record.line repeated_record.name
--- -- -------------------- --------------------
1 1 1 google
2 facebook
3 viant
2 2 5 dell
6 hp
and way to flatten it is:
#standardSQL
WITH `table-with-repeated-record` AS (
SELECT 1 AS id, [STRUCT<line INT64, name STRING>(1, 'google'), (2, 'facebook'), (3, 'viant')] AS repeated_record UNION ALL
SELECT 2, [STRUCT<line INT64, name STRING>(5, 'dell'), (6, 'hp')]
)
SELECT id, flatted_data.line, flatted_data.name
FROM `table-with-repeated-record`,
UNNEST(repeated_record) AS flatted_data
which ends up with
Row id line name
--- -- ---- ----
1 1 1 google
2 1 2 facebook
3 1 3 viant
4 2 5 dell
5 2 6 hp
do you have any idea how to do it without specify the data like ['google', 'facebook', 'viant'] etc? the size of the table is not constant, and it is changing from time to time, as well the data the is stored in the table, the only thing I know for sure is the columns
you should just use below (without dummy data that was used as an example and for you to be able to play with query)
#standardSQL
SELECT id, flatted_data.line, flatted_data.name
FROM `yourProject.yourDataset.yourTable`,
UNNEST(repeated_record) AS flatted_data

Mikhail Berlyant
- 165,386
- 8
- 154
- 230
-
thanks! do you have any idea how to do it without specify the data like ['google', 'facebook', 'viant'] etc? the size of the table is not constant, and it is changing from time to time, as well the data the is stored in the table, the only thing I know for sure is the columns. – Shir Da. Aug 10 '17 at 12:50
-
just use your table instead of dummy data used in answer to just give you ability to play with code and get started! – Mikhail Berlyant Aug 10 '17 at 12:52
-
so, instead of table-with-repeated-record put your real project.dataset.table – Mikhail Berlyant Aug 10 '17 at 12:52
-
By the way, I downloaded your BQ mate. It helps me a lot! – Shir Da. Aug 16 '17 at 08:42
-
How can I query my table and add the "id" and "line" without specify the data in columns? I think misunderstood you before. – Shir Da. Aug 17 '17 at 08:20
-
check update in my answer. I understand you are novice - but hope now you will get this – Mikhail Berlyant Aug 17 '17 at 13:31
-
now it is working. I'll clarify myself: I have tables with multiple repeated records. I want to query the table and get flatted data. In order to flat the data of the table, you should use the unnest() as Mikhail explained, and in the "SELECT" clause, do not specify the "RECORD" objects. – Shir Da. Aug 20 '17 at 13:31