1

I have a relational dataset in Bigquery containing two tables.

The first table holds customer data

+-------------+--------+
| Customer ID | Name   |
+-------------+--------+
| 1           | Bob    |
+-------------+--------+
| 2           | Jenny  |
+-------------+--------+
| 3           | Janice |
+-------------+--------+

The second table holds various name/value pairs associated to the customer in first table:

+-------------+----------+-------+
| Customer ID | Category | Value |
+-------------+----------+-------+
| 1           | A        | A     |
+-------------+----------+-------+
| 1           | A        | B     |
+-------------+----------+-------+
| 1           | B        | A     |
+-------------+----------+-------+
| 2           | B        | B     |
+-------------+----------+-------+

I would like to generate a report that enumerates each customer, and sets a TRUE under each name:value where it is found in table 2 e.g:

+-------------+------+------+-----+------+------+
| Customer ID | A:A  | A:B  | A:C | B:A  | B:B  |
+-------------+------+------+-----+------+------+
| 1           | TRUE | TRUE |     | TRUE |      |
+-------------+------+------+-----+------+------+
| 2           |      |      |     |      | TRUE |
+-------------+------+------+-----+------+------+
| 3           |      |      |     |      |      |
+-------------+------+------+-----+------+------+

I've tried to specify each of the category:value combinations as columns in my select statement

select
  customer id,
  a:a, 
  a:b, 
  a:c, 
  b:a,
  b:b 
from 
  table_1 t1
join 
  table_2 t2 
on
  t1.customer_id = t2.customer_id  

But that gives me nothing because I don't know how to get the query to set a cell to TRUE, once the value is found.

Apologies if this is obvious, i'm painfully new to SQL.

  • Thanks @MikhailBerlyant last time I tried to vote up the answers it told me i'm too new to affect the score. Seems to be working now though. – Zak Smith-Sanga Apr 27 '19 at 09:06

2 Answers2

1

You need some sort of aggregation, such as:

select t1.customer_id,
       bool_or(t2.category = 'a' and t2.value = 'a') as a_a,
       bool_or(t2.category = 'a' and t2.value = 'b') as a_b,
       bool_or(t2.category = 'a' and t2.value = 'c') as a_c,
       bool_or(t2.category = 'b' and t2.value = 'a') as b_a,
       bool_or(t2.category = 'b' and t2.value = 'b') as b_b
from table_1 t1 join
     table_2 t2 
     on t1.customer_id = t2.customer_id  
group by t1.customer_id;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

Below is for BigQuery Standard SQL

#standardSQL
SELECT customer_id,
  LOGICAL_OR((category, value) = ('A', 'A')) AS a_a,
  LOGICAL_OR((category, value) = ('A', 'B')) AS a_b,
  LOGICAL_OR((category, value) = ('A', 'C')) AS a_c,
  LOGICAL_OR((category, value) = ('B', 'A')) AS b_a,
  LOGICAL_OR((category, value) = ('B', 'B')) AS b_b
FROM `project.dataset.table1`  
JOIN `project.dataset.table2`
USING (customer_id)
GROUP BY customer_id   

You can test, play with above using sample data from your question as in example below

#standardSQL
WITH `project.dataset.table1` AS (
  SELECT 1 Customer_ID, 'Bob' Name UNION ALL
  SELECT 2, 'Jenny' UNION ALL
  SELECT 3, 'Janice' 
), `project.dataset.table2` AS (
  SELECT 1 Customer_ID, 'A' Category, 'A' Value UNION ALL
  SELECT 1, 'A', 'B' UNION ALL
  SELECT 1, 'B', 'A' UNION ALL
  SELECT 2, 'B', 'B' 
)
SELECT customer_id,
  LOGICAL_OR((category, value) = ('A', 'A')) AS a_a,
  LOGICAL_OR((category, value) = ('A', 'B')) AS a_b,
  LOGICAL_OR((category, value) = ('A', 'C')) AS a_c,
  LOGICAL_OR((category, value) = ('B', 'A')) AS b_a,
  LOGICAL_OR((category, value) = ('B', 'B')) AS b_b
FROM `project.dataset.table1`  
JOIN `project.dataset.table2`
USING (customer_id)
GROUP BY customer_id   

with result

Row customer_id a_a     a_b     a_c     b_a     b_b  
1   1           true    true    false   true    false    
2   2           false   false   false   false   true      

In case if you need/want output exactly as in your question - you can use below adjusted version

#standardSQL
SELECT customer_id,
  IF(LOGICAL_OR((category, value) = ('A', 'A')), 'TRUE', '') AS a_a,
  IF(LOGICAL_OR((category, value) = ('A', 'B')), 'TRUE', '') AS a_b,
  IF(LOGICAL_OR((category, value) = ('A', 'C')), 'TRUE', '') AS a_c,
  IF(LOGICAL_OR((category, value) = ('B', 'A')), 'TRUE', '') AS b_a,
  IF(LOGICAL_OR((category, value) = ('B', 'B')), 'TRUE', '') AS b_b
FROM `project.dataset.table1`  
JOIN `project.dataset.table2`
USING (customer_id)
GROUP BY customer_id

with result

Row customer_id a_a     a_b     a_c     b_a     b_b  
1   1           TRUE    TRUE            TRUE         
2   2                                           TRUE     

Note: in above examples - you don't really need joins as you are not using fields from table1 rather than as a filter (to present only users from table1)

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230