1

First post, hope I don't do anything too crazy I want to go from JSON/object to long in terms of formatting.

I have a table set up as follows (note: there will be a large but finite number of 50+ activity columns, 2 is a minimal working example). I'm not concerned about the formatting of the date column - different problem.

customer_id(varcahr), activity_count(object, int), activity_duration(object, numeric) sample starting point

In this case I'd like to explode this into this: customer_id(varcahr), time_period, activity_count(int), activity_duration(numeric) sample end point - long

minimum data set


WITH smpl AS (
SELECT
  '12a' AS id,
  OBJECT_CONSTRUCT(
    'd1910', 0,
    'd1911', 26,
    'd1912', 6,
    'd2001', 73) as activity_count,
   OBJECT_CONSTRUCT(
    'd1910', 0,
    'd1911', 260.1,
    'd1912', 30,
    'd2001', 712.3) AS activity_duration
UNION ALL
SELECT
  '13b' AS id,
  OBJECT_CONSTRUCT(
    'd1910', 1,
    'd1911', 2,
    'd1912', 3,
    'd2001', 4) as activity_count,
   OBJECT_CONSTRUCT(
    'd1910', 1,
    'd1911', 2.2,
    'd1912', 3.3,
    'd2001', 4.3) AS activity_duration
)
select * from smpl

Extra credit for also taking this from JSON/object to wide (in Google Big Query it's SELECT id, activity_count.* FROM tbl

Thanks in advance.

I've tried tons of random FLATTEN() based joins. In this instance I probably just need one working example. This needs to scale to a moderate but finite number of objects (e.g. 50)

I'll also see if I can combine with THIS - I'll see if I can combine it - Lateral flatten two columns without repetition in snowflake

2 Answers2

0

Using FLATTEN:

WITH (...)
SELECT s1.ID, s1.KEY, s1.value AS activity_count, s2.value AS activity_duration
FROM (select ID, Key, VALUE from smpl,table(flatten(input=>activity_count))) AS s1
JOIN (select ID, Key, VALUE from smpl,table(flatten(input=>activity_duration))) AS s2
  ON S1.ID = S2.ID AND S1.KEY = S2.KEY;

Output:

enter image description here

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • 1
    Awesome. It also extends to 3 fields. SELECT s1.ID, s1.KEY, s1.value AS activity_count, s2.value AS activity_duration, s3.value AS activity_duration2 FROM (select ID, Key, VALUE from smpl, table(flatten(input => activity_count))) AS s1 JOIN (select ID, Key, VALUE from smpl, table(flatten(input => activity_duration))) AS s2 ON S1.ID = S2.ID AND S1.KEY = S2.KEY JOIN (select ID, Key, VALUE from smpl, table(flatten(input => activity_duration2))) AS s3 ON S1.ID = S3.ID AND S1.KEY = S3.KEY – internet_geek Jan 12 '23 at 20:58
0

@Lukasz Szozda gets close but the answer doesn't scale as well with multiple variables (it's essentially a bunch of cartesian products and I'd need to do a lot of ON conditions). I have a known constraint (each field is in a strict format) so it's easy to recycle the key.

After WAY WAY WAY too much messing with this (off and on searches for weeks) it finally snapped and it's pretty easy.

SELECT
  id, key, activity_count[key], activity_duration[key], activity_duration2[key]
FROM smpl, LATERAL flatten(input => activity_count);

You can also use things OTHER than key such as index

It's inspired by THIS link but I just didn't quite follow it. https://stackoverflow.com/a/36804637/20994650