3

Via SQL, I'm trying to get from this:

id group_id session_id field_label field_value sent_at
1 frosted flakes blue bowl first_name Bob 2022-11-05 18:18:19.093
2 frosted flakes blue bowl first_name Bobby 2022-11-05 18:17:31.274
3 frosted flakes blue bowl last_name Brown 2022-11-05 18:17:16.241
4 frosted flakes blue bowl last_name Browning 2022-11-05 18:15:34.492
5 frosted flakes blue bowl last_name Brownson 2022-11-05 18:14:58.465
6 cheerios green cup first_name Christine 2022-11-05 18:18:58.222
7 cheerios green cup last_name Christmas 2022-11-05 18:20:41.212
8 cheerios green cup last_name Christopherson 2022-11-05 18:24:58.222

where

  • id is unique
  • group_id is not unique
  • session_id is not unique

to this:

group_id session_id amalgamated_field
frosted flakes blue bowl Bob Brown
cheerios green cup Christine Christopherson

Where I know the field_labels that I want to amalgamate, and I want to get the latest value for each amalgamated field label based on sent_at grouped by group_id.

So for group frosted flakes, I want to get the most recent field_value associated with field_label first_name (Bob) and the most recent field_value associated with field_label last_name (Brown).

And repeat.

I tried a cross join and I also tried an inner join similar to this thread. But I keep getting all combinations :/

Jaye
  • 184
  • 8

3 Answers3

2

You may use ROW_NUMBER function with conditional aggregation as the following:

SELECT group_id, session_id,
       CONCAT_WS(' ', 
                  MAX(CASE WHEN field_label='first_name' AND rn=1 THEN field_value END),
                  MAX(CASE WHEN field_label='last_name' AND rn=1 THEN field_value END)
                ) AS amalgamated_field
FROM
(
  SELECT *, 
    ROW_NUMBER() OVER (PARTITION BY group_id, session_id, field_label ORDER BY sent_at DESC) rn
  FROM table_name
) T
GROUP BY group_id, session_id
ORDER BY group_id

See a demo.

ahmed
  • 9,071
  • 3
  • 9
  • 22
2

In Postgres I would recommend distinct on and string aggregation:

select group_id, session_id, 
    string_agg(field_value, ' ' order by field_label) full_name
from (
    select distinct on (group_id, session_id, field_label) t.*
    from mytable t
    order by group_id, session_id, field_label, sent_at desc
) t
where field_label in ('first_name', 'last_name')
group by group_id, session_id

The distinct on subquery returns the latest row for each group/session/label tuple. In the outer query, we filter on the two labels we are interested in, and aggregate the field values for each group/session tuple. The order by clause of string_agg ensures that the first and last name are concatenated in sequence.

GMB
  • 216,147
  • 25
  • 84
  • 135
  • 1
    This is a smart and more efficient query, really I liked this solution. – ahmed Nov 05 '22 at 22:27
  • 1
    I think this is ULTIMATELY the most elegant way to accomplish my goal, but I'm currently stuck on a v of postgresql that doesn't support `distinct on` or even `string_agg.` (Curse you, Redshift.) We're upgrading this minute, so hope to pull this into the new environment once it's been QA'ed. In any case, I went with a variant of @ahmed's answer because it was more backwards compatible, but I think this is the "right one" – Jaye Nov 06 '22 at 17:17
1

You can first find the maximum timestamp for each field_label, and then perform a self-join of the original table back onto the maximum timestamp, aggregating the field_values for each group_id:

select t1.group_id, t2.session_id, string_agg(t2.field_value,' ') 
from (select t.group_id, t.field_label, max(t.sent_at) d 
   from tbl t group by t.group_id, t.field_label) t1 
join tbl t2 on t1.d = t2.sent_at group by t1.group_id, t2.session_id

See fiddle.

Ajax1234
  • 69,937
  • 8
  • 61
  • 102
  • Thank you! This is what I tried to do originally and it didn't QUITE work, but I must've missed something that you captured, because this totally worked, too. – Jaye Nov 06 '22 at 17:18