0
TimeStamp USER ID string_col
1 100001 Here
2 100001 there
3 100001 Apple
4 200002 this is
5 200002 that is
6 200002 Apple
7 200002 some
8 200002 summer
9 200002 winter
10 200002 Apple

That is my raw table & I want to use the word "Apple" as a trigger word to concatenate ONLY N (For this example 2) rows above that for every userID.

Below table is the output I'm looking for:

TimeStamp USER ID string_col Result
1 100001 Here null
2 100001 there null
3 100001 Apple Here There
4 200002 this is null
5 200002 that is null
6 200002 Apple this is that is
7 200002 some null
8 200002 summer null
9 200002 winter null
10 200002 Apple summer winter
  • Does this answer your question? [How to concatenate N rows of a column based on a specific column value in Google BigQuery?](https://stackoverflow.com/questions/76823722/how-to-concatenate-n-rows-of-a-column-based-on-a-specific-column-value-in-google) – Brent Worden Aug 02 '23 at 23:57

2 Answers2

1

Use below

SELECT * EXCEPT(grp),
  CASE 
    WHEN string_col = 'Apple'
    THEN STRING_AGG(string_col, ' ') OVER win
  END AS Result
FROM (
  SELECT *, 
    COUNTIF(string_col = 'Apple') OVER win AS grp
  FROM your_table
  WINDOW win AS (PARTITION BY user_id ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WINDOW win AS (PARTITION BY user_id, grp ORDER BY ts ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING)     

with output

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • I have edited the question for a scenario where this solution faced a limitation (look at row 11) – user22329205 Aug 03 '23 at 00:52
  • I don't think this is a limitation of any sort. This just does not fall into your specified logic! It is recommended not to change your question after it is already answered! Post new question with your new logic! and someone will hopefully help you – Mikhail Berlyant Aug 03 '23 at 00:57
  • Thank you, reverted the question to original one where I had not called out a few specifics. https://stackoverflow.com/questions/76824493/how-to-concatenate-n-rows-of-a-column-based-on-a-specific-column-value-based-on link to the fresh question. – user22329205 Aug 03 '23 at 02:03
1

For the passion of problem solving and coding, I would like to share this simple solution to see if it helps to answer your question:

SELECT
  t1."TimeStamp",
  t1."USER ID",
  t1."string_col",
  (
    SELECT
      STRING_AGG(t2."string_col", ' ')
    FROM
      raw_table AS t2
    WHERE
      t2."USER ID" = t1."USER ID"
      AND t2."TimeStamp" < t1."TimeStamp"
      AND t1."string_col" = 'Apple'
      AND t2."string_col" != 'Apple'
      AND t1."TimeStamp" - t2."TimeStamp" <= 2
  ) AS Result
FROM
  raw_table AS t1

The output looks like this:
enter image description here

blackraven
  • 5,284
  • 7
  • 19
  • 45