2
USER ID string_col
100001 Here
100001 there
100001 Apple
200002 this is
200002 that is
200002 Apple
200002 Cell 4

That is my raw table & I want to use the word "Apple" as a trigger word to concatenate everything above that for every userID.

Below table is the output I'm looking for:

USER ID string_col Result
100001 Here null
100001 there null
100001 Apple Here There
200002 this is null
200002 that is null
200002 Apple this is that is
200002 Cell 4 null

2 Answers2

1

In order to maintain proper ordering within concatenation - you should have extra column that defines that order - this is usually date or datetime or timestamp or just numeric, etc.

So, below is for BigQuery Standard SQL assuming you do have such a column named for example as ts

SELECT * EXCEPT(grp),
  CASE 
    WHEN string_col = 'Apple'
    THEN STRING_AGG(string_col, ' ') OVER win
  END AS Result
FROM (
  SELECT *, 
    COUNTIF(string_col = 'Apple') OVER win AS grp
  FROM your_table
  WINDOW win AS (PARTITION BY user_id ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WINDOW win AS (PARTITION BY user_id, grp ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)

if applied to sample data as in your question - output is

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • This is exactly what I needed! Thank you! I upvoted your answer but since I'm new it won't reflect on your answer apparently. – user22329205 Aug 02 '23 at 22:46
  • If I were to have a filter to concat the string_col, example: (1< ts < ts of apple). How would I add that to the above code? Expected output for 10001-Apple would be just "There" as compared to the previous answer "Here There" – user22329205 Aug 02 '23 at 22:51
  • see answer in https://stackoverflow.com/a/76824239/5221944 – Mikhail Berlyant Aug 03 '23 at 00:32
1

The solution can be simple. However there is still a need for an "Index" column to help to determine the requirement to "concatenate everything above that for every userID". Try the code:

CREATE TABLE IF NOT EXISTS raw_table
(
    "Idx" integer,
    "USER ID" integer,
    "string_col" varchar(255)
)

INSERT INTO raw_table ("Idx", "USER ID", "string_col")
VALUES
  (1, 100001, 'Here'),
  (2, 100001, 'there'),
  (3, 100001, 'Apple'),
  (4, 200002, 'this is'),
  (5, 200002, 'that is'),
  (6, 200002, 'Apple'),
  (7, 200002, 'Cell 4');
  
SELECT
  t1."Idx",
  t1."USER ID",
  t1."string_col",
  (
    SELECT
      STRING_AGG(t2."string_col", ' ')
    FROM
      raw_table AS t2
    WHERE
      t2."USER ID" = t1."USER ID"
      AND t2."Idx" < t1."Idx"
      AND t1."string_col" = 'Apple'
      AND t2."string_col" != 'Apple'
  ) AS Result
FROM
  raw_table AS t1

The output is:
enter image description here

blackraven
  • 5,284
  • 7
  • 19
  • 45