0

I have 2 tables I want to join to explore the most popular combinations of location, by distinct id, ordered by count. I get location from l, date from d. The results from this join would be:

  id loc_id location date
    1  111    NYC      20200101
    1  222    LA       20200102
    2  111    NYC      20200103
    2  333    LON      20200103
    3  444    NYC      20200105
    4  444    LA       20200106
    4  555    PAR      20200107
    5  111    NYC      20200110
    5  222    LA       20200111

I would like to use STRING_AGG if possible, but get an error with the WITHIN statement -

'expecting ')' but got WITHIN

..( I'm on BigQuery for this). Here is what I've attempted so far.

 SELECT t.combination, count(*) count
    FROM (
      SELECT
        STRING_AGG(location, ',') WITHIN GROUP (ORDER BY d.date) combination
      FROM location as l
    JOIN date d
        USING (loc_id)
    GROUP BY id
    ) t
    WHERE date BETWEEN 20190101 AND 20200228 GROUP BY t.combination
    ORDER BY  count DESC;

I want to end up with something like:

combination count
NYC, LA     3
NYC, LON    1
LA,  PAR    1
NYC         1

If there's another method I'd be happy to change from string_agg.

C_psy
  • 647
  • 8
  • 22
  • (1) If you have two dates, why are you only showing data from one of them. (2) A join on a table called `date` using `loc_id` does not look correct. – Gordon Linoff Mar 05 '21 at 13:33

1 Answers1

1

The correct BQ syntax would be:

SELECT t.combination, count(*) count
FROM (SELECT STRING_AGG(location, ',' ORDER BY d.date) as combination
      FROM location l JOIN
           date d
           USING (loc_id)
     GROUP BY id
    ) t
WHERE date BETWEEN 20190101 AND 20200228
GROUP BY t.combination
ORDER BY count DESC;

Note that your JOIN condition still looks wrong.

And if you are using dates, then I would expect DATE constants.

And your date filtering code won't work in the outer query, because you haven't selected the dates in the inner query. You probably want the filtering in the inner query.

This answer does not address these issues.

BigQuery has quite good documentation. There is no WITHIN GROUP for STRING_AGG().

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786