1

I have a query something like this:

SELECT product_id,
         site,
         category_id,
         session_time,
         sum(cast(coalesce("#clicks",
         0) AS bigint)) AS clicks
FROM df
WHERE site IN ('com', 'co')
        AND session_time = DATE('2020-02-27')
GROUP BY  product_id, site, session_time, category_id
ORDER BY clicks desc
LIMIT 10

But now, I want to see the top 10 product_id for each site and category_id based on the clicks. When I write the LIMIT function, it only shows the top 10 products but it does not group it by category_id and shop_id.

How can I do this?

GMB
  • 216,147
  • 25
  • 84
  • 135
datazang
  • 989
  • 1
  • 7
  • 20

1 Answers1

1

Use window functions. You can RANK() records by descending clicks within site/category partitions in a subquery, and then filter in the outer query:

SELECT *
FROM (
    SELECT 
        product_id,
        site,
        category_id,
        session_time,
        SUM("#clicks") clicks,
        RANK() OVER(PARTITION BY site, category_id ORDER BY sum("#clicks") DESC) rn
    FROM df
    WHERE 
        site IN ('com', 'co')
        AND session_time = DATE('2020-02-27')
    GROUP BY  product_id, site, session_time, category_id
) t
WHERE rn <= 10
ORDER BY site, category, clicks desc

I am unclear on why you need the coalesce()/cast() logic in the sum() (just like other aggregate functions, sum() ignore null values, and it seems like #clicks is a number already), so I removed it - you can add it back if you do need it, for some reason that I could not think of.

GMB
  • 216,147
  • 25
  • 84
  • 135