Count distinct per month/year but display all dates in the query result

Question

CREATE TABLE sales (
    id int auto_increment primary key,
    orderID VARCHAR(255),
    sent_date DATE
);

INSERT INTO sales
(orderID, sent_date
)
VALUES 
("Order_01", "2019-03-15"),
("Order_01", "2019-03-16"),
("Order_02", "2020-06-16"),
("Order_03", "2020-07-27"),
("Order_03", "2020-08-05"),
("Order_03", "2020-08-10");

Expected Result:

sent_date      COUNT(distinct orderID)
2019-03-15              1
2019-03-16              0
2020-06-16              1
2020-07-27              1
2020-08-05              1
2020-08-10              0

In the above table I have the same orders with multiple sent_dates.

Now, I want to count the unique orderIDs per month/year on a daily basis.
That is, I am looking for a query that checks, for each orderID, whether it already appears at an earlier sent_date in the same month, and if so returns 0 for the current sent_date.

I know the simplest way to get the unique count per month/year would be this query:

SELECT
YEAR(sent_date),
MONTH(sent_date),
COUNT(distinct orderID)
FROM sales
GROUP BY 1,2;

However, I need each sent_date displayed separately in a list, as you can see in the expected result.
What query gives me the unique count per month while still returning the data on a daily basis?

score 1 · Answer 1 · answered Jul 30 '20 at 12:53

If I understand correctly, you want to flag the first occurrence of each order in a month. You can use window functions:

select s.*,
       ( row_number() over (partition by extract(year_month from sent_date), orderid order by sent_date) = 1 ) as flag
from sales s;

Window functions are supported in MySQL 8+.

GMB · Accepted Answer · 2020-07-30T13:05:46.717

I think that you want a histogram of the first monthly occurrence of each orderid per date.

Here is an approach using window functions, available in MySQL 8.0:

select sent_date, sum(rn = 1) cnt_distinct_orderid
from (
    select s.*, row_number() over(partition by year(sent_date), month(sent_date), orderid order by sent_date) rn
    from sales s
) t
group by sent_date
order by sent_date

In earlier versions, one option uses a self-join:

select d.sent_date, count(s.orderid) cnt_distinct_orderid
from (select distinct sent_date from sales) d
left join (
    select orderid, min(sent_date) min_sent_date
    from sales 
    group by orderid, date_format(sent_date, '%Y-%m-01')
) s on d.sent_date = s.min_sent_date
group by d.sent_date
order by d.sent_date

Demo on DB Fiddle:

sent_date  | cnt_distinct_orderid
:--------- | -------------------:
2019-03-15 |                    1
2019-03-16 |                    0
2020-06-16 |                    1
2020-07-27 |                    1
2020-08-05 |                    1
2020-08-10 |                    0
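
For anyone without a MySQL instance at hand, both queries can be sanity-checked locally. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 module with SQLite standing in for MySQL (SQLite 3.25 or later is needed for window functions), and strftime('%Y-%m', ...) replaces year()/month() and date_format(), which SQLite does not have. The variable names are illustrative.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL/MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, orderID TEXT, sent_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales (orderID, sent_date) VALUES (?, ?)",
    [
        ("Order_01", "2019-03-15"),
        ("Order_01", "2019-03-16"),
        ("Order_02", "2020-06-16"),
        ("Order_03", "2020-07-27"),
        ("Order_03", "2020-08-05"),
        ("Order_03", "2020-08-10"),
    ],
)

# Window-function variant: rn = 1 marks the first row of each
# (month, orderID) group; summing that flag per date gives the count.
window_sql = """
SELECT sent_date, SUM(rn = 1) AS cnt_distinct_orderid
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (
               PARTITION BY strftime('%Y-%m', sent_date), orderID
               ORDER BY sent_date
           ) AS rn
    FROM sales s
) t
GROUP BY sent_date
ORDER BY sent_date
"""

# Join variant: find each order's first date per month, then count
# how many of those first dates fall on each distinct sent_date.
join_sql = """
SELECT d.sent_date, COUNT(s.orderID) AS cnt_distinct_orderid
FROM (SELECT DISTINCT sent_date FROM sales) d
LEFT JOIN (
    SELECT orderID, MIN(sent_date) AS min_sent_date
    FROM sales
    GROUP BY orderID, strftime('%Y-%m', sent_date)
) s ON d.sent_date = s.min_sent_date
GROUP BY d.sent_date
ORDER BY d.sent_date
"""

window_rows = conn.execute(window_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()

for sent_date, cnt in window_rows:
    print(sent_date, cnt)
```

On the sample data both variants should return the same six rows as the table above. Note that Order_03 is counted once in July and once again in August, because the partition restarts every month.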

https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=623d964406b8175f9721c4e33465fba0 — Michi, Jul 30 '20 at 13:09
Just for my documentation. Using the SQL for daily import: https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=0f559794e87bc43cdc26569d3fba58c5 — Michi, Aug 03 '20 at 12:27
