1

Here is my question

Question B: I have a log table below

Date                 Visitor
2020-01-01            A
2020-01-01            B
2020-01-01            C
2020-01-02            C
2020-01-02            D
2020-01-03            D
2020-01-03            E
2020-01-04            A
2020-01-04            E
2020-01-04            F
2020-01-04            D

Q_B1: find # of visitor_change (gain or loss) compared to the previous day on each day

Example: On 1/2, we had C & D, compared to A & B & C on 1/1, then # of visitor_change should be -1 (2 - 3 = -1). Similarly, on 1/3, we had D & E (2), the same as on 1/2 (C & D), then the # of visitor_change should be 0 (2 - 2 = 0)

Output:

Date         #
2020-01-01   3
2020-01-02   -1 (2-3)
2020-01-03   0 (2-2)
2020-01-04   2 (4-2)

Q_B2: find # of total_new_visitors (who never visited before) on each day

Example: On 1/1, A & B & C were all net new, so the number should be 3. On 1/2, we had net new visitor D, so the number should be 1. Similarly, on 1/3 we had net new visitor E, so the number should be 1

Output:

Date         #
2020-01-01   3
2020-01-02   1 (D)
2020-01-03   1 (E)
2020-01-04   1 (F)

My solution:

select a.date, count(distinct a.visitor)
from log a
left join log b on a.visitor = b.visitor and b.date < a.date
where b.visitor is null
group by a.date

Thanks!

user12562215
  • 115
  • 2
  • 11
  • What is `visitor_gain`? What is `total_new_visitors`? Proper definitions and desired results would help. – Gordon Linoff Apr 27 '20 at 13:12
  • First is tricky, because some days may be missing in the original table, but since you're required to output statistics for every day, you have to add these missing days. One solution would be to select [all days between](https://stackoverflow.com/questions/23290454/get-all-dates-between-two-dates-in-sql-server) min and max date in the table, fill these days with count zero and `UNION` them with `SELECT Date, COUNT(DISTINCT Visitor) FROM Tab GROUP BY Date`. Thus you'll have count of visitors for **each** day. From there you need to use `LAG` function. – Alexey S. Larionov Apr 27 '20 at 13:39
  • `LAG` is needed to select 1 row behind the current and so you'll obtain on a single row both counts for a given day and a day before it. Subtract those values to get the difference in visitors. Can't write the full code, I'm not familiar with Amazon SQL dialect, just googled that it has `LAG` function just as in Oracle – Alexey S. Larionov Apr 27 '20 at 13:41
  • @AlexLarionov Thanks. That makes sense – user12562215 Apr 27 '20 at 23:47

1 Answers1

1

Q1: you can use aggregation and window functions:

select
    date,
    count(*) no_visitors
    count(*) - lag(count(*)) over(order by date) visitor_diff
from mytable
group by date
order by date

Q2: you can use window functions and aggregation!

select date, sum(case when date = first_visit then 1 else 0 end) no_first_visits
from (  
    select
        date,
        min(date) over(partition by visitor) first_visit
    from mytable
) t
group by date
order by date
GMB
  • 216,147
  • 25
  • 84
  • 135