I am trying to get a count of unique visitors. I first checked it by total without separating it by anytime frame.
Main table (big data table sample):
+-----------+----+-------+
|theDateTime|vD | vis |
+----------------+-------+
|2018-10-03 |123 |abc |
|2018-10-04 |123 |abc |
|2018-10-04 |123 |pqr |
|2018-10-05 |123 |xyz |
+-----------+----+-------+
the total distinct count of the above will be 3 but when I group by day abc
is counted twice. First on the 3rd and then on the 2nd. I just want the first one counted.
My Query for total:
select
d.eId AS vD
, COUNT(DISTINCT visitorId) AS vis
from decisions
WHERE d.eId = 123
AND timestamp BETWEEN unix_timestamp('2018-10-03 00:00:00')*1000 AND
unix_timestamp('2018-10-06 12:17:00')*1000
GROUP BY d.eId
ORDER BY vId
My Results:
+----+---------+
| vD | vis |
+----+---------+
|123 | 3 |
+----+---------+
My Query by Day:
select DISTINCT
cast(from_unixtime(timestamp DIV 1000) AS date) AS theDateTime
, d.eId AS vD
, COUNT(DISTINCT visitorId) AS vis
from decisions
WHERE timestamp BETWEEN unix_timestamp('2018-10-03 00:00:00')*1000 AND
unix_timestamp('2018-10-06 12:17:00')*1000
AND d.eId IN (11550123588)
GROUP BY cast(from_unixtime(timestamp DIV 1000) AS date),
d.vD
ORDER BY vD, theDateTime
My Results:
+-----------+----+-------+
|theDateTime|vD | vis |
+----------------+-------+
|2018-10-03 |123 | 1 |
|2018-10-04 |123 | 2 |
|2018-10-05 |123 | 1 |
+-----------+----+-------+
The total of this is 1122585. WHich is more than the total sum
I know this is because just incase the visitor is repeated on a different day and when I group by day he is counted twice. Is there a way for me to not count the visitor on day 2 if he has already been counted on day 1?
Please help!