1

I have written a query in Google Big Query and want to get the same number of users I see in Google Analytics. I used Legacy and Normal SQL and got 3 different users numbers while the sessions were the same. What did I do wrong, or does anyone have an explanation/solution for it? Every help is appreciated!

Normal SQL

SELECT COUNT(DISTINCT fullVisitorId) AS users, SUM(IF(totals.visits IS 
NULL,0,totals.visits)) AS sessions
FROM `XXX.XXX.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20181120' AND '20181120'

Legacy SQL

SELECT COUNT(DISTINCT fullVisitorId) AS users, SUM(IF(totals.visits IS 
NULL,0,totals.visits)) AS sessions
FROM TABLE_DATE_RANGE([XXX:XXX.ga_sessions_], TIMESTAMP('2018-11-20'), 
TIMESTAMP('2018-11-20'))
Pierre W
  • 21
  • 5

1 Answers1

2

I think this warning from the documentation explains what is happening:

In legacy SQL, COUNT(DISTINCT x) returns an approximate count. In standard SQL, it returns an exact count.

StandardSQL has the correct number. You can test this by attempting to use EXACT_COUNT_DISTINCT() in legacy SQL.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you Gordon. Now the numbers are the same, but there is still a difference between Google Analytics and Big Query. Any ideas? – Pierre W Dec 03 '18 at 13:48
  • 1
    @PierreWunderlich how much are you out and is this users, visits or both? Have you checked that Google Analytics is not sampling your data? https://support.google.com/analytics/answer/2637192?hl=en – Ben P Dec 03 '18 at 13:55
  • The data is unsampled and the difference is small, but there is one. Sessions are the same in GA and Big Query. – Pierre W Dec 03 '18 at 14:22