0

Using legacy SQL, I am trying to use COUNT(DISTINCT field, n) in Google BigQuery. But I am get following error:

UNIQUE_HEAP requires an int32 argument which is greater than 0 (error code: invalidQuery)

Here is my query that I have used:

SELECT
    hits.page.pagePath AS Page,
    COUNT(DISTINCT CONCAT(fullVisitorId, INTEGER(visitId)), 1e6) AS UniquePageviews,
    COUNT(DISTINCT fullVisitorId, 1e6) as Users
FROM
    [xxxxxxxx.ga_sessions_20170101]
GROUP BY
    Page
ORDER BY
    UniquePageviews DESC
LIMIT
    20

BigQuery is not even showing line number of error therefore I am not sure which line is causing this error.

What could be possible cause of above error?

Graham Polley
  • 14,393
  • 4
  • 44
  • 80
Shamshad Alam
  • 1,684
  • 3
  • 19
  • 31

1 Answers1

1

Don't use 1e6 in your COUNT(DISTINCT). Instead, use an actual INTEGER value for the 2nd parameter 'N' (default is 1000), or use EXACT_COUNT_DISTINCT() instead.

COUNT(DISTINCT) documentation

EXACT_COUNT_DISTINCT() documentation

If you require greater accuracy from COUNT(DISTINCT), you can specify a second parameter, n, which gives the threshold below which exact results are guaranteed. By default, n is 1000, but if you give a larger n, you will get exact results for COUNT(DISTINCT) up to that value of n. However, giving larger values of n will reduce scalability of this operator and may substantially increase query execution time or cause the query to fail.

To compute the exact number of distinct values, use EXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using GROUP EACH BY on the relevant field(s) and then applying COUNT(*). The GROUP EACH BY approach is more scalable but might incur a slight up-front performance penalty.

Graham Polley
  • 14,393
  • 4
  • 44
  • 80