3

I have a table with 2157 records, let's say with 3 columns (A,B,C) and I know that in column A there are 2154 different values.

Using Tableau Desktop (and its own fenctions) connected to BigQuery, I get these results:

  • Count(A) --> 2157
  • CountD(A) --> 2180 (more than Count!!!)

If i run the same calculations using queries in Big Query console, I get:

  • Count --> 2157
  • CountD --> 2154 (right)

How can this be possible?

By the way, if use a CSV file of the same table and I use it as datasource using Tableau Public (no Desktop), I get correct results as in Big Query.

Thanks in advance

Fabio Fantoni
  • 3,077
  • 3
  • 22
  • 32

2 Answers2

1

If you use the DISTINCT keyword, the function returns the number of distinct values for the specified field. Note that the returned value for DISTINCT is a statistical approximation and is not guaranteed to be exact - the documentation is also clear about this.

If you require greater accuracy from COUNT(DISTINCT), you can specify a second parameter, n, which gives the threshold below which exact results are guaranteed. By default, n is 1000, but if you give a larger n, you will get exact results for COUNT(DISTINCT) up to that value of n. However, giving larger values of n will reduce scalability of this operator and may substantially increase query execution time or cause the query to fail.

To compute the exact number of distinct values, use EXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using GROUP BY on the relevant field(s) and then applying COUNT(*). The GROUP BY approach is more scalable but might incur a slight up-front performance penalty.

https://cloud.google.com/bigquery/query-reference#aggfunctions

Pentium10
  • 204,586
  • 122
  • 423
  • 502
  • I've used EXACT_COUNT_DISTINCT and I got (correctly) 2154 distinct values (in Google Big Query console). The real problem is when Tableau tries to perform his own CountD which returns 2180 records (more than the original 2157). – Fabio Fantoni Oct 07 '15 at 12:17
0

Try using a calculated field with this formula - RAWSQLAGG_INT("count(unique(%1))",['Your column'])

Slows down performance, so only use when you really need it.

outlier123
  • 217
  • 1
  • 2
  • 10