0

I wanna do some outlier detection with BigQuery and Datalab. MIN and MAX give me the same values but Quantiles are different. BigQuery gave me

Row f0_    f1_      approx_quantiles     
1   0.01   820.55   0.01     
                    190.04   
                    820.55

which seems wrong they are just the min & max. Datalab give me other Quantiles.

I did:

SELECT MIN(Menge),MAX(Menge),APPROX_QUANTILES(Menge,2)  AS approx_quantiles  FROM `nifty-stage-155512.de_veolia_vus_dev_views.tank_data_4`

Do someone else experienced the same or does someone having a solutions???

Edit

From datalab I get

0.25     61.645
0.50    190.000
0.75    253.000
Name: MENGE, dtype: float64
quimiluzon
  • 118
  • 6
ASP YOK
  • 131
  • 3
  • 12

2 Answers2

2

I'm a little confused. Are you wanting the 25th and 75th quantiles? From the documentation.

APPROX_QUANTILES

APPROX_QUANTILES([DISTINCT] expression, number [{IGNORE|RESPECT} NULLS])

Description

Returns the approximate boundaries for a group of expression values, where number represents the number of quantiles to create. This function returns an array of number + 1 elements, where the first element is the approximate minimum and the last element is the approximate maximum.

If so you need want more quantiles, you need to increase the value for "number". So 4 would give you [min,25th,50th,75th,max] for example

Community
  • 1
  • 1
Bobbylank
  • 1,906
  • 7
  • 15
  • See my edit. Normally I would expect other quantiles. So everything from my 61 and below and everything from 253 up could be a potential outlier – ASP YOK Jul 17 '18 at 15:07
  • So APPROX_QUANTILES(Menge,4) would give you those values as the 2nd and 4th value in the array. But you might find PERCENTILE_CONT ir PERCENTILE_DISC more useful? https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#percentile_cont – Bobbylank Jul 17 '18 at 15:10
  • Okay now I get it. I thought APPROX_Quantiles will already give me 75,50,25 quantiles. – ASP YOK Jul 17 '18 at 15:20
  • SELECT percentiles[offset(25)], percentiles[offset(50)], percentiles[offset(75)] FROM (SELECT APPROX_QUANTILES(Menge, 100) percentiles FROM ... Did it – ASP YOK Jul 17 '18 at 15:21
2

Could be also the solution:

SELECT percentiles[offset(25)], percentiles[offset(50)], percentiles[offset(75)]
FROM (SELECT APPROX_QUANTILES(Menge, 100) percentiles FROM 
ASP YOK
  • 131
  • 3
  • 12