Questions tagged [percentile]

In statistics, a percentile (or centile) is the value of a variable below which a given percentage of observations falls.

A closely related concept is "quantile".
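As a minimal sketch of the definition above, the snippet below counts what percentage of observations lie strictly below a given value. The function name `percent_below` and the sample `scores` are made up for illustration and do not come from any library.

```python
def percent_below(observations, value):
    """Return the percentage of observations strictly below `value`."""
    if not observations:
        raise ValueError("observations must not be empty")
    below = sum(1 for x in observations if x < value)
    return 100.0 * below / len(observations)


# Hypothetical sample data: ten test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# Seven of the ten scores are below 90.
print(percent_below(scores, 90))  # 70.0
```

Read the other way round, 90 sits at roughly the 70th percentile of this sample. Exact conventions (strictly below versus at or below, interpolation between points) vary between tools.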

739 questions
2
votes
2 answers

functools: computing interquartile range

I use functools to compute percentiles this way: import functools percentiles = tuple(functools.partial(np.percentile, q=q) for q in (75, 85, 95)) percentiles (functools.partial(, q=75), …
user13641081
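One way the `functools.partial` pattern from this question could be used for an interquartile range is sketched below. To stay dependency-free it swaps in a hypothetical `percentile` helper built on `statistics.quantiles` (Python 3.8+) instead of `np.percentile`, and it assumes the 25th and 75th percentiles are the ones wanted, which the excerpt does not state.

```python
import functools
import statistics


def percentile(data, q):
    """Hypothetical stand-in for np.percentile, for integer q in 1..99.

    method="inclusive" interpolates linearly between data points.
    """
    return statistics.quantiles(data, n=100, method="inclusive")[q - 1]


# One partial per percentile; each is then called with the data only.
percentile_funcs = tuple(
    functools.partial(percentile, q=q) for q in (25, 75)
)

data = list(range(1, 11))  # hypothetical sample: 1..10
q1, q3 = (func(data) for func in percentile_funcs)
iqr = q3 - q1
print(q1, q3, iqr)  # 3.25 7.75 4.5
```

The partial objects are callables, not results, so they print as `functools.partial(...)` until they are applied to data.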
2
votes
2 answers

Calculating percentile for each gridpoint in xarray

I am currently using xarray to make probability maps. I want to use a statistical assessment like a “counting” exercise. Meaning, for all data points in NEU count how many times both variables jointly exceed their threshold. That means 1th…
2
votes
2 answers

Finding Percentile in Spark-Scala per a group

I am trying to compute a percentile over a column using a Window function, as below. I referred to an existing post to apply the ApproxQuantile definition over a group. val df1 = Seq( (1, 10.0), (1, 20.0), (1, 40.6), (1, 15.6), (1, 17.6), (1, 25.6), (1,…
abc_spark
2
votes
1 answer

Rank computation considering time stamp in grouped data

In my game dataset, I have observations for several game players for several points in time. For each observation, I want to compute a rank for this player based on the number of points compared to the number of points of other players at this point…
Scijens
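A minimal sketch of ranking players within each point in time, in plain Python rather than any dataframe library. The `observations` rows, the field order, and the tie rule (tied players share the best rank) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical observations: (player, time_point, points).
observations = [
    ("ann", 1, 30), ("bob", 1, 50), ("cat", 1, 40),
    ("ann", 2, 70), ("bob", 2, 60), ("cat", 2, 70),
]


def rank_within_time(rows):
    """Return {(player, time_point): rank}; rank 1 has the most points.

    Players are compared only with others at the same time point,
    and tied players share the same (best) rank.
    """
    by_time = defaultdict(list)
    for player, time_point, points in rows:
        by_time[time_point].append((player, points))

    ranks = {}
    for time_point, entries in by_time.items():
        for player, points in entries:
            higher = sum(1 for _, other in entries if other > points)
            ranks[(player, time_point)] = higher + 1
    return ranks


ranks = rank_within_time(observations)
print(ranks[("bob", 1)], ranks[("ann", 2)], ranks[("bob", 2)])  # 1 1 3
```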
2
votes
1 answer

Percentile calculation in HIVE

How can I calculate the 25th percentile in Hive using SQL? Let's say there are category, sub category and sales columns. So how can I calculate the 25th percentile of sales? I tried to use percentile(sales, 0.25) in Hive but it is throwing an…
Karan6787
2
votes
2 answers

GCP Console: How are percentile charts calculated?

I do not understand how the charts that show percentiles are calculated inside the Google Cloud Platform Monitoring UI. Here is how I am creating the standard chart: Example log events Creating a log-based metric for request…
2
votes
5 answers

Using NumPy, how is the 25th percentile calculated for the numbers 1 to 10?

from numpy import percentile import numpy as np data=np.array([1,2,3,4,5,6,7,8,9,10]) # calculate quartiles quartile_1 = percentile(data, 25) quartile_3 =percentile(data, 75) # calculate min/max print(quartile_1) # show 3.25 print(quartile_3) #…
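The 3.25 in the excerpt is consistent with linear interpolation between the two nearest sorted values, which to my understanding is NumPy's default method. The sketch below reproduces that arithmetic in plain Python; `percentile_linear` is a hypothetical helper, not a NumPy function.

```python
import math


def percentile_linear(data, q):
    """q-th percentile (0 <= q <= 100) by linear interpolation."""
    ordered = sorted(data)
    if not ordered:
        raise ValueError("data must not be empty")
    # Fractional index into the sorted data: 0 for q=0, n-1 for q=100.
    position = (q / 100) * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# q=25: position = 0.25 * 9 = 2.25, so the result lies a quarter of the
# way from ordered[2] = 3 to ordered[3] = 4.
print(percentile_linear(data, 25))  # 3.25
print(percentile_linear(data, 75))  # 7.75
```

Other interpolation rules (nearest, lower, higher, midpoint and so on) can give different answers for the same data, which is a common source of mismatches between tools.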
2
votes
1 answer

Compute rolling percentiles in PySpark

I have a dataframe with dates, ID (let's say of a city) and two columns of temperatures (in my real dataframe I have a dozen columns to compute). I want to "rank" those temperatures for a given window. I want this ranking to be scaled from 0 (the…
2
votes
1 answer

How to implement a SAS percentile statement in R?

I have this SAS statement: proc univariate data = df noprint; class &var1. &var2.; var &var3.; output out = STAT PCTLPTS = 2 5 98 99 95 PCTLPRE = P; I have output from the SAS proc like this: How can I get the same result in R? (with 5 P-columns and…
red_quark
2
votes
2 answers

Calculate percentiles ignoring missing values

I have a PySpark dataframe with columns ID and BALANCE. I am trying to bucket the column balance into 100 percentile (1-100%) buckets and calculate how many IDs fall in each bucket. I cannot use anything related to RDD, I can only use PySpark…
2
votes
2 answers

Plot a histogram, based on percentiles

I have a frame with the following structure: df = pd.DataFrame({'ID': np.random.randint(1, 13, size=1000), 'VALUE': np.random.randint(0, 300, size=1000)}) How could I plot the graph, where on the X-axis there will be percentiles…
Denis Ka
2
votes
2 answers

Is it possible to get the PERCENT_RANK for a single record, but relative to the entire table?

I would like the PERCENT_RANK value for a single record, but in relation to the entire table. Is this possible? Examples I've seen are like this: SELECT Name, Salary, PERCENT_RANK() OVER (ORDER BY Salary) FROM Employees Notice that it's…
Deane
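For reference, SQL's PERCENT_RANK is defined as (rank - 1) / (rows - 1), with tied rows sharing the lowest rank. The sketch below applies that formula in plain Python to one value measured against a whole column. The `salaries` list and the `percent_rank` helper are illustrative assumptions, not part of the question.

```python
def percent_rank(values, target):
    """PERCENT_RANK of `target` relative to all of `values`.

    rank is 1 plus the number of values strictly below the target,
    so tied values share the same rank. A single-row input yields 0.0.
    """
    n = len(values)
    if n < 2:
        return 0.0
    rank = 1 + sum(1 for v in values if v < target)
    return (rank - 1) / (n - 1)


# Hypothetical salary column for the whole table.
salaries = [3000, 4000, 4000, 5000, 6000]

print(percent_rank(salaries, 4000))  # 0.25
print(percent_rank(salaries, 6000))  # 1.0
```

The key point is that the rank is computed over every row first and only then narrowed to the single record of interest. Filtering before ranking would leave one row, whose PERCENT_RANK is always 0.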
2
votes
3 answers

Get percentiles from a grouped dataframe

I have a dataframe that has 2 experiment groups and I am trying to get percentile distributions. However, the data is already grouped: df = pd.DataFrame({'group': ['control', 'control', 'control','treatment','treatment','treatment'], …
Utopia025
2
votes
1 answer

Understanding numpy percentile computation

I understand percentile in the context of test scores with many examples (e.g. your SAT score falls in the 99th percentile), but I am not sure I understand percentile in the following context and what is going on. Imagine a model outputs probabilities…
Jane Sully
2
votes
2 answers

Calculate percentile with groupBy on PySpark dataframe

I am trying to groupBy and then calculate percentile on PySpark dataframe. I've tested the following piece of code according to this Stack Overflow post: from pyspark.sql.types import FloatType import pyspark.sql.functions as func import numpy as…
Marc S
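The group-then-percentile idea can be sketched without Spark, as below. This is a plain-Python illustration of the logic only, not the PySpark code from the question. The `rows` data and the `percentile_by_group` helper are hypothetical, and `statistics.quantiles` needs Python 3.8+ and at least two values per group on older versions.

```python
from collections import defaultdict
import statistics

# Hypothetical rows: (group_key, value).
rows = [
    ("a", 10.0), ("a", 20.0), ("a", 30.0),
    ("b", 5.0), ("b", 15.0),
]


def percentile_by_group(pairs, q):
    """Return {group_key: q-th percentile} for integer q in 1..99."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {
        key: statistics.quantiles(values, n=100, method="inclusive")[q - 1]
        for key, values in groups.items()
    }


p90 = percentile_by_group(rows, 90)
print(p90)  # {'a': 28.0, 'b': 14.0}
```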