In statistics, a percentile (or centile) is the value of a variable below which a given percentage of the observations falls.
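As a minimal illustration of this definition, the Python sketch below counts the share of observations below a value and computes a percentile with the nearest-rank rule. Nearest-rank is only one of several conventions in use, and the function names and sample data are made up for the example.

```python
import math


def percent_below(values, x):
    """Percentage of observations strictly below x."""
    return 100.0 * sum(v < x for v in values) / len(values)


def nearest_rank_percentile(values, p):
    """Smallest observation with at least p percent of the data at or below it."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


scores = [15, 20, 35, 40, 50]  # hypothetical sample
print(percent_below(scores, 35))            # 40.0
print(nearest_rank_percentile(scores, 30))  # 20
```

Libraries that interpolate between neighbouring observations can return different values for the same data, which is the source of several of the questions listed under this tag.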
Questions tagged [percentile]
739 questions
3
votes
2 answers
How to convert string into aggregated quantiles?
I have a dataframe that is nested by groups. I want to convert variable 'x' from its raw value to quantile position (20%, 40%, 60%, 80%, 100% or 1, 2, 3, 4, 5).
Here is an example of the data I'm using:
df <- data.frame(x=c(1, 5, 21, 24, 43, 47, 56,…

Marco Pastor Mayo
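The question above is about R, but the idea of group-wise quantile positions can be sketched in Python with pandas. The data frame, the column names `group`, `x`, and `quintile`, and the values are all assumptions, since the original data is truncated in the excerpt.

```python
import pandas as pd

# Hypothetical data: two groups of ten observations each.
df = pd.DataFrame({
    "group": ["a"] * 10 + ["b"] * 10,
    "x": list(range(1, 11)) + list(range(10, 110, 10)),
})

# Within each group, cut x into five equal-frequency bins and label them 1..5.
df["quintile"] = df.groupby("group")["x"].transform(
    lambda s: pd.qcut(s, 5, labels=False) + 1
)

print(df)
```

`pd.qcut` raises an error when heavy ties produce duplicate bin edges; passing `duplicates="drop"` avoids the error at the cost of fewer bins.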
3
votes
1 answer
How to create an alert on log fields based on the percentage of failures?
I have logging done on sumologic. The log JSON contains the response time of the request, stored under the JSON key "response_time". Each request is identified by a unique ID, denoted by the JSON key "request_id", and a URL denoted by JSON key…

user9920500
3
votes
1 answer
Rank Pandas dataframe by quantile
I have a Pandas dataframe in which each column represents a separate property, and each row holds the properties' value on a specific date:
import pandas as pd
dfstr = \
''' AC BO C CCM CL CRD CT …

tel
3
votes
1 answer
How does the pandas quantile() function work internally?
In this post:
How does pandas calculate quartiles?
This is the explanation given by @perl of how the quantile() function works:
df = pd.DataFrame([5,7,10,15,19,21,21,22,22,23,23,23,23,23,24,24,24,24,25], columns=['val'])
Let's consider 0.25 (same…

vineet
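A rough sketch of the linear-interpolation rule, which is the default `interpolation` setting of pandas `quantile()`, applied to the data quoted in the question. The helper name `quantile_linear` is made up for this example, and the code is a plain-Python illustration rather than the library's actual implementation.

```python
import math


def quantile_linear(values, q):
    """Quantile q (0..1) using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q   # fractional index into the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


vals = [5, 7, 10, 15, 19, 21, 21, 22, 22, 23, 23, 23, 23, 23, 24, 24, 24, 24, 25]

# 19 values, so q = 0.25 lands at index 4.5, halfway between 19 and 21.
print(quantile_linear(vals, 0.25))  # 20.0
```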
3
votes
2 answers
Apache Commons Math 2.2 Percentile bug?
I am not 100% sure if this is a bug or if I am doing something wrong, but if you give Percentile a large amount of data that consists of the same value (see code below), the evaluate method takes a very long time. If you give Percentile the…

Dimitry
3
votes
2 answers
Fastest way to multithread doing quickselect on all columns or all rows of a matrix in Rcpp - OpenMP, RcppParallel or RcppThread
I was using this Rcpp code to do a quickselect on a vector of values, i.e. obtain the kth largest element from a vector in O(n) time (I saved this as qselect.cpp):
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace…

Tom Wenseleers
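For readers unfamiliar with quickselect, here is a minimal single-threaded Python sketch of the algorithm. It is not the Rcpp code from the question and says nothing about the multithreading options being compared; the function name and the three-way partitioning are choices made for this illustration.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) in expected O(n) time."""
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [v for v in items if v < pivot]
        pivots = [v for v in items if v == pivot]
        if k < len(lows):
            items = lows                        # answer is left of the pivot
        elif k < len(lows) + len(pivots):
            return pivot                        # answer equals the pivot
        else:
            k -= len(lows) + len(pivots)        # answer is right of the pivot
            items = [v for v in items if v > pivot]


data = [7, 1, 5, 3, 9]
print(quickselect(data, 2))              # 5, the median
print(quickselect(data, len(data) - 1))  # 9, the largest element
```

Grouping values equal to the pivot keeps the loop fast on inputs with many repeated values. The k-th largest element (1-based) is `quickselect(values, len(values) - k)`.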
3
votes
2 answers
Pandas - Based on top x% value of each column, Mark as new number
I have a pandas dataframe below:
df
name value
0 Jack 3
1 Luke 3
2 Mark 2
3 Chris 1
4 Ace 10
5 Isaac 8
Based on the "value" column, I want to have the top 50%…

SwagZ
3
votes
1 answer
Flagging percentiles in SQL
I want to create a column in SQL similar to the flag below where I can identify the top 20 percent and bottom 20 percent of sales per block group in a given time period. I already have the sales aggregated to the block group but now I'm having…

Alex S. Sandoval
3
votes
1 answer
Python percentile of recent value vs window of previous values
Apologies, I am a noob looking to transition from R!
Reproducible data example:
df = pd.DataFrame(1.26 + np.random.rand(size)/100.0,
index=pd.date_range('20160101 09:00:00',
periods=size,
…

redbaron1981
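One possible reading of the title is "for each row, what percentage of the preceding values in a rolling window lie below the current value". A sketch under that assumption follows; the series, the window length of 4, and the helper name `pct_of_last` are hypothetical, because the question's own data is truncated.

```python
import numpy as np
import pandas as pd


def pct_of_last(window: np.ndarray) -> float:
    """Percentage of earlier values in the window that are below the last one."""
    previous, latest = window[:-1], window[-1]
    return 100.0 * np.mean(previous < latest)


prices = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 0.5])  # hypothetical values

# raw=True passes each window to the function as a NumPy array.
result = prices.rolling(window=4).apply(pct_of_last, raw=True)
print(result)
```

The first three entries are NaN because a full window of four values is not yet available.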
3
votes
2 answers
Pandas: filter data frame based on percentile condition
I have a data frame df with some basic web stats ranked by Page Views (PVs):
URL PVs
1 1500
2 1200
3 900
4 700
:
100 25
I am trying to filter and count the number of URLs that contribute to different percentiles of page views (PVs). Say, I…

aviss
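The excerpt is cut off, so the exact goal is unknown. If the aim is to count how many top URLs account for a given share of total page views, a cumulative-share approach is one option. The toy frame below reuses the visible rows and omits the elided ones; the column `cum_share` and the helper `urls_needed` are assumptions.

```python
import pandas as pd

# Toy data: only the rows visible in the excerpt, not the full table.
df = pd.DataFrame({"URL": [1, 2, 3, 4, 100],
                   "PVs": [1500, 1200, 900, 700, 25]})

df = df.sort_values("PVs", ascending=False)
df["cum_share"] = df["PVs"].cumsum() / df["PVs"].sum()


def urls_needed(share):
    """Number of top URLs required to reach the given share of total PVs."""
    return int((df["cum_share"] < share).sum()) + 1


print(urls_needed(0.5))  # 2
print(urls_needed(0.8))  # 3
```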
3
votes
1 answer
Tableau percentile calculation
I would like to know if the Percentile function in Tableau includes or excludes NULL, or rather, NA values from the calculation. If it includes the NA values, I would like to know how to write the function myself to exclude the NA values. I am new…

AyeTown
3
votes
2 answers
Pandas percentrank based on groups within each index
I have a dataframe with an index of dates (the same date can appear multiple times). For each date there are columns such as Price, Score, Category, etc.
I want one new column in the dataframe called pctrank.
In the pctrank column, I want to calculate…

MysterioProgrammer91
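The excerpt does not say which column is ranked or how the groups are defined. Assuming the percent rank of `Score` is wanted within each `Category` on each date, a pandas sketch could look like this; the index name `Date` and all values are hypothetical.

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-01"] * 4 + ["2020-01-02"] * 2)
df = pd.DataFrame(
    {
        "Category": ["A", "A", "B", "B", "A", "A"],
        "Score": [10, 20, 5, 15, 30, 10],
    },
    index=pd.Index(dates, name="Date"),
)

# Group by the index level "Date" together with the "Category" column,
# then rank Score within each group as a fraction between 0 and 1.
df["pctrank"] = df.groupby(["Date", "Category"])["Score"].rank(pct=True)
print(df)
```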
3
votes
2 answers
Which method does pandas use for percentile?
I was trying to understand how lower and upper percentiles are calculated in pandas and got a bit confused. Here is the sample code and output for it.
test = pd.Series([7, 15, 36, 39, 40, 41])
test.describe()
output:
I am interested in only 25%, 75%…

Natig Aliyev
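To make the choice of method visible, the sketch below evaluates the 25% and 75% quantiles of the series from the question under each value of the `interpolation` parameter of `Series.quantile`. It only demonstrates the parameter; the loop and variable names are illustrative.

```python
import pandas as pd

test = pd.Series([7, 15, 36, 39, 40, 41])

# "linear" is the default method.
for method in ["linear", "lower", "higher", "midpoint", "nearest"]:
    q25 = test.quantile(0.25, interpolation=method)
    q75 = test.quantile(0.75, interpolation=method)
    print(f"{method:>8}: 25% = {q25}, 75% = {q75}")
```

With six values, the 25% position is index 1.25 in the sorted data, so linear interpolation gives 15 + 0.25 * (36 - 15) = 20.25, while "lower" and "higher" return the neighbouring observations 15 and 36.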
3
votes
0 answers
MATLAB: find the percentile curve of a set of scatter points
I have a set of scatter points: the heights (cm) of sixty plants over time (days). I measured each of them three times (at about days 10, 50, and 100), but some of the plants do not have the second and/or third measurement yet. Here are the small…

Cii
3
votes
1 answer
Filter outliers from Pandas dataframe from all columns except one
Say I have a dataframe with features and labels:
f1 f2 label
-1000 -100 1
-5 3 2
0 4 3
1 5 1
3 6 1
1000 100 2
I want to filter outliers from columns f1 and f2 to get:
f1 f2 label
-5 3 2
0 4 3
1 …

shda
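One way to approach this, sketched with the data from the question, is to keep only rows whose feature values all lie between a lower and an upper quantile of their column. The 10% and 90% cut-offs are an assumption chosen for the example, not something stated in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "f1": [-1000, -5, 0, 1, 3, 1000],
    "f2": [-100, 3, 4, 5, 6, 100],
    "label": [1, 2, 3, 1, 1, 2],
})

features = df.columns.drop("label")   # every column except the label

# Per-column bounds (assumed thresholds: 10th and 90th percentiles).
low = df[features].quantile(0.10)
high = df[features].quantile(0.90)

# A row is kept only if every feature lies within its column's bounds.
mask = df[features].ge(low, axis=1) & df[features].le(high, axis=1)
filtered = df[mask.all(axis=1)]
print(filtered)
```

The label column is never compared against the bounds, so it passes through unchanged for the rows that remain.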