Questions tagged [percentile]

In statistics, a percentile (or centile) is the value of a variable below which a given percentage of observations fall.

A closely related concept is "quantile".
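As a rough illustration of the definition above, here is a minimal Python sketch. The function names `percentile` and `percent_below` are hypothetical, and the code assumes linear interpolation between the two closest ranks, which is only one of several conventions in use; other tools and methods can give slightly different values for the same data.

```python
def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 100) of values.

    Assumption: linear interpolation between the two closest ranks.
    Other conventions exist and can give different results.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted data.
    pos = (len(ordered) - 1) * p / 100
    lower = int(pos)  # pos is non-negative, so int() acts as floor
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def percent_below(values, x):
    """Return the percentage of observations strictly below x."""
    if not values:
        raise ValueError("values must not be empty")
    return 100 * sum(v < x for v in values) / len(values)


data = [7, 15, 36, 39, 40, 41]
median = percentile(data, 50)
print(median, percent_below(data, median))
```

A quantile is the same idea expressed as a fraction rather than a percentage, so under this sketch the 0.25 quantile corresponds to `percentile(data, 25)`.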

739 questions
3
votes
2 answers

How to convert string into aggregated quantiles?

I have a dataframe that is nested by groups. I want to convert variable 'x' from its raw value to quantile position (20%, 40%, 60%, 80%, 100% or 1, 2, 3, 4, 5). Here is an example of the data I'm using: df <- data.frame(x=c(1, 5, 21, 24, 43, 47, 56,…
Marco Pastor Mayo
  • 803
  • 11
  • 25
3
votes
1 answer

How to create an alert on log fields based on the percentage of failures?

I log to Sumo Logic. The log JSON contains the response time of the request, under a JSON key named "response_time". Each request is identified by a unique ID, denoted by the JSON key "request_id", and a URL denoted by JSON key…
user9920500
  • 606
  • 7
  • 21
3
votes
1 answer

Rank Pandas dataframe by quantile

I have a Pandas dataframe in which each column represents a separate property, and each row holds the properties' value on a specific date: import pandas as pd dfstr = \ ''' AC BO C CCM CL CRD CT …
tel
  • 13,005
  • 2
  • 44
  • 62
3
votes
1 answer

How does the pandas quantile() function work internally?

In this post: How does pandas calculate quartiles? This is the explanation given by @perl on the working of quantile() function: df = pd.DataFrame([5,7,10,15,19,21,21,22,22,23,23,23,23,23,24,24,24,24,25], columns=['val']) Let's consider 0.25 (same…
vineet
  • 31
  • 4
3
votes
2 answers

Apache Commons Math 2.2 Percentile bug?

I am not 100% sure whether this is a bug or I am doing something wrong, but if you give Percentile a large amount of data that consists of the same value (see code below), the evaluate method takes a very long time. If you give Percentile the…
Dimitry
  • 4,503
  • 6
  • 26
  • 40
3
votes
2 answers

Fastest way to multithread doing quickselect on all columns or all rows of a matrix in Rcpp - OpenMP, RcppParallel or RcppThread

I was using this Rcpp code to do a quickselect on a vector of values, i.e. obtain the kth largest element from a vector in O(n) time (I saved this as qselect.cpp): // [[Rcpp::depends(RcppArmadillo)]] #include using namespace…
Tom Wenseleers
  • 7,535
  • 7
  • 63
  • 103
3
votes
2 answers

Pandas - Based on top x% value of each column, Mark as new number

I have a pandas dataframe below: df name value 0 Jack 3 1 Luke 3 2 Mark 2 3 Chris 1 4 Ace 10 5 Isaac 8 Based on the "value" column, I want to have the top 50%…
SwagZ
  • 759
  • 1
  • 9
  • 16
3
votes
1 answer

Flagging percentiles in SQL

I want to create a column in SQL similar to the flag below where I can identify the top 20 percent and bottom 20 percent of sales per block group in a given time period. I already have the sales aggregated to the block group but now I'm having…
3
votes
1 answer

Python percentile of recent value vs window of previous values

Apologies, I am a beginner looking to transition from R! Reproducible data example: df = pd.DataFrame(1.26 + np.random.rand(size)/100.0, index=pd.date_range('20160101 09:00:00', periods=size, …
redbaron1981
  • 407
  • 3
  • 9
3
votes
2 answers

Pandas: filter data frame based on percentile condition

I have a data frame df with some basic web stats ranked by Page Views (PVs): URL PVs 1 1500 2 1200 3 900 4 700 : 100 25 I am trying to filter and count the number of URLs that contribute to different percentiles of page views (PVs). Say, I…
aviss
  • 2,179
  • 7
  • 29
  • 52
3
votes
1 answer

Tableau percentile calculation

I would like to know if the Percentile function in Tableau includes or excludes NULL, or rather, NA values from the calculation. If it includes the NA values, I would like to know how to write the function myself to exclude the NA values. I am new…
AyeTown
  • 831
  • 1
  • 5
  • 20
3
votes
2 answers

Pandas percentrank based on groups within each index

I have a dataframe with an index of dates (the same date appears multiple times). For each date there are columns such as Price, Score, Category, etc. I want one new column in the dataframe called pctrank. In the pctrank column, I want to calculate…
3
votes
2 answers

Which method does pandas use for percentile?

I was trying to understand lower/upper percentiles calculation in pandas and got a bit confused. Here is the sample code and output for it. test = pd.Series([7, 15, 36, 39, 40, 41]) test.describe() output: I am interested in only 25%, 75%…
Natig Aliyev
  • 379
  • 6
  • 18
3
votes
0 answers

Matlab find percentile curve of a set of scatter points

I have a set of scatter points. They are the heights of sixty plants (cm) over time (days). I measure each of them three times (days: ~10, ~50, ~100), but some of the plants do not have the second and/or third measurement yet. Here are the small…
Cii
  • 133
  • 8
3
votes
1 answer

Filter outliers from Pandas dataframe from all columns except one

Say I have a dataframe with features and labels: f1 f2 label -1000 -100 1 -5 3 2 0 4 3 1 5 1 3 6 1 1000 100 2 I want to filter outliers from columns f1 and f2 to get: f1 f2 label -5 3 2 0 4 3 1 …
shda
  • 729
  • 7
  • 19