Questions tagged [percentile]

In statistics, a percentile (or centile) is the value of a variable below which a given percentage of observations fall. For example, the 90th percentile is the value below which 90% of the observations fall.

A closely related concept is "quantile".
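
As a rough illustration of the definition above, the following Python sketch computes a percentile with the nearest-rank method, along with the percentage of observations that fall below a given value. The names `nearest_rank_percentile` and `percent_below` are made up for this example, and nearest-rank is only one of several common conventions; many libraries interpolate between data points instead.

```python
import math

def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the interval (0, 100]")
    ordered = sorted(values)
    # Ordinal rank k: the smallest position such that at least p% of the
    # observations are less than or equal to ordered[k - 1].
    k = math.ceil(p * len(ordered) / 100)
    return ordered[k - 1]

def percent_below(values, x):
    """Return the percentage of observations strictly below x."""
    if not values:
        raise ValueError("values must not be empty")
    return 100 * sum(v < x for v in values) / len(values)

scores = [15, 20, 35, 40, 50]
print(nearest_rank_percentile(scores, 50))  # 35
print(percent_below(scores, 35))            # 40.0
```

Because nearest-rank always returns a value that is present in the data, its results can differ from interpolating implementations on small samples.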

739 questions
3
votes
3 answers

Compute percentile and max value per variable

Bash gurus, I need to compute the max and percentile numbers for each item in the list, using awk. Input: aa 1, ab 3, aa 4, ac 5, aa 3, ad 2, ab 4, ac 2, ae 2, ac 5. Expected output (item, 90th percentile, max value): aa 3.8 4, ab 3.9 …
Pradeep BS
  • 51
  • 1
  • 7
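
The question above asks for awk; purely as an illustration of the arithmetic, here is a Python sketch that groups the sample rows by item and reports the 90th percentile and the maximum for each. It assumes linear interpolation between the two closest ranks (the convention NumPy's `percentile` uses by default), which is consistent with the `aa 3.8 4` and `ab 3.9` values shown in the expected output. `percentile_linear` and `summarize` are hypothetical helper names, not taken from the question or its answers.

```python
from collections import defaultdict

def percentile_linear(values, p):
    """p-th percentile (0 <= p <= 100) with linear interpolation between ranks."""
    ordered = sorted(values)
    # Fractional zero-based position of the percentile in the sorted data.
    rank = (len(ordered) - 1) * p / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])

def summarize(rows, p=90):
    """Map each item to a (p-th percentile, max value) pair."""
    groups = defaultdict(list)
    for item, value in rows:
        groups[item].append(value)
    return {
        item: (percentile_linear(vals, p), max(vals))
        for item, vals in sorted(groups.items())
    }

# Sample rows copied from the question excerpt.
rows = [
    ("aa", 1), ("ab", 3), ("aa", 4), ("ac", 5), ("aa", 3),
    ("ad", 2), ("ab", 4), ("ac", 2), ("ae", 2), ("ac", 5),
]

for item, (p90, top) in summarize(rows).items():
    print(f"{item} {p90:.1f} {top}")
```

Other interpolation conventions would give different numbers for groups this small, so the choice of method should be checked against whatever tool the results are compared with.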
3
votes
1 answer

Postgres percentile_cont with cardinality

I have been using the new percentile_cont in Postgres to calculate percentiles for a table ever since it was launched. However, we are now changing the table to include cardinality for each row, and I'm unsure as to how to implement percentile_cont…
Fredrik
  • 301
  • 3
  • 10
3
votes
3 answers

r bin equal deciles

I have a dataset containing over 6,000 observations, each record having a score ranging from 0 to 100. Below is a sample:

+-----+-------+
| uID | score |
+-----+-------+
|   1 |    77 |
|   2 |    61 |
|   3 |    74 |
|   4 |    47 |
|   5 |    65 |
| …
Jrausch2
  • 33
  • 1
  • 5
3
votes
3 answers

pure python implementation for calculating percentiles: what is the use of the lambda function here?

I have stumbled upon this pure python implementation for calculating percentiles here and here: import math import functools def percentile(N, percent, key=lambda x:x): """ Find the percentile of a list of values. @parameter N - is a list of…
jov14
  • 139
  • 9
3
votes
1 answer

why aren't pandas "rank" percentiles bounded between 0 and 1?

I use pandas frequently and often execute code comparable to the following: df['var_rank'] = df['var'].rank(pct=True); print(df.var_rank.max()). I often get values greater than 1. It still happens whether I keep or drop NA values. This is…
benten
  • 1,995
  • 2
  • 23
  • 38
3
votes
0 answers

What is the difference between percentile function in Numpy and Excel?

I am using Python to do some calculations on a data series. The goal of the calculation is to remove the top 5 percent of the data from the series. As an acceptance criterion, the manual calculation in Excel is done in parallel. I need to meet the manual…
ChangeMyName
  • 7,018
  • 14
  • 56
  • 93
3
votes
1 answer

Improve code / remove for-loop when using accumarray MATLAB

I have the following piece of code, which is quite slow at computing the percentiles from a data set ("DATA") because the input matrices are large ("Data" is approx. 500,000 long with 10080 unique values assigned from "Indices"). Is there a…
Jonas
  • 308
  • 1
  • 11
3
votes
1 answer

Nonlinear Colormap/Heatmap

I'm trying to make a 1D heatmap for a gene (see ref 1 in pastebin for example). I've gotten close to what I'm looking for with contourf, but I haven't been able to figure out how to get exactly what I'm looking for. Basically, I want to utilize a…
Emmett
  • 37
  • 6
3
votes
0 answers

95th and 99th percentile latency

I am measuring database performance and am looking at p95 and p99 latency. My results are as follows. Database A shows: 95thPercentileLatency(ms) 20 99thPercentileLatency(ms) 28 Database B shows: 95thPercentileLatency(ms) …
JamesF
  • 197
  • 1
  • 2
  • 10
3
votes
1 answer

Getting percentile values from gamlss centile curves

This question is related to: Selecting Percentile curves using gamlss::lms in R. I can get a centile curve from the following data and code: age = sample(5:15, 500, replace=T) yvar = rnorm(500, age, 20) mydata = data.frame(age, yvar) head(mydata) age …
rnso
  • 23,686
  • 25
  • 112
  • 234
3
votes
3 answers

Calculating the data point which has a precision of 99%

We have a table which has millions of entries. The table has two columns. There is a correlation between X and Y: when X is beyond a certain value, Y tends to be B (however, this is not always true; it's a trend, not a certainty). Here I want to find the…
user1479802
  • 201
  • 3
  • 12
3
votes
2 answers

How to get a percentile for an empirical data distribution and get its x-coordinate?

I have some discrete data values that, taken together, form some sort of distribution. This is one of them, but they are different, with the peak being in all possible locations, from 0 to end. So, I want to use its quantiles (percentiles) in…
Phlya
  • 5,726
  • 4
  • 35
  • 54
3
votes
3 answers

T-SQL / SQL PERCENTILE_CONT should return 1 record

I am working on a T-SQL query (I'm running a SQL Server database) that should compute the median from a list of values. The query looks like this: SELECT PERCENTILE_CONT(0.5) OVER (ORDER BY age) as Age FROM peopleDB WHERE …
phoxley
  • 464
  • 8
  • 19
3
votes
1 answer

How to apply a function for each level of a factor variable?

I have a function like this: remove_outliers <- function(x) { qnt <- quantile(x, probs = 0.99); y <- x; y[x > qnt] <- NA; y } The purpose is to remove outliers that are in the top 1% of the data (replace their value with NA). How can I apply this function across…
kuki
  • 303
  • 2
  • 6
  • 15
2
votes
3 answers

within group sorts in mysql

I have a panel data set: that is, times, ids, and values. I would like to do a ranking based on value for each date. I can achieve the sort very simply by running: select * from tbl order by date, value The issue I have is once the table is…
Alex
  • 19,533
  • 37
  • 126
  • 195