Questions tagged [quantile]

Quantiles are points taken at regular intervals from the cumulative distribution function (CDF) of a random variable.

In scientific software for statistical computing and graphics, the quantiles of a numeric vector can be computed with the quantile function.
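
As a rough illustration of the definition above, here is a minimal Python sketch of an empirical quantile that interpolates linearly between order statistics. The helper name `quantile` and the sample data are made up for this example. Several interpolation conventions exist, so a given package may return slightly different values.

```python
import math

def quantile(values, p):
    """Empirical p-quantile (0 <= p <= 1) using linear interpolation.

    Hypothetical helper for illustration only; not any library's API.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    xs = sorted(values)
    h = (len(xs) - 1) * p          # fractional position in the sorted data
    lo = math.floor(h)             # index of the order statistic at or below h
    hi = min(lo + 1, len(xs) - 1)  # next order statistic, clamped to the end
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

data = [7, 1, 3, 9, 5]
quartiles = [quantile(data, p) for p in (0.25, 0.5, 0.75)]
print(quartiles)
```

With five evenly spaced values the three quartiles fall exactly on the second, third, and fourth sorted values, which makes the interpolation easy to check by hand.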

755 questions
10
votes
3 answers

R, filter matrix based on variance cut-offs

See edit below. Using R, I would like to filter a matrix (of gene expression data) and keep only the rows (genes/probes) that have values with high variance. For example, I'd like to only keep the rows that have values in the bottom and top…
Todd
  • 568
  • 2
  • 6
  • 15
9
votes
1 answer

How to ignore empty data series in Prometheus

Calculating the maximum quantile over all data series is a problem for me: query http_response_time{job=~"^(x|y)$", quantile="0.95",...} result http_response_time{job="x",...} 0.26 http_response_time{job="y",...} NaN This is how I would try to…
eventhorizon
  • 2,977
  • 8
  • 33
  • 57
9
votes
1 answer

scipy.stats.multivariate_normal.pdf raises an error: operands could not be broadcast together with shapes (1,8) (21,)

I want to calculate the multivariate Gaussian density function for a data set I have in Python. My dataset has 21 variables and there are 75 data points. I have calculated the covariance matrix (cov) for this, which is a 21x21 array, and the mean array,…
jan93
  • 179
  • 1
  • 2
  • 10
9
votes
1 answer

Replace outliers with column quantile in Pandas dataframe

I have a dataframe: df = pd.DataFrame(np.random.randint(0,100,size=(5, 2)), columns=list('AB')) A B 0 92 65 1 61 97 2 17 39 3 70 47 4 56 6 Here are 5% quantiles: down_quantiles = df.quantile(0.05) A 24.8 B 12.6 And here is…
shda
  • 729
  • 7
  • 19
9
votes
1 answer

Calculate percentiles/quantiles for a timeseries with resample or groupby - pandas

I have a time series of hourly values and I am trying to derive some basic statistics on a weekly/monthly basis. If we use the following abstract dataframe, where each column is a time series: rng = pd.date_range('1/1/2016', periods=2400, freq='H') df…
Andreuccio
  • 1,053
  • 2
  • 18
  • 32
9
votes
2 answers

Quantile functions in Python

I'm having trouble finding quantile functions for well-known probability distributions in Python. Do they exist? In particular, is there an inverse normal distribution function? I couldn't find anything in either NumPy or SciPy.
dsaxton
  • 995
  • 2
  • 10
  • 23
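
For readers landing on the question above, a minimal sketch of a normal quantile (inverse CDF) using only the Python standard library, assuming Python 3.8 or later where `statistics.NormalDist` is available. The variable names are illustrative. SciPy distributions expose the same idea through their `ppf` method, which is not shown here.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# The quantile function is the inverse of the CDF:
# inv_cdf(p) returns x such that P(X <= x) = p.
z_975 = std_normal.inv_cdf(0.975)
print(round(z_975, 4))

# Round trip: feeding the quantile back into the CDF recovers p.
p_back = std_normal.cdf(z_975)
print(round(p_back, 4))
```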
9
votes
1 answer

Error in summary quantreg backsolve

When I run a quantile regression in R, using the quantreg package, and then I run summary(quantregObject), I get this error message: Error in base::backsolve(r, x, k = k, upper.tri = upper.tri, transpose = transpose, : singular matrix in…
Alberto
  • 91
  • 1
  • 2
9
votes
2 answers

How can I compute statistics by decile groups in data.table

I have a data.table and would like to compute stats by groups. R) set.seed(1) R) DT=data.table(a=rnorm(100),b=rnorm(100)) Those groups should be defined by R) quantile(DT$a,probs=seq(.1,.9,.1)) 10% 20% 30% …
statquant
  • 13,672
  • 21
  • 91
  • 162
9
votes
4 answers

How to replace outliers with the 5th and 95th percentile values in R

I'd like to replace all values in my relatively large R dataset which take values above the 95th and below the 5th percentile, with those percentile values respectively. My aim is to avoid simply cropping these outliers from the data entirely. Any…
Bobbo
  • 95
  • 1
  • 1
  • 5
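
The replacement described in the question above is often called winsorizing. A minimal sketch in plain Python follows (the question itself is about R). The helper names `quantile` and `winsorize`, the 5%/95% defaults, and the sample data are assumptions for illustration, and the quantile rule is simple linear interpolation, which may differ from a given package's default.

```python
import math

def quantile(values, p):
    """Empirical p-quantile with linear interpolation (hypothetical helper)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def winsorize(values, lower=0.05, upper=0.95):
    """Clamp values outside the [lower, upper] quantiles to those quantiles.

    Nothing is dropped: the result has the same length and order as the input.
    """
    if not values:
        return []
    lo_cut = quantile(values, lower)
    hi_cut = quantile(values, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]

data = [0, 10, 20, 30, 1000]   # 1000 is an obvious outlier
clipped = winsorize(data)
print(clipped)
```

Values between the two cut-offs pass through unchanged; only the extremes are pulled in to the cut-off values.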
8
votes
2 answers

An efficient quantiles algorithm/data structure that allows samples to be updated as they increment over time?

I'm looking for an efficient quantiles algorithm that allows sample values to be "upserted" or replaced as the value changes over time. Let's say I have values for items 1-n. I'd like to put these into a quantiles algorithm that would efficiently…
marathon
  • 7,881
  • 17
  • 74
  • 137
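
One simple way to picture the "upsert" requirement in the question above is a sorted list kept alongside a per-item lookup table. The sketch below is only an assumption about what such a structure could look like, not the approach from any answer; the class name `UpdatableQuantiles` and its methods are hypothetical. Each update costs O(n) because of list shifting, so a balanced tree or a sketch-based structure would likely be preferable at scale.

```python
import bisect

class UpdatableQuantiles:
    """Tracks one current value per key and answers quantile queries."""

    def __init__(self):
        self._by_key = {}   # key -> current value
        self._sorted = []   # all current values, kept in ascending order

    def upsert(self, key, value):
        """Insert a new key, or replace the value stored for an existing key."""
        if key in self._by_key:
            old = self._by_key[key]
            idx = bisect.bisect_left(self._sorted, old)  # locate the old value
            del self._sorted[idx]
        bisect.insort(self._sorted, value)
        self._by_key[key] = value

    def quantile(self, p):
        """Empirical p-quantile with linear interpolation."""
        if not self._sorted:
            raise ValueError("no values stored")
        h = (len(self._sorted) - 1) * p
        lo = int(h)
        hi = min(lo + 1, len(self._sorted) - 1)
        return self._sorted[lo] + (h - lo) * (self._sorted[hi] - self._sorted[lo])

q = UpdatableQuantiles()
for key, value in [("a", 1), ("b", 5), ("c", 9)]:
    q.upsert(key, value)
median_before = q.quantile(0.5)

q.upsert("b", 20)               # item "b" changes over time
median_after = q.quantile(0.5)
print(median_before, median_after)
```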
8
votes
3 answers

pandas using qcut on series with fewer values than quantiles

I have thousands of series (rows of a DataFrame) that I need to apply qcut on. Periodically there will be a series (row) that has fewer values than the desired quantile (say, 1 value vs 2 quantiles): >>> s = pd.Series([5, np.nan, np.nan]) When I…
Zhang18
  • 4,800
  • 10
  • 50
  • 67
8
votes
3 answers

How to calculate a percentile ranking of a column of data relative to another column using python

I have two columns of data representing the same quantity; one column is from my training data, the other is from my validation data. I know how to calculate the percentile rankings of the training data efficiently…
Doodles
  • 195
  • 1
  • 2
  • 7
8
votes
2 answers

Is there a better way to create quantile "dummies" / factors in R?

I'd like to assign factors representing quantiles. Thus I need them to be numeric. That's why I wrote the following function, which is basically the answer to my problem: qdum <- function(v,q){ qd = quantile(v,1:(q)/q) v = as.data.frame(v) v$b =…
Matt Bannert
  • 27,631
  • 38
  • 141
  • 207
8
votes
2 answers

Using Hive ntile results in where clause

I want to get summary data of the first quartile for a table in Hive. Below is a query to get the maximum number of views in each quartile: SELECT NTILE(4) OVER (ORDER BY total_views) AS quartile, MAX(total_views) FROM view_data GROUP BY…
Nadine
  • 1,620
  • 2
  • 15
  • 27
8
votes
4 answers

Plot quantiles of distribution in ggplot2 with facets

I'm currently plotting a number of different distributions of first differences from a number of regression models in ggplot. To facilitate interpretation of the differences, I want to mark the 2.5th and 97.5th percentiles of each distribution.…
chrstnsn
  • 237
  • 1
  • 3
  • 6