Questions tagged [quantile]

Quantiles are points taken at regular intervals from the cumulative distribution function (CDF) of a random variable.

In scientific software for statistical computing and graphics, the quantiles of a numeric vector can usually be computed with a function named quantile (for example, quantile in R).
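As a rough illustration of the description above, the following minimal sketch computes quartiles with Python's standard-library statistics.quantiles. The sample values are borrowed from the "quantile vs ecdf results" excerpt on this page; the variable names are illustrative only. It also shows why many of the questions listed here arise: different interpolation conventions give different answers for the same data.

```python
from statistics import quantiles

# Illustrative sample; variable names are hypothetical.
sample = [20, 40, 60, 80, 100]

# n=4 asks for quartiles, so three cut points come back: Q1, median, Q3.
# "inclusive" treats the sample min and max as the 0th and 100th percentiles.
q_inclusive = quantiles(sample, n=4, method="inclusive")

# "exclusive" (the default) assumes the population may extend beyond
# the observed min and max, which pushes Q1 and Q3 outward.
q_exclusive = quantiles(sample, n=4, method="exclusive")

print(q_inclusive)
print(q_exclusive)
```

The two methods agree on the median but not on the outer quartiles, so it is worth checking which convention a given library uses before comparing results across tools.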

755 questions
8
votes
1 answer

Definitions of quantiles in R

Main question: Suppose you have a discrete, finite data set $d$. Then the command summary(d) returns the min, 1st quartile, median, mean, 3rd quartile, and max. My question is: what formula does R use to compute the 1st quartile? Background: My data…
voldemort
8
votes
1 answer

Pandas quantile failing with NaN's present

I've encountered an interesting situation while calculating the inter-quartile range. Assuming we have a dataframe such as: import pandas as pd index=pd.date_range('2014 01 01',periods=10,freq='D') data=pd.np.random.randint(0,100,(10,5)) data =…
tnknepp
8
votes
2 answers

Extract R^2 from quantile regression / summary()

I am using the quantreg package to run the following quantile regression in R: bank <-rq(gekX~laggekVIXclose+laggekliquidityspread+lagdiffthreeMTBILL+ lagdiffslopeyieldcurve+lagdiffcreditspread+laggekSPret, tau=0.99) and extract the coefficients…
schloni
7
votes
2 answers

Finding quantiles in Julia

I need a function like xtile in Stata that, given a vector, returns which quantile each observation belongs to. So if the function is defined as function xtile(vector; q= 4) #q = 4 by default returns quartiles *** returns a vector with the same size…
Aleiem
7
votes
1 answer

Calculate percentile on pyspark dataframe columns

I have a PySpark dataframe which contains an ID and then a couple of variables for which I want to calculate the 95% point. Part of the printSchema(): root |-- ID: string (nullable = true) |-- MOU_G_EDUCATION_ADULT: double (nullable = false) |--…
Wendy De Wit
7
votes
1 answer

Google BigQuery APPROX_QUANTILES and getting true quartiles

According to the docs: Returns the approximate boundaries for a group of expression values, where number represents the number of quantiles to create. This function returns an array of number + 1 elements, where the first element is the approximate…
Tyler_1
7
votes
3 answers

Python equivalent of Excel's PERCENTILE.EXC

I am using Pandas to compute some financial risk analytics, including Value at Risk. In short, to compute Value at Risk (VaR), you take a time series of simulated portfolio changes in value, and then compute a specific tail percentile loss. For…
ryanr377
7
votes
2 answers

How to calculate the mean of the top 10% in R

My dataset contains multiple observations for different species. Each species has a different number of observations. Looking for a fast way in R to calculate the mean of the top 10% of values for a given variable for each species. I figured out how…
PGLS
7
votes
1 answer

Python Statsmodels QuantReg Intercept

Problem Setup In statsmodels Quantile Regression problem, their Least Absolute Deviation summary output shows the Intercept. In that example, they are using a formula from __future__ import print_function import patsy import numpy as np import…
Jarad
7
votes
2 answers

quantile regression+ dummy variable

I used the quantreg package in R to compute the quantile regression model. In the model, the dependent variable (Y) is NAS_DELAY, and the independent variables (Xs) are SEANSON1TO4, SEANSON2TO4, SEANSON3TO4. The model is: …
shitong
6
votes
2 answers

How to use spark quantilediscretizer on multiple columns

All, I have a ml pipeline setup as below import org.apache.spark.ml.feature.QuantileDiscretizer import org.apache.spark.sql.types.{StructType,StructField,DoubleType} import org.apache.spark.ml.Pipeline import org.apache.spark.rdd.RDD import…
sramalingam24
6
votes
1 answer

quantile vs ecdf results

I am trying to use ecdf, but I am not sure if I am doing it right. My ultimate purpose is to find what quantile corresponds to a specific value. As an example: sample_set <- c(20, 40, 60, 80, 100) # Now I want to get the 0.75 quantile: quantile(x =…
Max_IT
6
votes
1 answer

How can I get a percentile value for each dataframe row considering a subset of the data?

I have a dataframe obs with 145 rows and more than 1000 columns. For each row I would like to extract the value of the 95th percentile, but calculated only on the data greater than or equal to 1. I managed to calculate a value for each row, considering…
Corrado
6
votes
2 answers

Calculate quantiles in R without interpolation - round up or down to actual value

It's my understanding that when calculating quantiles in R, the entire dataset is scanned and the value for each quantile is determined. If you ask for .8, for example it will give you a value that would occur at that quantile. Even if no such…
jsuprr
6
votes
2 answers

Calculation of return levels based on a GPD in different R packages

I am performing an extreme value analysis for meteorological data, to be precise for precipitation data available in mm/d. I am using a threshold excess approach for estimating the parameters of a generalized Pareto distribution with a maximum…
Homunculus