Questions tagged [quantile]

Quantiles are points taken at regular intervals from the cumulative distribution function (CDF) of a random variable.

In R, a software environment for statistical computing and graphics, the quantiles of a numeric vector can be computed with the function quantile.
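As a rough illustration of the definition above, here is a minimal Python sketch that computes cut points with the standard library's `statistics.quantiles`. The helper name `sample_quantiles` and the sample data are made up for this example, and `method="inclusive"` is only one of several interpolation conventions; other tools may return slightly different values for the same data.

```python
import statistics


def sample_quantiles(values, n):
    """Return the n - 1 cut points that split `values` into n groups.

    Hypothetical helper for illustration only. It uses the "inclusive"
    method, which treats the smallest and largest values as the 0th and
    100th percentiles and interpolates linearly between data points.
    """
    return statistics.quantiles(values, n=n, method="inclusive")


# Made-up sample data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 gives quartiles: three cut points, four equal-probability groups.
quartiles = sample_quantiles(data, 4)
print(quartiles)  # [3.0, 5.0, 7.0]

# n=2 gives a single cut point, the median.
print(sample_quantiles(data, 2))  # [5.0]
```

Passing `n=10` or `n=100` to the same helper would give deciles or percentiles respectively.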

755 questions
4
votes
1 answer

How to select observations that are within a certain quantile

I have data (~1000 rows) that look like this: head(data) alt alb alp alt_zscore alb_zscore alp_zscore 1 11 2.60 9 -1.54 -7.82 -0.949 2 12 5.37 86.3 -1.45 …
burphound
  • 161
  • 7
4
votes
2 answers

Show percentiles of Variable A, while the classification of percentiles is based on Variable B

I have a dataset that looks like the following: INCOME WEALTH 10.000 100000 15.000 111000 14.200 123456 12.654 654321 I have many more rows. I now want to find how much INCOME a household in a specific WEALTH percentile has.…
Jakob
  • 43
  • 3
4
votes
1 answer

Why is the quantile function not working for this dplyr function?

I'm working through Faraway's 2016 book Extending the Linear Model with R and have encountered an issue with the code that I don't know how to fix. Here is the relevant syntax leading up to the error: #### Load Data & Libraries…
Shawn Hemelstrand
  • 2,676
  • 4
  • 17
  • 30
4
votes
0 answers

bucketing with QuantileDiscretizer using groupBy function in pyspark

I have a large dataset like so: |…
thentangler
  • 1,048
  • 2
  • 12
  • 38
4
votes
2 answers

Pandas: groupby and then retrieving IQR

I am quite new to Pandas and I am trying to do the following thing: I have two dataframes comms and arts that look like this (except for the fact they are longer and with other columns) comms: ID commScore 10 5 10 …
Sala
  • 480
  • 4
  • 19
4
votes
0 answers

Prometheus CPU Usage Histogram Metrics

My goal is to observe metrics (like CPU, memory usage, etc.) with Prometheus on a server and on its running Docker containers. Before sending an alarm, I would like to compare certain values of those metrics with, e.g., a 0.95 quantile. However,…
ilya21
  • 41
  • 4
4
votes
2 answers

How to apply NTILE(4) using range of column values?

Would like to use NTILE to see the distribution of countries by forested land percent of total land area. The range of values in the column I'd like to use is from 0.00053 to very close to 98.25, and countries are not evenly distributed across the…
Conner M.
  • 1,954
  • 3
  • 19
  • 29
4
votes
2 answers

Remove decimal points from pandas qcut intervals (transform intervals to integers)

I have many scores in the column of an object named example. I want to split these scores into deciles and assign the corresponding decile interval to each row. I tried the following: import random import pandas as pd random.seed(420)…
Arturo Sbr
  • 5,567
  • 4
  • 38
  • 76
4
votes
2 answers

Assigning quantiles in R where quantiles are not unique

Let x be a vector of numeric, non-negative data (mostly < 10) and qx <- quantile(x, probs = pq), and where length(pq) is typically > length(x) * (3/4). I am in need of a vector of indices of qx, call it q_i, where x[i] falls in the quantile…
Kyle
  • 83
  • 9
4
votes
1 answer

How to add a column to a PySpark dataframe which contains the nth quantile of another column in the dataframe

I have a very large CSV file which has been imported as a PySpark dataframe: df. The dataframe contains many columns including column ireturn. I want to compute the 0.99 and 0.01 percentile of this column and then add another column to the dataframe…
Monirrad
  • 465
  • 1
  • 7
  • 17
4
votes
1 answer

algorithm to dynamically monitor quantile(s)

I want to estimate the quantile of some data. The data is so huge that it won't fit in memory. And new data keeps coming in. Does anyone know an algorithm to monitor the quantile(s) of the data observed so far with very limited memory and…
sinoTrinity
  • 1,125
  • 2
  • 15
  • 27
4
votes
1 answer

quantile method on groupby of xarray dataset

I have a classic xarray Dataset. These are monthly data (38 years of monthly data). I am interested in calculating the quantile values for each month separately. Dimensions: (lat: 26, lon: 71, time: 456) Coordinates: * lat …
claude
  • 549
  • 8
  • 25
4
votes
1 answer

R Make Nice Kable From Quantile Output

Is there a way to nicely format the output of a quantile function in R when using Knitr to put everything together into an HTML or PDF? Typically, I have used Kable to make Knitr Tables nice with correct formatting. e.g. quantile(data$x,…
guy
  • 1,021
  • 2
  • 16
  • 40
4
votes
1 answer

pandas: qcut error: ValueError: Bin edges must be unique:

I am trying to compute percentile of two columns using the pandas qcut method like below: my_df['float_col_quantile'] = pd.qcut(my_df['float_col'], 100, labels=False) my_df['int_col_quantile'] = pd.qcut(my_df['int_col'].astype(float), 100,…
Edamame
  • 23,718
  • 73
  • 186
  • 320
4
votes
1 answer

select/filter bins after qcut decile

I am trying to access the labels (i.e. positional indicator) after binning my data by decile: q = pd.qcut(df["revenue"], 10) q.head(): 7 (317.942, 500.424] 81 (317.942, 500.424] 83 (150.65, 317.942] 84 [0.19, 150.65] 85 …
codingknob
  • 11,108
  • 25
  • 89
  • 126