Questions tagged [iqr]

IQR stands for "Interquartile range" in statistics.

The interquartile range (IQR) equals the difference between the third and first quartiles (Q3 - Q1). It is a robust alternative to the standard deviation for describing dispersion.

This descriptive statistic is familiar from boxplots, where the box spans the first to the third quartile.
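
As a minimal sketch of the definition above, the IQR can be computed with Python's standard library alone. The helper name `iqr` and the sample values are made up for illustration. Quartile conventions differ between tools, so other software may give slightly different numbers for the same data.

```python
import statistics

def iqr(values):
    """Return Q3 - Q1 for a sequence of numbers."""
    # method="inclusive" interpolates linearly between the observed
    # minimum and maximum; other quartile conventions exist.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative data only
print(iqr(sample))  # Q1 = 3, Q3 = 7, so this prints 4.0
```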

75 questions
0
votes
2 answers

Given a large dataset, how do you remove outliers using the IQR method in R

We have been given a large dataset, and we are being asked to remove outliers using the IQR method in R. The data has 53 columns, 17 of which are continuous; the remaining are categorical. How would you use the IQR method to remove outliers to…
0
votes
0 answers

Outlier removal using Box-plot's IQR - Repeatedly

I have a dataset with a column to which I can apply the box-plot outlier removal logic (dropping all rows with a value lower than Q1 - 1.5 x IQR or higher than Q3 + 1.5 x IQR). However, it is observed that after removing outliers, if the box-plot…
Jay
  • 1,210
  • 9
  • 28
  • 48
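
The 1.5 x IQR fence rule mentioned in the question above can be sketched as follows. This is not taken from any answer; the function names `iqr_filter` and `iqr_filter_repeated` and the data are hypothetical. The second function re-applies the filter until a pass removes nothing, which shows why a fresh box-plot can flag new points after each removal. Whether repeated trimming is appropriate depends on the data.

```python
import statistics

def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]

def iqr_filter_repeated(values, k=1.5):
    """Re-apply iqr_filter until a pass removes nothing."""
    current = list(values)
    while len(current) >= 2:  # quartiles need at least two points
        kept = iqr_filter(current, k)
        if len(kept) == len(current):
            break
        current = kept
    return current

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 15, 100]  # illustrative data only
print(iqr_filter(data))           # first pass drops only 100
print(iqr_filter_repeated(data))  # a later pass also drops 15
```

In this made-up sample, 15 sits inside the fences while 100 is present, and falls outside them once 100 is gone and the quartiles are recomputed.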
0
votes
1 answer

Make a threshold model based on the interquartile range in R

I want to create the standard deviation of the interquartile range when the rows exceed a certain threshold. For example, I got 7 columns named AI_1, ..., AI_7. In total this dataset has 60480 observations split out over 42 IDs (1440 each). I can…
0
votes
2 answers

Count number of outliers by group in r and store count in new dataframe

I have a dataset that has 2 columns; column A is State_Name and has 5 different options of state, and column B is Total_Spend which has the average total spend of that state per day. There are 365 observations for each state. What I want to do is…
K-J
  • 99
  • 9
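
Counting outliers per group, as the question above asks, can be sketched in plain Python (the question itself is about R, so this is only an illustration of the logic). The function name `count_outliers_by_group` and the (group, value) records are invented for the example, and the 1.5 x IQR fences are an assumption about what counts as an outlier.

```python
import statistics
from collections import defaultdict

def count_outliers_by_group(records, k=1.5):
    """records: iterable of (group, value) pairs -> {group: outlier count}."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)

    counts = {}
    for group, values in groups.items():
        # Fences are computed separately within each group.
        q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        spread = q3 - q1
        low, high = q1 - k * spread, q3 + k * spread
        counts[group] = sum(1 for v in values if v < low or v > high)
    return counts

# Made-up (State_Name, Total_Spend) pairs, for illustration only.
records = [("A", v) for v in [10, 11, 12, 13, 14, 15, 16, 17, 18, 90]]
records += [("B", v) for v in [5, 6, 7, 8, 9]]
print(count_outliers_by_group(records))  # {'A': 1, 'B': 0}
```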
0
votes
1 answer

Calculate IQR for selected observations

I have the following data and I would like to calculate IQR only for those whose sex is equal to 1. I have tried if(Agogo$sex_2015==2) { IQR(Agogo$bmi) } Is there any way to do this using ifelse or any other condition?
0
votes
1 answer

How do I get rid of abnormalities from Pandas?

If I want to remove values that do not exist between -2σ and 2σ, how do I remove outliers using iqr? I implemented this equation as follows. iqr = df['abc'].percentile(0.75) - df['abc'].percentile(0.25) cond1 = (df['abc'] >…
SecY
  • 307
  • 4
  • 12
0
votes
1 answer

Filter outliers with IQR and groupby in for loop, python

I would like to filter outliers by categories. For each column (fat_100g...) and each category from ['main_category_fr'], I would like to filter with the IQR method. My dataframe df: I have done this: nutriments = ["fat_100g", "carbohydrates_100g",…
Giordano
  • 37
  • 6
0
votes
1 answer

Interquartile range for categorical data

I have been asked to report the descriptive statistics of my results in terms of IQR and median for my categorical variables, but I do not know how to do that. I know the logic, but only for continuous data. Can anyone explain how to calculate that on…
Aura
  • 49
  • 7
0
votes
2 answers

Outlier Elimination in Spark With InterQuartileRange Results in Error

I have the following recursive function that determines the Outlier using the InterQuartileRange method: def interQuartileRangeFiltering(df: DataFrame): DataFrame = { @scala.annotation.tailrec def inner(cols: List[String], acc: DataFrame):…
joesan
  • 13,963
  • 27
  • 95
  • 232
0
votes
1 answer

Change interquartile range (IQR) for boxchart

I have to use only boxchart for my application, where I need to change the interquartile range (IQR) from the default range, i.e. 25% for the lower and 75% for the upper quartile, respectively. I see changing of IQR in boxplot using the Outliers Tag box object…
Bhaskar
  • 478
  • 4
  • 9
0
votes
1 answer

exclude zeros in Numpy quantile calculation of rows of an array

I have a 2D-array with zero values in each row. [[5, 3, 2, 0, 0, 1, 6, 9, 11, 1, 4, 1], [0, 0, 12, 0, 1, 0, 0, 2, 0, 30, 2, 2], [120, 2, 10, 3, 0, 0, 2, 7, 9, 5, 0, 0]] Is there a way to calculate the 0.75 quantile of each row by excluding the…
Chong Onn Keat
  • 520
  • 2
  • 8
  • 19
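
One possible approach to the row-wise quantile question above, sketched under the assumption that zeros are the values to ignore: replace them with NaN and use NumPy's NaN-aware `np.nanquantile`. The array is the one quoted in the question; the variable names are illustrative.

```python
import numpy as np

rows = np.array([
    [5, 3, 2, 0, 0, 1, 6, 9, 11, 1, 4, 1],
    [0, 0, 12, 0, 1, 0, 0, 2, 0, 30, 2, 2],
    [120, 2, 10, 3, 0, 0, 2, 7, 9, 5, 0, 0],
], dtype=float)  # float dtype so the array can hold NaN

# Replace zeros with NaN so the NaN-aware quantile skips them.
masked = np.where(rows == 0, np.nan, rows)

# 0.75 quantile of each row, ignoring the NaN (former zero) entries.
q75 = np.nanquantile(masked, 0.75, axis=1)
print(q75)
```

A row made up entirely of zeros would become all NaN and yield NaN with a runtime warning, so that case may need separate handling.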
0
votes
1 answer

Define Function to Remove Outliers

I created a function to remove outlier data like this: def remove_outliers(data): numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64'] data = data.select_dtypes(include=numerics) for i in data.columns: Q1 =…
0
votes
1 answer

How to identify outliers with several grouping

I am trying to identify outliers in the relabs column of my data set, but I need to calculate them separately for values 1 and 2 of the Control column where the conc column equals "NK", also grouping by Treatment. Data set with reprex (should have 40…
Simona
  • 87
  • 2
  • 8
0
votes
2 answers

How do I remove outliers in a dataset that has both categorical and numerical data?

I'm trying to remove outliers from the 'Price' column in a dataset. I have been able to create a data frame of the outliers with their corresponding values in other columns but I'm struggling to exclude these entries from the parent dataset. How do…
0
votes
1 answer

Removing outliers with PCA in multidimensional (100+) cluster problem

I have two dataframes that I need to cluster, where I am trying to do the following: apply PCA to remove outliers, and use PCA with 3 components to visualize it. I am using a total explained variance of 97.5% for the outlier removal…
Ricardo
  • 1
  • 2