Questions tagged [outliers]

An outlier is an observation that appears to be unusual or not well described relative to a simple characterization of a dataset.

Overview

Outliers are not necessarily bad or wrong, nor do they need to be removed from data for further analysis. However, outliers (there can be more than one in any set of data) indicate that at least some observations appear to differ from the bulk of the dataset, so they should be examined individually and understood. Also, some statistical procedures are sensitive to outliers: removing one or more outliers could substantially change the conclusions of those procedures.

Tag usage

Consider whether the question is better suited to Stack Overflow (programming-related) or Cross Validated (statistics-related).

In R, the software environment for statistical computing and graphics, the function boxplot.stats provides a basic method for detecting outliers.
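To illustrate the rule behind boxplot.stats: with its default coef = 1.5, a value is reported as an outlier when it lies more than 1.5 times the hinge spread (upper hinge minus lower hinge) below the lower hinge or above the upper hinge, with the hinges taken from fivenum. The Python sketch below is a rough re-implementation of that rule for illustration only, not R's own code; the function names are hypothetical and missing values are not handled.

```python
import math


def tukey_five_numbers(values):
    """Min, lower hinge, median, upper hinge, max (Tukey's hinges, as in R's fivenum)."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one value")
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based (possibly half-integer) positions of the five summary values
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # Average the order statistics at floor(d) and ceil(d), shifted to 0-based indices
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in positions]


def boxplot_outliers(values, coef=1.5):
    """Values lying more than coef * (hinge spread) outside the hinges."""
    _, lower_hinge, _, upper_hinge, _ = tukey_five_numbers(values)
    spread = upper_hinge - lower_hinge
    low_fence = lower_hinge - coef * spread
    high_fence = upper_hinge + coef * spread
    return [v for v in values if v < low_fence or v > high_fence]


print(boxplot_outliers([1, 2, 3, 4, 5, 100]))  # [100]
```

Using hinges rather than general sample quantiles is what distinguishes this rule from an IQR computed with quantile(), so the two can disagree slightly on small samples.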

1199 questions

Inserting outliers into a dataframe

I am trying to create a function to inject outliers into an existing data frame. I started by creating a new dataframe outs using the max and min values of the original dataframe. This outs dataframe will contain a certain amount of outlier data. Later I…
mina

Remove outliers with large standardized residuals in Stata

I run a simple regression in Stata for two subsamples and afterwards I want to exclude all observations with standardized residuals larger than 3.0. I tried: regress y x if subsample_criteria==1 gen st_res1=e(rsta) regress y x if…
jeffrey
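For reference, a standardized (internally studentized) residual after an OLS fit is r_i = e_i / (s * sqrt(1 - h_ii)), where e_i is the residual, s^2 the residual variance estimate and h_ii the leverage; in Stata these are what predict newvar, rstandard returns after regress. The NumPy sketch below computes them for a one-predictor model and flags observations with |r_i| > 3. It is a Python illustration of the calculation, not Stata code, and the function name and toy data are hypothetical.

```python
import numpy as np


def standardized_residuals(x, y):
    """Internally studentized residuals from an OLS fit of y on x with an intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # design matrix: intercept + slope
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    s2 = resid @ resid / (n - p)  # residual variance estimate
    # Leverages h_ii: diagonal of X (X'X)^{-1} X'
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return resid / np.sqrt(s2 * (1.0 - h))


x = np.arange(20.0)
y = 2.0 * x + 1.0 + 0.1 * (-1.0) ** np.arange(20)  # near-linear toy data
y[10] += 50.0  # one gross error
r = standardized_residuals(x, y)
keep = np.abs(r) <= 3.0  # observations to retain
print(np.flatnonzero(~keep))  # indices flagged for exclusion
```

Fitting each subsample separately and applying the same mask per subsample would follow the structure described in the question.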

How can I distinguish different points in a plot in MATLAB?

I have generated a data set in MATLAB and then embedded some outliers in it. I would like to plot it, and since I'm new to MATLAB I don't know how to distinguish the outliers from the inliers with a different marker or color. The points which are…

Removing outliers in one step

I have a dataset in which there are some outliers due to input errors. I have written a function to remove these outliers from my data frame (source): remove_outliers <- function(x, na.rm = TRUE, ...) { qnt <- quantile(x, probs=c(.25, .75),…
Mash
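The quoted R function sets values outside the 1.5 × IQR fences to NA. A pandas sketch of the same rule, applied to every numeric column in one call and followed by dropping any row that contains a masked value, might look like this. It is written in Python rather than R; the helper name and data are hypothetical, and pandas' default linear quantile interpolation corresponds to R's default quantile type 7.

```python
import pandas as pd


def mask_iqr_outliers(df, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with NaN, column by column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # ge/le align the per-column Series with the DataFrame's columns
    inside = numeric.ge(q1 - k * iqr) & numeric.le(q3 + k * iqr)
    out = df.copy()
    out[numeric.columns] = numeric.where(inside)  # NaN where a fence is violated
    return out


df = pd.DataFrame({"a": [1, 2, 3, 4, 100], "b": [10, 11, 12, 13, 14]})
cleaned = mask_iqr_outliers(df).dropna()  # drop rows with any masked value
print(cleaned)
```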

Removing multiple outliers from a regression model in R

This is in R. I've used Cook's distances to identify the points I would like to remove from a dataset of 506 variables. I am able to remove ONE point (number 369) as follows: modelmc1 = lm(housing[-369,14] ~ housing[-369,1] +…
poleworld

How to remove outliers above a specified value in R?

I am new to R programming. I have two data series. I need to remove outliers above a certain value, for example an absolute value of 25. Once those values are identified, they need to be removed from both series. How would I proceed in…
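Assuming the two series have equal length and are paired by position, one approach is to build a single boolean mask from both and apply it to each, so that a point removed from one series is also removed from the other. The sketch uses pandas rather than R; the data are made up and 25 is the example threshold from the question.

```python
import pandas as pd

s1 = pd.Series([3.0, -40.0, 12.0, 8.0, 30.0])
s2 = pd.Series([1.0, 2.0, -26.0, 4.0, 5.0])
threshold = 25

# A position is kept only if BOTH series are within the threshold there
keep = (s1.abs() <= threshold) & (s2.abs() <= threshold)
s1_clean, s2_clean = s1[keep], s2[keep]
print(s1_clean.tolist(), s2_clean.tolist())
```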

Removing extreme values in moving average (MATLAB)

I have a matrix of measurements: A=[x1,y1;x2,y2;x3,y3] and my device had some interference, so I want to delete measurements (rows) whose y value is more than 10 times the average of the neighboring points' y values. Example: if…
ValientProcess
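A NumPy sketch of the rule described, under two assumptions: the neighbours of a row are the rows directly before and after it (the first and last rows use their single neighbour), and the y values are positive. The function name and data are hypothetical, and the example is Python rather than MATLAB.

```python
import numpy as np


def drop_spikes(A, factor=10.0):
    """Remove rows whose y exceeds `factor` times the mean y of their neighbours (needs >= 2 rows)."""
    A = np.asarray(A, dtype=float)
    y = A[:, 1]
    neighbour_mean = np.empty_like(y)
    neighbour_mean[1:-1] = (y[:-2] + y[2:]) / 2.0  # interior rows: previous and next
    neighbour_mean[0] = y[1]                       # first row: only the next row
    neighbour_mean[-1] = y[-2]                     # last row: only the previous row
    keep = y <= factor * neighbour_mean
    return A[keep]


A = np.array([[0, 1.0], [1, 1.2], [2, 50.0], [3, 0.9], [4, 1.1]])
kept = drop_spikes(A)
print(kept)
```

A spike also inflates its neighbours' averages, so two adjacent spikes can mask each other; a median over a wider window is a more robust variant of the same idea.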

What should we do if our target variable contains a significant number of outliers?

If there are outliers in the independent variables, we can either delete them or deal with them using a variety of strategies, such as feature scaling, imputation, binning and trimming. If there are outliers in an independent variable, it does not…

In Python, detect outliers and replace them with their "sector median"

I am trying to create a model, and when it comes to the outliers I am stuck. I have a dataframe which includes the columns: "Bank Names" (around 700 banks), "Sectors" (Commercial Bank, Private Bank, Investment Bank, Bank Holdings etc., 10 in total), Asset…
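A pandas sketch, assuming outliers are defined by the 1.5 × IQR rule within each sector and replaced by that sector's median. The excerpt's column list is cut off, so the value column name value and the toy data are hypothetical. groupby(...).transform returns results aligned with the original rows, which turns the replacement into a single mask call.

```python
import pandas as pd


def replace_with_group_median(df, group_col, value_col, k=1.5):
    """Within each group, replace values outside the k*IQR fences with the group median."""
    grouped = df.groupby(group_col)[value_col]
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    median = grouped.transform("median")  # one median per row, from that row's group
    iqr = q3 - q1
    is_outlier = (df[value_col] < q1 - k * iqr) | (df[value_col] > q3 + k * iqr)
    out = df.copy()
    out[value_col] = df[value_col].mask(is_outlier, median)
    return out


df = pd.DataFrame({
    "Sectors": ["A"] * 5 + ["B"] * 5,
    "value": [1, 2, 3, 4, 100, 10, 11, 12, 13, 14],
})
out = replace_with_group_median(df, "Sectors", "value")
print(out)
```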

Outlier removal and position approximation from a real time noisy measurement

I am trying to write a robot localization program, and I am getting very noisy measurements with several outliers. I am quite new to these subjects, so I don't know where to start. Can you suggest a way forward? Here is a sample…
Bepo-san

Please guys, what might be the cause of this error I am receiving?

TypeError: Cannot perform 'ror_' with a dtyped [bool] array and scalar of type [NoneType]. I receive this error whenever I run this code: print(df < (Q1 - 1.5 * IQR)) |(df > (Q3 + 1.5 * IQR)) What am I doing wrong?
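In the quoted line, the parenthesis that closes print( comes right after the first comparison, so print(...) runs first and returns None; the | (df > ...) part then evaluates None | DataFrame, which pandas handles through the reflected __ror__ operation, producing the 'ror_' TypeError. A corrected sketch, assuming Q1, Q3 and IQR are per-column Series computed from df (the toy data are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 100], "b": [10, 11, 12, 13, 14]})
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1

# The whole expression is evaluated before printing, so `|` combines two
# boolean DataFrames instead of print's return value (None) and a DataFrame.
is_outlier = (df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))
print(is_outlier)
```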

Outliers in data

I have a dataset like so - 15643, 14087, 12020, 8402, 7875, 3250, 2688, 2654, 2501, 2482, 1246, 1214, 1171, 1165, 1048, 897, 849, 579, 382, 285, 222, 168, 115, 92, 71, 57, 56, 51, 47, 43, 40, 31, 29, 29, 29, 29, 28, 22, 20, 19, 18, 18, 17, 15, 14,…
Aaron

How do I put the statement result in a list?

I am trying to save the 'yes' or 'no' results into a list, named outlier here. This is my code: d = {'col1': [1, 2, 3, 4, 5], 'Spread': [10, 10.8, 5.0, 4.9,12.3]} df = pd.DataFrame(d) upper_limit = 9 rows = df.index.tolist() outlier =…
Susie
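Assuming a row should be labelled 'yes' when its Spread exceeds upper_limit (the comparison itself is cut off in the excerpt), a list comprehension over the column builds the list in one pass without tracking row indices:

```python
import pandas as pd

d = {'col1': [1, 2, 3, 4, 5], 'Spread': [10, 10.8, 5.0, 4.9, 12.3]}
df = pd.DataFrame(d)
upper_limit = 9

# Assumption: "outlier" means Spread strictly greater than upper_limit
outlier = ['yes' if spread > upper_limit else 'no' for spread in df['Spread']]
df['outlier'] = outlier  # optional: keep the labels next to the data
print(outlier)  # ['yes', 'yes', 'no', 'no', 'yes']
```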

Outlier treatment of large dataset

I am doing a project and have a dataset of 8545 × 52. Every variable has outliers, and unfortunately I can't remove them. I know the method of capping based on the IQR of each column, but as there are 52 columns it will take a lot…
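Capping does not need a loop over the columns: DataFrame.quantile returns one value per column, and DataFrame.clip accepts per-column bounds when called with axis=1. A sketch assuming all columns are numeric and using the 1.5 × IQR fences as the caps; the function name and the random data (given the question's shape) are hypothetical.

```python
import numpy as np
import pandas as pd


def cap_iqr(df, k=1.5):
    """Cap every column at its own [Q1 - k*IQR, Q3 + k*IQR] fences in one vectorized call."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # axis=1 aligns the per-column bounds (Series indexed by column name) with df's columns
    return df.clip(lower=q1 - k * iqr, upper=q3 + k * iqr, axis=1)


rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(8545, 52)))  # same shape as in the question
data.iloc[0, :] = 50.0  # plant an extreme value in every column
capped = cap_iqr(data)
```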

How can I find means for each row in my dataframe with the outliers removed?

I am relatively new to R, so I'm sorry if this is a stupid question. Using readxl, I have imported a dataframe of countries and their respective scores over a period of time; here is a section of this dataframe: structure(list(X1 = 2:5, Argent. =…
Harry W
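A pandas sketch of one possible reading: for each row, drop the values outside that row's 1.5 × IQR fences and average the rest. The outlier rule, the helper name and the toy data (with an X1 identifier column like the one in the excerpt) are assumptions, and the example is in Python rather than R.

```python
import pandas as pd


def trimmed_row_mean(row, k=1.5):
    """Mean of a row after dropping values outside that row's k*IQR fences."""
    q1, q3 = row.quantile([0.25, 0.75])
    iqr = q3 - q1
    inside = row[(row >= q1 - k * iqr) & (row <= q3 + k * iqr)]
    return inside.mean()


df = pd.DataFrame({
    "X1": [2, 3],
    "c1": [1.0, 4.0], "c2": [2.0, 5.0], "c3": [3.0, 6.0],
    "c4": [4.0, 7.0], "c5": [100.0, 8.0],
})
scores = df.drop(columns="X1")  # keep only the score columns
row_means = scores.apply(trimmed_row_mean, axis=1)
print(row_means)
```

With very few columns per row the IQR can be zero, in which case every value different from the quartile is dropped; a fixed threshold may suit such rows better.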