Questions tagged [outliers]

An outlier is an observation that appears to be unusual or not well described relative to a simple characterization of a dataset.

Overview

Outliers are not necessarily bad or wrong, nor do they need to be removed from data for further analysis. However, outliers (of which there can be more than one in any set of data) indicate that some data at least appear to differ from the bulk of the dataset, suggesting they should be individually examined and understood. Also, some statistical procedures are sensitive to outliers: this means that removal of one or more outliers could substantially change the conclusions of those procedures.
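To illustrate that sensitivity, here is a minimal Python sketch comparing the mean and the median with and without a single extreme value. The sample values are made up for this example and do not come from any question on this page.

```python
from statistics import mean, median

# Hypothetical sample: eight ordinary values plus one extreme value (60).
sample = [2, 3, 3, 4, 5, 5, 6, 7, 60]
trimmed = [v for v in sample if v != 60]

# How far each summary statistic moves when the extreme value is dropped.
mean_shift = mean(sample) - mean(trimmed)
median_shift = median(sample) - median(trimmed)

print(f"mean:   {mean(sample):.2f} -> {mean(trimmed):.2f}")
print(f"median: {median(sample):.2f} -> {median(trimmed):.2f}")
```

In this sample the mean shifts by roughly 6 units while the median shifts by only 0.5, so a procedure built on the mean is the one whose conclusions could change.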

Tag usage

Consider whether your question would be better suited to Stack Overflow (programming-related) or Cross Validated (statistics-related).

In R (software for statistical computing and graphics), the function boxplot.stats provides a basic method for detecting outliers.
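The rule behind that kind of boxplot-based detection can be sketched in Python as follows. This is an approximation under stated assumptions, not a port of boxplot.stats: the helper name iqr_outliers is hypothetical, the quartiles come from the standard library's statistics.quantiles rather than Tukey's hinges, and the 1.5 multiplier is the conventional default, so results can differ slightly on small samples.

```python
from statistics import quantiles

def iqr_outliers(values, coef=1.5):
    """Return the values lying outside the fences Q1 - coef*IQR and Q3 + coef*IQR.

    Hypothetical helper for illustration; quartiles use linear interpolation
    ("inclusive" method), which is not the hinge definition used by R.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - coef * iqr, q3 + coef * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up data with one obviously extreme value.
data = [2, 3, 3, 4, 5, 5, 6, 7, 60]
print(iqr_outliers(data))
```

The function only flags values; whether to remove, cap, or keep them is a separate decision that depends on the analysis.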

1199 questions
12
votes
7 answers

What are the efficient and accurate algorithms to exclude outliers from a set of data?

I have a set of 200 data rows (which implies a small set of data). I want to carry out some statistical analysis, but before that I want to exclude outliers. What are the potential algorithms for the purpose? Accuracy is a matter of concern. I am very new to…
Ashish Agarwal
12
votes
1 answer

Ignore outliers in ggplot2 boxplot + faceting + "free" options

How can I adjust my Y axis in order to ignore outliers, like in this post, but in a more challenging case where I have 4 boxplots and a "free faceting" layout? p <- ggplot(molten.DF,aes(x=class,y=SOC,fill=class)) + geom_boxplot() + …
fstevens
11
votes
3 answers

Filtering out outliers in Pandas dataframe with rolling median

I am trying to filter out some outliers from a scatter plot of GPS elevation displacements with dates. I'm trying to use df.rolling to compute a median and standard deviation for each window and then remove the point if it is greater than 3 standard…
p0ps1c1e
10
votes
2 answers

How to repeat the Grubbs test and flag the outliers

I am wanting to apply the Grubbs test to a set of data repeatedly until it ceases to find outliers. I want the outliers flagged rather than removed so that I can plot the data as a histogram with the outliers a different colour. I have used…
Lee_Kennedy
9
votes
4 answers

How to replace outliers with the 5th and 95th percentile values in R

I'd like to replace all values in my relatively large R dataset which take values above the 95th and below the 5th percentile, with those percentile values respectively. My aim is to avoid simply cropping these outliers from the data entirely. Any…
Bobbo
8
votes
2 answers

Pandas: replace outliers in all columns with nan

I have a data frame with 3 columns, for ex c1,c2,c3 10000,1,2 1,3,4 2,5,6 3,1,122 4,3,4 5,5,6 6,155,6 I want to replace the outliers in all the columns which are outside 2 sigma. Using the below code, I can create a dataframe without…
Sridhar
8
votes
1 answer

Include indication of extreme outliers in ggplot

I have some very, very few outliers in my dataset making the boxplots difficult to read: library(ggplot2) mtcars$mpg[1] <- 60 p <- ggplot(mtcars, aes(factor(cyl), mpg)) p + geom_boxplot() Hence, I would like to indicate the extreme outliers like…
chamaoskurumi
8
votes
1 answer

Search for and remove outliers from a dataframe grouped by a variable

I have a data frame that has 5 variables and 800 rows: head(df) V1 variable value element OtolithNum 1 24.9835 V7 130230.0 Mg 25 2 24.9835 V8 145844.0 Mg 25 3 24.9835 V9 126126.0 Mg …
Kole Stewart
8
votes
1 answer

Outlier detection with k-means algorithm

I am hoping you can help me with my problem. I am trying to detect outliers with use of the kmeans algorithm. First I perform the algorithm and choose those objects as possible outliers which have a big distance to their cluster center. Instead of…
user3611933
7
votes
3 answers

Identifying statistical outliers with pandas: groupby and reduce rows into different dataframe

I'm trying to understand how to identify statistical outliers in groups of dataframe. I will need to group the rows by the conditions and then reduce those groups into a single row and later find the outliers in all reduced rows. df =…
Aaditya Ura
7
votes
2 answers

Isolation Forest vs Robust Random Cut Forest in outlier detection

I am examining different methods in outlier detection. I came across sklearn's implementation of Isolation Forest and Amazon sagemaker's implementation of RRCF (Robust Random Cut Forest). Both are ensemble methods based on decision trees, aiming to…
7
votes
1 answer

Outliers using RPCA

I read about using RPCA to find outliers on time series data. I have an idea about the fundamentals of what RPCA is about and the theory. I got a Python library that does RPCA and pretty much got two matrices as the output (L and S), a low rank…
Aragorn
7
votes
1 answer

Outlier detection algorithm spark mllib

Is there any pre-built outlier detection algorithm or interquartile range identification method available in Spark 2.0.0? I found some code here, but I don't think this is available yet in Spark 2.0.0. Thanks
7
votes
1 answer

Replicator Neural Network for outlier detection, Step-wise function causing same prediction

In my project, one of my objectives is to find outliers in aeronautical engine data. I chose to use the Replicator Neural Network to do so and read the following report on it…
Daniel Takyi
7
votes
2 answers

Removing outliers from a k-mean cluster

I have a number of smaller data sets, containing 10 XY coordinates each. I am using Matlab (R2012a) and k-means to obtain a centroid. In some of the clusters (see figure below) I can see some extreme points; because my datasets are as small as they are,…
carro