Questions tagged [outliers]

An outlier is an observation that appears to be unusual or not well described relative to a simple characterization of a dataset.

Overview

Outliers are not necessarily bad or wrong, nor do they need to be removed from data for further analysis. However, outliers (of which there can be more than one in any set of data) indicate that some data at least appear to differ from the bulk of the dataset, suggesting they should be individually examined and understood. Also, some statistical procedures are sensitive to outliers: this means that removal of one or more outliers could substantially change the conclusions of those procedures.
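Here is a small illustration of that sensitivity, written as a Python sketch that uses only the standard library. It compares the mean and the median of the vector `c(-10, 1:6, 50)` from one of the questions listed below, with and without its two extreme values. Treating -10 and 50 as the outliers is an assumption made for this example.

```python
import statistics

# The vector x <- c(-10, 1:6, 50) from one of the questions below.
data = [-10, 1, 2, 3, 4, 5, 6, 50]

# Assumption for this illustration: the two extreme values are the outliers.
trimmed = [v for v in data if v not in (-10, 50)]

# The mean is pulled towards the extremes; the median is not.
mean_all = statistics.mean(data)             # 61 / 8 = 7.625
mean_trimmed = statistics.mean(trimmed)      # 21 / 6 = 3.5
median_all = statistics.median(data)         # (3 + 4) / 2 = 3.5
median_trimmed = statistics.median(trimmed)  # (3 + 4) / 2 = 3.5

print(f"mean:   {mean_all} with outliers, {mean_trimmed} without")
print(f"median: {median_all} with outliers, {median_trimmed} without")
```

Removing the two points more than halves the mean but leaves the median unchanged. This is why a procedure built on means is "sensitive to outliers" and one built on medians is not.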

Tag usage

Consider whether the question would be more suitable on Stack Overflow SE (programming-related) or Cross Validated SE (statistics-related).

In R, the software environment for statistical computing and graphics, the function boxplot.stats (package grDevices) provides a basic method for detecting outliers.
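For readers working outside R, the rule that boxplot.stats applies can be sketched in Python. The sketch assumes the default `coef = 1.5` and reproduces the hinge calculation of R's `fivenum()`. A value is flagged when it lies strictly more than `coef` times the hinge spread below the lower hinge or above the upper hinge. The Python function names are hypothetical, and missing values are not handled.

```python
import math

def fivenum(values):
    """Tukey's five-number summary, using the same index arithmetic as R's fivenum()."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("fivenum() needs at least one value")
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based positions of minimum, lower hinge, median, upper hinge, maximum.
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # Average the order statistics at floor(d) and ceil(d), converted to 0-based indexes.
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in positions]

def split_outliers(values, coef=1.5):
    """Split values into (kept, outliers) using boxplot.stats-style fences."""
    _, lower_hinge, _, upper_hinge, _ = fivenum(values)
    spread = upper_hinge - lower_hinge
    lower_fence = lower_hinge - coef * spread
    upper_fence = upper_hinge + coef * spread
    kept = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return kept, outliers

x = [-10, 1, 2, 3, 4, 5, 6, 50]  # x <- c(-10, 1:6, 50), as in one of the questions below
kept, out = split_outliers(x)
print(fivenum(x))             # [-10.0, 1.5, 3.5, 5.5, 50.0]
print(out)                    # [-10, 50]
print(sum(kept) / len(kept))  # 3.5, the mean without the flagged points
```

R's `boxplot()` computes its statistics with boxplot.stats. The `kept` list therefore corresponds to the points a boxplot draws inside its whiskers, and it is what a mean computed "without the boxplot outliers" would use.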

1199 questions

5 votes, 4 answers

Is there a function that can remove the outliers?

I found a function to detect outliers in columns, but I do not know how to remove the outliers. Is there a function for excluding or removing outliers from the columns? Here is the function to detect the outlier, but I need help in a function to…
swe2010
5 votes, 1 answer

Isolation Forest: Categorical data

I am trying to detect anomalies in a breast cancer dataset using Isolation Forest in sklearn. I am trying to apply Isolation Forest to a mixed dataset, and it gives me value errors when I fit the model. This is my dataset:…
5 votes, 1 answer

number of rows of result is not a multiple of vector length (arg 2) in R

I have a new question related to my earlier topic, deleting outlier in r with account of nominal var. In the new case, variables x and x1 have different lengths: x <- c(-10, 1:6, 50) x1 <- c(-20, 1:5, 60) z <- c(1,2,3,4,5,6,7,8) bx <- boxplot(x) bx$out bx1 <-…
San.O
5 votes, 5 answers

R Language - Sorting data into ranges; averaging; ignore outliers

I am analyzing data from a wind turbine. Normally this is the sort of thing I would do in Excel, but the quantity of data requires something heavy-duty. I have never used R before, so I am just looking for some pointers. The data consists of 2…
klonq
5 votes, 1 answer

Outlier-Detection in scikit-learn using Transformers in a pipeline

I'm wondering whether it is possible to include scikit-learn outlier detectors, such as isolation forests, in scikit-learn pipelines. The problem is that we want to fit such an object only on the training data and do nothing on the test data.…
Quickbeam2k1
5 votes, 5 answers

How to remove records from dataframe that fall outside variable-specific ranges? [R]

I have a dataframe and a predictive model that I want to apply to the data. However, I want to filter out records for which the model might not apply very well. To do this, I have another dataframe that contains for every variable the minimum and…
A. Stam
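The range filter this question describes can be sketched in Python, with plain dicts standing in for the two data frames. The variable names, the layout of `ranges` (variable mapped to a `(min, max)` pair), the inclusive bounds and the sample records are all hypothetical.

```python
def within_ranges(record, ranges):
    """True if every variable listed in `ranges` lies inside its inclusive [min, max] bounds."""
    return all(lo <= record[var] <= hi for var, (lo, hi) in ranges.items())

def filter_records(records, ranges):
    """Keep only the records for which the model's variable ranges apply."""
    return [r for r in records if within_ranges(r, ranges)]

# Hypothetical per-variable bounds, e.g. the ranges seen when the model was fitted.
ranges = {"temp": (0.0, 35.0), "humidity": (10.0, 90.0)}

records = [
    {"id": 1, "temp": 21.5, "humidity": 40.0},  # inside both ranges
    {"id": 2, "temp": 41.0, "humidity": 55.0},  # temp above its maximum
    {"id": 3, "temp": 12.0, "humidity": 95.0},  # humidity above its maximum
]

kept = filter_records(records, ranges)
print([r["id"] for r in kept])  # [1]
```

Keeping the bounds in a single mapping means a new variable only needs a new entry in `ranges`, not a new filter condition.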
5 votes, 3 answers

How exactly are outliers removed in R boxplot and how can the same outliers be removed for further calculation (e.g. mean)?

In a boxplot I've set the option outline=FALSE to remove the outliers. Now I'd like to include points that show the mean in the boxplot. Obviously, the means calculated using mean include the outliers. How can the very same outliers be removed from…
Gnark
5 votes, 2 answers

Specific outliers on a heat map - matplotlib

I am generating a heat map with data that has a fixed outlier number and I need to show these outliers as a colour out of the colour palette of the cmap I use which is "hot". With the use of cmap.set_bad('green') and np.ma.masked_values(data,…
user2998764
5 votes, 2 answers

Equivalent of 'range' in boxplot for ggplot2

I am trying to get the whiskers of a ggplot2's geom_boxplot to cover the outliers. The outliers would de facto not be displayed as dots as they are encompassed by the boxplot. If I was using the standard 'boxplot', I would be using: boxplot(x,…
Ant
5 votes, 1 answer

Speeding up outliers check on a pandas Series

I am running an outlier check on a pandas Series object with two passes using different standard deviation criteria. However, I use two loops for that, and it runs extremely slowly. I wonder if there are any pandas "tricks" to speed up this step. Here…
ocefpaf
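The two-pass check in this question can be sketched as follows. Each pass computes the mean and standard deviation once and then keeps the values inside the band, so there is no nested loop. In pandas, the same pass is one boolean-indexing expression such as `s[(s - s.mean()).abs() <= k * s.std()]`. The question's actual criteria are cut off, so the thresholds (3, then 2 standard deviations) and the sample data are assumptions. The sketch uses only the standard library so that it runs without pandas.

```python
import statistics

def sigma_pass(values, k):
    """Keep values within k sample standard deviations of the mean of `values`."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator, like pandas' default Series.std()
    return [v for v in values if abs(v - mu) <= k * sd]

def two_pass_clip(values, first_k=3.0, second_k=2.0):
    """Loose pass first, then a stricter pass with mean and SD recomputed on the survivors."""
    return sigma_pass(sigma_pass(values, first_k), second_k)

data = [9, 10, 10, 11, 10, 9, 11, 10, 100]  # hypothetical readings with one spike

# The spike inflates the SD enough to survive the 3-SD pass,
# but the stricter 2-SD pass removes it.
print(sigma_pass(data, 3.0) == data)  # True
print(two_pass_clip(data))            # [9, 10, 10, 11, 10, 9, 11, 10]
```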
5 votes, 2 answers

Different results from LOF implementation in ELKI and RapidMiner

I have written my own implementation of LOF and I'm trying to compare results with the implementations in ELKI and RapidMiner, but all 3 give different results! I'm trying to work out why. My reference dataset is one-dimensional, 102 real values…
Michael D.
5 votes, 1 answer

Access outlier ids in lme plot

I'm plotting an lme fit object in R and get outlier IDs (studyID) displayed on the graph, but I'd like to access these IDs automatically by looking them up in the plot object. I cannot figure out how to do this. I'm doing many analyses, thus it…
user1895891
5 votes, 2 answers

Extract rows with highest and lowest values from a data frame

I'm quite new to R; I use it mainly for visualising statistics with the ggplot2 library. Now I have run into a problem with data preparation. I need to write a function that will remove some number (2, 5 or 10) of rows from a data frame that have the highest…
Paweł Rumian
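The trimming this question describes can be sketched in Python, with a list of dicts standing in for the data frame. The excerpt is cut off before the full requirement. The column name `score`, the sample rows and the choice to drop `k` rows from each end are therefore assumptions.

```python
def drop_extremes(rows, key, k):
    """Return rows without the k smallest and k largest values of `key`, in original order."""
    if k < 0 or 2 * k > len(rows):
        raise ValueError("k must be between 0 and len(rows) / 2")
    # Rank row positions by the chosen column; sorted() is stable, so ties keep input order.
    order = sorted(range(len(rows)), key=lambda i: rows[i][key])
    # Slice the top end explicitly: order[-0:] would select every row when k == 0.
    dropped = set(order[:k]) | set(order[len(order) - k:])
    return [row for i, row in enumerate(rows) if i not in dropped]

# Hypothetical data frame rows.
rows = [{"name": c, "score": s} for c, s in zip("abcdef", [5, 1, 9, 3, 7, 2])]

trimmed = drop_extremes(rows, "score", 2)
print([r["name"] for r in trimmed])  # ['a', 'd']
```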
5 votes, 5 answers

How do I tell R to remove the outlier from a correlation calculation?

How do I tell R to remove an outlier when calculating correlation? I identified a potential outlier from a scatter plot, and am trying to compare correlation with and without this value. This is for an intro stats course; I am just playing with this…
Beth
4 votes, 2 answers

OpenCV SURF and outlier detection

I know several questions on the same subject have already been asked here, but I couldn't find any help. I want to compare 2 images to see how similar they are, and I'm using the well-known find_obj.cpp demo to extract SURF descriptors…
user1148222