Questions tagged [outliers]

An outlier is an observation that appears to be unusual or not well described relative to a simple characterization of a dataset.

Overview

Outliers are not necessarily bad or wrong, nor do they need to be removed from data for further analysis. However, outliers (of which there can be more than one in any set of data) indicate that some data at least appear to differ from the bulk of the dataset, suggesting they should be individually examined and understood. Also, some statistical procedures are sensitive to outliers: this means that removal of one or more outliers could substantially change the conclusions of those procedures.
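As a rough illustration of that sensitivity, the following minimal Python sketch compares the mean and the median of a small sample with and without one unusual value. The sample values are invented for illustration only and do not come from any question on this page.

```python
import statistics

# Hypothetical sample: five similar values plus one unusual observation (100).
with_outlier = [10, 11, 12, 13, 14, 100]
without_outlier = [v for v in with_outlier if v != 100]

# The mean is pulled strongly toward the unusual value.
mean_with = statistics.mean(with_outlier)
mean_without = statistics.mean(without_outlier)

# The median barely moves, so it is far less sensitive to the outlier.
median_with = statistics.median(with_outlier)
median_without = statistics.median(without_outlier)

print(f"mean:   {mean_with:.2f} -> {mean_without:.2f}")
print(f"median: {median_with:.2f} -> {median_without:.2f}")
```

A conclusion based on the mean would change substantially if the unusual value were dropped, while one based on the median would not. This is why the value should be examined rather than removed automatically.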

Tag usage

Consider whether the question would be more suitable on Stack Overflow SE (programming-related) or Cross Validated SE (statistics-related).

In R, a software environment for statistical computing and graphics, the function boxplot.stats provides a basic method for detecting outliers.
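For readers working outside R, the rule behind boxplot.stats can be approximated with Tukey's fences: values more than 1.5 times the interquartile range beyond the quartiles are flagged. The Python sketch below is an assumption-laden approximation, not a port: the function name tukey_outliers is hypothetical, and the quartiles come from statistics.quantiles rather than the hinges R uses, so results can differ slightly on small samples.

```python
import statistics

def tukey_outliers(values, coef=1.5):
    """Return the values lying outside Tukey's fences.

    Hypothetical helper: approximates the idea behind R's boxplot.stats,
    but uses interpolated quartiles instead of R's hinges.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - coef * iqr
    upper = q3 + coef * iqr
    # Strict comparisons: points exactly on a fence are not flagged.
    return [v for v in values if v < lower or v > upper]

# Made-up sample with one clearly unusual value.
sample = [10, 12, 11, 13, 12, 11, 95]
print(tukey_outliers(sample))
```

The coef parameter mirrors the multiplier conventionally set to 1.5. A larger value flags fewer points.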

1199 questions
7
votes
3 answers

Univariate outlier detection

This time I won't be asking a direct question on how to detect outliers as I did before in one of my questions. I did read some posts related to this topic but didn't get what I needed. I have a set of values which are given below: y<-c(0.59, 0.61,…
Shahzad
  • 1,999
  • 6
  • 35
  • 44
6
votes
1 answer

Isolation Forest in Python

I am currently working on detecting outliers in my dataset using Isolation Forest in Python, and I did not completely understand the example and explanation given in the scikit-learn documentation. Is it possible to use Isolation Forest to detect outliers…
Nnn
  • 191
  • 3
  • 9
6
votes
0 answers

Remove outliers in multiple columns from a spark dataframe

I have a dataset of around 10 integer features and I wish to remove outliers from each feature. What I have done in the past is compute the average and standard deviation for each feature and do a pass on the dataset, discarding…
disha
  • 61
  • 4
6
votes
4 answers

Identifying the outliers in a data set in R

So, I have a data set and know how to get the five number summary using the summary command. Now I need to get the instances above the Q3 + 1.5IQR or below the Q1 - 1.5IQR, since these are just numbers - how would I return the instances from a data…
Diante
  • 139
  • 3
  • 3
  • 12
6
votes
1 answer

Boxplot : Outliers Labels Python

I'm making a time series boxplot using seaborn package but I can't put a label on my outliers. My data is a dataFrame of 3 columns : [Month , Id , Value] that we can fake like that : ### Sample Data ### Month = numpy.repeat(numpy.arange(1,11),10) Id…
KB23
  • 63
  • 1
  • 6
6
votes
2 answers

R - Approach to find outliers/artefacts in blood pressure curve

Do you guys have an idea how to approach the problem of finding artefacts/outliers in a blood pressure curve? My goal is to write a program, that finds out the start and end of each artefact. Here are some examples of different artefacts, the green…
Borsi
  • 301
  • 1
  • 2
  • 11
6
votes
3 answers

Label outliers using mvOutlier from MVN in R

I'm trying to label outliers on a Chi-square Q-Q plot using mvOutlier() function of the MVN package in R. I have managed to identify the outliers by their labels and get their x-coordinates. I tried placing the former on the plot using text(), but…
Fato39
  • 746
  • 1
  • 11
  • 26
6
votes
2 answers

R: How to remove outliers from a smoother in ggplot2?

I have the following data set that I am trying to plot with ggplot2, it is a time series of three experiments A1, B1 and C1 and each experiment had three replicates. I am trying to add a stat which detects and removes outliers before returning a…
John
  • 5,139
  • 19
  • 57
  • 62
6
votes
9 answers

How to detect outliers in an ArrayList

I'm trying to think of some code that will allow me to search through my ArrayList and detect any values outside the common range of "good values." Example: 100 105 102 13 104 22 101 How would I be able to write the code to detect that (in this…
Ashton
  • 119
  • 3
  • 4
  • 14
6
votes
6 answers

Labeling outliers on boxplot in R

I would like to plot each column of a matrix as a boxplot and then label the outliers in each boxplot as the row name they belong to in the matrix. To use an…
user1836894
  • 293
  • 2
  • 5
  • 18
6
votes
3 answers

How can I identify the labels of outliers in a R boxplot?

The R boxplot function is a very useful way to look at data: it quickly provides you with a visual summary of the approximate location and variance of your data, and the number of outliers. In addition, I'd like to identify the outliers, in order to…
static_rtti
  • 53,760
  • 47
  • 136
  • 192
5
votes
1 answer

detecting outliers in a sparse distribution?

I would like to find the best way to detect outliers. Here is the problem, along with some things that probably will not work. Let's say we want to fish out some quasi-uniform data from a dirty varchar(50) column in MySQL. Let's start by doing an…
leeoniya
  • 1,071
  • 1
  • 9
  • 25
5
votes
3 answers

Real time detection of peaks of frequency of events

In a web application, I get a trigger every time an event occurs. I want to detect 'violent' frequency peaks, which probably translate into abnormal behaviour. I can think of two naive ways of achieving that: Fixed threshold - "If more than 500…
sawidis
  • 201
  • 3
  • 5
5
votes
0 answers

Evaluate multiple Isolation Forest estimators during GridSearchCV with custom scorer function

I have a sample of values that don't have a y target value. Actually, the X features (predictors) are all used to fit the Isolation Forest estimator. The goal is to identify which of those X-features and the ones to come in the future are actually…
NikSp
  • 1,262
  • 2
  • 19
  • 42
5
votes
2 answers

How to detect univariate outliers and mark as TRUE or FALSE in new column

I have a dataframe with 30 columns and >10,000 rows. How can I run an outlier analysis for a set of variables that will return a TRUE if ANY of the variables exceed the particular threshold (for that given variable), or FALSE if the respective…
stat.chat
  • 53
  • 3