Questions tagged [outliers]

An outlier is an observation that appears to be unusual or not well described relative to a simple characterization of a dataset.

Overview

Outliers are not necessarily bad or wrong, nor do they need to be removed from data for further analysis. However, outliers (of which there can be more than one in any set of data) indicate that some data at least appear to differ from the bulk of the dataset, suggesting they should be individually examined and understood. Also, some statistical procedures are sensitive to outliers: this means that removal of one or more outliers could substantially change the conclusions of those procedures.
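The sensitivity described above can be illustrated with a small sketch. The data here are hypothetical (the integers 1 to 10 plus a single extreme value of 50), chosen only to show how one observation can move the mean much more than the median.

```python
from statistics import mean, median

# Hypothetical sample: the integers 1..10, then the same data plus one extreme value.
clean = list(range(1, 11))
with_outlier = clean + [50]

# The mean uses every value, so a single extreme point shifts it noticeably.
mean_clean = mean(clean)
mean_outlier = mean(with_outlier)

# The median depends only on the middle of the sorted data, so it barely moves.
median_clean = median(clean)
median_outlier = median(with_outlier)

print(f"mean:   {mean_clean:.2f} -> {mean_outlier:.2f}")
print(f"median: {median_clean:.2f} -> {median_outlier:.2f}")
```

Whether the extreme value should be dropped is a separate question; the sketch only shows why a conclusion based on the mean may change if it is.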

Tag usage

Consider whether the question would be more suitable on Stack Overflow (programming-related) or Cross Validated (statistics-related).

In R, a software environment for statistical computing and graphics, the function boxplot.stats provides a basic method for detecting outliers.
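As a rough illustration of the kind of rule behind boxplot.stats (an R function), here is a minimal Python sketch of the 1.5 × IQR fence. The function name tukey_outliers and the sample data are hypothetical. R's boxplot.stats works from Tukey's hinges rather than the interpolated quartiles used here, so the two may disagree on borderline points.

```python
from statistics import quantiles

def tukey_outliers(values, coef=1.5):
    """Return the values lying more than coef * IQR beyond the quartiles.

    Assumption: quartiles come from statistics.quantiles(method="inclusive"),
    which is not identical to the hinges used by R's boxplot.stats.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - coef * iqr, q3 + coef * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample with one value far from the rest.
sample = [2, 3, 3, 4, 4, 5, 5, 6, 7, 30]
flagged = tukey_outliers(sample)
print(flagged)
```

Points flagged this way are candidates for inspection, not automatically errors.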

1199 questions
-2
votes
1 answer

How to remove outlier

I'm working on a regression problem. I have 10 independent variables. I'm using SVR. Despite doing feature selection and tuning the SVR parameters using grid search, I got a huge MAPE of 15%. So I'm trying to remove outliers, but after removing them…
imtiaz ul Hassan
  • 358
  • 3
  • 14
-2
votes
1 answer

How to remove outliers from data set using Cook's distance?

We are required to remove outliers/influential points from the data set in a model. I have 400 observations and 5 explanatory variables. I have tried this: Outlier <- as.numeric(names(cooksdistance)[(cooksdistance > 4 / sample_size)]) Where Cook's…
Bonang
  • 71
  • 3
  • 9
-2
votes
2 answers

How to calculate local outlier detection (LOF)

I want to have the correct calculation formula for the local outlier factor (LOF) according to the publication of Breunig & Sander. I have found this formula: LOF = (average of the lrd of the objects located in the MinPts area) divided by lrd…
user200179
  • 13
  • 3
-2
votes
1 answer

R in counting data

Right now I'm trying to plot a bell curve for a file called output9.csv. Here is my code. I want to use the z-score to detect outliers, using the difference between each value and the mean of the data set. The difference is compared with standard…
-2
votes
2 answers

How to maintain the order of elements of a row when using the by and rbind functions in R?

I have written a function which takes a subset of data based on the value of the name column. It computes the outliers for the column "mark" and replaces all of them. However, when I try to combine these different subsets, the order of my elements…
-2
votes
1 answer

Label or score outliers in R

I'm looking for some easy-to-use algorithms in R to label (outlier or not) or score (say, 7.5) outliers row-wise. That is, I have a matrix m that contains several rows and I want to identify rows that are outliers compared to the other rows. m…
JimBoy
  • 597
  • 8
  • 18
-2
votes
1 answer

Outlier detection in small sets

Is there a good algorithm for detecting outliers in small sets of decimal numbers? The best idea I have come up with so far is a kind of recursive, standard-deviation-based approach, but it seems a bit computationally expensive. I'm using C++, so…
technorabble
  • 391
  • 7
  • 16
-3
votes
1 answer

Elliptic Envelope outlier detection

I want to catch outliers in a 16 x 224 array by using the Elliptic Envelope from sklearn. The problem is that when I predict on the array, it gives me a different dimension: ell = EllipticEnvelope(); ell.fit(c); b = ell.predict(c). c is 16 x 224 as I…
-3
votes
1 answer

How to remove outliers from a data array in R

I would like to locate and remove the outlier in the measurement and replace it with a smoothed value to capture the trend better. Please find the figure below. [Figure: Data with outliers]
-3
votes
2 answers

Extreme values for combination of variables in R

I have a data set like the one below. My problem is manifold. For each combination of client, task and subtask I want to exclude the top 10% extreme values. I want 2 data sets as output, one with the extreme values for all the combinations and the other…
Rakesh
  • 1
  • 1
-3
votes
2 answers

How to process a large file with 30M entries?

The first part of my question is: is there a faster way of calculating the standard deviation than mySD = apply(myData,1,sd)? The second part of the question is how to remove outliers (3 SD away from the mean of each line) and recalculate the SD for each…
user1007742
  • 571
  • 3
  • 11
  • 20
-4
votes
1 answer

What is the meaning of 50 in command ">mean(c(1:10, 50))"

I have tried it by using different numbers in place of 50 and got different answers. Please can anyone tell me the calculation of this number.
Yasin Luni
  • 15
  • 1
  • 3
-5
votes
1 answer

How do I remove the outlier using R?

weight<-c(117, 118, 125, 86, 131, 93, 103, 107, 112, 97, 105, 105, 111, 105, 124, 111, 103, 113, 112, 127, 111, 115, 108, 105, 108, 127, 148, 131, 126, 119, 131, …