
I am trying to write a function that injects outliers into an existing data frame.

I started by creating a new data frame, `outs`, using the `max` and `min` values of the original data frame. This `outs` data frame will contain a certain amount of outlier data. Later I want to inject the outlier values from `outs` into the original data frame.

What I want is a function that injects a given amount of outliers into the original data frame. I have several problems. First, I do not know whether I am using `runif` correctly to create the data frame of outliers. Second, I do not know how to inject the outliers into `temp`.

The code I've tried so far is:

addOutlier <- function (data, amount){
  maxi <- apply(data, 2, function(x) (mean(x)+(3*(sd(x)))))
  mini <- apply(data, 2, function(x) (mean(x)-(3*(sd(x)))))
  temp <- data
  amount2 <- ifelse(amount<1, (prod(dim(data))*amount), amount)
  outs <- runif(amount2, 2, min = mini, max = maxi) # outliers
  if (amount2 >= prod(dim(data))) stop("exceeded data size")
  for (i in 1:length(outs))
    temp[sample.int(nrow(temp), 1), sample.int(ncol(temp), 1)] <- outs
  return (temp)
}

Any help to make this work would be deeply appreciated.

mina
  • I'm not sure that I understand your goal, but I think you just have an error in your code. `runif(n, mini, maxi)` will give you `n` values between the two extremes you define. Such a value is unlikely to be an outlier and certainly not guaranteed to be one. – alexwhitworth May 30 '16 at 17:13
  • @Alex I see the problem. Any suggestion for how I can guarantee that I get outliers? – mina May 31 '16 at 09:11
  • You haven't clearly defined what you mean by "outlier".... obviously `runif(n, -Inf, mini)`, `runif(n,maxi, Inf)` would work, but that's probably not realistic. – alexwhitworth May 31 '16 at 16:05
  • NO! You absolutely should **not** grab someone else's answer and claim it as your own without attribution.... your edit also didn't address the questions that I raised. – alexwhitworth May 31 '16 at 17:34

1 Answer


My understanding is that you are trying to add a fixed number of outliers to each column of your data frame. Alternatively, you also seem to be looking into adding a percentage of outliers to each column. I wrote a solution only for the former case, but the latter should be pretty easy to implement if you really need it. Note that I broke things down into two functions, to (hopefully) help clarify what is going on. Hope this helps!

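# Replace `amount` randomly chosen elements of a numeric vector with values
# drawn uniformly from mean +/- 3 standard deviations.
# Note: as pointed out in the comments on the question, values drawn inside
# this range are not guaranteed to be outliers.
add.outlier.to.vector <- function(vector, amount) {
  cells.to.modify <- sample(seq_along(vector), amount, replace = FALSE)
  mean.val <- mean(vector)
  sd.val <- sd(vector)
  min.val <- mean.val - 3 * sd.val
  max.val <- mean.val + 3 * sd.val
  vector[cells.to.modify] <- runif(amount, min = min.val, max = max.val)
  return(vector)
}

# Apply add.outlier.to.vector to every column of a data frame.
add.outlier.to.data.frame <- function(temp, amount) {
  for (i in seq_len(ncol(temp))) {
    temp[, i] <- add.outlier.to.vector(temp[, i], amount)
  }
  return(temp)
}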

data <- data.frame(
  a=c(1,2,3,4),
  b=c(7,8,9,10)
)
add.outlier.to.data.frame(data, 2)
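
The comments on the question point out that values drawn between mean - 3 sd and mean + 3 sd are not guaranteed to be outliers. One possible way around that, sketched below, is to draw values that lie more than `k` standard deviations away from the mean instead. The helper names `draw.outliers` and `add.far.outlier.to.vector`, the parameters `k` and `spread`, and the rule that an `amount` below 1 is read as a fraction are all assumptions made for illustration, and "more than 3 sd from the mean" is only one of several possible definitions of an outlier.

```r
# Hypothetical helper: draw n values lying between k and (k + spread)
# standard deviations away from the mean, on a randomly chosen side.
draw.outliers <- function(n, mean.val, sd.val, k = 3, spread = 2) {
  side <- sample(c(-1, 1), n, replace = TRUE)      # below or above the mean
  distance <- runif(n, min = k, max = k + spread)  # distance in sd units
  mean.val + side * distance * sd.val
}

# Hypothetical variant of add.outlier.to.vector that accepts either a count
# or a fraction (assumption: amount < 1 means "fraction of the vector").
add.far.outlier.to.vector <- function(vector, amount, k = 3, spread = 2) {
  n <- if (amount < 1) round(length(vector) * amount) else amount
  if (n > length(vector)) stop("exceeded data size")
  cells.to.modify <- sample(seq_along(vector), n)
  # Mean and sd are taken from the vector before any value is replaced
  vector[cells.to.modify] <- draw.outliers(n, mean(vector), sd(vector), k, spread)
  vector
}

set.seed(1)
x <- rnorm(100)
y <- add.far.outlier.to.vector(x, 0.05)  # replace 5% of the values
sum(abs(y - mean(x)) > 3 * sd(x))        # count values beyond 3 sd of the original
```

To apply this to every column of a data frame while keeping it a data frame, something like `data[] <- lapply(data, add.far.outlier.to.vector, amount = 2)` should work.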
bogdata
  • Useful function. However, is there a way I can keep the original mean value of the dataset, so that I can see the injected outliers? – mina May 31 '16 at 12:59
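
One way to read the comment above is that the mean and standard deviation should come from the untouched data, and that the injected cells should be identifiable afterwards. Under that reading, a minimal sketch could record the original statistics first and return a logical mask alongside the modified data. The function name `inject.and.track`, the parameter `k`, and the shape of the returned list are hypothetical, chosen only for this example.

```r
# Hypothetical helper: inject outliers into each column and report where they went.
inject.and.track <- function(data, amount, k = 3) {
  # Statistics of the original data, computed before anything is replaced
  orig.mean <- sapply(data, mean)
  orig.sd <- sapply(data, sd)
  # Logical mask with the same shape as the data; TRUE marks an injected cell
  injected <- matrix(FALSE, nrow(data), ncol(data),
                     dimnames = list(NULL, names(data)))
  for (j in seq_len(ncol(data))) {
    rows <- sample(nrow(data), amount)
    side <- sample(c(-1, 1), amount, replace = TRUE)
    # Values between k and k + 2 standard deviations from the original mean
    data[rows, j] <- orig.mean[[j]] + side * runif(amount, k, k + 2) * orig.sd[[j]]
    injected[rows, j] <- TRUE
  }
  list(data = data, injected = injected, orig.mean = orig.mean, orig.sd = orig.sd)
}

set.seed(42)
df <- data.frame(a = rnorm(50), b = rnorm(50, mean = 10))
res <- inject.and.track(df, 3)
which(res$injected, arr.ind = TRUE)  # row/column positions of the injected cells
```

Because `orig.mean` and `orig.sd` are returned as well, the injected values can be compared against the original distribution rather than the contaminated one.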