
Assuming that I have a dataframe such as

    x <- round(runif(1000, -5, 5), 2)
    y <- round(runif(1000, 0, 5), 2)
    z <- sprintf("%s%05d", "A", seq.int(1000))
    df <- data.frame(x, y, z)

How can I find which data points (identified by their names in column `z`) lie above, i.e. are outliers with respect to, a non-linear threshold of the form

$$y = \frac{a}{|x| - c}$$

where `a` and `c` are constants that I can choose arbitrarily, and |x| is the absolute value (modulus) of x?
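Before filtering, it can help to see how this threshold behaves. The curve has a vertical asymptote at |x| = c, and for |x| < c the denominator is negative, so the threshold itself is negative there. Since `y` is drawn from [0, 5], every point with |x| < c ends up above the curve. A minimal sketch, using a hypothetical helper `threshold()`:

```r
# Hypothetical helper: evaluates the threshold y = a / (|x| - c)
threshold <- function(x, a, c) {
  a / (abs(x) - c)
}

# Beyond the asymptote (|x| > c) the curve is positive and decays towards 0
threshold(c(2, -2, 5), a = 1, c = 1)   # 1.00 1.00 0.25

# Exactly at |x| = c the denominator is 0, so R returns Inf
threshold(1, a = 1, c = 1)             # Inf

# Inside the asymptote (|x| < c) the curve is negative
threshold(0.5, a = 1, c = 1)           # -2
```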

Asked by pisistrato, edited by Jaap.
Something like `a <- 1; c <- 1; df[df$y > a / (abs(df$x) - c), "z"]`? You could wrap the `a / (abs(...) - c)` part in a function for readability. (Comment by Mike H., Jan 03 '18 at 15:00)
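To check the one-liner from the comment on data whose answer is known in advance, a small hand-made data frame helps. The values below are hypothetical; `which()` is added only to show the row numbers alongside the names.

```r
# Small hand-made data frame (hypothetical values) with a known answer
toy <- data.frame(
  x = c(2, -3, 0.5, 4),
  y = c(2, 0.1, 1, 0.2),
  z = c("A00001", "A00002", "A00003", "A00004"),
  stringsAsFactors = FALSE   # keep z as character on older R versions
)

a <- 1
c <- 1

# Logical vector: TRUE where the point lies above y = a / (|x| - c)
above <- toy$y > a / (abs(toy$x) - c)

toy[above, "z"]   # names of the flagged points: "A00001" "A00003"
which(above)      # their row numbers: 1 3
```

Row 3 is flagged only because its |x| = 0.5 is smaller than c, where the threshold is negative (-2).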

1 Answer


As mentioned in the comment, you can create a short function for this:

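    # Return the names (column z) of the points lying above y = a / (|x| - c)
    find_outliers = function(df, a, c){
      y_threshold = a/(abs(df$x) - c)   # threshold value at each point's x
      return(df$z[df$y > y_threshold])
    }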

    a = 1
    c = 0.1
    find_outliers(df, a, c)
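If the intention is to flag only points beyond the asymptote (an assumption; the question does not say), the region |x| <= c can be excluded explicitly, because there the curve is negative or infinite and would otherwise flag every point. A sketch of such a variant; the name `find_outliers2` and the `outside_only` argument are hypothetical:

```r
# Variant of find_outliers() (hypothetical name and argument): by default it
# only flags points with |x| > c, where the threshold curve is positive.
find_outliers2 <- function(df, a, c, outside_only = TRUE) {
  y_threshold <- a / (abs(df$x) - c)
  flagged <- df$y > y_threshold
  if (outside_only) {
    # Drop the region |x| <= c, where the curve is negative or infinite
    flagged <- flagged & abs(df$x) > c
  }
  df$z[flagged]
}

set.seed(42)  # any seed; makes the random example reproducible
x <- round(runif(1000, -5, 5), 2)
y <- round(runif(1000, 0, 5), 2)
z <- sprintf("%s%05d", "A", seq.int(1000))
df <- data.frame(x, y, z, stringsAsFactors = FALSE)

head(find_outliers2(df, a = 1, c = 0.1))
```

With `outside_only = FALSE` the function returns the same names as `find_outliers()` above.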
Answered by Kelli-Jean.