I am trying to write a function in R that finds outliers in a matrix using z-scores. The function should take two arguments: x, a matrix, and zs, an integer. For each row of the matrix, the function should calculate the z-score of each element, and if a z-score is greater than zs or smaller than -zs, the function should print that element. I know that I can use:

z<- (x-mean(x))/sd(x)   or  z<- scale(x) 

to calculate the z-scores, but as I am a beginner in programming, I find it difficult to apply this to each row of a matrix.

Kevin Panko
Alan

2 Answers

2

How about this code:

set.seed(1)
mat <- matrix(rnorm(100), ncol=10)
# apply() over rows returns each row's z-scores as a column, so transpose back with t()
temp <- abs(t(apply(mat, 1, scale)))
mat[temp > 2]

I took 2 standard deviations as the z limit. First I create a random matrix, then I scale it row by row (the '1' argument of the apply function). I apply abs to avoid testing both sides (< and >), since the test is symmetric. This gives you the outlier values. You might also want to see where they are; for that, just do:

image(temp > 2)

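If you want the positions as numbers rather than a plot, which() with arr.ind = TRUE is one option. This is only a minimal sketch on a made-up matrix: x, z and pos are illustrative names, and the z-scores are transposed back with t() so that they line up with the rows of x.

```r
# Made-up 3 x 6 matrix: 50 (row 1) and -30 (row 3) are the obvious outliers
x <- rbind(c(1, 1, 1, 1, 1, 50),
           c(1, 2, 3, 4, 5, 6),
           c(-30, 2, 3, 2, 3, 2))

# Row-wise z-scores; apply() returns one column per row of x, hence t()
z <- t(apply(x, 1, function(r) (r - mean(r)) / sd(r)))

# Row/column index of every cell whose |z| exceeds the threshold
pos <- which(abs(z) > 1.5, arr.ind = TRUE)

# Positions next to the original values
cbind(pos, value = x[pos])
```

Here pos has one line per outlier, with a "row" and a "col" column, and x[pos] looks up the matching original values.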

EDIT: If you need it as a function taking x and zs, I wrapped it:

outliers = function(x, zs) {
  # transpose so the z-scores line up with the rows of x
  temp <- abs(t(apply(x, 1, scale)))
  return(x[temp > zs])
}

### > outliers(matrix(rnorm(100), ncol=10), 2)
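A quick way to sanity-check a function like this is a small non-square matrix where the outliers are obvious by eye. The sketch below is not the original code: row_outliers is a hypothetical name, and it computes the z-scores by hand instead of calling scale. Note the t(): apply over rows returns each row's result as a column, so the scores have to be transposed back before they can be used to index x.

```r
# Hypothetical helper, a sketch only: values of x whose row-wise |z| exceeds zs
row_outliers <- function(x, zs) {
  # One z-score per element, computed within each row;
  # apply(x, 1, ...) yields an ncol(x) x nrow(x) matrix, so transpose it back
  z <- t(apply(x, 1, function(r) (r - mean(r)) / sd(r)))
  x[abs(z) > zs]
}

# Made-up test matrix: 50 (row 1) and -30 (row 3) stand out, row 2 has no outlier
x <- rbind(c(1, 1, 1, 1, 1, 50),
           c(1, 2, 3, 4, 5, 6),
           c(-30, 2, 3, 2, 3, 2))

row_outliers(x, 1.5)   # returns the values -30 and 50
```

The values come back in column order (so -30 before 50) because x is indexed by a logical matrix. They are original values, not z-scores, so they are not compared against zs themselves.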
agenis
  • Hi, thanks for your instant help. What I need is for the user to input x (a matrix) and zs, and for the function to return the outliers (z-score > zs or z-score < -zs). Thanks – Alan Mar 05 '15 at 21:52
  • @Alan What do you mean "should return the outliers"? You want a list of their values or you want their position row/column, or maybe just the initial matrix with TRUE/FALSE? Please specify – agenis Mar 05 '15 at 22:06
  • A list of their values. Thanks – Alan Mar 05 '15 at 22:23
  • Thanks for your replies. Let me check that I understood your code correctly. We create the function outliers with inputs x and zs. We create a variable temp, which holds the absolute value of the z-score for each row of x. Then our function returns the values of the z-scores that are bigger than zs. Right? In your example, x is a matrix with 100 random values and 10 columns, and zs=2. Right? Then, since it must return values above zs, why is the return value 1.9803999 0.2670988 -1.2765922? Thanks – Alan Mar 06 '15 at 18:21
  • You got it right. As for why the returned values are not above 2: it's just because of the scaling. Don't forget that the outliers are the values in the initial matrix, not the scaled matrix! – agenis Mar 06 '15 at 23:21
  • No problem. If one of the answers solves your problem, please accept it or upvote it, so other people can benefit from it. Rgds. – agenis Mar 09 '15 at 07:00
  • I accepted the answer but unfortunately I cannot upvote it because I don't have 15 reputation. Thanks for the help! – Alan Mar 11 '15 at 00:22
0
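myfun <- function(x, zs) {
    # Row-wise z-scores; apply() returns them column-wise, so transpose back
    x1 <- t(apply(x, 1, scale))
    # TRUE where |z| exceeds the threshold
    x2 <- (abs(x1) - abs(zs)) > 0
    # Keep outlier values; all other cells become 0
    return(x * x2)
}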
J. Win.
  • Hi, thanks for your help. So I guess the user can input x (a matrix) and zs. Then x1 takes each row and scales the elements? Then if x1 (z-score) - zs > 0, why does it return x*x2? Also, sorry for asking, I know it is a silly question, but how do I insert the data? myfun(what should I include)? Thanks – Alan Mar 05 '15 at 21:48
  • Cellwise multiplication by a TRUE/FALSE matrix (x2) results in a matrix that has the same dimensions as the original matrix `x`, keeping the values that meet the "outlier" criterion and setting all other cells to 0. You could use this function by running `outliers <- myfun(x=yourmatrix, zs=yourz)`, where `your` stands for whatever input values you want to use. – J. Win. Mar 06 '15 at 14:14
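To make the cellwise multiplication concrete, here is a small sketch on a made-up matrix (the names x, z, mask and masked are illustrative, not from the answer). It also shows an NA-based variant, on the assumption that you may want to tell an outlier whose value happens to be 0 apart from a non-outlier cell.

```r
# Made-up 2 x 6 matrix: only 50 in row 1 is an outlier
x <- rbind(c(1, 1, 1, 1, 1, 50),
           c(1, 2, 3, 4, 5, 6))

# Row-wise z-scores, transposed back so they have the same shape as x
z <- t(apply(x, 1, function(r) (r - mean(r)) / sd(r)))

# Logical matrix: TRUE where |z| exceeds the threshold
mask <- abs(z) > 1.5

# TRUE counts as 1 and FALSE as 0, so non-outlier cells are zeroed out
zeroed <- x * mask

# Variant: mark non-outlier cells as NA instead of 0
masked <- x
masked[!mask] <- NA
```

Both zeroed and masked keep the 2 x 6 shape of x. In zeroed, the 50 stays at [1, 6] and every other cell is 0; in masked, every other cell is NA.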