
I'm trying to apply filters to a large matrix. The matrix "logcounts1" consists of 8978 rows and 4 columns.

The filter should be applied per row, so that I can select the rows in which at least one value falls outside an interval. The interval is centred on the row mean and extends one standard deviation either side (mean ± SD). The mean and SD of each row are stored in the vectors "average1" and "SDr1", respectively.

I also defined two matrices: "alpha", for rows in which at least one value is outside the interval, and "beta", for rows whose values all stay within it.

for (i in seq_len(nrow(logcounts1))) {
  # Bounds of the interval for row i
  upper <- average1[i] + SDr1[i]
  lower <- average1[i] - SDr1[i]
  # TRUE if any of the four values lies above upper or below lower
  if (any(logcounts1[i, ] > upper | logcounts1[i, ] < lower)) {
    alpha <- rbind(alpha, logcounts1[i, ])
  } else {
    beta <- rbind(beta, logcounts1[i, ])
  }
}

I really hope you can help me out, I'm quite new to this. Best

I made an example in Excel (linked image).

Basically, the red cells are values that fall outside the interval (mean ± standard deviation). Rows 1, 2 and 5, which contain out-of-range values, should be stored in a new matrix "alpha", so the output should be:

Alpha selected matrix

Rows that contain no out-of-range values should be stored in another matrix ("beta"), giving:

Beta selected matrix

  • Can you show a small sample input and output? The loop seems unnecessary, but it's hard to be sure without a test case. Just share a sample matrix with 5 rows and the expected output. – Gregor Thomas Apr 13 '18 at 15:56
  • I actually don't know how to share a table here in the comments section... – Fernando Delgado Chaves Apr 13 '18 at 17:09
  • Don't share a table in the comments section. Click the "edit" button and share it in your question. If you have a table `x` in R, you can get a copy/pasteable object definition with `dput(x)`. If you have a big matrix, you can share `dput(head(x))` for the first 6 rows. Or you can just share a definition like `x = matrix(c(...))`. – Gregor Thomas Apr 13 '18 at 17:28
  • Well, I guess I did it my way... – Fernando Delgado Chaves Apr 13 '18 at 17:41
  • Reproducible data is very helpful, e.g. by providing a bit of script to make a 10x4 matrix with some dummy values – rg255 Apr 13 '18 at 17:44
  • Images of data are useless. I want to be able to *test code*. I don't want to type the numbers from your picture into R. Please post an example using valid R syntax. – Gregor Thomas Apr 13 '18 at 18:08
  • @Fernandodelgadochaves I've updated my answer, if it solves your problem please acknowledge by accepting (and up voting) the answer. If not, comment so I can rectify it. – rg255 Apr 14 '18 at 04:59

1 Answer

I've gone for a non-looping method, using subset instead. The top section just produces reproducible data, with the row mean in column 5 and the row SD in column 6. Columns 7 and 8 are where I calculate the lower and upper bounds (mean - SD and mean + SD). I then use range to pull the lowest and highest values of each row into columns 9 and 10 (it is not necessary to add these as columns, but I did so to help show you what is happening).

I then use the subset function. The rules for alpha are that either the lowest observed value is less than mean - SD or (|) the highest observed value is greater than mean + SD. The rules for beta are that the lowest observed value is greater than or equal to mean - SD and (&) the highest observed value is less than or equal to mean + SD.

# Dummy data (seed fixed so the example is reproducible)
set.seed(1)
df1 <- data.frame(matrix(rnorm(40, 0, 1), ncol = 4))
df1[,5] <- apply(df1[,1:4], 1, mean)  # row mean
df1[,6] <- apply(df1[,1:4], 1, sd)    # row SD

# Lower and upper bounds (mean - SD, mean + SD)
df1[,7] <- df1[,5] - df1[,6]
df1[,8] <- df1[,5] + df1[,6]

# Lowest and highest observed value in columns 1:4
rng <- apply(df1[,1:4], 1, range)  # 2 x nrow matrix: row 1 = min, row 2 = max
df1[,9]  <- rng[1,]
df1[,10] <- rng[2,]

# Split
alpha <- subset(df1, df1[,9] <  df1[,7] | df1[,10] >  df1[,8])
beta  <- subset(df1, df1[,9] >= df1[,7] & df1[,10] <= df1[,8])

# Clean up the helper columns in df1 (alpha and beta keep theirs)
df1[,c(7:10)] <- NULL
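
The same split can also be done directly on a matrix without adding helper columns. The following is only a minimal sketch, assuming the data is a plain numeric matrix named logcounts1 as in the question; the random matrix and the names outside and has_outlier are stand-ins introduced for illustration, not part of the original data.

```r
# Hypothetical stand-in for the asker's matrix; replace with the real logcounts1.
set.seed(42)
logcounts1 <- matrix(rnorm(40), ncol = 4)

# Per-row centre and spread (sd() uses the n - 1 denominator).
average1 <- rowMeans(logcounts1)
SDr1     <- apply(logcounts1, 1, sd)

# A vector of length nrow is recycled down the columns, so each row is
# compared against its own mean and SD.
outside <- abs(logcounts1 - average1) > SDr1

# TRUE for rows with at least one value outside mean +/- SD.
has_outlier <- rowSums(outside) > 0

# drop = FALSE keeps the result a matrix even if only one row is selected.
alpha <- logcounts1[has_outlier, , drop = FALSE]
beta  <- logcounts1[!has_outlier, , drop = FALSE]
```

Because alpha and beta are built by logical indexing rather than repeated rbind calls, nothing is grown inside a loop, which tends to matter for a matrix with several thousand rows.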
rg255