I am trying to get the position of the first occurrence of the value 0 in each of a number of binary matrices read in from CSV files.

I have got the number of 0s using...

sapply(files_to_use, function(x) sum(x == 0))

After reading in all csv files using...

reading_in_csv <- list.files(pattern="*.csv")
files_to_use <- lapply(reading_in_csv, read.delim)

I have tried the following code but get the error 'dim(X) must have a positive length'...

find_first_0 <- function(x){which(x = 0)}
apply(files,1,find_first_0)

Would anyone have any insight on the above? I was thinking of using which() to get the position, but I do not understand how to apply it to a number of matrices at once.

Given example matrix...

dimMat <- matrix(0, 1000, 10)

for(i in 1:1000){
  dimMat[i, ] <- sample(c(0,1), 10, replace = TRUE, prob = c(.3, .7))
}

print(dimMat)
Lynda
  • it's just that you used apply instead of sapply – carlo Oct 13 '19 at 13:12
  • Given your `dimMat`, what is your expected output? Also, please call `set.seed(123)` before the `for` loop so that it is reproducible. – Cole Oct 13 '19 at 13:15
  • The real problem here is that the list also contains objects with 0 dims. The following code removes them and then does a warning-free row-wise search for the first 0 in each row, returning the result as a vector: `delete_empty_matrices <- function(matrix_list){ matrix_list[unlist(lapply(matrix_list, length) != 0)] }` `files_to_use <- delete_empty_matrices(files_to_use)` `sapply(files_to_use, function(x){apply(x, 1, function(y){ifelse(length(y) > 0, suppressWarnings(min(which(y == 0))), NA)})})` – hello_friend Oct 14 '19 at 00:41

2 Answers

It is ugly, but I think this is what you are after:

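# Drop list elements that contain no data (length 0)
delete_empty_matrices <- function(matrix_list){
  matrix_list[lengths(matrix_list) > 0]
}

# Keep only the non-empty objects; apply() fails on objects without dims
files_to_use <- delete_empty_matrices(files_to_use)

# For every file, search each row for the position of its first 0.
# min() of an empty which() result gives Inf with a warning, hence suppressWarnings()
sapply(files_to_use, function(x){
  apply(x, 1, function(y){
    ifelse(length(y) > 0, suppressWarnings(min(which(y == 0))), NA)
  })
})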
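As a minimal end-to-end sketch of the same idea, the code below assumes the CSV files are comma-separated and have no header row (the `X0.0.0...` column name reported in the comments suggests that `read.delim` treated the first data row as a header). The helper name `first_zero_per_row` and the two temporary files are made up for illustration and are not part of the original code.

```r
# Hypothetical helper: column index of the first 0 in each row, NA if the row has none
first_zero_per_row <- function(m) {
  m <- as.matrix(m)
  if (length(m) == 0) return(integer(0))   # empty file: nothing to search
  apply(m == 0, 1, function(is_zero) which(is_zero)[1])
}

# Write two small example files to a temporary directory (illustrative data only)
tmp_dir <- tempfile("binary_csv_")
dir.create(tmp_dir)
write.table(matrix(c(1, 0, 1,
                     1, 1, 1), nrow = 2, byrow = TRUE),
            file.path(tmp_dir, "a.csv"), sep = ",",
            row.names = FALSE, col.names = FALSE)
write.table(matrix(c(0, 1,
                     1, 0), nrow = 2, byrow = TRUE),
            file.path(tmp_dir, "b.csv"), sep = ",",
            row.names = FALSE, col.names = FALSE)

# Assumption: comma-separated values without a header line
csv_paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
matrices  <- lapply(csv_paths, read.csv, header = FALSE)

# lapply() walks the list; apply() is only used inside, on each single matrix
first_zeros <- lapply(matrices, first_zero_per_row)
names(first_zeros) <- basename(csv_paths)
first_zeros
```

Indexing with `which(is_zero)[1]` rather than `min(which(...))` yields `NA` for a row without any 0 and so avoids the "no non-missing arguments to min" warning. Calling `apply()` directly on the list is what triggers 'dim(X) must have a positive length', because a list has no dimensions.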
hello_friend
  • Given an example matrix in my answer, I have tried this and I am unsure with the result. I get all positions of 0s but would like just the position of first 0. – Lynda Oct 13 '19 at 09:34
  • Also I tried this by just reading in one binary matrix csv file and I get the following output... $X0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0 integer(0) – Lynda Oct 13 '19 at 09:46
  • You are correct, apologies. Please try: find_first_0 <- function(x){min(which(x == 0))} apply(dimMat, 2, find_first_0) – hello_friend Oct 13 '19 at 09:51
  • I get the following error... dim(X) must have a positive length – Lynda Oct 13 '19 at 09:54
  • Does this work ? apply(dimMat, 2, function(x){ifelse(length(x) > 0, min(which(x == 0)), NA)}) – hello_friend Oct 13 '19 at 09:57
  • No sorry @hello_friend. The above is still not working for me – Lynda Oct 13 '19 at 10:01
  • set.seed(200) dimMat <- matrix(0, 1000, 10) for(i in 1:1000){ dimMat[i, ] <- sample(c(0,1), 10, replace = TRUE, prob = c(.3, .7)) } find_first_0 <- function(x){min(which(x == 0))} apply(dimMat, 2, function(x){ifelse(length(x) > 0, min(which(x == 0)), NA)}) – hello_friend Oct 13 '19 at 10:05
  • With the above code I get the following output [1] 9 5 6 3 4 1 1 4 2 5 for 'dimMat' but the first 0 is set in row 1, column 6? When I try this code on all files using the files variable reading in all csvs I still get the error 'dim(X) must have a positive length' – Lynda Oct 13 '19 at 10:12
  • The output should be read as follows: in column 1 the first 0 is in row 9, in column 2 the first 0 is in row 5, and so on. Do you need it to work row-wise? If so, change the 2 in the apply function to a 1 – hello_friend Oct 13 '19 at 10:19
  • The reason for your error is that some of the files you have read in have no data in them. I am working on a function to first remove these from the list and then apply the above function – hello_friend Oct 13 '19 at 10:20
  • I want this done by rows, so I changed the 2 to 1 in the apply function, but once run it throws 'There were 33 warnings'. So I used warnings() to see them, and it states "In min(which(x == 0)) : no non-missing arguments to min; returning Inf". How could I fix this? – Lynda Oct 13 '19 at 10:27
  • Please try the above code in the edited answer and see if it works – hello_friend Oct 13 '19 at 10:45
  • FYI the reason for this warning is because that row does not contain 0. – hello_friend Oct 13 '19 at 11:27

Here are a couple of ways to get the row and column indices of the first 0 in each row.

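# Approach 1: list all (row, col) positions of 0s, then keep the first col within each row
aggregate(col ~ row,
          data = which(dimMat == 0, arr.ind = TRUE),
          FUN = function(x) x[1])

# Approach 2: which.max() on a logical row gives the first TRUE.
# Rows without any 0 are dropped, because which.max() would return 1 for them
complete_rows <- rowSums(dimMat) < ncol(dimMat)

cbind(row = seq_len(nrow(dimMat))[complete_rows],
      col = apply(dimMat == 0, 1, which.max)[complete_rows])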

To find the first 0 in each column, the code is very similar:

aggregate(row ~ col,
          data = which(dimMat == 0, arr.ind = TRUE),
          FUN = function(x) x[1])

complete_cols <- colSums(dimMat) < nrow(dimMat)

cbind(col = seq_len(ncol(dimMat))[complete_cols],
      row = apply(dimMat == 0, 2, which.max)[complete_cols])
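To illustrate why the `which.max()` approach needs the extra filter, here is a small sketch on a made-up 3 x 3 matrix (not the question's data) in which one row contains no 0. The names `m`, `is_zero`, `has_zero` and `first_col` are only for illustration.

```r
# Small illustrative matrix: row 2 has no 0
m <- matrix(c(1, 0, 0,
              1, 1, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)

is_zero <- m == 0

# which.max() on a logical vector gives the index of the first TRUE,
# but it also returns 1 when there is no TRUE at all
first_col <- apply(is_zero, 1, which.max)

# Mask rows that have no 0, so the misleading 1 becomes NA
has_zero <- rowSums(is_zero) > 0
first_col[!has_zero] <- NA
first_col
# [1]  2 NA  1
```

Whether to drop such rows (as in the `cbind()` calls above) or keep them as `NA` depends on whether the output should stay aligned with the original row numbers.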
Cole