
What is the most efficient way to replace all zeros in a matrix with NAs?

What I do:

my_matrix[my_matrix==0] <- NA

I need it for a recommender system (recommenderlab). Filling in the NAs takes about as long as building the recommender system itself.

EDIT 1:

dim(my_matrix) ~ 500000x500

About 90% of the entries are zeros.
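For experimenting, a scaled-down stand-in for this matrix can be generated as below. This is only a sketch: the reduced dimensions, the rating values 1 to 5, and the seed are assumptions, not details of the real data.

```r
# Scaled-down stand-in for the 500000 x 500 matrix (sizes are an assumption)
set.seed(42)
n_row <- 5000
n_col <- 50

# Roughly 90% zeros; the remaining cells hold hypothetical ratings 1..5
values <- sample(c(0L, 1:5), n_row * n_col, replace = TRUE,
                 prob = c(0.9, rep(0.02, 5)))
my_matrix <- matrix(values, nrow = n_row, ncol = n_col)

zero_share <- mean(my_matrix == 0)   # share of zeros before replacement

my_matrix[my_matrix == 0] <- NA      # the approach from the question
na_share <- mean(is.na(my_matrix))   # same share, now stored as NA
```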

Aleksandro M Granda
  • 2
    I would think `my_matrix[!my_matrix] <- NA` to be faster. (not tested). Also, you can check `?replace` – akrun May 19 '15 at 15:11
  • 1
    How big is your `my_matrix`? I tried it on a `5000*5000` matrix, and the `system.time` for your method versus `!my_matrix` was 0.470 vs. 0.150 – akrun May 19 '15 at 15:19
  • 1
    I wonder how your `replace` solution would work @akrun. `replace(my_matrix, my_matrix %in% 0, NA)` – Pierre L May 19 '15 at 15:20
  • 1
    I would wrap it with `which`, i.e. `replace(my_matrix, which( my_matrix==0), NA)`, to get some efficiency. Also, I am not sure if looping is more efficient. For data.frames, `lapply(dataset, function(x) replace(x, which(x==0), NA))` could be faster – akrun May 19 '15 at 15:22
  • 1
    There is not enough information in this question. How big is your matrix? I would expect the method proposed in the question to be very efficient. – Roland May 19 '15 at 15:33
  • @akrun I tried to measure it, the only discrepancy might be that some of the functions print the result and others just assign it. How does that change the benchmarking? It looks like replace2 wins. – Pierre L May 19 '15 at 15:36
  • @plafort That might be an issue here, but I would wait for the OP to comment what works best for him. – akrun May 19 '15 at 15:46
  • Thanks for your answers! I added some additional info about the matrix (EDIT 1). I will try the recommendations. – Aleksandro M Granda May 19 '15 at 16:32
  • Maybe don't solve this problem. Instead, create your matrix as `NA` and fill in the 10% after-the-fact. Or store in a sparse matrix if that's suitable. – Frank May 19 '15 at 17:10
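Frank's suggestion of starting from `NA` could look roughly like the sketch below. It assumes the known ratings are available as (row, column, value) triplets; the `ratings` data frame and its column names are hypothetical and only serve the illustration.

```r
# Hypothetical input: the known ratings as (row, column, value) triplets
ratings <- data.frame(
  user   = c(1, 1, 3, 4),
  item   = c(2, 5, 1, 3),
  rating = c(4, 5, 2, 3)
)
n_users <- 4
n_items <- 5

# Start from an all-NA matrix instead of an all-zero one
my_matrix <- matrix(NA_real_, nrow = n_users, ncol = n_items)

# A two-column index matrix assigns one cell per (row, column) pair
my_matrix[cbind(ratings$user, ratings$item)] <- ratings$rating
```

With this layout no zero-to-NA pass is needed, because only the known cells are ever written.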

1 Answer


Answers and a benchmark

# Test matrix: 10000 x 50 with a block of zeros
my_matrix <- matrix(1:5e5, ncol=50)
my_matrix[4000:5000, 3:10] <- 0

library(microbenchmark)
microbenchmark(
  # subsetting only, no replacement
  insubset     = my_matrix[my_matrix %in% 0],
  # replace() with a logical index vs. with positions from which()
  replace1     = replace(my_matrix, my_matrix %in% 0, NA),
  replace2     = replace(my_matrix, which( my_matrix==0), NA),
  # in-place assignments
  Aleksandro   = my_matrix[my_matrix==0] <- NA,
  excloperator = my_matrix[!my_matrix] <- NA,
  is.na        = is.na(my_matrix) <- which(my_matrix == 0)
)

Unit: milliseconds
         expr       min        lq      mean    median        uq        max neval
     insubset 22.579762 22.890431 26.197510 23.453346 25.210976 151.957848   100
     replace1 21.630386 23.621707 27.573375 25.643425 26.225683 104.389554   100
     replace2  3.979487  4.069095  4.872796  4.159493  6.449839   8.887427   100
   Aleksandro 12.787962 13.100210 14.837055 13.689376 14.098338  96.258866   100
 excloperator 11.894246 12.275969 13.541593 13.011391 15.144429  17.307862   100
        is.na  7.642823  8.901978   15.7352  9.342954  10.13166   68.31235   100
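One caveat when reading these timings: the three assigning expressions modify `my_matrix` itself, so later iterations may run on a matrix whose zeros have already become `NA`. A possible way to put every method on the same input is to wrap each one in a function that works on its own copy. The sketch below assumes that setup; the function names are made up for the illustration, and the benchmark only runs if `microbenchmark` is installed.

```r
# Same test matrix as in the benchmark above
base_matrix <- matrix(1:5e5, ncol = 50)
base_matrix[4000:5000, 3:10] <- 0

# Each function modifies only its local copy, so base_matrix keeps its zeros
f_logical <- function(m) { m[m == 0] <- NA; m }
f_negate  <- function(m) { m[!m] <- NA; m }
f_which   <- function(m) replace(m, which(m == 0), NA)
f_is_na   <- function(m) { is.na(m) <- which(m == 0); m }

# Time all four on the unchanged input (skipped if the package is missing)
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    f_logical(base_matrix),
    f_negate(base_matrix),
    f_which(base_matrix),
    f_is_na(base_matrix),
    times = 20
  ))
}
```

Every call now includes the cost of copying the matrix, so absolute numbers will differ from the table above; only the relative ordering is comparable.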
Rich Scriven
Pierre L
  • 1
    @akrun, I tried it on a matrix with ~90% 0s and the same results still hold: `replace2` is still ~3x as fast as `Aleksandro` and `excloperator`. Any idea why `which` is still faster? (This is the matrix I used: `ln <- 5e5; my_matrix <- matrix(runif(50*ln), nrow = ln); my_matrix<- replace(my_matrix, which(my_matrix < 0.9), NA);`) – adilapapaya May 20 '15 at 05:01
  • 2
    @adilapapaya Thanks for testing. I think by using `which` we are getting the positions instead of having a huge logical matrix – akrun May 20 '15 at 05:25
  • `replace2` is the best one. Thanks! – Aleksandro M Granda May 20 '15 at 09:34
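To make akrun's point about positions concrete, here is a tiny example of what the two kinds of index look like and a check that both lead to the same result. It is an illustration on a made-up 2 x 3 matrix, not an explanation of the timing difference.

```r
# Small matrix with zeros and one pre-existing NA
m <- matrix(c(0, 2, NA, 0, 5, 0), nrow = 2)

logical_index  <- m == 0         # one TRUE/FALSE/NA per cell, same shape as m
position_index <- which(m == 0)  # integer positions of the zeros only; NA is dropped

# Both index types produce the same matrix after replacement
by_logical  <- m
by_logical[logical_index] <- NA
by_position <- replace(m, position_index, NA)

identical(by_logical, by_position)
```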