
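I'm populating a distribution array, testa, with counts of the observations in test_idxs, where each row of test_idxs is one observation and gives the cell to increment.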

test_idxs <- matrix(sample(c(1, 2, 3), 300000, replace = TRUE), ncol = 3)
testa_for_looped <- array(0, c(3, 3, 3))
testa_vectorized <- array(0, c(3, 3, 3))
## slower: increment one cell per observation
system.time(
  for (i in 1:nrow(test_idxs)) {
    idx <- rbind(test_idxs[i, ])  # 1 x 3 matrix, so it addresses a single cell
    testa_for_looped[idx] <- testa_for_looped[idx] + 1
  }
)
## faster: index with the whole matrix at once
system.time(testa_vectorized[test_idxs] <- testa_vectorized[test_idxs] + 1)
sum(testa_for_looped)  ## right: one count per row of test_idxs
sum(testa_vectorized)  ## wrong

Vectorized solutions are usually faster, but this one gives the wrong result, and every correct alternative I have come up with is slower than the for loop. What would you do?

enfascination
  • Do not create a new question if you are not satisfied with the answer to the same question you asked previously. – CHP Nov 06 '13 at 12:33
  • `test_idxs` has duplicates, which is why the matrix indexing does not give the same result. If you aggregated your indices first, you would only have to assign their respective frequencies. It will hopefully be a lot faster than a for loop. – flodel Nov 06 '13 at 12:50
  • geektrader, this is a different question. That one was about doing lots of repetitive matrix indexing inside a for loop; this one is about breaking out of for loops entirely. It was motivated by one of the answers to the other question: until then, it hadn't occurred to me that I might not need a for loop at all. – enfascination Nov 06 '13 at 13:09
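
flodel's comment points at why the vectorized version fails. In `x[i] <- x[i] + 1` the right-hand side is computed once from the old values, so a cell that appears many times in `test_idxs` is still only set to 1, and the total can never exceed the 27 cells of the array. A minimal sketch of the "aggregate first, then assign frequencies" idea follows. It is one possible reading of that comment, not a benchmarked answer; the names `looped`, `counted`, `dims`, `linear` and `tabulated` are illustrative, and `set.seed()` is only there to make the run repeatable.

```r
set.seed(1)  # assumption: fixed seed, not part of the original question
test_idxs <- matrix(sample(c(1, 2, 3), 300000, replace = TRUE), ncol = 3)

# Reference result: the row-by-row loop from the question.
looped <- array(0, c(3, 3, 3))
for (i in seq_len(nrow(test_idxs))) {
  idx <- test_idxs[i, , drop = FALSE]  # 1 x 3 matrix addressing one cell
  looped[idx] <- looped[idx] + 1
}

# Option 1: let table() count each (i, j, k) combination.
# Factors with explicit levels keep cells that were never observed.
counts_table <- table(
  factor(test_idxs[, 1], levels = 1:3),
  factor(test_idxs[, 2], levels = 1:3),
  factor(test_idxs[, 3], levels = 1:3)
)
counted <- array(as.vector(counts_table), dim = c(3, 3, 3))

# Option 2: turn each row into a column-major linear index and tabulate().
dims <- c(3, 3, 3)
linear <- 1 + (test_idxs[, 1] - 1) +
  (test_idxs[, 2] - 1) * dims[1] +
  (test_idxs[, 3] - 1) * dims[1] * dims[2]
tabulated <- array(tabulate(linear, nbins = prod(dims)), dim = dims)

# Both aggregated versions should agree with the loop.
all(looped == counted)
all(looped == tabulated)
```

Both options make a single pass over the indices instead of one assignment per row. Whether they beat the loop on a given machine is worth checking with `system.time()` rather than assuming.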

0 Answers