How can I select the rows of one matrix that don't match the rows of another matrix? I want to train a model on a sample of my data and validate it on the remaining part. Thanks in advance.

  • One option is to convert the matrices to data.frames and use `?anti_join` from `library(dplyr)` – akrun Apr 03 '15 at 20:53
  • If you're creating the initial sample of rows, then you can simply use that to isolate two mutually exclusive matrices in the first place and avoid this problem entirely. – Thomas Apr 03 '15 at 20:58
  • Or add a column to your data that is either "test" or "train" (or 1 or 0) and just feed subsets to your model. – Gregor Thomas Apr 03 '15 at 21:25
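The labelling idea from the last comment could look roughly like the base R sketch below. The data frame `dat`, its columns, the `set` column name, and the 50/50 split are all made up for illustration; they are not from the question.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical example data: 10 rows, 2 predictors
dat <- data.frame(x1 = rnorm(10), x2 = rnorm(10))

# Label every row "test", then relabel a random half as "train"
dat$set <- "test"
dat$set[sample(nrow(dat), size = 5)] <- "train"

# Feed the subsets to the model fitting / validation steps
train <- dat[dat$set == "train", ]
test  <- dat[dat$set == "test", ]
```

Because every row carries exactly one label, the two subsets cannot overlap, and the label stays with the data if it is reordered later.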

1 Answer

You can use indexing for that (as hinted by Thomas). Say you have a 2000-row matrix and want to randomly select half of its rows:

# Create the matrix
my.matrix <- matrix(rnorm(4000), nrow = 2000)

# Create a vector of 1000 row numbers
selection <- sample(1:2000, size = 1000)

# Create the 2 mutually exclusive matrices
matrix.1 <- my.matrix[selection,]
matrix.2 <- my.matrix[-selection,]
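
If the row indices were not kept and only the two matrices are available, one possible workaround is to compare rows by value. This is a rough base R sketch, not part of the original answer: the names `full`, `train`, `holdout` and the helper `row_key` are hypothetical. It assumes the sampled rows are exact copies of rows in the full matrix, and it drops every copy of a duplicated row.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical data: a full matrix and a sample of its rows
full  <- matrix(rnorm(40), nrow = 20)
train <- full[sample(nrow(full), size = 10), ]

# Collapse each row into one string so that rows can be compared with %in%
# (numbers are converted to text with 15 significant digits)
row_key <- function(m) do.call(paste, c(as.data.frame(m), sep = "\r"))

# Keep the rows of 'full' whose key does not appear in 'train'
holdout <- full[!row_key(full) %in% row_key(train), , drop = FALSE]
```

Keeping the index vector, as in the code above, remains simpler and more robust whenever you create the sample yourself.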
Dominic Comtois