0

I want to create a matrix in R based on data in a text file. The data in the text file looks like this:

ID1, X, 2
ID1, Y, 1
ID1, Z, 2
ID2, X, 1
ID2, Z, 1
ID3, A, 2

What I would like to do is convert this into a matrix that looks like this:

ID,  A,  X,  Y,  Z
ID1, NA, 2,  1,  1
ID2, NA, 1,  NA, 1
ID3, 2, NA, NA, NA

I know that if I had this format:

ID1, X
ID1, X
ID1, Y
ID1, Z
ID2, X
ID2, Z
ID3, A
ID3, A

I could use the table command and get what I am looking for but with the solutions I have seen they don't have the data formatting that I have. I can't use these solutions because the data that I'm working with has over 500 million rows in the expanded format if it is not condensed into the format I have above.

How can I create this matrix and what is the most efficient way to do it? Condensed my data is approximately 3 million rows.

Alexander
  • 105,104
  • 32
  • 201
  • 196
RDizzl3
  • 318
  • 3
  • 13

2 Answers2

2

Same with tidyr

df <- read.table(text = "ID1, X, 2
ID1, Y, 1
ID1, Z, 2
ID2, X, 1
ID2, Z, 1
ID3, A, 2", sep = ",", col.names=c("ID","Code","Value"))
library(tidyr)
spread(df,Code, Value )
## ID  A  X  Y  Z
## 1 ID1 NA  2  1  2
## 2 ID2 NA  1 NA  1
## 3 ID3  2 NA NA NA
HubertL
  • 19,246
  • 3
  • 32
  • 51
0

Right after I posted this question I found this:

library(reshape)
x<-rbind(c('ID1','X',2),c('ID1','Y',1),c('ID1','Z',1),c('ID2','Y',2),c('ID2','Z',3),c('ID3','A',8))
colnames(x)<-c('ID','Code','Value')
cast(x,ID~Code)

ID    A    X    Y    Z
1 ID1 <NA>    2    1    1
2 ID2 <NA> <NA>    2    3
3 ID3    8 <NA> <NA> <NA>

I have not yet tried it on my large dataset so if anybody knows of something more efficient than this it would be great to see other suggestions!

RDizzl3
  • 318
  • 3
  • 13