I have a dataset with 100,000 rows in transaction format (one ID and one movie per row), as below:
B038-82C81778E81C Toy Story
B038-82C81778E81C Planet of the apes
B038-82C81778E81C Iron Man
9C05-EE9B44E8C18F Bruce Almighty
9C05-EE9B44E8C18F Iron Man
9C05-EE9B44E8C18F Toy Story
8F59-9956070D8005 Toy Story
8F59-9956070D8005 Gravity
8F59-9956070D8005 Iron Man
8F59-9956070D8005 Gone
B52F-9936734525AF Planet of the Apes
B52F-9936734525AF Bruce Almighty
I want to convert it to a matrix format as below (either 0/1 or TRUE/FALSE flags):
Matrix             Toy Story  Planet of the Apes  Iron Man  Bruce Almighty  Gone  Gravity
B038-82C81778E81C  1          1                   1         0               0     0
9C05-EE9B44E8C18F  1          0                   1         1               0     0
8F59-9956070D8005  1          0                   1         0               1     1
B52F-9936734525AF  0          1                   0         1               0     0
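To make the target concrete, here is a minimal base R sketch that builds this matrix for the sample rows above. The data frame `df` and its column names `id` and `movie` are made up for illustration, and the titles are assumed to be spelled consistently (the sample has both "Planet of the apes" and "Planet of the Apes", which would otherwise become two separate columns). This is not the arules route I am trying to get working, only a picture of the result I am after.

```r
# Hypothetical data frame holding the sample rows (names are illustrative only)
df <- data.frame(
  id = c(rep("B038-82C81778E81C", 3), rep("9C05-EE9B44E8C18F", 3),
         rep("8F59-9956070D8005", 4), rep("B52F-9936734525AF", 2)),
  movie = c("Toy Story", "Planet of the Apes", "Iron Man",
            "Bruce Almighty", "Iron Man", "Toy Story",
            "Toy Story", "Gravity", "Iron Man", "Gone",
            "Planet of the Apes", "Bruce Almighty"),
  stringsAsFactors = FALSE
)

# Cross-tabulate IDs against movies; unclass() turns the table into a plain matrix
counts <- unclass(table(id = df$id, movie = df$movie))

# Collapse counts to flags: logical first, then 0/1 integers
flags <- counts > 0
m <- flags * 1L

print(m)
```

Rows and columns come out in alphabetical order rather than in the order shown above, and with 100,000 rows and many distinct titles a dense matrix like this may get large.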
I have tried the following steps:
TrnsDataset1 <- read.transactions("~/Desktop/movieswid_1Copy.txt", format = "single", sep = "\t", cols = c(1, 2), rm.duplicates = TRUE)
L <- as(TrnsDataset1, "list")
M <- as(L, "matrix")
CM <- as(M, "ngCMatrix")
But after the list conversion, I get output like this:
B038-82C81778E81C c("Toy Story\nB038-82C81778E81C\tPlanet of the apes\nB038-82C81778E81C\tIron Man")
9C05-EE9B44E8C18F c("Bruce Almighty","Iron Man","Toy Story")
So some rows are fine, but in others the unique ID of the following rows ends up inside the movie name, together with the \t and \n separators.
I want every entry of the list in the format below:
9C05-EE9B44E8C18F c("Bruce Almighty","Iron Man","Toy Story")
With the list in that shape, I believe I can easily get the required result. I would really appreciate your help.
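For reference, this is a small base R sketch of the list shape I mean, built from a few of the sample rows. It assumes the file is tab-separated with two columns, as in my `read.transactions` call; the inline string `raw` and the column names `id` and `movie` are made up for illustration. Setting `quote = ""` is also an assumption on my part: it tells `read.delim` to treat any quote characters in the titles as ordinary text.

```r
# A few sample rows as tab-separated text (stand-in for the real file)
raw <- paste(
  "B038-82C81778E81C\tToy Story",
  "B038-82C81778E81C\tPlanet of the apes",
  "B038-82C81778E81C\tIron Man",
  "9C05-EE9B44E8C18F\tBruce Almighty",
  "9C05-EE9B44E8C18F\tIron Man",
  "9C05-EE9B44E8C18F\tToy Story",
  sep = "\n"
)

# Read two columns; quote = "" disables quote handling entirely
df <- read.delim(text = raw, header = FALSE, sep = "\t", quote = "",
                 col.names = c("id", "movie"), stringsAsFactors = FALSE)

# One character vector of movies per ID, with duplicates dropped
lst <- lapply(split(df$movie, df$id), unique)

print(lst)
```

As far as I understand, arules can coerce a named list like `lst` with `as(lst, "transactions")`, but I have not verified that on my full data.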