I have code in R that needs to be scaled to work with big data. I am using Spark for this, and the package that seemed most convenient was sparklyr. However, I am unable to create a TermDocumentMatrix from a Spark dataframe. Any help would be greatly appreciated.
input_key is the dataframe, which has the following schema:
ID Keywords
1 A,B,C
2 D,L,K
3 P,O,L
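
For reproducibility, the sample data can be built locally and copied into Spark roughly like this (a sketch only; the connection object sc and the local data frame name are placeholders for however the real cluster and data are set up):

library(sparklyr)

# Placeholder connection; in practice this points at the actual cluster.
sc <- spark_connect(master = "local")

# Small local version of input_key with the schema shown above.
input_key_local <- data.frame(
  ID       = c(1, 2, 3),
  Keywords = c("A,B,C", "D,L,K", "P,O,L"),
  stringsAsFactors = FALSE
)

# Spark dataframe that the question is about.
input_key <- copy_to(sc, input_key_local, "input_key", overwrite = TRUE)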
My original R code was the following:
library(tm)

mycorpus <- input_key
# Build a corpus from the Keywords column, then compute the term-document matrix.
corpus <- Corpus(VectorSource(mycorpus$Keywords))
path_matrix <- TermDocumentMatrix(corpus)
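
As far as I can tell, tm's Corpus() and TermDocumentMatrix() only operate on local R objects, so they cannot be pointed directly at a Spark dataframe. The closest sparklyr-native equivalent I can see would be something along these lines (a rough sketch using ft_regex_tokenizer and ft_count_vectorizer; I am not sure this is the right replacement, and it produces a Spark column of term-count vectors rather than a tm TermDocumentMatrix object):

library(sparklyr)
library(dplyr)

# Sketch only: split the comma-separated Keywords column into tokens,
# then count terms per row with Spark ML feature transformers.
# input_key is assumed to already be a Spark dataframe (tbl_spark).
term_counts <- input_key %>%
  ft_regex_tokenizer(input_col = "Keywords",
                     output_col = "tokens",
                     pattern = ",") %>%
  ft_count_vectorizer(input_col = "tokens",
                      output_col = "term_counts")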