I have 2 documents, doc1.txt and doc2.txt. The contents of these 2 documents are:
#doc1.txt
very good, very bad, you are great
#doc2.txt
very bad, good restaurent, nice place to visit
I want to split my corpus on commas (,) so that each comma-separated phrase is treated as a single term and my final DocumentTermMatrix becomes:
       terms
docs   very good | very bad | you are great | good restaurent | nice place to visit
doc1   tf-idf    | tf-idf   | tf-idf        | 0               | 0
doc2   0         | tf-idf   | 0             | tf-idf          | tf-idf
I know how to calculate the DocumentTermMatrix of individual words (using http://scikit-learn.org/stable/modules/feature_extraction.html), but I don't know how to calculate the DocumentTermMatrix of whole strings (phrases) in Python.
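To make the target concrete, here is a rough standard-library sketch of the matrix described above: each comma-separated phrase is one term, and each cell holds a tf-idf weight. The helper names (`split_on_commas`, `tf_idf`) and the in-memory `docs` dictionary are hypothetical, and the weighting (raw count times a smoothed idf) is only one assumed variant of tf-idf, not necessarily the one wanted here.

```python
import math
from collections import Counter

# Document contents inlined for illustration (assumed to match doc1.txt / doc2.txt).
docs = {
    "doc1": "very good, very bad, you are great",
    "doc2": "very bad, good restaurent, nice place to visit",
}

def split_on_commas(text):
    """Treat each comma-separated phrase as one term (hypothetical helper)."""
    return [phrase.strip() for phrase in text.split(",") if phrase.strip()]

# Phrase counts per document.
counts = {name: Counter(split_on_commas(text)) for name, text in docs.items()}

# Vocabulary in first-seen order, so the columns follow the documents.
terms = list(dict.fromkeys(
    phrase for text in docs.values() for phrase in split_on_commas(text)
))

n_docs = len(docs)
# Number of documents each phrase occurs in.
doc_freq = {t: sum(1 for c in counts.values() if t in c) for t in terms}

def tf_idf(term, name):
    # Assumed weighting: raw count times a smoothed idf, ln((1 + n) / (1 + df)) + 1.
    idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
    return counts[name][term] * idf

# One row per document, one column per phrase.
matrix = {name: [tf_idf(t, name) for t in terms] for name in docs}

for name, row in matrix.items():
    print(name, [round(value, 3) for value in row])
```

If scikit-learn is used instead, a function like `split_on_commas` can be passed as the `tokenizer` argument of `TfidfVectorizer`, so that it builds its vocabulary from phrases rather than single words. Its values will differ from this sketch because it also L2-normalizes each row by default.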