I have a data frame df with this structure:
Rank Review
5 good film
8 very goood film
..
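For reproducibility, here is a minimal sketch of what df looks like. It contains only the two rows shown above; the real data has many more rows, and treating Rank as numeric and Review as character is an assumption about the column types.

```r
# Toy stand-in for df: only the two example rows from above.
# Column types (numeric Rank, character Review) are assumed.
df <- data.frame(
  Rank   = c(5, 8),
  Review = c("good film", "very goood film"),
  stringsAsFactors = FALSE
)

str(df)
```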
Then I tried to create a DocumentTermMatrix using the quanteda package:
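library(quanteda)
library(magrittr)  # provides the %>% pipe
temp.tf <- df$Review %>%
  tokens(ngrams = 1:1) %>%  # generate tokens (unigrams only)
  dfm %>%                   # generate dfm
  convert(to = "tm")        # convert to a tm DocumentTermMatrix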
I get this matrix:
> inspect(temp.tf)
<<DocumentTermMatrix (documents: 63023, terms: 23892)>>
Non-/sparse entries: 520634/1505224882
Sparsity : 100%
Maximal term length: 77
Weighting : term frequency (tf)
Sample :
With this structure:
Terms
Docs good very film my excellent heart David plus always so
text14670 1 0 0 0 1 0 0 0 2 0
text19951 3 0 0 0 0 0 0 1 1 1
text24305 7 0 2 1 0 0 0 2 0 0
text26985 6 0 0 0 0 0 0 4 0 1
text29518 4 0 1 0 1 0 0 3 0 1
text34547 5 2 0 0 0 0 2 3 1 3
text3781 3 0 1 4 0 0 0 3 0 0
text5272 4 0 0 4 0 5 0 3 1 2
text5367 3 0 1 3 0 0 1 4 0 1
text6001 3 0 9 1 0 6 0 1 0 1
So I think the result is fine, but I assume that text6001, text5367, text5272, ... are document names. My question: are the rows of this matrix in the same order as the original documents, or are they placed in the matrix in random order?
Thank you
EDIT:
I created a document-term frequency matrix:
mydfm <- dfm(df$Review, remove = stopwords("french"), stem = TRUE)
Then I created a tf-idf matrix:
tfidf <- tfidf(mydfm)[, 5:10]
I would then like to merge the tf-idf matrix with the Rank column to get something like this:
features
Docs good very film my excellent heart David plus always so Rank
text14670 1 0 0 0 1 0 0 0 2 0 3
text19951 3 0 0 0 0 0 0 1 1 1 2
text24305 7 0 2 1 0 0 0 2 0 0 4
text26985 6 0 0 0 0 0 0 4 0 1 5
Can you help me do this merge?
Thank you