I am computing cosine similarity between two `dfm` objects. The first is my reference object, with dimensions 5 x 4,728; the second is my target object, with dimensions 2,325,329 x 40,595. What I don't understand is why `textstat_simil()` returns NAs. I have tried to reproduce the issue, without success so far. You can find the data at the following Dropbox links. Note that the target `dfm` there contains only the first document.
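This is the code I am using. `dfm_match()` conforms my reference `dfm` to the target object, so that both have the same features in the same order (features of the target that are missing from the reference are added as zero counts).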
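```r
library(quanteda)
# note: in quanteda >= 3.0, textstat_simil() lives in the quanteda.textstats package

# make sure you load the two required dfms
# give reference_dfm the same features, in the same order, as target_dfm
reference_dfm <- dfm_match(reference_dfm, featnames(target_dfm))
textstat_simil(target_dfm, reference_dfm, method = "cosine")
#> textstat_simil object; method = "cosine"
#>         negative slightly_negative neutral slightly_positive positive
#> text1.1       NA                NA      NA                NA       NA
```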
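For reference, cosine similarity divides the dot product of two rows by the product of their L2 norms, so an all-zero row in either object would give 0/0. The minimal base-R sketch below uses hypothetical toy matrices (not the Dropbox data) and a hypothetical helper `cosine_rows()` (not part of quanteda) to show that case; whether an empty document is actually the cause here is an untested assumption.

```r
# Hypothetical helper (not part of quanteda): cosine similarity between
# every row of x and every row of y, i.e. (x_i . y_j) / (||x_i|| * ||y_j||)
cosine_rows <- function(x, y) {
  num   <- x %*% t(y)             # dot products, nrow(x) x nrow(y)
  x_len <- sqrt(rowSums(x^2))     # L2 norm of each row of x
  y_len <- sqrt(rowSums(y^2))     # L2 norm of each row of y
  num / outer(x_len, y_len)       # a zero norm makes this 0/0 = NaN
}

# Toy data, assumed for illustration only: the first target row is all zeros
target <- rbind(doc_empty = c(0, 0, 0, 0),
                doc_ok    = c(1, 0, 2, 1))
reference <- rbind(negative = c(1, 1, 0, 0),
                   positive = c(0, 1, 1, 1))

sim <- cosine_rows(target, reference)
print(sim)

# names of target rows that contain no non-zero counts
empty_docs <- rownames(target)[rowSums(target != 0) == 0]
print(empty_docs)
```

With the real objects, a comparable check would be `which(ntoken(target_dfm) == 0)` and `which(ntoken(reference_dfm) == 0)` after the `dfm_match()` call; a non-empty result would point to documents with no counts on the shared features.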
Any idea?