
I would like to compute some distributional similarity on running text.

There is a nice function in the Quanteda package called fcm, which creates a feature co-occurrence matrix from text. For example:

library(quanteda)

txt <- c("The quick brown fox jumped over the lazy dog.",
         "The dog jumped and ate the fox.")
toks <- tokens(char_tolower(txt), remove_punct = TRUE)
fcm(toks, context = "document")            # co-occurrence within each document
fcm(toks, context = "window", window = 3)  # co-occurrence within a 3-token window

(This example comes from the Quanteda documentation, last updated 15 April 2018: https://cran.r-project.org/web/packages/quanteda/quanteda.pdf)

I would like to apply the functions of the Wordspace package (see the documentation, last updated August 2016: https://cran.r-project.org/web/packages/wordspace/wordspace.pdf) to a co-occurrence matrix built with Quanteda's fcm.

In particular, I am interested in the Wordspace function dsm.score (page 24). See this example:

library(wordspace)

model <- DSM_TermTerm
model$M # raw co-occurrence matrix
model <- dsm.score(model, score = "MI")
round(model$S, 3) # PPMI scores

My problem is that I cannot apply these instructions to a co-occurrence matrix built with Quanteda's fcm.

Does anybody know how to "bridge" the two packages? The conversion via as.dsm does not currently support fcm objects.

Thanks in advance for your suggestions.

Cheers, marina

Marina Santini
  • You can use `as(x, "dgCMatrix")` and `as.dfm()` to move between R and **quanteda** objects. See https://rawgit.com/koheiw/workshop-LSECSS/master/slides.html – Kohei Watanabe Apr 21 '18 at 18:30
  • Hi Kohei, thanx a lot. I tried to open your slides, but I can display only the title slide (ie slide # 1). Is the presentation somehow protected? – Marina Santini Apr 22 '18 at 07:25
  • No. The right arrow key on your keyboard will take you to the next page. The original Rmd is in github.com/koheiw/workshop-LSECSS – Kohei Watanabe Apr 22 '18 at 13:20
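
For what it's worth, here is a minimal sketch of the coercion suggested in the first comment, carried through to dsm.score. It is untested against the package versions cited in the question and rests on a few assumptions: that the fcm object coerces cleanly with `as(x, "dgCMatrix")`, that Wordspace's `dsm()` constructor accepts the resulting sparse matrix via its `M` argument and takes the target and feature labels from its dimnames, and that `raw.freq = TRUE` is appropriate because the cells are raw counts. The choice of `tri = FALSE` is also an assumption about what is wanted here (a full symmetric matrix rather than only the upper triangle).

```r
library(quanteda)
library(wordspace)

txt <- c("The quick brown fox jumped over the lazy dog.",
         "The dog jumped and ate the fox.")
toks <- tokens(char_tolower(txt), remove_punct = TRUE)

# tri = FALSE keeps both triangles, so every word gets a full row of counts
# (assumption: a symmetric term-term matrix is what the DSM should see)
co <- fcm(toks, context = "window", window = 3, tri = FALSE)

# Coerce the fcm to a plain sparse matrix from the Matrix package,
# as suggested in the comment
M <- as(co, "dgCMatrix")

# Wrap the matrix in a Wordspace DSM object; raw.freq = TRUE asks dsm()
# to treat the cells as raw counts and derive the marginal frequencies
model <- dsm(M = M, raw.freq = TRUE)

# Same scoring call as in the Wordspace example above
model <- dsm.score(model, score = "MI")
round(as.matrix(model$S), 3)
```

If `dsm()` complains about missing marginals, the row and column sums of `M` would have to be supplied explicitly; the Wordspace manual entry for `dsm()` is the place to check.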
