quanteda: error with textmodel_wordscores: Error in t(as(x, "dgCMatrix"))

Question

After the latest quanteda update I have some problems with my code. I'm using the MigParl data to generate a dfm from parliamentary speeches.

(1) When switching to quanteda for a wordscores analysis, I lose my row names when converting the MigParl matrix to a quanteda dfm. According to the code, this should be taken care of. I worked around it by adding the row names back manually, which is a little annoying but manageable.

(2) Now I get an error when I want to calculate my wordscores model. It seems to be a problem with the data format of my dfm.

I would be very happy if you have any idea what's going on. Thank you.

Dana

library(polmineR)
use("MigParl")

pb <- partition("MIGPARL", interjection = FALSE, regional_state = "BW", year = 2013:2018) %>%
  partition_bundle(s_attribute = "party")
pb <- pb[[names(pb)[!names(pb) %in% c("", "fraktionslos")] ]]
pb <- enrich(pb, p_attribute = "lemma")
dtm <- polmineR::as.sparseMatrix(pb, col = "count")
dtm <- Matrix::t(dtm)

pg_dfm <- new(
  "dfm",
  i = dtm@i,
  p = dtm@p,
  x = dtm@x,
  Dim = dtm@Dim,
  Dimnames = list(
    docs = dtm@Dimnames$Docs,
    features = dtm@Dimnames$Terms
  )
)

detach("package:polmineR", unload = TRUE)
library(quanteda)
library(quanteda.textmodels)

pg_dfm_red <- quanteda::dfm(pg_dfm)
pg_dfm_trim <- dfm_trim(pg_dfm_red, min_termfreq = 20)
row.names(pg_dfm_trim) <- c("AfD","CDU","FDP","GRUENE","NA","SPD")

Now this is what I used to do:

tmod <- textmodel_wordscores(pg_dfm_trim, c(-1,NA,NA,1,NA, NA))
predict(tmod)

This is what I tried to change after the update:

tmod <- textmodel_wordscores(x = pg_dfm_trim, y = c(-1, NA, NA, 1, NA, NA), scale = "linear", smooth = 0)
predict(tmod)

Both produce this error message:

Error in t(as(x, "dgCMatrix")) : attempt to set index 1/1 in SET_VECTOR_ELT

I am sure that the problem arises from the way the dfm is generated. If you need more information, I will gladly add it.

A wordfish model works fine for some reason.

wordfish <- textmodel_wordfish(pg_dfm_trim, c(4,1))
textplot_scale1d(wordfish, doclabels = pg_dfm_trim@Dimnames$docs)

Edit: I installed older quanteda versions, and the error still occurs.

I can't replicate any of that, but you should not create a dfm that way. Use `as.dfm(dtm)` instead. If you want to reassign the document names, use `rownames(pg_dfm) <- ` instead. If you still have the issue, please create a reprex and file an issue on the GitHub page, if you think it's a bug. — Ken Benoit, Mar 11 '20 at 01:06
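The conversion suggested in the comment above can be sketched roughly as follows. This is only an illustration under assumptions: the small matrix `m` and its toy counts are made up to stand in for the MigParl document-term matrix, and `docnames<-` is used as quanteda's accessor for document names (the comment mentions `rownames<-`).

```r
library(Matrix)
library(quanteda)

# Hypothetical stand-in for the sparse matrix returned by polmineR:
# three "documents" (parties) by three features, with invented counts.
m <- Matrix::sparseMatrix(
  i = c(1, 1, 2, 3),
  j = c(1, 2, 2, 3),
  x = c(2, 1, 3, 1),
  dimnames = list(c("AfD", "CDU", "SPD"), c("a", "b", "c"))
)

# Let quanteda build the dfm instead of calling new("dfm", ...) directly;
# the row and column names of the matrix are carried over.
d <- as.dfm(m)

# Reassign document names through the accessor if they need to change.
docnames(d) <- c("AfD", "CDU", "SPD")
```

Going through `as.dfm()` means quanteda fills in the internal slots of the dfm itself, so the object does not depend on the class layout of a particular quanteda version.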

score 1 · Answer 1 · answered Mar 10 '20 at 13:23

The problem seems to be the encoding of the MigParl data.

period <- partition(
  "MIGPARL",
  date = days,
  regional_state = "BW",
  interjection = FALSE,
  encoding = "UTF-8"
)

Adding the following argument when creating the partition solved my problem:

encoding = "UTF-8"
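To check whether encoding is plausibly the culprit in a similar case, one could test whether the document or feature names are valid UTF-8 before building the model. The following is a minimal base-R sketch, not the MigParl workflow itself: the string `x` is a hand-built example, and the assumption that the offending text is Latin-1 is purely for illustration.

```r
# Hand-built bytes for "GRÜNE" in Latin-1 (0xDC is U-umlaut); illustrative only.
x <- rawToChar(as.raw(c(0x47, 0x52, 0xDC, 0x4E, 0x45)))

# These bytes do not form valid UTF-8.
is_valid_before <- validUTF8(x)

# Convert from the assumed source encoding to UTF-8.
y <- iconv(x, from = "latin1", to = "UTF-8")
is_valid_after <- validUTF8(y)
```

The same check could be applied to the row and column names of the matrix before it is converted to a dfm.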
