
After fitting an STM model to a quanteda dfm, I want to estimate my covariates' effects on certain topics.

Fitting the STM model went fine and produced the topics as expected, but when I call estimateEffect (the final step in the script below), the R session aborts with a 'fatal error'.

How can I estimate my covariates' effects when starting from a dfm? The STM manual explains how to run an STM model from a dfm, but I couldn't find how to work with the covariates after that stage.

Here's the code:

# Read texts with Quanteda
texts <- (readtext("C:/Users/renswilderom/Documents/Stuff Im working on at the moment/Newspaper articles DJ/test data/*.txt",
         docvarsfrom = "filenames", dvsep = "_", 
         docvarnames = c("Date of Publication", "Length LexisNexis", "source"), 
         encoding = "UTF-8-BOM"))  

mycorpus <- corpus(texts)  

tokens <- tokens(mycorpus, remove_punct = TRUE, remove_numbers = TRUE, ngrams = 1)

mydfm <- dfm(tokens, remove = stopwords("english"), stem = TRUE)


# Run the STM model; the metadata is supplied via 'data = docvars(mycorpus)'
stm_from_dfm <- stm(mydfm, K = 10, prevalence =~ Date.of.Publication + source, gamma.prior='L1', data = docvars(mycorpus)) 

# Estimate effects
prep <- estimateEffect(1:10 ~ Date.of.Publication + source, stm_from_dfm, 
                       meta = docvars(mycorpus), uncertainty = "Global")
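bstewart asked for a reproducible example. A minimal sketch of the same pipeline (dfm, then stm(), then estimateEffect()) on quanteda's built-in `data_corpus_inaugural`, which stands in for the newspaper articles, might look like this. It assumes a recent quanteda (v3 or later), where stopword removal and stemming are done with `tokens_remove()` and `dfm_wordstem()` rather than inside `dfm()`. The `Year` covariate, `K = 5`, and `max.em.its = 20` are illustrative choices, not taken from the question. If this runs cleanly on the same machine while the original script still crashes, that points away from the code itself.

```r
# Reproducible sketch of the question's workflow (assumes quanteda >= 3 and stm are installed).
library(quanteda)
library(stm)

# Built-in corpus as a stand-in for the newspaper articles
corp <- corpus_subset(data_corpus_inaugural, Year > 1900)

# Same preprocessing idea as the question: drop punctuation, numbers, stopwords; then stem
toks <- tokens(corp, remove_punct = TRUE, remove_numbers = TRUE)
toks <- tokens_remove(toks, stopwords("english"))
dfmat <- dfm_wordstem(dfm(toks))

# Take covariates from the dfm's own docvars so rows line up with documents
meta <- docvars(dfmat)

model <- stm(dfmat, K = 5, prevalence = ~ Year, data = meta,
             max.em.its = 20, verbose = FALSE)

# Regress the proportions of all 5 topics on the covariate
eff <- estimateEffect(1:5 ~ Year, model, metadata = meta, uncertainty = "Global")
summary(eff, topics = 1)
```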

Alternatively, I made an STM corpus from my dfm using STMcorpus <- asSTMCorpus(mydfm). But then I couldn't run the STM model, because it didn't recognize my metadata. Would it be better to follow this alternative strategy? (If so, I need some way to associate the metadata with STMcorpus after running STMcorpus <- asSTMCorpus(mydfm).)
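For the asSTMCorpus() route, bstewart's comment below indicates that the metadata travels with the converted object in its `data` element, so it can be passed explicitly to both stm() and estimateEffect(). A minimal sketch, again using quanteda's built-in `data_corpus_inaugural` and its `Year` docvar as hypothetical stand-ins for the newspaper data and covariates:

```r
# Sketch of the asSTMCorpus() route (assumes quanteda >= 3 and stm are installed).
library(quanteda)
library(stm)

corp <- corpus_subset(data_corpus_inaugural, Year > 1900)
dfmat <- dfm(tokens(corp, remove_punct = TRUE))
dfmat <- dfm_remove(dfmat, stopwords("english"))

# Convert the dfm; its docvars should come along in stm_in$data
stm_in <- asSTMCorpus(dfmat)

# Pass documents, vocab, and metadata to stm() explicitly
model <- stm(documents = stm_in$documents, vocab = stm_in$vocab, K = 5,
             prevalence = ~ Year, data = stm_in$data,
             max.em.its = 20, verbose = FALSE)

# Reuse the same metadata object so its rows match the converted documents
eff <- estimateEffect(1:5 ~ Year, model, metadata = stm_in$data)
```

Reusing `stm_in$data` rather than a separately built data frame should keep the metadata rows aligned with the documents that were actually converted, for example if any empty documents were dropped during conversion.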

glts
Rens
  • It is hard to diagnose without a reproducible example; could you provide one? Also, I think quanteda includes the data when you do `asSTMCorpus(mydfm)`. The metadata is just the element of the list named `data`. – bstewart Oct 16 '17 at 15:52
  • Sure, here is a sample of the actual newspaper articles I'm using: https://wetransfer.com/downloads/a50d8b8fd524359751e8aa68bac3256c20171016160720/3f3bb29e40362434594f44aeee1e67f720171016160720/b425d8 I'd prefer to work from `stm_from_dfm`, but if necessary I can of course also work from `asSTMCorpus(mydfm)`. I'll try accessing the metadata through the list element `data`. – Rens Oct 16 '17 at 16:15
  • I was not able to replicate your error with the data sample you gave me. If you want to share a copy of the workspace just before you call `estimateEffect()`, I can try to replicate from there, but otherwise there isn't much I can do without being able to recreate the problem. – bstewart Oct 17 '17 at 12:34
  • OK, thanks, yes, I will share a copy of the workspace (I'll work on that now). And just to be sure, you used the code I posted above? (In other words, the code itself should be fine?) – Rens Oct 17 '17 at 13:34

1 Answer


We worked through this by email, but I'll add the answer here for others who might encounter some form of the problem.

There is a bug in the matrixStats package that causes R to crash with large matrices, on Windows only. The bug and its solution are detailed here: https://github.com/HenrikBengtsson/matrixStats/issues/104. The issue contains both a simple test for the problem and instructions for installing the development version of matrixStats, which fixes it. The bug affects matrixStats version 0.52.2 and will presumably be resolved in the next CRAN release.
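To check whether an R installation is on the affected release before digging further, a small helper can compare the installed matrixStats version against 0.52.2. This is a sketch: the function name `matrixstats_needs_update()` is hypothetical, and treating versions older than 0.52.2 as affected is an assumption (the answer only names 0.52.2). The actual fix is the one described in the GitHub issue above.

```r
# Hypothetical helper: TRUE if the given (or installed) matrixStats version is
# 0.52.2 or older. Assumption: versions before 0.52.2 are treated as affected too.
matrixstats_needs_update <- function(version = utils::packageVersion("matrixStats")) {
  as.numeric_version(version) <= "0.52.2"
}

# Usage: only query the installed version if matrixStats is actually present
if (requireNamespace("matrixStats", quietly = TRUE) && matrixstats_needs_update()) {
  message("matrixStats ", utils::packageVersion("matrixStats"),
          " may crash estimateEffect() on Windows; see the GitHub issue for the fix.")
}
```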

bstewart