Add docvars to a dfm from a separate data.frame in R

Question

After spending a lot of time preparing the corpus (e.g. removing stopwords, tf-idf), I created a dtm with the tm package and ran my topic model.

I then compared the topics to some document-level covariates of interest, only to learn that the stm package can estimate topic models that include document-level covariates directly.

I have successfully converted my dtm to a dfm in quanteda and would like to add my covariates to the dfm before converting it to stm format.

I have a data.frame with my covariates of interest, which shares the same row ids as the dfm.

I am looking for a way to merge the covariates from the data.frame into the dfm.

I have tried `docvars()` and `metadoc()` in quanteda, to no avail.

For example,

docvars(dfm, docnames(dfm)) <- df$covariate

Any help would be greatly appreciated!
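The attempt `docvars(dfm, docnames(dfm)) <- df$covariate` passes the document names where quanteda expects the name of the new variable. A minimal sketch of the replacement form, assuming quanteda v2 or later, hypothetical toy documents, and a hypothetical covariate column `party` whose rows are already in `docnames()` order:

```r
library(quanteda)

# Hypothetical toy documents standing in for the real corpus
txt <- c(d1 = "apple banana apple", d2 = "banana cherry", d3 = "cherry apple")
dfmat <- dfm(tokens(txt))

# Hypothetical covariates: one row per document, in docnames(dfmat) order
df <- data.frame(
  document.id = c("d1", "d2", "d3"),
  party = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# The assignment below is positional, so make sure the ids line up exactly
stopifnot(identical(df$document.id, docnames(dfmat)))

# Second argument = name of the new docvar; value = one entry per document
docvars(dfmat, "party") <- df$party

docvars(dfmat, "party")
```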

I think you need `cbind.dfm`: Combine a dfm with another dfm, or numeric, or matrix object, returning a dfm with the combined documents or features, respectively. — phiver, Jul 13 '18 at 15:22
@phiver, thanks for the response. It looks like that function is for combining two `dfm` objects rather than joining a `dfm` and a `data.frame` on a shared `id`. — SeekingData, Jul 13 '18 at 15:38
When I run `cbind(dfm, df)` I get back `Error: not-yet-implemented method for cbind2(, ). ->> Ask the package authors to implement the missing feature.` — SeekingData, Jul 13 '18 at 15:46
Try turning your df, or the relevant columns of it, into a matrix; `cbind(dfm, matrix)` works fine. — phiver, Jul 13 '18 at 16:03
I am getting the following error, which I think is because my covariates are character vectors: `Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix`. When I convert them with `as.factor` I get `Error in cbind(deparse.level, ...) : all arguments must be dfm objects`. I have had success converting the `dfm` to a `data.frame` and merging the two that way: `dfm.data.frame <- convert(dfmm, to = "data.frame")` followed by `with.meta <- merge(dfm.data.frame, df, by="document.id")`, but now I need to convert the data.frame back to a `dfm`. — SeekingData, Jul 13 '18 at 16:17
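By default `merge()` sorts the result by the key and drops rows without a match, so the merged table need not follow the dfm's document order. A base-R sketch of the alternative: reorder the covariates to match the documents with `match()` instead of merging. `doc_ids` is a hypothetical stand-in for `docnames(dfm)`, and the data are made up:

```r
# Stand-in for docnames(dfm): the order in which the dfm stores documents
doc_ids <- c("text1", "text2", "text3")

# Hypothetical covariates keyed by the same id, but in a different order
df <- data.frame(
  document.id = c("text3", "text1", "text2"),
  covariate = c("c", "a", "b"),
  stringsAsFactors = FALSE
)

# For each document id, find the row of df that holds it
idx <- match(doc_ids, df$document.id)

# Fail loudly rather than silently attaching NA covariates
if (anyNA(idx)) stop("some documents have no matching covariate row")

# Reorder so that row i of `aligned` describes document i
aligned <- df[idx, , drop = FALSE]
rownames(aligned) <- NULL

aligned$covariate  # "a" "b" "c"
```

The aligned columns can then be attached with `docvars()` or handed to `convert()`, which avoids the round trip through a data.frame.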

score 0 · Answer 1 · answered Jul 13 '18 at 17:57


Okay, I was able to figure it out, and in the end it was very simple: I just needed to pass the whole data.frame to the `docvars` argument of `convert()`, not just my columns of interest. Here is the code:

dfm.w.metadata <- convert(dfm, to = "stm", docvars = df)

answered Jul 13 '18 at 17:57

SeekingData

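A fuller sketch of the answer's approach, assuming a recent quanteda release and hypothetical toy data. As far as I can tell, `convert()` pairs the rows of `docvars` with the documents by position rather than joining on an id column, so the sketch checks the order first. The `stm::stm()` call is left commented out because it needs the stm package:

```r
library(quanteda)

# Hypothetical toy documents and covariates (rows in docnames order)
txt <- c(d1 = "apple banana apple", d2 = "banana cherry", d3 = "cherry apple")
dfmat <- dfm(tokens(txt))
df <- data.frame(
  document.id = c("d1", "d2", "d3"),
  party = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# Rows are matched to documents by position, so verify the order first
stopifnot(identical(df$document.id, docnames(dfmat)))

# Pass the whole data.frame; it is returned as the $meta element
out <- convert(dfmat, to = "stm", docvars = df)
str(out, max.level = 1)  # list with documents, vocab and meta

# The pieces map onto stm's arguments (requires the stm package):
# fit <- stm::stm(out$documents, out$vocab, K = 2,
#                 prevalence = ~ party, data = out$meta)
```

I believe that by default (`omit_empty = TRUE`) `convert()` also drops documents left empty after preprocessing, together with their rows of `docvars`. This is one reason to hand it the full data.frame instead of pre-extracted columns.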
