I am trying to build a word embedding model in R with the following code:
library(quanteda)
library(text2vec)

# feature co-occurrence matrix with a weighted 5-token window
fcm_ <- fcm(tokens, context = "window", count = "weighted",
            weights = 1 / (1:5), tri = TRUE)

# fit GloVe and combine the main and context vectors
glove <- GlobalVectors$new(rank = 50, x_max = 10)
we_main <- glove$fit_transform(fcm_, n_iter = 10,
                               convergence_tol = 0.01, n_threads = 8)
we_context <- glove$components
we_vectors <- we_main + t(we_context)
# composite vector for "USA" + "EU" and its nearest neighbours by cosine similarity
west <- we_vectors["USA", , drop = FALSE] +
  we_vectors["EU", , drop = FALSE]
cos_sim <- sim2(x = we_vectors,
                y = west,
                method = "cosine",
                norm = "l2")
head(sort(cos_sim[, 1], decreasing = TRUE), 30)
I would like to rewrite the code so that I can include a binary covariate X and obtain the cosine similarities separately for X = 1 and X = 0. How can I do that? And would it make any difference if I simply subset the data into X = 1 and X = 0 and compared the two models, as sketched below?
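To make the second part concrete, this is roughly what I mean by the subsetting alternative: fitting one GloVe model per value of X and comparing similarities within each model. It is only a rough sketch; corp, X, and fit_glove are names I am making up here, assuming X is a document-level variable of a quanteda corpus corp.

# helper (made-up name): fit one GloVe embedding on a tokens object
fit_glove <- function(toks) {
  f <- fcm(toks, context = "window", count = "weighted",
           weights = 1 / (1:5), tri = TRUE)
  glove <- GlobalVectors$new(rank = 50, x_max = 10)
  main <- glove$fit_transform(f, n_iter = 10,
                              convergence_tol = 0.01, n_threads = 8)
  main + t(glove$components)
}

# split the corpus on the binary covariate X (assumed to be a docvar of corp)
toks_1 <- tokens(corpus_subset(corp, X == 1))
toks_0 <- tokens(corpus_subset(corp, X == 0))
wv_1 <- fit_glove(toks_1)
wv_0 <- fit_glove(toks_0)

# compare, e.g., the neighbours of "USA" in the two embeddings
head(sort(sim2(wv_1, wv_1["USA", , drop = FALSE],
               method = "cosine", norm = "l2")[, 1], decreasing = TRUE), 30)
head(sort(sim2(wv_0, wv_0["USA", , drop = FALSE],
               method = "cosine", norm = "l2")[, 1], decreasing = TRUE), 30)

Since the two models are trained separately, their vector spaces are not aligned, so only cosine similarities computed within each model would be compared.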
Since the post did not receive any response, I am editing it to specify the kind of code I am looking for (the covariate and perspectives arguments below do not exist in text2vec; they only illustrate what I would like to do):
we_main <- glove$fit_transform(fcm_, n_iter = 10,
                               convergence_tol = 0.01, n_threads = 8,
                               covariate = covariate)

we_context <- glove$components(covariate = covariate)

## output
cos_sim = sim2(x = we_vectors,
               y = west,
               method = "cosine",
               norm = "l2", perspectives = covariate)