I am trying to build a word embedding model in R with the following code:
library(quanteda)
library(text2vec)

# feature co-occurrence matrix with a weighted 5-token window
fcm_ <- fcm(tokens, context = "window", count = "weighted",
            weights = 1 / (1:5), tri = TRUE)

# fit GloVe and combine the main and context vectors
glove <- GlobalVectors$new(rank = 50, x_max = 10)
we_main <- glove$fit_transform(fcm_, n_iter = 10,
                               convergence_tol = 0.01, n_threads = 8)
we_context <- glove$components
we_vectors <- we_main + t(we_context)
# composite vector for "USA" + "EU" and its nearest neighbours by cosine similarity
west <- we_vectors["USA", , drop = FALSE] +
  we_vectors["EU", , drop = FALSE]
cos_sim <- sim2(x = we_vectors,
                y = west,
                method = "cosine",
                norm = "l2")
head(sort(cos_sim[, 1], decreasing = TRUE), 30)
I would like to rewrite the code so that I can include a binary covariate X and obtain the cosine similarities separately for X = 1 and X = 0. How can I do that? And would it make any difference if I simply subset the data into X = 1 and X = 0 and compared the two models, as sketched below?
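To make the second part concrete, this is roughly what I mean by the subsetting alternative: fitting one GloVe model per value of X and comparing similarities within each model. It is only a rough sketch; corp, X, and fit_glove are names I am making up here, assuming X is a document-level variable of a quanteda corpus corp.

# helper (made-up name): fit one GloVe embedding on a tokens object
fit_glove <- function(toks) {
  f <- fcm(toks, context = "window", count = "weighted",
           weights = 1 / (1:5), tri = TRUE)
  glove <- GlobalVectors$new(rank = 50, x_max = 10)
  main <- glove$fit_transform(f, n_iter = 10,
                              convergence_tol = 0.01, n_threads = 8)
  main + t(glove$components)
}

# split the corpus on the binary covariate X (assumed to be a docvar of corp)
toks_1 <- tokens(corpus_subset(corp, X == 1))
toks_0 <- tokens(corpus_subset(corp, X == 0))
wv_1 <- fit_glove(toks_1)
wv_0 <- fit_glove(toks_0)

# compare, e.g., the neighbours of "USA" in the two embeddings
head(sort(sim2(wv_1, wv_1["USA", , drop = FALSE],
               method = "cosine", norm = "l2")[, 1], decreasing = TRUE), 30)
head(sort(sim2(wv_0, wv_0["USA", , drop = FALSE],
               method = "cosine", norm = "l2")[, 1], decreasing = TRUE), 30)

Since the two models are trained separately, their vector spaces are not aligned, so only cosine similarities computed within each model would be compared.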
Since the post did not receive any response, I am editing it to specify the kind of code I am looking for (the covariate and perspectives arguments below do not exist in text2vec; they only illustrate what I would like to do):
we_main <- glove$fit_transform(fcm_, n_iter = 10,
                               convergence_tol = 0.01, n_threads = 8,
                               covariate = covariate)

we_context <- glove$components(covariate = covariate)

## output
cos_sim = sim2(x = we_vectors,
               y = west,
               method = "cosine",
               norm = "l2", perspectives = covariate)