While experimenting with word embeddings using the text2vec package in R, I get the following error:

embd_dim <- 5

glove_event <- GlobalVectors$new(rank = embd_dim, x_max = 10, learning_rate = 0.01, alpha = 0.95, lambda = 0.005)

wrd_embd_event <- glove_event$fit_transform(tcm_event, n_iter = 200, convergence_tol = 0.001)

Error in glove_event$fit_transform(tcm_event, n_iter = 200, convergence_tol = 0.001) : 
  Cost is too big, probably something goes wrong... try smaller learning rate

A smaller learning rate has not helped. I get the same outcome with different skip_grams_window values in create_tcm() and different rank values in GlobalVectors$new().

I am clueless about the source of this error.

  • This means that the underlying optimization algorithm (SGD) has difficulties with numerical stability and some gradients become NaN. Usually the issue is either too high a learning rate or some problem in the input data. A small reproducible example would help – Dmitriy Selivanov Jun 12 '20 at 17:34
  • @Dmitriy Selivanov Unfortunately I cannot share the data directly, so I am including some information about the problem and the data. Please let me know if it helps. I am trying to create embeddings not for words in the usual English sense, but for phrases: short groups of English words that together express a technical concept in the communication domain. So effectively I am trying to embed a set of such phrases. For example, one such phrase is "Loss of Tracking". – user1263917 Jun 12 '20 at 18:47
  • @Dmitriy Selivanov The data set has 67 unique phrases, forming a sequence of phrases of length 11,000. This is my whole corpus, so the vocabulary has 67 rows. The TCM has approx. 1,000 non-zero entries (window length = 5L). I know this is more like meta information; I was wondering whether this information on data size etc. can throw some light on the instability of the algorithm. Thanks a lot for your help! – user1263917 Jun 12 '20 at 18:56
  • It is unlikely you will get anything useful with a TCM of size 67*67 – Dmitriy Selivanov Jun 13 '20 at 06:34
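
Since the comments point to the input data as a possible cause, one way to follow up is to inspect the co-occurrence matrix before fitting. The sketch below is only an illustration in base R: check_tcm is a hypothetical helper (not part of text2vec), and the toy matrix stands in for tcm_event, which cannot be shared. It assumes the matrix is small enough to convert to a dense matrix, which should hold for a 67 x 67 vocabulary.

```r
# Hypothetical helper (not a text2vec function): summarise a term
# co-occurrence matrix so obvious data problems show up before fitting.
check_tcm <- function(tcm) {
  # Dense conversion is an assumption that only suits small vocabularies.
  m <- as.matrix(tcm)
  # Non-zero entries, skipping NA/NaN so the subsetting stays well defined.
  vals <- m[!is.na(m) & m != 0]
  list(
    dims         = dim(m),
    n_nonzero    = length(vals),
    density      = length(vals) / length(m),
    all_finite   = all(is.finite(m)),          # FALSE if any NA, NaN or Inf
    any_negative = any(m < 0, na.rm = TRUE),   # counts should not be negative
    max_value    = if (length(vals) > 0) max(vals) else NA_real_
  )
}

# Toy stand-in for the real matrix (made-up values, for illustration only).
toy <- matrix(0, nrow = 4, ncol = 4)
toy[1, 2] <- 3
toy[2, 3] <- 1.5
toy[3, 4] <- 250

res <- check_tcm(toy)
str(res)
```

On the real data one would call check_tcm(tcm_event). Non-finite or negative entries would point to a data problem, while a max_value far above x_max = 10 is simply worth knowing about when choosing x_max and the learning rate.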
