Questions tagged [glove]

GloVe is an unsupervised learning algorithm for obtaining vector representations for words (word embeddings). See https://nlp.stanford.edu/projects/glove/ for more information.

92 questions
1 vote, 1 answer

Glove6b50d parsing: could not convert string to float: '-'

I am trying to parse the Glove6b50d data from Kaggle via Google Colab, then run it through the word2vec process (apologies for the huge URL - it's the fastest link I've found). However, I'm hitting a bug where '-' tokens are not parsed correctly,…
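
The excerpt above is cut off, so the actual cause of the error is not known here. As a hedged sketch of one defensive way to parse a line of a GloVe-style text file (a token followed by a fixed number of space-separated floats), the values can be split off from the right so that the token is kept intact whatever characters it contains. The function name `parse_glove_line` and the `dim` parameter are hypothetical and not part of any GloVe tooling.

```python
def parse_glove_line(line, dim):
    """Split one line of a GloVe-style text file into (token, vector).

    Assumes the last `dim` space-separated fields are the vector values
    and everything before them is the token.
    """
    # Split from the right so a token such as "-" or ". . ." is not
    # mistaken for part of the vector.
    parts = line.rstrip("\n").rsplit(" ", dim)
    if len(parts) != dim + 1:
        raise ValueError(f"expected a token and {dim} values, got: {line!r}")
    token, values = parts[0], parts[1:]
    return token, [float(v) for v in values]


token, vector = parse_glove_line("- 0.1 -0.2 0.3\n", dim=3)
```

Splitting from the right is a design choice rather than a requirement of the file format: it only helps when the number of dimensions is known in advance.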
1 vote, 1 answer

How to compare cosine similarities across three pretrained models?

I have two corpora: one with speeches by women leaders and the other with speeches by men leaders. I would like to test the hypothesis that the cosine similarity between two words in one corpus is significantly different from the cosine similarity between…
SanMelkote
  • 228
  • 2
  • 12
1 vote, 0 answers

Using BERT embeddings for Seq2Seq model building

Earlier I used GloVe embeddings to build a seq2seq model for text summarization. Now I want to replace GloVe with BERT to see how the model performs. For this, I used the bert-as-service feature from…
1 vote, 0 answers

How do I load a giant (~13 GB) GloVe vector dictionary, currently in JSON form, in Python?

I converted the GloVe 840B text file into JSON vectors in the format of {"word": [300-dimensional feature vector]}, and I need to query this JSON file thousands of times in my program to get vectors for thousands of words. I was thinking of storing…
kevdoge
  • 11
  • 1
  • 1
1 vote, 1 answer

GloVe: training with a single text file. Does GloVe try to read it into memory, or is it streamed?

I need to train some GloVe models to compare them with word2vec and fastText output. It's implemented in C, and I can't read C code. The GitHub repository is here. The training corpus needs to be formatted into a single text file. For me, this would be…
generic_user
  • 3,430
  • 3
  • 32
  • 56
1 vote, 2 answers

GloVe for Python 3.7

I am trying to install the GloVe package from PyPI on Python 3.7, but it keeps returning the same error, shown below. Is there any way to use GloVe? I have also tried to install it from https://github.com/stanfordnlp/GloVe but it also ends with…
Black
  • 13
  • 1
  • 4
1 vote, 1 answer

Keras word embedding matrix has first row of zeros

I am looking at the Keras GloVe word embedding example, and it is not clear why the first row of the embedding matrix is populated with zeros. First, the embedding index is created, where words are associated with arrays. embeddings_index = {} with…
Peter B
  • 35
  • 5
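
For context on the question above: Keras's `Tokenizer` assigns word indices starting at 1, and index 0 is conventionally reserved for padding, so no word's vector is ever written to row 0. The following is a minimal sketch in plain Python, not the Keras example itself. The `word_index` and `embeddings_index` contents and the 3-dimensional vectors are hypothetical stand-ins.

```python
# Hypothetical stand-ins: a tokenizer-style word index that starts at 1,
# and a tiny 3-dimensional embedding lookup.
word_index = {"the": 1, "cat": 2, "sat": 3}
embeddings_index = {"the": [0.1, 0.2, 0.3], "cat": [0.4, 0.5, 0.6]}
embedding_dim = 3

# One extra row, because index 0 is never assigned to a word.
embedding_matrix = [[0.0] * embedding_dim for _ in range(len(word_index) + 1)]

for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        # Words missing from the pretrained lookup keep their all-zero row.
        embedding_matrix[i] = list(vector)
```

In this sketch row 0 stays all zeros, as does the row for "sat", which has no pretrained vector.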
1 vote, 1 answer

Is there any pretrained word2vec model capable of detecting phrases?

Is there any pretrained word2vec model with data containing both single words and multiple words coalesced together, such as 'drama', 'drama_film' or 'africanamericancommunity'? Is there any such model trained with huge dataset such as dataset trained…
Shadekur Rahman
  • 73
  • 1
  • 2
  • 9
1 vote, 1 answer

I have downloaded and unzipped the GloVe file in Google Colab, but I'm still unable to access it

I'm getting this error when I try to run this code: word_embedding_matrix = np.load(open("word_embedding_matrix.npy", 'rb')) FileNotFoundError Traceback (most recent call last) in () ----> 1 word_embedding_matrix =…
Bvs Revanth
  • 49
  • 1
  • 7
1 vote, 0 answers

Pre-initialize weights in GloVe using the initial parameter of text2vec fit_transform

I would like to pre-initialise the GloVe word vectors and biases using the initial parameter of fit_transform. The documentation of the function states to pass as a named list "w_i, w_j, b_i, b_j" values - initial word vectors and biases. As a…
Melt
  • 11
  • 1
1 vote, 2 answers

Do we need a GPU system to train a deep learning model?

I have created an encoder-decoder model with pre-trained 100D GloVe embeddings to create an abstractive text summarizer. The data set has 4300 articles and their summaries. Vocabulary size is 48549 for articles and 19130 for summaries. Total memory size…
hR 312
  • 824
  • 1
  • 9
  • 22
1 vote, 1 answer

How to save BERT word embeddings as .vec, similar to word2vec

I want to use the generated BERT word embeddings as vectors for building the vocab in Torchtext. I can load vectors such as GloVe or word2vec, but I don't know how to save the word embeddings from BERT in a format acceptable to the Torchtext vocab when I…
1 vote, 1 answer

When calculating the co-occurrence of two words, do we separate the sentences or link all sentences?

For example, I have a document that contains 2 sentences: I am a person. He also likes apples. Do we need to count the co-occurrence of "person" and "He"?
Jing Gu
  • 439
  • 1
  • 3
  • 9
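
To make the two options in the question above concrete, here is a minimal sketch of window-based co-occurrence counting that can either treat each sentence separately or link all sentences into one token stream. It is only an illustration, not the counting procedure of any particular GloVe implementation. The function name `cooccurrence_counts`, the window size of 2, and the lowercased pre-tokenized input are all assumptions.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2, separate_sentences=True):
    """Count unordered word pairs that occur within `window` tokens of each other."""
    if separate_sentences:
        sequences = sentences
    else:
        # Link all sentences into a single token stream.
        sequences = [[tok for sent in sentences for tok in sent]]

    counts = Counter()
    for seq in sequences:
        for i, word in enumerate(seq):
            # Look back at up to `window` preceding tokens.
            for j in range(max(0, i - window), i):
                pair = tuple(sorted((seq[j], word)))
                counts[pair] += 1
    return counts


doc = [["i", "am", "a", "person"], ["he", "also", "likes", "apples"]]
separated = cooccurrence_counts(doc, separate_sentences=True)
linked = cooccurrence_counts(doc, separate_sentences=False)
```

With sentences kept separate, the pair ("he", "person") is never counted. With sentences linked, the two words are adjacent in the token stream and are counted once. Which behaviour applies depends on the implementation and on how the corpus file is laid out, so it is worth checking for the tool in use.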
1 vote, 1 answer

Read GloVe pre-trained embeddings into R, as a matrix

Working in R. I know the pre-trained GloVe embeddings (e.g., "glove.6B.50d.txt") can be found here: https://nlp.stanford.edu/projects/glove/. However, I've had zero luck reading this text file into R so that the product is the word embedding matrix…
Drew
  • 135
  • 4
  • 11
1 vote, 3 answers

Cannot read glove.6B.300d.txt into a pandas DataFrame

I am trying to read the glove.6B.300d.txt file into a pandas DataFrame. (The file can be downloaded from here: https://github.com/stanfordnlp/GloVe) Here are the exceptions I am getting: glove = pd.read_csv(filename, sep = ' ') ParserError: Error…
user8270077
  • 4,621
  • 17
  • 75
  • 140