
I'm trying to encode word vectors using GloVe and I get the error stated above. The data consists of two text columns, and the goal is to determine sentence similarity. Can you please help me solve this error?


import numpy as np

# Build a dict mapping each word to its 300-d GloVe vector
embeddings_index = {}
f = open(r'C:\Users\15084\Downloads\glove.840B.300d\glove.840B.300d.txt', errors='ignore', encoding='utf-8')
for line in f:
    values = line.split()
    word = values[0]                                 # first token is the word
    coefs = np.asarray(values[1:], dtype='float32')  # remaining tokens are the vector
    embeddings_index[word] = coefs
f.close()

print('Found %s word vectors.' % len(embeddings_index))
3 Answers


Use this code to load your embedding index:

import pickle

with open('glove_vectors', 'rb') as f:
    model = pickle.load(f)           # dict mapping each word to its GloVe vector
    glove_words = set(model.keys())  # vocabulary covered by the embedding

Here, your embedding index is the model itself.
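For example, once model and glove_words are loaded, a minimal sketch of using them for the sentence-similarity task could look like the following. This assumes the 'glove_vectors' pickle maps each word to a 300-d NumPy array; sentence_vector and cosine_similarity are hypothetical helpers, not part of the answer above:

import numpy as np

def sentence_vector(sentence, model, glove_words, dim=300):
    # Average the vectors of the words that exist in the GloVe vocabulary
    vectors = [model[w] for w in sentence.lower().split() if w in glove_words]
    if not vectors:
        return np.zeros(dim)  # fall back to a zero vector if no word is covered
    return np.mean(vectors, axis=0)

def cosine_similarity(a, b):
    # Cosine similarity between two sentence vectors
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Usage: compare a pair of sentences from the two text columns
sim = cosine_similarity(sentence_vector("a quick test", model, glove_words),
                        sentence_vector("a fast test", model, glove_words))
print(sim)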


Try the following code; it will resolve the above issue:

def process_glove_line(line, dim):
    word = None
    embedding = None

    try:
        splitLine = line.split()
        # glove.840B.300d contains multi-token "words", so treat everything
        # except the last `dim` fields as the word
        word = " ".join(splitLine[:len(splitLine) - dim])
        embedding = np.array([float(val) for val in splitLine[-dim:]])
    except Exception:
        print(line)  # report lines that still fail to parse

    return word, embedding

def load_glove_model(glove_filepath, dim):
    with open(glove_filepath, encoding="utf8") as f:
        content = f.readlines()
        model = {}
        for line in content:
            word, embedding = process_glove_line(line, dim)
            if embedding is not None:
                model[word] = embedding
        return model

embeddings_index = load_glove_model("glove.840B.300d.txt", 300)
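To see why this resolves the error, here is a quick sanity check on a made-up 5-dimensional line whose token contains a space. The values are illustrative only, not real GloVe entries; a plain values[0] / values[1:] split would fail on lines like this:

import numpy as np

sample_line = ". . 0.1 0.2 0.3 0.4 0.5"
word, embedding = process_glove_line(sample_line, dim=5)
print(word)              # ". ."
print(embedding.shape)   # (5,)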

I think this will help you:

f = open(r'C:\Users\15084\Downloads\glove.840B.300d\glove.840B.300d.txt', 'r', errors='ignore', encoding='utf-8')
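Note that the 'r' mode argument has to come before the keyword arguments. A minimal sketch of the full loading loop using this corrected call, assuming the underlying error is a float-conversion failure on lines whose first token contains spaces (such lines are simply skipped here), could be:

import numpy as np

embeddings_index = {}
f = open(r'C:\Users\15084\Downloads\glove.840B.300d\glove.840B.300d.txt', 'r',
         errors='ignore', encoding='utf-8')
for line in f:
    values = line.split()
    try:
        # Lines whose word contains spaces fail float conversion and are skipped
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[values[0]] = coefs
    except ValueError:
        continue
f.close()

print('Found %s word vectors.' % len(embeddings_index))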