21

I am trying to initialize a TensorFlow Variable with pre-trained word2vec embeddings.

I have the following code:

import tensorflow as tf
from gensim import models

model = models.Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
X = model.syn0

embeddings = tf.Variable(tf.random_uniform(X.shape, minval=-0.1, maxval=0.1), trainable=False)

sess = tf.Session()
sess.run(tf.initialize_all_variables())

sess.run(embeddings.assign(X))

And I am receiving the following error:

ValueError: Cannot create an Operation with a NodeDef larger than 2GB.

The array (X) I am trying to assign is of shape (3000000, 300) and its size is 3.6GB.
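The size quoted above can be checked with a quick calculation. This is only a sketch: it assumes the vectors are stored as float32 (4 bytes each) and reads the "2GB" in the error message as 2**31 bytes. Passing X directly to assign or tf.convert_to_tensor stores the array as a constant inside the graph definition, which is why that limit applies.

```python
# Rough size check for the embedding matrix from the question.
ROWS, COLS = 3000000, 300    # shape of model.syn0 reported in the question
BYTES_PER_FLOAT32 = 4        # assumption: the vectors are float32
NODEDEF_LIMIT = 2 ** 31      # assumption: "2GB" means 2**31 bytes

array_bytes = ROWS * COLS * BYTES_PER_FLOAT32
exceeds_limit = array_bytes > NODEDEF_LIMIT

print(array_bytes / 1e9)     # size in decimal gigabytes
print(exceeds_limit)         # whether the array is too large for a single NodeDef
```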

I am getting the same error if I try tf.convert_to_tensor(X) as well.

I know it fails because the array is larger than 2GB. However, I do not know how to assign an array larger than 2GB to a TensorFlow Variable.

Filip

3 Answers

19

It seems like the only option is to use a placeholder. The cleanest way I can find is to initialize to a placeholder directly:

X_init = tf.placeholder(tf.float32, shape=(3000000, 300))
X = tf.Variable(X_init)
# The rest of the setup...
sess.run(tf.initialize_all_variables(), feed_dict={X_init: model.syn0})
Joshua Little
  • 3
    Note that you can also set the optional shape argument when you call `tf.placeholder()` and then you don't need `validate_shape=False` (and you get better shape inference in the rest of your program!). – mrry Feb 19 '16 at 01:24
  • @mrry, Oh, that's right. Thanks. I've added that to the answer. – Joshua Little Feb 19 '16 at 01:27
11

The easiest solution is to feed the array through feed_dict into a placeholder node, and then tf.assign that placeholder to the variable.

X = tf.Variable([0.0])
place = tf.placeholder(tf.float32, shape=(3000000, 300))
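# validate_shape=False lets the assignment change X's shape from (1,) to (3000000, 300)
set_x = tf.assign(X, place, validate_shape=False)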
# set up your session here....
sess.run(set_x, feed_dict={place: model.syn0})

As Joshua Little noted in a separate answer, you can also use it in the initializer:

X = tf.Variable(place)    # place as defined above
# ...
init = tf.initialize_all_variables()
# ... create sess ...
sess.run(init, feed_dict={place: model.syn0})
dga
  • 3
    `X.assign(place)` needs to be `tf.assign(X, place, validate_shape=False)`, or TensorFlow will complain that you're changing the tensor's shape. Other than that, this works. – Joshua Little Feb 19 '16 at 01:19
  • Thank you - updated the answer to include that + mrry's comment below, by setting the shape of the placeholder. – dga Feb 19 '16 at 04:50
  • 3
    for more info there is a good description of how to do this in the documentation under [preloaded data](https://www.tensorflow.org/versions/r0.7/how_tos/reading_data/index.html#preloaded-data) and a full working example on how to use a placeholder and a variable to preload MNIST training inputs [here](https://github.com/tensorflow/tensorflow/blob/r0.7/tensorflow/examples/how_tos/reading_data/fully_connected_preloaded_var.py) – stefano Feb 21 '16 at 15:30
  • Also, `tf.initialize_all_variables()` should be replaced with `tf.global_variables_initializer()`, because `initialize_all_variables()` is deprecated and gives a warning. – aysebilgegunduz Jun 30 '17 at 22:27
-1

Try loading the array into the variable with `load` after it has been initialized:

import tensorflow as tf
from gensim import models

model = models.KeyedVectors.load_word2vec_format('./GoogleNews-vectors-negative300.bin', binary=True)
X = model.syn0

embeddings = tf.Variable(tf.random_uniform(X.shape, minval=-0.1, maxval=0.1), trainable=False)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
embeddings.load(model.syn0, sess)