
I am running a model with a very large word embedding (>2M words). When I use tf.nn.embedding_lookup, it expects the full embedding matrix, which is big, and I subsequently get an out-of-GPU-memory error. If I reduce the size of the embedding, everything works fine.

Is there a way to deal with a larger embedding?

thang
  • Could you put the embedding part on the CPU and have the other parts on the GPU? See the usage of tf.device() for device placement here: https://www.tensorflow.org/tutorials/using_gpu – Yao Zhang Apr 08 '17 at 02:35
  • I don't know the context of your problem, but word embeddings often mean sparsity; are sparse matrix operations an option for you? If not, Yao Zhang has the right idea: if it doesn't fit in your GPU, get a GPU with more memory, or just use the CPU where you have plenty of memory. Note that the tensorflow debugger is really nice for looking at the size of various tensors in your model. – David Parks Apr 08 '17 at 18:22
  • @YaoZhang, I tried that. It doesn't seem to reduce GPU memory usage; there seem to be some things happening under the hood that I don't know about. – thang Apr 09 '17 at 03:03
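A minimal sketch of the CPU-placement idea from the comments above, assuming TF1-style graph code; vocab_size, embedding_dim, and word_ids are hypothetical names for illustration, not something from the question:

import tensorflow as tf

vocab_size, embedding_dim = 2000000, 300     # placeholder sizes for illustration
word_ids = tf.placeholder(tf.int32, [None])  # hypothetical batch of token ids

# Pin the big embedding matrix and the lookup to host memory;
# the rest of the graph can still be placed on the GPU.
with tf.device("/cpu:0"):
    embedding = tf.get_variable("embedding", [vocab_size, embedding_dim])
    embedded = tf.nn.embedding_lookup(embedding, word_ids)

This keeps the full matrix in host RAM, so only the looked-up rows are copied over to the GPU.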

1 Answer


The recommended way is to use a partitioner to shard this large tensor across several parts:

import tensorflow as tf

embedding = tf.get_variable("embedding", [1000000000, 20],
                            partitioner=tf.fixed_size_partitioner(3))
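
As a quick sketch of how this is consumed downstream (the ids placeholder below is hypothetical, not part of the answer), lookups work against the partitioned variable the same way they do against a plain one:

ids = tf.placeholder(tf.int32, [None])            # hypothetical batch of word ids
vectors = tf.nn.embedding_lookup(embedding, ids)  # routes each id to its shard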

This will split the tensor into 3 shards along axis 0, but the rest of the program will see it as an ordinary tensor. The biggest benefit comes from using a partitioner together with parameter server replication, like this:

with tf.device(tf.train.replica_device_setter(ps_tasks=3)):
  embedding = tf.get_variable("embedding", [1000000000, 20],
                              partitioner=tf.fixed_size_partitioner(3))

The key function here is tf.train.replica_device_setter. It allows you to run 3 different processes, called parameter servers, that store all of the model's variables. The large embedding tensor will be split across these servers, as shown in the picture below.

(figure: sharding of the embedding variable across parameter servers)
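
For completeness, a rough sketch of how the ps_tasks=3 setup is typically wired to an actual cluster; the job names and host:port addresses below are placeholders, not something prescribed by the answer:

import tensorflow as tf

# Hypothetical cluster: three parameter servers plus one worker.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222", "ps1.example.com:2222", "ps2.example.com:2222"],
    "worker": ["worker0.example.com:2222"],
})

# Each process starts a server for its own job and task index;
# the parameter server processes would simply call server.join() afterwards.
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# The worker builds the graph; replica_device_setter assigns the variable
# shards across the ps tasks defined in the cluster spec.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    embedding = tf.get_variable("embedding", [1000000000, 20],
                                partitioner=tf.fixed_size_partitioner(3))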

Maxim