
I am trying to train skip-gram word embeddings using the example posted at https://github.com/nzw0301/keras-examples/blob/master/Skip-gram-with-NS.ipynb on a GeForce GTX 1080 GPU, using the English Wikipedia corpus (~100M sentences).

Training is extremely slow: roughly 27 days per epoch (estimated) with a vocabulary of 50k, which is surprising for such a simple model. I am using CUDA 8 and cuDNN 5.1, with TensorFlow 1.2.0 as the backend and Keras 2.0.2. Has anyone trained a skip-gram model with a Keras implementation before? Any thoughts on why the implementation above is so slow? I have checked that the preprocessing is not the main bottleneck. A rough sketch of the model, as I understand it from the notebook, is included below. Thanks,
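(This is only a minimal sketch for reference, not the notebook's exact code: the embedding dimension, layer names, optimizer, and the toy training call below are my own placeholders.)

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, Reshape, Dot, Activation
from keras.preprocessing.sequence import skipgrams, make_sampling_table

vocab_size = 50000   # ~50k vocabulary, as described above
embed_dim = 128      # embedding dimensionality (placeholder)

# One input for the target word and one for the (positive or negative) context word.
target_in = Input(shape=(1,), dtype='int32')
context_in = Input(shape=(1,), dtype='int32')

# Separate embedding matrices for target and context words.
target_emb = Embedding(vocab_size, embed_dim)(target_in)    # (batch, 1, dim)
context_emb = Embedding(vocab_size, embed_dim)(context_in)  # (batch, 1, dim)

# Dot product of the two embeddings, squashed to a probability with a sigmoid.
dot = Dot(axes=-1)([target_emb, context_emb])                # (batch, 1, 1)
dot = Reshape((1,))(dot)
out = Activation('sigmoid')(dot)

model = Model(inputs=[target_in, context_in], outputs=out)
model.compile(loss='binary_crossentropy', optimizer='adam')

# (target, context) pairs and 0/1 labels come from keras.preprocessing.sequence.skipgrams,
# which also draws the negative samples.
sampling_table = make_sampling_table(vocab_size)
sentence = [12, 47, 5, 1029, 3]   # toy word-index sequence
pairs, labels = skipgrams(sentence, vocab_size,
                          window_size=4, sampling_table=sampling_table)
if pairs:
    pairs = np.array(pairs, dtype='int32')
    model.train_on_batch([pairs[:, 0:1], pairs[:, 1:2]], np.array(labels))
```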

aelgohary
  • Any answers found? – bicepjai Aug 11 '17 at 08:06
  • I'm not sure if it makes a difference, but that implementation is not the same as the established equations for word2vec, particularly in how the negative samples are collected and factored into the loss. – SantoshGupta7 Apr 06 '19 at 06:25

0 Answers