I am trying to train skip-gram word embeddings using the example posted at https://github.com/nzw0301/keras-examples/blob/master/Skip-gram-with-NS.ipynb
on a GeForce GTX 1080 GPU, using the English Wikipedia corpus (~100M sentences).
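For reference, as far as I can tell the notebook builds the standard skip-gram-with-negative-sampling architecture: two embedding layers, a dot product of the target and context vectors, and a sigmoid output. A minimal sketch of that setup (the embedding dimension and variable names here are my own illustration, not copied from the notebook):

```python
from keras.layers import Input, Embedding, Dot, Reshape, Activation
from keras.models import Model

vocab_size = 50000  # matches the 50k vocabulary mentioned below
embed_dim = 128     # illustrative; not necessarily what the notebook uses

# One (target, context) pair per sample; the context word is either a true
# neighbour (label 1) or a negative sample from the vocabulary (label 0).
w_target = Input(shape=(1,), dtype='int32')
w_context = Input(shape=(1,), dtype='int32')

target_vec = Embedding(vocab_size, embed_dim)(w_target)    # (batch, 1, dim)
context_vec = Embedding(vocab_size, embed_dim)(w_context)  # (batch, 1, dim)

# Dot product of the two embeddings gives the logit for
# "is this a genuine (word, context) pair?"
similarity = Dot(axes=-1)([target_vec, context_vec])
similarity = Reshape((1,))(similarity)
output = Activation('sigmoid')(similarity)

model = Model(inputs=[w_target, w_context], outputs=output)
model.compile(loss='binary_crossentropy', optimizer='adam')

# Training feeds batches of index pairs plus 0/1 labels, e.g.:
# model.train_on_batch([target_ids, context_ids], labels)
```

I believe the (pair, label) data itself is generated with Keras' `skipgrams` preprocessing utility.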
The training time is extremely slow: an estimated ~27 days per epoch with a 50k vocabulary, which seems surprisingly slow for such a simple model. I am using CUDA 8 and cuDNN 5.1; the backend is TensorFlow 1.2.0 and I am on Keras 2.0.2.

Has anyone trained a skip-gram model with a Keras implementation before? Any thoughts on why the implementation above is so slow? I have made sure that preprocessing is not the major bottleneck.

Thanks,