How to train custom GloVe vector representations using many PDF files?

Question

I want to train my own custom GloVe representations using many PDF files. How can I do that? Is there also a way to use concepts such as POS tagging and dependency parsing? Can you suggest any links for implementing that?

score 1 · Answer 1 · answered Dec 30 '18 at 01:09

Your question is too broad for a tight answer, but you can certainly do what you describe.

You'd first look into libraries for extracting plain text from PDFs.
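Here is a minimal sketch of that first step. It assumes the third-party pypdf library (`pip install pypdf`); pdfminer.six or PyMuPDF would work too. It pulls the text out of every PDF in a folder and writes one line of lowercase, whitespace-separated tokens per file. The function names, the hyphen-rejoining heuristic and the simple regex tokenizer are illustrative choices, not a prescribed pipeline. A plain token file like this is the kind of input the reference GloVe tools (`vocab_count`, `cooccur`) read.

```python
import re
from pathlib import Path


def pdf_to_text(path):
    """Extract plain text from every page of one PDF (assumes pypdf is installed)."""
    from pypdf import PdfReader  # imported lazily so the helpers below work without pypdf
    reader = PdfReader(path)
    # Image-only (scanned) pages have no text layer and yield little or no text.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def normalize(text):
    """Rejoin words hyphenated across line breaks, lowercase, and split into word tokens."""
    # Heuristic: "vec-\ntors" -> "vectors" (may also join genuine hyphenated compounds).
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Keep runs of word characters, allowing one internal apostrophe ("don't", "glove's").
    return re.findall(r"\w+(?:'\w+)?", text.lower())


def build_corpus(pdf_dir, out_path):
    """Write one line of space-separated tokens per PDF found in pdf_dir."""
    with open(out_path, "w", encoding="utf-8") as out:
        for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
            tokens = normalize(pdf_to_text(pdf))
            if tokens:  # skip PDFs that produced no extractable text
                out.write(" ".join(tokens) + "\n")
```

For example, `build_corpus("pdfs/", "corpus.txt")` produces a single training file. Scanned PDFs would need OCR before any of this yields text.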

Some word2vec projects have trained word-vectors based on word-tokens that have been extended with POS-labels, or dependency-defined contexts, with potential benefits depending on your goals. See for example Levy & Goldberg's paper on dependency-based embeddings:

https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
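The token-extension idea can be sketched without committing to a particular parser. This assumes each sentence has already been tagged and parsed, for instance by spaCy, whose tokens expose `pos_`, `dep_` and `head`. The `Token` tuple below is a hypothetical stand-in filled in by hand. The first function produces POS-extended tokens such as `book_NOUN`, which GloVe or word2vec can train on as ordinary words. The second produces (word, context) pairs in the style of Levy & Goldberg's dependency-based contexts. Those pairs are not a plain token stream, so they need a trainer that accepts arbitrary contexts (the authors released a modified word2vec, word2vecf, for this) rather than stock GloVe.

```python
from collections import namedtuple

# Hypothetical parsed-token record: 1-based position, surface word, POS tag,
# head position (0 means this token is the root), and dependency label.
Token = namedtuple("Token", "idx word pos head dep")


def pos_tagged_tokens(tokens):
    """Turn each token into 'word_POS', so e.g. book_NOUN and book_VERB get separate vectors."""
    return [f"{t.word.lower()}_{t.pos}" for t in tokens]


def dependency_contexts(tokens):
    """Return (word, context) pairs from dependency arcs, Levy & Goldberg style.

    For each arc head -> modifier with label L, the head gets the context
    'modifier/L' and the modifier gets the inverse context 'head/L-1'.
    The paper's preposition-collapsing step is omitted in this sketch.
    """
    by_idx = {t.idx: t for t in tokens}
    pairs = []
    for t in tokens:
        if t.head == 0:  # the root has no head word, so no arc to record
            continue
        head = by_idx[t.head]
        pairs.append((head.word.lower(), f"{t.word.lower()}/{t.dep}"))
        pairs.append((t.word.lower(), f"{head.word.lower()}/{t.dep}-1"))
    return pairs
```

For the parsed sentence "Australian scientist discovers star", `scientist` receives the contexts `australian/amod` and `discovers/nsubj-1`, rather than simply its neighbouring words.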
