4

I want to classify text documents using doc2vec representation and scikit-learn models.

My problem is that I'm lost on how to get started. Can someone explain the general steps usually taken to use doc2vec with scikit-learn?

MikeAlbert

1 Answer

9

There is a great tutorial here for binary classification with scikit-learn + doc2vec. In short:

  • Use gensim to train (or load) your doc2vec model.
  • Each input text is converted to a fixed-dimension vector of floats (the same dimension as your embedding). These vectors are the actual input features.
  • From there, you can use any classifier in scikit-learn on those vectors.
SheepPerplexed
greeness