I have a folder containing many tourism reviews stored as .txt documents. I want to use the bag-of-words approach to convert them into a numeric representation for machine learning (Latent Dirichlet Allocation, LDA) in C++, so that I can train a system to recognize the topic of each document.
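To make clear what kind of numeric representation I mean, here is a naive sketch of the output I am after: one row of word counts per document over a shared vocabulary. The names `tokenize`, `BagOfWords` and `buildBagOfWords` are made up for illustration and do not come from any library, and the lowercase, alphanumeric-only tokenization is just an assumption. It also assumes the review texts have already been read into strings.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: split text into lowercase alphanumeric tokens.
// This is a naive assumption; real reviews may need stop-word removal etc.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Hypothetical container for the result.
struct BagOfWords {
    std::map<std::string, std::size_t> vocabulary;  // word -> column index
    std::vector<std::vector<int>> counts;           // one row per document
};

BagOfWords buildBagOfWords(const std::vector<std::string>& documents) {
    BagOfWords bow;
    std::vector<std::vector<std::string>> tokenized;

    // First pass: tokenize every document and collect the vocabulary.
    for (const auto& doc : documents) {
        tokenized.push_back(tokenize(doc));
        for (const auto& word : tokenized.back()) {
            bow.vocabulary.emplace(word, 0);
        }
    }

    // Assign column indices in sorted word order.
    std::size_t index = 0;
    for (auto& entry : bow.vocabulary) entry.second = index++;

    // Second pass: count how often each word occurs in each document.
    for (const auto& tokens : tokenized) {
        std::vector<int> row(bow.vocabulary.size(), 0);
        for (const auto& word : tokens) ++row[bow.vocabulary.at(word)];
        bow.counts.push_back(row);
    }
    return bow;
}
```

For example, calling `buildBagOfWords` on two short reviews would give a `counts` matrix with two rows, which is the kind of input I understand LDA expects.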
However, I am not sure how to go about the bag-of-words step properly. I have heard of tools such as scikit-learn, but scikit-learn works in a Python environment. Are there any recommended tools or libraries that could help me with my bag-of-words module? Or is there a C++ wrapper for scikit-learn?
I have reached a point where I don't know what to do next, so some guidance would be appreciated. Thank you :)