For my current project I have to build a topic modeling or classification utility that will process thousands of articles and assign each one to a topic (about 40-50 topics to start with). For example, it would go over articles on database technologies and classify each one as a NoSQL article, a relational DB article, or a graph database article (just an example).
I have a very basic NLP background, and our team mostly has Python backend scripting experience. I began looking into the options available for implementing this and came across NLTK and scikit-learn, which are Python-based, and Weka and MALLET, which are JVM-based.
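To make the input and output concrete, here is a rough sketch of the kind of thing I have in mind. It assumes scikit-learn with a TF-IDF plus naive Bayes pipeline, which is only one possible approach; the sample texts, labels, and variable names are made up for illustration and are not from our real data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical): a real run would need many
# labeled articles per topic, not one.
train_texts = [
    "MongoDB stores schemaless JSON documents in collections",
    "PostgreSQL enforces a fixed schema with tables and foreign keys",
    "Neo4j models data as nodes and edges traversed with Cypher",
]
train_labels = ["NoSQL", "Relational", "Graph"]

# Turn each article into TF-IDF features, then fit a multinomial
# naive Bayes classifier on those features.
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(train_texts, train_labels)

# Classify an unseen article; the result is one of the training labels.
new_articles = ["A guide to querying nodes and edges in Neo4j"]
predicted = model.predict(new_articles)
print(predicted[0])
```

This needs labeled examples for every topic, so it is classification rather than unsupervised topic modeling.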
My understanding is that NLTK is better suited to learning and understanding NLP techniques such as topic classification.
Can someone suggest the best open source solution for our implementation? Please let me know if I have left out any information that would help with the answers.