I have a training dataset of more than 100,000 documents categorised into around 100 categories. I am trying to predict the category of a text using the Deeplearning4j (DL4J) library, with code based on the ParagraphVectorsClassifierExample example. Each document is a single short line of text.
I split the available data into training (80%) and test (20%) sets. Even after extensive parameter tuning, I get at most 20% correct predictions on the test data. I understand that a lot depends on the input data itself, but I would like to know whether the accuracy can be improved further. A comment in the example code says "This example could be improved by using learning cascade for higher accuracy". Any hints or advice on improving prediction accuracy would be highly appreciated.
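To make the evaluation setup concrete, here is a minimal sketch of how an 80/20 split and the accuracy figure can be computed. It uses only the Java standard library and is not part of Deeplearning4j or the example code; the class name `SplitAndAccuracy` and its methods `split` and `accuracy` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not a DL4J class.
final class SplitAndAccuracy {

    /** Shuffles a copy of the data with a fixed seed and returns [train, test]. */
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        // Index where the training part ends and the test part begins.
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    /** Fraction of positions where the predicted label equals the expected label. */
    static double accuracy(List<String> expected, List<String> predicted) {
        if (expected.size() != predicted.size()) {
            throw new IllegalArgumentException("Label lists must have the same length");
        }
        if (expected.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < expected.size(); i++) {
            if (expected.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / expected.size();
    }
}
```

Calling `SplitAndAccuracy.split(documents, 0.8, seed)` gives the training and test lists, and `accuracy` is then applied to the true and predicted categories of the test list. Note that a plain shuffle does not guarantee that every one of the roughly 100 categories is represented proportionally in both parts.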