I am working on a project to classify customer feedback into buckets based on the topic of the feedback comment. That is, I need to classify each sentence into one of a list of pre-defined topics.
For example:
"I keep getting an error message every time I log in" has to be tagged with "login" as the topic.
"make the screen more colorful" has to be tagged with "improvements" as the topic.
So the topics are very specific to the product and the context.
LDA doesn't seem to work for me (correct me if I'm wrong). It detects topics in a general sense, like "Sports", "Politics", "Technology", etc. But I need to detect specific topics as mentioned above.
Also, I don't have labelled data for training. All I have is the comments, so a supervised learning approach doesn't look like an option.
What I have tried so far:
I trained a gensim model on the Google News corpus (it's about 3.5 GB). I clean each sentence by removing stop words, punctuation marks, etc. Then, for each word, I find which topic in the pre-defined set it is closest to and tag the word with that topic. The idea is that a sentence will likely contain more words close to the topic it refers to than to any other topic, so I pick the topic(s) to which the maximum number of words in the sentence is mapped.
For example:
If 3 words in a sentence map to the "login" topic and 2 words map to the "improvements" topic, I tag the sentence with "login".
If there is a tie between the counts of multiple topics, I return all the topics with the maximum count as the topic list.
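To make the procedure concrete, here is a minimal sketch of the majority-vote tagging described above. The tiny hand-crafted vectors, the topic list, and the stop-word set are all toy stand-ins of my own (in the real pipeline the vectors come from the pretrained Google News model, e.g. via gensim's `KeyedVectors.load_word2vec_format`):

```python
import numpy as np

# Toy 3-dim word vectors standing in for the 300-dim Google News embeddings
# (illustrative values only, not from the real model).
VECTORS = {
    "error":    np.array([0.9, 0.1, 0.0]),
    "password": np.array([0.8, 0.2, 0.0]),
    "screen":   np.array([0.1, 0.9, 0.0]),
    "colorful": np.array([0.0, 1.0, 0.1]),
}

# One anchor vector per pre-defined topic (again, toy values).
TOPICS = {
    "login":        np.array([1.0, 0.0, 0.0]),
    "improvements": np.array([0.0, 1.0, 0.0]),
}

STOP_WORDS = {"i", "keep", "getting", "every", "time", "the", "more", "make", "a", "an"}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def tag_sentence(sentence):
    # Clean: lowercase, strip punctuation, drop stop words and out-of-vocabulary tokens.
    words = [w.strip(".,!?") for w in sentence.lower().split()]
    words = [w for w in words if w not in STOP_WORDS and w in VECTORS]

    # Map each word to its closest topic, then count votes per topic.
    counts = {t: 0 for t in TOPICS}
    for w in words:
        best = max(TOPICS, key=lambda t: cosine(VECTORS[w], TOPICS[t]))
        counts[best] += 1

    # On a tie, return every topic sharing the maximum count.
    best_count = max(counts.values())
    return sorted(t for t, c in counts.items() if c == best_count)

print(tag_sentence("I keep getting an error message every time I log in"))   # ['login']
print(tag_sentence("make the screen more colorful"))                         # ['improvements']
```

This is the whole of my current approach: per-word nearest-topic assignment followed by a majority vote, with ties returning multiple topics.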
This approach gives me fair results, but it's not good enough.
What would be the best approach to tackle this problem?