I'm trying to classify sentences on a given topic using machine learning. However, I can't seem to find the adequate algorithm / solution for this particular problem.
Some details:
I have tokenized, lemmatized and vectorized the sentences. So, given a sentence:
How will the weather be today?
It gets tokenized:
['How', 'will', 'the', 'weather', 'be', 'today?']
It then gets lemmatized:
['How', 'weather', 'today']
And then based of a small dictionary (~100 words) I built, the sentenced gets transformed in a sequence of 0 or 1, signaling if the words appear in the dictionary or not:
[0, 0, 0, 1, .... 0, 1]
I've built myself a small dataset (~50 sentences split in 3 topics) and now I need an algorithm that will train on the dataset and that'll predict one of those 3 classes, given a new sentence.
Deep learning is not efficient given the reduced size of the dataset. I've tried liniar regression but outputs random very big numbers. Any ideas on what should I try or if any mistakes were made?