I am scraping reviews off Amazon with the intent to perform sentiment analysis to classify them into positive, negative and neutral. Now the data I would get would be text and unlabeled.
My approach to this problem would be as following:-
1.) Label the data using clustering algorithms like DBScan, HDBScan or KMeans. The number of clusters would obviously be 3.
2.) Train a Classification algorithm on the labelled data.
Now I have never performed clustering on text data but I am familiar with the basics of clustering. So my question is:
Is my approach correct?
Any articles/blogs/tutorials I can follow for text based clustering since I am kinda new to this?