I have thousands of .txt documents stored in 8 folders, and each folder corresponds to a topic category (really just class 1, 2, 3, ...). I also have another 80 .txt documents that don't have categories yet, and I'm trying to find the best way to categorize them.
I have already finished the text segmentation and removed the English letters (because these are Chinese texts). What should I do next?
I can get the words with the highest TF-IDF values, but I don't know what to do after that. It seems like I should turn these texts into vectors and train a classifier on the labeled documents, but I don't know how.
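To make the question concrete, here is a rough sketch of the pipeline I think I need, using only the Python standard library: build TF-IDF vectors from the already-segmented training documents, average them per class, and assign each unlabeled document to the class whose centroid is closest by cosine similarity. The function names, the toy token lists, and the choice of a nearest-centroid classifier are all my own assumptions for illustration, not something I know to be the right approach; a library such as scikit-learn would probably replace most of this in practice.

```python
import math
from collections import Counter, defaultdict

def fit_idf(docs):
    """docs: list of token lists. Returns {term: smoothed IDF}."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_vector(tokens, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def train_centroids(docs, labels, idf):
    """Average the TF-IDF vectors of each class into one centroid per class."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        for t, w in tfidf_vector(tokens, idf).items():
            sums[label][t] += w
    return {lab: {t: w / counts[lab] for t, w in vec.items()}
            for lab, vec in sums.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict(tokens, idf, centroids):
    """Return the label of the most similar class centroid."""
    vec = tfidf_vector(tokens, idf)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))

# Toy stand-ins for segmented documents; real data would be read from the 8 folders.
train_docs = [
    ["足球", "比赛", "球队", "进球"],
    ["篮球", "比赛", "球员", "得分"],
    ["股票", "市场", "投资", "上涨"],
    ["银行", "利率", "投资", "市场"],
]
train_labels = [1, 1, 2, 2]  # folder/class number of each training document

idf = fit_idf(train_docs)
centroids = train_centroids(train_docs, train_labels, idf)
prediction = predict(["球队", "比赛", "得分"], idf, centroids)
print(prediction)
```

With my real data, `train_docs` would hold the token lists from the labeled folders and `predict` would be called once for each of the 80 unlabeled documents. Is this the right general idea, or is there a better-suited classifier for this?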