I have a set of informal documents (couple of thousands) which I want to apply topic modeling (MALLET) on. The problem is, there are a considerable number of misspelled words in the documents. Most are intentional, such as short-forms and local lingo like `'juz' -> 'just', 'alr' -> 'already'. A couple of these variations exists, due to the different authors' peculiar styles of writing.
After feeding them to MALLET, I kinda bothered that one of the topics generated is actually a set of misspelled stopwords. I believe these words are mostly used in the small subset of documents from the same author, hence MALLET picked it up.
My question is, do I spell-check and correct these sets of misspelled words, and perhaps save the corrected text somewhere, before conducting further tasks on them? I suppose this would meant that I do need to manually verify the corrections before committing right? What would be the most "efficient" way to do this?
Or do I actually ignore these misspelled words?