I am mining Twitter data, and one problem I run into while cleaning the text is that I cannot remove or split run-together words, which usually come from hashtags. After removing special characters and symbols such as '#', I am left with tokens that make no sense. For instance:
1) Meaningless words: I have tokens such as 'spillwayjfleck' and 'bowhunterva', which make no sense and need to be removed from my corpus. Is there a function in R that can do this?
2) Joined words: I need a method to split tokens such as 'flashfloodwarn' into 'flash', 'flood', 'warn' in my corpus.
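To make both cases concrete, here is a rough base-R sketch of the behaviour I am after. It assumes a tiny hand-made word list; `dict` and `segment` are hypothetical names of my own, not functions from any package, and a real word list would have to be far larger. The idea is to try the longest dictionary prefix first and backtrack if the remainder cannot be split, so a token that cannot be fully split could be treated as meaningless.

```r
# Toy dictionary (an assumption for illustration only).
dict <- c("flash", "flood", "warn", "bow", "hunter")

# Split a token into dictionary words, trying longer prefixes first.
# Returns a character vector of words, or NULL if no complete split exists.
segment <- function(token, dict) {
  n <- nchar(token)
  if (n == 0) return(character(0))
  for (i in rev(seq_len(n))) {
    prefix <- substr(token, 1, i)
    if (prefix %in% dict) {
      rest <- segment(substr(token, i + 1, n), dict)
      if (!is.null(rest)) return(c(prefix, rest))
    }
  }
  NULL
}

segment("flashfloodwarn", dict)  # "flash" "flood" "warn"
segment("spillwayjfleck", dict)  # NULL, i.e. a candidate for removal
```

With a dictionary this small the results are only illustrative. With a large word list, many junk tokens would probably still split into short real words, so some extra filtering would likely be needed.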
Any help would be appreciated.