I'm looking for code that lets me remove my own stopwords from a text corpus, but by defining them only by how they begin.
Example: my corpus consists of newspaper articles, which also contain internet links (https...) that I do not need for my topic modeling.
I now want to delete all "words" that begin with "https...".
Is there any way to do this?
I am using the tm package for text transformations and have so far also used a list of my own stopwords.
nzz <- SimpleCorpus(DirSource("private"), control = list(language="de"))
nzz <- tm_map(nzz, removePunctuation)
nzz <- tm_map(nzz, removeNumbers)
nzz <- tm_map(nzz, stripWhitespace)
myStopwords <- c("beispiel","bemerkbar","docs","par",
                 "ipar","neue","zuercher","zeitung","http")
nzz <- tm_map(nzz, removeWords, c(stopwords("german"), myStopwords))
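To make the goal concrete, here is a minimal sketch of the kind of prefix-based removal I have in mind, written in base R with a regular expression. The helper name `remove_prefixed` and its `prefix` argument are hypothetical (they are not part of tm), and the sketch assumes the prefix consists of plain letters with no regex metacharacters.

```r
# Hypothetical helper (not part of tm): drop every whitespace-delimited
# token that starts with `prefix`.
# Assumption: `prefix` contains only plain letters, so it needs no escaping.
remove_prefixed <- function(x, prefix) {
  # \b anchors the match at the start of a word; \S* consumes the rest of
  # the token up to the next whitespace character.
  pattern <- paste0("\\b", prefix, "\\S*")
  gsub(pattern, "", x, perl = TRUE)
}

# Small demonstration on a plain character vector.
example <- "see https://example.com/a?b=1 for details"
cleaned <- remove_prefixed(example, "http")
print(cleaned)
```

With tm, I would expect this to plug in through `content_transformer`, for example `nzz <- tm_map(nzz, content_transformer(remove_prefixed), prefix = "http")`, though I have not verified this on a `SimpleCorpus`. It would probably make sense to run it before `removePunctuation`, while the links are still intact, and to leave the leftover spaces to `stripWhitespace`.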