
I'm looking for code that lets me remove my own stopwords from my text corpus, defining them only by how they begin.

Example: my corpus contains newspaper articles, which also include "https..." internet links that I do not need for my topic modeling.

I now want to delete all "words" that begin with "https...".

Is there any way I can do this?

I am using the tm package for text transformations and, up to this point, have also used some stopwords of my own.

CODE
nzz <- SimpleCorpus(DirSource("private"), control = list(language="de"))

nzz <- tm_map(nzz, removePunctuation)
nzz <- tm_map(nzz, removeNumbers)
nzz <- tm_map(nzz, stripWhitespace)
myStopwords <- c("beispiel","bemerkbar","docs","par",
                 "ipar","neue","zuercher","zeitung","http")

nzz <- tm_map(nzz, removeWords, c(stopwords("german"), myStopwords))
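For what it's worth, `removeWords` only matches whole words, so the "http" entry above does not catch tokens that merely start with it. One possible direction is a custom transformation built with tm's `content_transformer()` that deletes every whitespace-delimited token starting with a given prefix. This is only a rough sketch: the names `drop_prefixed` and `removeByPrefix` are made up for illustration, the demo corpus is a toy stand-in for the real one, and the prefix is assumed to consist of plain letters with no regex metacharacters.

```r
library(tm)

# Hypothetical helper (not part of tm): delete every token that starts
# with `prefix`, up to the next whitespace character.
# Assumes `prefix` is plain letters, since it is pasted into a regex.
drop_prefixed <- function(x, prefix) {
  gsub(paste0("\\b", prefix, "\\S*"), "", x, perl = TRUE)
}

# Wrap the helper so that tm_map() can apply it to each document.
removeByPrefix <- content_transformer(drop_prefixed)

# Toy corpus standing in for the real newspaper articles.
demo <- VCorpus(VectorSource("Artikel https://www.nzz.ch/abc Ende"))

# Drop the links first, then collapse the gap they leave behind.
demo <- tm_map(demo, removeByPrefix, "http")
demo <- tm_map(demo, stripWhitespace)

content(demo[[1]])
```

In the pipeline above, this would presumably become `nzz <- tm_map(nzz, removeByPrefix, "http")`, placed early so that the links are dropped before the other transformations alter them.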