
I am trying to stem Dutch words in a corpus in R. I found the SnowballC package, but its stemmers do not seem to handle Dutch well: the singular, plural, and diminutive forms of "huis" (house) are not reduced to a single stem. For example:

library(SnowballC)

wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "porter")
#> [1] "huis"    "huiz"    "huisj"   "huisjes"

wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "dutch")
#> [1] "hui"    "huizen" "huisj"  "huisj"
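To make the comparison easier to read, here is a small sketch that uses only the SnowballC package. It puts the stems from both algorithms side by side and counts how many distinct stems each one returns for the four forms. A stemmer that fully conflated them would return a single stem. The variable names are illustrative only.

```r
library(SnowballC)

words <- c("huis", "huizen", "huisje", "huisjes")

# One row per input word, one column per Snowball algorithm
stems <- data.frame(
  word   = words,
  porter = wordStem(words, language = "porter"),
  dutch  = wordStem(words, language = "dutch")
)
print(stems)

# Number of distinct stems each algorithm produces for the four forms;
# 1 would mean every form was reduced to the same stem
n_stems <- sapply(c("porter", "dutch"), function(lang) {
  length(unique(wordStem(words, language = lang)))
})
print(n_stems)
```

`getStemLanguages()` from the same package lists the other algorithms that can be passed as `language`.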

After some searching, I found that the Kraaij-Pohlmann algorithm might be more suitable for Dutch. Is there a way to implement this in R? So far I haven't been able to find a package or script that does this. Other tips and ideas are also welcome!

h3rm4n
Charlotte
