I am trying to stem Dutch words in a corpus in R. I found the SnowballC package, but it doesn't seem to work well for Dutch: related forms of the same word end up with different stems. For example:
library(SnowballC)
wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "porter")
[1] "huis" "huiz" "huisj" "huisjes"
wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "dutch")
[1] "hui" "huizen" "huisj" "huisj"
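To make the comparison easier to read, the two stemmers can be put side by side. This is only a small sketch, assuming SnowballC is installed; the `stems` data frame and the distinct-stem count are names and checks I made up for illustration, not part of the package:

```r
library(SnowballC)

words <- c("huis", "huizen", "huisje", "huisjes")

# Stem the same words with both stemmers and line the results up per word.
stems <- data.frame(
  word   = words,
  porter = wordStem(words, language = "porter"),
  dutch  = wordStem(words, language = "dutch"),
  stringsAsFactors = FALSE
)
print(stems)

# Number of distinct stems per stemmer. If all four forms are meant to
# collapse to one stem, a suitable stemmer would give 1 here.
sapply(stems[c("porter", "dutch")], function(x) length(unique(x)))
```

The same check could be rerun with any other stemmer by adding a column to `stems`.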
After some searching I found that the Kraaij-Pohlmann algorithm might be more suitable for Dutch. Is there a way to use or implement it in R? So far I haven't been able to find a package or script that does this. Other tips and ideas are also welcome!