Stemming Dutch words with the Kraaij-Pohlmann algorithm

Asked Jun 25 '17 at 11:58

Active Jun 25 '17 at 13:26

I am trying to stem Dutch words in a corpus in R. I found the SnowballC package, but it doesn't seem to work well for Dutch. For example:

library(SnowballC)

wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "porter")
[1] "huis"    "huiz"    "huisj"   "huisjes"

wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "dutch")
[1] "hui"    "huizen" "huisj"  "huisj"

Ideally all four forms would reduce to the same stem, but neither stemmer does that here. After some searching I found that the Kraaij-Pohlmann algorithm might be better suited to Dutch. Is there a way to use it in R? So far I haven't found a package or script that implements it. Other tips and ideas are also welcome!

edited Jun 25 '17 at 13:26 by h3rm4n

asked Jun 25 '17 at 11:58 by Charlotte

0 Answers