I would like to lemmatize a list of German words, including nouns and verbs. The difficulty is that the list therefore contains words beginning with a capital letter as well as words in lower case. Until now I have worked with a lookup list. Here is a sample:
```r
library(textstem)

# Lookup table: inflected word form -> lemma
lookup_list <- data.frame(
  word  = c("mache", "tust", "Tuns", "Reisen", "genaue", "genauer", "pflanze", "Pflanzen", "reise"),
  lemma = c("machen", "tuen", "Tun", "Reise", "genau", "genau", "pflanzen", "Pflanze", "reisen"),
  stringsAsFactors = FALSE
)

Text2Lemmatize <- "mache tust Tuns Reisen genaue genauer pflanze Pflanzen reise"
```
The problem is that `lemmatize_strings()` ignores the words in the list that begin with a capital letter: "Tuns", "Reisen" and "Pflanzen" are returned unchanged.
```
> lemmatize_strings(Text2Lemmatize, lookup_list)
[1] "machen tuen Tuns Reisen genau genau pflanzen Pflanzen reisen"
```
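One possible workaround, sketched here as an assumption rather than a textstem feature, is to skip `lemmatize_strings()` and do an exact, case-sensitive lookup in base R. `lemmatize_case_sensitive()` is a hypothetical helper written for this illustration; it splits on whitespace only, so a token with attached punctuation (e.g. `Pflanzen.`) would not be found in the table and would be returned unchanged.

```r
# Hypothetical helper (not part of textstem): replace each whitespace-separated
# token by its lemma using an exact, case-sensitive match on lookup$word.
lemmatize_case_sensitive <- function(x, lookup) {
  # Named character vector: names are word forms, values are lemmas
  dict <- setNames(as.character(lookup$lemma), as.character(lookup$word))
  vapply(strsplit(x, "\\s+"), function(tokens) {
    lemmas <- unname(dict[tokens])            # NA where a token has no entry
    lemmas[is.na(lemmas)] <- tokens[is.na(lemmas)]  # keep unknown tokens as-is
    paste(lemmas, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

# Usage with part of the sample table above
lookup <- data.frame(
  word  = c("Tuns", "Reisen", "Pflanzen", "reise"),
  lemma = c("Tun", "Reise", "Pflanze", "reisen"),
  stringsAsFactors = FALSE
)
lemmatize_case_sensitive("Tuns Reisen Pflanzen reise", lookup)
# [1] "Tun Reise Pflanze reisen"
```

Because character indexing of a named vector in R matches names exactly, "Reisen" and "reisen" stay distinct keys, which is the behaviour the capitalized nouns need.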
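How can I get the capitalized entries to be looked up as well, so that the result is "machen tuen Tun Reise genau genau pflanzen Pflanze reisen"? Can anybody help me out with this little problem?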
Thanks in advance!