I'm trying to lemmatize some Korean sentences using a pretrained model. I'm very much a beginner with this sort of thing, so I'm sure I could be missing something obvious, but following examples I found for other languages and the Korean model's docs (https://spacy.io/models/ko#ko_core_news_sm) I tried:
import spacy

# loading model
nlp = spacy.load("ko_core_news_sm")

# test on first sentence
doc = nlp(sentences[0])
print(doc)
for token in doc:
    print(token.lemma_)
I would expect it to provide the base form of each word; if it were English, for example, something like apples -> apple.
For Korean, however, the output of this code is WORD+affix. I can't post the Korean text itself due to anti-spam measures, but rather than providing the lemma, it seems to simply be telling me how the word is composed. Am I doing something wrong, or is this simply how the model works? Is there any way to get the actual base word? Sorry if it's obvious, and thanks everyone for the help.
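For what it's worth, since the output looks like morphemes joined by "+", I tried naively splitting on "+" and keeping the first piece as a guess at the base word. I don't know whether the first segment is always the stem, though, and the helper name and sample string here are just my own illustration:

```python
# Guessing that lemma_ is a morpheme decomposition joined by "+",
# e.g. something shaped like "STEM+affix". This keeps only the part
# before the first "+"; a plain word without "+" passes through unchanged.
def first_morpheme(lemma: str) -> str:
    """Return the segment of the lemma before the first '+', if any."""
    return lemma.split("+", 1)[0]

print(first_morpheme("사과+를"))  # prints "사과"
print(first_morpheme("apple"))   # prints "apple"
```

I'd then call this on each `token.lemma_` in the loop above, but I'm not confident this is the right approach rather than a hack.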