Lemmatization and POS tagging from scratch

Asked Aug 25 '23 at 23:31

I am an NLP enthusiast, and I plan to write some basic NLP models for my language (Azerbaijani), which is not well supported in spaCy or NLTK. Can you suggest a roadmap for reaching this goal?

First, I am going to write a POS tagger using a Hidden Markov Model (HMM) trained on approximately 1000 tagged texts, and then use that model as a tagger when training further models. After that, I will write the usual accessors such as token.pos_ and token.tag_.
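To make the HMM plan concrete, here is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It assumes the training data is a list of sentences, each a list of (word, tag) pairs, and it uses add-one smoothing, which is a simplification chosen for brevity. The names `train_hmm`, `log_prob`, and `viterbi` are hypothetical, and the English toy corpus is only a stand-in for real tagged Azerbaijani text.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from (word, tag) sentences."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of words
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word.lower()] += 1
            vocab.add(word.lower())
            prev = tag
        transitions[prev]["</s>"] += 1  # sentence-end marker
    return transitions, emissions, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumption, not the only option)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, model):
    """Return the most likely tag sequence for a list of words."""
    transitions, emissions, vocab = model
    if not words:
        return []
    tags = list(emissions)
    n_tags = len(tags) + 1    # +1 for the end marker
    n_words = len(vocab) + 1  # +1 for unknown words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for t in tags:
        score = (log_prob(transitions["<s>"], t, n_tags)
                 + log_prob(emissions[t], words[0].lower(), n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = log_prob(emissions[t], words[i].lower(), n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], t, n_tags) + emit, p)
                for p in tags
            )
    # Pick the best final tag, including the transition to the end marker.
    last = max(tags, key=lambda t: best[-1][t][0]
               + log_prob(transitions[t], "</s>", n_tags))
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])  # follow backpointers
    return path[::-1]

# Toy stand-in corpus; a real one would be the ~1000 tagged texts.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_hmm(corpus)
print(viterbi(["the", "cat", "barks"], model))
```

With a corpus of about 1000 texts, many words will be unseen at tagging time, so the handling of unknown words (here just smoothing) is probably the part that needs the most attention.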

After the POS tagger, I want to write a lemmatize() function, but I have no idea how to write it. Can you suggest some steps for achieving this?
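One common starting point for a lemmatize() function is a rule-based design: an exception table for irregular forms, plus suffix-stripping rules selected by the POS tag and checked against a lexicon of known lemmas. The sketch below only illustrates that idea. The function signature, the tiny lexicon, and the single plural rule (-lar/-lər) are assumptions made for the example, not a real grammar of Azerbaijani.

```python
# Illustrative data only: a real lexicon and rule set would come from a
# dictionary and a morphological description of the language.
LEXICON = {"kitab", "ev"}                 # known lemmas
SUFFIX_RULES = {"NOUN": ["lar", "lər"]}   # suffixes to strip, per POS tag
EXCEPTIONS = {}                           # (form, pos) -> lemma for irregular forms

def lemmatize(word, pos, lexicon=LEXICON, suffix_rules=SUFFIX_RULES,
              exceptions=EXCEPTIONS):
    """Return a lemma for word given its POS tag, or the lowercased word."""
    form = word.lower()
    # 1. Irregular forms are looked up directly.
    if (form, pos) in exceptions:
        return exceptions[(form, pos)]
    # 2. Try the longest suffixes first; accept a stem only if it is a known lemma.
    for suffix in sorted(suffix_rules.get(pos, []), key=len, reverse=True):
        if form.endswith(suffix):
            stem = form[:-len(suffix)]
            if stem in lexicon:
                return stem
    # 3. Fall back to the surface form when no rule applies.
    return form

print(lemmatize("kitablar", "NOUN"))
print(lemmatize("evlər", "NOUN"))
```

Checking stems against the lexicon keeps the rules from over-stripping words that merely end in the same letters. For an agglutinative language, where several suffixes stack on one stem, the stripping step would likely need to be applied repeatedly or replaced by a proper morphological analyzer.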

Murad Mammadzada

Welcome to Stack Overflow. I'd encourage you to ask the question in the discussions instead: https://stackoverflow.com/collectives/nlp/beta/discussions. As it stands, the question will most probably be flagged as "asking for tool/fix recommendation". – alvas Aug 28 '23 at 11:48
@alvas Thanks for the advice. I want to write a lemmatizer for my language because none is available for it in NLTK/spaCy/Keras. Which tools should I use, or which steps should I take? Is it very hard, especially for a beginner? – Murad Mammadzada Aug 29 '23 at 21:23
[Opinion]: Try https://spacy.io/api/lemmatizer – alvas Aug 30 '23 at 00:04
@alvas That is a very good idea, thanks. Can I also talk with you about a corpus database? I asked a question about it on Stack Overflow, but nobody answered: https://stackoverflow.com/questions/76541665/sql-database-for-linguistic-corpus – Murad Mammadzada Aug 30 '23 at 07:41

0 Answers