Questions tagged [stemming]

The process of reducing inflected words to their stem.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form.
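As a rough illustration of the idea, the Python sketch below strips a few common English suffixes. The suffix list and the name `toy_stem` are assumptions made up for this example; it is not the Porter or Snowball algorithm, and real stemmers apply many more rules.

```python
# Toy suffix-stripping stemmer: illustration only, not Porter or Snowball.
SUFFIXES = ("ing", "ed", "es", "s")  # assumed, deliberately tiny rule set


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


if __name__ == "__main__":
    for w in ["walking", "walked", "walks", "walk"]:
        print(w, "->", toy_stem(w))
```

All four sample words reduce to the same stem, which is what lets a search for one inflected form match the others.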

531 questions
2 votes, 1 answer

Lucene synonym expansion, stemming, spell check and more

I am using Lucene to index my database and then perform a phrase search on a specific field (field name: keyword). I am currently using the following code: String userQuery = request.getParameter("query"); //create standard analyzer…
Prim
2 votes, 1 answer

How to get a nested list by stemming the words inside the nested lists?

I have a Python list, tokens, made up of several sublists of tokens. I want to stem each token so that the output matches stemmed_expected. tokens = [['cooked', 'lovely','baked'],['hotel',…
Dakshila Kamalsooriya
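One possible approach to the nested-list question above, sketched in Python: a nested list comprehension keeps the sublist structure and applies a stemming function to each token. The names `stem_nested` and `strip_ed` and the sample data are hypothetical; `strip_ed` is only a placeholder, and with NLTK installed a real stemmer method such as `PorterStemmer().stem` could be passed in its place.

```python
from typing import Callable, List


def stem_nested(tokens: List[List[str]], stem: Callable[[str], str]) -> List[List[str]]:
    """Apply `stem` to every token while preserving the sublist structure."""
    return [[stem(token) for token in sublist] for sublist in tokens]


def strip_ed(word: str) -> str:
    """Placeholder stemmer (assumption): drops a trailing 'ed' only."""
    return word[:-2] if word.endswith("ed") and len(word) > 4 else word


if __name__ == "__main__":
    # Hypothetical sample data, not the list from the question.
    sample = [["cooked", "lovely"], ["hotel"]]
    print(stem_nested(sample, strip_ed))
```

Passing the stemmer as an argument keeps the list handling independent of whichever stemming library is in use.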
2 votes, 1 answer

Stemming and lemmatizing - What approach?

I am preparing to do topic modeling via Mallet and have finished pulling the raw datasets. Before I import and start modeling, I need to take some steps to clean and streamline the texts, of course. I have my lists of stopwords ready and I know…
Glorifier
2 votes, 1 answer

Solr search/faceting results behave strangely: I only get "stemmed" strings (I hope that is the correct term)

Sorry for such a bad title, but I didn't know how to describe my problem. I'm using sunburnt (a Python interface) to query Solr within my Django app. When I'm searching, everything is OK and I get the full string. On the other hand, if I'm faceting…
Samuele Mattiuzzo
2 votes, 2 answers

How do I get all attributes of synsets?

Please give me an example that shows all the attributes of a word's synset. I only know these attributes: name, lemma_names, definition. synsetsWord = ObjWn.synsets( 'Book' ) i = 0 for senseWord in synsetsWord: …
Masoud Abasian
2 votes, 0 answers

Elasticsearch German stemmer doesn't do plural

I'm working on a basic German analyzer in Elasticsearch, which is defined as follows: { "settings": { "analysis": { "filter": { "german_stemmer": { "type": "snowball", "language": "German" }, …
Lior Magen
2 votes, 2 answers

Exact word search in Solr

I have a question which closely relates to this question. In my schema I have a field. This gives an exact match, i.e. with stemming disabled: eat = eat. Is it possible,…
Ruth
2 votes, 4 answers

How to find basic, uninflected word for searching?

I am having trouble trying to write a search engine that treats all inflections of a word as the same basic word. So for verbs these are all the same root word, be: number/person (e.g. am; is; are) tense/mood like past or future tense (e.g.…
Jon
2 votes, 1 answer

Porter and Lancaster stemming clarification

I am stemming with Porter and Lancaster and I made these observations: Input: replied; Porter: repli; Lancaster: reply. Input: twice; Porter: twice; Lancaster: twic. Input: came; Porter: came; Lancaster: cam. Input: In; Porter: …
floss
2 votes, 1 answer

What is the real purpose of Stemming in NLP?

I know about stemming and lemmatizing as follows: stemming converts words into their non-changing portions (amusing, amusement -> amus); lemmatizing converts words to their dictionary form (amusing, amusement -> amuse). I can understand why to use lemmatization.…
Karanam Krishna
2 votes, 1 answer

How to exclude certain names and terms from stemming (Python NLTK SnowballStemmer (Porter2))

I am new to NLP, Python, and posting on Stack Overflow all at the same time, so please be patient with me if I seem ignorant :). I am using SnowballStemmer in Python's NLTK in order to stem words for textual analysis. While…
ylimenibor
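A minimal sketch of one way to keep certain names and terms out of stemming, as the question above asks: check each token against a protected set before calling the stemmer. The names `stem_except` and `drop_s` and the sample words are hypothetical; `drop_s` is a stand-in stemmer, and with NLTK installed `SnowballStemmer("english").stem` could be passed instead.

```python
from typing import Callable, Iterable, List, Set


def stem_except(
    tokens: Iterable[str],
    stem: Callable[[str], str],
    protected: Set[str],
) -> List[str]:
    """Stem every token except those in `protected` (case-insensitive)."""
    protected_lower = {p.lower() for p in protected}
    return [t if t.lower() in protected_lower else stem(t) for t in tokens]


def drop_s(word: str) -> str:
    """Placeholder stemmer (assumption): removes a single trailing 's'."""
    return word[:-1] if word.endswith("s") else word


if __name__ == "__main__":
    # Hypothetical sample: "Paris" is protected, "hotels" is stemmed.
    print(stem_except(["Paris", "hotels"], drop_s, {"paris"}))
```

Protected tokens are returned unchanged, including their original casing, so proper names survive even when the stemmer would otherwise truncate or lowercase them.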
2 votes, 1 answer

English stemming or lemmatization in Lucene.NET without SnowBall Analyzer or a custom analyzer

Is there a non-obsolete Lucene.NET Analyzer that can do English language stemming or lemmatization, or do I need to write a custom Analyzer? I can't seem to find an Analyzer that includes PorterStemFilter or EnglishMinimalStemFilter in the source…
Justin Dearing
2 votes, 1 answer

Lemmatisation of web scraped data

Let's suppose that I have a text document such as the following: document = ' I am a sentence. I am another sentence I am a third sentence.' (or a more complex text example: document = ' Forde Education are looking to recruit a Teacher of…

Outcast
2 votes, 1 answer

What is the correct use of stemDocument?

I have already read this question and this one, but I still don't understand the use of stemDocument in tm_map. Let's follow this example: q17 <- VCorpus(VectorSource(x = c("poder", "pode")), readerControl = list(language = "pt", …
2 votes, 2 answers

Why is stemming important for sentiment analysis?

I am using seven lexicons to calculate sentiment scores on a data set containing forum posts. Apart from removing all noise such as whitespace, special characters, digits and stopwords, why is it also important to stem the words? I am using Harvard.IV,…
Ola