Questions tagged [stemming]

The process for reducing inflected words to their stem.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form.

531 questions
5 votes, 1 answer

Compute word n-grams on original text or after lemma/stemming process?

I'm thinking about using word n-gram techniques on raw text. But I have a doubt: does it make sense to use word n-grams after applying lemmatization/stemming to the text? If not, why should I use word n-grams only on raw files? What are the pros and cons?
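
One way to see the trade-off is to compare the n-grams produced from raw tokens with those produced after normalization. The sketch below is only illustrative: `crude_stem` is a hypothetical stand-in for a real stemmer or lemmatizer (it merely strips a few English suffixes), and the whitespace tokenizer and sample sentence are likewise assumptions.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return the list of word n-grams (as tuples) over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def crude_stem(token):
    """Hypothetical stand-in for a real stemmer: strips a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


text = "the cat jumps and the cats jumped"
raw_tokens = text.lower().split()
stemmed_tokens = [crude_stem(t) for t in raw_tokens]

# Count bigrams before and after normalization.
raw_bigrams = Counter(word_ngrams(raw_tokens, 2))
stemmed_bigrams = Counter(word_ngrams(stemmed_tokens, 2))
```

On the raw tokens every bigram occurs once, while after normalization `("cat", "jump")` occurs twice. The counts become less sparse, at the cost of losing the tense and number distinctions that the raw n-grams preserve.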
5 votes, 1 answer

nltk : How to prevent stemming of proper nouns

I am trying to write a keyword extraction program using Stanford POS taggers and NER. For keyword extraction, I am only interested in proper nouns. Here is the basic approach: clean up the data by removing anything but alphabetic characters; remove…
AbtPst
5 votes, 1 answer

Add new language to postgresql full text search

Is there any way to add new languages to PostgreSQL full text search? Where can I read about it, or where should I start?
littleali
5 votes, 2 answers

Arabic lemmatization and Stanford NLP

I am trying to perform lemmatization, i.e. identifying the lemma and possibly the Arabic root of a verb, for example: يتصل ==> lemma (infinitive of the verb) ==> اتصل ==> root (triliteral root / Jidr thoulathi) ==> و ص ل. Do you think Stanford NLP can do…
5 votes, 6 answers

Python ISRIStemmer for Arabic text

I am running the following code in IDLE (Python). I want to enter an Arabic string and get its stem, but it doesn't work: >>> from nltk.stem.isri import ISRIStemmer >>> st = ISRIStemmer() >>> w= 'حركات' >>> join =…
user2822966
5 votes, 1 answer

Snowball Stemmer Usage

I'd like to use the stemmer here for merging word counts: http://snowball.tartarus.org/download.html The page has a download link, but I'm not sure how to integrate the files into my Eclipse project. It's not just a jar to drop into my lib folder, it's…
LemonMan
5 votes, 3 answers

TreeTagger installation successful but cannot open .par file

Does anyone know how to resolve this file reading error in TreeTagger, a common natural language processing tool used to POS-tag, lemmatize and chunk sentences? alvas@ikoma:~/treetagger$ echo 'Hello world!' | cmd/tree-tagger-english …
alvas
5 votes, 3 answers

Can I perform stemming using regular expressions?

How can I get my regular expression to match against just one condition exactly? For example, I have the following regular expression: (\w+)(?=ly|es|s|y) Matching the expression against the word "glasses" returns: glasse. The correct match should…
Isomorph
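
The lookahead in that pattern lets the greedy `\w+` consume as much of the word as it can, so only the final `s` is left for the suffix and the match is "glasse". A possible fix, sketched below under the assumption that the intended stem of "glasses" is "glass" (the excerpt is cut off before stating it), is to make the stem lazy and anchor the suffix at the end of the word. The names `SUFFIX_PATTERN` and `regex_stem` are hypothetical.

```python
import re

# Lazy stem followed by one of the listed suffixes, anchored to the whole word.
SUFFIX_PATTERN = re.compile(r"^(\w+?)(?:ly|es|s|y)$")


def regex_stem(word):
    """Strip one of the suffixes ly/es/s/y; return the word unchanged if none matches."""
    match = SUFFIX_PATTERN.match(word)
    return match.group(1) if match else word
```

This remains a crude heuristic: `regex_stem("glass")` yields "glas", so an established algorithm such as the Porter or Snowball stemmer is usually the safer choice.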
5 votes, 6 answers

Can you programmatically detect pluralizations of English words, and derive the singular form?

Given some (English) word that we shall assume is a plural, is it possible to derive the singular form? I'd like to avoid lookup/dictionary tables if possible. Some examples: Examples -> Example, a simple 's' suffix; Glitch -> Glitches, 'es'…
Matthew Scharley
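
Without a dictionary, a common approach is an ordered list of suffix rules. The sketch below is a hypothetical, deliberately small rule set (the name `singularize` and the rules themselves are assumptions, not a complete algorithm); irregular plurals such as "geese" cannot be handled this way.

```python
import re

# Ordered (pattern, replacement) rules; the first matching rule wins.
RULES = [
    (re.compile(r"ies$"), "y"),                 # ponies -> pony
    (re.compile(r"(ch|sh|ss|x|z)es$"), r"\1"),  # glitches -> glitch
    (re.compile(r"(?<!s)s$"), ""),              # examples -> example
]


def singularize(word):
    """Apply the first matching suffix rule; return the word unchanged otherwise."""
    for pattern, replacement in RULES:
        if pattern.search(word):
            return pattern.sub(replacement, word)
    return word
```

The rules are order-sensitive and over-apply in places: "movies" becomes "movy", for example, which is the kind of case that pushes real implementations toward exception lists.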
5 votes, 0 answers

Adding language to pystemmer

I would like to use pystemmer with whoosh, but there is no support for my language. I found two snowball files for my language (Snowball), and I made *.c files from them as advised here. Now I would like to include the *.c files in pystemmer. I added…
5 votes, 5 answers

Use multiple stemming languages with ElasticSearch

I'm building a search engine for a website where users can be from many different countries and post text content. I'll assume that: - A French user generates content in French and English - A German user generates content in German and English, etc. What…
Sebastien Lorber
4 votes, 2 answers

Solr Snowball stemmer is inconsistent with Spanish

I have this stemmed field:
Chewie
4 votes, 3 answers

Can WordNetLemmatizer in Nltk stem words?

I want to find word stems with WordNet. Does WordNet have a function for stemming? I use this import for my stemming, but it doesn't work as expected: from nltk.stem.wordnet import WordNetLemmatizer WordNetLemmatizer().lemmatize('Having','v')
Masoud Abasian
4 votes, 2 answers

Is there an implementation of a Croatian word stemming algorithm?

I'm searching for an implementation of a Croatian word stemming algorithm, ideally in Java, but I would also accept any other language. Is there somewhere a community of English-speaking developers who are developing search applications for the…
Chris
4 votes, 2 answers

English lemmatizer databases?

Do you know of a lemmatizer database big enough to return the correct results for the following sample words? geese: goose; plantes: //not found. WordNet's morphological analyzer is not sufficient, since it gives the following incorrect results: geese:…
Ali Shakiba