Questions tagged [stemming]

The process of reducing inflected words to their stem.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form.
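A minimal suffix-stripping sketch of the idea, assuming a short hypothetical suffix list; real stemmers such as Porter's apply many more ordered, conditional rules. The resulting stem need not be a dictionary word.

```python
# Toy suffix-stripping stemmer: an illustration only, NOT the Porter algorithm.
# Hypothetical suffix list, checked longest first.
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


if __name__ == "__main__":
    for w in ["fishing", "fished", "fishes", "fish", "loves"]:
        print(w, "->", toy_stem(w))
```

Here "fishing", "fished", and "fishes" all reduce to "fish", while "loves" becomes "lov", which is not a word but still works as an index key.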

531 questions
0 votes, 1 answer

R text mining: How to perform typical text operations with the tm package on vectors

How can I apply the following standard operations to a character vector? (I need a dictionary for a DTM (classification). To match the text entries, on which these operations have already been performed, I have to transform my dictionary terms…
alex
0 votes, 1 answer

PorterStemmer in Lucene

I am looking for help on how I can use the class PorterStemFilter in Lucene 4.0. Below is my indexer taken from http://www.lucenetutorial.com/lucene-in-5-minutes.html: ... StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40); …
user2161903
0 votes, 1 answer

Find only exact matches of a particular (exception) word

I am looking for a way to configure Solr so that it only finds exact matches for a particular word and works the normal way for other words. One possible way that comes to mind is to configure the stemmer's synonyms list so that this word is mapped to…
axk
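A minimal sketch of the "exception word" idea, assuming the goal is simply to keep one word out of the stemmer. Solr's own mechanism for this is a protected-words list (for example via KeywordMarkerFilterFactory); the toy Python below only models the effect, and the one-rule toy_stem and the PROTECTED set are hypothetical.

```python
# Toy model of protecting an "exception" word from stemming.
# PROTECTED and toy_stem are hypothetical; Solr does this with its own filters.
PROTECTED = {"news"}


def toy_stem(word: str) -> str:
    """One-rule plural stripper: drop a final 's' (but not 'ss')."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def analyze(text: str, protected=frozenset(PROTECTED)) -> list:
    """Lowercase, split on whitespace, and stem every token not in `protected`."""
    return [t if t in protected else toy_stem(t) for t in text.lower().split()]


if __name__ == "__main__":
    print(analyze("News items"))
    print(analyze("News items", protected=frozenset()))
```

Without protection, "news" stems to "new" and would match queries for "new"; exempting it keeps matching for that one word exact while every other word is still stemmed.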
0 votes, 0 answers

Get all word forms used in MySQL full-text search

I am using the full-text search feature of MySQL to search through comments. To use stemming, I am using "form of" in the query. This gives me the correct result, returning all comments containing any word form of the search text. However, I need to…
Gyanendra Singh
0 votes, 0 answers

r - DocumentTermMatrix control parameters

I am trying to build an SVM model on a text corpus. For this, I built a DocumentTermMatrix with the following control parameters: control <- list(stopwords = TRUE, removePunctuation = TRUE, removeNumbers = TRUE, …
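For comparison, the preprocessing options named in the question can be approximated in plain Python. This is a minimal sketch, not the tm package: the tiny STOPWORDS set and the function names are hypothetical, and there is no stemming step, so "hat" and "hats" stay separate columns.

```python
import re
import string
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "is"}  # tiny hypothetical list


def preprocess(text: str, stopwords=True, remove_punctuation=True, remove_numbers=True):
    """Lowercase and tokenize, applying the optional cleanup steps in order."""
    text = text.lower()
    if remove_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    if remove_numbers:
        text = re.sub(r"\d+", "", text)
    tokens = text.split()
    if stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def document_term_matrix(docs, **opts):
    """Return (sorted vocabulary, one row of term counts per document)."""
    counts = [Counter(preprocess(d, **opts)) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


if __name__ == "__main__":
    print(document_term_matrix(["The cat, the hat.", "A cat has 2 hats!"]))
```

Adding a stemmer inside preprocess would merge "hat" and "hats" into one column, which is usually the point of stemming before classification.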
0 votes, 1 answer

Stemming + stop word filtering in Lucene 4.0+

I used to use SnowballAnalyzer to combine custom stop word filtering with basic stemming, but it has been deprecated. For example, in the index config I could easily specify: IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_32, …
abhinavkulkarni
0 votes, 1 answer

Lucene stemmer strategy (does it keep both stemmed and non-stemmed words, or just stemmed ones?)

I have a question regarding the Lucene stemmer. I was wondering whether Lucene keeps both stemmed and non-stemmed words, or just replaces the non-stemmed words with the stemmed ones. For example, if a record contains "everyone loves cats", does it…
Mr.Boy
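A toy model of the two behaviours, assuming a one-rule plural stemmer. As far as we know, Lucene's stem filters rewrite each token in place, so only the stemmed form is indexed unless a filter such as KeywordRepeatFilter is placed before the stemmer to emit the original as well. The Python below is not Lucene's API; keep_original only mimics that effect.

```python
def toy_stem(word: str) -> str:
    """One-rule plural stripper: drop a final 's' (but not 'ss')."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def analyze(text: str, keep_original: bool = False) -> list:
    """Toy analysis chain: lowercase, split on whitespace, then stem.

    With keep_original=True, a token that the stemmer changes is emitted twice,
    unstemmed and stemmed, loosely mimicking a "keyword repeat" step; unchanged
    tokens are emitted once, so no duplicates appear.
    """
    out = []
    for token in text.lower().split():
        stemmed = toy_stem(token)
        if keep_original and stemmed != token:
            out.append(token)
        out.append(stemmed)
    return out


if __name__ == "__main__":
    print(analyze("everyone loves cats"))
    print(analyze("everyone loves cats", keep_original=True))
```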
0 votes, 1 answer

SOLR stemming and stopwords

In the SOLR 3.5 text field type, the StopFilterFactory is listed before the PorterStemFilterFactory. Does this mean that if I wanted to stop, for example, "game" and "games", I would have to add both to the stopwords list? If so, would moving the StopFilterFactory…
dice
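The effect of filter order can be seen with a toy chain, assuming a one-rule plural stemmer and a stopword list containing only "game". When stopping runs first, "games" is not in the list and survives to be stemmed into "game"; when stemming runs first, both forms collapse to "game" and are removed together, though the stopword list must then hold stemmed forms. The function names are hypothetical.

```python
# Toy comparison of filter order; STOPWORDS and toy_stem are hypothetical.
STOPWORDS = {"game"}


def toy_stem(word: str) -> str:
    """One-rule plural stripper: drop a final 's' (but not 'ss')."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stop_then_stem(tokens):
    """Stopword filter runs first, so only exact stopword forms are removed."""
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


def stem_then_stop(tokens):
    """Stemmer runs first, so every form that stems to a stopword is removed."""
    return [s for s in map(toy_stem, tokens) if s not in STOPWORDS]


if __name__ == "__main__":
    tokens = ["game", "games", "rules"]
    print(stop_then_stem(tokens))
    print(stem_then_stop(tokens))
```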
0 votes, 0 answers

Code works in VS2010 but not in VC++ 6.0

I'm working on a project in which I'm using a stemming library that works perfectly in Visual Studio 2010 (Express), but when I tried to compile the same project in VC++ 6.0, it generated errors. I fixed a few of them, but I'm stuck on some…
0 votes, 0 answers

Customizing the output of Stemming

I've been using Snowball Porter2 for stemming, but I don't want the root form as output. For example, Porter2 produces "Emergenc" after stemming "Emergencies"; I want "Emergency" instead. Could someone please point me in the right direction to achieve this? The…
nexuscreator
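A common workaround, sketched here under the assumption that the original (unstemmed) text is still available, is to record for each stem the surface form seen most often in the corpus and display that instead. toy_stem is a hypothetical stand-in for Porter2 that happens to map "emergency" and "emergencies" to the same stem.

```python
from collections import Counter, defaultdict


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for Porter2: 'ies'/'y' -> 'i', then plural 's'."""
    w = word.lower()
    if w.endswith("ies"):
        return w[:-3] + "i"
    if w.endswith("y"):
        return w[:-1] + "i"
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def build_unstem_map(tokens):
    """Map each stem to the lowercased surface form seen most often for it."""
    forms_by_stem = defaultdict(Counter)
    for tok in tokens:
        forms_by_stem[toy_stem(tok)][tok.lower()] += 1
    return {stem: forms.most_common(1)[0][0] for stem, forms in forms_by_stem.items()}


if __name__ == "__main__":
    corpus = ["Emergency", "emergencies", "emergency", "room", "rooms", "room"]
    unstem = build_unstem_map(corpus)
    print(unstem[toy_stem("Emergencies")])
```

The map is only as good as the corpus it is built from: a stem whose most frequent form is a plural will be displayed as that plural.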
0 votes, 2 answers

lexical-level similarity word clustering tool

Is there any open-source software toolkit that compares lexical-level similarities among words and groups similar words together? For example, "Blue jean", "Blue jeans", and "blue jea" (misspelled) should be grouped together. I don't need to look for…
walkman
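As one possible starting point, Python's standard difflib module scores string similarity with a Ratcliff/Obershelp-style matcher; the greedy grouping below compares each word to the first member of each existing group. The 0.8 threshold and the function names are assumptions to tune, and edit-distance measures are a common alternative.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_similar(words, threshold: float = 0.8):
    """Greedily put each word into the first group whose first member is similar enough."""
    groups = []
    for w in words:
        for group in groups:
            if similarity(w, group[0]) >= threshold:
                group.append(w)
                break
        else:
            groups.append([w])
    return groups


if __name__ == "__main__":
    print(group_similar(["Blue jean", "Blue jeans", "blue jea", "red shirt"]))
```

Greedy grouping depends on input order; for large vocabularies a proper clustering step over pairwise scores would be more stable.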
0 votes, 1 answer

Is there a port for KStem for .NET?

I'm about to launch into a Lucene.NET implementation and I am concerned about using the PorterStemFilter. Reading here, and reading source code, it appears to be far, far too aggressive for my needs. I need something simpler that doesn't look for…
Kevin
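For something much lighter than Porter, Harman's "S" stemmer only conflates plural forms, using three ordered rules of which the first matching one is applied. The sketch below is a plain Python rendering of those rules, not KStem and not a .NET port; it shows how little such a stemmer touches.

```python
def s_stem(word: str) -> str:
    """Harman-style "S" stemmer: apply only the first matching plural rule."""
    w = word.lower()
    # Rule 1: "-ies" -> "-y", unless the word ends in "-eies" or "-aies".
    if w.endswith("ies") and not w.endswith(("eies", "aies")):
        return w[:-3] + "y"
    # Rule 2: "-es" -> "-e", unless the word ends in "-aes", "-ees" or "-oes".
    if w.endswith("es") and not w.endswith(("aes", "ees", "oes")):
        return w[:-1]
    # Rule 3: drop a final "s", unless the word ends in "-us" or "-ss".
    if w.endswith("s") and not w.endswith(("us", "ss")):
        return w[:-1]
    return w


if __name__ == "__main__":
    for w in ["emergencies", "cats", "horses", "status", "class"]:
        print(w, "->", s_stem(w))
```

It is deliberately crude ("glasses" becomes "glasse"), but it never strips derivational suffixes, which is the kind of aggressiveness the question wants to avoid.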
0 votes, 0 answers

Getting the root of an Arabic word

I have Python code that takes an Arabic word, gets its root, and also removes diacritics, but I have a problem with the output. For example, when the input is "العربيه" the output is "عرب", but when the input is "كاتب" the output is "ب", and when…
0 votes, 2 answers

Using stemming in a SOLR query

I've set up SOLR and added a document to the example 'collection1'. 3007WFP Fishing Ladies. I can query it OK in the interface using name:*fishing*, but I…
finoutlook
0 votes, 1 answer

Searching in Lucene.Net

I have used Lucene.Net for indexing, with StandardAnalyzer at indexing time. Now I want to search for, say, 'attach'. The document contains 'attached'. How do I get a successful hit for the word 'attach'? Please help me as soon as possible.
Ashish
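StandardAnalyzer does not stem, so "attached" is indexed as-is and a query for "attach" finds nothing. The usual fix is to apply the same stemming analyzer at both index time and query time. The toy inverted index below shows why, assuming a hypothetical suffix-stripping toy_stem in place of a real Lucene.Net stemmer.

```python
from collections import defaultdict


def toy_stem(word: str) -> str:
    """Hypothetical suffix stripper standing in for a real stemmer."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def build_index(docs, stem=True):
    """Inverted index: term -> set of document ids (stemmed if `stem`)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[toy_stem(token) if stem else token].add(doc_id)
    return index


def search(index, query, stem=True):
    """Look up a single-term query, analyzed the same way as the index."""
    term = query.lower()
    return index.get(toy_stem(term) if stem else term, set())


if __name__ == "__main__":
    docs = {1: "file attached below"}
    print(search(build_index(docs), "attach"))
    print(search(build_index(docs, stem=False), "attach", stem=False))
```

Stemming only one side (index or query) breaks matching in the other direction, which is why the analyzer must be shared.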