Questions tagged [snowball]

Snowball is a small language for writing stemming algorithms, used primarily in information retrieval and natural language processing.

Created by Dr. Martin Porter, Snowball is a small string-processing language designed for writing stemming algorithms for use in information retrieval. It was created partly to provide a canonical implementation of Porter's stemming algorithm, and partly to make it easier to create stemmers for languages other than English.

A further aim of Porter's was to provide a way of defining stemmers that could be translated readily, or automatically, into C, Java, or other programming languages. The Snowball compiler translates a Snowball script (a .sbl file) into either a thread-safe ANSI C program or a Java program. For ANSI C, each Snowball script produces a program file and a corresponding header file (with .c and .h extensions, respectively).

The name "Snowball" is a tribute to the SNOBOL programming language.

73 questions
0
votes
1 answer

Does stemDocument in R remove "ed" ending?

Below is how I stem my Corpus and my documents. However, for example "work" and "worked" show up a large amount of the time and these are obviously the same word for all intents and purposes in my analysis. Is there a package or some code snippet to…
agunner
0
votes
0 answers

How do I instantiate a class from this JAR file using rJava?

I was working on a text-analysis project in R and needed a stemmer. I found a JAR file containing a Java build of a Snowball stemmer. Documentation: http://lucene.apache.org/core/3_0_3/api/contrib-snowball/ Download link (bottom…
dimpol
0
votes
2 answers

Elasticsearch how to configure language analyzer (German) or build a custom normalizer

I am using the German language analyzer to tokenize some content. I know that it is basically a macro filter for "lowercase", "german_stop", "german_keywords", "german_normalization", "german_stemmer". My problem has to do with the normalization…
Tom
0
votes
1 answer

Snowball stemmer is not working

I created an index for an attachment using Elasticsearch 2.3.3 and NEST 2.3.2. My indexing is given below. I am searching for singular words with plurals in the document. I read that the Snowball stemmer will do this type of conversion. But no records were…
Ajoe
0
votes
1 answer

How to make snowball greedy between two matches?

I have 2 routines that should be completely parallel. I want Snowball to execute them and choose the one with the longest match. Currently, I run them using or. That means execute the first, if fails execute the second. I thought of perform a…
Assem
0
votes
1 answer

Weka Snowball not working

I'm trying to create an Italian text classifier with Weka using Weka's StringToWordVector to create the features. The classifier works fine, but I set a stemmer as an option of the filter and it doesn't work. This is my code: SnowballStemmer sb=new…
AlbertoD
0
votes
1 answer

tm and Snowball package commands slow in Linux

I am using tm and Snowball packages in R for text mining. I initially ran it on my laptop that has Windows 7 with 8 GB memory. Later I tried the same on a Linux (Ubuntu) machine with 64 GB of memory. Both of these machines are 64 bit and am using 64…
Ravi
0
votes
2 answers

Lucene using Snowball and SpellChecker brings back strange values

I am trying to get SpellChecker set up using Lucene.NET. It all works fine other than in situations similar to the following: I have text containing satellite in the index, I analyze it using Snowball. I then create a SpellChecker index and get…
John_
0
votes
1 answer

SOLR Snowball Porter for Arabic

Is there a Snowball Porter Filter or any similar filter for Arabic? I need it to normalize plural words into singular words for the Arabic language.
Jad Joubran
0
votes
0 answers

R unable to load package Snowball, rJava

I am trying to get the R package "lsa" running, which in turn requires Snowball, which in turn fails. I'm running OpenSUSE 12.2 with the latest R-patched build (currently 3.01). Here's the thing: the libraries load no problem if I do "sudo R" but if…
WorldsEndless
0
votes
1 answer

ElasticSearch stemming with protected words

I'm using ElasticSearch (via Ruby, Tire) for a search feature on an ecommerce clothing website. I need a stemming filter, BUT I also need to be able to specify a list of protected words which do not get stemmed. Currently I'm using the snowball…
awhitworth
0
votes
1 answer

Solr SnowballPorterFilterFactory for index and query analyzers

I use SnowballPorterFilterFactory for index and query analyzers. When I search for the word "profession", Solr successfully finds only articles that contain "profession", but I want "professional", "professionalism" ... This is the current…
ZendMind
-1
votes
2 answers

libstemmer sphinx does not work

I have Sphinx installed on my Vagrant machine with CentOS 6 and I'm trying to install the Dutch libstemmer from Snowball. The installation was executed successfully but the tests go wrong. I have created 2 indexes with exactly the same data. My…