Questions tagged [snowball]

Snowball is a small language for writing stemming algorithms, used primarily in information retrieval and natural language processing.

Snowball was created by Dr. Martin Porter as a small string-processing language for writing stemming algorithms used in information retrieval. It was created partly to provide a canonical implementation of Porter's stemming algorithm, and partly to make it easier to write stemmers for languages other than English.

A further aim of Porter's was to provide a way of defining stemmers that could be translated, readily or automatically, into C, Java, or other programming languages. The Snowball compiler translates a Snowball script (a .sbl file) into either a thread-safe ANSI C program or a Java program. For ANSI C, each Snowball script produces a program file and a corresponding header file (with .c and .h extensions).
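For readers new to the idea, here is a minimal Python sketch of what a suffix-stripping stemmer does. It is only an illustration under stated assumptions: the suffix list and the names `SUFFIXES`, `toy_stem`, and `min_stem` are made up for this example, and the logic is far simpler than Porter's algorithm or any stemmer generated by the Snowball compiler.

```python
# Toy suffix stripper. NOT Porter's algorithm and NOT Snowball output;
# the rule list below is a hypothetical one chosen for illustration.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        # Only strip when enough of the word remains to be a plausible stem.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem:
            return lowered[: -len(suffix)]
    return lowered


if __name__ == "__main__":
    for w in ("testing", "tested", "tests", "test"):
        print(w, "->", toy_stem(w))
```

Real stemmers add many more rules and conditions, which is the kind of logic a Snowball script is meant to express compactly.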

The name "Snowball" is a tribute to the SNOBOL programming language.

73 questions
1 vote, 1 answer

Passing value in column as parameter in apply with nltk snowball stemmer

Passing df[language] works for stopwords but not for snowball stemmer. Is there a way I can get around that? I haven't really found any clues so far... import nltk from nltk.corpus import stopwords import pandas as pd import re df =…
1 vote, 2 answers

Problems with stemming in text analysis (Swedish data)

In the following code, my aim is to reduce the number of words with the same stem. For example, kompis in Swedish means "friend" in English, and the words with similar roots are kompisar,…
1 vote, 0 answers

Reverse of Stemming

Is there any way in R to reverse the process of stemming? I have some Russian keywords. I want to find out all the possible roots of the words. library(SnowballC) wordStem('выявлениа', language = "ru") wordStem('выявления', language = "ru") It returns…
john
1 vote, 1 answer

Snowball Stemmer: poor French stemming

I'm working on some NLP tasks. My inputs are French text, so the Snowball stemmer is the only one usable in my context. Unfortunately, it keeps giving me poor stems: it doesn't even remove a plural "s" or a silent "e". Below is an example: from…
Neroksi
1 vote, 1 answer

Why am I missing the last letter in a term-document matrix?

I am new to R and I'm trying to create a term-document matrix from a CSV file. But the results show that some of the words are missing the letter "e" at the end. How can I make the term-document matrix show the full words? It will be great if you…
Amelia
1 vote, 1 answer

Defining a list of strings using Snowball

How can I define a list of strings in Snowball? I have tried to do it like this: define patterns ( '{m}{f}{i}{l}' or '{f}{a}{i}{l}' or ....... ) How do I get the list length? How do I handle each pattern?
1 vote, 1 answer

Can I do this Python code with Snowball?

The word length is 5. I want to delete the letter at position 0 and the letter at position 3. In Python it looks like this: word = word[1:3] + word[4] #this is with python The question is: how can I do it with Snowball?
1 vote, 2 answers

KeyError: "Stemming algorithm not found" using Snowballstemmer for Arabic

I installed this stemmer for the Arabic language here. I was running it with this code: from snowballstemmer import stemmer ar_stemmer = stemmer("arabic") ar_stemmer.stemWord(u"فسميتموها") And when I run it, I get this: Traceback (most recent call…
YayaYaya
1 vote, 1 answer

How to use Snowball's Catalan stemmer?

I want to use the Catalan stemmer provided here: http://snowball.tartarus.org/algorithms/catalan/stemmer.html However, when I do: from nltk.stem.snowball import SnowballStemmer stemmer = SnowballStemmer("catalan") it says: the language…
woohooo
1 vote, 1 answer

Porter Stemmer and Weka

I am using Weka with the Porter stemmer provided in the Snowball package. Everything works fine if I run my application within Eclipse, but as soon as I export it as a runnable jar (with all the libraries included), Weka says: Stemmer 'porter'…
mariosangiorgio
1 vote, 2 answers

Stemming words in R does not work as expected

I am trying to do some very simple word stemming in R and am getting something very unexpected. In the code below, the 'complete' variable is 'NA'. Why can't I complete the stem of the word easy? library(tm) library(SnowballC) dict <- c("easy") stem <-…
user2630162
1 vote, 1 answer

Custom analyzer in Elasticsearch: soundex plus snowball

The following works for me (searching for 'testing' also returns fields with 'test'): index : analysis : analyzer : default : type : snowball language : english when set up in my elasticsearch.yml file. I want to…
Ben Dubuisson
1 vote, 1 answer

Add elision filter to snowball

At first, I was using the "language analyzer" and everything seemed to work very well, until I realized that "a" is not in the list of French stopwords. So I decided to test with snowball. It also seemed to work well, but in this…
user2016483
1 vote, 2 answers

Failed with error: ‘package ‘sentiment’ was built before R 3.0.0: please re-install it’

I am trying to run the snaMIC.R script, which does sentiment analysis on Twitter data. It fails with an error saying package sentiment was built before R 3.0.0: please re-install. I am using R-3.1.0 i386 (32-bit Windows). Another thing that…
somnathchakrabarti
1 vote, 1 answer

Word does not get analysed properly using StemmerOverrideFilterFactory and SnowballPorterFilterFactory for the Dutch language

Solr: 3.5 Hi, I created a Dutch field type according to the following fieldType definition:
kdvr