Questions tagged [snowball]

Snowball is a small language for writing stemming algorithms, used primarily in information retrieval and natural language processing.

Created by Dr. Martin Porter, Snowball is a small string processing language designed for writing stemming algorithms for use in Information Retrieval. It was developed partly to provide a canonical implementation of Porter's stemming algorithm, and partly to make it easier to create stemmers for languages other than English.
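As a rough illustration of what a stemming algorithm does, here is a minimal sketch in plain Python. It is not Porter's algorithm and not Snowball code: the names `toy_stem`, `SUFFIXES`, and `VOWELS` are hypothetical, and the single rule used (strip "ing" or "ed" only when the remaining stem still contains a vowel) is a simplification chosen for the example.

```python
# Toy suffix stripper, for illustration only (not Porter's algorithm).
# Assumption: 'y' is treated as a vowel to keep the rule simple.
VOWELS = set("aeiouy")

# Hypothetical suffix list for this sketch; real stemmers use many more rules.
SUFFIXES = ("ing", "ed")


def toy_stem(word: str) -> str:
    """Strip one suffix if the remaining stem still contains a vowel."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            # Guard against over-stemming short words such as "sing" or "bed".
            if any(ch in VOWELS for ch in stem):
                return stem
    return word


print(toy_stem("going"))    # go
print(toy_stem("waiting"))  # wait
print(toy_stem("sing"))     # sing (stem "s" has no vowel, so nothing is stripped)
```

A real stemmer applies many ordered rules with more careful conditions; the vowel check here only hints at why such conditions are needed.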

A further aim of Porter's was to provide a way of creating and defining stemmers that could readily or automatically be translated into C, Java, or other programming languages. The Snowball compiler translates a Snowball script (a .sbl file) into either a thread-safe ANSI C program or a Java program. For ANSI C, each Snowball script produces a program file and corresponding header file (with .c and .h extensions).

The name "Snowball" is a tribute to the SNOBOL programming language.

73 questions
1
vote
1 answer

ElasticSearch: snowball not working?

I build the following: curl -XDELETE "http://localhost:9200/testindex" curl -XPOST "http://localhost:9200/testindex" -d' { "mappings" : { "article" : { "dynamic" : false, "properties" : { "text" : { "type"…
Dave Carpeneto
  • 1,042
  • 2
  • 12
  • 23
1
vote
1 answer

IncompatibleClassChangeError using Snowball Stemmer

I've been stuck on this problem for 3 days and I can't find any solution. I'm developing a DM application with NetBeans 7.3 using the Weka developer edition (3.7.10). I'm trying to use the Snowball stemmer and I keep getting the same exception while I…
fran
  • 145
  • 1
  • 11
1
vote
1 answer

How can I add english to SnowballStemmer inside NLTK?

I have installed all the available packages from the nltk.download() interface, but SnowballStemmer still lacks English when I print all the available languages. How can I add English to this stemmer in NLTK?
erogol
  • 13,156
  • 33
  • 101
  • 155
1
vote
4 answers

Weka and Snowball don't work when exported in JAR

This problem is really driving me crazy. TO ANSWER MOST OF WHAT PEOPLE THINK: YES I ADDED snowball.jar TO THE CLASSPATH. I have a simple main class that is supposed to stem the word "going" to "go": import weka.core.stemmers.SnowballStemmer; public…
TeFa
  • 974
  • 4
  • 15
  • 37
1
vote
1 answer

Can "Java-only" analyzer be used under Lucene.Net?

I thought the answer was "No". But I saw some interesting words from Microsoft MVP Simone Chiaretta: Directory The index structure is compatible with all ports of Lucene, so you could also have the indexing done with .NET and searched with Java, or …
Dmitry Isaev
  • 3,888
  • 2
  • 37
  • 49
0
votes
0 answers

stringdef doesn't expand for č and unicode input

I've started to play with Snowball. Here's the basic code I'm using, and I'm stuck with the non-ASCII letter č. From what I see in the produced files, it has no special handling. The .sbl in a nutshell is: externals ( stem ) stringdef cv '{U+010D}' //…
aikipooh
  • 137
  • 1
  • 19
0
votes
0 answers

How to use snowball (.sbl) file in Python

I am a first-year Master's student currently learning NLP with Python. My professor has assigned me a mini project that involves implementing the Porter Stemmer in the Snowball language. However, I am a bit confused and would appreciate some assistance from…
Fatihi Youssef
  • 411
  • 1
  • 6
  • 15
0
votes
3 answers

Remove common english words strategy

I want to extract relevant keywords from an HTML page. I have already stripped all the HTML, split the text into words, used a stemmer, and removed all words that appear in a stop word list from Lucene. But now I still have a lot of basic verbs and pronouns…
Franz Kafka
  • 10,623
  • 20
  • 93
  • 149
0
votes
1 answer

How to fix errors during the building snowball tool from sources?

I've downloaded the sources of https://github.com/snowballstem/snowball and tried to build them on my machine following the official guide. Unfortunately, I got an error: make: *** No rule to make target 'install'. Stop. Steps to reproduce: $ git clone…
0
votes
0 answers

How does full-text search snowball algorithm interpret words of an unspecified language

I built a full-text search index with SQLite and don't understand what is going on internally when I'm scanning documents that contain several languages. For example, I describe a programming topic I'm learning in Russian and add into the description code…
kvdm.dev
  • 141
  • 1
  • 1
  • 12
0
votes
0 answers

Applying Snowballstemmer to a Pandas dataframe for each word

I want to apply stemming using Snowballstemmer to a column (unstemmed) of a dataframe in order to use a classification algorithm. My code looks like the following: df = pd.read_excel(...) df["content"] = df['column2'].str.lower() stopword_list…
Mathy
  • 21
  • 3
0
votes
1 answer

Not getting the right text after stemming in text analysis (Swedish)

I am having a problem getting the right text after stemming in R. E.g. 'papper' should show as 'papper' but instead shows up as 'papp', and 'projekt' becomes 'projek'. The frequency cloud generated thus shows these shortened versions which loses the…
Dejie
  • 95
  • 7
0
votes
1 answer

Snowball Edge - aws-sdk-go package in Golang - Can't Connect to S3

I'm using the aws-sdk-go package in Golang to connect to Amazon S3 to provide a cloud-based storage pool. I have this working well. I would like to be able to support bulk high-speed transfers using Snowball, so I got a Snowball Edge to test this in…
Keith Hogan
  • 117
  • 1
  • 9
0
votes
0 answers

stemming of set of tokens using r

I have tried using the SnowballC stemmer for stemming, but it produces different output for the same queries: wordStem("waiting",language = "porter") ## [1] wait The above word is correctly stemmed, but whenever I give a set of tokens as an input c("htc",…
p.k
  • 37
  • 3
0
votes
1 answer

Stemmer function in R Slow

I am trying to run a stemmer function on a dataset (loaded through the data.table package) in R of around 40000 rows, but it's taking forever to run. My code looks like this: data[, Description := map(Description, function(k) stemmer(k))] If manually stop…