Questions tagged [stemming]

The process for reducing inflected words to their stem.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form.
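As a rough illustration of the idea, here is a minimal sketch of suffix stripping. It is a toy, not the Porter or Snowball algorithm; the `SUFFIXES` rule set and the `toy_stem` name are made up for this example.

```python
# Toy suffix stripper for illustration only; not the Porter or Snowball algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, deliberately tiny rule set


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


# Inflected forms of the same word collapse to one stem.
stems = [toy_stem(w) for w in ["jumping", "jumped", "jumps", "jump"]]
print(stems)
```

Real stemmers apply many more rules and exceptions; this only shows why different inflections end up sharing one index term.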

531 questions
-1
votes
2 answers

Stemming and lemmatizing words

I have a text document that I need to stem and lemmatize. I have already cleaned and tokenised the data and removed stop words. What I need to do is take the list as input and return a dict, and the dict should have the keys…
Retsukki
  • 3
  • 2
-1
votes
1 answer

How to remove unnecessary words from string for better search

I have different strings for searching related data, but because of unnecessary words the retrieved results are poor. For example, in "Working of genetic algorithm", the words "working of" are not important. I can remove "of" by considering…
badar
  • 55
  • 5
-1
votes
1 answer

How to modify word in a for loop in python

I'm trying to stem some text in Python with SnowballStemmer, but it won't work. Here is the code: import nltk from nltk import SnowballStemmer stem = SnowballStemmer("spanish") def limpiar (texto): texto = texto.split() stemm =…
jimbeam
  • 3
  • 1
-1
votes
1 answer

error: object Stemmer is not a member of package org.apache.spark.mllib.feature

Importing the package org.apache.spark.mllib.feature.Stemmer in Spark-shell using Scala returns the following error: :47: error: object Stemmer is not a member of package org.apache.spark.mllib.feature import…
k_bm
  • 81
  • 1
  • 10
-1
votes
1 answer

How do I extract values from a string and use those values in a database query?

I'm trying to extract custom entities from a sentence/question and query them against a database. The problem is that I'm having trouble extracting the entities. My table has tens of thousands of rows and looks like this: Car type |…
crossemup
  • 351
  • 1
  • 3
  • 9
-1
votes
1 answer

TypeError: translate() takes exactly 1 argument (2 given) Python

I found this Python code to perform stemming on text files. import nltk import string from collections import Counter def get_tokens(): with open('/Users/MYUSERNAME/Desktop/Test_sp500/A_09.txt', 'r') as shakes: text = shakes.read() …
paschy96
  • 1
  • 2
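This error typically shows up when Python 2 code that passes two arguments to `translate` runs on Python 3, where `str.translate` accepts only a translation table. The excerpt above is truncated, so the following is a sketch under the assumption that the goal is to delete punctuation before stemming; `PUNCT_TABLE` and `strip_punctuation` are hypothetical names.

```python
import string

# Table that maps every ASCII punctuation character to None, i.e. deletes it.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text: str) -> str:
    """Lowercase the text and delete punctuation using a single-argument translate."""
    return text.lower().translate(PUNCT_TABLE)


cleaned = strip_punctuation("Hello, World! It's stemming-time.")
print(cleaned)
```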
-1
votes
1 answer

Python NLTK: search for occurrence of a word

I use the Brown corpus "brown.words()", which gives me a list of 1161192 words. Now I want to find any occurrence of the word "have", so whenever the corpus contains "has", "had", "haven't", etc., I want to do something (could be pushing them…
-1
votes
1 answer

Can Azure Search Service be used to generate all the query tokens?

Is it possible to get all the tokens for a particular query through the Azure Search API without linking the actual DB source? I want operations like stemming, removing stop words, etc. performed on the query entered by the user and then pass…
Amit
  • 194
  • 4
  • 20
-1
votes
1 answer

Ascii codec can't decode byte 0xc2 python nltk

I have code that I'm using for spam classification and it works great, but every time I try to stem/lemmatize a word I get this error: File "/Users/Ramit/Desktop/Bayes1/src/filter.py", line 16, in trim_word word = ps.stem(word) File…
Ramit Sawhney
  • 31
  • 1
  • 9
-1
votes
2 answers

Value NULL after program compile in browser

I'm making an application to remove affixes, commonly called confix-stripping stemming. I want a loop that stems each text file, so I've put the stemming process inside the loop, making the process the content of each document…
-1
votes
2 answers

Python 2 : AttributeError: 'list' object has no attribute 'split'

This is my LSA program. In this function I want to tokenize all my text and then stem it. I'm trying to integrate the stemming program and I get this: for word in titles.split(" "): AttributeError: 'list' object has no…
YayaYaya
  • 125
  • 2
  • 3
  • 10
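The traceback suggests that `titles` is a list of strings, not a single string, so `.split` has to be called on each element. A minimal sketch, assuming that is the case; the sample `titles` values below are invented for illustration.

```python
# Hypothetical sample data; the original question's titles are not shown.
titles = ["Stemming words in Python", "Latent semantic analysis basics"]

# A list has no .split(); split each string inside the list instead.
tokens_per_title = [title.lower().split() for title in titles]
print(tokens_per_title)
```

Each inner list can then be passed token by token to whatever stemmer the program uses.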
-1
votes
2 answers

libstemmer sphinx does not work

I have Sphinx installed on my Vagrant machine with CentOS 6 and I'm trying to install the Dutch libstemmer from Snowball. The installation completed successfully but the tests go wrong. I have created 2 indexes with exactly the same data. My…
-1
votes
1 answer

Indonesian Stemmer Using Lucene

Here is a class from the Lucene library that I want to make use of, but I don't know how to use that library in Java. Example: I have a string array: menjadikan, menjawab, penerbangan. Can you help me in Java with creating such…
Lita
  • 175
  • 1
  • 2
  • 11
-1
votes
1 answer

How to store and get back stemmed text in solr?

Usually we store the original text and index the stemmed text and/or the original text in Solr. Is it possible to store the stemmed content in Solr? My goal is to obtain the stemmed version of the text when I do a query to…
-1
votes
1 answer

Is there any third party tool available to perform stemming in python

I am using the Python NLTK library to perform stemming on a big corpus. I am doing the following: text = [porter.stem(token) for token in text.split()] text = ' '.join(text), where "text" represents one row of my file. I have millions of rows in my file, and…
Sangeeta
  • 589
  • 1
  • 7
  • 26
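The question is truncated, so this is only a guess at the bottleneck: if the same tokens recur across millions of rows, memoizing the stemmer may help independently of which library is used. This is a caching sketch, not a third-party tool; `stem_word` is a hypothetical stand-in for `porter.stem`, and `cached_stem` and `stem_row` are made-up names.

```python
from functools import lru_cache


def stem_word(token: str) -> str:
    # Hypothetical stand-in for porter.stem(token); replace with the real stemmer.
    return token.lower().rstrip("s")


@lru_cache(maxsize=None)
def cached_stem(token: str) -> str:
    """Stem each distinct token once and reuse the result afterwards."""
    return stem_word(token)


def stem_row(text: str) -> str:
    """Stem one row of the file, token by token, through the cache."""
    return " ".join(cached_stem(token) for token in text.split())


rows = ["cats chase cats", "dogs chase cats"]
stemmed = [stem_row(row) for row in rows]
info = cached_stem.cache_info()
print(stemmed, info)
```

With an unbounded cache, memory grows with the vocabulary size, so a finite `maxsize` may be preferable for very large corpora.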