Highest Voted 'stemming' Questions

2 votes, 1 answer

Removing hyphens in http but preserving hyphenated words in corpus

I am trying to modify a stemming function so that it 1) removes hyphens in http (as it appears in the corpus) while 2) preserving hyphens in meaningful hyphenated expressions (e.g., time-consuming, cost-prohibitive,…

asked Oct 05 '18 at 11:15

Chris T.

1,699
7
23
45

2 votes, 0 answers

Stemming French text with NLTK

I'm trying to stem a French text with NLTK: europe amérique nord fruits espèce fraisier bois petite taille connus depuis antiquité romains consommaient utilisaient produits cosmétiques raison odeur agréable cultivée jardins européens fraisier…

python python-3.x nltk stemming

asked Jul 26 '18 at 16:37

marin

923
2
18
26

2 votes, 2 answers

Remove punctuation but keep hyphenated phrases in R text cleaning

Is there an effective way to remove punctuation from text while keeping hyphenated expressions, such as "accident-prone"? I used the following function to clean my text: clean.text = function(x) { # remove rt x = gsub("rt ", "", x) # remove at x…

r regex stemming punctuation hyphenation

asked Mar 05 '18 at 16:33

Chris T.

1,699
7
23
45
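
For the punctuation question above, one possible approach, sketched here in Python rather than R, is a single regular expression that deletes every punctuation character except a hyphen sitting between two word characters. The function name and the sample string are illustrative and do not come from the original post.

```python
import re

# Matches any character that is not a word character, whitespace, or hyphen,
# plus any hyphen that is not flanked by word characters on both sides.
_PUNCT = re.compile(r"[^\w\s-]|(?<!\w)-|-(?!\w)")


def strip_punct_keep_hyphenated(text):
    """Remove punctuation but keep hyphens inside words (hypothetical helper)."""
    return _PUNCT.sub("", text)


print(strip_punct_keep_hyphenated("accident-prone, really!"))
# accident-prone really
```

Note that `\w` also treats the underscore as a word character, so underscores survive; whitespace left behind by removed symbols is not collapsed.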

2 votes, 1 answer

How to apply a custom stemmer before passing the training corpus to TfidfVectorizer in sklearn?

Here is my code. I have a sentence that I want to tokenize and stem before passing it to TfidfVectorizer, to finally get a tf-idf representation of the sentence: from sklearn.feature_extraction.text import TfidfVectorizer import nltk from…

python scikit-learn stemming document-classification tfidfvectorizer

asked Feb 22 '18 at 11:00

Eugenio

3,195
5
33
49

2 votes, 1 answer

How to use a new .sbl Snowball algorithm in Python?

I want to use a Lithuanian stemmer in Python, but common tools like NLTK do not support Lithuanian. I did find Snowball .sbl files for Lithuanian stemmers here and here, but how do I use them in Python? What I was able…

python stemming snowball

asked Feb 10 '18 at 08:39

Lukas

160
2
8

2 votes, 2 answers

Get the word from stem (stemming)

I am using the Porter stemmer as follows to get the stems of my words: from nltk.stem.porter import PorterStemmer stemmer = PorterStemmer() def stem_tokens(tokens, stemmer): stemmed = [] for item in tokens: …

nlp nltk text-mining stemming

asked Dec 09 '17 at 08:39

user8871463
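
Stemming is not reversible, so a common workaround for the question above is to record, while stemming, which original words produced each stem. The sketch below does that in plain Python. `toy_stem` is a made-up stand-in so the example runs without NLTK, and `build_stem_index` is a hypothetical helper name.

```python
from collections import defaultdict


def toy_stem(word):
    """Crude suffix stripper used only for this demo (hypothetical, not Porter)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_stem_index(tokens, stem=toy_stem):
    """Map each stem to the distinct surface forms that produced it."""
    index = defaultdict(list)
    for token in tokens:
        forms = index[stem(token)]
        if token not in forms:
            forms.append(token)
    return dict(index)


index = build_stem_index(["connect", "connected", "connecting", "connects", "runs"])
print(index["connect"])
# ['connect', 'connected', 'connecting', 'connects']
```

With NLTK installed, the `stem` method of the `PorterStemmer` instance from the question could be passed as the `stem` argument instead of `toy_stem`.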

2 votes, 2 answers

Russian Porter stemming in JavaScript

Does someone have an example of Russian Porter stemming in JavaScript?

javascript stemming

asked Jan 08 '11 at 10:39

Semen

41
1

2 votes, 0 answers

R language - stem completion in Italian

I have a large corpus of Italian text to analyze in R. Almost all of the preprocessing is easy to adapt to my native language with a couple of default libraries. The problem is that I can't find a way to implement a…

r stemming

asked Aug 29 '17 at 22:53

Cristiano

21
2

2 votes, 3 answers

Ruby: is there a stemmer that "knows" English irregular verbs?

There is a Ruby stemmer, https://github.com/aurelian/ruby-stemmer, but it 1) does not stem English irregular verbs and 2) fails to build native extensions on Windows. Is there an alternative that fixes at least one of these problems?

ruby nlp stemming

asked Dec 21 '10 at 16:22

Alexey

9,197
5
64
76

2 votes, 2 answers

MarkLogic generic language support

As per the documentation: The generic language support only stems words to themselves, queries in these languages will not include variations of words based on their meanings in the results. xdmp:document-insert("/lang/test.xml",

indexing marklogic stemming

asked Jul 05 '17 at 14:44

Yash

510
2
6
14

2 votes, 1 answer

Word stemming in R

I am working on a text mining project and trying to clean the text: words in singular/plural forms, verbs in different tenses, and misspelled words. My sample looks like this: test <-…

r text-mining stemming

asked Mar 20 '17 at 16:17

Ran Tao

311
1
4
13

2 votes, 0 answers

Avoiding specific words in word stemming with tm package

A previous post addressed this issue here: Text-mining with the tm-package - word stemming. However, I am still running into challenges with the tm package. My goal is to stem a large corpus of words, but I wish to avoid stemming specific words.…

r meta tm stemming

asked Mar 13 '17 at 15:17

kdudeIA

57
1
6

2 votes, 1 answer

Python Snowball Stemmer + RAKE: generates 'u's

I am trying to get the keywords from a text file, and I'm stemming the text first. The code below works, but for some reason it puts the letter 'u' in front of each keyword in the list. E.g. this is what I get: [(u'keyword1', 5),…

python rake stemming

asked Feb 14 '17 at 11:57

user7443687
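
The u prefix in the output above is how Python 2 displays unicode strings when a list is printed; it belongs to the string's repr, not to the keyword itself. A minimal sketch, assuming the list has the (keyword, score) shape shown in the question; the second tuple and the variable names are made up for illustration:

```python
# Python 3 still accepts the u'' literal, and it is an ordinary str there.
keywords = [(u'keyword1', 5), (u'keyword2', 3)]  # second tuple is hypothetical

# Printing the whole list shows each string's repr; formatting the items one
# by one shows only the text, in Python 2 as well as in Python 3.
lines = ["%s: %d" % (word, score) for word, score in keywords]
print("\n".join(lines))
```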

2 votes, 1 answer

How to provide (or generate) tags for nltk lemmatizers

I have a set of documents that I would like to transform into a form that lets me compute tf-idf for the words in them (so that each document is represented by a vector of tf-idf values). I thought that it was enough to…

python nltk stemming lemmatization

asked Nov 12 '16 at 23:28

Zbyszek M.

85
1
8
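
One common way to supply the part-of-speech argument asked about above is to run a POS tagger first and translate its Penn Treebank tags into the single-letter codes WordNet uses. The sketch below shows only that translation step in plain Python. The function name and the tagged sample are hypothetical, and falling back to noun for unrecognized tags is an assumption.

```python
# WordNet part-of-speech codes: adjective, verb, noun, adverb.
_TREEBANK_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def treebank_to_wordnet(tag):
    """Translate a Penn Treebank tag such as 'VBD' into a WordNet POS code."""
    return _TREEBANK_PREFIX_TO_WORDNET.get(tag[:1], "n")


# Hand-written sample of (word, Treebank tag) pairs, for illustration only.
tagged = [("dogs", "NNS"), ("were", "VBD"), ("barking", "VBG"), ("loudly", "RB")]
print([(word, treebank_to_wordnet(tag)) for word, tag in tagged])
```

With NLTK, the pairs would typically come from `nltk.pos_tag(tokens)`, and each word would then be passed to `WordNetLemmatizer().lemmatize(word, pos=code)`.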

2 votes, 1 answer

Snowball Stemming: defining Null Region

I'm trying to understand the Snowball stemming algorithm. HW90 asked a similar question with examples, but it did not cover my case. The algorithm uses two regions, R1 and R2, that are defined as follows: R1 is the region after the first non-vowel…

nlp stemming linguistics porter-stemmer snowball

asked Sep 06 '16 at 18:54

NewbieXXL

155
1
1
11
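
The R1/R2 definition quoted in the question above can be sketched in a few lines of Python. This assumes the general Snowball definition (R1 is the region after the first non-vowel following a vowel, or the null region at the end of the word if there is no such non-vowel; R2 is the same rule applied inside R1) and a, e, i, o, u, y as the vowel set. Language-specific adjustments that individual stemmers add are left out, and the function names are illustrative.

```python
VOWELS = set("aeiouy")  # assumed vowel set; each Snowball stemmer defines its own


def region_start(word, start=0):
    """Index just past the first non-vowel that follows a vowel, scanning from start.

    Returns len(word) when no such position exists, i.e. the null region.
    """
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Return the R1 and R2 regions of a lowercase word as strings."""
    r1_start = region_start(word, 0)
    r2_start = region_start(word, r1_start)
    return word[r1_start:], word[r2_start:]


print(r1_r2("beautiful"))
# ('iful', 'ul')
```

A word such as "beau" contains no non-vowel after a vowel, so both regions come back as empty strings, which is the null region the question title refers to.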

