I am preprocessing text data. When I lemmatize after stemming, the lemmatizer returns exactly the same tokens as the stemmer (no change in the text). I can't figure out what the issue is.

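# Assumed setup, based on the tools named in the description below
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemming = PorterStemmer()
lemma = WordNetLemmatizer()


def stem_list(row):
    # Stem every token in this row's stopword-free token list
    my_list = row['no_stopwords']
    stemmed_list = [stemming.stem(word) for word in my_list]
    return stemmed_list


Japan['stemmed_words'] = Japan.apply(stem_list, axis=1)


def lemma_list(row):
    # Lemmatize the already stemmed tokens (the step that changes nothing)
    my_list = row['stemmed_words']
    lemmas_list = [lemma.lemmatize(word) for word in my_list]
    return lemmas_list


Japan['lemma_words'] = Japan.apply(lemma_list, axis=1)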

Below is the sample output:

secur huawei involv uk critic network suffici mitig longterm hcsec form mitig perceiv risk aris involv huawei critic nation infrastructur governmentl board includ offici britain gchq cybersecur agenc well senior huawei execut repres uk telecommun

My text consists of news articles. I am using PorterStemmer for stemming and WordNetLemmatizer for lemmatizing.

Thank you in advance.

Piyush Ghasiya
  • It looks to me like you're trying to lemmatize words that are already reduced to their stem. In general this won't work. Try lemmatizing the original word. You also should be passing in the part-of-speech to the Wordnet Lemmatizer, otherwise it will treat all words as nouns. If you want more help, you'll probably have to post a fully runnable sample of code and data that exhibits the issue. – bivouac0 Oct 29 '19 at 23:14
  • I have posted another question with full code. https://stackoverflow.com/questions/58618352/how-to-pass-part-of-speech-in-wordnetlemmatizer – Piyush Ghasiya Oct 30 '19 at 03:31

1 Answer

Your text most likely does not change during lemmatization because stemmed tokens are often not real words, so the lemmatizer cannot find them in its vocabulary and returns them unchanged.
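A minimal sketch of why this happens, assuming a lookup-based lemmatizer that falls back to returning its input when the token is not in its vocabulary (WordNetLemmatizer behaves this way for words it cannot find). The tiny `VOCAB` table and the `toy_lemmatize` function are hypothetical stand-ins for illustration, not NLTK code:

```python
# Hypothetical miniature vocabulary mapping inflected forms to lemmas.
# A real lemmatizer such as WordNetLemmatizer consults WordNet instead.
VOCAB = {
    "networks": "network",
    "risks": "risk",
    "boards": "board",
    "agencies": "agency",
}


def toy_lemmatize(word):
    """Return the lemma if the word is known, otherwise the word unchanged."""
    return VOCAB.get(word, word)


# Porter-style stems taken from the sample output in the question.
stems = ["secur", "involv", "critic", "telecommun"]
# Real inflected words, as they might look before stemming.
words = ["networks", "risks", "agencies"]

print([toy_lemmatize(w) for w in stems])  # not in the vocabulary: unchanged
print([toy_lemmatize(w) for w in words])  # known words map to their lemmas
```

Because truncated stems such as `secur` never appear in a dictionary, every lookup misses and the output looks identical to the stemmer's output.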

Both processes try to reduce a word to a base form, but stemming is strictly an algorithm (rule-based suffix stripping), while lemmatization uses a vocabulary to find the dictionary form (lemma) of a word. Generally you would use one or the other: stemming when speed matters most, lemmatization when you need real words as output.

However, if you just want to see the results of both in series, reverse your process: lemmatize first, then stem. You should get stems that differ from the lemmas you feed into the stemmer.
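The reordering could be sketched roughly as follows. This is an illustration under assumptions rather than the asker's code: `lemmatize_then_stem` is a hypothetical helper that accepts the lemmatizer and stemmer as plain callables, and the two toy functions only mimic the expected behaviour for the example words so the sketch runs without any corpus downloads.

```python
def lemmatize_then_stem(tokens, lemmatize, stem):
    """Lemmatize the original tokens first, then stem the resulting lemmas.

    Returns (lemmas, stems) so the two stages can be compared.
    """
    lemmas = [lemmatize(tok) for tok in tokens]
    stems = [stem(lem) for lem in lemmas]
    return lemmas, stems


# Hypothetical stand-ins for a real lemmatizer and stemmer.
TOY_LEMMAS = {"infrastructures": "infrastructure", "agencies": "agency"}


def toy_lemmatize(word):
    # Dictionary lookup with fallback to the unchanged word
    return TOY_LEMMAS.get(word, word)


def toy_stem(word):
    # Crude suffix stripping: drop a trailing "e" or "y"
    return word[:-1] if word.endswith(("e", "y")) else word


lemmas, stems = lemmatize_then_stem(
    ["infrastructures", "agencies"], toy_lemmatize, toy_stem
)
print(lemmas)
print(stems)
```

With NLTK, the callables would be `WordNetLemmatizer().lemmatize` and `PorterStemmer().stem`, applied to the `no_stopwords` column rather than to `stemmed_words`. As the comment on the question points out, `lemmatize` treats every word as a noun unless a part of speech is supplied through its `pos` argument.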