I am new to Dask and was wondering if anyone could give me a hand. I have a large text dataset (>20 GB) and need to lemmatize a column. My current function, which works with pandas directly, is:
from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()

def lemmatizing(sentence):
    stemSentence = ""
    for word in sentence.split():
        stem = wnl.lemmatize(word)
        stemSentence += stem
        stemSentence += " "
    stemSentence = stemSentence.strip()
    return stemSentence
and usually I would do the following:
df['news_content'] = df['news_content'].apply(lemmatizing)
I was looking at dask.delayed, but I am puzzled about how to implement it.
Any help is highly appreciated.