Stemming Pandas Dataframe 'float' object has no attribute 'split'

Question

import pandas as pd
from nltk.stem import PorterStemmer, WordNetLemmatizer
porter_stemmer = PorterStemmer()

df = pd.read_csv("last1.csv",sep=',',header=0,encoding='utf-8')

df['rev'] = df['reviewContent'].apply(lambda x : filter(None,x.split(" ")))

(Image: screenshot of the dataset)

I am trying to stem the text in my DataFrame. While tokenizing, I get an error on this line:

df['rev'] = df['reviewContent'].apply(lambda x : filter(None,x.split(" ")))

AttributeError: 'float' object has no attribute 'split'

When stemming, I get a similar float error:

df['reviewContent'] = df["reviewContent"].apply(lambda x: [stemmer.stem(y) for y in x])

TypeError: 'float' object is not iterable

What can I do?

Where is your data? What is your expected output? Your code isn't enough to help. — cs95, Nov 07 '17 at 16:20
This is a dataset for yelp fake review. I am trying to stem my whole dataset. Should I upload the dataset too?? — Ashfaq Ali Shafin, Nov 07 '17 at 16:31
I edited the post and added a photo of the dataset. Is it enough? — Ashfaq Ali Shafin, Nov 07 '17 at 16:44

score 4 · Accepted Answer · answered Nov 07 '17 at 16:50

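When tokenising your data, you don't need the apply call; str.split does the job. Called without arguments, it splits on any run of whitespace, so you don't have to filter out empty strings. The astype(str) call is there so that non-string values, such as the float behind the error, never reach split.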

df['rev'] = df['reviewContent'].astype(str).str.split()
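
If the float in the error message is a NaN from an empty cell (the usual cause, though that can't be confirmed without the CSV), a small made-up example shows what is going on. The DataFrame below is hypothetical, not the asker's data, and fillna("") is an alternative worth considering so that a missing review becomes an empty token list; depending on the pandas version, astype(str) may instead leave a literal 'nan' token.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV: the second review is missing.
df = pd.DataFrame({"reviewContent": ["Great  food here", np.nan, "Slow service"]})

# The missing value is a float (NaN), which has no .split method.
missing = df["reviewContent"].iloc[1]
print(type(missing))

# Replace missing reviews with "" and split on any run of whitespace.
tokens = df["reviewContent"].fillna("").str.split()
print(tokens.tolist())
```

Which option to use depends on whether a 'nan' token for missing reviews would bother you downstream.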

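You can now run your stemmer as before (stemmer here is your PorterStemmer instance, which the first snippet in the question names porter_stemmer):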

df['rev'] = df['rev'].apply(lambda x: [stemmer.stem(y) for y in x])
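
Putting the two steps together, here is a minimal end-to-end sketch. The sample rows are invented for illustration, None stands in for an empty CSV cell, and fillna("") is an assumed choice for handling missing reviews rather than part of the original code.

```python
import pandas as pd
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Hypothetical sample data; None plays the role of an empty cell.
df = pd.DataFrame({"reviewContent": ["running dogs", None, "cats running"]})

# Tokenise, treating missing reviews as empty strings.
df["rev"] = df["reviewContent"].fillna("").str.split()

# Stem every token in every row's token list.
df["rev"] = df["rev"].apply(lambda tokens: [stemmer.stem(t) for t in tokens])

print(df["rev"].tolist())
```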

answered Nov 07 '17 at 16:50 by cs95

Sorry, I'm getting another error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 79-80: ordinal not in range(128). I have to do the following to overcome it: import sys; reload(sys); sys.setdefaultencoding('utf8'). Is that okay? – Ashfaq Ali Shafin Nov 07 '17 at 16:56
@AshfaqAliShafin Yeah, that's okay. All the best! – cs95 Nov 07 '17 at 17:00

score 0 · Answer 2 · edited Oct 09 '20 at 21:57

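We can also write it this way, which passes each cell to the stemmer as one string (so it only suits cells that hold a single word):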

df['rev'] = df['rev'].astype(str).apply(lambda x: stemmer.stem(x))

answered Oct 09 '20 at 16:07 by Raj
edited Oct 09 '20 at 21:57 by Dharman
