I am carrying out EDA on a dataset and want to count the total number of words in a column, before and after deleting duplicates.

Here is my code:

print(train_dataset['text'].apply(lambda x: len(x.split(' '))).sum())

It is throwing this error:

AttributeError: 'float' object has no attribute 'split'
Simon Crane
  • Please provide a sample dataset as text – mozway Mar 05 '22 at 18:12
  • The problem is that `split` is part of the `str`-functions, but there's probably a better (i.e., more "panda-esque") way of doing this. Can you share an example of what the `text`-column looks like? – fsimonjetz Mar 05 '22 at 18:13

1 Answer

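The error suggests that the text column contains non-string values, most likely missing values (NaN), which pandas stores as floats. You could convert the column values to strings before splitting:

# Option 1: cast the whole column to str first
# (note: a missing value becomes the string 'nan' and is counted as one word)
train_dataset['text'] = train_dataset['text'].astype(str)
print(train_dataset['text'].apply(lambda x: len(x.split())).sum())
# Option 2: cast each value inside the lambda, leaving the column unchanged
print(train_dataset['text'].apply(lambda x: len(str(x).split())).sum())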
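As a possible alternative, here is a minimal sketch that uses the vectorized `.str` accessor, assuming the floats in the column are missing values that should not count as words. The sample data and the `count_words` helper are hypothetical, introduced only for illustration. It also shows the before/after-duplicates comparison from the question.

```python
import pandas as pd

# Hypothetical sample data; the None entry stands in for the missing value
# that likely triggered the AttributeError.
train_dataset = pd.DataFrame(
    {"text": ["hello world", None, "hello world", "one two three"]}
)


def count_words(series: pd.Series) -> int:
    """Total whitespace-separated words in a column, ignoring missing values."""
    # The .str methods propagate missing values instead of raising,
    # and sum() skips them by default.
    return int(series.str.split().str.len().sum())


words_before = count_words(train_dataset["text"])
words_after = count_words(train_dataset["text"].drop_duplicates())

print(words_before, words_after)  # 7 5
```

Unlike casting with `astype(str)`, this does not count each missing value as the one-word string 'nan'. If the column holds other non-string values such as numbers, the `.str` accessor treats them as missing too, so they would need an explicit conversion first.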
gremur