I am trying to tokenize the sentences in a column of a pandas dataframe, but I am having some trouble.

I know this code works to convert just one row:

TextBlob(df['H'][0]).words

But when I tried to apply it in a for loop, I got an error:

for i, row in df.H():
    ifor_val = TextBlob(df['H'][i]).words
    df.at[i,'ifor'] = H

Error message: TypeError: 'Series' object is not callable

Edit:

data = {'H':['the quick brown fox jumps over the road', 'the weather is nice today'], 'marks':[99, 98]}
df = pd.DataFrame(data)

Desired output:

H                                  marks
['the','quick','brown', 'fox'....]   99
['the','weather','is', 'nice'....]   98

SOLUTION:

# assuming word_tokenize comes from nltk: from nltk.tokenize import word_tokenize
df['H'] = df['H'].apply(word_tokenize)
df['H'].head()

OptimusPrime

3 Answers

It sounds like you want to apply a function to every row of a dataframe. In that case, you can use apply with a lambda to call the function once per row of the column.

Assuming H is the column you are targeting, and each row is the exact text you want to send to TextBlob, the following adds a column called 'output' that holds the result of the TextBlob function:

df['output'] = df['H'].apply(lambda x: TextBlob(x)) 
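
Note that TextBlob(x) on its own returns a TextBlob object rather than a list of words; the single-row example in the question uses .words to get the tokens. Here is a minimal sketch of the same apply pattern producing token lists. It uses plain str.split as a stand-in tokenizer so it runs without TextBlob installed; the tokenize helper and the 'tokens' column are names made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "H": ["the quick brown fox jumps over the road", "the weather is nice today"],
    "marks": [99, 98],
})

def tokenize(text):
    # Hypothetical stand-in: whitespace split only.
    # Swap in TextBlob(text).words if TextBlob is available.
    return text.split()

# apply calls tokenize once for each value in column H
df["tokens"] = df["H"].apply(tokenize)
print(df["tokens"].tolist())
```

Writing to a separate column keeps the original text around, which can be handy if you want to compare tokenizers later.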
Chris

This gave me what you wanted:

import pandas as pd

data = {'H':['the quick brown fox jumps over the road', 'the weather is nice today'], 'marks':[99, 98]}
df = pd.DataFrame(data)
print(df)

# Copy the frame without H, then re-insert H as a column of empty lists
df2 = df.drop("H", axis=1).copy()
df2.insert(loc=0, column='H', value=[[] for x in range(df.shape[0])])

# Fill each row's list with the whitespace-separated words of the original text
for index, row in df2.iterrows():
    vals = df.loc[index, "H"].split()
    for word in vals:
        df2.loc[index, "H"].append(word)

print(df2)
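
If whitespace splitting is all you need, the same result can likely be reached without the explicit loop, using the .str.split() accessor on the column. A short sketch, assuming the same sample data as above:

```python
import pandas as pd

data = {'H': ['the quick brown fox jumps over the road', 'the weather is nice today'],
        'marks': [99, 98]}
df = pd.DataFrame(data)

# Work on a copy so the original strings in df stay untouched
df2 = df.copy()

# .str.split() with no arguments splits each string on whitespace,
# giving one list of words per row
df2['H'] = df2['H'].str.split()

print(df2)
```

This only splits on whitespace, so punctuation stays attached to words; a real tokenizer handles that differently.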

nassim

If you want to iterate over the index-value pairs of a column (the values are strings in this case), you will need the column's items() method (named iteritems() in older pandas; that name was removed in pandas 2.0):

for i, s in df.H.items():
    pass  # Do stuff with your values

It is better to add a new column than to overwrite the old one.
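
To make that concrete, here is a rough sketch of the loop that collects the results and stores them in a new column. It assumes the sample data from the question and uses str.split in place of a real tokenizer; 'H_tokens' is just an illustrative column name.

```python
import pandas as pd

df = pd.DataFrame({
    'H': ['the quick brown fox jumps over the road', 'the weather is nice today'],
    'marks': [99, 98],
})

tokens = []
# items() yields (index label, value) pairs for the column
for i, s in df['H'].items():
    # Stand-in tokenizer: whitespace split (TextBlob(s).words could go here)
    tokens.append(s.split())

# Store the lists in a new column, aligned on the original index,
# so the original text in H is kept
df['H_tokens'] = pd.Series(tokens, index=df.index)

print(df)
```

Collecting into a list first and assigning once avoids writing list values into cells one at a time, which can be awkward in pandas.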

phylogram