Tokenize each row in a dataframe - for loop not working

Question

I am trying to tokenize a sentence in each row of a pandas dataframe, but I am having some trouble.

I know this code works to convert a single row:

TextBlob(df['H'][0]).words

But when I tried to apply it in a for loop, I got an error:

for i, row in df.H():
ifor_val = TextBlob(df['H'][i]).words
df.at[i,'ifor'] = H

Error message: TypeError: 'Series' object is not callable

Edit:

data = {'H':['the quick brown fox jumps over the road', 'the weather is nice today'], 'marks':[99, 98]}
df = pd.DataFrame(data)

Desired output:

H                                  marks
['the','quick','brown', 'fox'....]   99
['the','weather','is', 'nice'....]   98

SOLUTION:

df['H']=df['H'].apply(word_tokenize)
df['H'].head()

Please fix your indentation and post your error message. It would help us solve your problem. — d_kennetz, Apr 08 '19 at 17:43
Please add a [mcve] with 5 or so rows of data with expected output. Thanks. — cs95, Apr 08 '19 at 17:44

score 0 · Answer 1 · answered Apr 08 '19 at 17:54

It sounds like you want to apply a function to every row of a dataframe. In that case, you can pass a lambda to apply, which calls it once for each value in the column.

Assuming H is the column you are targeting, and each row holds the exact text you want to send to TextBlob, the following adds a column called 'output' containing the result of the TextBlob function:

df['output'] = df['H'].apply(lambda x: TextBlob(x))
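
Note that the callable passed to apply has to return the word list itself: TextBlob(x) on its own returns a TextBlob object rather than a list of words, whereas the one-row call in the question uses .words. A minimal sketch of the pattern follows. It uses plain str.split as a stand-in tokenizer (an assumption, so the snippet runs without TextBlob installed), and the helper name tokenize is hypothetical:

```python
import pandas as pd

data = {'H': ['the quick brown fox jumps over the road',
              'the weather is nice today'],
        'marks': [99, 98]}
df = pd.DataFrame(data)

def tokenize(text):
    # Stand-in tokenizer (assumption): split on whitespace.
    # With TextBlob installed, this could instead be:
    #     return list(TextBlob(text).words)
    return text.split()

# apply calls tokenize once for each value in column H
df['output'] = df['H'].apply(tokenize)
print(df['output'][0])
```

Writing to a separate 'output' column keeps the original text in H available for comparison.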

answered Apr 08 '19 at 17:54

Chris


I tried that, and it splits the text into individual letters rather than words – OptimusPrime Apr 08 '19 at 17:57
Try TextBlob([x]) in the lambda function – Chris Apr 08 '19 at 18:02

nassim · Answer 2 · 2019-04-08T18:26:16.770

This gave me the output you wanted:

import pandas as pd

data = {'H':['the quick brown fox jumps over the road', 'the weather is nice today'], 'marks':[99, 98]}
df = pd.DataFrame(data)

print(df)

# Copy every column except H, then re-insert H as a column of empty lists
df2 = df.drop("H", axis=1).copy()
df2.insert(loc=0, column='H', value=[[] for x in range(df.shape[0])])

# Split each sentence on whitespace and append its words to that row's list
for index, row in df2.iterrows():
    vals = df.loc[index, "H"].split()

    for word in vals:
        df2.loc[index, "H"].append(word)

print(df2)
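
For whitespace tokenization specifically, the same result can probably be had without an explicit loop by using the vectorized Series.str.split(). A sketch under that assumption, with a hypothetical new column name tokens:

```python
import pandas as pd

data = {'H': ['the quick brown fox jumps over the road',
              'the weather is nice today'],
        'marks': [99, 98]}
df = pd.DataFrame(data)

# str.split() with no arguments splits each string on runs of whitespace
# and returns one list of words per row
df['tokens'] = df['H'].str.split()

print(df[['tokens', 'marks']])
```

This only covers splitting on whitespace; a tokenizer that handles punctuation would still need apply or a loop.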

edited Apr 08 '19 at 18:26

answered Apr 08 '19 at 17:57

nassim


I updated my answer; please try it and see if this is what you wanted – nassim Apr 08 '19 at 18:27

score 0 · Answer 3 · answered Apr 08 '19 at 18:11

If you want to iterate over index-value pairs of a column (the values are strings in this case), use the column's items() method. It was called iteritems() when this answer was written; that name was removed in pandas 2.0:

for i, s in df.H.items():
    pass  # Do stuff with your values

It is better to add a new column than to overwrite the old one.
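
A sketch of that loop filled in, for illustration: it collects the tokens in a plain dict keyed by row index and attaches them as a new column afterwards. The column name tokens and the use of str.split as the tokenizer are assumptions; Series.items() is used because iteritems() is no longer available in pandas 2.0 and later.

```python
import pandas as pd

data = {'H': ['the quick brown fox jumps over the road',
              'the weather is nice today'],
        'marks': [99, 98]}
df = pd.DataFrame(data)

# Map each row index to its list of words
tokens = {}
for i, s in df['H'].items():
    # Assumption: whitespace split stands in for the real tokenizer
    tokens[i] = s.split()

# Build a Series from the dict so the lists align with df's index,
# then store it in a new column; column H stays unchanged
df['tokens'] = pd.Series(tokens)

print(df)
```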
