I have a DataFrame with the columns 'App', 'Query' and 'Label':

import pandas as pd

train_data = pd.DataFrame({
     'Query':['shoerack','shoerack','shoerack','shoerack', 'nike shoes'], 
     'App':  ['amazon', 'amazon', 'amazon', 'amazon', 'zalando'],
     'Label':[1, 1, 1, 1, 1]})

Now if I do a simple apply:

train_data.apply(lambda row: print(row['App']))

I get:

KeyError                                  Traceback (most recent call last)
Cell In[20], line 4
  1 train_data = pd.DataFrame({'Query': ['shoerack','shoerack','shoerack','shoerack', 'nike shoes'], 
  2                            'App': ['amazon', 'amazon', 'amazon', 'amazon', 'zalando'],
  3                            'Label': [1, 1, 1, 1, 1]})
 ----> 4 train_data.apply(lambda row: print(row['App']))
 
KeyError: 'App'

According to "How to apply a function on every row on a dataframe?", apply should work fine because it runs per row. Why do I get a KeyError if the key exists?

s.blnc
  • Out of curiosity, what are you trying to do? I almost never have to use `apply` on `axis=1` – mozway Apr 19 '23 at 20:52
  • I am generating a negative sample based on the input row, so I call a function per row and pass the query and the app as arguments. I simplified it to print() to make it easily reproducible. – s.blnc Apr 19 '23 at 21:04
  • Thanks. Make sure to check whether your function can be vectorized. `apply` should always be avoided whenever possible. – mozway Apr 19 '23 at 21:08

1 Answer

With the default axis (axis=0), apply calls the function once per column. Each column is passed as a Series indexed by the row labels (0 to 4 here), so there is no 'App' label to look up, hence the KeyError.

You need to use axis=1:

train_data.apply(lambda row: print(row['App']), axis=1)

Output:

# printed
amazon
amazon
amazon
amazon
zalando

# returned value
0    None
1    None
2    None
3    None
4    None
dtype: object
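
To see what the function receives in each case, here is a minimal sketch that records the name and index of every Series passed by apply. The helper names (record_axis0, record_axis1) and the dictionaries are only for illustration and are not part of pandas:

```python
import pandas as pd

train_data = pd.DataFrame({
    'Query': ['shoerack', 'shoerack', 'shoerack', 'shoerack', 'nike shoes'],
    'App': ['amazon', 'amazon', 'amazon', 'amazon', 'zalando'],
    'Label': [1, 1, 1, 1, 1]})

# Illustrative helpers: store the index labels of each Series that apply passes in.
axis0 = {}
def record_axis0(s):
    axis0[s.name] = list(s.index)

axis1 = {}
def record_axis1(s):
    axis1[s.name] = list(s.index)

# Default axis=0: one call per column; the Series is indexed by row labels.
train_data.apply(record_axis0)

# axis=1: one call per row; the Series is indexed by column names,
# so row['App'] is a valid lookup.
train_data.apply(record_axis1, axis=1)

print(axis0['App'])  # row labels of the 'App' column
print(axis1[0])      # column names seen by the first row

# Collecting a value per row with axis=1.
apps = train_data.apply(lambda row: row['App'], axis=1).tolist()
```

With axis=0 the keys of axis0 are the column names and each value is the list of row labels, which is why 'App' is not found there. With axis=1 it is the other way round.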
mozway