I am trying to use any() to check whether each value in a column contains any string from a list, and to store the result in a new column:

import pandas as pd

df_data = pd.DataFrame({'A': [2, 1, 3], 'animals': ['cat, frog', 'kitten, fish', 'frog2, fish']})
cats = ['kitten', 'cat']
df_data['cats'] = df_data.apply(lambda row: True if any(item in cats for item in row['animals']) else False, axis = 1)

I got the results below, and I don't understand why the value is False for the first two rows:

   A       animals   cats
0  2     cat, frog  False
1  1  kitten, fish  False
2  3   frog2, fish  False

I expected to get False for the last row only.

cat_on_the_mat

2 Answers

With pandas, you should try to avoid for loops and apply where a vectorized alternative exists. Here I use the DataFrame constructor together with isin and any:

df_data['cats'] = pd.DataFrame(df_data.animals.str.split(', ').tolist()).isin(cats).any(axis=1)
df_data
   A       animals   cats
0  2     cat, frog   True
1  1  kitten, fish   True
2  3   frog2, fish  False
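
The one-liner above can be unpacked into its intermediate steps, which may make it easier to see what each call contributes. This is only a sketch; the names `parts`, `wide` and `mask` are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df_data = pd.DataFrame({'A': [2, 1, 3],
                        'animals': ['cat, frog', 'kitten, fish', 'frog2, fish']})
cats = ['kitten', 'cat']

# Step 1: split each string into a list of animal names.
parts = df_data['animals'].str.split(', ')

# Step 2: expand the lists into a DataFrame with one name per cell.
wide = pd.DataFrame(parts.tolist(), index=df_data.index)

# Step 3: flag cells whose value is in cats, then reduce across each row.
mask = wide.isin(cats)
df_data['cats'] = mask.any(axis=1)

print(df_data)
```

Because isin compares whole cell values, 'frog2' and 'fish' are not matched by partial overlap with any entry in cats.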
BENY

Flip your iterables:

df_data['cats'] = df_data.apply(lambda row: True if any([item in row['animals'] for item in cats]) else False, axis = 1)

print(df_data)
#    A       animals   cats
# 0  2     cat, frog   True
# 1  1  kitten, fish   True
# 2  3   frog2, fish  False

If you look closely,

item in row['animals'] for item in cats

iterates over cats and checks whether each item occurs as a substring of row['animals'], whereas

item in cats for item in row['animals']

iterates over row['animals'], which is a string, so each item is a single character ('c', 'a', 't', ...), and no single character is in the cats list.
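
To see the difference concretely, here is a small sketch in plain Python (no pandas needed) using the value from the first row. The variable names `animals`, `chars`, `original`, `flipped` and `split_first` are made up for this illustration.

```python
cats = ['kitten', 'cat']
animals = 'cat, frog'  # the value of row['animals'] in the first row

# Iterating over a string yields one character at a time.
chars = list(animals)

# Original order: each character is tested against the list, so nothing matches.
original = any(item in cats for item in animals)

# Flipped order: each name in cats is tested as a substring of the string.
flipped = any(item in animals for item in cats)

# Splitting first compares whole names instead of substrings.
split_first = any(item in cats for item in animals.split(', '))

print(chars[:4], original, flipped, split_first)
```

Note that the flipped version is a substring test, so a value such as 'catfish, frog' would also count as a match; splitting on ', ' first avoids that if whole-name matching is what you want.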

Cohan
  • Just curious about whether you consider the speed of the code or not – BENY May 01 '19 at 23:54
  • In this case, I was worried about showing them the issue with their code. Sometimes I try to fix the mistakes, other times I try to inform of a better way. Though I do appreciate when I get to learn new things from your comments and answers. Also, I only really care about the speed of the code if the problem set is long enough to make speed an issue. Otherwise, readability is also good. – Cohan May 01 '19 at 23:57
  • Ok that is fair enough :-) – BENY May 01 '19 at 23:58