
How do I print/return the value based on values from another column?

df = my_df[['Index', 'FRUITS']]
print(df)

 Index            FRUITS
     7       Green Apple
     7             Mango
     7            Orange
     7        Strawberry
     9         Pineapple
     9            Banana
     9            Grapes
    10   Orange (Unripe)
    10              Plum

L = ['apple', 'orange']

Here, for every Index and irrespective of case, I want to check whether the strings apple and orange each appear as a substring in FRUITS, and return the Index values where at least one of these two fruits isn't found.

I tried using approaches from different answers and tried groupby and iterating over fruits:

out = df.groupby('Index')['FRUITS'].apply(lambda x: L in x)

TypeError: 'in <string>' requires string as left operand, not list

So, the expected output is:

[9, 10]
Yash Ghorpade
  • I understand why 10 is there in the expected output, because it has Orange. But why 9? – Sushanth Nov 13 '19 at 10:56
  • I think you have it the other way around. 10 is there because it doesn't have apple, and 9 because it has neither orange nor apple. I want the Index whenever even one of the values is missing. – Yash Ghorpade Nov 13 '19 at 11:02

1 Answer


Using str.findall with word boundaries, then counting the matches per Index:

result = df.groupby('Index')['FRUITS'].apply(' '.join).str.lower().str.findall('\\bapple\\b|\\borange\\b').str.len() < 2

list(result[result].index)

[9, 10]
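
One caveat with counting matches: an Index whose rows contain apple twice and orange not at all would also reach a count of 2. A minimal sketch that checks each term separately instead, assuming the same df and L as in the question (the names hits, has_all and missing are only illustrative):

```python
import re
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    'Index': [7, 7, 7, 7, 9, 9, 9, 10, 10],
    'FRUITS': ['Green Apple', 'Mango', 'Orange', 'Strawberry',
               'Pineapple', 'Banana', 'Grapes', 'Orange (Unripe)', 'Plum'],
})
L = ['apple', 'orange']

# One boolean column per term: does the row contain the term as a whole word?
hits = pd.DataFrame({
    term: df['FRUITS'].str.contains(rf'\b{re.escape(term)}\b', case=False, regex=True)
    for term in L
})

# An Index "has" a term if any of its rows matches; it passes only if every term is found
has_all = hits.groupby(df['Index']).any().all(axis=1)

# Index values where at least one term is missing
missing = has_all[~has_all].index.tolist()
print(missing)  # [9, 10]
```

Because each term gets its own column, repeated matches of one fruit cannot mask the absence of the other.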
iamklaus
  • I have a large dataset of around 3000 records. When I search for apple the count is 2791; when I search for orange the count is 800. But when I search for both values the result is 1895. If apple is missing in 2791 records, shouldn't the answer be 2791 instead of 1895? – Yash Ghorpade Nov 13 '19 at 11:37
  • Are apple and orange present in a single string? Also, your numbers don't make sense: if 3000 is the total and you want those strings where both fruits aren't present, the result should be 2108 (1895 are where both are present). – iamklaus Nov 13 '19 at 11:56
  • Also, how are you searching? A plain search for apple will treat a string containing "pineapple" as a match. – iamklaus Nov 13 '19 at 11:57
  • I want data where either one isn't present so if apple isn't present in 2791 records, even if orange has an 800 number, the total records with missing value should be 2791 or greater (considering oranges were not found in different records and not the same as apples.) – Yash Ghorpade Nov 14 '19 at 09:31
  • One more thing, if you test only orange in the mentioned example, the expected output is [9] as only 9 doesn't have any substring orange but the result from above code is [7,8,9] – Yash Ghorpade Nov 14 '19 at 10:05
  • Oh! I see, the number at the end of your code plays a vital role: it has to equal the number of values being searched. Thanks. – Yash Ghorpade Nov 14 '19 at 10:08
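
The two points raised in these comments (the "pineapple" false match and the threshold depending on how many values are searched) can be sketched as follows. This is only an illustration built on the sample data from the question; groups_missing is a hypothetical helper name, not part of pandas:

```python
import re
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    'Index': [7, 7, 7, 7, 9, 9, 9, 10, 10],
    'FRUITS': ['Green Apple', 'Mango', 'Orange', 'Strawberry',
               'Pineapple', 'Banana', 'Grapes', 'Orange (Unripe)', 'Plum'],
})

# A plain substring search matches "Pineapple"; a word-boundary search does not
s = pd.Series(['Pineapple'])
plain = s.str.contains('apple', case=False).iloc[0]
bounded = s.str.contains(r'\bapple\b', case=False).iloc[0]

def groups_missing(frame, terms):
    """Return Index values whose joined FRUITS text has fewer matches than terms."""
    pattern = '|'.join(rf'\b{re.escape(t)}\b' for t in terms)
    joined = frame.groupby('Index')['FRUITS'].apply(' '.join).str.lower()
    counts = joined.str.findall(pattern).str.len()
    # The threshold follows the number of search terms instead of a hard-coded 2
    result = counts < len(terms)
    return result[result].index.tolist()

only_orange = groups_missing(df, ['orange'])
both = groups_missing(df, ['apple', 'orange'])
print(only_orange)  # [9]
print(both)         # [9, 10]
```

Like the original one-liner, this counts matches, so it assumes each term appears at most once per Index.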