I have a DataFrame, df:

Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
        2       Thanks for reading
Ram     1       Ram is one of the good cricket player
ganesh  1       good driver

and a list:

my_list=["one"]

I tried mask = df["Description"].str.contains('|'.join(my_list), na=False) and then df[mask],

but it gives only the rows whose own description matches:

Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
Ram     1       Ram is one of the good cricket player

My desired output is:
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
        2       Thanks for reading
Ram     1       Ram is one of the good cricket player

It has to take the Stage column into account: when a description matches, I want all the rows (every stage) that belong to the same name.

Pyd

2 Answers

I think you need:

print (df)
     Name  Stage                                Description
0     Sri      1  Sri is one of the good singer in this two
1              2                         Thanks for reading
2     Ram      1      Ram is one of the good cricket player
3  ganesh      1                                good driver

# replace empty or whitespace-only names with the previous value
df['Name'] = df['Name'].mask(df['Name'].str.strip() == '').ffill()
print (df)
     Name  Stage                                Description
0     Sri      1  Sri is one of the good singer in this two
1     Sri      2                         Thanks for reading
2     Ram      1      Ram is one of the good cricket player
3  ganesh      1                                good driver

# get the names of all rows whose Description matches
my_list = ["one"]
names=df.loc[df["Description"].str.contains("|".join(my_list),na=False), 'Name']
print (names)
0    Sri
2    Ram
Name: Name, dtype: object

# select all rows whose Name is in names
df = df[df['Name'].isin(names)]
print (df)
  Name  Stage                                Description
0  Sri      1  Sri is one of the good singer in this two
1  Sri      2                         Thanks for reading
2  Ram      1      Ram is one of the good cricket player
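The steps above can be collected into one self-contained sketch. The function name `rows_for_matching_names` is made up for illustration, and escaping the keywords with `re.escape` is an added assumption (it treats each keyword as literal text rather than as a regular expression); neither is part of the original answer.

```python
import re

import pandas as pd

# Sample data from the question; the blank Name belongs to the row above it.
df = pd.DataFrame({
    "Name": ["Sri", "", "Ram", "ganesh"],
    "Stage": [1, 2, 1, 1],
    "Description": [
        "Sri is one of the good singer in this two",
        "Thanks for reading",
        "Ram is one of the good cricket player",
        "good driver",
    ],
})


def rows_for_matching_names(df, keywords):
    """Return every row of each Name that has at least one matching Description."""
    out = df.copy()
    # Turn empty or whitespace-only names into NaN, then forward-fill them.
    out["Name"] = out["Name"].mask(out["Name"].str.strip() == "").ffill()
    # Assumption: keywords are literal strings, so escape regex metacharacters.
    pattern = "|".join(re.escape(k) for k in keywords)
    names = out.loc[out["Description"].str.contains(pattern, na=False), "Name"]
    return out[out["Name"].isin(names)]


result = rows_for_matching_names(df, ["one"])
print(result)
```

Working on a copy leaves the caller's `df` untouched, which makes it easier to try several keyword lists against the same data.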
jezrael
  • If we have my_list = ["Thanks"], it gives me the "Thanks for reading" row. But I don't want to match when the stage is other than 1. Is there a way? – Pyd Oct 04 '17 at 09:13
  • I want to match my_list against df["Description"] only when the stage is 1. If we find a match, we should get all the stages for that particular description. – Pyd Oct 04 '17 at 09:19
  • Yes, I think instead of `df["Description"].str.contains("|".join(my_list),na=False)` you need `df["Description"].str.contains("|".join(my_list),na=False) & (df['Stage'] == 1)` – jezrael Oct 04 '17 at 09:20
  • Do you mean I should replace `names=df.loc[df["Description"].str.contains("|".join(my_list),na=False), 'Name']`? When I did, I got only the column headers, with no rows. – Pyd Oct 04 '17 at 09:26
  • Yes, exactly. You need to chain the new condition. – jezrael Oct 04 '17 at 09:27
  • OK, I got it. With that change we get only the rows that have stage = 1. – Pyd Oct 04 '17 at 09:28
  • The condition should match against df["Description"] only when stage = 1. If we find a match, we need to include stage = 2 as well for the same description. – Pyd Oct 04 '17 at 09:32
  • Yes, `names=df.loc[df["Description"].str.contains("|".join(my_list),na=False) & (df['Stage'] == 1), 'Name']` returns nothing for `my_list = ["Thanks"]`, because `Thanks` is in stage 2 only. – jezrael Oct 04 '17 at 09:34
  • Yes Jezrael, it works fine. I didn't check properly. Thanks for your answer. – Pyd Oct 04 '17 at 09:36
  • Hi Jezrael, I want to add the keyword "one" in a new column in the final dataframe. – Pyd Oct 11 '17 at 11:18
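The stage-restricted variant discussed in these comments, together with one possible way to record the matched keyword, might look like the sketch below. The function name `rows_for_stage1_matches` and the column name `Keyword` are hypothetical, and the keyword column uses `str.extract`, which is one option among several and was not confirmed in the thread. Keywords are again assumed to be literal strings.

```python
import re

import pandas as pd

# Sample data from the question; the blank Name belongs to the row above it.
df = pd.DataFrame({
    "Name": ["Sri", "", "Ram", "ganesh"],
    "Stage": [1, 2, 1, 1],
    "Description": [
        "Sri is one of the good singer in this two",
        "Thanks for reading",
        "Ram is one of the good cricket player",
        "good driver",
    ],
})


def rows_for_stage1_matches(df, keywords):
    """Match keywords on Stage 1 rows only, then return all stages of those names."""
    out = df.copy()
    out["Name"] = out["Name"].mask(out["Name"].str.strip() == "").ffill()
    # One capture group so str.extract returns the first keyword found (NaN if none).
    pattern = "(" + "|".join(re.escape(k) for k in keywords) + ")"
    found = out["Description"].str.extract(pattern, expand=False)
    # Only a match on a Stage 1 row counts.
    hit = found.notna() & (out["Stage"] == 1)
    # Name -> keyword; if a name has several Stage 1 matches, the last one wins.
    keyword_by_name = dict(zip(out.loc[hit, "Name"], found[hit]))
    out = out[out["Name"].isin(list(keyword_by_name))].copy()
    out["Keyword"] = out["Name"].map(keyword_by_name)  # hypothetical column name
    return out


print(rows_for_stage1_matches(df, ["one"]))     # Sri (stages 1 and 2) and Ram
print(rows_for_stage1_matches(df, ["Thanks"]))  # empty: "Thanks" occurs in stage 2 only
```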

Your mask finds "one" in the Description column of the dataframe and returns only the rows whose own description matches.

If you also want the "Thanks for reading" row, you will have to add a list element that matches it,

e.g. 'Thanks', so something like my_list=["one", "Thanks"]
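A minimal sketch of this suggestion on the question's sample data is shown here. Note that it only works when a keyword from the stage 2 row is known in advance, and it leaves the blank Name in that row as it is.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Name": ["Sri", "", "Ram", "ganesh"],
    "Stage": [1, 2, 1, 1],
    "Description": [
        "Sri is one of the good singer in this two",
        "Thanks for reading",
        "Ram is one of the good cricket player",
        "good driver",
    ],
})

my_list = ["one", "Thanks"]
# "one|Thanks" matches a description containing either keyword.
mask = df["Description"].str.contains("|".join(my_list), na=False)
matched = df[mask]
print(matched)
```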

Calvin Taylor