2

I have a dataframe,

DF,
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
        2       Thanks for reading
Ram     1       Ram is one of the good cricket player
ganesh  1       good driver

and a list,

my_list=["one","driver"]

I tried, names=df.loc[df["Description"].str.contains("|".join(my_list),na=False), 'Name']

achieved everything except the keyvalue column.

 output_DF.
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
Ram     1       Ram is one of the good cricket player

My desired output is,
desired_DF,
Name    Stage   Description                                 keyvalue
Sri     1       Sri is one of the good singer in this two    one
        2       Thanks for reading                           
Ram     1       Ram is one of the good cricket player        one
ganesh  1       good driver                                  driver

some one help me with generating keyvalue column

Pyd
  • 6,017
  • 18
  • 52
  • 109

1 Answers1

2

I think you can use previous solution from here and then extract:

pat = "|".join(my_list)

df['keyvalue'] = df['Description'].str.extract("(" + pat + ')', expand=False).fillna('')
print (df)
     Name  Stage                                Description keyvalue
0     Sri      1  Sri is one of the good singer in this two      one
1     Sri      2                         Thanks for reading         
2     Ram      1      Ram is one of the good cricket player      one
3  ganesh      1                                good driver   driver

All together:

print (df)
#     Name  Stage                                Description
#0     Sri      1  Sri is one of the good singer in this two
#1              2                         Thanks for reading
#2     Ram      1      Ram is one of the good cricket player
#3  ganesh      1                            good Driver one

my_list=["ONE","driver"]
df['Name'] = df['Name'].mask(df['Name'].str.strip() == '').ffill()

pat = "|".join(my_list).lower()

names=df.loc[df["Description"].str.lower().str.contains(pat,na=False), 'Name']

df = df[df['Name'].isin(names)]

df['keyvalue'] = (df['Description'].str.lower()
                                   .str.extract("(" + pat + ')', expand=False)
                                   .fillna(''))
print (df)
#     Name  Stage                                Description keyvalue
#0     Sri      1  Sri is one of the good singer in this two      one
#1     Sri      2                         Thanks for reading         
#2     Ram      1      Ram is one of the good cricket player      one
#3  ganesh      1                            good Driver one   driver
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • no Jezrael, i dont want to add "one" for all the rows. It has to add the exact matching keyword – Pyd Oct 11 '17 at 11:25
  • ops, give me some time. – jezrael Oct 11 '17 at 11:26
  • Edited the final expected output, please check – Pyd Oct 11 '17 at 11:27
  • What is 2 or more values match? – jezrael Oct 11 '17 at 11:28
  • we can give the first value – Pyd Oct 11 '17 at 11:29
  • It is not working if the Description column has the value with higher case, If the Description has uppercase value "ONE" and my_list has lower case "one" then value is not printing in keyvalue column – Pyd Oct 11 '17 at 12:29
  • are you there ? – Pyd Oct 11 '17 at 12:48
  • Sorry, I have idea. Use parameter `re.IGNORECASE` in `extract` - but I dont test it yet. – jezrael Oct 11 '17 at 12:49
  • str.extract("(" + pat + ')',re.IGNORECASE,expand=False) is this correct ? – Pyd Oct 11 '17 at 12:52
  • Yes, exactly. Now I am going test it. – jezrael Oct 11 '17 at 12:52
  • I test it now, need `df['keyvalue'] = df['Description'].str.extract("(" + pat + ')', expand=False, flags=re.IGNORECASE).fillna('')` – jezrael Oct 11 '17 at 12:56
  • And then if need all values lowercase `df['keyvalue'] = df['Description'].str.extract("(" + pat + ')', expand=False, flags=re.IGNORECASE).fillna('').str.lower()` – jezrael Oct 11 '17 at 12:57
  • It is working for the my sample data, but for my actual data its not working. something is wrong, ill check – Pyd Oct 11 '17 at 13:04
  • If it does not work, solution is all convert to lowercase and then test like `pat = "|".join(my_list).lower()` and then `df['keyvalue'] = df['Description'].str.lower().str.extract("(" + pat + ')', expand=False).fillna('')` – jezrael Oct 11 '17 at 13:06
  • I want to edit this code names=df.loc[df["Description"].str.contains("|".join(my_list),na=False), 'Name'] to I want to map the lower cases of df["Description"] with my list so I tried * names=df.loc[df["Description"].str.lower().str.contains("|".join(my_list),na=False), 'Name'] * but I am getting empty df, can you check – Pyd Oct 11 '17 at 13:40
  • Hmmm, it should work. I test it now and for me it working perfectly. – jezrael Oct 11 '17 at 13:46
  • with sample data, sure. – jezrael Oct 11 '17 at 13:46