1

I have two dataframes,

df1,

 Names
 one two three
 Sri is a good player
 Ravi is a mentor
 Kumar is a cricketer

df2,

 values
 sri
 NaN
 sri, is
 kumar,cricketer

I am trying to get the row in df1 which contains the all the items in df2

My expected output is,

 values             Names
 sri                Sri is a good player
 NaN
 sri, is            Sri is a good player
 kumar,cricketer    Kumar is a cricketer

i tried, df1["Names"].str.contains("|".join(df2["values"].values.tolist()))

but I cannot achieve my expected output as it has (","). Please help

Pyd
  • 6,017
  • 18
  • 52
  • 109

2 Answers2

3

Using sets

s1 = df1.Names.dropna()
s1.loc[:] = [set(x.lower().split()) for x in s1.values.tolist()]
a1 = s1.values

s2 = df2['values'].dropna()
s2.loc[:] = [set(x.replace(' ', '').lower().split(',')) for x in s2.values.tolist()]
a2 = s2.values

i = np.column_stack([a1 >= a2[:, None], [True] * len(a2)]).argmax(1)

df2.assign(Names=pd.Series(
    np.append(df1.Names.values, np.nan)[i], s2.index
))

            values                 Names
0              sri  Sri is a good player
1              NaN                   NaN
2          sri, is  Sri is a good player
3  kumar,cricketer  Kumar is a cricketer
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • I don't want the output df separately. I want to add it to my df2 – Pyd Nov 20 '17 at 06:10
  • Then assign the result back to `df2`. Or just assign to a new column directly instead of using `assing`. Like `df2.loc[:, 'Names'] = pd.Series(np.append(df1.Names.values, np.nan)[i], s2.index)` – piRSquared Nov 20 '17 at 07:06
  • It worked, can you suggest me the best resource to learn pandas easily. – Pyd Nov 20 '17 at 09:23
  • Nothing worthwhile is easy! – piRSquared Nov 20 '17 at 09:24
  • 1. Start [here](http://shop.oreilly.com/product/0636920023784.do) to get an idea of what pandas can do. 2. Give yourself a data analysis task. and figure it out using pandas. Ask questions if needed. 3. Answer other people's questions. Even if you don't post answers, read the questions and figure them out. Read other people's answers to the question you just tried to answer. 4. Practice! – piRSquared Nov 20 '17 at 09:48
  • @piRSquared , this is not working properly when we have a space in df2 eg if i have a key `A room`, it is combining to `aroom` in `a2` I fixed that by removing the replace in `s2.loc[:] = [set(x.replace(' ', '').lower().split(',')) for x in s2.values.tolist()]` but still it is not matching as it is splitting by a single word in s1. What can we do for this – Pyd Nov 27 '17 at 10:42
  • hi, can you check this pls https://stackoverflow.com/questions/49022851/how-to-map-two-rows-of-different-dataframe-based-on-a-condition-in-pandas – Pyd Feb 28 '18 at 05:57
1
import pandas as pd
names =  [
    'one two three',
    'Sri is a good player',
    'Ravi is a mentor',
    'Kumar is a cricketer'
]
values = [
    'sri',
    'NaN',
    'sri, is',
    'kumar,cricketer',
]

names = pd.Series(names)
values = pd.DataFrame(values, columns=['values'])

def foo(words):
    names_copy = names.copy()

    for word in words.split(','):
        names_copy = names_copy[names_copy.str.contains(word, case=False)]

    return names_copy.values

 values['names'] = values['values'].map(foo)
 values


    values          names
0   sri             [Sri is a good player]
1   NaN             []
2   sri, is         [Sri is a good player]
3   kumar,cricketer [Kumar is a cricketer]