1
df1=pd.DataFrame({'Product_ID':["55165","80125,30215","55557","92361","32619,28965,20147","88722","82793","70809, 20201","11367"],
          'Product': ["ABC",'FDA','FSD','JGH','TYE','BVC','LKJ','HJD','POI'],
         'Country':['CN','US','GB','AG','MX','CA','DE','CA','SG']})

df2=pd.DataFrame({'Deal_ID':[70809,88722,82793,20201,55165,30215,11367]})

The df and the list (deal_id) are provided above. enter image description here

I'd like to return the country information and the product_id adding to df2. I tried to use the join function but Product_ID of df1 is not numeric. Are there any solutions?

Thanks in advance for the help.

Boomshakalaka
  • 521
  • 1
  • 6
  • 19

1 Answers1

4

You can do this in a couple of steps:-

1. Flatten your aggregated dataframe

Chain and repeat series as appropriate, remembering to convert from str to int:

from itertools import chain
import numpy as np

split = df1['Product_ID'].str.split(',')
lens = split.map(len)

df1 = pd.DataFrame({'Country': np.repeat(df1['Country'], lens),
                    'Product': np.repeat(df1['Product'], lens),
                    'Deal_ID': list(map(int, chain.from_iterable(split)))})

2. Merge with dataframe containing selected Deal_IDs

df2 = df2.merge(df1)

print(df2)

   Deal_ID Country Product
0    70809      CA     HJD
1    88722      CA     BVC
2    82793      DE     LKJ
3    20201      CA     HJD
4    55165      CN     ABC
5    30215      US     FDA
6    11367      SG     POI
jpp
  • 159,742
  • 34
  • 281
  • 339