0

I have a pandas dataframe df1 which has an index, and a column of lists and looks like:

index   IDList
0   [1,3,5,7]
1   [2,4,5,8]
2   [6,8,9]
3   [1,2]

I have another pandas dataframe df2 which has NewID as the index, and a column of lists which looks like this:

NewID   IDList
1       [3]
2       [4,5]
3       [1,7]
4       [2]
5       [9,3]
6       [8]
7       [6]

What I need to do is if any of the items in df1.IDList exist in df2.IDList, then return a list of the relevant df2.NewID.

So the returned d1 dataframe would look like:

index   IDList      NewID
0       [1,3,5,7]   [3,1,2,3,5]
1       [2,4,5,8]   [4,2,2,6]
2       [6,8,9]     [7,6,5]
3       [1,2]       [3,4]

EDIT: Note that in df2 there can be ID in IDList that occurs in multiple rows (see ID 3 from df1.IDList and where ID 3 shows up in df2 rows 1 AND 5)

I was thinking some kind of np.where statement which incorporates 'any' and a list comprehension? but uncertain how to apply for each IDList in df1 where it looks at the whole of df2.IDList. Maybe some kind of .stack()? or .melt()? This would be easy in a spreadsheet with a vlookup of df2...

Help appreciated...

clg4
  • 2,863
  • 6
  • 27
  • 32

1 Answers1

1
# expand and map ids from IDList to NewID
flat_ids = pd.DataFrame({
    "NewID": pd.np.repeat(df2.NewID, df2.IDList.str.len().tolist()),
    "IDList": [x for l in df2.IDList for x in l]
}).set_index("IDList").NewID

# extract ids from flat ids using loc
df1['NewID'] = df1['IDList'].map(lambda x: flat_ids.loc[x].tolist())

enter image description here

Psidom
  • 209,562
  • 33
  • 339
  • 356