I have a pandas dataframe df1
which has an index, and a column of lists and looks like:
index IDList
0 [1,3,5,7]
1 [2,4,5,8]
2 [6,8,9]
3 [1,2]
I have another pandas dataframe df2
which has NewID as the index, and a column of lists which looks like this:
NewID IDList
1 [3]
2 [4,5]
3 [1,7]
4 [2]
5 [9,3]
6 [8]
7 [6]
What I need to do is if any of the items in df1.IDList
exist in df2.IDList
, then return a list of the relevant df2.NewID
.
So the returned d1
dataframe would look like:
index IDList NewID
0 [1,3,5,7] [3,1,2,3,5]
1 [2,4,5,8] [4,2,2,6]
2 [6,8,9] [7,6,5]
3 [1,2] [3,4]
EDIT: Note that in df2
there can be ID in IDList that occurs in multiple rows (see ID 3 from df1.IDList
and where ID 3 shows up in df2
rows 1 AND 5)
I was thinking some kind of np.where
statement which incorporates 'any
' and a list comprehension? but uncertain how to apply for each IDList
in df1
where it looks at the whole of df2.IDList
. Maybe some kind of .stack()
? or .melt()
? This would be easy in a spreadsheet with a vlookup of df2...
Help appreciated...