I have a bit of an odd problem that I am trying to solve in pandas. Let's say I have a bunch of objects that have different ways to group them. Here is what our dataframe look like:
df=pd.DataFrame([
{'obj': 'Ball', 'group1_id': None, 'group2_id': '7' },
{'obj': 'Balloon', 'group1_id': '92', 'group2_id': '7' },
{'obj': 'Person', 'group1_id': '14', 'group2_id': '11'},
{'obj': 'Bottle', 'group1_id': '3', 'group2_id': '7' },
{'obj': 'Thought', 'group1_id': '3', 'group2_id': None},
])
obj group1_id group2_id
Ball None 7
Balloon 92 7
Person 14 11
Bottle 3 7
Thought 3 None
I want to group things together based on any of the groups. Here it is annotated:
obj group1_id group2_id # annotated
Ball None 7 # group2_id = 7
Balloon 92 7 # group1_id = 92 OR group2_id = 7
Person 14 11 # group1_id = 14 OR group2_id = 11
Bottle 3 7 # group1_id = 3 OR group2_id = 7
Thought 3 None # group1_id = 3
When combined, our output should look like this:
count objs composite_id
4 [Ball, Balloon, Bottle, Thought] g1=3,92|g2=7
1 [Person] g1=11|g2=14
Notice that the first three objects we can get based on group2_id=7
and then the fourth one, Thought
, is because it can match with another item via group1_id=3
that assigns it the group_id=7
id. Note: for this question assume an item will only ever be in one combined group (and there will never be conditions where it could possibly be in two groups).
How could I do this in pandas
?