Let's say I have a pandas dataframe:
   rid category
0    0       c2
1    1       c3
2    2       c2
3    3       c3
4    4       c2
5    5       c2
6    6       c1
7    7       c3
8    8       c1
9    9       c3
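To make the question reproducible, the frame above can be rebuilt as follows. This is a small sketch that assumes the default integer index, so rid happens to equal the row label.

```python
import pandas as pd

# Rebuild the example frame; rid happens to equal the row index here
df = pd.DataFrame({
    "rid": range(10),
    "category": ["c2", "c3", "c2", "c3", "c2", "c2", "c1", "c3", "c1", "c3"],
})
print(df)
```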
I want to add two columns, pid and nid, such that for each row pid contains a random id (other than rid) that belongs to the same category as rid, and nid contains a random id that belongs to a different category from rid.
An example result would be:
   rid category  pid  nid
0    0       c2    2    1
1    1       c3    7    4
2    2       c2    0    1
3    3       c3    1    5
4    4       c2    5    7
5    5       c2    4    6
6    6       c1    8    5
7    7       c3    9    8
8    8       c1    6    2
9    9       c3    1    2
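The rules can also be written down as a check, which helps when testing any faster approach. The helper below is a sketch; check_pairs is a hypothetical name, and it assumes rid values are unique. It confirms that the example result above satisfies all three rules.

```python
import pandas as pd

def check_pairs(df: pd.DataFrame) -> bool:
    """Return True if every row meets the pid/nid rules (assumes rid is unique)."""
    # Map any id back to its category
    cat_of = df.set_index("rid")["category"]
    pid_cat = df["pid"].map(cat_of)
    nid_cat = df["nid"].map(cat_of)
    # pid: same category as the row, but not the row's own id
    pid_ok = (pid_cat == df["category"]) & (df["pid"] != df["rid"])
    # nid: a valid id (notna) whose category differs from the row's
    nid_ok = nid_cat.notna() & (nid_cat != df["category"])
    return bool(pid_ok.all() and nid_ok.all())

# The example result from the question
example = pd.DataFrame({
    "rid": range(10),
    "category": ["c2", "c3", "c2", "c3", "c2", "c2", "c1", "c3", "c1", "c3"],
    "pid": [2, 7, 0, 1, 5, 4, 8, 9, 6, 1],
    "nid": [1, 4, 1, 5, 7, 6, 5, 8, 2, 2],
})
print(check_pairs(example))  # True
```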
Note that pid should not be the same as rid. Right now, I am just brute-forcing it by iterating through the rows and sampling each time, which seems very inefficient.
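The loop itself is not shown, so the following is only a guess at what a row-by-row version might look like (brute_force is a hypothetical name). It is useful as a baseline: each row rescans the whole frame with boolean masks, which makes the total cost roughly O(N^2).

```python
import numpy as np
import pandas as pd

def brute_force(df: pd.DataFrame, rng: np.random.Generator) -> pd.DataFrame:
    """Assumed row-by-row sampling: every row rescans the whole frame (O(N^2))."""
    rids = df["rid"].to_numpy()
    cats = df["category"].to_numpy()
    pids, nids = [], []
    for rid, cat in zip(rids, cats):
        # Same category, excluding the row's own id
        same = rids[(cats == cat) & (rids != rid)]
        # Any id from a different category
        other = rids[cats != cat]
        pids.append(rng.choice(same))
        nids.append(rng.choice(other))
    return df.assign(pid=pids, nid=nids)

df = pd.DataFrame({
    "rid": range(10),
    "category": ["c2", "c3", "c2", "c3", "c2", "c2", "c1", "c3", "c1", "c3"],
})
print(brute_force(df, np.random.default_rng(0)))
```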
Is there a better way to do this?
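One way to drop the per-row loop for nid is sketched below, assuming there are at least two categories and no missing category values. The idea: sort the ids so each category forms one contiguous block, draw a random position among the rows outside the row's own block, and shift that draw past the block. The helper name sample_nid is hypothetical; pd.factorize, np.bincount and Generator.integers (which accepts an array for its upper bound) are standard pandas/NumPy calls.

```python
import numpy as np
import pandas as pd

def sample_nid(df: pd.DataFrame, rng: np.random.Generator) -> np.ndarray:
    """For every row, draw a uniformly random rid from a different category."""
    # Assumes no NaN categories (factorize would code them as -1)
    codes, _ = pd.factorize(df["category"], sort=True)
    order = np.argsort(codes, kind="stable")
    sorted_rid = df["rid"].to_numpy()[order]  # each category is now one contiguous block
    counts = np.bincount(codes)
    starts = np.cumsum(counts) - counts       # first position of each block
    n = counts[codes]                         # size of this row's own block
    s = starts[codes]                         # start of this row's own block
    # Pick one of the len(df) - n positions outside the block, then skip over the block
    j = rng.integers(0, len(df) - n)
    j = np.where(j >= s, j + n, j)
    return sorted_rid[j]

df = pd.DataFrame({
    "rid": range(10),
    "category": ["c2", "c3", "c2", "c3", "c2", "c2", "c1", "c3", "c1", "c3"],
})
df["nid"] = sample_nid(df, np.random.default_rng(0))
print(df)
```

If there is only one category, `rng.integers(0, 0)` raises a `ValueError`, which is the right outcome since no valid nid exists.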
EDIT 1: For simplicity, let us assume that each category is represented at least twice, so that at least one id that is not rid but has the same category can always be found.
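With the EDIT 1 guarantee, pid can be drawn with the same block layout: inside a row's own block of size n, step forward by a random nonzero offset k in [1, n-1], wrapping around. Because k is never 0, the row cannot pick itself, and each other member of the block corresponds to exactly one k, so the draw is uniform. This is a sketch under that assumption, not a library routine; sample_pid is a hypothetical name, and it raises a `ValueError` if any category has only one row.

```python
import numpy as np
import pandas as pd

def sample_pid(df: pd.DataFrame, rng: np.random.Generator) -> np.ndarray:
    """For every row, draw a uniformly random *other* rid from the same category.

    Relies on EDIT 1: every category must have at least two rows.
    """
    codes, _ = pd.factorize(df["category"], sort=True)
    order = np.argsort(codes, kind="stable")
    sorted_rid = df["rid"].to_numpy()[order]   # categories become contiguous blocks
    counts = np.bincount(codes)
    starts = np.cumsum(counts) - counts
    n = counts[codes]                           # own block size (>= 2 by assumption)
    s = starts[codes]                           # own block start
    # Position of each original row inside the sorted array
    pos = np.empty(len(df), dtype=np.intp)
    pos[order] = np.arange(len(df))
    # Nonzero offset k in [1, n - 1]: never lands back on the row itself
    k = rng.integers(1, n)
    return sorted_rid[s + (pos - s + k) % n]

df = pd.DataFrame({
    "rid": range(10),
    "category": ["c2", "c3", "c2", "c3", "c2", "c2", "c1", "c3", "c1", "c3"],
})
df["pid"] = sample_pid(df, np.random.default_rng(0))
print(df)
```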
EDIT 2: For further simplicity, let us assume that in a large dataframe the probability of ending up with the same id as rid is zero. If that is the case, I believe the solution should be easier. I would prefer not to make this assumption, though.
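If self-matches really never happened, pid could be drawn in one pass by sampling with replacement inside each category group. The sketch below does that and then, to avoid relying on EDIT 2, redraws only the rows that happened to pick themselves (rejection sampling), which keeps the draw uniform over the other members. It still needs EDIT 1 to terminate; sample_pid_simple is a hypothetical name and max_rounds is an arbitrary safety cap.

```python
import numpy as np
import pandas as pd

def sample_pid_simple(df: pd.DataFrame, rng: np.random.Generator,
                      max_rounds: int = 100) -> pd.Series:
    """Same-category draw with replacement, then redraw any self-matches."""
    def draw() -> pd.Series:
        # For each group, pick len(group) ids (with replacement) from that group
        return df.groupby("category")["rid"].transform(
            lambda s: rng.choice(s.to_numpy(), size=len(s)))

    pid = draw()
    for _ in range(max_rounds):
        clash = pid == df["rid"]
        if not clash.any():
            return pid
        # Keep the good draws and replace only the clashing ones
        pid = pid.where(~clash, draw())
    raise RuntimeError("self-matches remain; does every category have >= 2 rows?")

df = pd.DataFrame({
    "rid": range(10),
    "category": ["c2", "c3", "c2", "c3", "c2", "c2", "c1", "c3", "c1", "c3"],
})
df["pid"] = sample_pid_simple(df, np.random.default_rng(0))
print(df)
```

Each round costs one groupby pass over the whole frame. Since a row collides with itself with probability 1/n in a group of size n, few rows are redrawn when groups are large.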