I have a df:
import pandas as pd
df = pd.DataFrame({'src':['LV','LA','NC','NY','ABC','XYZ'], 'dest':['NC','NY','LV','LA','XYZ','ABC'], 'dummy':[1,3,6,7,8,10]})
src dest dummy
LV NC 1
LA NY 3
NC LV 6
NY LA 7
ABC XYZ 8
XYZ ABC 10
I run it through:
df['pair'] = df[['src', 'dest']].apply(lambda x : tuple(set(x)), 1).factorize()[0] + 1
to give both directions of a pair (a->b and b->a) the same key.
Most of the time I get the correct result:
src dest dummy pair
LV NC 1 1
LA NY 3 2
NC LV 6 1
NY LA 7 2
ABC XYZ 8 3
XYZ ABC 10 3
However, sometimes the same code produces this incorrect result:
src dest dummy pair
LV NC 1 1
LA NY 3 2
NC LV 6 1
NY LA 7 2
ABC XYZ 8 3
XYZ ABC 10 4
As you can see, the last row gets pair 4 instead of pair 3. This happens at random. I can reproduce it by commenting out the pairing code, running the script so that it only builds and prints the df, then uncommenting the line and running it again. Other modifications to the script may trigger it as well.
How can I fix this non-deterministic behavior?
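My guess (an assumption I have not confirmed) is that tuple(set(x)) is the culprit: a set does not guarantee the order of its elements, so the rows ABC/XYZ and XYZ/ABC could come out as two different tuples. Here is a minimal sketch of the order-independent key I am after, in plain Python without pandas. The names rows and pair_ids are made up for illustration:

```python
# Hypothetical stand-in for the src/dest columns of the df above.
rows = [('LV', 'NC'), ('LA', 'NY'), ('NC', 'LV'),
        ('NY', 'LA'), ('ABC', 'XYZ'), ('XYZ', 'ABC')]

def pair_ids(pairs):
    """Number unordered pairs from 1, in order of first appearance."""
    seen = {}  # canonical (sorted) pair -> id
    ids = []
    for src, dest in pairs:
        # Sorting gives the same key for (a, b) and (b, a).
        key = tuple(sorted((src, dest)))
        if key not in seen:
            seen[key] = len(seen) + 1
        ids.append(seen[key])
    return ids

print(pair_ids(rows))  # [1, 2, 1, 2, 3, 3]
```

If that guess is right, the corresponding change in the pandas line would presumably be tuple(sorted(x)) in place of tuple(set(x)).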