Trying to assign IDs to pairs in a pandas DataFrame, getting inconsistent results

Question

I have a df:

import pandas as pd
df = pd.DataFrame({'src': ['LV', 'LA', 'NC', 'NY', 'ABC', 'XYZ'], 'dest': ['NC', 'NY', 'LV', 'LA', 'XYZ', 'ABC'], 'dummy': [1, 3, 6, 7, 8, 10]})
src   dest   dummy
LV      NC       1
LA      NY       3
NC      LV       6
NY      LA       7
ABC     XYZ      8
XYZ     ABC     10

I run it through:

df['pair'] = df[['src', 'dest']].apply(lambda x : tuple(set(x)), 1).factorize()[0] + 1

to try to give both directions of a pair the same key, e.g. (a->b, b->a).

I correctly end up with this:

src   dest   dummy  pair
LV      NC       1     1
LA      NY       3     2
NC      LV       6     1
NY      LA       7     2
ABC     XYZ      8     3
XYZ     ABC     10     3

However, sometimes when I run it I end up incorrectly with this:

src   dest   dummy  pair
LV      NC       1     1
LA      NY       3     2
NC      LV       6     1
NY      LA       7     2
ABC     XYZ      8     3
XYZ     ABC     10     4

As you can see, the last row is not keyed to pair 3 as it should be. This happens randomly from run to run. I can reproduce it by commenting out the pairing code, running the script to build and print the df, then uncommenting the code and running again. You may be able to reproduce it with other modifications as well.

How can I fix this non-deterministic behavior?

score 1 · Answer 1 · answered Oct 27 '20 at 16:27

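The problem is with set: its iteration order is not guaranteed, so tuple(set(x)) can give ('ABC', 'XYZ') for one row and ('XYZ', 'ABC') for the other. You can change it to frozenset, or sort each pair first: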

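import numpy as np
# Sort each (src, dest) row so both directions give the same tuple, then factorize.
df['pair'] = pd.DataFrame(np.sort(df[['src', 'dest']].values, 1)).agg(tuple, 1).factorize()[0] + 1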
Out[108]: array([1, 2, 1, 2, 3, 3], dtype=int64)
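
The same idea can be sketched without NumPy, assuming the only requirement is that (a, b) and (b, a) share one ID: sorting each pair gives a canonical tuple whose order does not depend on set iteration. The helper name pair_ids and its a/b parameters are hypothetical, not part of the original answer.

```python
import pandas as pd


def pair_ids(df, a="src", b="dest"):
    """Return 1-based IDs that are equal for (a, b) and (b, a) rows.

    Hypothetical helper for illustration only.
    """
    # sorted() fixes the element order, unlike tuple(set(...)).
    keys = pd.Series(
        [tuple(sorted(p)) for p in zip(df[a], df[b])],
        index=df.index,
    )
    # factorize() numbers the distinct keys in first-seen order, starting at 0.
    return keys.factorize()[0] + 1


df = pd.DataFrame({
    "src": ["LV", "LA", "NC", "NY", "ABC", "XYZ"],
    "dest": ["NC", "NY", "LV", "LA", "XYZ", "ABC"],
    "dummy": [1, 3, 6, 7, 8, 10],
})
df["pair"] = pair_ids(df)
print(df["pair"].tolist())
```

Because the key no longer depends on hash order, the result should be the same on every run.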

BENY

Thanks! Where exactly is frozenset being used in the code above? – reeeeeeeeeeee Oct 27 '20 at 16:31

@reeeeeeeeeeee I meant as a fix to your own code: `df['pair'] = df[['src', 'dest']].apply(lambda x : tuple(frozenset(x)), 1).factorize()[0] + 1` – BENY Oct 27 '20 at 16:34
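
One caveat worth checking, offered as a sketch rather than a definitive diagnosis: a frozenset iterates in the same hash-dependent order as a set, and string hashes are randomized per interpreter run unless PYTHONHASHSEED is set, so converting back with tuple() may still yield different tuples for (a, b) and (b, a). Two frozensets with the same elements always compare and hash equal, so using the frozenset itself as the key sidesteps the ordering question. The helper pair_key below is hypothetical, and plain Python is used so the idea is visible without pandas.

```python
def pair_key(src, dest):
    """Order-independent key for a pair (hypothetical helper)."""
    # No tuple() conversion: frozenset equality and hashing ignore element order.
    return frozenset((src, dest))


rows = [("LV", "NC"), ("LA", "NY"), ("NC", "LV"),
        ("NY", "LA"), ("ABC", "XYZ"), ("XYZ", "ABC")]

# Assign IDs in first-seen order, similar to factorize()[0] + 1.
ids = {}
pair = [ids.setdefault(pair_key(s, d), len(ids) + 1) for s, d in rows]
print(pair)
```

Here both directions of each pair map to the same dictionary entry, whatever order the elements were inserted in.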
