How can I add an id column based on unique combinations of other columns?

Question

I'm using NBA play by play data that has player ID numbers for each defensive player and each offensive player. I'd like to add a column for each lineup combination, so a deflinid and offlinid.

Here's the code for an example of the dataset:

df = pd.DataFrame(np.array([[1,2,3,4,5,11,12,13,14,15,5,5],[1,2,3,4,6,11,12,13,14,15,4,4],[2,3,4,5,6,11,12,13,14,15,3,5],[11,12,13,14,15,1,2,3,4,5,5,5],[11,12,13,14,15,1,2,3,4,6,10,10],[11,12,13,14,16,2,3,4,5,6,5,5]]),columns=['offplayer1','offplayer2','offplayer3','offplayer4','offplayer5','defplayer1','defplayer2','defplayer3','defplayer4','defplayer5','possessions','points'])

Then from there, I'd like to create columns with lineup IDs for each unique 5-player ID combination.

Here's an example of the 2 columns I'd like to be generated and added to df based on the example df above:

offlinid  deflinid
       1         4
       2         4
       3         4
       4         1
       4         2
       5         3

Thanks in advance!

Make a column with hashes, see here https://stackoverflow.com/questions/25757042/create-hash-value-for-each-row-of-data-with-selected-columns-in-dataframe-in-pyt — Oleg O, Dec 10 '19 at 00:48
where do we get the number for playerID combination? for example why is the the defence combination 11,12,13,14,15 is 4 & not 1? — moys, Dec 10 '19 at 01:59
@moys Because the lineup ID "1" is already assigned to the lineup combination of "1,2,3,4,5" Ultimately the order of the lineup IDs don't matter, just that a unique number is assigned to each unique combination... Offense and defense lineup IDs should be the same for the same player combinations. — RobbieBeats, Dec 10 '19 at 02:09
Does this answer your question? [In Pandas, how to create a unique ID based on the combination of many columns?](https://stackoverflow.com/questions/36646923/in-pandas-how-to-create-a-unique-id-based-on-the-combination-of-many-columns) — Evan, Dec 10 '19 at 22:45

score 1 · Accepted Answer · answered Dec 10 '19 at 02:09

Using pd.concat to stack offplayerX columns on top of defplayerX columns. Next, agg every row to tuples and call rank and unstack

offcols = ['offplayer1', 'offplayer2', 'offplayer3', 'offplayer4', 'offplayer5']
defcols = ['defplayer1', 'defplayer2', 'defplayer3', 'defplayer4', 'defplayer5']

df1 = pd.concat([df[offcols], df[defcols].rename(columns=dict(zip(defcols, offcols)))], 
                 keys=['offlinid',  'deflinid'])

df_final = df1.agg(tuple, axis=1).rank(method='dense').unstack(0)

Out[92]:
   offlinid  deflinid
0       1.0       4.0
1       2.0       4.0
2       3.0       4.0
3       4.0       1.0
4       4.0       2.0
5       5.0       3.0

How can I add an id column based on unique combinations of other columns?

1 Answers1