0

I'm using NBA play by play data that has player ID numbers for each defensive player and each offensive player. I'd like to add a column for each lineup combination, so a deflinid and offlinid.

Here's the code for an example of the dataset:

df = pd.DataFrame(np.array([[1,2,3,4,5,11,12,13,14,15,5,5],[1,2,3,4,6,11,12,13,14,15,4,4],[2,3,4,5,6,11,12,13,14,15,3,5],[11,12,13,14,15,1,2,3,4,5,5,5],[11,12,13,14,15,1,2,3,4,6,10,10],[11,12,13,14,16,2,3,4,5,6,5,5]]),columns=['offplayer1','offplayer2','offplayer3','offplayer4','offplayer5','defplayer1','defplayer2','defplayer3','defplayer4','defplayer5','possessions','points'])

Then from there, I'd like to create columns with lineup IDs for each unique 5-player ID combination.

Here's an example of the 2 columns I'd like to be generated and added to df based on the example df above:

offlinid  deflinid
       1         4
       2         4
       3         4
       4         1
       4         2
       5         3

Thanks in advance!

  • Make a column with hashes, see here https://stackoverflow.com/questions/25757042/create-hash-value-for-each-row-of-data-with-selected-columns-in-dataframe-in-pyt – Oleg O Dec 10 '19 at 00:48
  • where do we get the number for playerID combination? for example why is the the defence combination 11,12,13,14,15 is 4 & not 1? – moys Dec 10 '19 at 01:59
  • @moys Because the lineup ID "1" is already assigned to the lineup combination of "1,2,3,4,5" Ultimately the order of the lineup IDs don't matter, just that a unique number is assigned to each unique combination... Offense and defense lineup IDs should be the same for the same player combinations. – RobbieBeats Dec 10 '19 at 02:09
  • Does this answer your question? [In Pandas, how to create a unique ID based on the combination of many columns?](https://stackoverflow.com/questions/36646923/in-pandas-how-to-create-a-unique-id-based-on-the-combination-of-many-columns) – Evan Dec 10 '19 at 22:45

1 Answers1

1

Using pd.concat to stack offplayerX columns on top of defplayerX columns. Next, agg every row to tuples and call rank and unstack

offcols = ['offplayer1', 'offplayer2', 'offplayer3', 'offplayer4', 'offplayer5']
defcols = ['defplayer1', 'defplayer2', 'defplayer3', 'defplayer4', 'defplayer5']

df1 = pd.concat([df[offcols], df[defcols].rename(columns=dict(zip(defcols, offcols)))], 
                 keys=['offlinid',  'deflinid'])

df_final = df1.agg(tuple, axis=1).rank(method='dense').unstack(0)

Out[92]:
   offlinid  deflinid
0       1.0       4.0
1       2.0       4.0
2       3.0       4.0
3       4.0       1.0
4       4.0       2.0
5       5.0       3.0
Andy L.
  • 24,909
  • 4
  • 17
  • 29