Pandas hierarchical pivot get column with max

Question

df.head().info()

RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
id    5 non-null object
date-hr               5 non-null object
channel           5 non-null object
hr                5 non-null int64
dtypes: int64(1), object(3)

The actual date-hr values look like:

'2017-02-14--15'

id is a string.

I have a DataFrame like this:

User-ID | Date-hr | Channel | Hr

U1       D1-10      C1        10
U1       D1-11      C2        11
U1       D1-10      C1        10
U1       D1-10      C3        10
U1       D1-10      C1        10
U1       D1-11      C3        11
U1       D1-11      C2        11

..

When I apply a pivot operation with user-id as the index and

['date-hr', 'channel']

as the columns, using count as the aggregation function, I get one row for every user, with date-hr as the top column level and all channels listed under each date-hr value, like:

    D1-10     D1-11 .....

    C1  C3    C2 C3 .....

U1  3    1    2   1 .....
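
For reference, a minimal sketch of how the pivot above could be reproduced. It assumes the column names id, date-hr, channel and hr from the info() output and uses only the sample rows from the table; the variable name pivot is just for illustration.

```python
import pandas as pd

# Sample rows from the table above; column names follow the info() output
df = pd.DataFrame({
    "id": ["U1"] * 7,
    "date-hr": ["D1-10", "D1-11", "D1-10", "D1-10", "D1-10", "D1-11", "D1-11"],
    "channel": ["C1", "C2", "C1", "C3", "C1", "C3", "C2"],
    "hr": [10, 11, 10, 10, 10, 11, 11],
})

# One row per id; columns form a two-level MultiIndex (date-hr, channel)
pivot = df.pivot_table(
    index="id",
    columns=["date-hr", "channel"],
    values="hr",
    aggfunc="count",
)
print(pivot)
```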

What I need now is, for every 'date-hr', the channel with the highest count, together with that count:

    D1-10   D1-11 .....

    C1      C2    .....

U1  (C1,3)  (C2,2) .....

I can't figure out how to get this transformation from my data.

Is it possible to omit the first level of the MultiIndex in the columns of the output? — jezrael, May 22 '18 at 08:32
After df.pivot_table(index=['id'],columns=['date-hr','channel'], margins=True, aggfunc='count'), omit first level? — Nikhil Verma, May 22 '18 at 10:00

jezrael · Accepted Answer · 2018-05-22T11:00:18.510

You can create a custom function:

print (df)
  User-ID Date-hr Channel  Hr
0      U1   D1-10      C1  10
1      U1   D1-11      C2  11
2      U1   D1-10      C1  10
3      U1   D1-10      C3  10
4      U2   D1-10      C1  10
5      U2   D1-11      C3  11
6      U2   D1-11      C2  11
7      U4   D7-11      C2  11

df = df.groupby(['User-ID','Date-hr', 'Channel'])['Hr'].count().unstack([1,2], fill_value=0)
print (df)
Date-hr D1-10    D1-11    D7-11
Channel    C1 C3    C2 C3    C2
User-ID                        
U1          2  1     1  0     0
U2          1  0     1  1     0
U4          0  0     0  0     1

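def f(x):
    # x holds the columns of one Date-hr group; labels are (Date-hr, Channel) tuples
    c = x.idxmax(axis=1).str[1]   # channel with the highest count in each row
    m = x.max(axis=1)             # the highest count itself
    # pair channel and count row by row, keeping the User-ID index
    s = pd.Series(list(zip(c, m)), index=x.index)
    return s

# apply f to each block of columns that share the same Date-hr (column level 0)
df = df.groupby(axis=1, level=0).apply(f)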
print (df)
Date-hr    D1-10    D1-11    D7-11
User-ID                           
U1       (C1, 2)  (C2, 1)  (C2, 0)
U2       (C1, 1)  (C2, 1)  (C2, 0)
U4       (C1, 0)  (C2, 0)  (C2, 1)
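
The answer was tested on pandas `0.23.0`. In recent pandas releases (2.1 and later, as far as I know) `groupby(axis=1)` is deprecated, so here is a hedged sketch of the same idea without it. It loops over the first column level and builds the result column by column; the names `counts`, `top_channel` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample data as printed above
df = pd.DataFrame({
    "User-ID": ["U1", "U1", "U1", "U1", "U2", "U2", "U2", "U4"],
    "Date-hr": ["D1-10", "D1-11", "D1-10", "D1-10", "D1-10", "D1-11", "D1-11", "D7-11"],
    "Channel": ["C1", "C2", "C1", "C3", "C1", "C3", "C2", "C2"],
    "Hr": [10, 11, 10, 10, 10, 11, 11, 11],
})

# Count rows per (User-ID, Date-hr, Channel), then move Date-hr and Channel into the columns
counts = (
    df.groupby(["User-ID", "Date-hr", "Channel"])["Hr"]
      .count()
      .unstack([1, 2], fill_value=0)
)

def top_channel(block):
    # block: the columns of one Date-hr value, labelled by Channel only
    return pd.Series(
        list(zip(block.idxmax(axis=1), block.max(axis=1))),
        index=block.index,
    )

# counts[d] selects one Date-hr and drops that level, leaving Channel as the column labels
result = pd.DataFrame({
    d: top_channel(counts[d])
    for d in counts.columns.get_level_values(0).unique()
})
print(result)
```

As in the answer, a Date-hr in which a user has no rows still gets an entry, with a count of 0 and whichever channel comes first in that block.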

edited May 22 '18 at 11:00

answered May 22 '18 at 08:07


@NikhilVerma - I tested it in the latest pandas version, `0.23.0`; is it possible to upgrade? – jezrael May 22 '18 at 10:02
Okay, I tried the same thing after updating pandas, same issue – Nikhil Verma May 22 '18 at 10:13
I have updated the question with info about my initial df. Is there any issue in it? – Nikhil Verma May 22 '18 at 10:17
@NikhilVerma - is the data confidential? – jezrael May 22 '18 at 10:39
I found a possible problem: all `NaN`s in some group. – jezrael May 22 '18 at 10:48
Yes, I just thought of that, I will try to fillna with 0 – Nikhil Verma May 22 '18 at 10:49
Where should I write `df = df.fillna(0)`? I tried it after the first group by and got the same error – Nikhil Verma May 22 '18 at 10:54
There is another problem: some values are omitted. I found a problem with `s = list(zip(c, m))` – jezrael May 22 '18 at 10:56
Oops, my solution was wrong; please check the edited answer. – jezrael May 22 '18 at 11:00
@NikhilVerma - Super :) – jezrael May 22 '18 at 11:14
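
The comments above point to groups that are entirely `NaN` as a possible cause of the error, but they do not show where the fill ended up. One place it could go, offered as a sketch rather than the confirmed fix, is the pivot itself: `pivot_table` accepts a `fill_value` argument, so combinations a user never produced become 0 before any max is taken. The column names and sample rows below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample: each user is missing most (date-hr, channel) combinations
df = pd.DataFrame({
    "id": ["U1", "U1", "U2", "U4"],
    "date-hr": ["D1-10", "D1-10", "D1-11", "D7-11"],
    "channel": ["C1", "C3", "C2", "C2"],
    "hr": [10, 10, 11, 11],
})

# Without fill_value, combinations that never occur for a user are NaN
sparse = df.pivot_table(
    index="id", columns=["date-hr", "channel"], values="hr", aggfunc="count"
)

# With fill_value=0, the same cells hold 0 instead
filled = df.pivot_table(
    index="id", columns=["date-hr", "channel"], values="hr",
    aggfunc="count", fill_value=0,
)

print(sparse.isna().any().any(), filled.isna().any().any())
```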
