Pandas time re-sampling categorical data from a column with calculations from another numerical column

Question

I have a DataFrame with a categorical column and a numerical column, with the index set to datetime values:

import pandas as pd

df = pd.DataFrame({
    'date': [
        '2013-03-01', '2013-03-02',
        '2013-03-01', '2013-03-02',
        '2013-03-01', '2013-03-02'
    ],
    'Kind': [
        'A', 'B', 'A', 'B', 'B', 'B'
    ],
    'Values': [1, 1.5, 2, 3, 5, 3]
})

# Parse the strings as datetimes and use them as the index
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

The above code gives:

           Kind  Values
date
2013-03-01    A     1.0
2013-03-02    B     1.5
2013-03-01    A     2.0
2013-03-02    B     3.0
2013-03-01    B     5.0
2013-03-02    B     3.0

My aim is to achieve the below data-frame:


         A_count   B_count  A_Val max   B_Val max
date                
2013-03-01   2         1        2             5
2013-03-02   0         3        0             3

which also has the time as its index. Note that if we use

data = pd.DataFrame(df.resample('D')['Kind'].value_counts())

we get:

    Kind
date    Kind    
2013-03-01  A   2
            B   1
2013-03-02  B   3

Welcome to Stack Overflow. Stack Overflow is not a code-writing service. Please always add what you have tried yourself. – cronoik, May 29 '19 at 12:53

jezrael · Accepted Answer · 2019-05-29T13:01:09.930


Use DataFrame.pivot_table, then flatten the resulting MultiIndex columns with a list comprehension:

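import pandas as pd

df = pd.DataFrame({
    'date': [
        '2013-03-01', '2013-03-02',
        '2013-03-01', '2013-03-02',
        '2013-03-01', '2013-03-02'
    ],
    'Kind': [
        'A', 'B', 'A', 'B', 'B', 'B'
    ],
    'Values': [1, 1.5, 2, 3, 5, 3]
})

df['date'] = pd.to_datetime(df['date'])

# Setting the index is optional here: pivot_table accepts 'date' as a regular column
#df = df.set_index('date')

# Count and max of Values per date and Kind; the columns become a (function, Kind) MultiIndex
df = df.pivot_table(index='date', columns='Kind', values='Values', aggfunc=['count','max'])
# Flatten the MultiIndex columns into names such as 'A_count'
df.columns = [f'{b}_{a}' for a, b in df.columns]
print(df)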
            A_count  B_count  A_max  B_max
date                                      
2013-03-01      2.0      1.0    2.0    5.0
2013-03-02      NaN      3.0    NaN    3.0
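
The output above has NaN where a Kind does not occur on a date, whereas the desired frame in the question shows 0. A minimal sketch of one way to close that gap, assuming 0 is an acceptable placeholder for a missing max (the names wide and count_cols are introduced here only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2013-03-01', '2013-03-02', '2013-03-01',
             '2013-03-02', '2013-03-01', '2013-03-02'],
    'Kind': ['A', 'B', 'A', 'B', 'B', 'B'],
    'Values': [1, 1.5, 2, 3, 5, 3],
})
df['date'] = pd.to_datetime(df['date'])

# Same pivot as above: count and max of Values per date and Kind
wide = df.pivot_table(index='date', columns='Kind', values='Values',
                      aggfunc=['count', 'max'])
wide.columns = [f'{kind}_{func}' for func, kind in wide.columns]

# Assumption: a (date, Kind) pair with no rows should read as 0, not NaN
wide = wide.fillna(0)

# Counts are whole numbers, so cast only the *_count columns back to int
count_cols = [c for c in wide.columns if c.endswith('_count')]
wide = wide.astype({c: int for c in count_cols})
print(wide)
```

With this, A_count and A_max for 2013-03-02 become 0 instead of NaN. If a missing max should stay distinguishable from a real 0, apply fillna only to the count columns.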

Another solution uses Grouper to group by day, starting again from the original df (before the pivot_table call):

df = df.set_index('date')

df = df.groupby([pd.Grouper(freq='D'), 'Kind'])['Values'].agg(['count','max']).unstack()
df.columns = [f'{b}_{a}' for a, b in df.columns]
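
For reference, a self-contained sketch of the Grouper variant, run from the original data. It passes fill_value=0 to unstack, which is an addition not in the snippet above, so that the missing (date, Kind) pair is filled with 0 at reshape time; out is a hypothetical variable name.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2013-03-01', '2013-03-02', '2013-03-01',
             '2013-03-02', '2013-03-01', '2013-03-02'],
    'Kind': ['A', 'B', 'A', 'B', 'B', 'B'],
    'Values': [1, 1.5, 2, 3, 5, 3],
})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')  # Grouper(freq=...) without key= groups on the index

out = (
    df.groupby([pd.Grouper(freq='D'), 'Kind'])['Values']
      .agg(['count', 'max'])    # one row per (day, Kind) that actually occurs
      .unstack(fill_value=0)    # Kind moves to the columns; absent pairs become 0
)
# Flatten the (function, Kind) MultiIndex into names such as 'A_count'
out.columns = [f'{kind}_{func}' for func, kind in out.columns]
print(out)
```

Unlike pivot_table on a plain column, the freq argument also lets you switch to coarser bins (for example weekly) without changing the rest of the chain.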

edited May 29 '19 at 13:01

answered May 29 '19 at 12:54

jezrael


Thank you very much for your instant reply. Just made my day. – Fizmath May 29 '19 at 13:04
@Fizmath - You are welcome! Don't forget to accept the answer, if it suits you! :) – jezrael May 29 '19 at 13:06
I get an AttributeError: 'str' object has no attribute '__name__' at the line with the pivot when I try your solution. Do you have any idea why? – vlemaistre May 29 '19 at 13:06
@vlemaistre - What is your pandas version? – jezrael May 29 '19 at 13:07
My pandas version is 0.21.1 – vlemaistre May 29 '19 at 13:08
@vlemaistre - Try upgrading ;) – jezrael May 29 '19 at 13:09
