reshaping in pandas (python)

Question

I am trying to reshape this file with pivot and similar functions, but without any success.

Every user has two timestamps, one at 19h and one at 22h. Every user has 3 variables, and each variable has 8 INDEX values (0 to 7).

File sample:

DATA_STAMP;ID;VARIABLES;INDEX;VALUE
2018-02-19 19:15:25;6550692;count1;3;230993
2018-02-19 19:15:25;6550692;count1;2;0
2018-02-19 19:15:25;6550692;count1;1;34513980
2018-02-19 19:15:25;6550692;count1;0;1500517
2018-02-19 19:15:25;6550692;count1;7;0
2018-02-19 19:15:25;6550692;count1;6;14958246
2018-02-19 19:15:25;6550692;count1;5;0
2018-02-19 19:15:25;6550692;count1;4;156
2018-02-19 19:15:25;6549986;count1;3;5047
2018-02-19 19:15:25;6549986;count1;2;0
2018-02-19 19:15:25;6549986;count1;1;1016836
2018-02-19 19:15:25;6549986;count1;0;265705
2018-02-19 19:15:25;6549986;count1;7;0
2018-02-19 19:15:25;6549986;count1;6;18661246
2018-02-19 19:15:25;6549986;count1;5;0
2018-02-19 19:15:25;6549986;count1;4;0
2018-02-19 19:15:25;6549456;count1;7;0
2018-02-19 19:15:25;6549456;count1;5;164663
2018-02-19 19:15:25;6549456;count1;6;4640344
2018-02-19 19:15:25;6550692;count2;3;230993
2018-02-19 19:15:25;6550692;count2;2;0
2018-02-19 19:15:25;6550692;count2;1;34513980
2018-02-19 19:15:25;6550692;count2;0;1500517
2018-02-19 19:15:25;6550692;count2;7;0
2018-02-19 19:15:25;6550692;count2;6;14958246
2018-02-19 19:15:25;6550692;count2;5;0
2018-02-19 19:15:25;6550692;count2;4;156
2018-02-19 19:15:25;6549986;count2;3;5047
2018-02-19 19:15:25;6549986;count2;2;0
2018-02-19 19:15:25;6549986;count2;1;1016836
2018-02-19 19:15:25;6549986;count2;0;265705
2018-02-19 19:15:25;6549986;count2;7;0
2018-02-19 19:15:25;6549986;count2;6;18661246
2018-02-19 19:15:25;6549986;count2;5;0
2018-02-19 19:15:25;6549986;count2;4;0
2018-02-19 19:15:25;6549456;count2;7;0
2018-02-19 19:15:25;6549456;count2;5;164663
2018-02-19 19:15:25;6549456;count2;6;4640344
2018-02-19 19:15:25;6550692;count2;3;230993
2018-02-19 19:15:25;6550692;count3;2;0
2018-02-19 19:15:25;6550692;count3;1;34513980
2018-02-19 19:15:25;6550692;count3;0;1500517
2018-02-19 19:15:25;6550692;count3;7;0
2018-02-19 19:15:25;6550692;count3;6;14958246
2018-02-19 19:15:25;6550692;count3;5;0
2018-02-19 19:15:25;6550692;count3;4;156
2018-02-19 19:15:25;6549986;count3;3;5047
2018-02-19 19:15:25;6549986;count3;2;0
2018-02-19 19:15:25;6549986;count3;1;1016836
2018-02-19 19:15:25;6549986;count3;0;265705
2018-02-19 19:15:25;6549986;count3;7;0
2018-02-19 19:15:25;6549986;count3;6;18661246
2018-02-19 19:15:25;6549986;count3;5;0
2018-02-19 19:15:25;6549986;count3;4;0
2018-02-19 19:15:25;6549456;count3;7;0
2018-02-19 19:15:25;6549456;count3;5;164663
2018-02-19 19:15:25;6549456;count3;6;4640344
2018-02-19 22:15:25;6550692;count1;3;230993
2018-02-19 22:15:25;6550692;count1;2;0
2018-02-19 22:15:25;6550692;count1;1;34513980
2018-02-19 22:15:25;6550692;count1;0;1500517
2018-02-19 22:15:25;6550692;count1;7;0
2018-02-19 22:15:25;6550692;count1;6;14958246
2018-02-19 22:15:25;6550692;count1;5;0
2018-02-19 22:15:25;6550692;count1;4;156
2018-02-19 22:15:25;6549986;count1;3;5047
2018-02-19 22:15:25;6549986;count1;2;0
2018-02-19 22:15:25;6549986;count1;1;1016836
2018-02-19 22:15:25;6549986;count1;0;265705
2018-02-19 22:15:25;6549986;count1;7;0
2018-02-19 22:15:25;6549986;count1;6;18661246
2018-02-19 22:15:25;6549986;count1;5;0
2018-02-19 22:15:25;6549986;count1;4;0
2018-02-19 22:15:25;6549456;count1;7;0
2018-02-19 22:15:25;6549456;count1;5;164663
2018-02-19 22:15:25;6549456;count1;6;4640344
2018-02-19 22:15:25;6550692;count2;3;230993
2018-02-19 22:15:25;6550692;count2;2;0
2018-02-19 22:15:25;6550692;count2;1;34513980
2018-02-19 22:15:25;6550692;count2;0;1500517
2018-02-19 22:15:25;6550692;count2;7;0
2018-02-19 22:15:25;6550692;count2;6;14958246
2018-02-19 22:15:25;6550692;count2;5;0
2018-02-19 22:15:25;6550692;count2;4;156
2018-02-19 22:15:25;6549986;count2;3;5047
2018-02-19 22:15:25;6549986;count2;2;0
2018-02-19 22:15:25;6549986;count2;1;1016836
2018-02-19 22:15:25;6549986;count2;0;265705
2018-02-19 22:15:25;6549986;count2;7;0
2018-02-19 22:15:25;6549986;count2;6;18661246
2018-02-19 22:15:25;6549986;count2;5;0
2018-02-19 22:15:25;6549986;count2;4;0
2018-02-19 22:15:25;6549456;count2;7;0
2018-02-19 22:15:25;6549456;count2;5;164663
2018-02-19 22:15:25;6549456;count2;6;4640344
2018-02-19 22:15:25;6550692;count2;3;230993
2018-02-19 22:15:25;6550692;count3;2;0
2018-02-19 22:15:25;6550692;count3;1;34513980
2018-02-19 22:15:25;6550692;count3;0;1500517
2018-02-19 22:15:25;6550692;count3;7;0
2018-02-19 22:15:25;6550692;count3;6;14958246
2018-02-19 22:15:25;6550692;count3;5;0
2018-02-19 22:15:25;6550692;count3;4;156
2018-02-19 22:15:25;6549986;count3;3;5047
2018-02-19 22:15:25;6549986;count3;2;0
2018-02-19 22:15:25;6549986;count3;1;1016836
2018-02-19 22:15:25;6549986;count3;0;265705
2018-02-19 22:15:25;6549986;count3;7;0
2018-02-19 22:15:25;6549986;count3;6;18661246
2018-02-19 22:15:25;6549986;count3;5;0
2018-02-19 22:15:25;6549986;count3;4;0
2018-02-19 22:15:25;6549456;count3;7;0
2018-02-19 22:15:25;6549456;count3;5;164663
2018-02-19 22:15:25;6549456;count3;6;4640344
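To make a sample like this easy for others to copy, paste and test, it can be loaded directly from a string. This is a minimal sketch assuming the semicolon-separated layout shown above; the excerpt contains only a few of the sample rows, and `io.StringIO` stands in for the real file path:

```python
import io

import pandas as pd

# A few rows copied from the sample above; with the real data you would pass
# the file path to read_csv instead of a StringIO buffer.
sample = """DATA_STAMP;ID;VARIABLES;INDEX;VALUE
2018-02-19 19:15:25;6550692;count1;3;230993
2018-02-19 19:15:25;6549986;count2;1;1016836
2018-02-19 22:15:25;6550692;count1;3;230993
2018-02-19 22:15:25;6549456;count3;6;4640344
"""

# sep=";" matches the file; parse_dates turns DATA_STAMP into datetime64,
# which makes the 19h/22h distinction available through .dt.hour.
df = pd.read_csv(io.StringIO(sample), sep=";", parse_dates=["DATA_STAMP"])
df["hour"] = df["DATA_STAMP"].dt.hour

print(df)
```

The extra `hour` column is what lets the two timestamps become separate output columns later on.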

What I want:

ID;INDEX;count1_19;count2_19;count3_19;count1_22;count2_22;count3_22

with one row for each (ID, INDEX) pair.

Is this possible with reshape? Is there any other solution?

Thank you

Please post a minimal snippet of the file data and avoid external links to files or code, as they may disappear and are not practical for others to copy/paste/test. — progmatico, Feb 24 '18 at 17:31
What you want is very possible with pandas; look into the `.groupby()` method. — joaoavf, Feb 24 '18 at 19:20

Scott Boston · Accepted Answer · 2018-02-25T14:53:54.630

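First off, I think you have errors in your test data. If we count how many values fall into each cell of the layout you want, most cells have exactly one value. However, for ID 6550692, INDEX 3, the two count2 cells (19h and 22h) have two values each and the two count3 cells have none: the row `6550692;count2;3;230993` appears twice per timestamp, while no `6550692;count3;3` row exists.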

VARIABLES     count1      count2      count3     
hour              19   22     19   22     19   22
ID      INDEX                                    
6549456 5        1.0  1.0    1.0  1.0    1.0  1.0
        6        1.0  1.0    1.0  1.0    1.0  1.0
        7        1.0  1.0    1.0  1.0    1.0  1.0
6549986 0        1.0  1.0    1.0  1.0    1.0  1.0
        1        1.0  1.0    1.0  1.0    1.0  1.0
        2        1.0  1.0    1.0  1.0    1.0  1.0
        3        1.0  1.0    1.0  1.0    1.0  1.0
        4        1.0  1.0    1.0  1.0    1.0  1.0
        5        1.0  1.0    1.0  1.0    1.0  1.0
        6        1.0  1.0    1.0  1.0    1.0  1.0
        7        1.0  1.0    1.0  1.0    1.0  1.0
6550692 0        1.0  1.0    1.0  1.0    1.0  1.0
        1        1.0  1.0    1.0  1.0    1.0  1.0
        2        1.0  1.0    1.0  1.0    1.0  1.0
        3        1.0  1.0    2.0  2.0    NaN  NaN
        4        1.0  1.0    1.0  1.0    1.0  1.0
        5        1.0  1.0    1.0  1.0    1.0  1.0
        6        1.0  1.0    1.0  1.0    1.0  1.0
        7        1.0  1.0    1.0  1.0    1.0  1.0
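The answer does not show how the count table was produced. One possible way, sketched here under the assumption that `hour` is derived from DATA_STAMP, is to count rows per (ID, INDEX, VARIABLES, hour) group with `size()` and unstack the last two levels. The small excerpt below is made up from sample rows to reproduce the duplicate/missing pattern:

```python
import io

import pandas as pd

# Excerpt reproducing the problem: 6550692/count2/INDEX 3 appears twice per
# hour, while 6550692/count3/INDEX 3 is missing.
sample = """DATA_STAMP;ID;VARIABLES;INDEX;VALUE
2018-02-19 19:15:25;6550692;count1;3;230993
2018-02-19 19:15:25;6550692;count2;3;230993
2018-02-19 19:15:25;6550692;count2;3;230993
2018-02-19 19:15:25;6550692;count3;2;0
2018-02-19 22:15:25;6550692;count1;3;230993
2018-02-19 22:15:25;6550692;count2;3;230993
2018-02-19 22:15:25;6550692;count2;3;230993
2018-02-19 22:15:25;6550692;count3;2;0
"""
df = pd.read_csv(io.StringIO(sample), sep=";", parse_dates=["DATA_STAMP"])
df["hour"] = df["DATA_STAMP"].dt.hour

# size() counts the rows in each group; unstacking VARIABLES and hour lays the
# counts out in the target shape. Cells with no rows at all become NaN.
counts = (
    df.groupby(["ID", "INDEX", "VARIABLES", "hour"])
      .size()
      .unstack(["VARIABLES", "hour"])
)
print(counts)

# Rows where some cell holds more than one value: these are the cells an
# unaggregated reshape cannot place.
dupes = counts[(counts > 1).any(axis=1)]
print(dupes)
```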

However, we can still reshape your data by applying an aggregation (here, the mean) wherever a cell has two values.

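import pandas as pd
# Assumed loading step (not shown in the answer): parse DATA_STAMP and derive the hour
df = pd.read_csv('file.csv', sep=';', parse_dates=['DATA_STAMP'])
df['hour'] = df['DATA_STAMP'].dt.hour

df_out = df.groupby(['ID','INDEX','VARIABLES','hour'])['VALUE'].mean().unstack([-2,-1])
# Flatten the (VARIABLES, hour) column pairs into names like count1_19
df_out.columns = df_out.columns.map('{0[0]}_{0[1]}'.format)
print(df_out.reset_index())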

Output:

         ID  INDEX   count1_19   count1_22   count2_19   count2_22   count3_19   count3_22
0   6549456      5    164663.0    164663.0    164663.0    164663.0    164663.0    164663.0
1   6549456      6   4640344.0   4640344.0   4640344.0   4640344.0   4640344.0   4640344.0
2   6549456      7         0.0         0.0         0.0         0.0         0.0         0.0
3   6549986      0    265705.0    265705.0    265705.0    265705.0    265705.0    265705.0
4   6549986      1   1016836.0   1016836.0   1016836.0   1016836.0   1016836.0   1016836.0
5   6549986      2         0.0         0.0         0.0         0.0         0.0         0.0
6   6549986      3      5047.0      5047.0      5047.0      5047.0      5047.0      5047.0
7   6549986      4         0.0         0.0         0.0         0.0         0.0         0.0
8   6549986      5         0.0         0.0         0.0         0.0         0.0         0.0
9   6549986      6  18661246.0  18661246.0  18661246.0  18661246.0  18661246.0  18661246.0
10  6549986      7         0.0         0.0         0.0         0.0         0.0         0.0
11  6550692      0   1500517.0   1500517.0   1500517.0   1500517.0   1500517.0   1500517.0
12  6550692      1  34513980.0  34513980.0  34513980.0  34513980.0  34513980.0  34513980.0
13  6550692      2         0.0         0.0         0.0         0.0         0.0         0.0
14  6550692      3    230993.0    230993.0    230993.0    230993.0         NaN         NaN
15  6550692      4       156.0       156.0       156.0       156.0       156.0       156.0
16  6550692      5         0.0         0.0         0.0         0.0         0.0         0.0
17  6550692      6  14958246.0  14958246.0  14958246.0  14958246.0  14958246.0  14958246.0
18  6550692      7         0.0         0.0         0.0         0.0         0.0         0.0
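Since the question mentions pivot, a roughly equivalent approach is `pivot_table`, which groups by the index and column keys and applies `aggfunc` to each cell, so the duplicated rows are averaged instead of raising an error. A minimal sketch on a small made-up excerpt of the sample, assuming the same `hour` column as above:

```python
import io

import pandas as pd

sample = """DATA_STAMP;ID;VARIABLES;INDEX;VALUE
2018-02-19 19:15:25;6550692;count1;3;230993
2018-02-19 22:15:25;6550692;count1;3;230993
2018-02-19 19:15:25;6550692;count2;3;230993
2018-02-19 19:15:25;6550692;count2;3;230993
2018-02-19 22:15:25;6550692;count2;3;230993
2018-02-19 19:15:25;6549456;count3;6;4640344
2018-02-19 22:15:25;6549456;count3;6;4640344
"""
df = pd.read_csv(io.StringIO(sample), sep=";", parse_dates=["DATA_STAMP"])
df["hour"] = df["DATA_STAMP"].dt.hour

# aggfunc="mean" mirrors groupby(...).mean() in the answer; cells without
# any rows are left as NaN.
df_out = df.pivot_table(index=["ID", "INDEX"], columns=["VARIABLES", "hour"],
                        values="VALUE", aggfunc="mean")

# Flatten the (VARIABLES, hour) column MultiIndex into names like count1_19.
df_out.columns = [f"{var}_{hour}" for var, hour in df_out.columns]
out = df_out.reset_index()
print(out)
```

Note that `pivot_table` drops columns that are entirely NaN by default (`dropna=True`), which can differ slightly from the `groupby`/`unstack` version on very sparse data.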

Thank you, it seems to work. And of course I have errors in my data because it's just a sample; I will try to remove the mean() aggregation. — mbooma, Feb 27 '18 at 23:05
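To reshape without any aggregation, `DataFrame.pivot` can be used, but it raises a `ValueError` when an (index, columns) pair occurs more than once. A minimal sketch, assuming the repeated rows are exact duplicates that can safely be dropped (if duplicates carried different VALUEs, `drop_duplicates` would silently keep the first one). Passing lists to `index`/`columns` in `pivot` requires pandas 1.1 or later:

```python
import io

import pandas as pd

sample = """DATA_STAMP;ID;VARIABLES;INDEX;VALUE
2018-02-19 19:15:25;6550692;count1;3;230993
2018-02-19 22:15:25;6550692;count1;3;230993
2018-02-19 19:15:25;6550692;count2;3;230993
2018-02-19 19:15:25;6550692;count2;3;230993
2018-02-19 22:15:25;6550692;count2;3;230993
2018-02-19 22:15:25;6550692;count2;3;230993
"""
df = pd.read_csv(io.StringIO(sample), sep=";", parse_dates=["DATA_STAMP"])
df["hour"] = df["DATA_STAMP"].dt.hour

keys = dict(index=["ID", "INDEX"], columns=["VARIABLES", "hour"], values="VALUE")

# pivot() cannot aggregate, so the duplicated count2 rows make it fail.
try:
    df.pivot(**keys)
    pivot_failed = False
except ValueError:
    pivot_failed = True

# Keep one row per (ID, INDEX, VARIABLES, hour), then pivot the raw values.
clean = df.drop_duplicates(subset=["ID", "INDEX", "VARIABLES", "hour"])
wide = clean.pivot(**keys)
wide.columns = [f"{var}_{hour}" for var, hour in wide.columns]
out = wide.reset_index()
print(out)
```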

score 0 · Answer 2 · answered Feb 24 '18 at 19:25

Could you check if this is what you want?

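import pandas as pd

# Parse the first column (DATA_STAMP) as dates and use it as the index
df = pd.read_csv('file.csv', sep=';', parse_dates=[0], index_col=0)
# Count how often each VARIABLES value occurs for every (ID, INDEX) pair
df.groupby(['ID', 'INDEX'])['VARIABLES'].value_counts().unstack()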

Rereading your post, I think you want this:

df.groupby(['ID', 'INDEX', df.index.hour])['VARIABLES'].value_counts().unstack().unstack()

joaoavf

Hello, thank you so much. It looks like what I want, but I need the real VALUE from my data instead of value_counts(). – mbooma Feb 24 '18 at 21:23
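To get the real VALUE instead of counts, the same grouping on `df.index.hour` can select the VALUE column and aggregate it. A minimal sketch on a made-up excerpt of the sample; using `first()` is an assumption (with clean data each cell holds a single value, so `first`, `mean` or `sum` would give the same result):

```python
import io

import pandas as pd

sample = """DATA_STAMP;ID;VARIABLES;INDEX;VALUE
2018-02-19 19:15:25;6549456;count1;5;164663
2018-02-19 22:15:25;6549456;count1;5;164663
2018-02-19 19:15:25;6549456;count2;6;4640344
2018-02-19 22:15:25;6549456;count2;6;4640344
"""
# Same loading as in the answer: DATA_STAMP becomes a DatetimeIndex.
df = pd.read_csv(io.StringIO(sample), sep=";", parse_dates=[0], index_col=0)

# Group on ID, INDEX, VARIABLES and the hour of the index, keep the VALUE of
# each cell, then move VARIABLES and hour (the last two levels) into columns.
wide = (
    df.groupby(["ID", "INDEX", "VARIABLES", df.index.hour])["VALUE"]
      .first()
      .unstack([-2, -1])
)
wide.columns = [f"{var}_{hour}" for var, hour in wide.columns]
out = wide.reset_index()
print(out)
```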