aggregating, grouping and UN stack to many columns

Question

i am in python i have data-frame of 177 columns that contain patient values for 24 hours as in this

subject_id hour_measure         urinecolor   Respiraory                 
3          1.00                 red          40
3          1.15                 red          90
4          2.00              yellow          60

i want for every hour to calculate some statistics like mean, max, std, skew, etc

as it contain text and numeric columns it can't loop in all data-frame to make aggregation , therefore ,i try to make it for every column like in the following code

 grouped= df.groupby(['Hour_measure','subject_id']).agg({"Heart Rate":['sum','min','max','std', 'count','var','skew']}) 
grouped2= df.groupby(['Hour_measure','subject_id']).agg({"Respiraory":['sum','min','max','std', 'count']})
  #write aggregated values to csv file 
 grouped.coloumns=["_".join(x) for x in grouped.columns.ravel()]
           grouped.to_csv('temp3.csv')

     with open('temp3.csv', 'a') as f:
        grouped2.to_csv(f, header=True)
    # make unstack to convert all to rows               
        df.set_index(['subject_id','Hour_measure']).unstack()

this code works correctly, but the idea i want to use loop to aggregate every numeric column .For every text column choose most frequent value in the hour instead of statistical functions, and also add it to the file which will stacked finally based on subject_id and hour_measure to have finally like this

              heart rate 
                  1                             2              3.... to 24      then the next feature 
subject_id   min    max   std   skwe      min   max   std    
 1            40     110    50   60       60   290     40

Btw, there is no missing column for hours? – jezrael Dec 07 '19 at 10:44 — jezrael, Dec 07 '19 at 10:44

jezrael · Answer 1 · 2019-12-07T10:51:54.080

0

Use:

print (df)
   hour  subject_id  hour_measure urinecolor  Respiraory
0     1           3          1.00        red          40
1     1           3          1.15        red          90
2     1           4          2.00     yellow          60

df1 = (df.groupby(['hour_measure','subject_id', 'hour'])
        .agg(['sum','min','max','std', 'count','var','skew']))
print (df1)
                             Respiraory                           
                                    sum min max std count var skew
hour_measure subject_id hour                                      
1.00         3          1            40  40  40 NaN     1 NaN  NaN
1.15         3          1            90  90  90 NaN     1 NaN  NaN
2.00         4          1            60  60  60 NaN     1 NaN  NaN

f = lambda x: next(iter(x.mode()), None)
cols = df.select_dtypes(object).columns
df2 = df.groupby(['hour_measure','subject_id', 'hour'])[cols].agg(f)
df2.columns = pd.MultiIndex.from_product([df2.columns, ['mode']])
print (df2)
                             urinecolor
                                   mode
hour_measure subject_id hour           
1.00         3          1           red
1.15         3          1           red
2.00         4          1        yellow

df3 = pd.concat([df1, df2], axis=1).unstack().reorder_levels([0,2,1], axis=1)
print (df3)
                        Respiraory                            urinecolor
hour                             1                                     1
                               sum min max std count var skew       mode
hour_measure subject_id                                                 
1.00         3                  40  40  40 NaN     1 NaN  NaN        red
1.15         3                  90  90  90 NaN     1 NaN  NaN        red
2.00         4                  60  60  60 NaN     1 NaN  NaN     yellow

edited Dec 07 '19 at 10:51

answered Dec 07 '19 at 10:44

jezrael

822,522
95
1,334
1,252

why we add another column for hour in addition to hour measure ? – mayaaa Dec 07 '19 at 10:50
@Nora - yop, this information is here necessary, not in input dataframe? – jezrael Dec 07 '19 at 10:52
can i send to you some questions in inbox – mayaaa Dec 07 '19 at 10:53
@Nora - Not understand, by email? – jezrael Dec 07 '19 at 10:54
@Nora - sure, cech my email in my profile, only run code – jezrael Dec 07 '19 at 11:01
i couldn't reach to it – mayaaa Dec 07 '19 at 11:03
i want to ask the hour column , i will put input hour column as the integer value of hour_measure – mayaaa Dec 09 '19 at 07:05
@Nora - yes? what is logic for create it? – jezrael Dec 09 '19 at 07:06
@Nora - reason for hours is expected output - then is used hours `1 2 3.... to 24` and no this data in input. – jezrael Dec 09 '19 at 07:07
you mean to add in the csv file column to hour ? right? – mayaaa Dec 09 '19 at 07:09
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/203876/discussion-between-jezrael-and-nora). – jezrael Dec 09 '19 at 07:10
send to you in chat – mayaaa Dec 09 '19 at 07:16

aggregating, grouping and UN stack to many columns

1 Answers1