0

i am in python i have data-frame of 177 columns that contain patient values for 24 hours as in this

subject_id hour_measure         urinecolor   Respiraory                 
3          1.00                 red          40
3          1.15                 red          90
4          2.00              yellow          60

i want for every hour to calculate some statistics like mean, max, std, skew, etc

as it contain text and numeric columns it can't loop in all data-frame to make aggregation , therefore ,i try to make it for every column like in the following code

 grouped= df.groupby(['Hour_measure','subject_id']).agg({"Heart Rate":['sum','min','max','std', 'count','var','skew']}) 
grouped2= df.groupby(['Hour_measure','subject_id']).agg({"Respiraory":['sum','min','max','std', 'count']})
  #write aggregated values to csv file 
 grouped.coloumns=["_".join(x) for x in grouped.columns.ravel()]
           grouped.to_csv('temp3.csv')

     with open('temp3.csv', 'a') as f:
        grouped2.to_csv(f, header=True)
    # make unstack to convert all to rows               
        df.set_index(['subject_id','Hour_measure']).unstack()

this code works correctly, but the idea i want to use loop to aggregate every numeric column .For every text column choose most frequent value in the hour instead of statistical functions, and also add it to the file which will stacked finally based on subject_id and hour_measure to have finally like this

              heart rate 
                  1                             2              3.... to 24      then the next feature 
subject_id   min    max   std   skwe      min   max   std    
 1            40     110    50   60       60   290     40  
mayaaa
  • 289
  • 1
  • 5
  • 14

1 Answers1

0

Use:

print (df)
   hour  subject_id  hour_measure urinecolor  Respiraory
0     1           3          1.00        red          40
1     1           3          1.15        red          90
2     1           4          2.00     yellow          60

df1 = (df.groupby(['hour_measure','subject_id', 'hour'])
        .agg(['sum','min','max','std', 'count','var','skew']))
print (df1)
                             Respiraory                           
                                    sum min max std count var skew
hour_measure subject_id hour                                      
1.00         3          1            40  40  40 NaN     1 NaN  NaN
1.15         3          1            90  90  90 NaN     1 NaN  NaN
2.00         4          1            60  60  60 NaN     1 NaN  NaN

f = lambda x: next(iter(x.mode()), None)
cols = df.select_dtypes(object).columns
df2 = df.groupby(['hour_measure','subject_id', 'hour'])[cols].agg(f)
df2.columns = pd.MultiIndex.from_product([df2.columns, ['mode']])
print (df2)
                             urinecolor
                                   mode
hour_measure subject_id hour           
1.00         3          1           red
1.15         3          1           red
2.00         4          1        yellow

df3 = pd.concat([df1, df2], axis=1).unstack().reorder_levels([0,2,1], axis=1)
print (df3)
                        Respiraory                            urinecolor
hour                             1                                     1
                               sum min max std count var skew       mode
hour_measure subject_id                                                 
1.00         3                  40  40  40 NaN     1 NaN  NaN        red
1.15         3                  90  90  90 NaN     1 NaN  NaN        red
2.00         4                  60  60  60 NaN     1 NaN  NaN     yellow
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252