14

It will be the easiest to explain starting with a sample of the dataframe:

    TimeStamp   382.098     382.461     383.185     383.548
    10:28:00    0.012448    0.012362    0.0124485   0.012362
    10:30:00    0.0124135   0.0123965   0.0124135   0.012431
    10:32:00    0.0551035   0.0551725   0.055931    0.0563105
    10:34:00    0.055586    0.0557245   0.056655    0.0569485
    10:36:00    0.055586    0.055776    0.0568105   0.057362

I want my output to be:

    TimeStamp   382         383
    10:28:00    0.012405    0.01240525
    10:30:00    0.012405    0.01242225
    10:32:00    0.05513     0.05612075
    10:34:00    0.05565525  0.05680175
    10:36:00    0.055681    0.05708625

So, I want to look at the column name values and if they are the same up to whole numbers, I want the output col to have the mean of the values for each time index value.

My idea was to use df.round to round the column headers to the nearest whole number and then to use .mean() to somehow apply the mean on axis = 0 for same col headers. But, I get an error using the round function on dataframe index type.

EDIT: based on the answers, I used

df.rename(columns=dict(zip(df.columns[0:], df.columns[0:]\
          .values.astype(float).round().astype(str))),inplace=True)
df = df.groupby(df.columns[0:], axis=1).mean()

And it messes up the column names as well as the values instead of giving me the mean based on col names...no idea why!

Brain_overflowed
  • 426
  • 1
  • 5
  • 21

6 Answers6

12

Use groupby along the 1st axis with a lambda.

df.set_index('TimeStamp', inplace=True)
df.groupby(by=lambda x: int(x.split('.')[0]), axis=1).mean()

                382       383
TimeStamp
10:28:00   0.012405  0.012405
10:30:00   0.012405  0.012422
10:32:00   0.055138  0.056121
10:34:00   0.055655  0.056802
10:36:00   0.055681  0.057086
cs95
  • 379,657
  • 97
  • 704
  • 746
  • This just splits it and doesn't round it! – Brain_overflowed Sep 05 '18 at 15:21
  • the output here matches the desired output from the question. What are you looking for, if not that? – Andrew Sep 05 '18 at 16:06
  • @Brain_overflowed It is completely identical to what you posted as your expected output. If something is wrong, you have to explain why. I'd recommend trying the answers first before writing them off... – cs95 Sep 05 '18 at 19:53
6

Rename columns with type conversion, move TimeStamp to index, and then use groupby to get column means:

df.rename(columns=lambda x: int(float(x)) if x!="TimeStamp" else x, inplace=True)
df.set_index("TimeStamp", inplace=True)

df
                382       382       383       383
TimeStamp                                        
10:28:00   0.012448  0.012362  0.012448  0.012362
10:30:00   0.012414  0.012396  0.012414  0.012431
10:32:00   0.055103  0.055172  0.055931  0.056310
10:34:00   0.055586  0.055725  0.056655  0.056948
10:36:00   0.055586  0.055776  0.056810  0.057362


df.groupby(df.columns, axis=1).mean()

                382       383
TimeStamp                    
10:28:00   0.012405  0.012405
10:30:00   0.012405  0.012422
10:32:00   0.055138  0.056121
10:34:00   0.055655  0.056802
10:36:00   0.055681  0.057086
andrew_reece
  • 20,390
  • 3
  • 33
  • 58
5

with np.floor rename and groupby

df.rename(columns=dict(zip(df.columns[1:], np.floor(df.columns[1:].values.astype(float)).astype(str))),inplace=True)
df.set_index('TimeStamp').groupby(level=0,axis=1).mean().reset_index()
Out[171]: 
  TimeStamp     382.0     383.0
0  10:28:00  0.012405  0.012405
1  10:30:00  0.012405  0.012422
2  10:32:00  0.055138  0.056121
3  10:34:00  0.055655  0.056802
4  10:36:00  0.055681  0.057086
BENY
  • 317,841
  • 20
  • 164
  • 234
3

Another method is via pd.to_numeric, just a slight variant of @coldspeed's answer i.e

df = df.set_index('TimeStamp')

df.groupby(pd.to_numeric(df.columns).astype(int),1).mean()

            382       383
TimeStamp                    
10:28:00   0.012405  0.012405
10:30:00   0.012405  0.012422
10:32:00   0.055138  0.056121
10:34:00   0.055655  0.056802
10:36:00   0.055681  0.057086
Bharath M Shetty
  • 30,075
  • 6
  • 57
  • 108
3

Generalised solution

df = pd.DataFrame({383.045:[1,2], 383.96:[3,4], 383.78:[5,5], 343:[9,11]})
df.columns = [int(i) for i in df.columns]
for i in set(df.columns):
    if len(df[i].shape) == 2:
        mean = df[i].T.sum()/float(df[i].shape[1])
        df = df.drop([i],1)
        df[i] = mean
HimanshuGahlot
  • 561
  • 4
  • 11
2

To round the column values to the nearest integer, you can group on a list comprehension that rounds each column (barring the first which is TimeStamp) to the nearest whole number and then takes the integer:

>>> (df
     .set_index('TimeStamp')
     .groupby([int(round(col, 0)) for col in df.columns[1:].astype(float)], axis=1)
     .mean())
                382       383       384
TimeStamp                              
10:28:00   0.012405  0.012448  0.012362
10:30:00   0.012405  0.012414  0.012431
10:32:00   0.055138  0.055931  0.056310
10:34:00   0.055655  0.056655  0.056948
10:36:00   0.055681  0.056810  0.057362
Alexander
  • 105,104
  • 32
  • 201
  • 196