Pandas dataframe conditional mean based on column names

Question

It is easiest to explain starting with a sample of the dataframe:

    TimeStamp   382.098     382.461     383.185     383.548
    10:28:00    0.012448    0.012362    0.0124485   0.012362
    10:30:00    0.0124135   0.0123965   0.0124135   0.012431
    10:32:00    0.0551035   0.0551725   0.055931    0.0563105
    10:34:00    0.055586    0.0557245   0.056655    0.0569485
    10:36:00    0.055586    0.055776    0.0568105   0.057362

I want my output to be:

    TimeStamp   382         383
    10:28:00    0.012405    0.01240525
    10:30:00    0.012405    0.01242225
    10:32:00    0.05513     0.05612075
    10:34:00    0.05565525  0.05680175
    10:36:00    0.055681    0.05708625

In other words, columns whose names share the same whole-number part should be combined into a single output column that holds their mean for each time index value.

My idea was to use df.round to round the column headers to the nearest whole number and then use .mean() to average the columns that end up with the same header. But I get an error when using the round function on the dataframe's index type.

EDIT: based on the answers, I used

df.rename(columns=dict(zip(df.columns[0:], df.columns[0:]\
          .values.astype(float).round().astype(str))),inplace=True)
df = df.groupby(df.columns[0:], axis=1).mean()

And it messes up the column names as well as the values instead of giving me the mean based on col names...no idea why!

Yes? If you need clarification from an answer, please ask for it. Thanks. — cs95, Oct 26 '17 at 18:08

score 12 · Answer 1 · answered Oct 15 '17 at 21:52

Use groupby along axis=1 with a lambda that keeps only the integer part of each column name.

df.set_index('TimeStamp', inplace=True)
df.groupby(by=lambda x: int(x.split('.')[0]), axis=1).mean()

                382       383
TimeStamp
10:28:00   0.012405  0.012405
10:30:00   0.012405  0.012422
10:32:00   0.055138  0.056121
10:34:00   0.055655  0.056802
10:36:00   0.055681  0.057086
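
Recent pandas releases warn that the axis=1 argument to groupby is deprecated. A minimal sketch of the same idea that avoids it, assuming the column names are strings as in the question: transpose, group the rows by the integer part of their label, take the mean, and transpose back. The sample frame below is rebuilt from the first three rows of the question, and the variable name `result` is only illustrative.

```python
import pandas as pd

# First three rows of the sample data from the question.
df = pd.DataFrame({
    "TimeStamp": ["10:28:00", "10:30:00", "10:32:00"],
    "382.098": [0.012448, 0.0124135, 0.0551035],
    "382.461": [0.012362, 0.0123965, 0.0551725],
    "383.185": [0.0124485, 0.0124135, 0.055931],
    "383.548": [0.012362, 0.012431, 0.0563105],
}).set_index("TimeStamp")

# After transposing, the old column names are the row index, so a plain
# groupby can bucket them by their integer part (truncation, not rounding).
result = df.T.groupby(lambda name: int(float(name))).mean().T
print(result)
```

The grouping function truncates rather than rounds, which is what the expected output in the question requires (383.548 belongs to 383).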

cs95

This just splits it and doesn't round it! – Brain_overflowed Sep 05 '18 at 15:21
the output here matches the desired output from the question. What are you looking for, if not that? – Andrew Sep 05 '18 at 16:06
@Brain_overflowed It is completely identical to what you posted as your expected output. If something is wrong, you have to explain why. I'd recommend trying the answers first before writing them off... – cs95 Sep 05 '18 at 19:53

score 6 · Answer 2 · answered Oct 15 '17 at 21:42

Rename columns with type conversion, move TimeStamp to index, and then use groupby to get column means:

df.rename(columns=lambda x: int(float(x)) if x!="TimeStamp" else x, inplace=True)
df.set_index("TimeStamp", inplace=True)

df
                382       382       383       383
TimeStamp                                        
10:28:00   0.012448  0.012362  0.012448  0.012362
10:30:00   0.012414  0.012396  0.012414  0.012431
10:32:00   0.055103  0.055172  0.055931  0.056310
10:34:00   0.055586  0.055725  0.056655  0.056948
10:36:00   0.055586  0.055776  0.056810  0.057362


df.groupby(df.columns, axis=1).mean()

                382       383
TimeStamp                    
10:28:00   0.012405  0.012405
10:30:00   0.012405  0.012422
10:32:00   0.055138  0.056121
10:34:00   0.055655  0.056802
10:36:00   0.055681  0.057086

BENY · Answer 3 · 2018-09-05T16:02:51.933

5

Rename the columns with np.floor, then groupby:

df.rename(columns=dict(zip(df.columns[1:], np.floor(df.columns[1:].values.astype(float)).astype(str))),inplace=True)
df.set_index('TimeStamp').groupby(level=0,axis=1).mean().reset_index()
Out[171]: 
  TimeStamp     382.0     383.0
0  10:28:00  0.012405  0.012405
1  10:30:00  0.012405  0.012422
2  10:32:00  0.055138  0.056121
3  10:34:00  0.055655  0.056802
4  10:36:00  0.055681  0.057086
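
The headers above read 382.0 and 383.0 because the floored values are cast to str. A small sketch of a variant that casts to int instead, so the labels come out as 382 and 383; it assumes string column names, uses only two rows of the sample data, and groups on the transposed frame rather than passing axis=1. The names `int_labels` and `means` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "382.098": [0.012448, 0.0124135],
        "382.461": [0.012362, 0.0123965],
        "383.185": [0.0124485, 0.0124135],
        "383.548": [0.012362, 0.012431],
    },
    index=pd.Index(["10:28:00", "10:30:00"], name="TimeStamp"),
)

# Floor the numeric headers, then cast to int so labels read 382, not "382.0".
int_labels = np.floor(pd.to_numeric(df.columns)).astype(int)
df.columns = int_labels

# Duplicate labels are now grouped together; transpose so they are row labels.
means = df.T.groupby(level=0).mean().T
print(means.reset_index())
```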

edited Sep 05 '18 at 16:02

answered Oct 15 '17 at 21:44

BENY

Sigh, it's the same thing with this user. Accepts my answer first, and then un-accepts. – cs95 Oct 26 '17 at 18:11
@cᴏʟᴅsᴘᴇᴇᴅ i still prefer your solution ~ :-) – BENY Oct 26 '17 at 18:15
OPs are fickle beasts. They don't know what they want. – cs95 Oct 26 '17 at 18:16
Sorry to get back to this again, but why would you reset the index at the end? The first line rounds it but the second line messes up the col names and the value...did not accomplish the goal :( – Brain_overflowed Sep 05 '18 at 15:27
@Brain_overflowed I reset the index because I called set_index before. – BENY Sep 05 '18 at 15:33
@Brain_overflowed and this is one year after question .. LOL – BENY Sep 05 '18 at 15:40
@Wen I know...sorry but I was newer to python and didn't know what I was doing. Now I am working on a different project and needed the same thing – Brain_overflowed Sep 05 '18 at 15:44
The question is too old to have a bounty on it IMO. And the solution described works. – shiv_90 Sep 11 '18 at 08:30

score 3 · Answer 4 · answered Sep 10 '18 at 06:02

Another method is via pd.to_numeric, a slight variant of @coldspeed's answer:

df = df.set_index('TimeStamp')

df.groupby(pd.to_numeric(df.columns).astype(int), axis=1).mean()

                382       383
TimeStamp                    
10:28:00   0.012405  0.012405
10:30:00   0.012405  0.012422
10:32:00   0.055138  0.056121
10:34:00   0.055655  0.056802
10:36:00   0.055681  0.057086

Bharath M Shetty

Uv'd sorry for the delay – cs95 Sep 11 '18 at 19:42

HimanshuGahlot · Answer 5 · 2018-09-14T01:20:20.907

3

Generalised solution

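import pandas as pd

df = pd.DataFrame({383.045: [1, 2], 383.96: [3, 4], 383.78: [5, 5], 343: [9, 11]})
df.columns = [int(i) for i in df.columns]  # truncate headers to whole numbers
for i in set(df.columns):
    if df[i].ndim == 2:  # duplicated label: df[i] is a DataFrame, not a Series
        mean = df[i].mean(axis=1)  # row-wise mean over the duplicated columns
        df = df.drop(columns=i)    # drop every column carrying this label
        df[i] = mean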

edited Sep 14 '18 at 01:20

answered Sep 11 '18 at 14:27

HimanshuGahlot

why is this better? – Yuca Sep 11 '18 at 18:31
I highly doubt it is. Please don't astroturf, you don't decide if your solution is better or not. Leave that to OP and the voters. Don't beg for upvotes either, that's bad form here. – cs95 Sep 11 '18 at 19:43
Thanks @coldspeed, i will keep this thing in mind :) – HimanshuGahlot Sep 12 '18 at 11:17

Alexander · Answer 6 · 2018-09-12T15:31:50.747

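To round the column names to the nearest integer, you can group on a list comprehension that rounds each column name (barring the first, which is TimeStamp) to the nearest whole number and converts it to an integer. Note that 383.548 rounds up to 384, so this yields three groups rather than the two shown in the question: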

>>> (df
     .set_index('TimeStamp')
     .groupby([int(round(col, 0)) for col in df.columns[1:].astype(float)], axis=1)
     .mean())
                382       383       384
TimeStamp                              
10:28:00   0.012405  0.012448  0.012362
10:30:00   0.012405  0.012414  0.012431
10:32:00   0.055138  0.055931  0.056310
10:34:00   0.055655  0.056655  0.056948
10:36:00   0.055681  0.056810  0.057362
