Reorder columns based on column suffixes

Question

This is my code:

all_data = pd.merge(all_data, meanData, suffixes=["", "_mean"], how='left', on=['id', 'id2'])

Now, I want to merge all_data and meanData, but I want the columns of meanData to appear first.

Like this:

a_mean,b_mean,c_mean,a,b,c

Not like this

a,b,c,a_mean,b_mean,c_mean

Note: I have a lot of columns, So i do not want to manually write code to change index.

Sample Code (you can reproduce):

import pandas
df = pd.DataFrame([[0,1, 2], [0,1, 3], [0,4, 6],[1,3,4],[1,4,2]], columns=['id','A', 'B'])
features = ['A','B']
meanData = df.groupby(['id'])[features].agg('mean')
df = pd.merge(df, meanData, suffixes=["", "_mean"], how='left', on=['id'])
print(df.columns)

Output

Index(['id', 'A', 'B', 'A_mean', 'B_mean'], dtype='object')

Expected output:

Index(['A_mean', 'B_mean','id', 'A', 'B'], dtype='object')

Sorry, I mean `pd.merge(meanData, all_data, suffixes=["_mean", ""], how='right', on=['id', 'id2'])` — cs95, Dec 25 '18 at 14:57
Can you please provide a [mcve] with some data as text and expected output edited into your question? I will reopen your post once it is clear what you have tried and what has not worked. — cs95, Dec 25 '18 at 15:03
@JohnDoe do you use the dataframe `meanData` for something else or it is just to add the columns '_mean' in your original dataframe `df`? — Ben.T, Dec 25 '18 at 15:23
I use it for also calculating `meanRank`, after that I delete `meanData`, to save memory. — John Doe, Dec 25 '18 at 15:28
@JohnDoe my answer does not include the `meanData` as I'm not sure what you mean by `meanRank` — Ben.T, Dec 25 '18 at 15:39
Just `.reindex` to ensure the same column order: `df.reindex(['A_mean', 'B_mean', 'id', 'A', 'B'], axis=1)` — ALollz, Dec 25 '18 at 16:24

Ben.T · Accepted Answer · 2018-12-25T15:40:07.933

2

I think you can use transform after the groupby to get the mean related to each row, and then pd.concat the dataframes such as:

new_df = pd.concat([(df.groupby('id')[features]
                       .transform(np.mean).add_suffix('_mean')), df],
                   axis=1)
print (new_df)
   A_mean    B_mean  id  A  B
0     2.0  3.666667   0  1  2
1     2.0  3.666667   0  1  3
2     2.0  3.666667   0  4  6
3     3.5  3.000000   1  3  4
4     3.5  3.000000   1  4  2

edited Dec 25 '18 at 15:40

answered Dec 25 '18 at 15:37

Ben.T

29,160
6
32
54

I am sorry for not mentioning this earlier, but I use `mean` for `meanRank` also, ` meanDataRank = mean.groupby('id')[features].rank(pct=True).reset_index()` – John Doe Dec 25 '18 at 15:40
@JohnDoe I assume `mean` is what you called `meanData` from your question. If so, I don't see the point of using `groupby` as meanData has only one row per id, no? or do you want to rank the values within each group of id? – Ben.T Dec 25 '18 at 15:51
Ids are not unique. Yeah, i want to calculate meanRank using mean dataframe. – John Doe Dec 26 '18 at 04:10

score 2 · Answer 2 · answered Dec 25 '18 at 16:41

You can merge and reorder columns with sorted():

v = pd.merge(df, meanData, suffixes=["", "_mean"], how='left', on=['id'])
v[sorted(v.columns, key=lambda x: 'mean' not in x)]

   A_mean    B_mean  id  A  B
0     2.0  3.666667   0  1  2
1     2.0  3.666667   0  1  3
2     2.0  3.666667   0  4  6
3     3.5  3.000000   1  3  4
4     3.5  3.000000   1  4  2

Reorder columns based on column suffixes

2 Answers2