0

This is my code:

all_data = pd.merge(all_data, meanData, suffixes=["", "_mean"], how='left', on=['id', 'id2'])

Now, I want to merge all_data and meanData, but I want the columns of meanData to appear first.

Like this:

a_mean,b_mean,c_mean,a,b,c

Not like this

a,b,c,a_mean,b_mean,c_mean

Note: I have a lot of columns, So i do not want to manually write code to change index.

Sample Code (you can reproduce):

import pandas
df = pd.DataFrame([[0,1, 2], [0,1, 3], [0,4, 6],[1,3,4],[1,4,2]], columns=['id','A', 'B'])
features = ['A','B']
meanData = df.groupby(['id'])[features].agg('mean')
df = pd.merge(df, meanData, suffixes=["", "_mean"], how='left', on=['id'])
print(df.columns)

Output

Index(['id', 'A', 'B', 'A_mean', 'B_mean'], dtype='object')

Expected output:

Index(['A_mean', 'B_mean','id', 'A', 'B'], dtype='object')

cs95
  • 379,657
  • 97
  • 704
  • 746
John Doe
  • 437
  • 1
  • 5
  • 14
  • Sorry, I mean `pd.merge(meanData, all_data, suffixes=["_mean", ""], how='right', on=['id', 'id2'])` – cs95 Dec 25 '18 at 14:57
  • Same result @coldspeed :/ – John Doe Dec 25 '18 at 15:02
  • Can you please provide a [mcve] with some data as text and expected output edited into your question? I will reopen your post once it is clear what you have tried and what has not worked. – cs95 Dec 25 '18 at 15:03
  • 1
    Done @coldspeed – John Doe Dec 25 '18 at 15:09
  • @JohnDoe do you use the dataframe `meanData` for something else or it is just to add the columns '_mean' in your original dataframe `df`? – Ben.T Dec 25 '18 at 15:23
  • I use it for also calculating `meanRank`, after that I delete `meanData`, to save memory. – John Doe Dec 25 '18 at 15:28
  • @JohnDoe my answer does not include the `meanData` as I'm not sure what you mean by `meanRank` – Ben.T Dec 25 '18 at 15:39
  • Just `.reindex` to ensure the same column order: `df.reindex(['A_mean', 'B_mean', 'id', 'A', 'B'], axis=1)` – ALollz Dec 25 '18 at 16:24

2 Answers2

2

I think you can use transform after the groupby to get the mean related to each row, and then pd.concat the dataframes such as:

new_df = pd.concat([(df.groupby('id')[features]
                       .transform(np.mean).add_suffix('_mean')), df],
                   axis=1)
print (new_df)
   A_mean    B_mean  id  A  B
0     2.0  3.666667   0  1  2
1     2.0  3.666667   0  1  3
2     2.0  3.666667   0  4  6
3     3.5  3.000000   1  3  4
4     3.5  3.000000   1  4  2
Ben.T
  • 29,160
  • 6
  • 32
  • 54
  • I am sorry for not mentioning this earlier, but I use `mean` for `meanRank` also, ` meanDataRank = mean.groupby('id')[features].rank(pct=True).reset_index()` – John Doe Dec 25 '18 at 15:40
  • @JohnDoe I assume `mean` is what you called `meanData` from your question. If so, I don't see the point of using `groupby` as meanData has only one row per id, no? or do you want to rank the values within each group of id? – Ben.T Dec 25 '18 at 15:51
  • Ids are not unique. Yeah, i want to calculate meanRank using mean dataframe. – John Doe Dec 26 '18 at 04:10
2

You can merge and reorder columns with sorted():

v = pd.merge(df, meanData, suffixes=["", "_mean"], how='left', on=['id'])
v[sorted(v.columns, key=lambda x: 'mean' not in x)]

   A_mean    B_mean  id  A  B
0     2.0  3.666667   0  1  2
1     2.0  3.666667   0  1  3
2     2.0  3.666667   0  4  6
3     3.5  3.000000   1  3  4
4     3.5  3.000000   1  4  2
cs95
  • 379,657
  • 97
  • 704
  • 746