
I have a pandas DataFrame as below:

import pandas as pd
import numpy as np
df = pd.DataFrame({'CATEGORY': [1, 1, 2, 2],
                   'GROUP': ['A', 'A', 'B', 'B'],
                   'XYZ': [3000, 2500, 3000, 3000],
                   'VAL': [3000, 2500, 3000, 3000],
                   'A_CLASS': [3000, 2500, 3000, 3000],
                   'B_CAL': [3000, 4500, 3000, 1000],
                   'C_CLASS': [3000, 2500, 3000, 3000],
                   'A_CAL': [3000, 2500, 3000, 3000],
                   'B_CLASS': [3000, 4500, 3000, 500],
                   'C_CAL': [3000, 2500, 3000, 3000],
                   'ABC': [3000, 2500, 3000, 3000]})
df

CATEGORY  GROUP  XYZ   VAL  A_CLASS  B_CAL  C_CLASS  A_CAL  B_CLASS  C_CAL  ABC
1         A      3000  1    3000     3000   3000     3000   3000     3000   3000
1         A      2500  2    2500     4500   2500     2500   4500     2500   2500
2         B      3000  4    3000     3000   3000     3000   3000     3000   3000
2         B      3000  1    3000     1000   3000     3000   500      3000   3000

I want the columns in the following order in my final DataFrame:

GROUP, CATEGORY, all columns with suffix "_CAL", all columns with suffix "_CLASS", all other fields

My expected output:

GROUP  CATEGORY  B_CAL  A_CAL  C_CAL  A_CLASS  C_CLASS  B_CLASS  XYZ   VAL  ABC
A      1         3000   3000   3000   3000     3000     3000     3000  1    3000
A      1         4500   2500   2500   2500     2500     4500     2500  2    2500
A      1         8000   7000   8000   8000     8000     8000     8000  5    8000
B      2         3000   3000   3000   3000     3000     3000     3000  4    3000
B      2         1000   3000   3000   3000     3000     500      3000  1    3000
yatu
Shanoo
  • You can create a new dataframe with the columns in the order you like. In your case, `cols = ['GROUP', 'CATEGORY', 'OTHER_COLUMNS']`, then `newdf = df[cols]`. – XXavier Apr 03 '20 at 15:38
  • `df.reindex(cols_sorted, axis=1)` is also an option, see the answers for what `cols_sorted` should be. (I think it might be preferred over `df[cols_sorted]`, see [doc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html)). – Snow bunting Apr 03 '20 at 15:46

2 Answers


Fun with sorted:

first = ['GROUP','CATEGORY']
cols = sorted(df.columns.difference(first),
              key=lambda x: (not x.endswith('_CAL'), not x.endswith('_CLASS')))

df[first+cols]

   GROUP  CATEGORY  A_CAL  B_CAL  C_CAL  A_CLASS  B_CLASS  C_CLASS   ABC   VAL  \
0     A         1   3000   3000   3000     3000     3000     3000  3000  3000   
1     A         1   2500   4500   2500     2500     4500     2500  2500  2500   
2     B         2   3000   3000   3000     3000     3000     3000  3000  3000   
3     B         2   3000   1000   3000     3000      500     3000  3000  3000   

    XYZ  
0  3000  
1  2500  
2  3000  
3  3000  

For more details here's a similar one with a detailed explanation
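
To see why this ordering comes out, here is a small sketch that rebuilds only the column names from the question and spells the lambda out as a named function. The name `sort_key` is mine, used purely for illustration. Since `False` sorts before `True`, the `_CAL` columns come first, then the `_CLASS` ones, then everything else.

```python
import pandas as pd

# Only the column names matter for the ordering, so the frame is left empty.
df = pd.DataFrame(columns=['CATEGORY', 'GROUP', 'XYZ', 'VAL', 'A_CLASS', 'B_CAL',
                           'C_CLASS', 'A_CAL', 'B_CLASS', 'C_CAL', 'ABC'])

first = ['GROUP', 'CATEGORY']


def sort_key(name):
    # '_CAL' columns   -> (False, True)
    # '_CLASS' columns -> (True, False)
    # anything else    -> (True, True)
    return (not name.endswith('_CAL'), not name.endswith('_CLASS'))


# Index.difference returns the remaining names already sorted, and sorted()
# is stable, so columns sharing a key stay in alphabetical order.
rest = df.columns.difference(first)
cols = sorted(rest, key=sort_key)

print(first + cols)
```

Because the sort is stable, the alphabetical order inside each group comes from `difference`, not from the key itself.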

yatu
  • Well, there's no clear definition of how the rest of the columns should be ordered, so I'm just going with alphabetical order :) @rpanai – yatu Apr 03 '20 at 15:50
  • Ohh, missed `group`! @rpanai Updating then – yatu Apr 03 '20 at 15:51

You just need to play with strings

cols = df.columns
cols_sorted = ["GROUP", "CATEGORY"] +\
              [col for col in cols if col.endswith('_CAL')] +\
              [col for col in cols if col.endswith('_CLASS')]
cols_sorted += sorted([col for col in cols if col not in cols_sorted])

df = df[cols_sorted]
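
If the remaining columns should keep their original order instead of being sorted (as the header of the expected output in the question suggests), a variant along these lines might work. This is only a sketch: the helper name `reorder` and its parameters are made up for illustration, and it uses `reindex`, as suggested in the comments on the question.

```python
import pandas as pd

df = pd.DataFrame({'CATEGORY': [1, 1, 2, 2],
                   'GROUP': ['A', 'A', 'B', 'B'],
                   'XYZ': [3000, 2500, 3000, 3000],
                   'VAL': [3000, 2500, 3000, 3000],
                   'A_CLASS': [3000, 2500, 3000, 3000],
                   'B_CAL': [3000, 4500, 3000, 1000],
                   'C_CLASS': [3000, 2500, 3000, 3000],
                   'A_CAL': [3000, 2500, 3000, 3000],
                   'B_CLASS': [3000, 4500, 3000, 500],
                   'C_CAL': [3000, 2500, 3000, 3000],
                   'ABC': [3000, 2500, 3000, 3000]})


def reorder(frame, first=('GROUP', 'CATEGORY'), suffixes=('_CAL', '_CLASS')):
    """Hypothetical helper: leading columns, then one group per suffix, then the rest."""
    cols = list(frame.columns)
    ordered = list(first)
    for suffix in suffixes:
        # Keep each suffix group in the order the columns appear in the frame.
        ordered += [c for c in cols if c.endswith(suffix) and c not in ordered]
    # Whatever is left keeps its original position relative to the other leftovers.
    ordered += [c for c in cols if c not in ordered]
    return frame.reindex(columns=ordered)


result = reorder(df)
print(list(result.columns))
```

One thing to keep in mind: `reindex` does not raise on a label that is missing from the frame, it adds a column of NaN instead, so `df[ordered]` may be the safer choice if a typo in `first` should fail loudly.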
rpanai