
I have a DataFrame with many columns, and I want to reorder the values of some of those columns (not all of them), depending on the value of another column.

I tried to change the order with .loc. When I run the selection on its own, it shows what I want, but when I try to assign the result back with .loc, the DataFrame does not change.

For example, I want to reverse only the Ans_Math values of the rows where Type_Math is 'A', so the last two Ans_Math values would be 'ABC' and 'ACB'.

import pandas as pd
import numpy as np

data = pd.DataFrame({'Type_Bio':['A','A','B','A','B'],
                     'Type_Math':['B','B','B','A','A'],
                     'Ans_Bio':['AEC','ABC','AAC','CBA','BCA'],
                     'Ans_Math':['AEC','ABC','AAC','CBA','BCA']})

data
  Type_Bio Type_Math Ans_Bio Ans_Math
0        A         B     AEC      AEC
1        A         B     ABC      ABC
2        B         B     AAC      AAC
3        A         A     CBA      CBA
4        B         A     BCA      BCA

I used this code to split the Math answers:

data = pd.concat([data, data.loc[:,'Ans_Math'].str.split('', expand = True).iloc[:,1:4]], axis = 1)
  Type_Bio Type_Math Ans_Bio Ans_Math  1  2  3
0        A         B     AEC      AEC  A  E  C
1        A         B     ABC      ABC  A  B  C
2        B         B     AAC      AAC  A  A  C
3        A         A     CBA      CBA  C  B  A
4        B         A     BCA      BCA  B  C  A

Then, I noticed that:

data.loc[data.Type_Math == 'A',np.arange(1, 4)][np.arange(3,0,-1)]

This gives almost what I want: only the rows I want, with the columns reversed.

    3   2   1
3   A   B   C
4   A   C   B

So, the next step is to set the values:

data.loc[data.Type_Math == 'A',np.arange(1, 4)] = data.loc[data.Type_Math == 'A',np.arange(1, 4)][np.arange(3,0,-1)]
data

But the data itself has not changed:

  Type_Bio Type_Math Ans_Bio Ans_Math  1  2  3
0        A         B     AEC      AEC  A  E  C
1        A         B     ABC      ABC  A  B  C
2        B         B     AAC      AAC  A  A  C
3        A         A     CBA      CBA  C  B  A
4        B         A     BCA      BCA  B  C  A

I am aware that there are some issues with setting values on a copy, but I thought using .loc would avoid them. Could somebody explain and help, please?

paimfp
  • do you want to change the position of the column or the order of the elements in the column based on a condition? – anky Jun 24 '19 at 17:18
  • I've seen this question, and I don't think it's a duplicate: https://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns. Here I have to filter for some rows, not change the order of columns for all rows – paimfp Jun 24 '19 at 17:32
  • @Yuca If I change the rows like you posted, it would change all rows, even those with Type_Math B, and I cannot do that. – paimfp Jun 24 '19 at 17:41
  • agree it's not exactly the same question, this is what you want to do: `data.loc[data.Type_Math == 'A', [1,2,3]] = data.loc[data.Type_Math == 'A', [3,2,1]]` – Yuca Jun 24 '19 at 17:50
  • @Yuca Tried this and didn't change the original Dataframe 'data'. – paimfp Jun 24 '19 at 18:15
  • @cs95 Could you remove the tag of duplicate question? – paimfp Jun 24 '19 at 18:32
  • forgot .values at the end, so `data.loc[data.Type_Math == 'A', [1,2,3]] = data.loc[data.Type_Math == 'A', [3,2,1]].values` – Yuca Jun 24 '19 at 18:32
  • @Yuca Now it's working! Thanks a lot! Could you give more information on why .values is needed? I have been searching for this all day and could not find the answer. – paimfp Jun 24 '19 at 18:36
  • pandas tries to align things by label, so without the `.values` the right-hand side carries both a row index and column labels, and pandas writes the contents of the '1' column back into the '1' column, whatever order you listed them in. When you add `.values` there's no index info, so as long as the shapes match, it writes by position with no regard to the destination labels – Yuca Jun 24 '19 at 18:39
  • For anyone who wants to read more and understand about this odd behavior, go to the page: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html and search '.to_numpy()' – paimfp Jun 24 '19 at 18:42
  • @Yuca That is some precious information... Very useful to know this. Thanks again! – paimfp Jun 24 '19 at 18:45
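
The label alignment discussed in the comments can be reproduced on a cut-down frame. This is only an illustrative sketch: the names `mask`, `aligned` and `positional` are made up here, the frame is built directly with the integer column labels 1, 2, 3 instead of going through the split, and `.to_numpy()` is used where the comments use `.values` (both hand pandas a plain array with no labels).

```python
import pandas as pd

# Cut-down version of the question's frame, with the split columns 1, 2, 3.
data = pd.DataFrame({
    'Type_Math': ['B', 'B', 'B', 'A', 'A'],
    1: ['A', 'A', 'A', 'C', 'B'],
    2: ['E', 'B', 'A', 'B', 'C'],
    3: ['C', 'C', 'C', 'A', 'A'],
})
mask = data['Type_Math'] == 'A'

# Right-hand side is a DataFrame: pandas aligns it on row and column
# labels, so column 1 is written back to column 1 and nothing changes.
aligned = data.copy()
aligned.loc[mask, [1, 2, 3]] = aligned.loc[mask, [3, 2, 1]]

# Right-hand side is a bare NumPy array: there are no labels to align on,
# so the values are written by position and the columns are reversed.
positional = data.copy()
positional.loc[mask, [1, 2, 3]] = positional.loc[mask, [3, 2, 1]].to_numpy()

print(aligned.loc[mask, [1, 2, 3]])
print(positional.loc[mask, [1, 2, 3]])
```

The first print shows the rows unchanged (C B A and B C A); the second shows them reversed (A B C and A C B), while rows with Type_Math 'B' are left alone in both cases.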

1 Answer

This is what you're looking for:

data.loc[data.Type_Math == 'A', [1,2,3]] = data.loc[data.Type_Math == 'A', [3,2,1]].values

Output:

  Type_Bio Type_Math Ans_Bio Ans_Math  1  2  3
0        A         B     AEC      AEC  A  E  C
1        A         B     ABC      ABC  A  B  C
2        B         B     AAC      AAC  A  A  C
3        A         A     CBA      CBA  A  B  C
4        B         A     BCA      BCA  A  C  B
Yuca
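
If the goal is only the reversed Ans_Math strings from the question ('ABC' and 'ACB'), one possible shortcut is to skip the split columns and reverse the strings directly. This is a sketch and not part of the answer above; `mask` is a name introduced here. No `.values` is needed in this case because the right-hand side is a Series whose row labels already match the rows being assigned.

```python
import pandas as pd

data = pd.DataFrame({'Type_Bio': ['A', 'A', 'B', 'A', 'B'],
                     'Type_Math': ['B', 'B', 'B', 'A', 'A'],
                     'Ans_Bio': ['AEC', 'ABC', 'AAC', 'CBA', 'BCA'],
                     'Ans_Math': ['AEC', 'ABC', 'AAC', 'CBA', 'BCA']})

# Rows whose math answers should be reversed.
mask = data['Type_Math'] == 'A'

# .str[::-1] reverses each string; only the masked rows are overwritten.
data.loc[mask, 'Ans_Math'] = data.loc[mask, 'Ans_Math'].str[::-1]

print(data['Ans_Math'].tolist())
# ['AEC', 'ABC', 'AAC', 'ABC', 'ACB']
```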