1

I find this a rather complex challenge, as I need to merge multiple columns in a dataframe together and then pivot the table multiple times (I think).

So the provided input is this dataframe which I have filtered out:

                       name      X      Y
0      Mathematics House AB  0.123  0.111
2111   Physical Science Hut  0.124  0.112
9412   Primary Education LO  0.125  0.113
1234   Tertiary Universitas  0.126  0.114
12411  Principle of Physics  0.127  0.115
12373  Chemical Industry A1  0.128  0.116

and the output should be something that looks like this:

                  label  Z      
   Mathematics House AB  X  0.123
                         Y  0.111
   Physical Science Hut  X  0.124
                         Y  0.112
   Primary Education LO  X  0.125
                         Y  0.113
   Tertiary Universitas  X  0.126
                         Y  0.114
   Principle of Physics  X  0.127
                         Y  0.115
   Chemical Industry A1  X  0.128
                         Y  0.116

Where Z stands for a new column that has yet to be created. I'm currently using a very hacky technique that takes some columns as numpy arrays and try to reconstruct it. And the results aren't pretty and what is expected. Is there a way to manipulate the dataframe directly without using numpy? It seems like a tool for pivoting multiple times. The current method I have is df.pivot(index='name', columns='Z').T.unstack().T, where I previously let df['Z'] = '' -- very ugly and hacky, indeed, and it doesn't get what I want to present in.

oldselflearner1959
  • 633
  • 1
  • 5
  • 22

1 Answers1

1

This is stack not pivot

df.set_index('name').stack()
Out[186]: 
name                  
MathematicsHouseAB   X    0.123
                     Y    0.111
PhysicalScienceHut   X    0.124
                     Y    0.112
PrimaryEducationLO   X    0.125
                     Y    0.113
TertiaryUniversitas  X    0.126
                     Y    0.114
PrincipleofPhysics   X    0.127
                     Y    0.115
ChemicalIndustryA1   X    0.128
                     Y    0.116
dtype: float64

EDIT:

df=df.set_index('name').stack()
df.index.names=['name', 'Z']
df
Out[263]: 
                           0
name                Z       
MathematicsHouseAB  X  0.123
                    Y  0.111
PhysicalScienceHut  X  0.124
                    Y  0.112
PrimaryEducationLO  X  0.125
                    Y  0.113
TertiaryUniversitas X  0.126
                    Y  0.114
PrincipleofPhysics  X  0.127
                    Y  0.115
ChemicalIndustryA1  X  0.128
                    Y  0.116
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Thanks for your reply. Is there a way to perform the 2nd part of the task by relabelling the column of X,Y and the float column (2nd and 3rd columns in all) with the label Z? – oldselflearner1959 Dec 14 '17 at 20:02
  • I would need a dataframe. I have tried converting it into dataframe using `to_frame()` and then `df.columns = ['Z']` but it appears that the Z is on top of the float instead of X and Ys. However, I would like to know how it is done in Series also just for learning. It would be excellent to be flexible in two approaches. – oldselflearner1959 Dec 14 '17 at 20:23
  • For the third column, is there a specific column name for it? is there a way to combine these two columns under the one column Z? Would it be easier to do this in series (and if so, how?)? – oldselflearner1959 Dec 14 '17 at 20:37
  • @oldselflearner1959 it is time to clarify , series only have one column ...others are index, not columns – BENY Dec 14 '17 at 20:39
  • so sorry i think I have mistaken dataframe for series, since I wanted the columns to "merge" into only one column, which means its really a series rather than a dataframe. – oldselflearner1959 Dec 14 '17 at 20:45
  • 1
    I managed to find that rename_axis=['name', 'Z'] works. Thank you. – oldselflearner1959 Dec 14 '17 at 20:54