1

I have below data in an Excel file with column marks & sub column $

        |   Marks   |
______________________
Student | $ |%  |#  |
______________________
Joy     | 7 |2  |0  |
RED     | 9 |2  |5  |
CTL     | 5 |0  |4  |

I want to convert column into row in below format:

Student | Marks | Symbol |
--------|-------|--------|
Joy     | 7     |    $   |
Joy     | 2     |    %   |
Joy     | 0     |    #   |   
RED     | 9     |    $   | 
RED     | 2     |    %   |
RED     | 5     |    #   | 
CTL     | 5     |    $   |
CTL     | 0     |    %   | 
CTL     | 4     |    #   |

For converting into row I am using this code for conversion. I want the above output after converting from column to row but getting some random value. What needs to be done to get the above output?

import numpy as np
import pandas as pd

df = pd.read_excel('test.xlsx')
tdf = df.reset_index( col_level = 1 )
tdf

dfs = pd.melt( tdf, id_vars = ['Student'], value_vars = [ '$', '%', '#' ], col_level = 1, var_name = 'Symbol' )
dfs = dfs.rename( columns = { 'value': 'Marks' } )
Sonia Samipillai
  • 590
  • 5
  • 15
  • 1
    Welcome to SO! What do you mean, "melt is not correct"? Can you post the full traceback you get? – gimix Sep 03 '21 at 14:02
  • My guess is the issue is that you have a `MultiIndex` and aren't specifying the full value of the index in the `melt` method. – bicarlsen Sep 04 '21 at 21:24
  • Another issue could be if the `Student` column is the DataFrame's index, you can't use it as an element in `id_vars` in `melt`. – bicarlsen Sep 04 '21 at 21:28
  • @gimix I want mentioned output after converting from coloumn to row but getting some random value. can you help me with this or instant of melt what else we can use – Teena Agrawal Sep 05 '21 at 13:36
  • @bicarlsen i am not sure in melt function as its not working for me, can you help me with this or instant of melt what else we can use – Teena Agrawal Sep 05 '21 at 13:37
  • @TeenaAgrawal also add the output you are getting and share df.columns output. – Chandan Sep 09 '21 at 08:47

2 Answers2

0

The first thing to do is move Student into the DataFrame if it is the index.

tdf = df.reset_index( col_level = 1 )

Then you can perform the pd.melt.

dfs = pd.melt( tdf, id_vars = [ 'Student' ], value_vars = [ '$', '%', '#' ], col_level = 1, var_name = 'Symbol' )
dfs = dfs.rename( columns = { 'value': 'Marks' } )
bicarlsen
  • 1,241
  • 10
  • 27
  • Thanks but after applying this code i am getting again error: "The following 'id_vars' are not present in the DataFrame: ['Student']" – Teena Agrawal Sep 06 '21 at 08:07
  • The error seems quite descriptive. Ensure that the `Student` data is a column of the DataFrame and not the index, and that everything is spelled correctly. If you still can't get it working correclty your can pickle the DataFrame and post a link to it in your question so we can look at it directly. – bicarlsen Sep 06 '21 at 10:25
  • This is my full code which i am using : ' import numpy as np import pandas as pd df = pd.read_excel('test.xlsx') tdf = df.reset_index( col_level = 1 ) tdf dfs = pd.melt( tdf, id_vars = ['Student'], value_vars = [ '$', '%', '#' ], col_level = 1, var_name = 'Symbol' ) dfs = dfs.rename( columns = { 'value': 'Marks' } )' addedd code in question also – Teena Agrawal Sep 06 '21 at 10:38
  • i have added my complete code in question too – Teena Agrawal Sep 06 '21 at 10:41
  • Without having access to your DataFrame I wonlt be able to debug any further. Again, I would suggest checking the format fo the DataFrame. – bicarlsen Sep 06 '21 at 14:10
  • I have added my complete code so now you can create same at your end too – Teena Agrawal Sep 06 '21 at 15:02
  • Yes, that is not the issue. The issue is that you need to provide access to the `test.xlsx` file, or the DataFrame that you are trying to manipulate (e.g. using `pickle`). – bicarlsen Sep 06 '21 at 15:21
  • test file is same as I provided in question. I named it as test.xlsx file. – Teena Agrawal Sep 06 '21 at 16:20
  • If it's the same then the code should work. I've tested it duplicating your exact DataFrame structure using `Student` as an index for the original DataFrame and having a column MultiIndex of tyoe ( `Mark`, `symbol` ). This is why I was asking for the actual data file you are testing it with. – bicarlsen Sep 06 '21 at 17:10
0
# Sample df
symbols = np.array(['$', '%', '#'])
data = [
    ['Joy', 7,2,0],
    ['Red', 9,2,5],
    ['CTL', 5,0,4]
]

id_vars = [('', 'Student')]
value_vars = [('Marks', i) for i in symbols]
columns = pd.MultiIndex.from_tuples(id_vars + value_vars)
df = pd.DataFrame(data, columns = columns)

# df:
#|           Marks      
#|   Student     $  %  #
#| 0     Joy     7  2  0
#| 1     Red     9  2  5
#| 2     CTL     5  0  4
tdf = df.reset_index()
dfs = pd.melt( tdf, id_vars = [ 'Student' ], value_vars = [ '$', '%', '#' ], col_level = 1, var_name = 'Symbol' )
dfs = dfs.rename( columns = { 'value': 'Marks' } )

# dfs:
#|   Student Symbol  Marks
#| 0     Joy      $      7
#| 1     Red      $      9
#| 2     CTL      $      5
#| 3     Joy      %      2
#| 4     Red      %      2
#| 5     CTL      %      0
#| 6     Joy      #      0
#| 7     Red      #      5
#| 8     CTL      #      4
ntdata = np.hstack((df.T.values[1:], np.array_split(symbols, len(symbols))))
ndf = pd.DataFrame(ntdata)

ncolumns = df[('', 'Student')].values.tolist()
ndf.columns = ncolumns + ['Symbol']
ndf = ndf.melt(id_vars = [ 'Symbol' ], value_vars = ncolumns, var_name = 'Student' )
ndf = ndf.rename( columns = { 'value': 'Marks' } )

# ndf:
#|   Symbol Student  Marks
#| 0      $     Joy      7
#| 1      %     Joy      2
#| 2      #     Joy      0
#| 3      $     Red      9
#| 4      %     Red      2
#| 5      #     Red      5
#| 6      $     CTL      5
#| 7      %     CTL      0
#| 8      #     CTL      4

Here working example

Chandan
  • 11,465
  • 1
  • 6
  • 25