1

I have a data frame that I want to transform from wide into a long format. But I do not want to use all columns.
In detail, I want to melt the following data frame

import pandas as pd
data = {'year': [2014, 2018,2020,2017], 
        'model':[12, 14,21,8],
        'amount': [100, 120,80,210],
        'quality': ["low", "high","medium","high"]
       }

# pass column names in the columns parameter 
df = pd.DataFrame.from_dict(data)
print(df)

into this data frame:

data2 = {'year': [2014, 2014, 2018, 2018, 2020, 2020, 2017, 2017], 
        'variable': ["model", "amount", "model", "amount", "model", "amount", "model", "amount"],
        'value':[12, 100, 14, 120, 21, 80, 8, 210],
        'quality': ["low", "low", "high", "high", "medium", "medium", "high", "high"]
       }

# pass column names in the columns parameter 
df2 = pd.DataFrame.from_dict(data2)
print(df2)

I tried pd.melt() with different combinations of the input parameters, and it works somehow if I do not take the quality colum into consideration. But according to the result, I can not skip the quality column. Furthermore, I tried df.pivot(), df.pivot_table(), and pd.wide_to_long(). All in several combinations. But somehow, I do not get the desired result. Maybe pushing the columns year and quality into the data frame index would help, before performing any pd.melt() operations?

Thank you very much for your help in advance!

chris_tri
  • 67
  • 4

1 Answers1

1
import pandas as pd

data = {'year': [2014, 2018,2020,2017],
        'model':[12, 14,21,8],
        'amount': [100, 120,80,210],
        'quality': ["low", "high","medium","high"]
       }

# pass column names in the columns parameter
df = pd.DataFrame.from_dict(data)
print(df)

data2 = {'year': [2014, 2014, 2018, 2018, 2020, 2020, 2017, 2017],
        'variable': ["model", "amount", "model", "amount", "model", "amount", "model", "amount"],
        'value':[12, 100, 14, 120, 21, 80, 8, 210],
        'quality': ["low", "low", "high", "high", "medium", "medium", "high", "high"]
       }

# pass column names in the columns parameter
df2 = pd.DataFrame.from_dict(data2)
print(df2)

df3 = pd.melt(df, id_vars=['year', 'quality'], var_name='variable', value_name='value')
df3 = df3[['year', 'variable', 'value', 'quality']]
df3.sort_values('year', inplace=True)

print(df3)

Output (for df3):

   year variable  value quality
0  2014    model     12     low
4  2014   amount    100     low
3  2017    model      8    high
7  2017   amount    210    high
1  2018    model     14    high
5  2018   amount    120    high
2  2020    model     21  medium
6  2020   amount     80  medium
David M.
  • 4,518
  • 2
  • 20
  • 25