0

I'm fairly new to Pandas and often find myself stuck in situations that I have no solution to them rather than asking for help. I have read so many tutorials and checked many Stackoverflow topics but none of them asnwered mine since it is very difficult to search for. Would some please help me?

I have an excel file that I read as df. The df is something like this (I simplified it):

pd.DataFrame({'date': ['type', 'model', '2020', '2021', '2022', '2023'],
              'factor1': ['type_1', 'model_1', '1', '2', '3', '4'],
              'factor2': ['type_2', 'model_1', '5', '6', '7', '8'],
              'factor3': ['type_3', 'model_2', '9', '10', '11', '12']
             })

The result is:

    date  factor1  factor2  factor3
0   type   type_1   type_2   type_3
1  model  model_1  model_1  model_2
2   2020        1        5        9
3   2021        2        6       10
4   2022        3        7       11
5   2023        4        8       12

and the original data in the excel file that I based my dataframe definition off of is: Original excel


What I want to produce is something like this:

                     2020  2021  2022  2023
0  model_1   type_1     1     2     3     4
1            type_2     5     6     7     8
2  model_2   type_3     9    10    11    12

I tried any combo of melt, pivot and stack but the problem is for instance 'type' is horizontal but vertically listed under date! When melting, it looks at 'type' as a 'date' which is clearly wrong. Any help would be appreciated! Thanks!

mahb
  • 39
  • 6

1 Answers1

3

I tried any combo of melt, pivot and stack..

Your forgot about T :

tmp = df.T # or df.transpose()

out = (
    tmp.iloc[1:]
        .set_axis(tmp.iloc[0], axis=1)
        .set_index(["model", "type"])
        .rename_axis(index=[None]*2, columns=None) # optional
)

Output :

print(out)

               2020 2021 2022 2023
model_1 type_1    1    2    3    4
        type_2    5    6    7    8
model_2 type_3    9   10   11   12
Timeless
  • 22,580
  • 4
  • 12
  • 30
  • Awesome! Thank you so much. Do you know how can I do this for arbitrarily large dataframe? – mahb Jun 23 '23 at 18:18
  • You're welcome. What do you mean by *large DataFrame* ? The solution is generic (*I suppose*) and must work for any DataFrame with a similar configuration as the given example. You should give it a try first ;) – Timeless Jun 23 '23 at 18:19
  • This worked perfectly. Thanks for the answer. I tried it out now and it was what I was looking for. Cheers! – mahb Jun 23 '23 at 20:43