I would like to convert an excel file from wide format into long format.
I'm reading an excel file which has not only two rows of headers, it also includes merged cells in the header. Input Example:
| Task | Name | May,2022 | Jun,2022 | Jul,2022 |
| | |status|type|status|type|status|type|
| ---- | ---- |------|----|------|----|------|----|
| 1 | Tom |1 |AB |1 |DT |0 |AB |
| 2 | John |0 |DT |1 |CC |0 |GH |
Desired Output:
Task | Name | Date | status | type |
---|---|---|---|---|
1 | Tom | May,2022 | 1 | AB |
1 | Tom | Jun,2022 | 1 | DT |
1 | Tom | Jul,2022 | 0 | AB |
2 | John | May,2022 | 0 | DT |
2 | John | Jun,2022 | 1 | CC |
2 | John | Jul,2022 | 0 | GH |
What i have:
import pandas as pd
df = pd.read_excel(path)
df = df.melt(id_vars=["Task","Name"],
var_name="Date",
value_name="Value")
I don't know how to deal with the multi header. I tried combining the two headers.
import pandas as pd
df = pd.read_excel(path, header[1,2])
df.columns = df.columns.to_series().ffill().values
df.columns.value[0]='Task'
df.columns.value[1]='Name'
df = df.melt(id_vars=["Task","Name"],
var_name="Date",
value_name="Value")
This results in having something like: (I'm fine with the date transformation)
Task | Name | (2022-05-01 00:00:00, status) | (2022-05-01 00:00:00, type) |
---|---|---|---|
1 | Tom | 1 | AB |
2 | John | 0 | DT |
The melt function returns a KeyError 'Task'. Even if this would work i would somehow need to split the date from the status and type into seperate columns. Someone knows how to deal with such data?