I'm close to achieve what I want thanks to Python Pandas Groupby/Append columns but still not quite there.
DF:
City | Plan | Problem 1 | Problem 2 | Problem 3 |
---|---|---|---|---|
Genoa | Service 1 | aaa | bbb | ccc |
Genoa | Service 2 | ddd | zzz | yyy |
Genoa | Service 3 | ggg | ppp | jjj |
Venice | Service 2 | lll | vvv | |
Venice | Service 3 | eee | fff | mmm |
Expected Output:
City | Problem 1 | Problem 2 | Problem 3 | Problem 4 | Problem 5 | Problem 6 | Problem 7 | Problem 8 | Problem 9 |
---|---|---|---|---|---|---|---|---|---|
Genoa | aaa | bbb | ccc | ddd | zzz | yyy | ggg | ppp | jjj |
Venice | lll | vvv | eee | fff | mmm |
Basically I want to:
- Group by City
- Discard Plan Column (if possible)
- Append all the other parameters (They still need to be always in order, so if a service is missing the cells would be empty.
After playing a while with unstack and cumcount from the linked solution, I'm still missing something to respect the order of the Plan column and fill with empty cells if a service is missing.
This is the code I'm using:
import pandas as pd
df = pd.read_csv('input.csv')
df1 = df.set_index('City').stack().reset_index(name='vals')
df1['g'] = 'Param' + df1.groupby('City').cumcount().add(1).astype(str)
df1 = df1.pivot(index='City', columns='g', values='vals')
df1.to_csv('output.csv')
In my tests I've removed the Plan column from the input, but the problem is that after ordering the parameters in the output, if, for example, a city has only Service 3, they are still aligned under Service 1.