I am importing a csv file into a pandas dataframe such as:
df = pd.DataFrame( {0: {0: 'ID', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5'}, 1: {0: 'Net Cost', 1: '30', 2: '40', 3: '50', 4: '35', 5: '45'}, 2: {0: 'Charge Description', 1: 'Surcharge A', 2: 'Discount X', 3: 'Discount X', 4: 'Discount X', 5: 'Surcharge A'}, 3: {0: 'Charge Amount', 1: '9.5', 2: '-12.5', 3: '-11.5', 4: '-5.5', 5: '9.5'}, 4: {0: 'Charge Description', 1: 'Discount X', 2: '', 3: '', 4: 'Surcharge B', 5: 'Discount X'}, 5: {0: 'Charge Amount', 1: '-11.5', 2: '', 3: '', 4: '3.5', 5: '-10.5'}, 6: {0: 'Charge Description', 1: 'Discount Y', 2: '', 3: '', 4: '', 5: 'Surcharge B'}, 7: {0: 'Charge Amount', 1: '-3.25', 2: '', 3: '', 4: '', 5: '4.5'}, 8: {0: 'Charge Description', 1: 'Surcharge B', 2: '', 3: '', 4: '', 5: ''}, 9: {0: 'Charge Amount', 1: '2.5', 2: '', 3: '', 4: '', 5: ''}} )
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
ID | Net Cost | Charge Description | Charge Amount | Charge Description | Charge Amount | Charge Description | Charge Amount | Charge Description | Charge Amount |
1 | 30 | Surcharge A | 9.5 | Discount X | -11.5 | Discount Y | -3.25 | Surcharge B | 2.5 |
2 | 40 | Discount X | -12.5 | ||||||
3 | 50 | Discount X | -11.5 | ||||||
4 | 35 | Discount X | -5.5 | Surcharge B | 3.5 | ||||
5 | 45 | Surcharge A | 9.5 | Discount X | -10.5 | Surcharge B | 4.5 |
The first row are the headers with column names Charge Description and Charge Amount forming pairs and appearing multiple times.
Desired output is a df with a unique column for each description, with the reorganized columns sorted alphabetically and NaNs showing as 0:
ID | Net Cost | Surcharge A | Surcharge B | Discount X | Discount Y |
---|---|---|---|---|---|
1 | 30 | 9.5 | 2.5 | -11.5 | -3.25 |
2 | 40 | 0 | 0 | -12.5 | 0 |
3 | 50 | 0 | 0 | -11.5 | 0 |
4 | 35 | 0 | 3.5 | -5.5 | 0 |
5 | 45 | 9.5 | 4.5 | -10.5 | 0 |
This post looks like a good starting point but then I need a column for each Charge Description and only a single row per ID.