I need to reshape my df.
This is my input df:
import pandas as pd
import datatable as dt
DF_in = dt.Frame(name=['name1', 'name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name2'],
date=['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08'],
type=['a', 'b', 'a', 'b', 'b', 'a', 'b', 'a'],
value=[1, 2, 3, 4, 5, 6, 7, 8])
| name date type value
-- + ----- ---------- ---- -----
0 | name1 2021-01-01 a 1
1 | name1 2021-01-02 b 2
2 | name1 2021-01-03 a 3
3 | name1 2021-01-04 b 4
4 | name2 2021-01-05 b 5
5 | name2 2021-01-06 a 6
6 | name2 2021-01-07 b 7
7 | name2 2021-01-08 a 8
This is the desired output df:
DF_out = dt.Frame(name=['name1', 'name1', 'name2', 'name2'],
date_a=['2021-01-01', '2021-01-03', '2021-01-06', '2021-01-08'],
date_b=['2021-01-02', '2021-01-04', '2021-01-07', None],
value_a=[1, 3, 6, 8],
value_b=[2, 4, 7, None])
| name date_a date_b value_a value_b
-- + ----- ---------- ---------- ------- -------
0 | name1 2021-01-01 2021-01-02 1 2
1 | name1 2021-01-03 2021-01-04 3 4
2 | name2 2021-01-06 2021-01-07 6 7
3 | name2 2021-01-08 NA 8 NA
If necessary the datatable Frames can be converted into a pandas DataFrame:
DF_in = DF_in.to_pandas()
Transformation:
- This is a grouped transformation. The grouping column is 'name'.
- The df is already sorted
- The number of rows in each group is different and can be even or uneven
- If the first row in a group has a 'b' in the column 'type' it has to be removed (example: row 4 in DF_in)
- It is also possible that the last row in a group has an 'a' in the column 'type', this row should not get lost (example: row 7 in DF_in)
I hope this explanation is understandable.
Thank you in advance