I have the following pandas Dataframe:
+---+-----+-----+------+------+------+------+
| | A | B | C_10 | C_20 | D_10 | D_20 |
+---+-----+-----+------+------+------+------+
| 1 | 0.1 | 0.2 | 1 | 2 | 3 | 4 |
| 2 | 0.3 | 0.4 | 5 | 6 | 7 | 8 |
+---+-----+-----+------+------+------+------+
Now I'd like to melt the columns C_10
, C_20
, D_10
, D_20
to obtain a Dataframe as follows:
+---+-----+-----+----+---+---+
| | A | B | N | C | D |
+---+-----+-----+----+---+---+
| 1 | 0.1 | 0.2 | 10 | 1 | 3 |
| 1 | 0.1 | 0.2 | 20 | 2 | 4 |
| 2 | 0.3 | 0.4 | 10 | 5 | 7 |
| 2 | 0.3 | 0.4 | 20 | 6 | 8 |
+---+-----+-----+----+---+---+
Is there a easy way to do this? Thanks!
EDIT: I tried wide_to_long
, but this doesn't work if there are duplicated rows in the Dataframe:
df = pd.DataFrame({
'combination': [1, 1, 2, 2],
'A': [0.1, 0.1, 0.2, 0.2],
'B': [0.3, 0.3, 0.4, 0.4],
'C_10': [1, 5, 6, 7],
'C_20': [2, 6, 7, 8],
'D_10': [3, 7, 8, 9],
'D_20': [4, 8, 9, 10],
})
+--------------------------------------------------+
| combination A B C_10 C_20 D_10 D_20 |
+--------------------------------------------------+
| 0 1 0.1 0.3 1 2 3 4 |
| 1 1 0.1 0.3 5 6 7 8 |
| 2 2 0.2 0.4 6 7 8 9 |
| 3 2 0.2 0.4 7 8 9 10 |
+--------------------------------------------------+
If I use wide_to_long
I get the following error:
> pd.wide_to_long(df, stubnames=['C','D'], i=['combination', 'A', 'B'], j='N', sep='_').reset_index()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-31-cc5863fa7ecc> in <module>
----> 1 pd.wide_to_long(df, stubnames=['C','D'], i=['combination', 'A', 'B'], j='N', sep='_').reset_index()
pandas/core/reshape/melt.py in wide_to_long(df, stubnames, i, j, sep, suffix)
456
457 if df[i].duplicated().any():
--> 458 raise ValueError("the id variables need to uniquely identify each row")
459
460 value_vars = [get_var_names(df, stub, sep, suffix) for stub in stubnames]
ValueError: the id variables need to uniquely identify each row
The parameter i
is described as "Column(s) to use as id variable(s).", but I don't understand what this exactly means.