2

I have the following pandas Dataframe:

+---+-----+-----+------+------+------+------+
|   |  A  |  B  | C_10 | C_20 | D_10 | D_20 |
+---+-----+-----+------+------+------+------+
| 1 | 0.1 | 0.2 |    1 |    2 |    3 |    4 |
| 2 | 0.3 | 0.4 |    5 |    6 |    7 |    8 |
+---+-----+-----+------+------+------+------+

Now I'd like to melt the columns C_10, C_20, D_10, D_20 to obtain a Dataframe as follows:

+---+-----+-----+----+---+---+
|   |  A  |  B  | N  | C | D |
+---+-----+-----+----+---+---+
| 1 | 0.1 | 0.2 | 10 | 1 | 3 |
| 1 | 0.1 | 0.2 | 20 | 2 | 4 |
| 2 | 0.3 | 0.4 | 10 | 5 | 7 |
| 2 | 0.3 | 0.4 | 20 | 6 | 8 |
+---+-----+-----+----+---+---+

Is there a easy way to do this? Thanks!

EDIT: I tried wide_to_long, but this doesn't work if there are duplicated rows in the Dataframe:

df = pd.DataFrame({
    'combination': [1, 1, 2, 2],
    'A': [0.1, 0.1, 0.2, 0.2],
    'B': [0.3, 0.3, 0.4, 0.4],
    'C_10': [1, 5, 6, 7],
    'C_20': [2, 6, 7, 8],
    'D_10': [3, 7, 8, 9],
    'D_20': [4, 8, 9, 10],
})
+--------------------------------------------------+
|    combination    A    B  C_10  C_20  D_10  D_20 |
+--------------------------------------------------+
| 0            1  0.1  0.3     1     2     3     4 |
| 1            1  0.1  0.3     5     6     7     8 |
| 2            2  0.2  0.4     6     7     8     9 |
| 3            2  0.2  0.4     7     8     9    10 |
+--------------------------------------------------+

If I use wide_to_long I get the following error:

> pd.wide_to_long(df, stubnames=['C','D'], i=['combination', 'A', 'B'], j='N', sep='_').reset_index()


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-cc5863fa7ecc> in <module>
----> 1 pd.wide_to_long(df, stubnames=['C','D'], i=['combination', 'A', 'B'], j='N', sep='_').reset_index()

pandas/core/reshape/melt.py in wide_to_long(df, stubnames, i, j, sep, suffix)
    456 
    457     if df[i].duplicated().any():
--> 458         raise ValueError("the id variables need to uniquely identify each row")
    459 
    460     value_vars = [get_var_names(df, stub, sep, suffix) for stub in stubnames]

ValueError: the id variables need to uniquely identify each row

The parameter i is described as "Column(s) to use as id variable(s).", but I don't understand what this exactly means.

jim
  • 311
  • 3
  • 13

1 Answers1

4

Use wide_to_long:

df = pd.wide_to_long(df, stubnames=['C','D'], i=['A','B'], j='N', sep='_').reset_index()
print (df)
     A    B   N  C  D
0  0.1  0.2  10  1  3
1  0.1  0.2  20  2  4
2  0.3  0.4  10  5  7
3  0.3  0.4  20  6  8

EDIT: If possible combinations of A, B columns are not unique is possible create helper column with convert index to column index, apply solution and last remove level index:

df = (pd.wide_to_long(df.reset_index(), 
                      stubnames=['C','D'],
                      i=['index','A','B'], 
                      j='N', 
                      sep='_')
        .reset_index(level=0, drop=True)
        .reset_index())
print (df)

     A    B   N  combination  C   D
0  0.1  0.3  10            1  1   3
1  0.1  0.3  20            1  2   4
2  0.1  0.3  10            1  5   7
3  0.1  0.3  20            1  6   8
4  0.2  0.4  10            2  6   8
5  0.2  0.4  20            2  7   9
6  0.2  0.4  10            2  7   9
7  0.2  0.4  20            2  8  10
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252