Melt a pandas DataFrame

Question

I have a pandas DataFrame like this:

df = pd.DataFrame({'custid':[1,2,3,4],
...: 'prod1':['jeans','tshirt','jacket','tshirt'],
...: 'prod1_hnode1':[1,2,3,2],
...: 'prod1_hnode2':[6,7,8,7],
...: 'prod2':['tshirt','jeans','jacket','shirt'],
...: 'prod2_hnode1':[2,1,3,4],
...: 'prod2_hnode2':[7,6,8,7]})

In [54]: df
Out[54]: 
    custid   prod1  prod1_hnode1  prod1_hnode2   prod2  prod2_hnode1  \
0       1   jeans             1             6  tshirt             2   
1       2  tshirt             2             7   jeans             1   
2       3  jacket             3             8  jacket             3   
3       4  tshirt             2             7   shirt             4   

   prod2_hnode2  
0             7  
1             6  
2             8  
3             7

How can I convert this to the following format:

dfnew = pd.DataFrame({'custid':[1,1,2,2,3,3,4,4],
...: 'prod':['prod1','prod2','prod1','prod2','prod1','prod2','prod1','prod2'],
...: 'rec':['jeans','tshirt','tshirt','jeans','jacket','jacket','tshirt','shirt'],
...: 'hnode1':[1,2,2,1,3,3,2,4],
...: 'hnode2':[6,7,7,6,8,8,7,7]})


In [56]: dfnew
Out[56]: 
   custid  hnode1  hnode2   prod     rec
0       1       1       6  prod1   jeans
1       1       2       7  prod2  tshirt
2       2       2       7  prod1  tshirt
3       2       1       6  prod2   jeans
4       3       3       8  prod1  jacket
5       3       3       8  prod2  jacket
6       4       2       7  prod1  tshirt
7       4       4       7  prod2   shirt

Your question is not clear enough. Have you tried [pivoting](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html) ? — Souradeep Nanda, Jan 12 '18 at 10:07
@SouradeepNanda I think pivoting is the opposite of what I was trying to achieve. — tushaR, Jan 12 '18 at 10:10

jezrael · Accepted Answer · 2018-01-12T10:09:24.293

5

Use:

set_index by column custid
create MultiIndex in columns by split
replace NaNs in columns by rec
stack by first level
reset_index for columns from MultiIndex
rename column

df = df.set_index('custid')
df.columns = df.columns.str.split('_', expand=True)
df = df.rename(columns={np.nan:'rec'})
cols = ['custid','hnode1','hnode2','prod','rec']
df = df.stack(0).reset_index().rename(columns={'level_1':'prod'}).reindex(columns=cols)
print (df)
   custid  hnode1  hnode2   prod     rec
0       1       1       6  prod1   jeans
1       1       2       7  prod2  tshirt
2       2       2       7  prod1  tshirt
3       2       1       6  prod2   jeans
4       3       3       8  prod1  jacket
5       3       3       8  prod2  jacket
6       4       2       7  prod1  tshirt
7       4       4       7  prod2   shirt

edited Jan 12 '18 at 10:09

answered Jan 12 '18 at 10:08

jezrael

822,522
95
1,334
1,252

Very nice solution ! – MaxU - stand with Ukraine Jan 12 '18 at 10:09
@MaxU - Thanks, the hardest was replace NaNs in columns. – jezrael Jan 12 '18 at 10:09
@jezrael Thanks a lot. I was not able to solve this even in `R` in which I majorly code. Long way to go. – tushaR Jan 12 '18 at 10:14

score 1 · Answer 2 · answered Jan 12 '18 at 16:36

Here's another approach that should work, but uses repeated melts.

coln = df.dtypes.index  # save some typing
df_long = pd.melt(
    df, id_vars = "custid", value_vars = ["prod1", "prod2"],
    var_name = "prod", value_name = "rec").assign(
    hnode1 = pd.melt(df, id_vars = "custid", 
                     value_vars = filter(lambda x: "hnode1" in x, coln))["value"],
    hnode2 = pd.melt(df, id_vars = "custid", 
                     value_vars = filter(lambda x: "hnode2" in x, coln))["value"])
print(df_long)
   custid   prod     rec  hnode1  hnode2
0       1  prod1   jeans       1       6
1       2  prod1  tshirt       2       7
2       3  prod1  jacket       3       8
3       4  prod1  tshirt       2       7
4       1  prod2  tshirt       2       7
5       2  prod2   jeans       1       6
6       3  prod2  jacket       3       8
7       4  prod2   shirt       4       7

You mention R in your comments. melt from "data.table" should be able to handle this much more easily since you can melt multiple sets of columns at once, similar to how you might approach the problem with base R's reshape function.

The base R approach might be something along the lines of:

reshape(df, direction = "long", idvar = "custid", 
        varying = list(c(2, 5), c(3, 6), c(4, 7)), 
        sep = "", times = c("prod1", "prod2"))

Melt a pandas DataFrame

2 Answers2

Linked