1

I have data with 4 columns . I want to perform group by with melt.

data:
col1   col2   col3           col4
de1    do1    2020-11-24      vt1
de1    do1    2020-11-24      vt2
de1    do2    2020-11-24      vt1
de1    do2    2020-11-24      vt2

I want to get output like below:

col1 col2 col3        vt1   vt2 
de1   do1  2020-11-24  1     1
de1   do2  2020-11-24  1     1

I have tried like this

df1 = data.melt('col1','col2','col3').groupby(['col1','col2','col3','col4']).size().unstack(fill_value=0)

I am geeting key error: 'col1'

I have tried with melt(['col1','col2','col3')]. I am getting error key error: 'col4'.

Please, help me to solve this.

Navya
  • 307
  • 3
  • 15

3 Answers3

1

Will it not be easier to do a pivot_table?

import pandas as pd
data={'col1':['de1','de1','de1','de1'],
        'col2':['do1','do1','do2','do2'],
        'col3':['2020-11-24','2020-11-24','2020-11-24','2020-11-24'],
        'col4':['vt1','vt2','vt1','vt2']}
df=pd.DataFrame(data)
pivot=pd.pivot_table(df,index=['col1','col2','col3'], columns=['col4'],
                        aggfunc=len, fill_value=0).reset_index()

print(pivot)

result:

      col1 col2        col3  vt1  vt2
0     de1  do1  2020-11-24    1    1
1     de1  do2  2020-11-24    1    1

Explanation of what is happening

First your table will have as index Col1 to 3, then Col4 will be used to transform its values into new columns, and aggfunc will provide the count for each one of the values (extracted from col4). Missing values will receive 0 and the reset_index at the end is to give the table a better shape (not needed).

ombk
  • 2,036
  • 1
  • 4
  • 16
Renaud
  • 2,709
  • 2
  • 9
  • 24
0

Does it solves your problem?

aux = df.melt(id_vars=['col1','col2', 'col3'], value_vars=['val'])
aux['count'] = 1
aux.pivot(index = ['col1','col2','col3'], columns='value',values='count' )
kaihami
  • 815
  • 7
  • 18
0

This solutions returns also the correct names for the columns:

df = pd.DataFrame({
    "col1": ["de1", "de1", "de1", "de1"],
    "col2": ["do1", "do1", "do2", "do2"],
    "col3": ["2020-11-24", "2020-11-24", "2020-11-24", "2020-11-24"],
    "col4": ["vt1", "vt2", "vt1", "vt2"]
})
df['count'] = 1
group_df = df.groupby(['col1','col2','col3','col4']).count().unstack(fill_value=0).reset_index(level=[0,1,2])
group_df.columns = list(group_df.columns.get_level_values(0))[:3] + list(group_df.columns.get_level_values(1))[3:]

Output:

  col1 col2        col3  vt1  vt2
0  de1  do1  2020-11-24    1    1
1  de1  do2  2020-11-24    1    1
Molessia
  • 464
  • 1
  • 4
  • 17