0

I have this data frame:

df = pd.DataFrame({ "cluster" : ["A", "A", "A", "A", "A", "C", "C", "C", "C"], 
                    "col1": ["0", "0", "1", "1", "0", "1", "0", "1", "0"], 
                    "col2": ["1", "1", "1", "1", "0", "1", "1", "1", "0"] })

and I can't figure out the best way to have a data frame result with this format

col cluster avg
col1 A 0.4
col1 C 0.5
col2 A 0.8
col2 C 0.75

I tried using transpose, then melt the cluster, but didn't work. Any help would be highly appreciated!

My desired result is the table - where all columns (except cluster) are rows, and cluster is melt and the average is calculated for the values of the column for each cluster

F_M
  • 301
  • 2
  • 8
  • 1
    Can you show your efforts and also your desired results? – Karn Kumar Apr 16 '22 at 07:25
  • My desired result is the table - where all columns (except cluster) are rows, and cluster is melt and the average is calculated for the values of the column for each cluster – F_M Apr 16 '22 at 07:27

2 Answers2

3

Looks like a combination of melt and groupby + mean would do the trick. Note that the "col" values are strings in your input, so we also have to convert them to integers first before calculating the mean:

out = (df.melt(['cluster'], var_name='col')
       .assign(value=lambda x: x['value'].astype(int))
       .groupby(['col', 'cluster'], as_index=False).mean())

Output:

    col cluster  value
0  col1       A   0.40
1  col1       C   0.50
2  col2       A   0.80
3  col2       C   0.75
2

You can group by cluster column and calculate mean of each group then melt

df[['col1', 'col2']] = df[['col1', 'col2']].astype(int)

out = (df.groupby('cluster')
         .agg('mean').reset_index()
         .melt(id_vars=['cluster'], var_name='col', value_name='avg')
)
print(out)

  cluster   col   avg
0       A  col1  0.40
1       C  col1  0.50
2       A  col2  0.80
3       C  col2  0.75
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52