Grouping without losing original information using `df.groupby()`

Question

I'm not sure this is feasible, but I'm trying to achieve the following with `df.groupby()`. Let's say we have the following dataframe

name   target
A         1
A         2
A        0.5
B         3
B         1
B         2
C        0.6
C        1.2

and I want to group it based on name without losing the original information on target. My expected output would be something like:

name   target  count
A         1      3
          2
         0.5
B         3      3
          1
          2
C        0.6     2
         1.2

Do you want `df['count'] = df.groupby('name').transform('count')`? – Ynjxsjmh Apr 16 '23 at 12:46
Sort of, but ideally without repeating the same `name` for each target, to enhance readability. – James Arten Apr 16 '23 at 12:56
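
The `transform` suggestion from the comment can be sketched as follows. This is a minimal illustration, assuming the sample data from the question; selecting the `target` column before calling `transform` is an assumption added here so that the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "target": [1, 2, 0.5, 3, 1, 2, 0.6, 1.2],
})

# transform("count") returns one value per original row, so no rows are lost.
# Selecting "target" first (an assumption, not part of the comment) yields a Series.
df["count"] = df.groupby("name")["target"].transform("count")

print(df)
```

Every row keeps its `target`, and `count` repeats the group size on each row of the group, which is why the asker still wants a sparser display.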

TanjiroLL · Answer 1 · 2023-04-16T13:18:33.793

You can use a MultiIndex:

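import pandas as pd

data = {'name': list('AAABBBCC'), 'target': [1, 2, 0.5, 3, 1, 2, 0.6, 1.2]}
df = pd.DataFrame(data)
df['count'] = df.groupby('name')['target'].transform('count')
df = df.set_index(['name', 'count', 'target'])
df.head()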

Update: The code above moves every column into the index, which leaves a dataframe with no columns. You can do the following instead:

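df = pd.DataFrame(data)  # start again from the original frame
df['count'] = df.groupby('name')['target'].transform('count')
df = df.set_index(['name', 'count'])
# append a running number per group as a third index level; 'target' stays a column
df.set_index(df.groupby(level=[0,1]).cumcount(), append=True).head()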

Code is taken from "Why does my multi-index dataframe have duplicate values for indices?"
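
For reference, a self-contained sketch of the MultiIndex approach, assuming the sample data from the question. The variable name `indexed` is hypothetical, and the explicit `sparsify=True` only spells out what pandas does by default when printing a MultiIndex: a repeated outer label is shown once per block.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "target": [1, 2, 0.5, 3, 1, 2, 0.6, 1.2],
})
df["count"] = df.groupby("name")["target"].transform("count")

# Move "name" and "count" into the index; "target" stays a regular column.
indexed = df.set_index(["name", "count"])

# Append a running number per (name, count) group as a third index level,
# so every row has a unique index entry.
indexed = indexed.set_index(
    indexed.groupby(level=[0, 1]).cumcount(), append=True
)

# With sparsify=True, repeated outer index labels are printed only once.
print(indexed.to_string(sparsify=True))
```

The data itself is unchanged: `name` is still stored for every row, and only the printed representation hides the repeats.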

edited Apr 16 '23 at 13:18

answered Apr 16 '23 at 12:52

mmmh I've tried but then everything gets into the index that way and the dataframe becomes empty.. – James Arten Apr 16 '23 at 13:01
Yes, you are right, but it's only to enhance readability of data, not processing it. – TanjiroLL Apr 16 '23 at 13:06
Is there a way to prevent `name` entries being repeated for all different targets? Somehow as displayed in my expected output. – James Arten Apr 16 '23 at 13:17
still not working in my case... – James Arten Apr 16 '23 at 18:17

inquirer · Answer 2 · 2023-04-21T13:04:20.970

You can blank out 'name' on every row of each group except the first. For 'count', create the column filled with empty strings first, then write the group size only into the first row of each group.

Keep in mind that this is not a MultiIndex: wherever no value is shown, the cell holds an empty string.

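import pandas as pd

df = pd.DataFrame({'name': list('AAABBBCC'),
                   'target': [1, 2, 0.5, 3, 1, 2, 0.6, 1.2]})
# object dtype so that integers and empty strings can share the column
df['count'] = pd.Series('', index=df.index, dtype=object)

# .groups maps each name to the index labels of its rows
for name, idx in df.groupby('name').groups.items():
    df.loc[idx, 'name'] = ''            # blank the name on every row of the group
    df.loc[idx[0], 'name'] = name       # restore it on the first row
    df.loc[idx[0], 'count'] = len(idx)  # write the group size on the first row

print(df)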

Output

  name  target count
0    A     1.0     3
1          2.0      
2          0.5      
3    B     3.0     3
4          1.0      
5          2.0      
6    C     0.6     2
7          1.2
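
The same layout can also be produced without a Python-level loop. The following is a sketch under stated assumptions, not the answerer's method: it uses the sample data from the question, marks the first row of each name with `Series.duplicated`, and the names `is_first`, `sizes` and `out` are hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "target": [1, 2, 0.5, 3, 1, 2, 0.6, 1.2],
})

# True on the first occurrence of each name, False on later occurrences.
is_first = ~df["name"].duplicated()

# Group size aligned to every original row.
sizes = df.groupby("name")["target"].transform("count")

out = df.copy()
# Keep the size and the name only on the first row of each group;
# everywhere else the cell becomes an empty string.
out["count"] = sizes.astype(object).where(is_first, "")
out["name"] = df["name"].where(is_first, "")

print(out)
```

As with the loop version, the blanked cells are real empty strings, so `out` is suited to display or export rather than further grouping on `name`.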
