
I have a DataFrame with a structure like this:

df = pd.DataFrame({
        'id': ['123', '123', '123', '456', '456', '789'],
        'type': ['A', 'A', 'B', 'B', 'C', 'A']
     })
id type
123 A
123 A
123 B
456 B
456 C
789 A

How can I get a count of each type grouped by id, and create a new column for each unique type?

The resulting DataFrame I'm looking for would look like this:

df = pd.DataFrame({
        'id': ['123', '456', '789'],
        'A': [2, 0, 1],
        'B': [1, 1, 0],
        'C': [0, 1, 0]
    })
id A B C
123 2 1 0
456 0 1 1
789 1 0 0

Thank you for any help and guidance.


3 Answers


You can do:

out = df.groupby(['id','type']).size().unstack().fillna(0).astype(int).rename_axis([None])

or, as @Quang Hoang suggested, simply:

out = pd.crosstab(df['id'], df['type']).rename_axis([None])

Output:

type  A  B  C
123   2  1  0
456   0  1  1
789   1  0  0
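For reference, a minimal runnable sketch of the first approach. Passing `fill_value=0` to `unstack` avoids the separate `fillna(0).astype(int)` step (the column and index names here follow the question's sample data):

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['123', '123', '123', '456', '456', '789'],
    'type': ['A', 'A', 'B', 'B', 'C', 'A'],
})

# Count rows per (id, type), spread types into columns,
# filling missing combinations with 0 (keeps integer dtype)
out = (df.groupby(['id', 'type'])
         .size()
         .unstack(fill_value=0)
         .rename_axis(index=None, columns=None))
```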

You can also count the values of a single column directly. For example, to count how many times each value appears in the type column:

new_column = df['type'].value_counts()

new_column is a Series of per-value counts. Note this counts over the whole column, not per id.
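To get per-id counts from `value_counts`, you can first group by id; a minimal sketch using the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['123', '123', '123', '456', '456', '789'],
    'type': ['A', 'A', 'B', 'B', 'C', 'A'],
})

# value_counts within each id group yields a (id, type) MultiIndex Series;
# unstack spreads the type level into columns, filling gaps with 0
per_id = df.groupby('id')['type'].value_counts().unstack(fill_value=0)
```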


Another way by aggregating:

(df.groupby(['id', 'type'])
   .agg(count=('type', len))
   .reset_index()
   .pivot(index='id', columns='type', values='count')
   .fillna(0))

type    A   B   C
id          
123 2.0 1.0 0.0
456 0.0 1.0 1.0
789 1.0 0.0 0.0
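The same aggregate-then-pivot idea can be collapsed into one call with `pivot_table`; a sketch (using the question's sample data) where `aggfunc='size'` counts the rows in each (id, type) cell and `fill_value=0` keeps the result integer-typed:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['123', '123', '123', '456', '456', '789'],
    'type': ['A', 'A', 'B', 'B', 'C', 'A'],
})

# One-step pivot: rows counted per (id, type), missing pairs filled with 0
out = df.pivot_table(index='id', columns='type', aggfunc='size', fill_value=0)
```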