I have a dataset with 3 columns: Category, Country, and Count (which is always 1 - and is pretty useless, actually).

What I want to achieve is something like the yellow column here:

Image 1: the output I want (yellow column)

I could do a simple group by in Python, but that's not what I want: I need to preserve the individual rows of the data, unlike the image below (which groups them):

Image: what I did and don't want (group by)

I just want the frequency of each Category/Country combination on every row, without grouping the rows. Any ideas? I thought about iterating with for loops but couldn't get it to work. I'm a beginner in Python, so your help is deeply appreciated.

TontonVelu

1 Answer

It seems like you want to use transform here. Unlike a plain group-by aggregation, transform returns one value per original row, so it adds a new column holding each row's group total while keeping every row in place.

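import pandas as pd

# Sample data: one row per record, Count is always 1
df = pd.DataFrame({'category_cluster': ['Assault', 'Assault', 'Assault', 'Assault', 'Assault', 'Assault', 'Assault'],
                   'Country': ['Egypt', 'India', 'India', 'Mexico', 'Mexico', 'Mexico', 'Morocco'],
                   'Count': [1, 1, 1, 1, 1, 1, 1]})

# Sum Count within each (category_cluster, Country) group and broadcast
# the total back to every row, so no rows are collapsed
df['new_column'] = df.groupby(['category_cluster', 'Country'])['Count'].transform('sum')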
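
Since Count is always 1, you can probably skip it altogether and count the rows in each group instead. A minimal sketch, assuming the same column names as in the snippet above; the `frequency` column name is just picked for illustration:

```python
import pandas as pd

# Same sample data as above, but without the Count column
df = pd.DataFrame({
    'category_cluster': ['Assault'] * 7,
    'Country': ['Egypt', 'India', 'India', 'Mexico', 'Mexico', 'Mexico', 'Morocco'],
})

# 'size' counts the rows in each (category_cluster, Country) group;
# transform broadcasts that count back to every original row
df['frequency'] = (
    df.groupby(['category_cluster', 'Country'])['Country'].transform('size')
)

print(df)
```

The dataframe keeps all 7 rows, and each row carries the number of times its Category/Country pair occurs.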
Joe