DataFrame:

import pandas as pd

df = pd.DataFrame([['CategoryA', 400], ['CategoryA', 600],
                   ['CategoryB', 1000], ['CategoryB', 2000]],
                  columns=['Category', 'ItemTotal'])

I used the following groupby to get the total dollar amount by category:

# Use a new name so the original df is not overwritten
totals = df.groupby('Category')['ItemTotal'].sum()
totals

The output looks like this:

Category
CategoryA    1000
CategoryB    3000

Eventually I want another column next to the sum that holds each category's share of the total, like so:

Category
CategoryA    1000   .25
CategoryB    3000   .75

I'd like to know how to do this by putting each category's total dollar amount into one column and then creating a second column for the share by dividing by the overall total.

The question that @jezrael linked, "How do I create a new column from the output of pandas groupby().sum()?", puts the sum of each category next to every row of the original DataFrame:

df['Total'] = df['ItemTotal'].groupby(df['Category']).transform('sum')
df

Output:

    Category  ItemTotal  Total
0  CategoryA        400   1000
1  CategoryA        600   1000
2  CategoryB       1000   3000
3  CategoryB       2000   3000

However, I'm looking for one row per unique category, showing that category's sum and its share of the total, not a value repeated on every row of the original DataFrame.

IamWarmduscher

1 Answer

First, it is necessary to convert the index to a column:

df1 = df.groupby('Category', as_index=False)['ItemTotal'].sum()

Or:

df1 = df.groupby('Category')['ItemTotal'].sum().reset_index()

If you need to convert the Series to a one-column DataFrame instead (Category stays as the index):

df1 = df.groupby('Category')['ItemTotal'].sum().to_frame()
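
As a rough illustration of how these variants differ, the sketch below builds the question's sample DataFrame and compares the results. The variable names `as_column` and `as_index_frame` are made up for this example and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame(
    [["CategoryA", 400], ["CategoryA", 600],
     ["CategoryB", 1000], ["CategoryB", 2000]],
    columns=["Category", "ItemTotal"],
)

# as_index=False keeps Category as a regular column.
as_column = df.groupby("Category", as_index=False)["ItemTotal"].sum()

# to_frame() wraps the Series in a DataFrame; Category remains the index.
as_index_frame = df.groupby("Category")["ItemTotal"].sum().to_frame()

print(list(as_column.columns))       # ['Category', 'ItemTotal']
print(list(as_index_frame.columns))  # ['ItemTotal']
print(as_index_frame.index.name)     # Category
```

Either shape works for the division step; the difference is only whether Category ends up as a column or as the index.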

Finally, divide the column by its sum:

df1['Perc'] = df1['ItemTotal'].div(df1['ItemTotal'].sum())
print(df1)

Output:

    Category  ItemTotal  Perc
0  CategoryA       1000  0.25
1  CategoryB       3000  0.75
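
For reuse, the two steps can be wrapped in a small helper. This is only a sketch: the function name `category_shares`, its parameters, and the `Share` column name are hypothetical choices for this example, and it assumes the value column is numeric with a non-zero total.

```python
import pandas as pd


def category_shares(frame, key, value):
    """Return one row per `key` with the summed `value` and its share of the total."""
    # Step 1: aggregate, keeping the grouping key as a regular column.
    summary = frame.groupby(key, as_index=False)[value].sum()
    # Step 2: divide each group's sum by the grand total (assumed non-zero).
    summary["Share"] = summary[value].div(summary[value].sum())
    return summary


df = pd.DataFrame(
    [["CategoryA", 400], ["CategoryA", 600],
     ["CategoryB", 1000], ["CategoryB", 2000]],
    columns=["Category", "ItemTotal"],
)

result = category_shares(df, "Category", "ItemTotal")
print(result)
```

The helper returns a new DataFrame and leaves the input untouched, so the original rows are still available afterwards.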
jezrael