
I have a dataframe which looks like this:

[image: input dataframe]

My end goal is to calculate the percentage of each purpose for each origin:

[image]

So the final table should look like this:

[image: desired output table]

Any assistance or direction will be greatly appreciated :)

Thanks in advance :)

Shei

ML85
Py_junior

1 Answer


Use transform to get exactly what you need.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html

Whereas an aggregation returns a reduced version of the input data, transform returns output with the same shape as the input, so the result can be assigned straight back as a new column:

df['%_values'] = 100 * (df['values'] / (df.groupby('Origin')['values'].transform('sum')))

The above one-liner divides each value by the sum of values for its Origin group, and stores the result in a new column.

Printing df then gives:

  Origin Destination     Purpose  values   %_values
0      a           a    business    1490  37.664307
1      a           b    business    1301  32.886754
2      a           c    pleasure    1165  29.448938
3      b           a    pleasure    1777  57.064868
4      b           b    business     580  18.625562
5      b           c    pleasure     757  24.309570
6      c           a    business    1852  41.599281
7      c           b    pleasure    1949  43.778077
8      c           c  undeclared     651  14.622642
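
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The sample data is copied from the table above; the helper names origin_totals and row_totals are only illustrative and are not part of the original one-liner. It also shows the difference in shape between a plain aggregation and transform:

```python
import pandas as pd

# Sample data copied from the output table shown above.
df = pd.DataFrame({
    "Origin": list("aaabbbccc"),
    "Destination": list("abcabcabc"),
    "Purpose": ["business", "business", "pleasure",
                "pleasure", "business", "pleasure",
                "business", "pleasure", "undeclared"],
    "values": [1490, 1301, 1165, 1777, 580, 757, 1852, 1949, 651],
})

# Aggregation: one row per Origin, so the result is shorter than df.
origin_totals = df.groupby("Origin")["values"].sum()

# Transform: the group total is repeated on every row of its group,
# so the result has the same length and index as df.
row_totals = df.groupby("Origin")["values"].transform("sum")

# Because the shapes match, the division lines up row by row.
df["%_values"] = 100 * df["values"] / row_totals

print(origin_totals)
print(df)
```

Since row_totals shares its index with df, no merge or join is needed to bring the group sums back onto the original rows.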

And of course, to get your output with 0 (or any other number of) decimal places, you can use round:

df['%_values'] = round(100 * (df['values'] / (df.groupby('Origin')['values'].transform('sum'))))

  Origin Destination     Purpose  values  %_values
0      a           a    business    1490      38.0
1      a           b    business    1301      33.0
2      a           c    pleasure    1165      29.0
3      b           a    pleasure    1777      57.0
4      b           b    business     580      19.0
5      b           c    pleasure     757      24.0
6      c           a    business    1852      42.0
7      c           b    pleasure    1949      44.0
8      c           c  undeclared     651      15.0
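
If you want a specific number of decimal places, one option is the Series.round method with its decimals argument. The sketch below is only an illustration under that assumption; the column names pct_2dp and pct_0dp are made up for the example. It also checks the per-origin totals after rounding, because rounded percentages do not always add up to exactly 100:

```python
import pandas as pd

# Same sample values as in the tables above (only the columns needed here).
df = pd.DataFrame({
    "Origin": list("aaabbbccc"),
    "values": [1490, 1301, 1165, 1777, 580, 757, 1852, 1949, 651],
})

# Unrounded percentage of each row within its Origin group.
share = 100 * df["values"] / df.groupby("Origin")["values"].transform("sum")

# Series.round takes the number of decimal places to keep.
df["pct_2dp"] = share.round(2)
df["pct_0dp"] = share.round(0)

# Sum the rounded percentages per Origin to see how far they drift from 100.
rounded_totals = df.groupby("Origin")["pct_0dp"].sum()

print(df)
print(rounded_totals)
```

With the sample data, the 0-decimal percentages for origin c (42, 44 and 15) add up to 101 rather than 100, so it can be worth keeping the unrounded column around and rounding only for display.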

sophocles