This code:
#Missing analysis for actions - which action is missing the most action_types?
grouped_missing_analysis = pd.crosstab(clean_sessions.action_type, clean_sessions.action, margins=True).unstack()
grouped_unknown = grouped_missing_analysis.loc(axis=0)[slice(None), ['Missing', 'Unknown', 'Other']]
print(grouped_unknown)
Leads to the printing of this:
action action_type
10 Missing 0
Unknown 0
11 Missing 0
Unknown 0
12 Missing 0
Unknown 0
15 Missing 0
Unknown 0
about_us Missing 0
Unknown 416
accept_decline Missing 0
Unknown 0
account Missing 0
Unknown 9040
acculynk_bin_check_failed Missing 0
Unknown 1
acculynk_bin_check_success Missing 0
Unknown 51
acculynk_load_pin_pad Missing 0
Unknown 50
How would I now aggregate the total Missing
, Unknown
and Other
for each action as a total value count for each action, and have as a percentage of All
action_types which are Missing
, Unknown
or Other
? So for example, there would be one row for each action, and about_us
row would have 406+0/Total Missing + Unknown + Other
for all actions.
See this question for context.
The problem is that the above contains a row right at the bottom of it called All
which is the sum of everything, so:
All Missing 1126204
Unknown 1031170
Desired output would be:
action percent_total_missing_action_type
10 0
11 0
12 0
15 0
about_us 416/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
accept_decline 0
account 9040/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
acculynk_bin_check_failed 1/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
etc..
Here is some test data:
action action_type
a Missing 2
Unknown 5
b Missing 3
Unknown 4
c Missing 5
Unknown 6
d Missing 1
Unknown 9
All Missing 11
Unknown 24
Which should go into this:
action action_type_percentage
a Missing 2/11
Unknown 5/24
b Missing 3/11
Unknown 4/24
c Missing 5/11
Unknown 6/24
d Missing 1/11
Unknown 9/24
All Missing 11/11
Unknown 24/24