I have created a cross tabulation in pandas using:
grouped_missing_analysis = pd.crosstab(clean_sessions.action_type, clean_sessions.action, margins=True).unstack()
print(grouped_missing_analysis[:20])
Which leads to displaying:
action action_type
10 Missing 0
Unknown 0
booking_request 0
booking_response 0
click 0
data 0
message_post 3215
modify 0
partner_callback 0
submit 0
view 0
All 3215
11 Missing 0
Unknown 0
booking_request 0
booking_response 0
click 0
data 0
message_post 716
modify 0
dtype: int64
I want to only show the action_type
which is either 'Unknown', 'Missing' or 'Other', and ignore other action_type
for each action. I have a feeling the answer is to do with:
.where(clean_sessions.action_type.isin(('Missing', 'Unknown')), 'Other')
From a previous snippet I have, but I can't get it to work. Maybe pivot_table
is easier, this exercise is just for me to learn about how to do data analysis in python with the different functions.
Raw data for clean_sessions
looks like:
user_id action action_type action_detail \
0 d1mm9tcy42 lookup Missing Missing
1 d1mm9tcy42 search_results click view_search_results
2 d1mm9tcy42 lookup Missing Missing
3 d1mm9tcy42 search_results click view_search_results
4 d1mm9tcy42 lookup Missing Missing
5 d1mm9tcy42 search_results click view_search_results
6 d1mm9tcy42 lookup Missing Missing
7 d1mm9tcy42 personalize data wishlist_content_update
8 d1mm9tcy42 index view view_search_results
9 d1mm9tcy42 lookup Missing Missing
device_type secs_elapsed
0 Windows Desktop 319
1 Windows Desktop 67753
2 Windows Desktop 301
3 Windows Desktop 22141
4 Windows Desktop 435
5 Windows Desktop 7703
6 Windows Desktop 115
7 Windows Desktop 831
8 Windows Desktop 20842
9 Windows Desktop 683