3

I have created a cross tabulation in pandas using:

grouped_missing_analysis = pd.crosstab(clean_sessions.action_type, clean_sessions.action, margins=True).unstack()    
print(grouped_missing_analysis[:20])

Which leads to displaying:

action  action_type     
10      Missing                0
        Unknown                0
        booking_request        0
        booking_response       0
        click                  0
        data                   0
        message_post        3215
        modify                 0
        partner_callback       0
        submit                 0
        view                   0
        All                 3215
11      Missing                0
        Unknown                0
        booking_request        0
        booking_response       0
        click                  0
        data                   0
        message_post         716
        modify                 0
dtype: int64

I want to only show the action_type which is either 'Unknown', 'Missing' or 'Other', and ignore other action_type for each action. I have a feeling the answer is to do with:

.where(clean_sessions.action_type.isin(('Missing', 'Unknown')), 'Other')

From a previous snippet I have, but I can't get it to work. Maybe pivot_table is easier, this exercise is just for me to learn about how to do data analysis in python with the different functions.

Raw data for clean_sessions looks like:

   user_id          action action_type            action_detail  \
0  d1mm9tcy42          lookup     Missing                  Missing   
1  d1mm9tcy42  search_results       click      view_search_results   
2  d1mm9tcy42          lookup     Missing                  Missing   
3  d1mm9tcy42  search_results       click      view_search_results   
4  d1mm9tcy42          lookup     Missing                  Missing   
5  d1mm9tcy42  search_results       click      view_search_results   
6  d1mm9tcy42          lookup     Missing                  Missing   
7  d1mm9tcy42     personalize        data  wishlist_content_update   
8  d1mm9tcy42           index        view      view_search_results   
9  d1mm9tcy42          lookup     Missing                  Missing   

       device_type secs_elapsed  
0  Windows Desktop          319  
1  Windows Desktop        67753  
2  Windows Desktop          301  
3  Windows Desktop        22141  
4  Windows Desktop          435  
5  Windows Desktop         7703  
6  Windows Desktop          115  
7  Windows Desktop          831  
8  Windows Desktop        20842  
9  Windows Desktop          683 
Dhruv Ghulati
  • 2,976
  • 3
  • 35
  • 51
  • 1
    does `grouped_missing_analysis.loc(axis=0)[slice(None), ['Missing', 'Unknown', 'Other']]` work? Can you post raw data and code to reproduce your df – EdChum Feb 16 '16 at 10:45
  • Yes it does! Quick follow up, how would I now aggregate the total `Missing`, `Unknown` and `Other` for each action as a total value count for each action, and have as a percentage of `All` action_types which are missing, unknown or other? – Dhruv Ghulati Feb 16 '16 at 11:10
  • Sorry 1 problem per question, please post a new question, answering new requirements via comments is counter-productive on SO. – EdChum Feb 16 '16 at 11:12

1 Answers1

4

Those are your indices and not columns, you need to pass labels to select the rows of interest.

You can pass slice(None) for the first level and then a list for the second level:

In [102]:
grouped_missing_analysis.loc[slice(None), ['Missing', 'Unknown', 'Other']]

Out[102]:
action          action_type
index           Missing        0
lookup          Missing        5
personalize     Missing        0
search_results  Missing        0
All             Missing        5
dtype: int64

The docs give more detail on this style of indexing

EdChum
  • 376,765
  • 198
  • 813
  • 562