0

I was tring to apply a function on my dataframe adding information regarding the average performance of each id i have. the problem is that applying this function using apply takes a lot of time (i stopped the process after 20 minuts) and i need to do it a few times over. so i thought to try taking out the information to a new data frame and then merge the two together. The problem is my new dataframe has multyindex and i'm trying to merge on that index. can't seem to find the way to do it. any suggestions?

the new data frame look like this:

                              compliance_detail
    inspector_name  compliance  
    Addison, Michael    0.0 0.431203
                        1.0 0.034050
    Ahmad, Muna I       0.0 0.731405
                        1.0 0.052342
    Anderson, Trevis    0.0 0.166667

and i want to merge on the inspector's name and compliance since i have this two in my original data frame.

i want the merged data frame to have for every inspector his rating for complaince==0 and complaince==1. so it will look something like this:

    ticket_id   inspector_name  positive_complaince negetive_complaince
    0   22056   Addison, Michael 0.034050            0.431203
    1   22057   Ahmad, Muna I    0.052342            0.731405
Lior T
  • 137
  • 2
  • 9
  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Mar 05 '18 at 14:17
  • What does a merged inspector_name complance look like? Can you post expected output from this dataset? – Scott Boston Mar 05 '18 at 14:22

2 Answers2

0

You can use unstack to reshape the data frame. Also,based on your given output, I assume you want to to drop inspector_name where one of the compliance (positive or negative) is not available.

## sample data
df = pd.DataFrame({'inspector_name':['Addison, Michael','Addison, Michael','Ahmad, Muna I','Ahmad, Muna I','Anderson, Trevis'],
                   'compliance': [0.0,1.0,0.0,1.0,0.0],
                   'compliance_detail':[0.4312, 0.0340, 0.7314,0.052,0.1666]})

df = df.set_index(['inspector_name','compliance'])

## solution

df = df.unstack().dropna().reset_index()
df.columns = ['inspector_name','negative_complaince','positive_complaince']
df

    inspector_name  negative_complaince     positive_complaince
0   Addison, Michael       0.4312                  0.034
1   Ahmad, Muna I          0.7314                  0.052
YOLO
  • 20,181
  • 5
  • 20
  • 40
  • thank you! i solved it myself just a minute ago but im gonna look for your answer as i never used unstack before – Lior T Mar 05 '18 at 15:01
0

ok i found the answer. not sure if its the best way but it works. first i take the new data frame and split it into two data frames, one with compliance==0 and the other with compliance==1. then i take the index of each data frame convert it into a list and save it in a parameter. now i take my basic data frame and merge it with the two new dataframes i created. the "right_on" parameter in the merge function is now equal to the name of the list saved on the side for the two dataframes. and it works :) hope this could help someone.

Lior T
  • 137
  • 2
  • 9