-1

I have a dataframe as shown below

Inspector_ID   Sector     Waste        Fire       Traffic
1              A          7            2          1
1              B          0            0          0
1              C          18           2          0
2              A          1            6          3
2              B          1            4          0    
2              C          4            14         2
3              A          0            0          0
3              B          2            6          12
3              C          0            1          4

From the above dataframe I would like to calculate the Inspector's expertise score in raising issues in a domain (waste, Fire and Traffic).

For example the the score of inspector-1 for waste is (((7/8)*2) + ((18/22)*3)/2)/2

I1W = Inspector-1 similarity in waste.
Ai = No. of waste issues raised by inspector-1 in sector i
Ti = Total no. of waste issues in sector i
Ni = No of inspectors raised issues in sector i(if all zero then only it is considered as not raised)

TS1 = Total no of sectors the inspector-1 visited.

I1W = Sum((Ai/Ti)*Ni)/TS1

The expected output is below dataframe

Inspector_ID    Waste     Fire    Traffic
1               I1W       I1F     I1T
2               I2W       I2F     I2T
3               I3W       I3F     I3T

TBF = To be filled

Danish
  • 2,719
  • 17
  • 32
  • The output is quite different from the starting dataframe. Maybe the best apprach is to just create a different one. You could try filtering rows by inspector_id with `loc` and then using the filtered data to apply a function that gives you the desired output – Manuel Dec 22 '19 at 14:35

1 Answers1

1

You could look into something along the lines of:

newData = []
inspector_ids = df['Inspector_ID'].unique().tolist()

for id in inspector id:
    current_data = df.loc[df['Inspector_id'] == id]
    #With the data of the current inspector you get the desired values
    waste_val = 'I1W'
    fire_val = 'I1F'
    traffic_val = 'I1T'
    newData.append([id,waste_val, fire_val, traffic_val])

new_df = pd.DataFrame(newData, columns = ['Inspector_ID','Waste','Fire','Traffic'])

Some ideas for getting the values you need

#IS1 = Sectors visited by inspector 1.
#After the first loc that filters the inspector
sectors_visited = len(df['Sector'].unique().tolist())
#Ai = No. of waste issues raised by inspector-1 in sector i
waste_Issues_A = current_data.loc[current_data['Sector' == A].value_counts()
#Ti = Total no. of waste issues in sector i
#You can get total number of issues by sector with
df['Sector'].value_counts()
#Ni = No of inspectors raised issues in sector i(if all zero then only it is considered as not raised)
#I dont know if i understand this one correctly, I guess its the number
#of inspectors that raised issues on a sector
inspectors_sector_A = len(df.loc[df['Sector'] == A]['Inspector_ID'].unique().tolist())

The previous was done by memory so take the code with a grain of salt (Specially the Ni one).

Manuel
  • 730
  • 7
  • 25
  • I am struggling to write the function to get below. waste_val = 'I1W' fire_val = 'I1F' traffic_val = 'I1T' – Danish Dec 22 '19 at 15:03