I am trying to assign
unique
values in pandas
df
to specific individuals.
For the df
below, [Area]
and [Place]
will together make up unique
values that are various jobs. These values will be assigned to individuals with the overall aim to use the least amount of individuals possible.
The trick is these values are constantly starting and finishing and go for different lengths of time. The most unique
values assigned to an individual any one time is 3. [On]
displays how many current unique values for [Place] and [Area] are occurring.
So this provides a concrete guide on how many individuals I need. e.g. 3 unique
values one = 1 person, 6 unique values on = 2 persons
I can't do a groupby
statement where I assign
the first 3 unique values
to individual 1
and the next 3 unique
values to individual 2
etc.
What I envisage is, when unique
values are greater than 3 I want to group values in [Area]
first and then combine the leftovers. So look to assign
same values in [Area]
to an individual (up to 3). Then, if there are _leftover_
values (<3), they should be combined to make a group of 3, where possible.
The way I envisage this working is: see into the future by an hour
. For each new row
of values the script
should see how many values will be [On]
(this provides an indication of how many total individuals are required). Where unique
values are >3, they should be assigned
by grouping
the same value in [Area]
. If there are leftover values they should be combined anyhow to make up to a group of 3.
For the df
below, the number of unique
values occurring for [Place]
and [Area]
varies between 1-6. So we should never have more than 2 individuals assigned
. When unique
values are >3 it should be assigned by [Area]
first. The leftover values should be combined with other individuals that have less than 3 unique
values.
Apologies for the large df. It's the only way i can replicate the problem!
import pandas as pd
import numpy as np
from collections import Counter
d = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:33:00','8:47:00','8:48:00','9:03:00','9:15:00','9:18:00','9:33:00','9:45:00','9:48:00','10:03:00','10:15:00','10:15:00','10:15:00','10:18:00','10:32:00','10:33:00','10:39:00','10:43:00','10:48:00','10:50:00','11:03:00','11:03:00','11:07:00','11:25:00','11:27:00','11:42:00','11:48:00','11:51:00','11:57:00','12:00:00','12:08:00','12:15:00','12:17:00','12:25:00','12:30:00','12:35:00','12:39:00','12:47:00','12:52:00','12:55:00','13:00:00','13:03:00','13:07:00','13:12:00','13:15:00','13:22:00','13:27:00','13:27:00'],
'Area' : ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','B','A','B','A','A','A','A','B','A','A','B','B','A','B','C','A','B','C','C','A','B','C','C','B','A','C','B','C','C','A','C','B','C','C','A','C'],
'Place' : ['House 1','House 2','House 3','House 1','House 3','House 2','House 1','House 3','House 2','House 1','House 3','House 2','House 1','House 3','House 4','House 1','House 2','House 1','House 1','House 4','House 3','House 2','House 1','House 1','House 4','House 1','House 1','House 4','House 1','House 1','House 4','House 1','House 2','House 1','House 4','House 1','House 1','House 2','House 1','House 4','House 1','House 1','House 3','House 2','House 4','House 1','House 2','House 4','House 1','House 4','House 2'],
'On' : ['1','2','3','3','3','3','3','3','3','3','3','3','3','3','4','5','5','5','5','5','5','4','3','3','3','2','2','2','2','3','3','3','4','4','4','4','4','4','4','4','4','4','4','4','4','4','5','6','6','6','6'],
'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 2','Person 3','Person 1','Person 3','Person 1','Person 2','Person 1','Person 1','Person 3','Person 1','Person 2','Person 3','Person 3','Person 2','Person 3','Person 4','Person 2','Person 3','Person 4','Person 4','Person 2','Person 3','Person 4','Person 4','Person 3','Person 2','Person 4','Person 3','Person 4','Person 4','Person 2','Person 4','Person 3','Person 5','Person 4','Person 2','Person 4'],
})
df = pd.DataFrame(data=d)
def getAssignedPeople(df, areasPerPerson):
areas = df['Area'].values
places = df['Place'].values
times = pd.to_datetime(df['Time']).values
maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
assignmentCount = Counter()
assignedPeople = []
assignedPlaces = {}
heldPeople = {}
heldAreas = {}
holdAvailable = True
person = 0
# search for repeated areas. Mark them if the next repeat occurs within an hour
ixrep = np.argmax(np.triu(areas.reshape(-1, 1)==areas, k=1), axis=1)
holds = np.zeros(areas.size, dtype=bool)
holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')
for area,place,hold in zip(areas, places, holds):
if (area, place) in assignedPlaces:
# this unique (area, place) has already been assigned to someone
assignedPeople.append(assignedPlaces[(area, place)])
continue
if assignmentCount[person] >= areasPerPerson:
# the current person is already assigned to enough areas, move on to the next
a = heldPeople.pop(person, None)
heldAreas.pop(a, None)
person += 1
if area in heldAreas:
# assign to the person held in this area
p = heldAreas.pop(area)
heldPeople.pop(p)
else:
# get the first non-held person. If we need to hold in this area,
# also make sure the person has at least 2 free assignment slots,
# though if it's the last person assign to them anyway
p = person
while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
p += 1
assignmentCount.update([p])
assignedPlaces[(area, place)] = p
assignedPeople.append(p)
if hold:
if p==maxPerson:
# mark that there are no more people available to perform holds
holdAvailable = False
# this area recurrs in an hour, mark that the person should be held here
heldPeople[p] = area
heldAreas[area] = p
return assignedPeople
def allocatePeople(df, areasPerPerson=3):
assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
df = df.copy()
df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
return df
print(allocatePeople(df))
Output:
Time Area Place On Person
0 8:03:00 A House 1 1 Person 1
1 8:17:00 A House 2 2 Person 1
2 8:20:00 A House 3 3 Person 1
3 8:33:00 A House 1 3 Person 1
4 8:47:00 A House 3 3 Person 1
5 8:48:00 A House 2 3 Person 1
6 9:03:00 A House 1 3 Person 1
7 9:15:00 A House 3 3 Person 1
8 9:18:00 A House 2 3 Person 1
9 9:33:00 A House 1 3 Person 1
10 9:45:00 A House 3 3 Person 1
11 9:48:00 A House 2 3 Person 1
12 10:03:00 A House 1 3 Person 1
13 10:15:00 A House 3 3 Person 1
14 10:15:00 A House 4 4 Person 2
15 10:15:00 B House 1 5 Person 2
16 10:18:00 A House 2 5 Person 1
17 10:32:00 B House 1 5 Person 2
18 10:33:00 A House 1 5 Person 1
19 10:39:00 A House 4 5 Person 2
20 10:43:00 A House 3 5 Person 1
21 10:48:00 A House 2 4 Person 1
22 10:50:00 B House 1 3 Person 2
23 11:03:00 A House 1 3 Person 1
24 11:03:00 A House 4 3 Person 2
25 11:07:00 B House 1 2 Person 2
26 11:25:00 B House 1 2 Person 2
27 11:27:00 A House 4 2 Person 2
28 11:42:00 B House 1 2 Person 2
29 11:48:00 C House 1 3 Person 2
30 11:51:00 A House 4 3 Person 2
31 11:57:00 B House 1 3 Person 2
32 12:00:00 C House 2 4 Person 3
33 12:08:00 C House 1 4 Person 2
34 12:15:00 A House 4 4 Person 2
35 12:17:00 B House 1 4 Person 2
36 12:25:00 C House 1 4 Person 2
37 12:30:00 C House 2 4 Person 3
38 12:35:00 B House 1 4 Person 2
39 12:39:00 A House 4 4 Person 2
40 12:47:00 C House 1 4 Person 2
41 12:52:00 B House 1 4 Person 2
42 12:55:00 C House 3 4 Person 3
43 13:00:00 C House 2 4 Person 3
44 13:03:00 A House 4 4 Person 2
45 13:07:00 C House 1 4 Person 2
46 13:12:00 B House 2 5 Person 3
47 13:15:00 C House 4 6 Person 4
48 13:22:00 C House 1 6 Person 2
49 13:27:00 A House 4 6 Person 2
50 13:27:00 C House 2 6 Person 3
Intended Output and Comments on why I think it should be assigned: