S.O. community: I'm looking for a solution to a machine learning problem. If anybody can help, it would be much appreciated:
I would like to apply a machine learning algorithm that classifies each instance BASED ON THE OTHER INSTANCES WITHIN ITS 'GROUP'. The model should learn which features lead to a classification of '1', then assign '1' to the instance with the strongest features in each group and '0' to the others (or, even more ideally, output softmax probabilities that add up to 1 within each group).
An instance's features may not look strong compared with instances in other groups, yet still be the strongest indicators WITHIN its own group.
I.e., with data that looks like the following, how do I get the model to learn in general which features lead to a '1' classification, but decide each label only relative to the features of the other instances within the same group?
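One common way to get probabilities that add up to 1 within each group is to give every instance a score from a single shared function and then apply the softmax separately inside each group. A minimal NumPy sketch of just that normalisation step, assuming the per-instance scores already exist; `group_softmax` is a hypothetical helper name and the `scores` values are made up for illustration:

```python
import numpy as np

def group_softmax(scores, groups):
    """Softmax of `scores` computed separately within each group id.

    The probabilities of all instances that share a group id sum to 1.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    probs = np.empty_like(scores)
    for g in np.unique(groups):
        mask = groups == g
        s = scores[mask]
        e = np.exp(s - s.max())  # subtract the group max for numerical stability
        probs[mask] = e / e.sum()
    return probs

# Made-up scores, e.g. from a linear score w . x shared by all instances.
scores = [0.2, -0.5, 1.0, 2.0, 0.3, 0.1]
groups = [1, 1, 1, 1, 2, 2]
p = group_softmax(scores, groups)
print(p)
```

The scores can come from any model that outputs one number per instance; only the normalisation is group-aware.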
Training set:
   feat1  feat2  feat3  feat4  group  label
0      1      2    yes   cat1      1      0
1      3      4     no   cat4      1      0
2      2      6    yes   cat3      1      0
3      4      8    yes   cat2      1      1
4     14     10     no   cat4      2      0
5     10     12    yes   cat1      2      0
6     12     12     no   cat2      2      0
7     18     11    yes   cat4      2      1
8     16     15     no   cat5      2      0
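A common workaround that keeps an ordinary per-instance classifier is to feed it features expressed relative to the instance's own group, so that "strong within the group" becomes visible to the model. A pandas sketch that transcribes the training set above; the `_minus_group_mean` and `_rank_in_group` columns are hypothetical additions, not part of the original data:

```python
import pandas as pd

# The training set from the question, transcribed as a DataFrame.
train = pd.DataFrame({
    "feat1": [1, 3, 2, 4, 14, 10, 12, 18, 16],
    "feat2": [2, 4, 6, 8, 10, 12, 12, 11, 15],
    "feat3": ["yes", "no", "yes", "yes", "no", "yes", "no", "yes", "no"],
    "feat4": ["cat1", "cat4", "cat3", "cat2", "cat4", "cat1", "cat2", "cat4", "cat5"],
    "group": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "label": [0, 0, 0, 1, 0, 0, 0, 1, 0],
})

# Encode the yes/no feature as 0/1 (feat4 is left categorical here;
# it would still need e.g. one-hot encoding before modelling).
train["feat3"] = (train["feat3"] == "yes").astype(int)

# Hypothetical group-relative features: how each instance compares with the
# other members of its own group rather than with the whole dataset.
for col in ["feat1", "feat2"]:
    by_group = train.groupby("group")[col]
    train[f"{col}_minus_group_mean"] = train[col] - by_group.transform("mean")
    train[f"{col}_rank_in_group"] = by_group.rank(ascending=False)  # 1 = largest

# Sanity check: exactly one positive label per group.
assert (train.groupby("group")["label"].sum() == 1).all()

print(train[["group", "feat1", "feat1_minus_group_mean", "feat1_rank_in_group", "label"]])
```

In this small sample the labelled instance has the largest feat1 in both groups, which `feat1_rank_in_group` exposes directly, whereas no single threshold on raw feat1 separates the winners (the group 1 winner has feat1 = 4, below every group 2 instance).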
Test set:
   feat1  feat2  feat3  feat4  group  label  (softmax output)
0      1      2    yes   cat2      3      0              0.15
1      6      4     no   cat4      3      0              0.07
2      4      2    yes   cat2      3      0              0.34
3      2      3    yes   cat2      3      1              0.44
I.e., the model should assign a '1' to exactly one instance within each 'group' and '0' to the rest (or a probability to every instance, summing to 1 within the group).
The closest I have got is multiple instance learning, but that classifies the group as a whole rather than the instances within it.
Put simply, I want to use a set of features to determine which item within a group of items is most likely to be flagged, after assessing each of the group's items individually. I guess it is akin to predicting a race, where each participant has a set of attributes and the outcome (the winner) can only be predicted after assessing every participant and their attributes.
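If hard 0/1 labels are needed, they can be read off per-group probabilities by marking the most probable instance in each group. A small pandas sketch, using the illustrative probabilities from the test-set table above (the asker's example values, not model output):

```python
import pandas as pd

# Illustrative probabilities taken from the question's test-set table (group 3).
test = pd.DataFrame({
    "group": [3, 3, 3, 3],
    "prob": [0.15, 0.07, 0.34, 0.44],
})

# The highest-probability instance in each group gets 1, every other instance 0.
test["pred"] = 0
winners = test.groupby("group")["prob"].idxmax()  # index label of each group's max
test.loc[winners.to_numpy(), "pred"] = 1

print(test)
```

The same `groupby(...).idxmax()` step works unchanged when the DataFrame holds many groups.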
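The race analogy matches what econometrics calls a conditional logit (McFadden's choice model) and what learning-to-rank calls a listwise softmax loss: one shared linear score per instance, a softmax over the members of each group, and a loss equal to the negative log-probability of each group's actual winner. Below is a minimal NumPy sketch of that idea trained by plain gradient descent, assuming only the numeric features feat1, feat2 and feat3 (yes=1) are used; feat4 is omitted for brevity, and `fit_conditional_logit`, the learning rate and the small L2 penalty are illustrative choices, not a reference implementation:

```python
import numpy as np

def group_softmax(scores, groups):
    """Softmax computed separately within each group id."""
    probs = np.empty_like(scores)
    for g in np.unique(groups):
        m = groups == g
        e = np.exp(scores[m] - scores[m].max())
        probs[m] = e / e.sum()
    return probs

def group_nll(w, X, y, groups, l2):
    """Mean negative log-probability of each group's winner, plus an L2 penalty."""
    p = group_softmax(X @ w, groups)
    n_groups = len(np.unique(groups))
    return -np.log(p[y == 1]).sum() / n_groups + 0.5 * l2 * w @ w

def fit_conditional_logit(X, y, groups, lr=0.1, epochs=2000, l2=1e-3):
    """Gradient descent on the within-group softmax loss.

    For one group, the gradient of -log p_winner w.r.t. w is sum_i (p_i - y_i) x_i,
    so summing over all groups gives X.T @ (p - y).
    """
    w = np.zeros(X.shape[1])
    n_groups = len(np.unique(groups))
    for _ in range(epochs):
        p = group_softmax(X @ w, groups)
        grad = X.T @ (p - y) / n_groups + l2 * w
        w -= lr * grad
    return w

# Training set from the question: columns feat1, feat2, feat3 (yes=1).
X = np.array([[1, 2, 1], [3, 4, 0], [2, 6, 1], [4, 8, 1],
              [14, 10, 0], [10, 12, 1], [12, 12, 0], [18, 11, 1], [16, 15, 0]], dtype=float)
y = np.array([0, 0, 0, 1, 0, 0, 0, 1, 0], dtype=float)
groups = np.array([1, 1, 1, 1, 2, 2, 2, 2, 2])

# Standardise with training statistics (and reuse them for new data).
mu, sd = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sd

w = fit_conditional_logit(Xs, y, groups)
train_probs = group_softmax(Xs @ w, groups)

# Score a new group (the question's test set, group 3) with the same weights.
X_test = np.array([[1, 2, 1], [6, 4, 0], [4, 2, 1], [2, 3, 1]], dtype=float)
test_probs = group_softmax(((X_test - mu) / sd) @ w, np.array([3, 3, 3, 3]))
print(w, test_probs)
```

A constant added to every score in a group cancels inside the group softmax, so no intercept is needed and any feature that is constant within a group has no effect: the model only learns from differences between group members, which is the 'status within the group' behaviour asked for in the postscript. With more data, `X @ w` can be replaced by any differentiable scoring model trained on the same loss.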
Any help would be much appreciated.
P.S. Loving this community; without it I wouldn't have got far in the world of analytics!
P.P.S. Just to clarify: training needs to assess each group as a whole, not proceed in the traditional instance-by-instance way. I.e., a medium-quality instance in a group of low-quality instances should yield the '1' (or the highest probability), while the same medium-quality instance in a group of high-quality instances should be classified as '0' (or get the lowest probability). A traditional instance-by-instance classifier would assign this medium-quality instance exactly the same classification in both cases, but I am looking for its 'status' WITHIN the group it is assigned to!
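This requirement falls out of a within-group softmax: the same raw score becomes the largest probability among weak peers and the smallest among strong peers. A toy NumPy illustration, where the "quality" scores are made-up numbers standing in for a learned per-instance score:

```python
import numpy as np

def softmax(scores):
    """Softmax over the scores of a single group."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

medium = 5.0  # the same medium-quality instance in both groups (made-up score)
weak_group = np.array([medium, 1.0, 2.0, 1.5])
strong_group = np.array([medium, 9.0, 8.0, 10.0])

p_weak = softmax(weak_group)
p_strong = softmax(strong_group)

# Highest probability among weak peers, lowest among strong peers.
print(p_weak[0], p_strong[0])
```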