
S.O. community - I'm looking for a solution to a machine learning problem; if anybody can help, it would be much appreciated:

I would like to apply a machine learning algorithm that assigns a classification to each instance BASED ON THE OTHER INSTANCES WITHIN ITS 'GROUP'. The model should learn which features lead to a classification of '1', assign '1' to the instance with the strongest features in each group and '0' to the others (or, even more ideally, output a softmax probability that adds up to 1 within the group).

Instances may have feature values that are not necessarily strong compared with instances in other groups, but they are stronger indicators WITHIN their own group.

I.e. with data that looks like the following, how do I get the model to learn in general which features lead to a '1' classification, but determine each label only relative to the features of the other instances within its group?

training set

   feat1  feat2 feat3 feat4 group label
0     1     2    yes  cat1    1     0
1     3     4    no   cat4    1     0
2     2     6    yes  cat3    1     0
3     4     8    yes  cat2    1     1
4    14    10    no   cat4    2     0
5    10    12    yes  cat1    2     0
6    12    12    no   cat2    2     0
7    18    11    yes  cat4    2     1
8    16    15    no   cat5    2     0

test set

   feat1  feat2 feat3 feat4 group label (softmax output)
0     1     2    yes  cat2    3     0     0.15
1     6     4    no   cat4    3     0     0.07
2     4     2    yes  cat2    3     0     0.34
3     2     3    yes  cat2    3     1     0.44

I.e. the model will assign a '1' to only one instance within each 'group' and '0' to the rest (or a probability to every instance).

The closest I have got to this is multiple instance learning, but that classifies the group as a whole rather than the instances within the group.

I think a simple explanation of what I am trying to achieve is: use a bunch of features to determine which item within a group of items is the most likely to be flagged, having assessed each of the group's items individually. I guess it is akin to predicting a race where each participant has a bunch of attributes and the outcome (the winner) can only be predicted after assessing each participant and their attributes.

Any help would be much appreciated.

ps. Loving this community - without it I wouldn't have made it far in the world of analytics!

pps. just to clarify - the training would need to be done by assessing each group, not in a traditional instance-by-instance way. I.e. you can have a medium-quality instance among a group of low-quality instances, and that medium-quality instance should yield the '1' (or highest-probability) output. The same medium-quality instance could instead sit in a group of high-quality instances and should then be classified as '0' (or the lowest probability). A traditional instance-by-instance classification model would assign the exact same classification to this medium-quality instance in both cases, but I am looking for its 'status' WITHIN the group it is assigned to!

Bazza
  • You want to train the model with all your data but classify only one instance per group as '1', is that right? – Filipe Lauar Jan 30 '20 at 02:22
  • That is correct Filipe, but during training the model will need to consider the whole group as well; otherwise an instance that is strong within one group (leading to '1') may not be strong in another group (where it would be '0'), and the model would be inaccurate. If the softmax variation is possible, ideally the outputs would all sum to 1 for each group. – Bazza Jan 30 '20 at 03:04

1 Answer


I think a good solution to this is to train your model without the group feature.

As I see in your data, the feature values are conditioned on the group feature, so they are on different scales. To train a model with all your data you will need to split your data by group and then normalize it within each group, to put all the features on the same scale.

After normalizing your data, you train your model without the group feature. In the prediction phase, you apply the same normalization to the test data (split by group and normalize), and then you take the instance with the highest probability in each group.
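
To make that recipe concrete, here is a minimal sketch in Python, assuming pandas and scikit-learn. The per-group min-max scaling, the LogisticRegression model and the normalize_within_group helper are illustrative choices, not part of the answer itself, and the categorical features feat3/feat4 are left out for brevity (they can be one-hot encoded, as discussed in the comments below).

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Training data from the question (numeric features only for brevity).
    train = pd.DataFrame({
        "feat1": [1, 3, 2, 4, 14, 10, 12, 18, 16],
        "feat2": [2, 4, 6, 8, 10, 12, 12, 11, 15],
        "group": [1, 1, 1, 1, 2, 2, 2, 2, 2],
        "label": [0, 0, 0, 1, 0, 0, 0, 1, 0],
    })

    def normalize_within_group(df, cols):
        """Min-max scale the numeric columns within each group."""
        out = df.copy()
        out[cols] = df.groupby("group")[cols].transform(
            lambda s: (s - s.min()) / (s.max() - s.min() + 1e-9)
        )
        return out

    num_cols = ["feat1", "feat2"]
    train_norm = normalize_within_group(train, num_cols)

    # Train an ordinary instance-level classifier on the group-normalized
    # features, without the group column itself.
    clf = LogisticRegression()
    clf.fit(train_norm[num_cols], train_norm["label"])

    # Test data (group 3 from the question), normalized the same way.
    test = pd.DataFrame({
        "feat1": [1, 6, 4, 2],
        "feat2": [2, 4, 2, 3],
        "group": [3, 3, 3, 3],
    })
    test_norm = normalize_within_group(test, num_cols)

    # Raw positive-class probabilities from the classifier ...
    test["score"] = clf.predict_proba(test_norm[num_cols])[:, 1]

    # ... renormalized within each group so they sum to 1 (a simple stand-in
    # for the per-group softmax output), with '1' assigned to the per-group max.
    test["group_prob"] = test["score"] / test.groupby("group")["score"].transform("sum")
    test["label"] = (test["score"] == test.groupby("group")["score"].transform("max")).astype(int)
    print(test)
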

Filipe Lauar
  • Hi Filipe, thanks for your response. What you have alluded to is the model that I think I will need to go ahead with, but this will become difficult with the inability to normalise categorical features. Any thoughts on that front? – Bazza Jan 30 '20 at 04:14
  • The categorical features 3 and 4 don't need to be normalized, only the numerical ones. Are features 1 and 2 categorical or numerical? – Filipe Lauar Jan 30 '20 at 10:50
  • Thanks for your thoughts Filipe, much appreciated! – Bazza Jan 31 '20 at 01:01
  • Hi @Bazza, did you finally manage to get good results? What technique/model did you use? Would you have a toy example to share? – Jones Apr 29 '21 at 11:51
  • Also, Bazza, @Filipe, how do you feed the inputs? I thought you would need to put all the instances of a group on the same row. E.g.: cat1feat1, cat1feat2, cat1feat3, cat2feat1...cat2feat3... (without feat4) – Jones Apr 29 '21 at 13:58
  • Hello @Jones, in this case you need to use a One Hot Encoder, which will transform each category into a new feature. In the case of the example above, feat4 would become 5 features (see the sketch below). – Filipe Lauar Apr 30 '21 at 08:40
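
A minimal sketch of that one-hot encoding step, assuming scikit-learn (the get_feature_names_out helper assumes a recent release):

    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder

    # feat4 values from the training set in the question.
    feat4 = pd.DataFrame({"feat4": ["cat1", "cat4", "cat3", "cat2",
                                    "cat4", "cat1", "cat2", "cat4", "cat5"]})

    encoder = OneHotEncoder(handle_unknown="ignore")
    encoded = encoder.fit_transform(feat4).toarray()

    # One binary column per category: cat1 ... cat5 -> 5 new features.
    print(pd.DataFrame(encoded, columns=encoder.get_feature_names_out()))
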