
I have created a simple neural network (Python, Theano) to estimate a person's age based on their spending history across a selection of different stores. Unfortunately, it is not particularly accurate.

The accuracy might be hurt by the fact that the network has no knowledge of ordinality: as far as the network is concerned, there is no relationship between the age classes. It currently selects the age with the highest probability from the softmax output layer.

I have considered changing the output to a probability-weighted average over the ages.

E.g. given the age probabilities (Age 10: 20%, Age 20: 20%, Age 30: 60%):

Rather than output: Age 30 (highest probability)
Output the weighted average: Age 24 (10*0.2 + 20*0.2 + 30*0.6)
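
For clarity, the weighted average I have in mind is just the expected value under the softmax output. A minimal NumPy sketch using the example numbers above (the variable names are only for illustration):

import numpy as np

ages = np.array([10, 20, 30])        # the age bins in the example
probs = np.array([0.2, 0.2, 0.6])    # softmax output for one person

expected_age = float(np.dot(ages, probs))   # 10*0.2 + 20*0.2 + 30*0.6 = 24.0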

This solution feels suboptimal. Is there a better way to implement ordinal classification in neural networks, or is there another machine learning method that would handle it better (e.g. logistic regression)?

A. Dev
  • You might have a look at [this paper](https://web.missouri.edu/~zwyw6/files/rank.pdf). It describes a way to set up ordinal regression w/ NNets. – o-90 Jul 14 '16 at 14:18
  • That link doesn't work unfortunately, I am not sure if it has something to do with university access privileges. Do you have the name of it instead? – A. Dev Jul 14 '16 at 14:25
  • @gobrewers14 - The link is broken. Can you post the title or another link to find it? – figs_and_nuts Aug 02 '18 at 08:46
  • @MiloMinderbinder [updated link](http://orca.st.usm.edu/~zwang/files/rank.pdf) – o-90 Aug 03 '18 at 13:54
  • I'll add the title and authors in case the link breaks in the future: "A Neural Network Approach to Ordinal Regression", Jianlin Cheng, Zheng Wang, and Gianluca Pollastri. – Salvador Medina Nov 27 '18 at 04:32
  • If you're trying to estimate a number, wouldn't this be a regression problem instead of a classification problem, and you should just have the network learn the number itself? – endolith Nov 24 '22 at 15:20
  • @endolith, there are many cases where the value is ordered yet discrete, so you can try either regression or classification. Still, classification feels better if you can fit a good loss function to represent the ordinal data. I asked here: https://stats.stackexchange.com/questions/596750/how-to-define-a-classification-loss-function-for-discrete-rating. – Mark Nov 25 '22 at 08:30
  • @Thomas OK, but age isn't discrete… – endolith Nov 25 '22 at 15:48
  • If you count it in whole years only, or in coarser subgroups, etc., then it is. – Mark Nov 27 '22 at 11:01

1 Answer


This problem came up in a previous Kaggle competition (this thread references the paper I mentioned in the comments).

The idea is that, say you have 5 age groups ordered 0 < 1 < 2 < 3 < 4; instead of one-hot encoding them and using a softmax objective function, you can encode them into K - 1 binary targets and use a sigmoid objective. So, as an example, your encodings would be

[0] -> [0, 0, 0, 0]
[1] -> [1, 0, 0, 0]
[2] -> [1, 1, 0, 0]  
[3] -> [1, 1, 1, 0]
[4] -> [1, 1, 1, 1]
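
In code, a minimal NumPy sketch of that encoding, together with the usual decode-by-counting step (count how many sigmoid outputs exceed 0.5), might look like the following; the helper names are just for illustration:

import numpy as np

def encode_ordinal(y, n_classes):
    # Class k (0-based) -> length (n_classes - 1) vector whose first k entries are 1.
    thresholds = np.arange(1, n_classes)
    return (np.asarray(y)[:, None] >= thresholds[None, :]).astype(np.float32)

def decode_ordinal(probs, threshold=0.5):
    # Count how many per-threshold outputs are "on" to recover the class index.
    return (np.asarray(probs) > threshold).sum(axis=1)

print(encode_ordinal([0, 1, 2, 3, 4], n_classes=5))   # reproduces the table above
print(decode_ordinal([[0.9, 0.8, 0.3, 0.1]]))         # -> [2]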

Then the net will learn the orderings.
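
Since the question mentions Theano, a rough sketch of what the "K - 1 sigmoids + binary cross-entropy" objective could look like is below (only a single linear layer here; the feature count, learning rate, and variable names are assumptions for illustration, not anything prescribed by the paper):

import numpy as np
import theano
import theano.tensor as T

n_features, n_thresholds = 20, 4      # hypothetical input size; K - 1 = 4 outputs

X = T.matrix("X")                     # (batch, n_features)
y = T.matrix("y")                     # (batch, n_thresholds) cumulative 0/1 targets

W = theano.shared(np.zeros((n_features, n_thresholds), dtype=theano.config.floatX), name="W")
b = theano.shared(np.zeros(n_thresholds, dtype=theano.config.floatX), name="b")

p = T.nnet.sigmoid(T.dot(X, W) + b)              # one sigmoid per threshold, not a softmax
loss = T.nnet.binary_crossentropy(p, y).mean()   # replaces categorical cross-entropy

grads = T.grad(loss, [W, b])
updates = [(param, param - 0.1 * g) for param, g in zip([W, b], grads)]
train = theano.function([X, y], loss, updates=updates)
predict = theano.function([X], p)

At prediction time you would decode the sigmoid outputs back to an age group as in the snippet above (count the outputs above 0.5).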

o-90
  • Would it be a bad idea to have a single output neuron and take each class to be 0.2 intervals? – jinawee Jul 17 '19 at 17:16
  • @jinawee Yes, it doesn't work as well. I've used similar kinds of networks for a lot of different problems and the method you describe is always worse than the ordered encoding described by gobrewers14. I don't know if I would say "bad idea" but definitely "less good idea". Anyone reading this, 100% go with this answer – Alex I Jun 08 '20 at 04:28