2

I am working on a project where we are trying to classify gene expression using a neural network. We are using Keras. We have the sequences of 35000 genes. For each of these genes, we know how much they are expressed in 28 different tissues. Therefore, each gene has 28 different labels. Each label can be one of the following: 'none', 'low', 'med', or 'high'. What I would like to see is that for every gene we feed to the network, it returns 28 labels. Each of the label corresponds to the predicted rate of expression in the corresponding tissue for that particular gene.

I have tried googling for the answer and came across multi-label-classifiers. These classifiers return a binary output vector. For example: an output of [0, 1, 0, 1, 1] means that the input is classified as having label 2, 4, and 5. This would not be suitable solution for my problem as we need to assign one of the 4 labels to each tissue. Would anyone have a clue how I could solve this?

Luuk256
  • 31
  • 2
  • First idea that comes to mind, unfortunately a bit cumbersome: Have 28 separate softmax classifiers (i.e. 28 output layers in parallel), each with four classes. That way, each classifier is responsible for one of the tissues and outputs the predicted label for that tissue. – xdurch0 Sep 06 '19 at 12:47
  • Thank you for you idea. This is however exactly what we do not want to do. We want to keep the whole process in one model. – Luuk256 Sep 06 '19 at 12:48
  • You can keep the whole thing in one model, using the functional API. Define however many hidden layers you want and then the separate classifiers on top of the last hidden layer. That way, most of the model is shared between the tissues and you will have one model with 28 outputs. You could even define a custom layer that groups the output into 28 groups of 4 entries and applies softmax to each group, thus keeping the whole thing "in one layer". – xdurch0 Sep 06 '19 at 12:55
  • Would a regression model also not fit on a problem such as this. By outputting 28 predicted values in the output vector and assigning them a label yourself by discretizing or specifying thresholds. – Nabeel Mehmood Sep 06 '19 at 12:58
  • @NabeelMehmood A regression model is our final goal and we have tried this before. This proved to be too difficult with the amount of data we have. We therefore try to make the problem less complex by creating expression ranges instead of continuous values. Good point though. – Luuk256 Sep 06 '19 at 13:07
  • @xdurch0 Thank you for your recommendation. This sounds like the way to go. I will start looking at the functional API and report whether I was succesful. – Luuk256 Sep 06 '19 at 13:08

0 Answers0