I have created a neural network that classifies data into a large number of classes. I used softmax and categorical cross-entropy in Keras. Training was successful and I achieved 95% accuracy. The problem is the following: when I remove the last layer (the one with the categories) and use the rest of the network as an encoder with cosine similarity, the predictions are not correct; the classes don't match each other. Why, and how can this problem be solved? In theory, the network was supposed to shape the data space so that similar classes end up nearby.
2 Answers
You can't necessarily expect similar classes to be "nearby", or data points of the same class to be "nearby" in a latent space, unless such a property was encouraged by your objective. For such an expectation to be valid, you'd at least need a useful definition of distance. Sure, you can get some measure of distance with Euclidean distance, some sort of negative cosine similarity, etc. But there is no reason to expect any of these metrics to correlate strongly with the semantics (class) of the data, so while they are distance metrics, they aren't likely to represent anything useful in the latent space.
Put another way, the last layer was doing something important. I'm not sure what kind of layer it was (perhaps a fully connected linear layer), but whatever it was doing, it was responsible for mapping the latent space to logits, which are then mapped to categorical probabilities through a softmax function. The composition of the linear layer and the softmax function is a nonlinear transformation. So while the outputs of the softmax layer have useful distances (say, data points of class 0 are all likely going to be mapped near <1, 0, 0, ...>), removing the linear layer and the softmax can very easily destroy this property. I think it would be relatively easy to come up with an example of two points very "close" to each other in the latent space but "far" from each other in the softmax output, and vice versa.
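To make that concrete, here is a small numeric sketch (a made-up 2-D latent space and a hypothetical final weight matrix `W`, nothing from the original model) showing two points that are close in the latent space yet receive near-opposite softmax outputs, and a third point that is far away in the latent space yet receives an identical softmax output:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Hypothetical final linear layer: maps a 2-D latent space to 2 class logits.
# The second latent dimension is simply ignored by the classifier (zero weights).
W = np.array([[50.0, -50.0],
              [ 0.0,   0.0]])

z1 = np.array([ 0.1,  0.0])   # close to z2 in the latent space...
z2 = np.array([-0.1,  0.0])
z3 = np.array([ 0.1, 10.0])   # ...while z3 is far from z1 in the latent space

p1, p2, p3 = (softmax(z @ W) for z in (z1, z2, z3))

print(np.linalg.norm(z1 - z2), p1, p2)  # tiny latent distance, near-opposite class probabilities
print(np.linalg.norm(z1 - z3), p1, p3)  # large latent distance, identical class probabilities
```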
So if you want your latent space to have such a property, you have to encourage it intentionally. Metric learning is one example: through its loss function, it explicitly encourages points of the same class to be near each other and points of different classes to be far from each other in the latent space ("near" and "far" defined by a chosen distance metric). VAEs are another example: they include a loss component that forces the latent space to behave in a known way (minimizing a KL divergence so the latent space follows a known distribution, e.g. a standard Gaussian).
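If you decide to go the metric-learning route, a minimal triplet-loss sketch in Keras could look like the following. The backbone, `INPUT_DIM`, `EMBED_DIM`, and `MARGIN` are all placeholders for your own architecture and data; the point is only that the loss acts directly on distances between embeddings, which is the property you want at inference time:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

INPUT_DIM = 128   # placeholder input size
EMBED_DIM = 32    # placeholder embedding size
MARGIN = 0.5      # triplet margin, a tuning knob

def build_encoder():
    # Stand-in for whatever backbone you already have.
    return keras.Sequential([
        layers.Dense(64, activation="relu", input_shape=(INPUT_DIM,)),
        layers.Dense(EMBED_DIM),
        # L2-normalize so Euclidean distance and cosine similarity agree.
        layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1)),
    ])

encoder = build_encoder()

anchor_in = keras.Input(shape=(INPUT_DIM,))
positive_in = keras.Input(shape=(INPUT_DIM,))
negative_in = keras.Input(shape=(INPUT_DIM,))

# The same encoder (shared weights) maps all three inputs.
emb_a = encoder(anchor_in)
emb_p = encoder(positive_in)
emb_n = encoder(negative_in)

# Stack the three embeddings so a custom loss can see them all at once.
stacked = layers.Concatenate(axis=1)([emb_a, emb_p, emb_n])
triplet_model = keras.Model([anchor_in, positive_in, negative_in], stacked)

def triplet_loss(_, y_pred):
    a, p, n = tf.split(y_pred, 3, axis=1)
    pos_dist = tf.reduce_sum(tf.square(a - p), axis=1)
    neg_dist = tf.reduce_sum(tf.square(a - n), axis=1)
    # Pull same-class pairs together, push different-class pairs at least MARGIN apart.
    return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + MARGIN, 0.0))

triplet_model.compile(optimizer="adam", loss=triplet_loss)
# triplet_model.fit([anchors, positives, negatives], dummy_targets, ...)
# Afterwards, `encoder` alone produces embeddings you can compare with cosine similarity.
```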
If you want to encourage such a property in the latent space of a multiclass classifier, you could do something similar. But you might ask yourself why you want this property to hold in the first place; for instance, do you actually need the classifier at all, given that you're simply discarding the final linear and softmax layers anyway? Is a cross-entropy loss even a reasonable choice for your task? It is very possible that a method like metric learning would already solve your problem.

The problem is the following: when I remove the last layer with categories and use the neural network as an encoder with cosine similarity, the predictions are not correct.
The reason is simple: the second-to-last Dense layer should be thought of as a feature-extraction step for the subsequent layer rather than as a representation of the input data. Its weights are trained specifically for the downstream task (multi-class classification) and therefore capture task-specific, non-linear features. By contrast, when you train an autoencoder, the downstream task is to regenerate the input itself, so that layer ends up being a compression of the original input.
Here is what you can do.
Multi-output model: Create an autoencoder with an auxiliary multi-class classification output. In other words, train the model to minimize the reconstruction loss and the categorical cross-entropy together; a sketch is given below.
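A minimal Keras sketch of such a two-headed model (layer sizes, names, and losses here are placeholders, not the asker's actual architecture) could look like this:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

INPUT_DIM = 128    # placeholder input size
LATENT_DIM = 32    # placeholder embedding size
NUM_CLASSES = 100  # placeholder number of classes

inputs = keras.Input(shape=(INPUT_DIM,))
x = layers.Dense(64, activation="relu")(inputs)
latent = layers.Dense(LATENT_DIM, activation="relu", name="embedding")(x)

# Head 1: reconstruct the input from the latent code.
decoded = layers.Dense(64, activation="relu")(latent)
reconstruction = layers.Dense(INPUT_DIM, name="reconstruction")(decoded)

# Head 2: classify from the same latent code.
class_probs = layers.Dense(NUM_CLASSES, activation="softmax", name="classifier")(latent)

model = keras.Model(inputs, [reconstruction, class_probs])
model.compile(
    optimizer="adam",
    loss={"reconstruction": "mse", "classifier": "categorical_crossentropy"},
    # The relative weighting of the two objectives is a tuning knob.
    loss_weights={"reconstruction": 1.0, "classifier": 1.0},
)
# model.fit(x_train, {"reconstruction": x_train, "classifier": y_train_onehot}, ...)

# At inference time, use the shared trunk as the encoder:
embedding_model = keras.Model(inputs, latent)
```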
Guided training: Another way is to first train a model for the classification task, then remove the final layer and use the rest as an encoder, replacing that final layer with a decoder. Continue training the encoder-decoder to regenerate the input data, initializing the encoder with the weights it learned during the classification task (initialize the weights, but keep them trainable). This lets the encoder start from the knowledge gained during classification and then refine its weights to reconstruct its own inputs. The encoder's output layer will then give you the embeddings you need; a sketch follows below.
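Here is a rough sketch of that workflow in Keras, again with placeholder sizes and layer names:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

INPUT_DIM = 128
LATENT_DIM = 32
NUM_CLASSES = 100

# Step 1: the classifier (shown here only so the layer names line up).
inputs = keras.Input(shape=(INPUT_DIM,))
x = layers.Dense(64, activation="relu")(inputs)
latent = layers.Dense(LATENT_DIM, activation="relu", name="latent")(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(latent)
classifier = keras.Model(inputs, outputs)
classifier.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# classifier.fit(x_train, y_train_onehot, ...)

# Step 2: drop the softmax layer; the trunk becomes the encoder,
# already initialized with the weights learned during classification.
encoder = keras.Model(classifier.input, classifier.get_layer("latent").output)
encoder.trainable = True  # keep the weights trainable, as described above

# Step 3: attach a decoder and fine-tune for reconstruction.
decoder_hidden = layers.Dense(64, activation="relu")
decoder_out = layers.Dense(INPUT_DIM)
reconstruction = decoder_out(decoder_hidden(encoder(inputs)))

autoencoder = keras.Model(inputs, reconstruction)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, ...)
# After fine-tuning, encoder(x) gives the embeddings to compare with cosine similarity.
```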
Plain and simple embedding layer: The most reliable way, if you have a good amount of data, is to simply train an embedding layer as part of the classification task. The weights of this embedding layer will give you significantly better results (in terms of similarity in a Euclidean space) than a Dense layer, which is focused on extracting features for the subsequent layer rather than on representing the input. A sketch, assuming discrete inputs, is given below.
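If your inputs are discrete (token IDs, item IDs, and so on), a sketch of training a Keras `Embedding` layer jointly with the classifier and then reading the learned embedding matrix back out might look like this; all sizes are placeholders:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 10000   # placeholder: number of distinct tokens/items
EMBED_DIM = 64       # placeholder embedding size
SEQ_LEN = 20         # placeholder sequence length
NUM_CLASSES = 100    # placeholder number of classes

inputs = keras.Input(shape=(SEQ_LEN,), dtype="int32")
embedded = layers.Embedding(VOCAB_SIZE, EMBED_DIM, name="embedding")(inputs)
pooled = layers.GlobalAveragePooling1D()(embedded)
hidden = layers.Dense(64, activation="relu")(pooled)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(hidden)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train_ids, y_train_onehot, ...)

# After training, the learned per-token/item vectors live in the embedding weight matrix:
embedding_matrix = model.get_layer("embedding").get_weights()[0]  # shape: (VOCAB_SIZE, EMBED_DIM)
```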
Also remember that in a properly trained deep neural network, the more faithful representation of the input lies in the earlier layers rather than the later ones, which are closer to the output distribution.
