
I'm trying to understand and improve the loss and accuracy of my variational autoencoder. I trained the autoencoder on simple binary data:

data1 = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
   1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')

data2 = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
   1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')


data3 = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
   1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')

There are 100 samples of each pattern, so I have 300 samples in total.
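For reference, the 300-sample training matrix can be assembled like this (a minimal sketch; the variable name train is an assumption, not taken from my actual code):

import numpy as np

# stack 100 copies of each of the three 54-bit patterns -> shape (300, 54)
train = np.vstack([np.tile(data1, (100, 1)),
                   np.tile(data2, (100, 1)),
                   np.tile(data3, (100, 1))]).astype('float32')
print(train.shape)  # (300, 54)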

I tried to predict with the variational autoencoder

sent_encoded = encoder.predict(np.array(test), batch_size = batch_size)
sent_decoded = generator.predict(sent_encoded)

and got correct answers for a few rows

print(np.round_(sent_decoded[1]))
print(np.round_(sent_decoded[100]))
print(np.round_(sent_decoded[299]))

[ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.
  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.
  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.
  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
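Rounding every reconstruction and comparing it element-wise with the input gives a quick check of the reconstruction accuracy (a minimal sketch; test is assumed to be the 300 x 54 input matrix from above):

inputs = np.array(test)
recon = np.round_(sent_decoded)                     # threshold the sigmoid outputs at 0.5
per_bit = np.mean(recon == inputs)                  # fraction of matching bits
per_row = np.mean(np.all(recon == inputs, axis=1))  # fraction of perfectly decoded rows
print(per_bit, per_row)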

What I don't understand are the loss, the accuracy, and the MSE loss during model training.

I got a pretty nice loss chart:

[loss chart]

But why is the accuracy of the model not so great on such a simple dataset? Just look at it: [accuracy chart]

The MSE loss doesn't change and it stays pretty high: [MSE loss chart]

What can I do to get a 100% accurate model? Is a variational autoencoder capable of giving me a 100% accurate model on this type of data? Please show me with code.

MarioZ

1 Answer


A variational autoencoder is not a classifier, so accuracy doesn't really make sense here.

Measuring a VAE's quality by mean-squared reconstruction error alone can also be misleading. To put it shortly, a VAE doesn't optimize only the reconstruction loss: its objective also contains a KL-divergence term that pushes the latent distribution towards the prior, so the total loss won't converge to zero even when the reconstructions are perfect.
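For illustration, the objective of a standard Keras-style VAE looks roughly like this (a minimal sketch, not the asker's custom loss layer; z_mean, z_log_var and original_dim are assumed to come from the encoder's enclosing scope):

from keras import backend as K
from keras.losses import binary_crossentropy

def vae_loss(x, x_decoded):
    # reconstruction term: how closely the output matches the input
    reconstruction = original_dim * binary_crossentropy(x, x_decoded)
    # KL term: pulls the approximate posterior q(z|x) towards the unit-Gaussian prior
    kl = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    # the sum stays above zero even with perfect reconstructions,
    # because the KL term only vanishes when the latent code carries no information
    return K.mean(reconstruction + kl)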

You need to read more about what a variational autoencoder is and, specifically, what it optimizes. If you're only interested in classification, then pretraining a regular autoencoder and training a classifier on top of it may make more sense, as in the sketch below.
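As a rough sketch of that alternative (layer sizes, variable names and the number of classes are assumptions, not taken from the question):

from keras.models import Model
from keras.layers import Input, Dense

inp = Input(shape=(54,))
code = Dense(8, activation='relu')(inp)         # bottleneck features
out = Dense(54, activation='sigmoid')(code)     # reconstruction

autoencoder = Model(inp, out)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.fit(train, train, epochs=50, batch_size=32)

# build a classifier on top of the pretrained bottleneck (the Dense layer's
# weights are shared with the autoencoder, so it starts from the learned features)
clf_out = Dense(3, activation='softmax')(code)  # e.g. 3 classes: data1/data2/data3
classifier = Model(inp, clf_out)
classifier.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])
# classifier.fit(train, labels, epochs=50, batch_size=32)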

Jakub Bartczuk
  • I want to use it for anomaly detection, e.g. if some pattern other than data1, data2 or data3 appears in the test part of the dataset. But you are right, loss is the more appropriate measure; still, why is it so high (0.94) for such a simple dataset? – MarioZ Apr 11 '18 at 09:26
  • Why is the loss chart not converging to zero if I get 100% precise decoding of the test part of the dataset? I use a custom loss layer. – MarioZ Apr 25 '18 at 10:12