
I am experimenting with TensorFlow 2.0 (alpha). I want to implement a simple feed-forward network with two output nodes for binary classification (it's a 2.0 version of this model).

Here is a simplified version of the script: I define a simple Sequential() model and then set up the loss, optimizer, and training loop:

# imports: TensorFlow, layers + dropout & activation
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.activations import elu, softmax

# Neural Network Architecture
n_input = X_train.shape[1]
n_hidden1 = 15
n_hidden2 = 10
n_output = y_train.shape[1]


model = tf.keras.models.Sequential([
    Dense(n_input, input_shape = (n_input,), activation = elu),   # first hidden layer (input_shape declares the input features)
    Dropout(0.2), 
    Dense(n_hidden1, activation = elu), # hidden layer 1
    Dropout(0.2),     
    Dense(n_hidden2, activation = elu), # hidden layer 2
    Dropout(0.2), 
    Dense(n_output, activation = softmax)  # Output layer
])


# define loss and accuracy
bce_loss = tf.keras.losses.BinaryCrossentropy()
accuracy = tf.keras.metrics.BinaryAccuracy()

# define optimizer
optimizer = tf.optimizers.Adam(learning_rate = 0.001)

# save training progress in lists
loss_history = []
accuracy_history = []


# loop over 1000 epochs
for epoch in range(1000):

    with tf.GradientTape() as tape:

        # take binary cross-entropy (bce_loss)
        current_loss = bce_loss(model(X_train), y_train)

    # Update weights based on the gradient of the loss function
    gradients = tape.gradient(current_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    # save in history vectors
    current_loss = current_loss.numpy()
    loss_history.append(current_loss)

    accuracy.update_state(model(X_train), y_train)
    current_accuracy = accuracy.result().numpy()
    accuracy_history.append(current_accuracy)

    # print loss and accuracy scores each 100 epochs
    if (epoch+1) % 100 == 0:
        print(str(epoch+1) + '.\tTrain Loss: ' + str(current_loss) + ',\tAccuracy: ' + str(current_accuracy))

    accuracy.reset_states()

print('\nTraining complete.')

Training runs without errors; however, strange things happen:

  • Sometimes, the network doesn't learn anything: the loss and accuracy scores stay constant across all epochs.
  • Other times, the network learns, but very badly. Accuracy never goes beyond 0.4 (while in TensorFlow 1.x I got an effortless 0.95+). Such low performance suggests to me that something went wrong in the training.
  • Other times, the accuracy improves very slowly, while the loss remains constant the whole time.

What can cause these problems? Please help me understand my mistakes.


UPDATE: After some corrections, I can make the network learn. However, its performance is extremely poor: after 1000 epochs it reaches about 40% accuracy, which clearly means something is still wrong. Any help is appreciated.

Leevo

1 Answer


tf.GradientTape records every operation that happens inside its scope.

You don't want to record the gradient computation itself on the tape; you only want the forward pass that computes the loss.

with tf.GradientTape() as tape:
    # take binary cross-entropy (bce_loss)
    current_loss = bce_loss(model(df), classification)
# End of tape scope

# Update weights based on the gradient of the loss function
gradients = tape.gradient(current_loss, model.trainable_variables)
# The tape is now consumed
optimizer.apply_gradients(zip(gradients, model.trainable_variables))

More importantly, I don't see a loop over the training set, so I suppose the complete code looks like:

for epoch in range(n_epochs):
    for df, classification in dataset:
        # your code that computes loss and trains
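
For reference, a `dataset` object like the one assumed above can be built from the in-memory training arrays with the tf.data API (a sketch; the batch size of 32 is an arbitrary choice):

import tensorflow as tf

# build a shuffled, mini-batched dataset from the (features, labels) arrays
dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train)) \
                         .shuffle(buffer_size=len(X_train)) \
                         .batch(32)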

Moreover, the usage of the metrics is wrong.

You want to accumulate (that is, update the internal state of) the accuracy metric at every training step, and measure the overall accuracy at the end of every epoch.

Thus you have to:

# Measure the accuracy inside the training loop
accuracy.update_state(model(df), classification)

And call accuracy.result() only at the end of the epoch, when all the per-batch accuracy values have been accumulated in the metric. Remember to call the .reset_states() method to clear the metric's state, resetting it to zero at the end of every epoch.
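
Putting the three calls together, the per-epoch metric lifecycle looks like this (a minimal sketch, assuming the batched `dataset` from above; note that tf.keras metrics expect their arguments in the order `(y_true, y_pred)`, a detail that comes up again in the comments below):

for epoch in range(n_epochs):
    for df, classification in dataset:
        # ... compute the loss and apply gradients here ...
        accuracy.update_state(classification, model(df))  # accumulate per batch
    # overall accuracy over the whole epoch
    accuracy_history.append(accuracy.result().numpy())
    # clear the internal state before the next epoch
    accuracy.reset_states()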

nessuno
  • 1) Apologies, I still don't get the line `for df, classification in dataset`: what is the object `dataset`? (In my script, `df` is my dataset, and `classification` is the one-hot encoded dependent variable). 2) Should I write `current_accuracy = accuracy.update_state(model(df), classification)` and then append that to `accuracy_history`? Btw, thank you for all the info. – Leevo Mar 21 '19 at 13:58
  • 1) So you're not looping over batches but using the whole training set at once. Pure gradient descent rather than mini-batch gradient descent - that's ok then! You can remove the loop part I added since you don't need to loop over a dataset. 2) `accuracy.update_state(model(df), classification)` then `accuracy_history.append(accuracy.result())` and end with `accuracy_history.clear_states()` – nessuno Mar 21 '19 at 14:23
  • Did you mean: `accuracy.reset_states()` at the end? Shall I do the same for `loss`? – Leevo Mar 21 '19 at 15:10
  • Hi, I followed all your suggestions, but the problem is still there. The loss and accuracy output is always constant, except sometimes when it learns but so badly that it can't be right. What else can I try? – Leevo Mar 25 '19 at 16:04
  • Hi, please update the post with the updated version of the code - it can help – nessuno Mar 25 '19 at 16:07
  • I have updated the code. Now it includes the corrections you suggested, and also the MLP specification – Leevo Mar 25 '19 at 17:09
  • It looks OK - the only missing part is the "training" parameter of the sequential `__call__` method. When you train the model, you should use it like `model(X_train, True)`, while when you measure the accuracy you should set it to inference mode, thus `model(X_train, False)` - probably the low accuracy is due to dropout being applied when measuring it (see the consolidated sketch after this thread) – nessuno Mar 25 '19 at 17:49
  • Got it. Something is wrong in the accuracy computation: I just visualized a confusion matrix on the test set, and got about 98% accuracy. Can you tell what's the problem with accuracy? – Leevo Mar 25 '19 at 21:13
  • I guess I found the reason! `accuracy.update_state(model(X_train), y_train)` should be `accuracy.update_state(y_train, model(X_train))`, since https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Accuracy is accuracy(y_true, y_pred) – nessuno Mar 26 '19 at 07:22
  • Yes, it works now. Why would an op such as Binary Accuracy be affected by switching from (a, b) to (b, a)? However, thanks. I successfully ran my first MLP in TF 2.0 thanks to your help. – Leevo Mar 26 '19 at 08:11
  • Because considering predictions and labels to be the same is wrong! (a,b) != (b,a). However I'm happy to help - please mark this answer as accepted now :D – nessuno Mar 26 '19 at 10:54
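
For completeness, here is a minimal corrected version of the question's full-batch loop with the three fixes from this thread applied: the `(y_true, y_pred)` argument order, the `training` flag so dropout is only active during the gradient step, and the per-epoch metric reset. Treat it as a sketch rather than a definitive implementation:

for epoch in range(1000):

    with tf.GradientTape() as tape:
        # forward pass in training mode so dropout is applied
        predictions = model(X_train, training=True)
        # Keras losses expect (y_true, y_pred)
        current_loss = bce_loss(y_train, predictions)

    # update weights based on the gradient of the loss function
    gradients = tape.gradient(current_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    loss_history.append(current_loss.numpy())

    # evaluate in inference mode so dropout is disabled
    accuracy.update_state(y_train, model(X_train, training=False))
    accuracy_history.append(accuracy.result().numpy())
    accuracy.reset_states()

    if (epoch + 1) % 100 == 0:
        print(str(epoch + 1) + '.\tTrain Loss: ' + str(loss_history[-1]) + ',\tAccuracy: ' + str(accuracy_history[-1]))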