Different optimization with different TF versions

Question

I'm trying to train a convolutional neural network with Keras and TensorFlow 2.6; I previously trained it with TensorFlow 1.11. I think I did the migration correctly (both networks converged), but the results are very different, and worse in TF 2.6. I used the Adam optimizer in both cases with the same hyperparameters (learning_rate = 0.001), but the loss is optimized better in TF 1.11 than in TF 2.6.

I'm trying to find out where the differences could come from. What must be taken into account when working with different TF versions? Can there be important numerical differences? I know that in TF 1.x the default mode is graph and in TF 2 the default is eager; I don't know whether this could lead to different behavior during training.

It surprises me how much the loss is reduced in the first epochs, reaching a lower value by the end of training.

score 0 · Answer 1 · edited Mar 17 '23 at 13:36


Your understanding is correct: the two versions run in different modes, eager and graph. The loss function, however, is defined by how far the current value is from the optimization target, as calculated by your own method or the configured one.

You cannot directly compare one model's training history to another's. Running it several times, you may find that TF 1 is faster and reaches a smaller loss; to understand why, review the changelog.
The loss functions have been updated. Graph mode is the powerful technique we know, but TF 2.x lets you access values as they are computed, which is why you have convenient delegated methods such as callbacks, dynamic functions, and updating values at runtime. (It is a useful exercise for students or users to compare both versions on the same task.)
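
The point about running the training several times can be sketched without TensorFlow. The toy below is only an illustration, not the asker's model: `train_once`, the quadratic loss, and the noise level are hypothetical stand-ins for a real training run with random initialization and minibatch noise. It shows that a single final loss value varies from seed to seed, so two histories are better compared by their mean and spread over several runs.

```python
import random
import statistics


def train_once(seed, steps=200, lr=0.05, noise=0.5):
    """Noisy gradient descent on f(w) = w**2; returns the final loss.

    Hypothetical stand-in for one full training run.
    """
    rng = random.Random(seed)
    w = rng.uniform(-2.0, 2.0)  # random initialization, like random weights
    for _ in range(steps):
        # True gradient 2*w plus Gaussian noise, mimicking minibatch sampling.
        grad = 2.0 * w + rng.gauss(0.0, noise)
        w -= lr * grad
    return w * w


def summarize(seeds):
    """Mean and sample standard deviation of the final loss over seeds."""
    losses = [train_once(s) for s in seeds]
    return statistics.mean(losses), statistics.stdev(losses)


mean_loss, std_loss = summarize(range(10))
print(f"final loss over 10 seeds: {mean_loss:.4f} +/- {std_loss:.4f}")
```

If the gap between the TF 1.11 and TF 2.6 results is much larger than the seed-to-seed spread within each version, the difference is probably real rather than run-to-run noise.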

Symmetric methods do not produce different results.
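
Conversely, a small asymmetry between two setups can change the trajectory. The sketch below implements the textbook Adam update in plain Python (not either TensorFlow implementation) and runs it twice with the same learning rate but different `epsilon`. As far as I know, `tf.train.AdamOptimizer` defaults to `epsilon=1e-8` while `tf.keras.optimizers.Adam` defaults to `1e-7`, but check the documentation for your exact versions. Whether this applies to the question at all is an assumption, and the tiny-gradient loss is chosen deliberately so that `epsilon` matters.

```python
import math


def adam_minimize(grad_fn, w0, steps, lr=0.001,
                  beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Textbook Adam on a scalar parameter; returns the list of iterates."""
    w, m, v = w0, 0.0, 0.0
    history = []
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + epsilon)
        history.append(w)
    return history


def tiny_grad(w):
    """Gradient of the illustrative loss 0.5e-6 * w**2 (about 1e-6 near w = 1)."""
    return 1e-6 * w


hist_small_eps = adam_minimize(tiny_grad, 1.0, 100, epsilon=1e-8)
hist_large_eps = adam_minimize(tiny_grad, 1.0, 100, epsilon=1e-7)
print("w after 100 steps, epsilon=1e-8:", hist_small_eps[-1])
print("w after 100 steps, epsilon=1e-7:", hist_large_eps[-1])
```

With gradients this small, the larger `epsilon` shrinks every step, so the two runs drift apart even though `learning_rate` is identical. When comparing versions it is worth passing every optimizer argument explicitly instead of relying on defaults.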


edited Mar 17 '23 at 13:36 · General Grievance

answered Nov 11 '22 at 03:06 · Jirayu Kaewprateep

Thanks for your answer, but I'm not sure I understand it correctly. I understand you are saying that we cannot directly compare the training histories of two models... I've also tried some callbacks and dynamic learning, without success. What do you mean by "working update value runtime"? In my case the loss function is custom, and with TF1.11 I ran on CPU while with TF2.6 I used a GPU – Belén Costanza Nov 29 '22 at 18:12
You understand it correctly: the loss values of two training runs cannot be compared, because the loss function is applied to the current process and estimator. Different versions of TensorFlow also use different loss functions and optimizers, as noted in the changelog. (The loss function formula changed, and results produced by different graph execution processes cannot be compared directly.) – Jirayu Kaewprateep Nov 30 '22 at 23:19
