From what you have shared, I believe you should pick the first one, since it has higher accuracy on the validation data. The reason is that you are optimizing your model with a loss function that aims to decrease the training loss. The model can therefore overfit: it yields good scores on the training data while, in reality, it fails to generalize to data outside the training set.

From what you have shared, you have two metrics at hand: the training loss and the validation loss. When working on models, you should also have a test dataset that is separate from these two. You train on the training data, and tune your model (pick hyperparameters) by trying to improve its performance on the validation data. After you have picked your model, you evaluate it on the test dataset, which the model has not seen up to this point. That shows how well your model really generalizes to unseen data.

The reason you do this is that during training you optimized the model for performance on the training set, and when picking the best hyperparameters you were biased toward doing well on the validation set. You have never used the test dataset for any of these decisions, so its score reflects, in a sense, the real-world situations your model would face if it were deployed. I've added a short sketch of this workflow below.

Hope this helps.
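Here is a minimal sketch of the train/validation/test workflow, assuming scikit-learn, a logistic-regression classifier, and a synthetic dataset; since you haven't shared your actual setup, treat the names and numbers as placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic data standing in for your dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# First split off a held-out test set, then carve a validation set
# out of what remains (roughly 60% train / 20% validation / 20% test).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

# Tune a hyperparameter (here, the regularization strength C) using
# the validation set: train on the training split, compare on validation.
best_C, best_val_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    val_acc = accuracy_score(y_val, model.predict(X_val))
    if val_acc > best_val_acc:
        best_C, best_val_acc = C, val_acc

# Only after all choices are fixed do we touch the test set, once.
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_trainval, y_trainval)
test_acc = accuracy_score(y_test, final_model.predict(X_test))
print(f"best C={best_C}, val acc={best_val_acc:.3f}, test acc={test_acc:.3f}")
```

The key point is that the test split is used exactly once, after every model and hyperparameter decision has been made, so its score is an unbiased estimate of how the model behaves on genuinely unseen data.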