I'm very new to training on bigger datasets and would appreciate some advice.
I have a very large dataset that I am training on in Google Colab. Colab limits a session to roughly 12 hours, after which it stops the runtime.
I have built in checkpoints that save the weights every so often.
If Colab bombs out, can I resume training using the weights from the last checkpoint? Can I keep doing this across sessions? At what point should I stop? When would it be too much?
(I'm training a model that generates images.)
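
To make the question concrete, here is a minimal sketch of the resume pattern I have in mind. It is not my real training code: it uses only the Python standard library and a toy one-parameter objective in place of the image model, and the names `save_checkpoint`, `load_checkpoint`, `train`, and `stop_after` are hypothetical. In a real framework the saved state would presumably hold the model weights and the optimizer state as well as the step counter, and on Colab the checkpoint path would need to live on storage that survives the runtime (for example a mounted Drive folder, which is an assumption here).

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash mid-write cannot corrupt the last good checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # atomic swap on the same filesystem


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train(total_steps, ckpt_path, ckpt_every=10, stop_after=None, lr=0.1):
    """Run (or resume) training until total_steps have been completed.

    stop_after is a hypothetical knob that simulates the runtime being
    killed after that many steps in the current session.
    """
    # Resume from the last checkpoint if there is one, else start fresh.
    state = load_checkpoint(ckpt_path) or {"step": 0, "w": 0.0}
    steps_this_session = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_session >= stop_after:
            break  # simulated disconnect: progress since the last save is lost
        # Toy objective (w - 3)^2 stands in for the real loss.
        grad = 2.0 * (state["w"] - 3.0)
        state["w"] -= lr * grad
        state["step"] += 1
        steps_this_session += 1
        if state["step"] % ckpt_every == 0:
            save_checkpoint(ckpt_path, state)
    return state
```

Calling `train(50, path, stop_after=25)` and then `train(50, path)` should pick up from the step-20 checkpoint and finish at step 50. The step counter stored in the checkpoint is what defines "done", so the stopping point is a fixed training budget rather than the number of sessions.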