I am trying to train an RL agent whose training is quite a heavy process, as it needs to perform certain actions with the Selenium webdriver and I do not have a GPU to speed things up. Because of this, I have tried several alternatives to train my agent:
- Train the agent on my computer (CPU): The problem with this alternative is that, if I run the agent for a while, I end up getting the WebDriver exception "not connected to DevTools", an error I have tried to avoid by resetting the webdriver whenever it occurs, but without success (more on that in my previous question).
- Train the agent using Google Colab: The problem here is that the training run lasts more than 2 hours, and after that time Google Colab seems to disconnect. It is also worth noting that, every once in a while, if you are not active in your Colab notebook, a captcha appears to check that you are not a robot and are still there; since I need to leave my script running for a long time, constantly interacting with the notebook is not really an option.
- Train the agent with Amazon SageMaker: For those who may not know (as was my case a few weeks ago), Amazon SageMaker is a cloud machine-learning platform that provides free sessions of 4 hours of GPU usage (with a maximum of 8 hours in total per day). I wanted to try running my script there, as 4 hours of GPU might be enough to train my model, but for some reason this method does not let me communicate with my database (more on that in my previous question).
Because of all this, I'd like to know whether there is a way to fix any of the problems above or, alternatively, a way to run this learning process partially. By that I mean storing the model trained up to the moment of disconnection somewhere, so that training can continue from that point instead of repeating the whole process from the beginning every time. For that purpose, I already have this code, which implements a "checkpoint" functionality to store the results of each learning step:
# Module-level imports needed by these methods
import glob
import os

def train_agent(self, n_steps=100, total_timesteps=5000, callback=None, log_interval=1,
                tb_log_name='AgentV1', reset_num_timesteps=True, progress_bar=False, checkpoint=True):
    # Load the last training info to continue from the same point in case the execution ended abruptly
    best_mean_reward, best_model_path = -float('inf'), None
    last_step = 0
    if checkpoint:
        model_paths = glob.glob(f"{self.model_name}_*.zip")
        # Check if there is a saved model (the step number is encoded in the file name)
        if model_paths:
            best_model_path = max(model_paths, key=lambda path: int(path.split('_')[-1].split('.')[0]))
            print("Loading saved model...")
            self.load(best_model_path)
            last_step = int(best_model_path.split('_')[-1].split('.')[0])
            print(f"Continuing training from step #{last_step}")
    # Resume from the step after the last saved one
    for i in range(last_step + 1, int(total_timesteps / n_steps) + 1):
        print(f"Training step #{i}", flush=True)
        self.learn(total_timesteps=total_timesteps, callback=callback, log_interval=log_interval,
                   tb_log_name=tb_log_name, reset_num_timesteps=reset_num_timesteps, progress_bar=progress_bar)
        # Evaluate the agent and keep this version only if it improves on the best mean reward so far
        mean_reward = self.evaluate()
        if mean_reward > best_mean_reward:
            best_mean_reward = mean_reward
            best_model_path = f"{self.model_name}_{i}.zip"
            self.save(best_model_path)
            print(f"New best model saved to {best_model_path}", flush=True)

def learn(self, total_timesteps, callback=None, log_interval=1, tb_log_name='AgentV1',
          reset_num_timesteps=True, progress_bar=False):
    # Delegate to stable_baselines3's learn() and persist the model afterwards
    self.model.learn(total_timesteps=total_timesteps, callback=callback, log_interval=log_interval,
                     tb_log_name=tb_log_name, reset_num_timesteps=reset_num_timesteps, progress_bar=progress_bar)
    os.makedirs(self.model_folder, exist_ok=True)
    self.model.save(self.model_path)
As can be seen, what I do in this code is call the agent's "learn" function as many times as total_timesteps/n_steps and save the agent after each learning step. It is worth noting that my Agent class inherits from stable_baselines3's BaseAlgorithm class. The problem is that even a single training step already takes too long given the issues stated above, so I never manage to obtain even a partial model (I am currently running my agent for 50 training steps, with 5000 iterations per step, but I'd like to increase this number if possible). I'd like to know whether there is a way not only to store each model version after a training iteration, as I currently do, but also to override the learn method so that it stores partial results of the training in batches (for example, saving the model every 100 steps) and can continue training from there in case the execution has to terminate abruptly.
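To make it clearer what I mean, this is roughly the kind of thing I have in mind (just a sketch, not something I have working): a custom stable_baselines3 callback that saves the model every save_freq steps, so that only the progress since the last save is lost if the process dies. The PartialSaveCallback name, the save_freq value and the "checkpoints" folder are placeholders of mine; I believe stable_baselines3 also ships a built-in CheckpointCallback along these lines, but I am not sure it covers the "resume from where it stopped" part.

# Sketch only: save the model periodically during a single learn() call
import os
from stable_baselines3.common.callbacks import BaseCallback

class PartialSaveCallback(BaseCallback):
    """Save the model every `save_freq` calls so a crash only loses the
    progress made since the last save (names and folder are placeholders)."""

    def __init__(self, save_freq, save_path, name_prefix='AgentV1', verbose=0):
        super().__init__(verbose)
        self.save_freq = save_freq
        self.save_path = save_path
        self.name_prefix = name_prefix

    def _on_step(self) -> bool:
        # n_calls and num_timesteps are maintained by stable_baselines3
        if self.n_calls % self.save_freq == 0:
            os.makedirs(self.save_path, exist_ok=True)
            path = os.path.join(self.save_path, f"{self.name_prefix}_{self.num_timesteps}.zip")
            self.model.save(path)
            if self.verbose > 0:
                print(f"Saved partial model to {path}")
        return True  # returning False would stop training

The idea would then be to pass it when calling learn, e.g. self.learn(total_timesteps=5000, callback=PartialSaveCallback(save_freq=100, save_path='checkpoints')), but I do not know whether this is the right way to hook into stable_baselines3 or how to make training actually pick up from one of these partial saves after a disconnection.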