For reinforcement learning experiments, I often run several independent repetitions of each hyperparameter setting. Ideally, I would visualize the average over these repetitions (per setting), with confidence intervals around the mean learning curve. I suppose many RL researchers have this issue.
I run my hyperparameter experiments with Ray Tune, which automatically visualizes each independent run in TensorBoard (which is very useful). It would be really helpful if I could automatically aggregate the results over the repetitions (with confidence intervals), and then compare the different hyperparameter settings (and plot them for papers). I could not find any method in Tune/TensorBoard to do this, nor an integration with another framework that can.
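To make the aggregation concrete, here is a minimal sketch of the computation I have in mind, using NumPy only. It assumes the repetitions of one setting have already been loaded into a 2D array with one row per repetition, all logged at the same steps. The function name `aggregate_runs`, the normal-approximation 95% interval (z = 1.96), and the synthetic data are my own illustrative choices, not anything provided by Tune or TensorBoard.

```python
import numpy as np


def aggregate_runs(curves, z=1.96):
    """Return the mean curve and a normal-approximation confidence band.

    curves: array-like of shape (n_repetitions, n_steps), one row per
    repetition of the same hyperparameter setting (needs at least 2 rows).
    z: critical value; 1.96 corresponds to a ~95% interval (an assumption).
    """
    curves = np.asarray(curves, dtype=float)
    n_repetitions = curves.shape[0]
    mean = curves.mean(axis=0)
    # Standard error of the mean, from the sample standard deviation (ddof=1).
    sem = curves.std(axis=0, ddof=1) / np.sqrt(n_repetitions)
    return mean, mean - z * sem, mean + z * sem


# Hypothetical data: 5 repetitions of one setting, 100 logged steps each.
rng = np.random.default_rng(0)
returns = np.linspace(0.0, 1.0, 100) + rng.normal(scale=0.1, size=(5, 100))
mean, lower, upper = aggregate_runs(returns)
```

The resulting `mean`, `lower`, and `upper` arrays could then be drawn with, for example, matplotlib's `plot` and `fill_between`, once per hyperparameter setting. With few repetitions, a Student-t critical value would be more appropriate than 1.96.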
As an example, I would ideally get a curve like the one below, but directly in TensorBoard:
I suppose more people have this issue, and I am curious whether anyone knows of a package or quick solution to go from Ray Tune output to the above figure (without coding it manually). Thanks a lot!
Best regards, Thomas