I would like to run multiple experiments, then report model accuracy per experiment.
I'm training a toy MNIST example with PyTorch (v1.1.0); once I can compare performance on the toy problem, the goal is to integrate this into the actual code base.
As I understand the TRAINS Python package, with the "two lines of code" all my hyper-parameters are already logged (command-line argparse, in my case).
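For reference, my setup looks roughly like the sketch below (project/task names and the argparse flags are just placeholders for the toy example):

    from argparse import ArgumentParser
    from trains import Task

    # The "two lines": creating the Task hooks into argparse so the
    # hyper-parameters below should get logged automatically.
    task = Task.init(project_name="MNIST Toy", task_name="toy experiment")

    parser = ArgumentParser()
    parser.add_argument('--lr', type=float, default=0.01)
    parser.add_argument('--batch-size', type=int, default=64)
    parser.add_argument('--epochs', type=int, default=10)
    args = parser.parse_args()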
What do I need to do in order to report a final scalar, and then be able to sort through all the different training experiments (with their hyper-parameters) to find the best one?
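I assume reporting the final value would look something like the snippet below (using the task's logger at the end of training, with `val_accuracy` standing in for my computed metric), but I'm not sure whether this is what makes the value show up as something I can sort/compare across experiments:

    # At the end of training, after computing validation accuracy:
    logger = task.get_logger()
    # Report the final validation accuracy as a scalar;
    # "iteration" here is just the last epoch index.
    logger.report_scalar(
        title="accuracy",
        series="validation",
        value=val_accuracy,
        iteration=args.epochs,
    )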
What I'd like to get is a graph (or graphs) where the X-axis shows hyper-parameter values and the Y-axis shows validation accuracy.