I am using hydra to organize my configurations and output folders for a project. This project involves first specifying a particular objective that can differ in a few parameters. Then that objective can be optimized by different kinds of optimization procedures, where each optimization has its own specific hyperparameters.
So far, it has been straightforward to use hydra's multisweep functionality to explore different objectives for a single kind of optimization. I have been using string interpolation in my main config.yaml file. Suppose objective and experiment are config groups, and optimization is a config group under experiment.
filepaths:
  # filenames
  objective_data: objective_data.csv # contains intermediate output useful for all experiments
  experiment_data: experiment_output.csv # contains final output specific only to a single experiment configuration
  # important subdirectories to keep distinct
  objective_subdir: param1=${objective.feature1.param1}/param2=${objective.feature1.param2}
  experiment_subdir: optimization=${experiment.optimization.name}/num_trials=${experiment.num_trials}/learning_rate=${experiment.optimization.learning_rate}/seed=${seed}
  leaf_subdir: ${filepaths.objective_subdir}/${filepaths.experiment_subdir}
hydra:
  run:
    dir: outputs/${filepaths.leaf_subdir}
  job:
    chdir: True
    config:
      override_dirname:
        exclude_keys:
          - filepaths.leaf_subdir
          - objective.feature1.param1
          - objective.feature1.param2
          - experiment.optimization.name
          - experiment.num_trials
          - experiment.optimization.learning_rate
          - seed
  sweep:
    dir: multirun
    subdir: ${filepaths.leaf_subdir}
And my conf directory structure is the following:
conf
├── config.yaml
├── objective
│   ├── basic.yaml
│   └── feature1
│       └── params.yaml
└── experiment
    ├── basic.yaml
    └── optimization
        ├── reinforcement_learning.yaml
        └── replicator_dynamic.yaml
When I use multisweep, I can get the following directory structure:
multirun
├── multirun.yaml
└── param1=10
    └── param2=10
        ├── objective_data.csv
        ├── optimization=reinforcement_learning
        │   └── num_trials=100
        │       └── learning_rate=1e-4
        │           └── rd_hparam=5
        │               └── seed=42
        │                   └── output.csv
        │
        └── optimization=replicator_dynamic
            └── num_trials=1
                └── learning_rate=1e-4
                    └── rd_hparam=5
                        └── seed=42
                            └── output.csv
But really, I don't want to have to branch on rd_hparam in the reinforcement_learning optimization, or on a learning_rate subfolder in the replicator_dynamic optimization, because those values have nothing to do with those configurations: learning_rate is a hyperparameter specific to reinforcement_learning, and rd_hparam is a parameter specific to replicator_dynamic.
However, it would be useful to keep other keys, like seed, excluded from the override_dirname (as described here), so that they can be the later parents of my job-specific output.csv files.
But I really want to use multisweep to run the different optimization jobs all at once. Ideally, I want something like the following directory structure:
multirun
├── multirun.yaml
└── param1=10
    └── param2=10
        ├── objective_data.csv
        ├── optimization=reinforcement_learning
        │   └── num_trials=100
        │       └── learning_rate=1e-4
        │           └── seed=42
        │               └── output.csv
        │
        └── optimization=replicator_dynamic
            └── num_trials=1
                └── rd_hparam=5
                    └── seed=42
                        └── output.csv
I've mainly been looking into ways of combining override_dirname, as explained in the hydra tutorial here, together with string interpolation. But this only gets me so far: what I really need is to be able to specify a condition for what the job-specific subdirectory is, based on the values I'm sweeping in multisweep.
The main reason I'm worried that what I want to do is not possible is that it appears that I want to change fields of hydra.run and hydra.sweep, which are populated at runtime, according to the docs. I know that I can access them, but can I change their fields? If I could edit these fields somehow, then I would probably just implement all of my folder logic in code, instead of trying to do sophisticated string interpolation.
Less relevant, things that I've looked into:
I initially had some suspicion that what I should use is some kind of custom resolver from OmegaConf, because then perhaps I could pass the config object or fields of it to a custom-built resolver function that has precisely the logic I want. But whether and how to do this is not clear to me.
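For concreteness, the resolver idea might look roughly like the sketch below. It assumes OmegaConf 2.1 or later, where OmegaConf.register_new_resolver is available. The resolver name exp_subdir, the function experiment_subdir, and the SPECIFIC_KEYS table are all hypothetical names chosen for illustration; the hyperparameter names come from the directory trees above. It is untested against any particular Hydra version.

```python
# Hypothetical table: which hyperparameters should appear in the path
# for each kind of optimization (names taken from the trees above).
SPECIFIC_KEYS = {
    "reinforcement_learning": ("learning_rate",),
    "replicator_dynamic": ("rd_hparam",),
}


def experiment_subdir(name, num_trials, seed, optimization):
    """Build the job-specific subdirectory from only the relevant keys.

    `optimization` is any mapping-like object (a plain dict or a DictConfig)
    holding the hyperparameters of the selected optimization.
    """
    parts = [f"optimization={name}", f"num_trials={num_trials}"]
    # Only the keys registered for this optimization become path components.
    for key in SPECIFIC_KEYS.get(name, ()):
        parts.append(f"{key}={optimization[key]}")
    parts.append(f"seed={seed}")
    return "/".join(parts)


try:
    # Registration has to run before Hydra resolves the output directory,
    # e.g. at module import time, above the @hydra.main-decorated function.
    from omegaconf import OmegaConf

    OmegaConf.register_new_resolver("exp_subdir", experiment_subdir, replace=True)
except ImportError:
    # OmegaConf is not installed; the plain function above still works.
    pass
```

Under these assumptions, experiment_subdir in config.yaml could become something like `${exp_subdir:${experiment.optimization.name},${experiment.num_trials},${seed},${experiment.optimization}}`, which should hand the selected optimization node to the function so that keys missing from one optimization are never interpolated. Whether the formatted values match the folder names shown above (for example 1e-4 versus 0.0001) depends on how the YAML values are parsed.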
Finally, I've been avoiding trying to edit paths from hydra.utils.get_original_cwd by hand, because hydra seems to have a lot of support for automatic subdirectory creation.