2

I want to version-control experiment configuration files with Hydra and DVC without committing the original config files to git.
Hydra manages the config and DVC manages versions, but Hydra does not record which code version is needed to reproduce an experiment, and I don't want to add git-hash logging code to every experiment.

Is there a way to log the git hash to the Hydra log by default? Thanks in advance.

이준혁

2 Answers

2

Good timing! A DVC-Hydra integration is in development. You can see the proposal in https://github.com/iterative/dvc/discussions/7044#discussioncomment-3271855 and the development progress in https://github.com/iterative/dvc/pull/8093. This should allow you to take a Hydra config, pass your Hydra overrides via `dvc exp run --set-param=<hydra_overrides>`, and capture the output with DVC.

1

Hydra's callbacks API covers this use-case.

Using a library such as GitPython, you can get the current git SHA in a Hydra Callback. The example below uses Hydra's standard logging mechanism:

# hydra_git_callback.py
import logging
from typing import Any
import git  # installed with `pip install gitpython`
from hydra.experimental.callback import Callback
from omegaconf import DictConfig

log = logging.getLogger(__name__)

def get_git_sha():
    repo = git.Repo(search_parent_directories=True)
    sha = repo.head.object.hexsha
    return sha

class MyCallback(Callback):
    # Called in RUN mode, before the job starts.
    def on_run_start(self, config: DictConfig, **kwargs: Any) -> None:
        sha = get_git_sha()
        log.info(f"Git sha: {sha}")

    # Called in MULTIRUN mode, before any job starts.
    def on_multirun_start(self, config: DictConfig, **kwargs: Any) -> None:
        sha = get_git_sha()
        log.info(f"Git sha: {sha}")

You can then target this callback in your Hydra config:

# config.yaml
hydra:
  callbacks:
    git_logging:
      _target_: hydra_git_callback.MyCallback

foo: bar

# my_app.py
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base="1.2", config_path=".", config_name="config")
def my_app(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    my_app()

Running the app:

$ python my_app.py
[2022-10-19 18:52:04,152][HYDRA] Git sha: 8bde1327f0e0ba7b1147b4338c53882aaeb0cf9f
foo: bar
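
If adding GitPython as a dependency is not desirable, `get_git_sha` could be sketched with only the standard library by shelling out to `git rev-parse HEAD`. This is a minimal sketch, not part of the answer above: the `cwd` parameter and the `"unknown"` fallback are assumptions added for illustration.

```python
import subprocess
from typing import Optional


def get_git_sha(cwd: Optional[str] = None) -> str:
    """Return the current commit SHA, or "unknown" if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=cwd,              # directory to run git in (None = current directory)
            capture_output=True,  # collect stdout/stderr instead of printing them
            text=True,            # decode output as str
            check=True,           # raise CalledProcessError on a non-zero exit code
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, cwd does not exist, or cwd is not inside a repository
        return "unknown"
    return result.stdout.strip()
```

The fallback keeps an experiment from crashing just because it runs outside a git checkout; raising instead may be preferable when reproducibility must be enforced.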

Edit (2023-05-03)

See the new hydra-callbacks repo which provides a GitInfo callback.

Jasha
Hydra does not log this info to file for me (`hydra-core==1.3.2`) when I use this snippet. If I hook into `on_job_start` (instead of `on_run_start`) it works. I assume that somehow the logger gets captured only after `on_run_start`? – jasonh Apr 24 '23 at 16:25