I am working on a data pipeline that follows a structure like this:
```
src/
├── etl.py
└── scripts/
    ├── moduleA.py
    └── moduleB.py
```
I want to parametrise the scripts with Hydra. I have already done it for moduleA, which can be run independently:
```python
import os

import hydra
from omegaconf import DictConfig


@hydra.main(config_path=os.getcwd(), config_name="config")
def main(cfg: DictConfig) -> None:
    # Parse args
    input_file: str = cfg.params.input_file
    do_stuff(input_file)


if __name__ == "__main__":
    main()
```
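For reference, the `config.yaml` that both snippets read is not shown above; a minimal sketch of what it is assumed to contain (only the `params.input_file` key is actually accessed by the code) would be:

```yaml
# Hypothetical config.yaml in the working directory.
# Only params.input_file is referenced by the snippets in this question;
# the file name/path shown here is illustrative.
params:
  input_file: data/input.csv
```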
I would like to take the same approach for moduleB and the rest, and to be able to invoke each module's `main()` from `etl.py`, which will act as the orchestrator.

TL;DR: Is it possible to parametrise a function that reads from a config file without giving up Hydra? I would like `etl.py` to look something like this:
```python
import os

import hydra
from omegaconf import DictConfig

from scripts.moduleA import main as process_moduleA


@hydra.main(config_path=os.getcwd(), config_name="config")
def main(cfg: DictConfig) -> None:
    # Parse args
    input_file: str = cfg.params.input_file
    process_moduleA(input_file)
```
Many thanks in advance!!