I am developing a project based on the PyTorch Lightning + Hydra template found here: https://github.com/ashleve/lightning-hydra-template. I am trying to instantiate a PyTorch dataset object using Hydra's `instantiate`, overriding the default `cfg` value from the config file with a `DictConfig` object. Specifically, I have this config file:

    ...
    training_dataset:
       _target_: src.datamodules.components.baseline_datasets.FSD50K
       cfg: omegaconf.dictconfig.DictConfig(content={})
       mode: "training"
    ...

While the pytorch lightning datamodule does the following:

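    from typing import Optional

    from hydra.utils import instantiate
    from omegaconf import DictConfig
    from pytorch_lightning import LightningDataModule

    class AudioTagDataModule(LightningDataModule):
        def __init__(self, cfg: DictConfig):
            super().__init__()
            self.cfg = cfg
            self.save_hyperparameters(logger=False)

        def setup(self, stage: Optional[str] = None):
            # The full experiment config is passed as a keyword argument,
            # with the intent of overriding training_dataset.cfg from the YAML.
            self.trainset = instantiate(self.cfg.datamodule.training_dataset, cfg=self.cfg)
            ...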

The rest of the code is pretty much unmodified from the template. However, when the PyTorch dataset is instantiated, I get an error because the config it receives is empty. Checking in the debugger, I see that the `cfg` value is not being overridden, even though the keyword argument has the same name as the key in the config file. Could someone explain why the override is not working and how to proceed correctly? Thanks!

Jasha

1 Answer

Looking at `AudioTagDataModule`, I see that the `setup` method passes a `cfg=self.cfg` keyword argument to the `instantiate` function. This is why the `cfg` setting from your `training_dataset` config is not showing up in your instantiated dataset; the keyword argument is taking precedence. Based on the code you posted, it would probably make sense to pass an empty `DictConfig` to `AudioTagDataModule.__init__` instead of defining a `cfg` key/value in `training_dataset`.

Another thing: in your YAML file, you'd probably want `cfg: {}` instead of `cfg: omegaconf.dictconfig.DictConfig(content={})`, as the latter will result in the Python string `"omegaconf.dictconfig.DictConfig(content={})"` and the former will result in an empty `DictConfig`.

Jasha
  • Thank you for your answer, I updated my code according to your second recommendation. Concerning the first, I'm not sure I understand. According to the documentation, if I define a keyword arg in the config file and then pass an object with the same name to instantiate, I should override the default value. Did I define the keyword arg incorrectly? – fred_101512 May 26 '22 at 15:27
  • My first point was about the line `self.trainset = instantiate(self.cfg.datamodule.training_dataset, cfg=self.cfg)` in `AudioTagDataModule`. If you replace that line with `self.trainset = instantiate(self.cfg.datamodule.training_dataset)` then the `cfg` value from your yaml file will be passed to `FSD50K` as expected. The presence of `cfg=self.cfg` in the `instantiate` call means the value from your `yaml` file will *not* be passed to `FSD50K`, as `self.cfg` is taking precedence. – Jasha May 26 '22 at 18:27
  • I am sorry, I did not explain myself clearly. I actually want `self.cfg` to take precedence, but in my code the value from the yaml file is passed instead. – fred_101512 May 27 '22 at 14:47
  • That is strange. Maybe `self.cfg` is equal to the yaml file's `training_dataset.cfg`? – Jasha May 27 '22 at 21:42
  • 1
    They are different, as self.cfg is the complete config object for the whole experiment while training_dataset.cfg in the yaml contains only a few keys. Maybe there is a bug, I will try and post it on hydra github page. Thanks again for your andwers! – fred_101512 May 28 '22 at 15:13