
Assuming a directory structure like this:

/input/files/path
  /input1
    /file1_1.json
    /file1_2.json
  /input2
    /file2_1.json
  /something_unrelated
  ...

I want to run a Hydra-configured script several times, each time passing it the full path to one of the input* folders.

How could this be achieved?

Michael Litvin
  • How are you planning to use the .json file? Is each JSON file a different config for Hydra, and you want to configure one Hydra job per file? Or are you planning to use the JSON file as one of the inputs to your app (i.e. you want the path to the .json file to appear somewhere in the config object that's passed to your main function)? – Jasha Oct 28 '21 at 05:01
  • I want the JSON to be one of the inputs to the app. It could be any other type, like PNG or anything else. – Michael Litvin Oct 28 '21 at 14:18

2 Answers

Here are the two methods I can think of:

Method 1: Use a shell utility to pass a list of *.json file names to your app

You could use external utilities, e.g. GNU find and paste, to create a comma-separated list of the *.json files you'd like to use:

$ find input -name '*.json' | paste -sd "," -
input/files/path/input1/file1_1.json,input/files/path/input1/file1_2.json,input/files/path/input2/file2_1.json
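If GNU find and paste are not available, or you want the jobs in a predictable order, the same list can be built in Python. This is a minimal sketch with pathlib; the function name comma_separated_paths and the script name list_paths.py are hypothetical, the results are sorted (find does not sort them), and it assumes no path contains a comma, since Hydra would read a comma in the override as a separator between sweep values.

```python
import sys
from pathlib import Path


def comma_separated_paths(root: str, pattern: str = "*.json") -> str:
    """Hypothetical helper, roughly `find ROOT -name PATTERN | paste -sd "," -`.

    Results are sorted so the job order is deterministic (find does not sort).
    Assumes no path contains a comma, because Hydra treats commas in a
    --multirun override as separators between sweep values.
    """
    # rglob walks ROOT recursively; as_posix keeps forward slashes
    matches = sorted(p.as_posix() for p in Path(root).rglob(pattern))
    return ",".join(matches)


if __name__ == "__main__":
    # Usage: python list_paths.py input
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(comma_separated_paths(root))
```

It prints the same kind of comma-separated list as the find/paste pipeline, so it can be dropped into the $(...) substitution in its place.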

You can then pass these file paths to your app to perform a multirun sweep:

$ python my_app.py --multirun file_path="$(find input -name '*.json' | paste -sd ',' -)"
[2021-10-29 12:53:37,059][HYDRA] Launching 3 jobs locally
[2021-10-29 12:53:37,059][HYDRA]        #0 : file_path=input/files/path/input1/file1_1.json
{'file_path': 'input/files/path/input1/file1_1.json'}
[2021-10-29 12:53:37,136][HYDRA]        #1 : file_path=input/files/path/input1/file1_2.json
{'file_path': 'input/files/path/input1/file1_2.json'}
[2021-10-29 12:53:37,219][HYDRA]        #2 : file_path=input/files/path/input2/file2_1.json
{'file_path': 'input/files/path/input2/file2_1.json'}

The file path of the .json file would be available to your app's main function via the file_path key in the app's config.

Here are the my_app.py and config.yaml files I used to produce the above output:

# my_app.py
import hydra

@hydra.main(config_path=".", config_name="config")
def main(cfg):
    print(cfg)

if __name__ == "__main__":
    main()

# config.yaml
file_path: ???
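The question asks for the full path to each input* folder rather than to each file. Assuming each job should receive one directory, a similar sketch can collect the folders instead. The helper input_dir_sweep, the script name list_input_dirs.py and the input_dir config key are all hypothetical, and POSIX-style paths are assumed.

```python
import sys
from pathlib import Path


def input_dir_sweep(root: str, pattern: str = "input*") -> str:
    """Hypothetical helper: comma-separated absolute paths of the
    directories directly under ROOT whose names match PATTERN."""
    dirs = sorted(
        p.resolve().as_posix()  # full path, as the question asks
        for p in Path(root).glob(pattern)
        if p.is_dir()  # skip files whose names happen to match
    )
    return ",".join(dirs)


if __name__ == "__main__":
    # Assumed usage, with `input_dir: ???` in config.yaml:
    #   python my_app.py --multirun input_dir="$(python list_input_dirs.py /input/files/path)"
    print(input_dir_sweep(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Each job would then see one folder path under cfg.input_dir, and something_unrelated is left out because it does not match input*.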

Method 2: Register the paths to the .json files in config groups

  • Step 1: Use Python code to generate a list of paths of the .json files that you are interested in.
  • Step 2: Use the Config Store API to register input configs containing each of the file paths.
  • Step 3: At the command line, use a glob choice sweep to sweep over all the input configs registered in step 2.

In detail:

# my_app.py
import os
import hydra
import hydra.core.config_store  # explicitly load the submodule used in step 2

# Step 1: Use os.walk to make a list of paths to .json files
json_filepaths = []
for dirpath, dirnames, filenames in os.walk("input"):
    for filename in filenames:
        if filename.endswith(".json"):
            json_filepaths.append(f"{dirpath}/{filename}")

# Step 2: For each .json file path, register an input config
cs = hydra.core.config_store.ConfigStore.instance()
for fpath in json_filepaths:
    cs.store(
        name=fpath.replace("/", "-"),  # the names must be unique for each fpath and must not contain a forward slash
        node={"file_path": fpath},
        group="json_input",
    )

@hydra.main(config_path=".", config_name="config")
def main(cfg):
    print(cfg)

if __name__ == "__main__":
    main()

# config.yaml
defaults:
  - json_input: ???

$ # Step 3: Doing a --multirun sweep over the `json_input` group:
$ python my_app.py --multirun 'json_input=glob(*)'
[2021-10-29 21:14:28,643][HYDRA] Launching 3 jobs locally
[2021-10-29 21:14:28,643][HYDRA]        #0 : json_input=input-files-path-input1-file1_1.json
{'json_input': {'file_path': 'input/files/path/input1/file1_1.json'}}
[2021-10-29 21:14:28,726][HYDRA]        #1 : json_input=input-files-path-input1-file1_2.json
{'json_input': {'file_path': 'input/files/path/input1/file1_2.json'}}
[2021-10-29 21:14:28,817][HYDRA]        #2 : json_input=input-files-path-input2-file2_1.json
{'json_input': {'file_path': 'input/files/path/input2/file2_1.json'}}

The file path of the .json file would be available to your app's main function via the json_input.file_path key in the app's config.
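One caveat with the naming scheme in step 2: replacing "/" with "-" is not one-to-one, so two different paths such as a-b/c.json and a/b-c.json both become a-b-c.json and would end up registered under the same option name. A minimal sketch, assuming you would rather fail loudly, builds the name-to-node mapping first and checks for clashes; the helper config_names_for is hypothetical.

```python
def config_names_for(paths):
    """Hypothetical helper: map each file path to a ConfigStore option name
    using the same scheme as step 2 (slashes replaced by dashes), raising
    ValueError if two different paths would share a name."""
    names = {}
    for path in paths:
        name = path.replace("/", "-")
        if name in names and names[name]["file_path"] != path:
            raise ValueError(
                f"config name {name!r} is shared by "
                f"{names[name]['file_path']!r} and {path!r}"
            )
        names[name] = {"file_path": path}  # same node shape as in step 2
    return names


# Usage inside my_app.py, replacing the loop in step 2:
# for name, node in config_names_for(json_filepaths).items():
#     cs.store(name=name, node=node, group="json_input")
```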

Jasha

Hydra does not support JSON config files; use YAML files instead.

For a multirun sweep over a glob pattern, see glob on the Override grammar page.

For example:

python foo.py --multirun 'config/group=glob(input*)'

Note that quoting may be required depending on your shell behavior.
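To get a feel for which options a pattern such as glob(input*) selects, here is a rough illustration using the shell-style matching in Python's fnmatch module. It is an approximation for intuition, not Hydra's own code: the function select_options and its exclude parameter are hypothetical (the Override grammar page documents the exclude patterns Hydra's glob actually accepts), and the option names are modeled on the question's layout.

```python
from fnmatch import fnmatchcase


def select_options(options, include="*", exclude=()):
    """Rough illustration of glob-style option selection: keep names that
    match the include pattern and none of the exclude patterns.
    The result is sorted here only for readability."""
    return sorted(
        name
        for name in options
        if fnmatchcase(name, include)
        and not any(fnmatchcase(name, pat) for pat in exclude)
    )


print(select_options(["input1", "input2", "something_unrelated"], include="input*"))
# prints ['input1', 'input2']
```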

One thing that is not clear from your question is whether your config files are in the config search path. If they are not, you must add their directory to the search path. See searchpath for more info.

Omry Yadan