Here are the two methods I can think of:
Method 1: Use a shell utility to pass a list of `*.json` file names to your app
You could use external utilities, e.g. GNU `find` and `paste`, to create a comma-separated list of the `*.json` files you'd like to use.
```
$ find input -name '*.json' | paste -sd "," -
input/files/path/input1/file1_1.json,input/files/path/input1/file1_2.json,input/files/path/input2/file2_1.json
```
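If GNU `find` and `paste` aren't available (e.g. on Windows), a rough Python stand-in can build the same comma-separated string. This is a minimal sketch, not part of Hydra. The function name `comma_separated_json_paths` is hypothetical. The sketch also assumes that a comma inside a path should be treated as an error, because the sweep would read it as a separator.

```python
from pathlib import Path


def comma_separated_json_paths(root: str) -> str:
    """Return every *.json file under `root` as one comma-separated string.

    Hypothetical helper, a rough stand-in for
    `find <root> -name '*.json' | paste -sd "," -`; unlike find, the paths
    are sorted so the order of the sweep jobs is deterministic.
    """
    paths = sorted(p.as_posix() for p in Path(root).rglob("*.json") if p.is_file())
    for path in paths:
        # A comma inside a path would be read as a value separator by the sweep.
        if "," in path:
            raise ValueError(f"path contains a comma: {path}")
    return ",".join(paths)


if __name__ == "__main__":
    print(comma_separated_json_paths("input"))
```

Saved as, say, `list_json.py` (a hypothetical file name), it could replace the pipeline: `python my_app.py --multirun file_path="$(python list_json.py)"`.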
You can then pass these file paths to your app to perform a multirun sweep:
```
$ python my_app.py --multirun file_path="$(find input -name '*.json' | paste -sd ',' -)"
[2021-10-29 12:53:37,059][HYDRA] Launching 3 jobs locally
[2021-10-29 12:53:37,059][HYDRA] #0 : file_path=input/files/path/input1/file1_1.json
{'file_path': 'input/files/path/input1/file1_1.json'}
[2021-10-29 12:53:37,136][HYDRA] #1 : file_path=input/files/path/input1/file1_2.json
{'file_path': 'input/files/path/input1/file1_2.json'}
[2021-10-29 12:53:37,219][HYDRA] #2 : file_path=input/files/path/input2/file2_1.json
{'file_path': 'input/files/path/input2/file2_1.json'}
```
The file path of the `.json` file is then available to your app's main function via the `file_path` key in the app's config.
Here are the `my_app.py` and `config.yaml` files I used to produce the above output:
```python
# my_app.py
import hydra


@hydra.main(config_path=".", config_name="config")
def main(cfg):
    print(cfg)


if __name__ == "__main__":
    main()
```

```yaml
# config.yaml
file_path: ???
```
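One caveat when you actually open the file: with Hydra 1.1's default behavior, each job runs with its working directory changed to its own output directory. A relative path like `input/files/path/input1/file1_1.json` may therefore not resolve from inside `main`. Hydra provides `hydra.utils.to_absolute_path()` and `hydra.utils.get_original_cwd()` for this.

Below is a stdlib-only sketch of the same idea. The helper `load_json_file` is hypothetical, and its `original_cwd` parameter stands in for `hydra.utils.get_original_cwd()`.

```python
import json
import os
from typing import Any


def load_json_file(file_path: str, original_cwd: str) -> Any:
    """Read a JSON file whose path may be relative to the launch directory.

    Hypothetical helper: `original_cwd` is the directory the app was
    launched from (in a Hydra app, hydra.utils.get_original_cwd()).
    """
    # Resolve relative paths against the launch directory rather than the
    # current working directory, which Hydra may have changed for the job.
    if not os.path.isabs(file_path):
        file_path = os.path.join(original_cwd, file_path)
    with open(file_path, encoding="utf-8") as f:
        return json.load(f)
```

Inside `main`, this would be called as `load_json_file(cfg.file_path, hydra.utils.get_original_cwd())`.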
Method 2: Register the paths to the `.json` files in config groups
- Step 1: Use Python code to generate a list of paths of the `.json` files that you are interested in.
- Step 2: Use the Config Store API to register input configs containing each of the file paths.
- Step 3: At the command line, use a glob choice sweep to sweep over all the input configs registered in step 2.
In detail:
```python
# my_app.py
import os

import hydra
from hydra.core.config_store import ConfigStore

# Step 1: Use os.walk to make a list of paths to .json files
json_filepaths = []
for dirpath, dirnames, filenames in os.walk("input"):
    for filename in filenames:
        if filename.endswith(".json"):
            json_filepaths.append(f"{dirpath}/{filename}")

# Step 2: For each .json file path, register an input config
cs = ConfigStore.instance()
for fpath in json_filepaths:
    cs.store(
        name=fpath.replace("/", "-"),  # names must be unique for each fpath and must not contain a forward slash
        node={"file_path": fpath},
        group="json_input",
    )


@hydra.main(config_path=".", config_name="config")
def main(cfg):
    print(cfg)


if __name__ == "__main__":
    main()
```
```yaml
# config.yaml
defaults:
  - json_input: ???
```
```
$ # Step 3: Do a --multirun sweep over the `json_input` group:
$ python my_app.py --multirun 'json_input=glob(*)'
[2021-10-29 21:14:28,643][HYDRA] Launching 3 jobs locally
[2021-10-29 21:14:28,643][HYDRA] #0 : json_input=input-files-path-input1-file1_1.json
{'json_input': {'file_path': 'input/files/path/input1/file1_1.json'}}
[2021-10-29 21:14:28,726][HYDRA] #1 : json_input=input-files-path-input1-file1_2.json
{'json_input': {'file_path': 'input/files/path/input1/file1_2.json'}}
[2021-10-29 21:14:28,817][HYDRA] #2 : json_input=input-files-path-input2-file2_1.json
{'json_input': {'file_path': 'input/files/path/input2/file2_1.json'}}
```
The file path of the `.json` file is then available to your app's main function via the `json_input.file_path` key in the app's config.
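The `fpath.replace("/", "-")` naming in Step 2 is unique only as long as no two paths differ solely by `/` versus `-`. For example, `a-b/c.json` and `a/b-c.json` both become `a-b-c.json`, so their registrations would clash. Below is a minimal sketch of a hypothetical helper, `config_names_for_paths`, that builds the names and fails loudly on such a collision before anything is registered.

```python
from typing import Dict, Iterable


def config_names_for_paths(paths: Iterable[str]) -> Dict[str, str]:
    """Map each config name (path with '/' replaced by '-') back to its path.

    Hypothetical helper: raises ValueError if two paths produce the same
    name, instead of letting one registration clash with another.
    """
    names: Dict[str, str] = {}
    for path in paths:
        name = path.replace("/", "-")
        if name in names:
            raise ValueError(f"{path!r} and {names[name]!r} both map to {name!r}")
        names[name] = path
    return names
```

In Step 2, the registration loop could then become `for name, fpath in config_names_for_paths(json_filepaths).items(): cs.store(name=name, node={"file_path": fpath}, group="json_input")`.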