  • I have a directory containing a bunch of txt files:

dir/train/[train1.txt, train2.txt, train3.txt]

  • I'm able to read a single file if I define the following in config.yaml:

file_name: ${paths.data_dir}/train/train1.txt

This gives me a str, which I load with np.loadtxt(self.hparams.file_name).

  • I then tried a wildcard:

file_name: ${paths.data_dir}/train/*

Expecting a List[str], I then loop over file_name:

dat = []
for file in self.hparams.file_name:
   dat.append(np.loadtxt(file))

but it didn't work.

  • I'd recommend using Python's [pathlib](https://docs.python.org/3/library/pathlib.html) to loop over files on the file system. – Jasha Sep 30 '22 at 17:29
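
To illustrate the comment above: neither YAML nor OmegaConf interpolation expands a `*`, so the config value is most likely still a single str, and looping over a str visits characters rather than paths. A minimal sketch of expanding the pattern with pathlib, using a temporary directory and made-up file contents as stand-ins for the real data:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for dir/train/ from the question.
    train_dir = Path(tmp) / "train"
    train_dir.mkdir()
    for name in ("train1.txt", "train2.txt", "train3.txt"):
        (train_dir / name).write_text("1.0 2.0\n")

    # What the config hands back: a plain str with the wildcard still in it.
    file_name = f"{tmp}/train/*"

    # Looping over a str yields single characters, not file paths.
    first_items = list(file_name)[:3]

    # Split the pattern into directory and wildcard, then let pathlib expand it.
    pattern = Path(file_name)
    expanded = sorted(pattern.parent.glob(pattern.name))
    basenames = [p.name for p in expanded]

print(first_items)
print(basenames)
```

The `sorted` call is there because `Path.glob` does not promise any particular order.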

1 Answer

You could define an OmegaConf custom resolver for this:

# my_app.py
from pathlib import Path
from typing import List

from omegaconf import OmegaConf

yaml_data = """
paths:
  data_dir: dir
file_names: ${pathlib_glob:${paths.data_dir}, 'train/*'}
"""


def pathlib_glob(data_dir: str, glob_pattern: str) -> List[str]:
    """Return the paths under data_dir that match glob_pattern, sorted by name."""
    # Path.glob does not guarantee an order, so sort for reproducible results.
    return sorted(str(p) for p in Path(data_dir).glob(glob_pattern))


# Register the resolver before the config value is accessed.
OmegaConf.register_new_resolver("pathlib_glob", pathlib_glob)

cfg = OmegaConf.create(yaml_data)
assert cfg.file_names == ['dir/train/train1.txt', 'dir/train/train2.txt', 'dir/train/train3.txt']

Now, at the command line:

mkdir -p dir/train
touch dir/train/train1.txt
touch dir/train/train2.txt
touch dir/train/train3.txt
python my_app.py  # the assertion passes
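
Once the config yields a real list of paths, the loop from the question should work as intended. A minimal sketch, assuming every file holds numeric text that `np.loadtxt` can parse; `load_all` is a hypothetical helper, and the temporary files stand in for `cfg.file_names`:

```python
import tempfile
from pathlib import Path
from typing import List

import numpy as np


def load_all(file_names: List[str]) -> List[np.ndarray]:
    """Load each text file into its own array, in the order given."""
    return [np.loadtxt(name) for name in file_names]


with tempfile.TemporaryDirectory() as tmp:
    # Throwaway data; in the real app the list would come from cfg.file_names.
    train_dir = Path(tmp) / "train"
    train_dir.mkdir()
    for i in (1, 2, 3):
        (train_dir / f"train{i}.txt").write_text(f"{i} {i + 1}\n{i + 2} {i + 3}\n")

    file_names = sorted(str(p) for p in train_dir.glob("*.txt"))
    dat = load_all(file_names)

    # Stacking only works when every file has the same shape.
    stacked = np.stack(dat)

print(stacked.shape)
```

If the files differ in shape, keep `dat` as a list of arrays instead of stacking.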
Jasha
  • Cross-link to a related discussion on GitHub: https://github.com/facebookresearch/hydra/discussions/2399 – Jasha Sep 30 '22 at 19:15