In DVC one may define pipelines. In Unix, one typically does not work at the root level. Further, DVC expects files to be inside the git repository.
So, this seems like a typical problem.
Suppose I have the following:
/home/user/project/content-folder/data/data-type/cfg.json
/home/user/project/content-folder/app/foo.py
The Git repository root is /home/user/project/. From the data directory, I run:
cd ~/project/content-folder/data/data-type
../../app/foo.py do-this --with cfg.json --dest $(pwd)
This seems reasonable to me: the script takes a configuration stored in a particular location, runs it against some encapsulated functionality, and writes the output to the destination, given as an absolute path.
The default behavior of --dest is to output to the current working directory. This seems like another reasonable default.
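As a rough sketch of that default, assuming foo.py parses its arguments with argparse (the actual script is not shown here, and build_parser and parse are hypothetical names):

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI mirroring `foo.py do-this --with cfg.json --dest DIR`.
    parser = argparse.ArgumentParser(prog="foo.py")
    parser.add_argument("verb", help="action to run, e.g. do-this")
    # `with` is a Python keyword, so store the value under another name.
    parser.add_argument("--with", dest="config", required=True,
                        help="path to the configuration file")
    # No static default: the working directory is looked up at parse time.
    parser.add_argument("--dest", default=None, help="output directory")
    return parser


def parse(argv):
    args = build_parser().parse_args(argv)
    # Fall back to the current working directory, then make the path absolute.
    dest = args.dest if args.dest is not None else os.getcwd()
    args.dest = os.path.abspath(dest)
    return args
```

With this shape, omitting --dest and passing --dest $(pwd) end up at the same absolute directory.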
Next, I go to configure the params.yaml file for dvc, and I am immediately confused and unsure what is going to happen. I write:
foodoo:
  params: do-this --with ????/cfg.json --dest ????
What I want to write (and would in a shell script):
#!/usr/bin/env bash
# Resolve the repository root so the paths do not depend on where the project lives.
origin=$(git rev-parse --show-toplevel)
verb=do-this
params="--with ${origin}/content-folder/data/data-type/cfg.json --dest ${origin}/content-folder/data/data-type"
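The same anchoring could in principle live in Python instead of the shell. A minimal sketch, assuming git is available on PATH; repo_root and build_params are hypothetical helper names, not part of DVC or of foo.py:

```python
import subprocess
from pathlib import Path


def repo_root(start: str = ".") -> Path:
    # Ask Git for the top-level directory of the repository containing `start`.
    out = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        cwd=start, check=True, capture_output=True, text=True,
    )
    return Path(out.stdout.strip())


def build_params(root: Path) -> list:
    # Same arguments as the shell snippet, anchored at the repository root.
    data_dir = root / "content-folder" / "data" / "data-type"
    return ["--with", str(data_dir / "cfg.json"), "--dest", str(data_dir)]
```

Calling build_params(repo_root()) from anywhere inside the checkout would yield absolute paths without hard-coding /home/user/project.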
But in DVC the pathing seems to be implicit, and I do not know where to start, because either:
- DVC will calculate the path to my script locally, or
- DVC will not calculate the path to my script locally.
Which is fine -- I can discover that. But I am reasonably sure that DVC will absolutely not prefix the directory and file params in my params.yaml with the path to my project.
How does one achieve path control in DVC that does not assume a fixed project location, as I would in Bash?