Data Version Control (DVC) is an open-source version control system for ML and data science projects. Use this tag for questions related to DVC usage and workflows.
Questions tagged [dvc]
138 questions
2 votes, 2 answers
ERROR: bad DVC file name 'my_server\models\*.tar.gz.dvc' is git-ignored
I just started with DVC. I have a Git repo containing heavy models that I want to push to DVC, so I initialized DVC with
dvc init
and then configured the bucket:
dvc remote add -d storage s3://mybucket/dvcstore
Now there is /models…

Sunil Garg
2 votes, 2 answers
Problems installing an older DVC version [0.9.4]
I need to install an older version of DVC, namely 0.9.4, in a Python virtual environment.
I used the command:
pip install dvc==0.9.4
Everything seemed to work fine. However, when I try to run a dvc pull command, I get the following error:
Traceback…

lbrandao
2 votes, 1 answer
dvc push changes the names of files on the remote storage
I'm working on a project with DVC (Data Version Control). When I push files to my remote storage, the names of the files are changed. How can I preserve the names?
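
A likely reason for the renaming is that DVC stores pushed data content-addressed: the object name on the remote is derived from a hash of the file's content, while the original file name is recorded in the .dvc file tracked by Git. The sketch below only illustrates that idea and is not DVC's own code. `remote_object_path` is a hypothetical helper, and the two-character directory split is an assumption about the layout, which differs between DVC versions.

```python
import hashlib


def remote_object_path(content: bytes) -> str:
    """Return a content-addressed path for `content` (hypothetical helper).

    Assumption: the first two hex characters of the MD5 digest form a
    directory and the remaining characters form the object name.
    """
    digest = hashlib.md5(content).hexdigest()
    return f"{digest[:2]}/{digest[2:]}"


if __name__ == "__main__":
    # Identical content always maps to the same path, whatever the file is called.
    print(remote_object_path(b"example model bytes"))
```

Because the name depends only on the content, two files with identical bytes share one object on the remote, and the human-readable names come back when the data is checked out from the .dvc files.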

Jorge
2 votes, 1 answer
Use parameters from additional configs in dvc 2.0
Using DVC version 2.0.18 and Python 3.9.2, I want to use parameters defined in a config file other than params.yaml when configuring the parameters of the stages in dvc.yaml. However, it does not work as I expected.
MWE:
Git repo + dvc…

ppmt
2 votes, 1 answer
DVC - make scheduled csv dumps
Suppose we have some database (any database that supports CSV dumping) collecting raw data in real time for further use in ML.
On the other side, we have DVC, which can work with CSV files.
I want to organize a scheduled run of stored SELECT to…

franz-german
2 votes, 1 answer
DVC Files Incomplete
I'm on a team using DVC with Git to version-control data files. We are using DVC 1.3.1 with an S3 bucket remote. I'm getting this error when executing dvc fetch or dvc pull on a colleague's branch:
ERROR: failed to fetch data from the cloud -…

Dr. Andrew
2 votes, 0 answers
Problem with dvc import-url from Google spreadsheet export
I'm in the process of converting a Makefile-based data workflow to DVC. I have a Google spreadsheet that I use in a data workflow to make it easy to update a few things in a makeshift database. Currently this works with something like this:
#…

dino
2 votes, 1 answer
SSH automation in Jenkins
I've been working on automating processes, which includes fetching data from an external source through DVC (Data Version Control), for which I am using an SSH client to pull and push changes. For automation, I'm using Jenkins and the problem…

Farrukh
2 votes, 1 answer
Control tracked version of external dependency
I am trying to set up a DVC repository for machine learning data with different tagged versions of the dataset. I do this with something like:
$ cd /raid/ml_data # folder on a data drive
$ git init
$ dvc init
$ [add data]
$ [commit to dvc, git]
$…

Engineero
2 votes, 2 answers
Initializing a DVC repository throws an error
I'm trying to use DVC and I'm following this Kaggle tutorial, as explained in this notebook. Whenever I try to use the command ! dvc init, I get the following error:
'dvc' is not recognized as an internal or external command,
operable program or…

Mayank Khanna
2 votes, 0 answers
Data version control (DVC) commands not working ---> TypeError: public() got an unexpected keyword argument 'SEP'
All of a sudden, DVC has stopped functioning.
Any command I type fails and throws an exception.
For example, dvc remote list results in:
Traceback (most recent call last):
File "/home/dev2/.local/bin/dvc", line 5, in <module>
from dvc.main import…

Ronnie
2 votes, 1 answer
DVC dependencies for derived data without imports
I am new to DVC, and so far I like what I see. But possibly my question is fairly easy to answer.
My question: how do we correctly track the dependencies on files in an original hugedatarepo (let's assume that this can also change) in a derivedData…

tePer
2 votes, 1 answer
Data Version Control (DVC): editing files in place results in cyclic dependency
We have a large dataset and several preprocessing scripts.
These scripts alter data in place.
When I try to register them with dvc run, it complains about cyclic dependencies (the input is the same as the output).
I would assume this is a very…
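
A minimal sketch of why an in-place edit looks cyclic: if a pipeline is modeled as a graph in which every output depends on its stage's inputs, a path that is both input and output depends on itself. This is only an illustration of the acyclic-graph constraint, not DVC's implementation; `find_cycle` and the file names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def find_cycle(deps_of: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of paths, or None if acyclic."""
    visiting: List[str] = []  # paths on the current depth-first search branch
    done: Set[str] = set()    # paths whose dependencies are fully explored

    def visit(node: str) -> Optional[List[str]]:
        if node in visiting:
            # Reached a path that is already on the current branch: a cycle.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for dep in sorted(deps_of.get(node, ())):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in sorted(deps_of):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# In-place preprocessing: the output path is also one of the inputs.
in_place = {"data.csv": {"data.csv", "clean.py"}}
# Writing the result to a new path keeps the graph acyclic.
separate = {"data_clean.csv": {"data.csv", "clean.py"}}

if __name__ == "__main__":
    print(find_cycle(in_place))
    print(find_cycle(separate))
```

Under this model, having each script write to a new path is one way to keep the graph acyclic.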

nikste
2 votes, 1 answer
dvc gc and files in remote cache
The DVC documentation for the dvc gc command states that the -r option indicates "Remote storage to collect garbage in", but I'm not sure I understand it correctly. For example, I execute this command:
dvc gc -r myremote
What exactly happens if I execute this…
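
Conceptually, garbage collection keeps the objects that are still referenced and removes the rest, and naming a remote extends that cleanup from the local cache to that remote. The sketch below models this as a set difference. It is an illustration under assumptions, not DVC's implementation: `collect_garbage` and the hash strings are hypothetical, and which references count as in use depends on the scope options given to dvc gc.

```python
from typing import Set, Tuple


def collect_garbage(stored: Set[str], referenced: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split stored object hashes into (kept, removed).

    `stored` is what a cache or remote currently holds (hypothetical values);
    `referenced` is what the in-scope .dvc files still point to.
    """
    kept = stored & referenced
    removed = stored - referenced
    return kept, removed


if __name__ == "__main__":
    remote_objects = {"aaa111", "bbb222", "ccc333"}
    still_referenced = {"aaa111", "ccc333", "ddd444"}
    kept, removed = collect_garbage(remote_objects, still_referenced)
    print(sorted(kept), sorted(removed))
```

The practical consequence is that data referenced only by commits outside the chosen scope would be deleted from the remote as well, so the scope is worth checking before running the command.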

NShiny
2 votes, 2 answers
"dvc push" after several local commits
I work on a project with DVC (Data Version Control). Let's say I make a lot of local commits, something like this:
# make changes for experiment 1
dvc add my_data_file
git add my_data_file.dvc
git commit -m "Experiment 1"
# make changes for…

NShiny