Data Version Control (DVC) is an open-source version control system for ML and data science projects. Use this tag for questions related to DVC usage and workflows.
Questions tagged [dvc]
138 questions
4
votes
1 answer
How to add a file to a dvc-tracked folder without pulling the whole folder's content?
Let's say I am working inside a git/dvc repo. There is a folder data containing 100k small files. I track it with DVC as a single element, as recommended by the doc:
dvc add data
and because in my experience, DVC is kinda slow when tracking that…

pbbhu
- 41
- 1
- 2
4
votes
1 answer
Corrupted dvc.lock
I'm using DAGsHub storage as a remote and running into the following error message (when trying to run dvc pull):
ERROR: Lockfile 'bias_tagging_model/dvc.lock' is corrupted.
I thought I might have messed something up, but when cloning the git repo…

DallasJamey
- 85
- 4
4
votes
1 answer
"dvc add -external S3://mybucket/data.csv" is failing with access error even after giving correct remote cache configurations
I'm using DVC with a remote S3 bucket for data tracking, and I'm also setting the remote DVC cache in the same S3 bucket.
The following is the config file:
[core]
remote = s3remote
[cache]
s3 = s3cache
['remote "s3remote"']
url = S3://dvc-example
…

veeresh patil
- 1,168
- 1
- 11
- 18
4
votes
1 answer
Data Version Control with Google Drive Remote: "googleapiclient.errors.UnknownApiNameOrVersion: name: drive version: v2"
I'm trying to set up DVC with Google Drive storage as shown here. So far, I've been unsuccessful in pushing data to the remote. I tried both with and without the Google App setup.
After running a dvc push -v, the following exception is shown:
File…

amiasato
- 914
- 6
- 14
4
votes
1 answer
updating data in dvc registry from other projects
I have a couple of projects that are using and updating the same data sources. I recently learned about dvc's data registries, which sound like a great way of versioning data across these different projects (e.g. scrapers, computational…

dino
- 3,093
- 4
- 31
- 50
4
votes
1 answer
How do I unit test a function in the CI pipeline that uses model files that are not part of the git remote?
I am developing machine learning repositories that require fairly large trained model files to run. These files are not part of the git remote but are tracked by DVC and saved in separate remote storage. I am running into issues when I am trying…

Ananda
- 2,925
- 5
- 22
- 45
4
votes
2 answers
Undo 'dvc add' operation
I dvc add-ed a file I did not mean to add. I have not yet committed.
How do I undo this operation? In Git, you would do git rm --cached .
To be clear: I want to make DVC forget about the file, and I want the file to remain untouched in my…

shadowtalker
- 12,529
- 3
- 53
- 96
4
votes
1 answer
Expanding environment variables in the command part of a dvc run
Summary: I am trying to define a dvc step using dvc-run where the command depends on some environment variables (for instance $HOME). The problem is that when I'm defining the step on machine A, then the variable is expanded when stored in the .dvc…

Dror
- 12,174
- 21
- 90
- 160
4
votes
2 answers
Updating tracked dir in DVC
According to this tutorial, when I update a file I should first remove it from DVC control (i.e. execute dvc unprotect .dvc or dvc remove .dvc) and then add it again via dvc add . However, it's not clear if I should apply…

NShiny
- 1,046
- 1
- 10
- 19
4
votes
1 answer
Reading missing files to DVC
I ran into a problem with DVC when some files are missing in the remote. For example, when I execute dvc pull I get the output
[##############################] 100% Analysing status.
WARNING: Cache 'c31bcdd6910977a0e3a86446f2f3bdaa' not found. File…

NShiny
- 1,046
- 1
- 10
- 19
4
votes
1 answer
Resolving paths in mingw fails with Data Version Control
I am following the tutorial about Data Version Control using mingw32 on Windows 7.
I am getting a very strange error when I try to use run:
$ dvc run -v echo "hello"
Debug: updater is not old enough to check for updates
Debug: PRAGMA…

hans
- 1,043
- 12
- 33
3
votes
2 answers
DVC checkout without Git
I am using DVC for data version control in machine learning projects. Typically, switching between versions of data is done by checking out Git branches, commits, or tags to get the appropriate *.dvc files that represent data checksums, then run…

TaQuangTu
- 2,155
- 2
- 16
- 30
3
votes
1 answer
How to add to a DVC stage outputs already tracked by DVC?
In my project, I already have some files tracked by DVC that I added with dvc add. And now I want to create stages using these files as outputs and dependencies, but when I try to create a stage I get an error that says ERROR: output '[FILE NAME]'…

Aymen
- 98
- 6
3
votes
1 answer
Not able to update experiment metrics from iterative.ai studio
I have DVC and gitlab-ci integrated using CML, and with Studio as well. But whenever I run an experiment from the Studio dashboard, the new experiment appears on the dashboard but the metrics don't get updated in git, and thus not in the Studio dashboard either. But…

Shabbir Bawaji
- 115
- 1
- 1
- 8
3
votes
1 answer
DVC | ERROR: unexpected error - _register_s3_control_events() takes 2 positional arguments but 6 were given
Only recently have I had this error with DVC.
Traceback:
(venv) me@ubuntu-pcs:~/PycharmProjects/project$ dvc push
ERROR: unexpected error - _register_s3_control_events() takes 2 positional arguments but 6 were given …

DanielBell99
- 896
- 5
- 25
- 57