Data Version Control (DVC) is an open-source version control system for ML and data science projects. Use this tag for questions related to DVC usage and workflows.
Questions tagged [dvc]
138 questions
3 votes, 1 answer
Dataset versioning in Vertex AI
I have my machine learning datasets in DVC. It's relatively simple to version the dataset with DVC + git.
Now, as all of the training and deployment have been moved to Vertex AI, I'm trying to version my datasets.
My dataset changes a lot, for…

Zabir Al Nazi
3 votes, 0 answers
Second dvc push on AWS Batch using IAM role gets "Unable to locate credentials"
I'm running a job on AWS Batch that prepares some data and versions it using DVC. The job then applies a transformation that generates new data, which it should also save using DVC. Also, in this case, I'm setting a…

Tiago Albineli Motta
3 votes, 1 answer
git-ignore dvc.lock in repositories where only the DVC pipelines are used
I want to use the pipeline functionality of DVC in a Git repository. The data is managed elsewhere and should not be versioned by DVC. The only functionality needed is that DVC reproduces the necessary pipeline steps when dvc repro is…

ppmt
3 votes, 1 answer
Failed to pull existing files from SSH DVC Remote
After running dvc push data.csv (to an SSH remote), when I try to dvc pull the same file on another machine from the same remote, it doesn't get pulled. Below are the logs and the error:
2021-01-21 22:17:26,643 DEBUG: checking if…

Hlib Babii
3 votes, 1 answer
Data Version Control: Absolute Paths and Project Paths in the Pipeline Parameters?
In DVC one may define pipelines. In Unix, one typically does not work at the root level. Further, DVC expects files to be inside the git repository.
So, this seems like a typical problem.
Suppose I have the…

Chris
3 votes, 1 answer
Revert a dvc remove -p command
I have just removed a DVC tracking file by mistake using the command dvc remove training_data.dvc -p, which completely deleted my training dataset. I know that in Git we can easily restore a deleted branch from its hash. Does anyone know how…

nguyendhn
3 votes, 2 answers
Getting this weird error when trying to run DVC pull
I am new to DVC and just exploring it. I am trying to pull data from S3 that was pushed by another person on my team, but I am getting this error:
WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache…

Achilleus
3 votes, 2 answers
Unable to ignore .DS_Store files in DVC
I use DVC to track my media files. I use macOS and I want ".DS_Store" files to be ignored by DVC. According to the DVC documentation, I can achieve this with .dvcignore. I created a .dvcignore file with a ".DS_Store" rule. However, every time ".DS_Store" is…

NShiny
2 votes, 2 answers
Shell script “dvc pull” not working at Streamlit server
In my Streamlit app.py file, I used the code os.system("dvc pull") to load a .csv data file (labeled_projects.csv) from my Google service account (Google Drive), and it has been working well since I deployed it a few months ago. The code itself is…

Tony Peng
2 votes, 0 answers
Unable to pull dvc data from remote path. Keeps giving "Everything is up to date"
I am trying to pull a CSV file that was created and pushed to DVC by a GitHub Action.
I have the .dvc file for the CSV locally (I made the GitHub Action echo the file contents). However, when I am trying to…

Dawny33
2 votes, 1 answer
ERROR: unexpected error - no such column: FALSE on dvc pull
The dvc pull command fails with ERROR: unexpected error - no such column: FALSE
What could be the possible cause?
The dvc pull command is executed semi-regularly. No changes to the model, config, or model storage were made since the last successful execution.

huehue
2 votes, 1 answer
DVC experiments with large data in kubernetes
We have a computer vision project. Raw data is stored in S3. Every day the labeling team sends a new increment of labeled data. We want to automate the training process with this new data. We use DVC for reproducing pipelines and MLflow for logging and deploying…

RazDva
2 votes, 0 answers
DVC Error: Failed to import 'filepath' because 'filepath' does not exist
I'm trying to import my model from my repository A (ML repo which contains a model file) to my repository B (program repo which will use the model file). I have successfully added the model into repository A and pushed the actual model and its…

Zharfan Zahisham
2 votes, 1 answer
What are the file name rules in DVC and can they be controlled via config?
Use case: id-10T proofing data removal with a zero-trust command.
I am looking through the documentation and I don't see clear-cut guidelines for what can go into DVC as a file name.
Right now, I know that DVC implements some name…

Chris
2 votes, 2 answers
Downloading data from azure storage explorer using dvc
I have an Azure Blob Storage container with data that I did not upload myself. The data is not stored locally on my computer.
Is it possible to use dvc to download the data to my computer when I haven’t uploaded the data with dvc? Is it possible with dvc…

Fyrstenberg