Data Version Control (DVC) is an open-source version control system for ML and data science projects. Use this tag for questions related to DVC usage and workflows.
Questions tagged [dvc]
138 questions
0
votes
0 answers
How to debug DVC in PyCharm
I know Kedro allows debugging using PyCharm. Is there some way to wrap DVC terminal commands to allow stepping through code in the PyCharm debugger?

GlaceCelery
- 921
- 1
- 13
- 30
0
votes
1 answer
DVC - apply a pipeline to multiple source files
I have 100 source files to push through a pipeline. Is there a pattern for looping over a list, generating a new source and output file name each time for the same pipeline?
Thanks!
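One pattern that may fit this kind of looping is a `foreach` stage in `dvc.yaml`, which expands a single stage definition over a list of items. The Python sketch below only builds the text of such a file. The stage name `process`, the script `process.py`, and the `data/` and `out/` paths are hypothetical, and it assumes each source file maps to exactly one output file.

```python
def build_foreach_stage(names, script="process.py"):
    """Return dvc.yaml text with one foreach stage covering every name.

    Assumption: each source file data/<name>.csv maps to out/<name>.csv.
    The stage name, script, and directories are hypothetical.
    """
    lines = ["stages:", "  process:", "    foreach:"]
    # One list entry per source file; DVC substitutes each as ${item}.
    lines += [f"      - {name}" for name in names]
    lines += [
        "    do:",
        f"      cmd: python {script} data/${{item}}.csv out/${{item}}.csv",
        "      deps:",
        f"        - {script}",
        "        - data/${item}.csv",
        "      outs:",
        "        - out/${item}.csv",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the generated file for two example names.
    print(build_foreach_stage(["file_001", "file_002"]))
```

If the generated text is saved as `dvc.yaml`, `dvc repro` should treat each item as its own stage (for example `process@file_001`), so only the files whose inputs changed are rerun.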

GlaceCelery
- 921
- 1
- 13
- 30
0
votes
1 answer
"SCM error" when using dvc.api.get_url() to access S3 remote repository
I have a remote repository that I want to use with DVC. I want to access my files through DVC in Python using the dvc.api module. Here's the code I'm using:
import dvc.api
path = 'data/test.csv'
repo = 's3://xxx/DVC_test/'
version = 'v1'
data_url…

Arseny Sokolov
- 65
- 8
0
votes
0 answers
Reproducible version of `git clone` to utilize docker build cache
I am trying to automate the process of checking out a Git repository and building a Docker image based on the Dockerfile and the files in the repo. Inside the image, I want to use DVC. Assume something like
git clone…

sauerburger
- 4,569
- 4
- 31
- 42
0
votes
1 answer
DVC : failed to pull data from the cloud
I am trying to implement CI/CD for my ML model, and I am using DVC for that.
This is my YAML file:
name: train-model
on:
  push:
    paths:
      - "data/**"
      - "src/**"
      - "params.yaml"
      - "dvc.*"
jobs:
  train-model:
    runs-on:…

Gops
- 1
- 2
0
votes
1 answer
How to change and set up the DVC config file to manage the weights file?
I'm currently managing the weights file with DVC. This is the .dvc file:
outs:
- md5: b2c80e73090cae013eef778308faf8fc
  size: 202055043
  path: checkpoint_1.pth.tar
So I would like to create a dvc config file so that when it's necessary to change the…

Bob9710
- 205
- 3
- 15
0
votes
0 answers
Can we connect an Oracle database with DVC? If yes, how?
I was trying to connect DVC with an Oracle database but was unable to do it. Can anyone please help me with that?
0
votes
1 answer
How to set up a DVC shared cache without a Git repository between different services in minikube?
I need to set up a shared cache in minikube in such a way that different services can use that cache to pull and update DVC models and data needed for training machine learning models. The structure of the project is to use 1 pod to periodically…

LinkCoder
- 298
- 2
- 12
0
votes
0 answers
How can a DVC pipeline recognize when to use the encoding pipeline when new data is added for modeling?
I have created separate pipelines for feature encoding and feature scaling in DVC.
Now, when I input new data from my Flask API, how will these DVC pipelines automatically run and encode and scale the data for modeling?

DRP
- 9
- 1
0
votes
2 answers
Why do I get an invalid bucket name error using DVC and MLflow on macOS?
Could anyone tell me the reason for this error:
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.-_]{1,255}$" or be an ARN matching the regex…

Vladimir
- 21
- 1
0
votes
1 answer
What is the advantage of DVC, git-annex, git-lfs for large or binary files over git?
If I have different versions of a file, e.g., in different branches, and I try to reconcile those, Git has great mechanisms for that. However, in order to do the reconciliations, e.g., in a merge, Git requires access to the "inside" of the…

Make42
- 12,236
- 24
- 79
- 155
0
votes
1 answer
How to merge data (CSV) files from multiple branches (Git and DVC)?
Background: In my projects I'm using Git and DVC to keep track of versions:
Git - only for source code
DVC - for datasets, model objects and outputs
I'm testing different approaches in separate branches,…

Tomek Tarczynski
- 2,785
- 8
- 37
- 43
0
votes
1 answer
DVC experiment is restoring deleted files
I am using DVC to run experiments in my project using
dvc exp run
Now when I make changes to a file (for example, train.py) and run "dvc exp run", everything goes well,
but my problem is that when making changes by deleting a file (for example, train.py or an…

Masmoudi
- 123
- 1
- 9
0
votes
0 answers
Data Version Control (dvc) cannot push to remote storage because querying cache
I am setting up remote storage with DVC using webdavs.
I can connect to the remote storage from Finder.
I added the new remote, and I see it when I check (dvc remote list).
But when I try to push data, I get a password prompt with 0% Querying…

Amon Bazongo
- 315
- 4
- 10
-1
votes
0 answers
"Failed to transfer" error in dvc push --all-commits command
The following error is produced whenever I run dvc push --all-commits. I have been following the official DagsHub docs, but I don't know why it keeps failing even though I followed all the steps.
2023-08-16 23:44:26,733 DEBUG: v3.15.3 (pip), CPython…

Roshaan Zafar
- 31
- 5