Questions tagged [dvc]

Data Version Control (DVC) is an open-source version control system for ML and data science projects. Use this tag for questions related to DVC usage and workflows.

138 questions
0
votes
0 answers

How to debug DVC in PyCharm

I know Kedro allows debugging using PyCharm. Is there some way to wrap DVC terminal commands to allow stepping through code in the PyCharm debugger?
GlaceCelery
0
votes
1 answer

DVC - apply a pipeline to multiple source files

I have 100 source files to push through a pipeline. Is there a pattern for looping over a list, generating a new source and output file name each time for the same pipeline? Thanks!
GlaceCelery
0
votes
1 answer

"SCM error" when using dvc.api.get_url() to access S3 remote repository

I have a remote repository that I want to use with DVC. I want to access my files through DVC in Python using the dvc.api module. Here's the code I'm using: import dvc.api path = 'data/test.csv' repo = 's3://xxx/DVC_test/' version = 'v1' data_url…
0
votes
0 answers

Reproducible version of `git clone` to utilize docker build cache

I am trying to automate the process of checking out a Git repository and building a Docker image based on the Dockerfile and the files in the repo. Inside the image, I want to use DVC. Assume something like git clone…
sauerburger
0
votes
1 answer

DVC: failed to pull data from the cloud

I am trying to implement CI/CD for my ML model and I am using DVC for that. This is my YAML file: name: train-model on: push: paths: - "data/**" - "src/**" - "params.yaml" - "dvc.*" jobs: train-model: runs-on:…
Gops
0
votes
1 answer

How to change and set up the DVC config file to manage the weights file?

I'm currently managing the weights file with DVC. This is the .dvc file: outs: - md5: b2c80e73090cae013eef778308faf8fc size: 202055043 path: checkpoint_1.pth.tar So I would like to create a DVC config file so that when it's necessary to change the…
Bob9710
0
votes
0 answers

Can we connect an Oracle database with DVC? If yes, how?

I was trying to connect DVC with an Oracle database but was unable to do it. Can anyone please help me with that?
0
votes
1 answer

How to set up a DVC shared cache without a Git repository between different services in minikube?

I need to set up a shared cache in minikube in such a way that different services can use that cache to pull and update DVC models and data needed for training machine learning models. The structure of the project is to use 1 pod to periodically…
LinkCoder
0
votes
0 answers

How can a DVC pipeline recognize when to use the encoding pipeline when new data is added for modeling?

I have created separate pipelines for feature encoding and feature scaling in DVC. Now, when I input new data from my Flask API, how will these DVC pipelines automatically run and encode and scale the data for modeling?
DRP
0
votes
2 answers

Why do I get an invalid bucket name error using DVC and MLflow on macOS?

Could anyone tell me the reason for this error: botocore.exceptions.ParamValidationError: Parameter validation failed: Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.-_]{1,255}$" or be an ARN matching the regex…
Vladimir
0
votes
1 answer

What is the advantage of DVC, git-annex, git-lfs for large or binary files over git?

If I have different versions of a file, e.g., in different branches, and I try to reconcile those, Git has great mechanisms for that. However, in order to do the reconciliations, e.g., in a merge, Git requires access to the "inside" of the…
Make42
0
votes
1 answer

How to merge data (CSV) files from multiple branches (Git and DVC)?

Background: In my projects I'm using Git and DVC to keep track of versions: Git only for source code; DVC for datasets, model objects and outputs. I'm testing different approaches in separate branches,…
Tomek Tarczynski
0
votes
1 answer

DVC experiment is restoring deleted files

I am using DVC to run experiments in my project using dvc exp run. Now when I make changes to a file (for example train.py) and run "dvc exp run", everything goes well, but my problem is that when making changes by deleting a file (for example train.py or an…
Masmoudi
0
votes
0 answers

Data Version Control (DVC) cannot push to remote storage, stuck at querying cache

I am setting up remote storage with DVC using webdavs. I can connect to the remote storage from Finder. I added the new remote and I see it when I check (dvc remote list). But when I try to push data, I get a password prompt with 0% Querying…
Amon Bazongo
-1
votes
0 answers

"Failed to transfer" error in dvc push --all-commits command

The following error is produced whenever I run dvc push --all-commits. I have been following the official DagsHub docs, but I don't know why it keeps failing despite following all the steps. 2023-08-16 23:44:26,733 DEBUG: v3.15.3 (pip), CPython…