Questions tagged [dvc]

Data Version Control (DVC) is an open-source version control system for ML and data science projects. Use this tag for questions related to DVC usage and workflows.

138 questions
2 votes, 2 answers

ERROR: bad DVC file name 'my_server\models\*.tar.gz.dvc' is git-ignored

I just started with DVC. I have a Git repo containing heavy models that I want to push to DVC. So I initialized DVC with dvc init and then configured the bucket (dvc remote add -d storage s3://mybucket/dvcstore). Now there is /models…
Sunil Garg
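A minimal sketch related to this question, not taken from its answers: the literal * in the reported .dvc file name suggests the wildcard reached dvc add unexpanded, which is what happens in Windows cmd.exe because it leaves wildcard expansion to the program. Assuming that is the cause, the pattern can be expanded in Python and each match passed explicitly. build_dvc_add_command is a hypothetical helper; it only builds the argument list and does not run DVC.

```python
import glob


def build_dvc_add_command(pattern: str) -> list[str]:
    """Expand a shell-style wildcard in Python and return an explicit argv.

    Hypothetical helper: it only builds the arguments for `dvc add`;
    it does not execute DVC.
    """
    # glob.glob performs the expansion that cmd.exe leaves to the program.
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # One explicit path per archive, so no literal '*' ends up in a .dvc name.
    return ["dvc", "add", *matches]
```

Usage: pass the returned list to subprocess.run(argv, check=True). This only addresses the wildcard; if the resulting .dvc files are themselves matched by a .gitignore rule, that rule still needs adjusting.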
2 votes, 2 answers

Problems installing an older DVC version [0.9.4]

I need to install an older version of DVC, namely 0.9.4, in a Python virtual environment. I used the command pip install dvc==0.9.4, and everything seemed to work fine. However, when I try to run a dvc pull command, I get the following error: Traceback…
lbrandao
2 votes, 1 answer

dvc push changes the names of files on the remote storage

I'm working on a project with DVC (Data Version Control). When I push files to my remote storage, the file names are changed. How can I preserve the names?
Jorge
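A minimal sketch related to this question, not taken from its answers: DVC stores data content-addressed, so objects on the remote are named after the MD5 hash of their content, and the original file name is kept in the .dvc file that points at that hash. The sketch assumes a single file hashed over its raw bytes (some older releases normalized line endings of text files before hashing) and the files/md5/<2 chars>/<30 chars> layout of recent releases; older releases used the same split without the files/md5 prefix. Directories, which DVC records through a separate manifest object, are not covered.

```python
import hashlib
from pathlib import PurePosixPath


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file's raw bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remote_object_path(md5_hex: str, layout_prefix: str = "files/md5") -> str:
    """Map a content hash to its relative location in a content-addressed store.

    The first two hex characters become a subdirectory and the remaining
    characters the object name. layout_prefix is an assumption about the DVC
    version in use; pass "" for the older layout without a prefix.
    """
    parts = [p for p in layout_prefix.split("/") if p]
    return str(PurePosixPath(*parts, md5_hex[:2], md5_hex[2:]))
```

The name only reappears on dvc pull or dvc checkout, which read the path-to-hash mapping from the .dvc file and restore the object under its original name.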
2 votes, 1 answer

Use parameters from additional configs in dvc 2.0

Using DVC 2.0.18 and Python 3.9.2, I want to use parameters defined in a config file other than params.yaml when configuring the parameters of the stages in dvc.yaml. However, it does not work as I expected. MWE: Git repo + dvc…
ppmt
2 votes, 1 answer

DVC - make scheduled CSV dumps

Suppose we have a database (any database that supports CSV dumps) that collects raw data in real time for later use in ML. On the other side, we have DVC, which can work with CSV files. I want to organize a scheduled run of a stored SELECT to…
franz-german
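A minimal sketch of the dump step for this question, not taken from its answers. It assumes SQLite as a stand-in for "any database" and a scheduler such as cron calling it; the table and query in the usage are hypothetical. Writing to the same path on every run means re-running dvc add on that path records a new version in the .dvc file, which can then be committed to Git.

```python
import csv
import sqlite3


def dump_query_to_csv(conn: sqlite3.Connection, query: str, out_path: str) -> int:
    """Run a SELECT and write its rows, with a header row, to out_path.

    Returns the number of data rows written.
    """
    cursor = conn.execute(query)
    # cursor.description holds one tuple per column; index 0 is the column name.
    header = [col[0] for col in cursor.description]
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        for row in cursor:  # stream rows instead of loading them all at once
            writer.writerow(row)
            rows += 1
    return rows
```

Keeping the dump path stable is the design choice that matters here: DVC then sees successive snapshots as versions of one tracked file rather than as unrelated new files.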
2 votes, 1 answer

DVC Files Incomplete

I'm in a team using DVC with Git to version-control data files. We are using DVC 1.3.1 with an S3 bucket remote. I'm getting this error when executing dvc fetch or dvc pull on a colleague's branch: ERROR: failed to fetch data from the cloud -…
Dr. Andrew
2 votes, 0 answers

Problem with dvc import-url from a Google Spreadsheet export

I'm in the process of converting a Makefile-based data workflow to DVC. I have a Google spreadsheet that I'm using in a data workflow to make it easy to update a few things in a makeshift database. Currently this works with something like this: #…
dino
2 votes, 1 answer

SSH automation in Jenkins

I've been working on automating processes, which includes fetching data from an external source through DVC (Data Version Control), for which I use an SSH client to pull and push changes. For automation, I'm using Jenkins, and the problem…
Farrukh
2 votes, 1 answer

Control tracked version of external dependency

I am trying to set up a DVC repository for machine learning data with different tagged versions of the dataset. I do this with something like: $ cd /raid/ml_data # folder on a data drive $ git init $ dvc init $ [add data] $ [commit to dvc, git] $…
Engineero
2 votes, 2 answers

Initializing a DVC repository throws an error

I'm trying to use DVC, and I'm following this Kaggle tutorial as explained in this notebook. Whenever I try to use the command ! dvc init, I get the following error: 'dvc' is not recognized as an internal or external command, operable program or…
Mayank Khanna
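A minimal sketch related to this question, not taken from its answers: the Windows message means no dvc executable was found on PATH, which is common when DVC was installed into a different environment than the one the notebook's shell uses. The hypothetical helper dvc_command prefers an executable on PATH and otherwise falls back to python -m dvc with the current interpreter, assuming DVC is installed into that same interpreter.

```python
import shutil
import sys


def dvc_command(*args: str) -> list[str]:
    """Build an argv for DVC, preferring a `dvc` executable found on PATH.

    If none is on PATH (the usual cause of "'dvc' is not recognized"), fall
    back to running the package through the current interpreter; this
    assumes DVC was installed into that interpreter, e.g. with pip.
    """
    exe = shutil.which("dvc")
    if exe is not None:
        return [exe, *args]
    return [sys.executable, "-m", "dvc", *args]
```

In a notebook, the returned list can be passed to subprocess.run(..., check=True) instead of using the ! shell escape, which keeps the call tied to the kernel's own interpreter.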
2 votes, 0 answers

Data version control (DVC) commands not working ---> TypeError: public() got an unexpected keyword argument 'SEP'

All of a sudden, DVC has stopped working. Every command I type fails and throws an exception. For example, dvc remote list results in: Traceback (most recent call last): File "/home/dev2/.local/bin/dvc", line 5, in <module> from dvc.main import…
Ronnie
2 votes, 1 answer

DVC dependencies for derived data without imports

I am new to DVC, and so far I like what I see. But my question may be fairly easy to answer. My question: how do we correctly track the dependencies on files in an original hugedatarepo (let's assume that this can also change) in a derivedData…
tePer
2 votes, 1 answer

Data Version Control (DVC): editing files in place results in a cyclic dependency

We have a large dataset and several preprocessing scripts. These scripts alter the data in place. When I try to register them with dvc run, it complains about cyclic dependencies (the input is the same as the output). I would assume this is a very…
nikste
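A minimal sketch related to this question, not taken from its answers: a stage whose dependency and output are the same path forms a cycle, so one general workaround is to have each step read from one path and write to a different one. The transformation (lowercasing lines) and the file names are hypothetical stand-ins for the real preprocessing scripts.

```python
import os


def preprocess(in_path: str, out_path: str) -> None:
    """Hypothetical preprocessing step: lowercase every line of a text file.

    Reads in_path and writes out_path, so the input is never modified in place.
    """
    # Refuse in-place operation: that is exactly what makes the stage cyclic.
    if os.path.abspath(in_path) == os.path.abspath(out_path):
        raise ValueError("input and output must differ to avoid a cyclic stage")
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line.lower())
```

Each script can then be registered as its own stage with, say, data/raw.txt as its dependency and data/clean.txt as its output (hypothetical paths), and the next script depends on data/clean.txt, giving a chain of stages instead of a cycle. The cost is extra disk space for the intermediate copies.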
2 votes, 1 answer

dvc gc and files in remote cache

The DVC documentation for the dvc gc command states that the -r option indicates "Remote storage to collect garbage in", but I'm not sure I understand it correctly. For example, I execute this command: dvc gc -r myremote. What exactly happens if I execute this…
NShiny
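A simplified model related to this question, not taken from its answers and not DVC's actual implementation: garbage collection keeps every object referenced by the revisions it is told to keep (by default the current workspace; flags such as --all-branches, --all-tags and --all-commits widen that set) and deletes the rest from whichever storage is being collected. The revision names and hashes in the usage are hypothetical.

```python
from collections.abc import Iterable, Mapping


def collect_garbage(
    stored: set[str],
    refs_by_revision: Mapping[str, Iterable[str]],
    keep: Iterable[str],
) -> set[str]:
    """Return the object hashes a gc pass would delete under this model.

    stored: hashes present in the storage being collected.
    refs_by_revision: hashes referenced by each revision's DVC files.
    keep: revisions whose data must survive.
    """
    in_use: set[str] = set()
    for rev in keep:
        in_use.update(refs_by_revision.get(rev, ()))
    # Anything stored but not referenced by a kept revision is garbage.
    return stored - in_use
```

Under this model, collecting a shared remote while keeping only your own workspace would also delete objects that other branches or colleagues still reference, which is why the scope of the kept revisions matters most.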
2 votes, 2 answers

"dvc push" after several local commits

I work on a project with DVC (Data Version Control). Let's say I make a lot of local commits. Something like this: # make changes for experiment 1 dvc add my_data_file git add my_data_file.dvc git commit -m "Experiment 1" # make changes for…
NShiny
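A simplified model related to this question, not taken from its answers: by default dvc push uploads the data referenced by the current workspace, so data that only earlier local commits reference is skipped unless the scope is widened, for example with --all-commits. The sketch computes which object hashes a push would upload under that model; the revision names and hashes in the usage are hypothetical.

```python
from collections.abc import Iterable, Mapping


def objects_to_push(
    refs_by_revision: Mapping[str, Iterable[str]],
    revisions: Iterable[str],
    on_remote: set[str],
) -> set[str]:
    """Hashes a push would upload under this model.

    An object is uploaded if one of the selected revisions references it
    and the remote does not already have it.
    """
    wanted: set[str] = set()
    for rev in revisions:
        wanted.update(refs_by_revision.get(rev, ()))
    return wanted - on_remote
```

With refs {"exp1": ["a"], "exp2": ["b"], "HEAD": ["c"]}, pushing only HEAD uploads {"c"}, while selecting every commit uploads all three objects minus whatever the remote already holds.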