Data Version Control (DVC) is an open-source version control system for ML and data science projects. Use this tag for questions related to DVC usage and workflows.
Questions tagged [dvc]
138 questions
2
votes
1 answer
Reference custom params file in dvc template / dict unpacking
The examples for template referencing and dict unpacking assume the default params.yaml file.
So if for example I have the following params.yaml
group:
  param_one: some_value
We can do
Template: python script.py --param_one…

Mark Loyman
- 1,983
- 1
- 14
- 23
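
As a rough illustration of the dict unpacking this question refers to, the sketch below turns a mapping like the `group` entry in the excerpt into command-line flags. It is a plain-Python approximation of the idea, not DVC's own implementation; the helper name `unpack_to_flags` is hypothetical, and the parameters are hard-coded rather than read from a custom params file.

```python
import shlex


def unpack_to_flags(params: dict) -> str:
    """Render a flat mapping as `--key value` pairs (illustrative only)."""
    parts = []
    for key, value in params.items():
        parts.append(f"--{key}")
        # Quote values so spaces or shell metacharacters stay in one argument.
        parts.append(shlex.quote(str(value)))
    return " ".join(parts)


# Mirrors the params.yaml excerpt: group -> param_one: some_value
params = {"group": {"param_one": "some_value"}}

command = f"python script.py {unpack_to_flags(params['group'])}"
print(command)
```

With the values from the excerpt, the printed command has the same shape as the truncated template line in the question (`python script.py --param_one…`).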
2
votes
2 answers
Automate DVC authentication when using GitHub Actions
I'm using GitHub Actions to run some tests on every push and I need DVC. I'm trying to make this work with the runs-on: ubuntu-latest option, but when I try to run it, the action gets stuck because it requires manual authentication. Is there a way…

WholesomeGhost
- 1,101
- 2
- 17
- 31
2
votes
1 answer
Delete cached data from DVC
I would like to be able to delete individual files or folders from the DVC cache after they have been pulled with dvc pull, so they don't occupy space on the local disk.
Let me make things more concrete and summarize the solutions I found so far.…

Jau A
- 395
- 1
- 10
2
votes
2 answers
Wait for Slurm job steps started in background by a separate program
In the following Slurm batch script, where the programs step_one and step_two are meant to run at the same time, the wait call is necessary so the job does not terminate before the job steps are done.
#!/bin/bash
#SBATCH --ntasks=2
srun --overlap -n1…

Ian
- 1,062
- 1
- 9
- 21
2
votes
1 answer
What does DVC do when git merge is executed?
I have two git branches (master and develop). DVC maps a data folder in both of them. When I check out master and merge develop, is it correct that DVC does not add any new file created inside the data folder in the develop branch but leaves the…

Will
- 1,619
- 5
- 23
2
votes
2 answers
Is there any way to log the 'git hash' in Hydra?
I want to version the experiment configuration files with Hydra and DVC without uploading the original config files to git.
Hydra controls the config, and DVC controls the version. But Hydra does not specify which 'code version' is needed to…

이준혁
- 267
- 4
- 14
2
votes
0 answers
DVC pull returns ERROR: configuration error - Failed to authenticate GDrive remote: name: drive version: v2
I ran this GitHub Actions workflow with several variations, but I cannot pull the data from DVC.
name: auto-testing
on: [push]
jobs:
  run:
    runs-on: [ubuntu-latest]
    steps:
      - uses: actions/checkout@v2
      - uses:…

Soerendip
- 7,684
- 15
- 61
- 128
2
votes
1 answer
dvc.exceptions.CyclicGraphError: Pipeline has a cycle involving: load_extract_save
stages:
  load_extract_save:
    cmd: python src/stage_01_load_extract_save.py --config=config/config.yaml
    deps:
      - config/config.yaml
      - src/utils/all_utils.py
      - src/stage_01_load_extract_save.py
      - artifacts/data
…

Padmanabhan Poraiyar
- 23
- 4
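
One situation that may lead to a cycle involving a single stage is an output path that equals, or sits inside, one of the stage's own dependency paths. The excerpt above is truncated before any outputs are listed, so the `outs` entry in this sketch is purely hypothetical, as is the helper `stages_with_overlap`. The check runs on plain Python data and does not call DVC.

```python
from pathlib import PurePosixPath


def paths_overlap(a: str, b: str) -> bool:
    """True if the two paths are equal or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def stages_with_overlap(stages: dict) -> list:
    """Return names of stages where some dep overlaps one of its own outs."""
    flagged = []
    for name, spec in stages.items():
        deps = spec.get("deps", [])
        outs = spec.get("outs", [])
        if any(paths_overlap(d, o) for d in deps for o in outs):
            flagged.append(name)
    return flagged


stages = {
    "load_extract_save": {
        "deps": [
            "config/config.yaml",
            "src/utils/all_utils.py",
            "src/stage_01_load_extract_save.py",
            "artifacts/data",
        ],
        # Hypothetical output: not visible in the truncated excerpt.
        "outs": ["artifacts/data/raw"],
    }
}

print(stages_with_overlap(stages))
```

If a stage is flagged by a check like this, comparing its `deps` and `outs` lists is a reasonable first thing to inspect.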
2
votes
1 answer
How does one add individual files with DVC?
Suppose I run the following commands:
# set up DVC
mkdir foo
cd foo && git init
dvc init
git add * && git commit -m "dvc init"
# make a data file
mkdir -p bar/biz
touch bar/biz/boz
# add the data file
dvc add bar/biz/boz
And DVC outputs the…

Chris
- 28,822
- 27
- 83
- 158
2
votes
1 answer
How do I launch experiments in DVC?
I want to launch some experiments in DVC. But when I set the values of the experiment parameters, DVC deletes the file 'params.yaml', and the experiment isn't added to the queue.
Simplified code for example:
Python file 'test.py':
import numpy as np
import json
import…

Alimagadov K.
- 175
- 2
- 7
2
votes
1 answer
Error with DVC on Google Colab - dvc.scm.CloneError: Failed to clone repo
I'm having a problem trying to run "dvc pull" on Google Colab. I have two repositories (let's call them A and B), where repository A is for my machine learning code and repository B is for my dataset.
I've successfully pushed my dataset to…

Zharfan Zahisham
- 143
- 1
- 9
2
votes
1 answer
DVC Experiment management workflow
I'm struggling with the DVC experiment management. Suppose the following scenario:
I have params.yaml file:
recommendations:
  k: 66
  q: 5
I run the experiment with dvc exp run -n exp_66, and then I do dvc exp push origin exp_66. After this, I…

kevin_was_here
- 77
- 7
2
votes
0 answers
DVC Push KeyError fileSize
I've added a large list of CSV files to my DVC repository, but when I try to run dvc push it fails with
ERROR: unexpected error - KeyError('fileSize')
Edit
So, searching around, it seems that it might help to include the verbose log with regards to…

jhylands
- 984
- 8
- 16
2
votes
1 answer
Multiple users in DVC
I would like to ask if it is possible to use DVC with several accounts on the same machine. At the moment, all commands (dvc pull, dvc push, ...) are executed under my name. But after several people joined this project too, I do not want them to…

neringab
- 613
- 1
- 7
- 16
2
votes
0 answers
How to track the big data stored in Gdrive through DVC?
I am currently working on an ML project and the data size is around 10 GB. I have stored the data in Google Drive. It's impossible for me to download it to my local machine. So, how can I use DVC (Data Version Control) to track that data? Thank you in…

dave vedant
- 329
- 2
- 4
- 11