Questions tagged [dvc]

Data Version Control (DVC) is an open-source version control system for ML and data science projects. Use this tag for questions related to DVC usage and workflows.

138 questions

2 votes, 1 answer

Reference custom params file in dvc template / dict unpacking

The examples for template referencing and dict unpacking assume the default params.yaml file. So if for example I have the following params.yaml group: param_one: some_value We can do Template: python script.py --param_one…
Mark Loyman

2 votes, 2 answers

Automate DVC authentication when using github actions

I'm using GitHub Actions to run some tests on every push and I need DVC. I'm trying to make this work with the runs-on: ubuntu-latest option, but when I try to run it, the action gets stuck because it requires manual authentication. Is there a way…
WholesomeGhost

2 votes, 1 answer

Delete cached data from DVC

I would like to be able to delete individual files or folders from the DVC cache, after they have been pulled with dvc pull, so they don't occupy space in local disk. Let me make things more concrete and summarize the solutions I found so far.…
Jau A

2 votes, 2 answers

wait for slurm job steps started in background by a separate program

In the following Slurm batch script, where programs step_one and step_two are meant to run at the same time, the wait call is necessary so the job does not terminate before the job steps are done. #!/bin/bash #SBATCH --ntasks=2 srun --overlap -n1…
Ian
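The point made in the excerpt above, that a script which starts steps in the background must `wait` for them, can be illustrated with a minimal plain-Bash sketch. It assumes nothing about Slurm: `step_one` and `step_two` are hypothetical shell functions standing in for the real programs, and in an actual batch script each would instead be launched through `srun` as in the question (its flags are not reproduced here).

```bash
#!/usr/bin/env bash
# Minimal sketch of the "start steps in the background, then wait" pattern.
# step_one/step_two are hypothetical stand-ins for the real programs; in a
# Slurm batch script each would be launched through srun instead.
set -euo pipefail

outdir=$(mktemp -d)

# Each stand-in "step" takes a moment, then writes a marker file.
step_one() { sleep 1; echo "one done" > "$outdir/one.txt"; }
step_two() { sleep 1; echo "two done" > "$outdir/two.txt"; }

# '&' starts each step concurrently and returns control immediately.
step_one &
pid_one=$!
step_two &
pid_two=$!

# Without these waits the script could reach the end (and the job could
# terminate) while the steps are still running. 'wait PID' also returns
# that step's exit status, so a failing step stops the script under set -e.
wait "$pid_one"
wait "$pid_two"

cat "$outdir/one.txt" "$outdir/two.txt"
```

Running it prints the two marker lines only after both background steps have finished. If the `wait` lines are removed, `cat` runs while the steps are still sleeping, the marker files do not exist yet, and the script fails.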

2 votes, 1 answer

What does DVC do when git merge is executed?

I have two Git branches (master and develop). DVC tracks a data folder in both of them. When I check out master and merge develop, is it correct that DVC does not add any new files inside the data folder that were created in the develop branch, but leaves the…
Will

2 votes, 2 answers

Is there any way to log 'git hash' in hydra?

I want to version experiment configuration files with Hydra and DVC without uploading the original config files to Git. Hydra manages the configuration and DVC handles versioning. But Hydra does not specify which 'code version' is needed to…
이준혁

2 votes, 0 answers

DVC pull returns ERROR: configuration error - Failed to authenticate GDrive remote: name: drive version: v2

I ran this GitHub Actions workflow with several variations, but I cannot pull the data from DVC. name: auto-testing on: [push] jobs: run: runs-on: [ubuntu-latest] steps: - uses: actions/checkout@v2 - uses:…
Soerendip

2 votes, 1 answer

dvc.exceptions.CyclicGraphError: Pipeline has a cycle involving: load_extract_save

stages: load_extract_save: cmd: python src/stage_01_load_extract_save.py --config=config/config.yaml deps: - config/config.yaml - src/utils/all_utils.py - src/stage_01_load_extract_save.py - artifacts/data …

2 votes, 1 answer

How does one add individual files with DVC?

Suppose I run the following commands: # set up DVC mkdir foo cd foo && git init dvc init git add * && git commit -m "dvc init" # make a data file mkdir -p bar/biz touch bar/biz/boz # add the data file dvc add bar/biz/boz And DVC outputs the…
Chris

2 votes, 1 answer

How to launch experiments in DVC?

I want to launch some experiments in DVC, but when I set values for the experiment parameters, DVC deletes the file 'params.yaml' and the experiment is not added to the queue. Simplified example code: Python file 'test.py': import numpy as np import json import…
Alimagadov K.

2 votes, 1 answer

Error with DVC on Google Colab - dvc.scm.CloneError: Failed to clone repo

I'm having a problem trying to run "dvc pull" on Google Colab. I have two repositories (let's call them A and B) where repository A is for my machine learning codes and repository B is for my dataset. I've successfully pushed my dataset to…

2 votes, 1 answer

DVC Experiment management workflow

I'm struggling with the DVC experiment management. Suppose the following scenario: I have params.yaml file: recommendations: k: 66 q: 5 I run the experiment with dvc exp run -n exp_66, and then I do dvc exp push origin exp_66. After this, I…

2 votes, 0 answers

DVC Push KeyError fileSize

I've added a large list of CSV files to my DVC repository, but when I try to run dvc push it fails with ERROR: unexpected error - KeyError('fileSize') Edit: Searching around, it seems that it might help to include the verbose log with regards to…
jhylands

2 votes, 1 answer

Multiple users in DVC

I would like to ask if it is possible to use DVC with several accounts on the same machine. At the moment, all commands (dvc pull, dvc push, ...) are executed under my name. But after several people joined this project too, I do not want them to…
neringab

2 votes, 0 answers

How to track the big data stored in Gdrive through DVC?

I am currently working on an ML project and the data size is around 10 GB. I stored the data in Google Drive. It's impossible for me to download it to my local machine. So, how can I use DVC (Data Version Control) to track that data? Thank you in…
dave vedant