Data Version Control (DVC) is an open-source version control system for ML and data science projects. Use this tag for questions related to DVC usage and workflows.
Questions tagged [dvc]
138 questions
1 vote, 1 answer
Adding files that rely on pipeline outputs
In my workflow, I do the following:
Acquire raw data (e.g. a video containing people)
Transform it (e.g. automatically extract all crops with faces)
Manually label them (e.g. identify the person in each crop). The labels are stored in json files…

Michael Litvin
1 vote, 1 answer
How to track a folder again after running "git rm -rf --cached folder_name": Error: The following paths are ignored by one of your .gitignore files
I wanted to un-track my git files, so I put .dvc inside my .gitignore file and ran
git rm -rf --cached .dvc
and then committed.
I realised my mistake soon after and then wanted to add the files again. I tried deleting the gitignore file, commit, make a…

Deshwal
1 vote, 0 answers
Undo changes in a pandas DataFrame (column drops, row drops, edits performed on a single cell)
I am currently developing an 'undo' operation for my interface, which deals with changes made to CSV files. I want to give the user an option to revert the changes they have made to the CSV file; these changes include edit a…

cbac
1 vote, 0 answers
How to resolve a DVC pull error in PyCharm?
When I execute 'dvc pull', I get the following error:
> dvc pull ERROR: unexpected error - invalid syntax (tz.py, line 78)
> Traceback (most recent call last): File
>…
1 vote, 0 answers
Is there an alternative to DVC pipelines to create a DAG which is also aware of inputs/outputs to nodes to cache results?
I recently started to use DVC pipelines to create a DAG in my application. I work on machine learning projects, and I need to experiment a lot with different nodes of my system. For example:
Data preprocessing -> feature extraction -> model training…

Kıvanç Yüksel
1 vote, 0 answers
DVC (Data Version Control) gets stuck at "dvc add xxx" with "Collecting stages from the workspace" in the terminal?
I used dvc[webhdfs]==2.9.3, installed with pip install dvc[webhdfs].
The repo has already been cloned with git.
I have also run dvc remote add -d storage webhdfs://xxx/dvc and git add .dvc/config.
But the command dvc add ./assets/xxx/* was still…

ZenMoore
1 vote, 1 answer
DVC Shared Windows Directory Setup
I have one Linux machine and one Windows machine for development. For data sharing, we have set up a shared Windows directory on another Windows machine, which both my Linux and Windows machines can access.
I am now using DVC for version control of the…

feelfree
1 vote, 1 answer
Is the default DVC behavior to store connection data in git?
I've recently started to play with DVC, and I was a bit surprised to see the getting started docs are suggesting to store .dvc/config in git.
This seemed like a fine idea at first, but then I noticed that my Azure Blob Storage account (i.e. my Azure…

Vlad Iliescu
1 vote, 3 answers
DVC - Forbidden: An error occurred (403) when calling the HeadObject operation
I just started with DVC. These are the steps I follow to push my models to S3:
Initialize
dvc init
Add bucket URL
dvc remote add -d storage s3://mybucket/dvcstore
Add some files
dvc add somefiles
Add AWS keys
dvc remote modify storage…

Sunil Garg
1 vote, 1 answer
dvc.api.read() raises an "UnicodeDecodeError"
I am trying to access a DICOM file [image saved in the Digital Imaging and Communications in Medicine (DICOM) format]:
import dvc.api
path = 'dir/image.dcm'
remote = 'remote_name'
repo = 'git_repo'
mode = 'r'
data = dvc.api.read(path = path, remote…

lbrandao
1 vote, 1 answer
DVC connect to Min.IO to access S3
What is the proper way to connect DVC to Min.IO that is connected to some buckets on S3?
AWS-S3(My_Bucket) > Min.io(MY_Bucket aliased as S3)
Right now I am accessing my bucket using mc, for example mc cp s3/my_bucket/datasets datasets to copy…

Niewasz Biznes
1 vote, 0 answers
Forbidden: An error occurred (403) when calling the HeadObject operation:
My ~/.aws/credentials looks like this:
[default]
aws_access_key_id = XYZ
aws_secret_access_key = ABC
[testing]
source_profile = default
role_arn = arn:aws:iam::54:role/ad
I add my remote like this:
dvc remote add --local -v myremote…

Areza
1 vote, 1 answer
How to access DVC-controlled files from Oracle?
I have been storing my large files in CLOBs within Oracle, but I am thinking of storing my large files in a shared drive, then having a column in Oracle contain pointers to the files. This would use DVC.
When I do this,
(a) are the paths in Oracle…

Justin
1 vote, 1 answer
How do I specify the encryption type when using an S3 remote for DVC?
I have just started to explore DVC. I am trying S3 as my DVC remote. I am getting
But when I run the dvc push command, I get the generic error saying
An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
which…

Achilleus
0 votes, 0 answers
config file error: expected 'url' for dictionary value @ data['remote']['origin']
I have installed Git and DVC, and I can push my Git repository to DagsHub without any error, but I get an error with dvc push.
I ran dvc push --all-commits and got this error:
ERROR: failed to push data to the cloud - config file error:…

ebi_d