Questions tagged [dvc]

Data Version Control (DVC) is an open-source version control system for ML and data science projects. Use this tag for questions related to DVC usage and workflows.

138 questions
1 vote · 1 answer

Adding files that rely on pipeline outputs

In my workflow, I do the following: (1) acquire raw data (e.g. a video containing people); (2) transform it (e.g. automatically extract all crops with faces); (3) manually label them (e.g. identify the person in each crop). The labels are stored in json files…
Michael Litvin
1 vote · 1 answer

How to track a folder again after using "git rm -rf --cached folder_name": Error: The following paths are ignored by one of your .gitignore files

I wanted to untrack my git files, so I put .dvc inside my .gitignore file, ran git rm -rf --cached .dvc, and then committed. I soon realised my mistake and wanted to add the files again. I tried deleting the gitignore file, commit, make a…
Deshwal
1 vote · 0 answers

Undo changes in pandas DataFrame (column drops, row drops, edits performed on a single cell)

I am currently developing an 'undo' operation for my interface, which deals with changes made to CSV files. I want to give the user an option to revert the changes they have made to the CSV file; these changes include edit a…
cbac
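One common way to implement the undo operation described in this question is a stack of snapshots: copy the DataFrame before each edit, and pop the most recent copy to undo. The sketch below assumes the frame is small enough to keep several copies in memory; `DataFrameUndoStack` and `set_cell` are hypothetical names, not part of pandas. It exercises the three kinds of change listed in the title (column drop, row drop, single-cell edit).

```python
from typing import Callable, List

import pandas as pd


class DataFrameUndoStack:
    """Hypothetical helper: stores full snapshots so each edit can be undone."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df
        self._history: List[pd.DataFrame] = []  # older states, most recent last

    def apply(self, edit: Callable[[pd.DataFrame], pd.DataFrame]) -> pd.DataFrame:
        """Snapshot the current frame, then replace it with edit(copy)."""
        self._history.append(self.df.copy())  # DataFrame.copy() is deep by default
        self.df = edit(self.df.copy())  # edit a fresh copy so no snapshot is mutated
        return self.df

    def undo(self) -> pd.DataFrame:
        """Restore the most recent snapshot; raises IndexError if there is none."""
        if not self._history:
            raise IndexError("nothing to undo")
        self.df = self._history.pop()
        return self.df


def set_cell(d: pd.DataFrame) -> pd.DataFrame:
    d.loc[1, "a"] = 99  # single-cell edit (row label 1, column "a")
    return d


if __name__ == "__main__":
    stack = DataFrameUndoStack(pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}))
    stack.apply(lambda d: d.drop(columns=["b"]))  # column drop
    stack.apply(lambda d: d.drop(index=[0]))  # row drop
    stack.apply(set_cell)  # cell edit
    stack.undo()  # reverts only the cell edit
    print(stack.df)
```

Full snapshots are simple but cost one copy per edit; for large CSV files, storing only the inverse of each change (for example, the values of a dropped column) would use less memory at the cost of more bookkeeping.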
1 vote · 0 answers

How to resolve DVC Pull Error on PyCharm?

When I execute 'DVC Pull', I get the following error > dvc pull ERROR: unexpected error - invalid syntax (tz.py, line 78) > Traceback (most recent call last): File >…
1 vote · 0 answers

Is there an alternative to DVC pipelines to create a DAG which is also aware of inputs/outputs to nodes to cache results?

I recently started using DVC pipelines to create a DAG in my application. I work on Machine Learning projects, and I need to experiment a lot with different nodes of my system. For example: Data preprocessing -> feature extraction -> model training…
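The mechanism this question asks about (a DAG whose nodes are skipped when their inputs are unchanged) can be illustrated in a few lines. This is a toy sketch under stated assumptions, not DVC or any other tool: `CachedPipeline`, `add_stage` and `run` are hypothetical names, stages are in-memory Python functions rather than commands that write files, and every output and parameter must be JSON-serialisable so it can be hashed. DVC applies a similar idea at the file level by recording hashes of each stage's dependencies and outputs in dvc.lock.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Sequence, Tuple


class CachedPipeline:
    """Toy DAG runner (hypothetical): a stage reruns only when the hash of
    its upstream outputs or its own parameter changes."""

    def __init__(self) -> None:
        self.stages: Dict[str, Tuple[Callable[..., Any], List[str]]] = {}
        self.cache: Dict[Tuple[str, str], Any] = {}  # (stage, input hash) -> output
        self.runs: List[str] = []  # stages actually executed, in order

    def add_stage(self, name: str, func: Callable[..., Any], deps: Sequence[str] = ()) -> None:
        self.stages[name] = (func, list(deps))

    @staticmethod
    def _digest(obj: Any) -> str:
        # Content hash of inputs + parameter; requires JSON-serialisable values.
        return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

    def run(self, name: str, params: Dict[str, Any]) -> Any:
        func, deps = self.stages[name]
        inputs = [self.run(dep, params) for dep in deps]  # resolve upstream first
        key = (name, self._digest({"inputs": inputs, "param": params.get(name)}))
        if key not in self.cache:  # unseen inputs: execute and remember the result
            self.runs.append(name)
            self.cache[key] = func(*inputs, params.get(name))
        return self.cache[key]


if __name__ == "__main__":
    p = CachedPipeline()
    p.add_stage("preprocess", lambda scale: [x * scale for x in [1, 2, 3]])
    p.add_stage("features", lambda data, power: [x ** power for x in data], deps=["preprocess"])
    p.add_stage("train", lambda feats, _param: sum(feats), deps=["features"])
    print(p.run("train", {"preprocess": 1, "features": 2}), p.runs)
    p.runs.clear()
    # Changing only the "features" parameter reruns features and train, not preprocess.
    print(p.run("train", {"preprocess": 1, "features": 3}), p.runs)
```

Keying the cache on a hash of upstream outputs (not just upstream parameters) means a downstream node is also skipped when an upstream change happens to produce identical data.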
1 vote · 0 answers

DVC (Data Version Control) gets stuck at "dvc add xxx" with "Collecting stages from the workspace" in the terminal?

I used dvc[webhdfs]==2.9.3, installed with pip install dvc[webhdfs]. The repo was already cloned with git. I also ran dvc remote add -d storage webhdfs://xxx/dvc and git add .dvc/config, but the command dvc add ./assets/xxx/* was still…
ZenMoore
1 vote · 1 answer

DVC Shared Windows Directory Setup

I have one Linux machine and one Windows machine for development. For data sharing, we have set up a shared Windows directory on another Windows machine, which both my Linux and Windows machines can access. I am now using DVC for version control of the…
feelfree
1 vote · 1 answer

Is the default DVC behavior to store connection data in git?

I've recently started to play with DVC, and I was a bit surprised to see that the getting started docs suggest storing .dvc/config in git. This seemed like a fine idea at first, but then I noticed that my Azure Blob Storage account (i.e. my Azure…
Vlad Iliescu
1 vote · 3 answers

DVC - Forbidden: An error occurred (403) when calling the HeadObject operation

I just started with DVC. Following are the steps I am taking to push my models to S3: Initialize: dvc init; Add bucket URL: dvc remote add -d storage s3://mybucket/dvcstore; Add some files: dvc add somefiles; Add AWS keys: dvc remote modify storage…
Sunil Garg
1 vote · 1 answer

dvc.api.read() raises an "UnicodeDecodeError"

I am trying to access a DICOM file [image saved in the Digital Imaging and Communications in Medicine (DICOM) format]: import dvc.api path = 'dir/image.dcm' remote = 'remote_name' repo = 'git_repo' mode = 'r' data = dvc.api.read(path = path, remote…
lbrandao
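A likely cause of this error, assuming the truncated traceback matches the usual pattern, is that mode = 'r' opens the file in text mode, so its binary content gets decoded as text. The self-contained sketch below reproduces that effect with plain `open()`; `FAKE_DICOM`, `read_as_text`, `read_as_bytes` and `demo` are hypothetical names, and the byte string only imitates the start of a DICOM file (128-byte preamble followed by "DICM") rather than being a valid dataset.

```python
import os
import tempfile

# Hypothetical stand-in for a DICOM file: a 128-byte preamble, the "DICM"
# magic bytes, then a few arbitrary bytes. It is NOT a valid DICOM dataset.
FAKE_DICOM = b"\x00" * 128 + b"DICM" + bytes([0x02, 0x00, 0xFF, 0x80])


def read_as_text(path: str) -> str:
    # Text mode decodes the bytes; 0xFF is never valid UTF-8, so this raises.
    with open(path, mode="r", encoding="utf-8") as f:
        return f.read()


def read_as_bytes(path: str) -> bytes:
    # Binary mode returns the raw bytes without any decoding.
    with open(path, mode="rb") as f:
        return f.read()


def demo() -> tuple:
    fd, path = tempfile.mkstemp(suffix=".dcm")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(FAKE_DICOM)
        try:
            read_as_text(path)
            text_mode_ok = True
        except UnicodeDecodeError:
            text_mode_ok = False
        magic = read_as_bytes(path)[128:132]  # bytes right after the preamble
        return text_mode_ok, magic
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())  # (False, b'DICM')
```

In the call from the question, the corresponding change would be passing mode='rb' to dvc.api.read, which then returns bytes that can be handed to a DICOM reader instead of a decoded string.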
1 vote · 1 answer

DVC connect to Min.IO to access S3

What is the proper way to connect DVC to a Min.IO instance that is connected to some buckets on S3? AWS-S3 (My_Bucket) > Min.io (MY_Bucket aliased as S3). Right now I am accessing my bucket using mc, for example mc cp s3/my_bucket/datasets datasets to copy…
1 vote · 0 answers

Forbidden: An error occurred (403) when calling the HeadObject operation:

My ~/.aws/credentials looks like: [default] aws_access_key_id = XYZ aws_secret_access_key = ABC [testing] source_profile = default role_arn = arn:aws:iam::54:role/ad. I add my remote like: dvc remote add --local -v myremote…
Areza
1 vote · 1 answer

How to access DVC-controlled files from Oracle?

I have been storing my large files in CLOBs within Oracle, but I am thinking of storing my large files in a shared drive, then having a column in Oracle contain pointers to the files. This would use DVC. When I do this, (a) are the paths in Oracle…
Justin
1 vote · 1 answer

How do I specify encryption type when using s3remote for DVC

I have just started to explore DVC. I am trying S3 as my DVC remote. But when I run the dvc push command, I get the generic error saying An error occurred (AccessDenied) when calling the PutObject operation: Access Denied which…
Achilleus
0 votes · 0 answers

config file error: expected 'url' for dictionary value @ data['remote']['origin']

I have installed git and DVC, and I pushed my git repo to DagsHub without any error, but I ran into an error with dvc push. I ran the command dvc push --all-commits and got this error: ERROR: failed to push data to the cloud - config file error:…
ebi_d