4

I dvc add-ed a file I did not mean to add. I have not yet committed.

How do I undo this operation? In Git, you would do git rm --cached <filename>.

To be clear: I want to make DVC forget about the file, and I want the file to remain untouched in my working tree. This is the opposite of what dvc remove does.

One issue on the DVC issue tracker suggests that dvc unprotect is the right command. But reading the manual page suggests otherwise.

Is this possible with DVC?

shadowtalker

2 Answers

7

As per mroutis on the DVC Discord server:

  1. dvc unprotect the file; this won't be necessary if you don't use symlink or hardlink caching, but it can't hurt.
  2. Remove the .dvc file
  3. If you need to delete the cache entry itself, run dvc gc, or look up the MD5 in data.dvc and manually remove it from .dvc/cache.
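The steps above can be sketched as a small shell helper. This is only a sketch under stated assumptions: `dvc_untrack` is a hypothetical function name (not a DVC command), the pointer file is assumed to be named `<file>.dvc` and to sit next to the data file, and `dvc add` is assumed to have written a `/<name>` entry into the `.gitignore` in the same directory. Check each of these against your own repo before relying on it.

```shell
# Hypothetical helper: make DVC forget a file added with `dvc add`,
# leaving the file itself untouched in the working tree.
dvc_untrack() {
    target="$1"
    dir=$(dirname -- "$target")
    name=$(basename -- "$target")

    # Step 1: replace any symlink/hardlink into the cache with a real copy.
    dvc unprotect "$target" || return 1

    # Step 2: delete the pointer file (assumed to be "<file>.dvc").
    rm -f -- "$target.dvc"

    # Extra step (assumption): drop the "/<name>" line that `dvc add`
    # is assumed to have written to the .gitignore beside the file.
    ignore="$dir/.gitignore"
    if [ -f "$ignore" ]; then
        # grep -v exits non-zero when nothing is left; that is fine here.
        grep -vxF -- "/$name" "$ignore" > "$ignore.tmp" || true
        mv -- "$ignore.tmp" "$ignore"
    fi
}

# Step 3 is left manual on purpose, because it deletes cached data.
# Recent DVC versions expect a scope flag, for example: dvc gc -w
```

Usage would be `dvc_untrack path/to/file`, run from inside the repo. Cache cleanup is deliberately not part of the function so that nothing is deleted from `.dvc/cache` without an explicit decision.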

Edit -- there is now an issue on their GitHub page to add this to the manual: https://github.com/iterative/dvc.org/issues/625

shadowtalker
  • Yep. Basically `rm -f ; dvc gc` – Jorge Orpinel Pérez Sep 17 '19 at 05:32
  • @JorgeOrpinel the goal was to keep the original file intact while "un-tracking" it with DVC. – shadowtalker Sep 17 '19 at 06:18
  • My bad, I meant `rm -f ; dvc gc`. – Jorge Orpinel Pérez Sep 17 '19 at 17:31
  • Also, `rm -f ` doesn't remove the file from `.gitignore`, so Git will still think that the file doesn't exist. If you've not made any other changes to it then you may also want to `git restore .gitignore` as well. – IBBoard Aug 20 '21 at 14:52
  • @IBBoard good point. It might not be in the top-level gitignore either, so it might be better to `grep -R '' **/.gitignore` for it and change those files accordingly. – shadowtalker Aug 20 '21 at 14:58
  • From my tests just now, `dvc remove file.ext.dvc` appears to leave the file in the working directory (at least for uncommitted files). Saves a lot of messing with git ignore files! – IBBoard Aug 23 '21 at 19:23
1

dvc remove appears to do what you need for uncommitted files - at least for files that aren't in a pipeline. The key (which wasn't clear to me from the error or the docs) is to pass the ….dvc file name, otherwise it tries to find and remove it as a section from dvc.yaml.

# Precondition: DVC is configured for the repo. No dvc.yaml file (untested with it)
$ touch so-57966851.txt
$ dvc add so-57966851.txt
WARNING: 'so-57966851.txt' is empty.
100% Adding...|████████████████████████████████████████|1/1 [00:00, 49.98file/s]

To track the changes with git, run:

    git add .gitignore so-57966851.txt.dvc
# Ooops! I did the wrong thing! I didn't mean to add that…
$ dvc remove so-57966851.txt.dvc
$ ll so-*.txt
-rw-r--r-- 1 ibboard users 0 Aug 23 20:27 so-57966851.txt

(Tested with v2.5.4)

IBBoard