2

I work on a project that uses DVC (Data Version Control). Let's say I make several local commits, something like this:

# make changes for experiment 1
dvc add my_data_file
git add my_data_file.dvc
git commit -m "Experiment 1"

# make changes for experiment 2
# which change both code and data
dvc add my_data_file
git add my_data_file.dvc
git commit -m "Experiment 2"

# make changes for experiment 3
# which change both code and data
dvc add my_data_file
git add my_data_file.dvc
git commit -m "Experiment 3"

# Finally I'm done
# push changes:
dvc push
git push

However, there is one problem: dvc push will only push the data from experiment 3. Is there any way to push the data from all local commits (i.e. starting from the first commit that diverged from the remote branch)?

Currently I see two options:

  1. Tag each commit and push it with dvc push -T
  2. After the "Experiment 3" commit, run git checkout commit-hash && dvc push for every local commit not yet pushed to the remote.

Both these options seem cumbersome and error-prone. Is there any better way to do it?

NShiny

2 Answers

3

To make it less error prone, you can use HEAD~1 to refer to the previous commit instead of using the exact commit hash.

If you are on Bash, you can use a for loop to dvc push the content of the last 3 commits, pushing the current commit first and then stepping back one commit at a time:

for x in {1..3}; do dvc push && git checkout HEAD~1; done

Remember to git checkout back to your working branch afterwards (e.g. git checkout master), since the loop leaves you on a detached HEAD.
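
If you don't want to count commits by hand, the same idea can be wrapped in a small function. This is only a sketch: push_each_local_commit is a made-up helper name, and the default range @{u}..HEAD assumes your branch has an upstream configured and that "not yet pushed" means "not reachable from that upstream".

```shell
#!/usr/bin/env bash
# Sketch only: push_each_local_commit is a hypothetical helper, not a DVC command.
# It runs `dvc push` at every commit in a Git revision range, oldest first.
push_each_local_commit() {
    local range="$1"
    # Assumed default: commits on HEAD that the upstream branch does not have.
    [ -n "$range" ] || range='@{u}..HEAD'

    # Remember the starting branch (or commit, if HEAD is already detached).
    local start
    start="$(git symbolic-ref --quiet --short HEAD || git rev-parse HEAD)" || return 1

    # Collect the commits up front so a bad range aborts before any checkout.
    local commits
    commits="$(git rev-list --reverse "$range")" || return 1

    local commit status=0
    for commit in $commits; do
        git checkout --quiet "$commit" && dvc push || { status=1; break; }
    done

    # Always try to return to where we started, even if a push failed.
    git checkout --quiet "$start" || return 1
    return "$status"
}
```

Run push_each_local_commit with no arguments from your working branch, or pass an explicit range such as origin/master..HEAD. It needs a clean working tree, otherwise git checkout will refuse to switch commits.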


Answering your comment ("dvc push" after several local commits):

Is there a way to disable hooks after dvc install command?

When you run dvc install, it creates three files under the .git/hooks directory:

.git/hooks
├── post-checkout
├── pre-commit
└── pre-push

To disable them, you can remove those files (e.g. rm .git/hooks/post-checkout).
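
If you would rather keep the hooks around so you can restore them later, renaming works too, because Git only runs a hook whose file name matches the hook name exactly. A minimal sketch, where disable_dvc_hooks and the .disabled suffix are my own invention and not part of DVC:

```shell
#!/usr/bin/env bash
# Sketch only: disable_dvc_hooks is a hypothetical helper, not a DVC command.
# It renames the three hook files listed above so that Git stops running them.
disable_dvc_hooks() {
    # Optional first argument: hooks directory (defaults to .git/hooks).
    local hooks_dir="${1:-.git/hooks}"
    local hook
    for hook in post-checkout pre-commit pre-push; do
        if [ -f "$hooks_dir/$hook" ]; then
            mv "$hooks_dir/$hook" "$hooks_dir/$hook.disabled"
        fi
    done
}
```

Call disable_dvc_hooks from the repository root. To re-enable a hook, rename it back (for example mv .git/hooks/pre-push.disabled .git/hooks/pre-push). If you had your own hooks with those names before dvc install, check their contents first, since this renames them regardless of who wrote them.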

By the way, I edited DVC's documentation to include more information about this.

2

@NShiny, there is a related ticket:

support push/pull/metrics/gc, etc across different commits.

Please give it a vote so that we know how to prioritize it.

As a workaround, I would recommend running dvc install. It installs a pre-push Git hook that runs dvc push automatically:

Git pre-push hook executes dvc push before git push to upload files and directories under DVC control to remote.

This means, though, that you need to run git push after every git commit :(
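
For readers on a newer DVC: later releases document an -A / --all-commits option for dvc push, which uploads the data referenced by every Git commit in the repository (so more than just the unpushed ones). Whether your installed version has it is an assumption you should verify with dvc push --help. The sketch below does that check first; push_all_commits is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch only: push_all_commits is a hypothetical helper, not a DVC command.
# It uses `dvc push --all-commits` only if the installed DVC advertises the flag.
push_all_commits() {
    if dvc push --help 2>/dev/null | grep -q -- '--all-commits'; then
        dvc push --all-commits
    else
        echo "this DVC version does not list --all-commits in 'dvc push --help'" >&2
        return 1
    fi
}
```

If the function returns an error, fall back to one of the per-commit approaches described in the other answer.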

Shcheklein
  • Thanks for your response! BTW `dvc install` documentation describes how to install DVC hooks into the repository but doesn't explain how to uninstall them. Is there a way to disable hooks after `dvc install` command? – NShiny Jun 30 '19 at 10:18
  • @NShiny just remove the newly added hooks under `.git/hooks` directory – mambo_sun May 28 '22 at 08:51