
I have a GitLab repository with a lot of binary garbage (cleaning it up is sadly not an option). I only need one subdirectory, which contains only text files, and I need to get that folder as fast as possible.

Now I found `git archive` and thought all my problems were solved. It seems, though, that it does not accept hashes refs. Is there any way to retrieve a specific version/subdirectory combination with `git archive`?

abergmeier
  • What do you mean it *“does not accept hashes refs”*? `git archive some-commit -- path/to/folder` should work – poke Jun 01 '17 at 14:18
  • Not according to the docs. See https://git-scm.com/docs/git-upload-archive – abergmeier Jun 01 '17 at 14:19
  • `git upload-archive` !== [`git archive`](https://git-scm.com/docs/git-archive) which explicitly mentions `git archive […] […​]` as its usage. – poke Jun 01 '17 at 14:22
  • `git archive` uses `git upload-archive` e.g. for the `--remote` option. – abergmeier Jun 01 '17 at 14:23
  • Well, in that case, the docs clearly state that *“Clients may not use other sha1 expressions”*, so there is really nothing you can do about that. – poke Jun 01 '17 at 14:26
  • Luckily, we could enable `uploadArchive.allowUnreachable` on our private repo. – abergmeier Jun 01 '17 at 14:38
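The behaviour discussed in the comments can be reproduced locally. This is a minimal sketch, assuming a POSIX shell and a reasonably recent git; the repository names (`src`, `remote.git`), the `docs` directory and the dummy binary file are hypothetical stand-ins. A local bare repository reached through `--remote` plays the role of the hosted server, so `git upload-archive` applies its default rules to the request.

```shell
#!/bin/sh
# Hypothetical demo: every path and name below is a placeholder.
set -eu

# Identity for the throwaway commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A repo with one text subdirectory next to some binary data.
git init -q src
mkdir src/docs
echo "hello" > src/docs/readme.txt
printf '\000\001\002' > src/blob.bin
git -C src add .
git -C src commit -q -m "initial"
sha=$(git -C src rev-parse HEAD)

# Locally, any tree-ish (including a raw commit hash) plus a path works.
git -C src archive --format=tar -o "$work/local.tar" "$sha" -- docs

# A bare clone stands in for the remote server.
git clone -q --bare src remote.git

# Over --remote, git upload-archive refuses a raw hash by default...
if git archive --remote="$work/remote.git" --format=tar "$sha" -- docs >/dev/null 2>&1; then
  refused=no
else
  refused=yes
fi

# ...unless the server sets uploadArchive.allowUnreachable.
git -C remote.git config uploadArchive.allowUnreachable true
git archive --remote="$work/remote.git" --format=tar "$sha" -- docs > "$work/remote.tar"
```

Note that the config is set on the serving repository, which is why it only helps if you control the server (as in the private repo mentioned above).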



I would think the use cases for this must be pretty limited; you're looking at getting a snapshot of some files, without historical context or the ability to write back changes. Well, ok...

When you use `git archive` with `--remote`, you pretty much have to request a ref. If you can push a tag to the remote, you can tag the version you want and then archive from that tag. If you can't do that, and the version you want doesn't happen to have a tag or be the current head of a branch, then you're probably out of luck.
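The tag-and-push workaround might look like the sketch below, assuming a POSIX shell and push access to the remote. The tag name `wanted-snapshot`, the `docs` directory and the local bare repo standing in for the server are all hypothetical; the server keeps its default `upload-archive` rules, under which a ref name (or `ref:path`) is accepted.

```shell
#!/bin/sh
# Hypothetical demo: repo paths, the tag "wanted-snapshot" and the
# "docs" directory are placeholders.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Two versions of docs/notes.txt; we want the older one.
git init -q src
mkdir src/docs
echo "v1" > src/docs/notes.txt
git -C src add docs
git -C src commit -q -m "v1"
wanted=$(git -C src rev-parse HEAD)
echo "v2" > src/docs/notes.txt
git -C src commit -q -am "v2"

# A bare clone stands in for the remote (default upload-archive rules).
git clone -q --bare src remote.git

# The wanted commit is not a branch head, so tag it and push the tag.
git -C src tag wanted-snapshot "$wanted"
git -C src push -q "$work/remote.git" wanted-snapshot

# A ref name is accepted over --remote without any server config change.
git archive --remote="$work/remote.git" --format=tar wanted-snapshot -- docs > snapshot.tar
mkdir out
tar -xf snapshot.tar -C out
cat out/docs/notes.txt
```

The extracted `out/docs/notes.txt` holds the older content, even though the branch head has moved on.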

The docs are pretty wishy-washy about even that (they say the server "may or may not follow these exact rules"); when it comes down to it, you're at the whim of the server as to whether it will help you out here. The git model doesn't really give much support to remote access below the level of the whole repository.

Some partial solutions you might play with, depending on how exactly this repo's bloat is organized:

You could play with shallow and/or single-branch cloning. You'd still have to fetch the full tree for at least the version you want, but you'd be able to minimize (or maybe eliminate) fetching history and unrelated versions of files.
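A shallow, single-branch clone could be sketched as follows, assuming a POSIX shell; `src`, `docs`, `unrelated` and `shallow` are hypothetical names. Per the `git clone` docs, `--depth` already implies `--single-branch`, so the explicit flag is only for clarity. The clone still contains the whole tree of the tip commit (binaries included), just not the older history or other branches.

```shell
#!/bin/sh
# Hypothetical demo: "src", "docs", "unrelated" and "shallow" are placeholders.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Source repo with three commits of history.
git init -q src
mkdir src/docs
for i in 1 2 3; do
  echo "rev $i" > src/docs/notes.txt
  git -C src add docs
  git -C src commit -q -m "rev $i"
done
git -C src branch unrelated          # a second branch we do not want
branch=$(git -C src symbolic-ref --short HEAD)

# A file:// URL is used because --depth is ignored for plain local paths.
git clone -q --depth 1 --single-branch --branch "$branch" "file://$work/src" shallow

# Only the tip commit is present locally.
git -C shallow rev-list --count HEAD
```

Against a real server you would pass the repository URL instead of the `file://` path; `--branch` also accepts a tag name if the version you want is tagged.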

Even though you can't clean up the original repo, if you're going to read versions of this subtree frequently, it might be worth using `git filter-branch` with `--subdirectory-filter` to create a repo with the history of just the subtree; then remove the subtree from the "next" commit in the original repo, replacing it with a submodule reference to the new repo. (But if it's just a one-time pull, this clearly wouldn't be worth the trouble.)
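The first half of that suggestion (extracting the subtree's history into its own repository) can be sketched as below, assuming a POSIX shell and a git that still ships `filter-branch`; the names `src`, `docs` and `docs-only` are hypothetical. Recent git documentation warns against `filter-branch` and points to the separate `git filter-repo` tool instead, so treat this only as an illustration of the approach named above. The submodule step is left out because it depends on where the new repository ends up being hosted.

```shell
#!/bin/sh
# Hypothetical demo: "src", "docs" and "docs-only" are placeholders.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip the deprecation notice (and pause) newer git shows for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
cd "$work"

# History that mixes the text subdirectory with binary churn.
git init -q src
mkdir src/docs
echo "first" > src/docs/readme.txt
printf '\000\001' > src/blob.bin
git -C src add .
git -C src commit -q -m "add docs and blob"
printf '\002\003' > src/blob.bin
git -C src commit -q -am "change blob only"
echo "second" > src/docs/readme.txt
git -C src commit -q -am "update docs"

# Work on a copy, since filter-branch rewrites refs in place.
git clone -q src docs-only
git -C docs-only remote remove origin   # drop remote-tracking refs
git -C docs-only filter-branch --subdirectory-filter docs -- --all

# The rewritten history has docs/ as its project root.
git -C docs-only ls-tree --name-only HEAD
```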

Mark Adelsberger