Download newly sparse-checked-out files when pulling in Git

Question

Let's say I create a Git repository, enable sparse checkout, and then add some patterns to .git/info/sparse-checkout.

After that, I add an origin remote and pull. Now I have only the files matching the patterns in the sparse-checkout file.

Now I'd like to add new entries to the sparse-checkout file and then pull again, so that I get the newly included files without recloning everything.

Any help would be appreciated.

score 1 · Accepted Answer · answered Mar 02 '21 at 22:37

There is no need to pull anything again. You're operating from a wrong assumption, which will lead you to the wrong conclusion.

What sparse checkout does is simply not check out some files. You still have the files. That's because Git isn't really about files at all. Instead, Git is all about commits. You either have a commit—the whole thing—or you don't, i.e., you have none of it.

Each commit contains files: in fact, each commit holds a full snapshot of every file. But the files that are inside a commit are not ordinary files. They are literally unusable to anything but Git. They are stored in a special, read-only, compressed, and de-duplicated format. The de-duplication handles the fact that most commits mostly just use the same files as some previous commit. The compression makes many big files take almost no space (though it tends not to work for large binary files, which is why Git tends not to be suited to hold a lot of large binary files). But because the files inside a commit aren't usable, those are not the files that you have checked out.

Instead, when you git checkout some commit, Git copies the files out of the commit, expanding them out into usable form. Using sparse checkout, you're simply telling Git: Don't copy out all the files, just copy out some selected subset of the files. If some files are large and/or clutter-y and/or slow to check out, that helps you when you go to deal with the files from the commit—which you do with the extracted copies, not with the actual files (since they're unusable).
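To make this concrete, here is a minimal sketch (assuming a POSIX shell with git on PATH) that rebuilds the question's setup in a temporary directory and then inspects it. The repository names upstream and local, the files a.txt and docs/b.txt, the single pattern /a.txt, and switching cone mode off with core.sparseCheckoutCone false (so the file holds plain gitignore-style patterns) are illustrative assumptions, not details from the question.

```shell
#!/bin/sh
# Sketch: rebuild the question's setup in a throwaway directory and inspect it.
# Assumptions (illustrative, not from the question): repo names upstream/local,
# files a.txt and docs/b.txt, the single pattern /a.txt, and non-cone patterns.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical "origin": one top-level file and one file under docs/.
git init -q upstream
mkdir upstream/docs
echo a > upstream/a.txt
echo b > upstream/docs/b.txt
git -C upstream add .
git -C upstream commit -q -m initial
branch=$(git -C upstream symbolic-ref --short HEAD)

# The question's steps: init, enable sparse checkout, list patterns, add origin, pull.
git init -q local
cd local
git config core.sparseCheckout true
git config core.sparseCheckoutCone false   # plain gitignore-style patterns
mkdir -p .git/info
echo '/a.txt' > .git/info/sparse-checkout
git remote add origin "$work/upstream"
git pull -q origin "$branch"

# The index lists both files; only a.txt was copied out to the working tree.
git ls-files -t            # docs/b.txt carries the "S" (skip-worktree) tag
ls -A                      # .git and a.txt, but no docs/ directory
git show HEAD:docs/b.txt   # the committed copy is still available through Git

# Commit an edit to a.txt with -a: docs/b.txt rides along unchanged,
# because the index (the proposed next commit) still holds it.
echo "a, edited" > a.txt
git commit -q -am "edit a.txt only"
git ls-tree -r --name-only HEAD
```

In this sketch docs/b.txt never appears in the working tree, yet git ls-files, git show, and the final git ls-tree all still see it: sparse checkout narrows what is copied out, not what the commit holds.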

But when you run git pull, you're simply telling Git to run two other Git commands: (1) git fetch, which obtains any new commits the other Git has that you don't, that your Git needs; followed by (2) a command to combine your commits with their commits. The second command is your choice; if you have not chosen anything, the default is git merge.

The merge operation works on commits, not files (though in some of the harder cases it needs to use your working tree, where you have Git store the checked-out files, to do the merging). If no actual merging is required, which is pretty common, the merge that git pull runs will normally be what Git calls a fast-forward merge. This is not actually a merge at all; it's just a specialized kind of git checkout. So, again, once you have sparse checkout turned on, it avoids copying out some or most of the files in the new commit you're using, and copies out only the particular files you've listed as "do copy these out".
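Under the same illustrative setup (hypothetical repository and file names, non-cone patterns, a POSIX shell with git on PATH), the two steps of git pull can be run by hand. The sketch adds a second upstream commit that changes a.txt and creates docs/c.txt, then runs git fetch followed by git merge; --ff-only is used only to insist on the fast-forward case discussed here.

```shell
#!/bin/sh
# Sketch: git pull's two steps done by hand, in a throwaway directory.
# Assumptions (illustrative): repo names upstream/local, files a.txt,
# docs/b.txt and docs/c.txt, the single pattern /a.txt, non-cone patterns.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical "origin": one top-level file and one file under docs/.
git init -q upstream
mkdir upstream/docs
echo a > upstream/a.txt
echo b > upstream/docs/b.txt
git -C upstream add .
git -C upstream commit -q -m initial
branch=$(git -C upstream symbolic-ref --short HEAD)

# Sparse clone-by-pull, as in the question.
git init -q local
cd local
git config core.sparseCheckout true
git config core.sparseCheckoutCone false   # plain gitignore-style patterns
mkdir -p .git/info
echo '/a.txt' > .git/info/sparse-checkout
git remote add origin "$work/upstream"
git pull -q origin "$branch"

# Upstream gains a commit: a.txt changes and a new file appears under docs/.
echo a2 > "$work/upstream/a.txt"
echo c > "$work/upstream/docs/c.txt"
git -C "$work/upstream" add .
git -C "$work/upstream" commit -q -m second

# git pull, split into its two steps.
git fetch -q origin                       # (1) obtain the new commit
git merge -q --ff-only "origin/$branch"   # (2) fast-forward: checks out the new commit

cat a.txt        # updated, since a.txt matches the sparse pattern
ls -A            # still no docs/ directory
git ls-files     # ...but docs/c.txt is now in the index
```

The fast-forward updates a.txt but leaves docs/c.txt in the index only, because the sparse patterns still decide what gets copied out.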

If you've updated your list of files to copy out, what you need is to get Git to re-read the list and compensate for the change. The new-ish (Git 2.25) git sparse-checkout command helps do this: if you've updated the .git/info/sparse-checkout file, you can run git sparse-checkout reapply. There is no need to run git pull again. Indeed, doing so won't help—or even do anything at all—unless there are new commits for git fetch to fetch so that the subsequent git merge can fast-forward to a new commit to check out.
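A sketch of the reapply route, again with the hypothetical setup (names upstream, local, a.txt, docs/b.txt and non-cone patterns are illustrative assumptions). It assumes a Git recent enough to provide git sparse-checkout reapply, and it fetches nothing: docs/b.txt is already in the local commit.

```shell
#!/bin/sh
# Sketch: widen the sparse patterns and reapply them, with no pull at all.
# Assumptions (illustrative): repo names upstream/local, files a.txt and
# docs/b.txt, non-cone patterns, and a Git that has "sparse-checkout reapply".
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical "origin": one top-level file and one file under docs/.
git init -q upstream
mkdir upstream/docs
echo a > upstream/a.txt
echo b > upstream/docs/b.txt
git -C upstream add .
git -C upstream commit -q -m initial
branch=$(git -C upstream symbolic-ref --short HEAD)

# Sparse clone-by-pull, as in the question: only a.txt is copied out.
git init -q local
cd local
git config core.sparseCheckout true
git config core.sparseCheckoutCone false   # plain gitignore-style patterns
mkdir -p .git/info
echo '/a.txt' > .git/info/sparse-checkout
git remote add origin "$work/upstream"
git pull -q origin "$branch"

# Widen the pattern list by hand, as the question describes...
echo '/docs/' >> .git/info/sparse-checkout
# ...and have Git re-read it. Nothing is fetched: docs/b.txt is already local.
git sparse-checkout reapply
ls docs          # b.txt is now in the working tree
```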

If you don't have Git 2.25 or later, the best thing to do is probably to upgrade, but you can (rather painfully) clear the skip-worktree bit (with git update-index --no-skip-worktree) on the index entries for the files you want Git to check out, then run git checkout or git restore on those files to copy them out to your working tree.
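For a Git without the sparse-checkout command, the manual route might look like this sketch (same hypothetical setup: names upstream, local, a.txt, docs/b.txt and non-cone patterns are illustrative). git update-index --no-skip-worktree clears the bit and git checkout -- docs/ copies the entries out. The script also adds /docs/ to the pattern file first, on the assumption that you want the change to stick: later checkouts and merges re-read that file and could otherwise hide the files again.

```shell
#!/bin/sh
# Sketch: the manual route, clearing skip-worktree bits without "git sparse-checkout".
# Assumptions (illustrative): repo names upstream/local, files a.txt and
# docs/b.txt, the patterns /a.txt and /docs/, non-cone patterns.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical "origin": one top-level file and one file under docs/.
git init -q upstream
mkdir upstream/docs
echo a > upstream/a.txt
echo b > upstream/docs/b.txt
git -C upstream add .
git -C upstream commit -q -m initial
branch=$(git -C upstream symbolic-ref --short HEAD)

# Sparse clone-by-pull, as in the question: only a.txt is copied out.
git init -q local
cd local
git config core.sparseCheckout true
git config core.sparseCheckoutCone false   # plain gitignore-style patterns
mkdir -p .git/info
echo '/a.txt' > .git/info/sparse-checkout
git remote add origin "$work/upstream"
git pull -q origin "$branch"

# Keep the pattern file in step: later checkouts and merges re-read it.
echo '/docs/' >> .git/info/sparse-checkout

# Clear the skip-worktree bit on each index entry under docs/ ...
git ls-files -- docs/ | while IFS= read -r path; do
    git update-index --no-skip-worktree -- "$path"
done
# ... then copy those entries from the index into the working tree.
git checkout -- docs/
```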

So sparse-checkout does copy all files after all. This did solve my issue. But my original idea was to only copy certain files, and not all. Do you have any idea if there's a way to do that? — lolsu, Mar 03 '21 at 18:12
@lolsu: *Any* checkout copies files. That's a basic fundamental of Git. Sparse checkout just limits which ones get copied from index to working tree. The files *must* exist in the index, so that they will be in the next commit: the index acts as the proposed next commit, and the files that are in the index are the ones in the commit. Leave a file out (or take one out) and the next commit omits that file—which, compared to its predecessor, means *delete the file*. — torek, Mar 03 '21 at 21:01
Git does have a method for deferring *obtaining* files, where you get a commit object but not the trees and/or blobs that go with it, via so-called *promisor packs*. However, the moment you try to check out that commit, Git must call up the promisor (another Git repository) and obtain the trees and blobs in question, because the files must be copied to Git's index. So this doesn't help your particular case here. If you want a smaller `.git`, the only thing that would help is a shallow clone. Whether, and how much, that actually helps depends greatly on the repository. — torek, Mar 03 '21 at 21:04
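A minimal sketch of the shallow-clone option, assuming a POSIX shell with git on PATH and a hypothetical local upstream repository with two commits. A shallow clone limits how much history ends up in .git; it does not change which files a checkout copies out. The file:// URL is deliberate, since git clone ignores --depth for a plain local path.

```shell
#!/bin/sh
# Sketch: a shallow clone keeps only the most recent commit's history.
# Assumptions (illustrative): a local repo named upstream with file a.txt.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical upstream with two commits of history.
git init -q upstream
echo v1 > upstream/a.txt
git -C upstream add a.txt
git -C upstream commit -q -m first
echo v2 > upstream/a.txt
git -C upstream commit -q -am second

# --depth is ignored for plain local paths, hence the file:// URL.
git clone -q --depth 1 "file://$work/upstream" shallow
cd shallow
git rev-list --count HEAD                # only the tip commit was fetched
git rev-parse --is-shallow-repository
```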
