I'm having an odd issue with Git on my Azure DevOps build agents. We have a large repo that uses Git LFS. While git lfs prune keeps the size of .git/lfs/objects down, some of our environments have begun accumulating a massive number of large objects in .git/objects that are not cleaned up by either git gc or git lfs prune.

For a sense of the scale here, the .git pack file is about 2GB, the lfs objects folder is about 1.4GB and the .git/objects files that won't pack are about 105GB!!! Every single one of the files begins with an x as the first character.

On a typical developer's machine, the entire repo checked out is around 5GB, so something is very, very off, but nothing I try will clean up the files. Any ideas what their source is and/or how to clean them up, short of periodically nuking the entire repository and re-pulling it?
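For anyone wanting to reproduce the size comparison above, here is a minimal sketch that totals loose objects versus pack files. It assumes the standard layout (loose objects under objects/<2 hex chars>/, packs under objects/pack/); the function name `summarize_objects` is made up for illustration and is not part of Git.

```python
import os
import re

# Loose objects live in objects/<2 hex chars>/<rest of the object ID>.
FANOUT_DIR = re.compile(r"^[0-9a-f]{2}$")


def summarize_objects(git_dir):
    """Return (loose_count, loose_bytes, pack_bytes) for a .git directory."""
    objects_dir = os.path.join(git_dir, "objects")
    loose_count = loose_bytes = pack_bytes = 0
    for entry in os.scandir(objects_dir):
        if not entry.is_dir():
            continue
        if FANOUT_DIR.match(entry.name):
            # Every regular file in a fan-out directory is one loose object.
            for obj in os.scandir(entry.path):
                if obj.is_file():
                    loose_count += 1
                    loose_bytes += obj.stat().st_size
        elif entry.name == "pack":
            # Counts .pack, .idx and any other files stored next to the packs.
            for packed in os.scandir(entry.path):
                if packed.is_file():
                    pack_bytes += packed.stat().st_size
    return loose_count, loose_bytes, pack_bytes
```

Calling `summarize_objects(".git")` from the repository root returns the three totals. Git's built-in `git count-objects -v` reports similar figures and is probably the better choice when Git is on the path.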

torek
AJ Henderson
  • You won't clean objects that are _linked_, so the expectation is for the repo size to always _grow_, not shrink.... well, gc should be able to _clean_ stuff that is left unlinked.... like old reflogs, used stashes and so on. But I would not expect it to shrink by 50%, for example. – eftshift0 Nov 09 '22 at 16:16
  • now.... the thing about the files that start with x (in objects?). That's above my knowledge level. – eftshift0 Nov 09 '22 at 16:17
  • @eftshift0 normally objects will pack though, which brings the size down. What's weird here is that the objects are not packing and a newly cloned repo is only around 5GB. The actual size of all the git object files required is around 2GB, yet there are over 100GB of files that are not required and I can't clean up. I suspect it is something related to GitLFS, which would explain why git gc can't pack them, but I can't find anything in the documentation about things outside of the .git/lfs/objects folder for GitLFS. – AJ Henderson Nov 09 '22 at 17:14
  • I guess this is known material, right? https://manpages.debian.org/testing/git-lfs/git-lfs-prune.1.en.html – eftshift0 Nov 09 '22 at 18:19
  • Yeah, I've run the most aggressive git lfs prune and it cleans up all but 2 files in the .git/lfs/objects folder, but does nothing to the loose .git/objects files. – AJ Henderson Nov 09 '22 at 18:21
  • Creating new empty repos to be repopulated to replace the "broken" ones is not an option? – eftshift0 Nov 09 '22 at 18:25
  • @eftshift0 I can try it, but it's happened on 2 repos and they are on the build servers. Rebuilding the repos regularly will not scale well. It is my fallback, but I'd rather understand what is actually happening so I can try to fix it properly. – AJ Henderson Nov 09 '22 at 18:46
  • Git doesn't make such files and thus won't clean up such files. Loose objects always have the form `$GIT_DIR/objects/ab/cdefg...` where the `abcdefg...` part is the hash ID; all the calls in Git to `loose_object_path` supply a valid OID, which cannot start with `x`. So this must be some LFS thing, perhaps a bug in Git-LFS? – torek Nov 14 '22 at 21:08
  • @torek I mean the start of the contents is an x. The objects do appear within the OID directory structure. – AJ Henderson Nov 15 '22 at 19:29
  • Hi @AJHenderson, could you kindly share some screenshots showing the details of the issue, including what you mean by 'the start of the contents is an x'? Thanks – Antonia Wu-MSFT Nov 28 '22 at 06:08
  • @AntoniaWu-MSFT - I had to reset the environment to get it cleaned up, but the object files that weren't being packed all had x as the first character in their file content. More specifically, each begins with hex 7801. After that, the file contents are differing nondescript binary data. It does appear they are starting to accumulate again in my environment, but I can't post a screenshot of a file as it may contain sensitive data. – AJ Henderson Nov 30 '22 at 15:05
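
A note on the hex 7801 observation: Git stores loose objects zlib-compressed, and 78 01 is the zlib stream header produced at the lowest compression levels (the byte 0x78 is the ASCII letter x). Git's loose-object compression defaults to level 1 when core.looseCompression and core.compression are unset, so this prefix is what an ordinary loose object looks like and does not by itself point to Git LFS. The sketch below peeks at the decompressed header of one such file to see what kind of object it is. The function name `read_loose_header` and the `peek` parameter are illustrative, not part of any Git API, and the code assumes the file really is a standard loose object.

```python
import zlib


def read_loose_header(path, peek=4096):
    """Return (first two bytes as hex, object type, declared size) of a loose object."""
    with open(path, "rb") as f:
        raw = f.read(peek)  # the header sits at the very start of the stream
    # Decompress at most 64 bytes: enough for "<type> <size>\0", so large
    # objects are never inflated in full.
    head = zlib.decompressobj().decompress(raw, 64)
    header, sep, _ = head.partition(b"\x00")
    if not sep:
        raise ValueError("no NUL-terminated object header found")
    obj_type, size = header.split(b" ", 1)
    return raw[:2].hex(), obj_type.decode("ascii"), int(size)
```

If most of the files turn out to be large blobs, one possible next step is `git fsck --unreachable` to check whether anything still references them. Unreachable loose objects newer than gc.pruneExpire (two weeks by default) are left in place by a plain git gc, which is one possible reason they survive it.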

0 Answers