I am aware of git blame's --ignore-rev and --ignore-revs-file options, which ask git blame to ignore, for example, revisions that contain automated code formatting changes. I am working on a project to move a moderately sized codebase to a new code formatting tool, and I am creating lots of entries in our file listing revisions to ignore. I am fairly sure that our use case is small enough that performance will not be a concern, but I am curious how this feature works and what its performance characteristics are.

Here are some rough statistics on my use case:

  • codebase has ~300k LOC
  • ignore file may have ~100 entries by the end of this migration
  • median formatting patch changes 1,500 LOC

Keep in mind that many of my teammates use the GitLens VS Code extension, which appears to run git blame a lot, since it shows the author inline throughout the editor. Presumably their editor performs a blame on every file they open, so I expect to start getting complaints if git blame performance starts to drag.

jdevries3133

1 Answer

git blame parses the ignored commits (from --ignore-rev and --ignore-revs-file) into an oidset, which is implemented as a khash open-addressing hash table.
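As a rough illustration of that parsing step, here is a minimal Python sketch, not git's actual C code. It assumes the ignore file follows the fsck.skipList format (one full object ID per line, with blank lines and '#' comments skipped); the function name parse_ignore_revs and the sample hashes are made up for the example, and a Python set stands in for the oidset.

```python
def parse_ignore_revs(text: str) -> set[str]:
    """Collect commit IDs from ignore-file text into a hash set.

    Assumption: blank lines and anything after '#' are ignored,
    mirroring the fsck.skipList format.
    """
    ignored = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comment, trim whitespace
        if line:
            ignored.add(line.lower())
    return ignored


# Hypothetical ignore file contents; the hashes are placeholders.
sample = """\
# Apply formatter to src/
0123456789abcdef0123456789abcdef01234567

89abcdef0123456789abcdef0123456789abcdef  # second pass
"""

ignored = parse_ignore_revs(sample)
```

The whole file is read once up front, so this cost scales with the number of entries but is paid only once per blame invocation.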

The cost of parsing the ignore list and populating the hash table is incurred once per git blame command, which should be insignificant for ~100 ignore entries, or even for thousands. The cost of checking for ignored commits is one hash lookup per commit that git blame considers. This is independent of the number of files in your repo, and independent of the number of ignored revs (a hash lookup is O(1) on average). It does grow with the length of your commit history, but the check is cheap compared to the other work that blame does per commit, so overall it's insignificant.
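To make the per-commit cost concrete, here is a small Python model of the check. It is only a sketch: fake_oid, count_ignored, and the synthetic history are invented for illustration, and a real blame walks actual commits and does far more work per commit than this.

```python
import hashlib


def fake_oid(n: int) -> str:
    """Produce a made-up 40-hex-digit ID so the example needs no repository."""
    return hashlib.sha1(str(n).encode()).hexdigest()


def count_ignored(history, ignored):
    """Do one set-membership test per commit visited, as the answer describes."""
    return sum(1 for oid in history if oid in ignored)


# 10,000 synthetic commits, of which every 100th is on the ignore list.
history = [fake_oid(i) for i in range(10_000)]
ignored = {fake_oid(i) for i in range(0, 10_000, 100)}  # 100 entries

hits = count_ignored(history, ignored)
```

The loop runs once per commit in the history, and each membership test takes roughly the same time whether the set holds 100 entries or 100,000, which is why the size of the ignore list does not matter much.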

oidset is used heavily by other parts of git, including git pull/git fetch and git fsck, so you can expect it to be pretty good at its job; if it weren't, it would be an attractive target for optimization, because improving it would make all fetch operations on large repos faster.

hobbs