
The gitattributes man page says:

Creating an archive

export-subst
If the attribute export-subst is set for a file then Git will expand several placeholders when adding this file to an archive. [...] The placeholders are the same as those for the option --pretty=format: of git-log(1), except that they need to be wrapped like this: $Format:PLACEHOLDERS$ in the file. E.g. the string $Format:%H$ will be replaced by the commit hash. However, only one %(describe) placeholder is expanded per archive to avoid denial-of-service attacks.

The git log man page says:

PRETTY FORMATS

[...]

  • format:<format-string>

    [...]

    The placeholders are:

    [...]

    • Placeholders that expand to information extracted from the commit:

      [...]

      %(describe[:options])
      human-readable name, like git-describe(1); empty string for undescribable commits. The describe string may be followed by a colon and zero or more comma-separated options. Descriptions can be inconsistent when tags are added or removed at the same time.

In the event that I forgot to tag a recent commit and `git describe` has to resort to scanning trillions of past commits to find the most recent tag... I can just press Ctrl-C to terminate `git archive`. So whose service is being denied in this so-called "denial of service"?

John Kugelman
figl
    Ctrl-C is no use if you're running a command non-interactively, e.g. as part of a scheduled job. – IMSoP Aug 25 '22 at 23:56
  • if your automated scripts are running `git archive` on untrusted repositories and you do not have a timeout... then you are probably also missing other basic protections, like guarding against running out of disk space when a hostile repository has TBs of commits... – figl Aug 26 '22 at 00:01
  • @IMSoP more to the point: security incompetence is not a denial-of-service risk that justifies removing basic functionality from `git archive`'s `export-subst`. If having just TWO `%(describe)`s is a risk then you are using a system that is doomed to be DOS attacked in far easier ways. – figl Aug 26 '22 at 00:05
  • 1
    Short answer: GitHub's. See [commit 96099726ddb00b45135964220ce56468ba9fe184](https://github.com/git/git/commit/96099726ddb00b45135964220ce56468ba9fe184). – torek Aug 26 '22 at 04:10
  • @moderators why delete my original answer but keep these comments? (Rhetorical, as was my original intention in posting the original question; the real intentions are almost too obvious.) You could at least delete my account to prevent me from commenting; anything less is just lazy. – figl Jan 04 '23 at 20:53

2 Answers


At least GitHub uses `git archive` to produce archives, and this is probably also the case for GitLab, cgit, and other, similar environments. While GitHub caches archives for a period of time, very expensive operations that spawn lots of processes are undesirable because they overload the file servers that store repository data.

GitHub does have rate-limiting for expensive operations in place, but if archives are extremely expensive, then that means the same repository will see longer delay times for archives, clones, and fetches, and therefore the repository will scale less well. This would also be true if one used cgit on one's own self-hosted code with some sort of CPU or memory cap (e.g., due to a container limit), which also means that similar problems would likely affect sites like kernel.org.

It may be that two or three expansions aren't a problem, but a large number would be, and for now the limit is one, as mentioned in torek's comment.

bk2204
  • Kernel.org does not have to accept untrusted code: presumably Linux code moderators are competent enough to notice code being submitted with thousands of `%(describe)`s, or competent enough to enforce a policy against using `%(describe)`s. – figl Jan 04 '23 at 20:28

It is not a denial-of-service risk. Proof:

| Problem | Solution |
| --- | --- |
| thousands of `%(describe)`s slow down an unprotected process | limit everyone to only one `%(describe)` per repository |
| large file uploads crash poorly designed webservers | web browsers limit all file uploads to 1 kB/day/host |

Both large numbers of `%(describe)`s and large file uploads can be prevented at the server end (when needed), and thus do not, in themselves, represent a denial-of-service attack.
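As one concrete illustration of the server-side prevention argued for above (a sketch; the 30-second budget is an arbitrary, hypothetical value):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com && git config user.name you
echo hello > file && git add file && git commit -qm init

# A hosting service can bound archive generation with a wall-clock budget
# (coreutils timeout) instead of restricting which placeholders users write.
if timeout 30s git archive --format=tar HEAD > snapshot.tar; then
  echo 'archive ok'
else
  echo 'archive exceeded budget; rejected' >&2
fi
```

The same budget covers pathological `%(describe)` walks, huge trees, and any other expensive archive, without changing what users are allowed to put in `.gitattributes`.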

I do not understand how this draconian "solution" was signed off on by two supposedly independent reviewers. The mindset of severely restricting all users because of bad actions by a small number of users, and/or for business benefits, sets a dangerous precedent. There are valid, targeted, technical solutions to this problem, as demonstrated above.

The real lesson to learn is that git should not have been bloated with code for this purpose in the first place, as anyone who needs it can implement their own git-free solution with `make` and `sed`. Now that `%(describe)` is effectively useless in all but the most limited of circumstances, it really is just code bloat. I would recommend not using it.
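A git-free stamping step of the kind suggested above might look like this (a sketch; the template name `VERSION.in` and the `@VERSION@` token are hypothetical conventions):

```shell
set -e
cd "$(mktemp -d)"

# Substitute a version string at build time with sed, mirroring what
# export-subst does at archive time. Falls back to "unknown" outside a repo.
printf 'version: @VERSION@\n' > VERSION.in
version=$(git describe --tags --always 2>/dev/null || echo unknown)
sed "s/@VERSION@/$version/" VERSION.in > VERSION
cat VERSION
```

A Makefile rule can regenerate `VERSION` whenever `VERSION.in` or `HEAD` changes, with no limit on how many substitutions the template contains.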

Cody Gray - on strike
figl