For a small Git helper script, based on this blog post, I'd like to "discover" which Git hosting app a given remote URL (user@git.domain.tld:namespace/project.git) points to (e.g. GitLab CE/EE, Gitea, GHE, etc.).

Using curl --head, I found that the response headers range from "some identifying strings" to "none", so they seem too inaccurate to feed into a heuristic. Going by the page body may provide more data for the heuristic, but seems equally crude.

Is there a more elegant or standardised way to find the app type? Something like a "server_agent"?


I understand that, for security reasons, detailed info like the app version will likely not be served. I also noticed that Shodan has no "product" search for those apps. Does that mean it's fundamentally not possible to reliably identify them without HTML parsing?

Katrin Leinweber
    Btw, Shodan does analyze the HTML to identify technologies, but it's stored in a different property called `http.components`. To search that property you need to use a different filter. For example, the following search identifies GitLab servers: https://www.shodan.io/search?query=http.component%3Agitlab – achillean Jan 21 '22 at 07:38

2 Answers

I do not think there is any "standardized" approach to finding the hosting application. The Git protocol itself does not provide any such thing. In HTTP (which most Git hosting apps use as the transfer protocol), the Server header is probably the best match - but of course, as you noted, there is no requirement for it to be meaningful (or even present).
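As a rough illustration of the `Server` header check, here is a minimal Python sketch. The function names `host_from_remote` and `server_header` are made up for this example, and it assumes the host also serves HTTPS on the default port, which an SSH-style remote does not guarantee. A `None` result just means the server did not identify itself (or could not be reached).

```python
import re
import urllib.error
import urllib.request
from typing import Optional

# Matches scp-like remotes such as user@git.domain.tld:namespace/project.git
SCP_REMOTE = re.compile(r"^(?:[^@/]+@)?(?P<host>[^:/]+):(?P<path>.+)$")


def host_from_remote(remote: str) -> Optional[str]:
    """Return the host of an scp-like remote, or None if it is not in that form."""
    if "://" in remote:
        # URL-style remotes (https://, ssh://) are out of scope for this sketch.
        return None
    match = SCP_REMOTE.match(remote)
    return match.group("host") if match else None


def server_header(host: str, timeout: float = 5.0) -> Optional[str]:
    """Send a HEAD request to https://<host>/ and return the Server header, if any."""
    request = urllib.request.Request(f"https://{host}/", method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.headers.get("Server")
    except urllib.error.HTTPError as error:
        # Error responses (401, 404, ...) still carry headers.
        return error.headers.get("Server")
    except OSError:
        # DNS failure, refused connection, timeout, TLS problems, etc.
        return None
```

Even when a value comes back, it may only name a reverse proxy in front of the application rather than the Git hosting app itself, so it is a hint at best.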

Does that mean it's fundamentally not possible to reliably identify them without HTML parsing?

Yes, if the server chooses not to identify via the Server header, you can only guess (based on other headers, HTML responses, whatever).

So it seems there is no reliable way to do what you want. Maybe it helps to see it as an XY problem? If you describe what you want to do with the information, you may find a different solution.

Maybe you can try probing the server? Or ask the user?

sleske
  • Thanks for the pointers! The problem I'm solving is described in the blog post. It uses only a simple `sed` pattern, which I expanded in Ruby to various Git hosters. – Katrin Leinweber Jan 21 '22 at 10:15

Does that mean it's fundamentally not possible to reliably identify them without HTML parsing?

More or less, that is true. As sleske correctly states, there's no reliable way to use headers to identify the application/technology behind an HTTP server, as servers often choose not to provide this information.

Parsing the HTML of the domain's home page may or may not yield any useful information. With enough familiarity with these services, you could probably make a good guess, but it would be just that: a guess. With enough sophistication, you can probably get very good at guessing, but nothing is 100% certain.

You may also be able to make some positive determinations based on the remote URL and/or the application's behavior (if publicly accessible), for example by probing the server, as sleske also suggested.

For example, most SCM servers except for GitLab do not have deeply nested remote URLs. The remote URL git@domain.tld:foo/bar/project.git is not possible on GitHub, Bitbucket, or Gitea, but is possible on GitLab.
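The nesting check can be sketched in a few lines of Python. This assumes the remote is in the scp-like `user@host:path` form, and the helper names `namespace_segments` and `has_nested_namespace` are invented for this example. A positive result only narrows things down to hosts that allow nested namespaces; a negative result says nothing either way.

```python
from typing import List


def namespace_segments(remote: str) -> List[str]:
    """Return the path segments that precede the project name in an scp-like remote."""
    # Everything after the first ':' is the repository path.
    _, _, path = remote.partition(":")
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    parts = [part for part in path.split("/") if part]
    # The last segment is the project itself; the rest is the namespace.
    return parts[:-1]


def has_nested_namespace(remote: str) -> bool:
    """True if the project sits more than one namespace level deep."""
    return len(namespace_segments(remote)) > 1
```

For instance, `has_nested_namespace("git@domain.tld:foo/bar/project.git")` is true, while the same call on `user@git.domain.tld:namespace/project.git` is false.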

You may also find that certain UI kits (particular combinations of JavaScript, CSS, etc.) or other unique elements in the response are used exclusively by certain SCM products or versions. Error responses (both over HTTP and SSH) can also be revealing.

sytech