1

The link below explains how Git computes object IDs (with a Ruby code snippet). The object type (blob, tree, commit) is encoded in a header that is concatenated with the content, and the SHA1 is computed over the concatenated string.

https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
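
For reference, the scheme described above can be sketched in a few lines of Python. This is only an illustration of the header + content hashing, not Git's own code; the function name git_object_id is made up for this example.

```python
import hashlib


def git_object_id(content: bytes, obj_type: str = "blob") -> str:
    """Return the SHA-1 object ID for the given content (hypothetical helper)."""
    # Header: object type, a space, the content length in decimal ASCII, then a NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    # The ID is the SHA-1 of the header concatenated with the content.
    return hashlib.sha1(header + content).hexdigest()


print(git_object_id(b"test content\n"))
# d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

That matches what echo 'test content' | git hash-object --stdin prints in the linked chapter.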

One can use git cat-file -t <object id> to determine the type of the object (blob, tree, commit).

I'm wondering how this command extracts the type from the object ID, given that SHA1 is a one-way hashing function?

katboo

2 Answers

9

"You're holding it upside down."

While it's true that SHA is a one-way hash, that's not a problem: you're supplying the hash yourself, which Git uses as a key in a key-value database, allowing Git to retrieve the data. (If you supply part of the hash, rather than the whole thing, Git looks for keys that match that prefix; if the prefix is unique, Git assumes that the resulting matching key is the right key.)
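
To make the "key in a key-value database" idea concrete, here is a rough sketch of prefix lookup for loose objects only. It assumes the usual .git/objects/<first two hex digits>/<remaining 38> layout and ignores pack files entirely; resolve_prefix is a hypothetical name, and the four-character minimum is this sketch's own rule, not taken from Git's source.

```python
import os


def resolve_prefix(objects_dir: str, prefix: str) -> str:
    """Expand a hash prefix to the full ID of the one loose object it matches."""
    prefix = prefix.lower()
    if len(prefix) < 4:
        # Assumption for this sketch: refuse very short prefixes.
        raise ValueError("prefix too short")
    # The first two hex digits name a subdirectory; the rest is the file name.
    fan_out, rest = prefix[:2], prefix[2:]
    bucket = os.path.join(objects_dir, fan_out)
    names = os.listdir(bucket) if os.path.isdir(bucket) else []
    matches = [fan_out + name for name in names if name.startswith(rest)]
    if len(matches) != 1:
        # Zero matches: unknown key. More than one: the prefix is ambiguous.
        raise LookupError(f"{len(matches)} objects match {prefix!r}")
    return matches[0]
```

Calling resolve_prefix(".git/objects", "d6704") inside a repository would return the full ID only if exactly one loose object starts with those digits.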

Having obtained the data—the zlib-compressed object—Git now needs only to uncompress the first few bytes of that data. These begin with one of the four object type strings: blob, commit, tag, or tree (followed by a space and then the decimal-expansion-in-ASCII of the size and the '\0' byte).
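
As a sketch of that "first few bytes" step, assuming a loose object (a plain zlib stream) already read into memory: the helper name read_object_header and the 64-byte cap are choices made for this example, not anything Git specifies.

```python
import zlib
from typing import Tuple


def read_object_header(compressed: bytes) -> Tuple[str, int]:
    """Return (type, size) from a loose object without inflating all of it."""
    inflater = zlib.decompressobj()
    # Ask for at most 64 bytes of output; the header is much shorter than that.
    head = inflater.decompress(compressed, 64)
    header, nul, _ = head.partition(b"\0")
    if not nul:
        raise ValueError("no NUL terminator in the first 64 bytes")
    # The header is "<type> <size in decimal ASCII>".
    obj_type, _, size = header.partition(b" ")
    return obj_type.decode("ascii"), int(size)


# Build a loose-object payload by hand just to try the function out.
loose = zlib.compress(b"blob 13\0test content\n")
print(read_object_header(loose))
# ('blob', 13)
```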

If Git extracts the entire object, it then verifies that the bytes of the object, including the header, produce the key that was used to retrieve the object when fed back through the hash function. The -t code, however, can take a shortcut and stop decompressing early; when Git stops short like this, it skips the verification step.
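
The verification step amounts to something like the following. Again this is a sketch for loose objects, not Git's implementation, and verify_loose_object is a hypothetical name.

```python
import hashlib
import zlib


def verify_loose_object(object_id: str, compressed: bytes) -> bool:
    """Check that the inflated bytes hash back to the key used to fetch them."""
    # Inflate everything: the header and the content together.
    raw = zlib.decompress(compressed)
    # Re-hash and compare against the key that was used for the lookup.
    return hashlib.sha1(raw).hexdigest() == object_id


loose = zlib.compress(b"blob 13\0test content\n")
print(verify_loose_object("d670460b4b4aece5915caf5c68d12f560a9fe3e4", loose))
# True
```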

torek
  • 1
    Thanks for the answer. I think it was right there in my face. Essentially, the object ID is the SHA1 of the header + content, and it is used as the name of the directory/file under .git/objects. But the content of the file itself is the zlib-compressed version of the exact same string (header + content), which contains the type of the object at the beginning. – katboo Jun 29 '21 at 23:51
  • 1
    Right—for loose objects, at least. For *packed* objects, the pack file has the object further compressed, but the pack headers have extra information. – torek Jun 30 '21 at 00:26
4

given that SHA1 is a one-way hashing function

That's irrelevant. The SHA is not concealing anything. On the contrary. Think of the SHA as an address. The file at that address is readable and states the type.

matt