IPFS CID Reproducibility

Question

This is perhaps very basic question but I couldn’t find in my search. Assuming the file content doesn’t change even single bit, would IPFS CID never change even if it’s being added/pinned by different users?

Would this assumption stay true for foreseeable future? I know it depends on hash algorithm, so just by knowing what hash algorithm was used (SHA-256), the IPFS CID would be reproducable for foreseeable feature right? Or is there other information that needs to be stored as well?

score 5 · Answer 1 · answered Mar 01 '21 at 14:44

Would this assumption stay true for foreseeable future? I know it depends on hash algorithm, so just by knowing what hash algorithm was used (SHA-256), the IPFS CID would be reproducable for foreseeable feature right? Or is there other information that needs to be stored as well?

No, it is not safe to assume that ipfs add <file> will always give the same CID as there are many parameters other than the hash function itself that the binary is free to change over time. At a high level ipfs add turns a file/directory in a tree structure called UnixFS that represents that data, and since the default way in which ipfs add is allowed to change over time it means that CID output by ipfs add example.txt can change

Many of the UnixFS parameters are configurable (and described in ipfs add --help) and include options such as raw leaves and chunk size. This means that if you'd really like to ensure that ipfs add example.txt results in the same CID there are a set of flags you can pass to ensure this is the case.

Note, in general I'd try to avoid importing the same data to IPFS multiple times (it's a waste of resources anyway) although there may be some scenarios where that's just the easiest, or best, thing to do to get your project off the ground.

score 1 · Answer 2 · answered Feb 28 '21 at 23:33

As long as there won't be any default settings changed (and the user doesn't set any specific ones to override them), all clients will generate the same CID from the same binary data.

But on the other hand, there's talk to replace SHA-256 with Blake2b in the future, and there are two versions of CIDs: v0 and v1.

Most custom settings require v1 of the CID.

The other things which influence the CID are --raw-leaves --chunker as well as --trickle

And

--inline as well as --inline-limit when you're dealing with small files.

score 0 · Answer 3 · answered Oct 20 '22 at 09:55

IPFS CIDs are not truly self describing in the sense that they don't capture information about how the data was chunked and what kind of merkle dag was created.

If you want the CIDs to be the same, I would suggest using content archive files (and uploading that). Content archive files would contain information about the various parameters @Ruben described in this answer, and different ipfs clients will be forced to use those parameters even if their default settings are different.

For example, I was using web3.storage (chunking size 1MB) and pinata (chunking size 256kb), uploading the same file to them gave different CIDs which was unacceptable for our purposes. Using CAR files allowed to have the same CIDs.

IPFS CID Reproducibility

3 Answers3