First off, a valid JSON text isn't appropriate for generating a hash either: for a particular object there can be many valid JSON texts which represent that object:
JSON is basically "text" and its character encoding is Unicode. A JSON text can be encoded in one of five Unicode encoding schemes: UTF-8, UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE. Each scheme yields a different byte sequence and therefore a different hash, even though the object is the same.
A JSON text may contain insignificant whitespace between its tokens, such as spaces, tabs and line breaks (aka "pretty printed").
Then, any character in a JSON string can optionally be represented as an escaped Unicode sequence (\uXXXX). And the "solidus" character / may or may not be escaped.
Furthermore, the order of the elements in a JSON object is not specified. And finally, the behavior in case of duplicate keys in a JSON object is undefined as well.
Thus, for any unique object there can be more than one valid JSON (text) representation, which makes the text inappropriate for creating a hash.
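To make the points above concrete, here is a small sketch in Python (chosen only because it is compact; nothing here is specific to Foundation). The sample texts and the choice of SHA-256 are my own assumptions for illustration. It shows several valid JSON texts that decode to the same object but hash differently:

```python
import hashlib
import json

# Four valid JSON texts that all decode to the same object {"a": "/", "b": 1}.
variants = [
    '{"a":"/","b":1}',               # compact
    '{ "a" : "/",\n\t"b" : 1 }',     # insignificant whitespace added
    '{"b":1,"a":"\\/"}',             # keys reordered, solidus escaped
    '{"\\u0061":"\\u002f","b":1}',   # characters written as \uXXXX escapes
]

# Every variant parses to an equal object ...
objects = [json.loads(text) for text in variants]
assert all(obj == objects[0] for obj in objects)

# ... but each text is a different byte sequence, so each hash differs.
digests = {hashlib.sha256(text.encode("utf-8")).hexdigest() for text in variants}

# The very same text in another encoding scheme changes the bytes once more.
utf8_digest = hashlib.sha256(variants[0].encode("utf-8")).hexdigest()
utf16_digest = hashlib.sha256(variants[0].encode("utf-16-le")).hexdigest()

print(len(digests))                 # prints 4
print(utf8_digest == utf16_digest)  # prints False
```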
A solution would require you to settle on a JSON parser and serializer whose properties make the generated representation (whatever it actually is) suitable for hashing.
At first glance it would seem to suffice to use an "average" JSON parser/serializer: given a valid JSON, we would parse it into a representation and then serialize that back to a "canonical" JSON by setting options which generate a special form of JSON where the keys are ordered.
However, this assumes that you always use exactly the same parser/serializer for generating the hash for the lifetime of your database. This implies that the possibly undocumented internals and implementation details MUST NOT change, since only then is the generated (and valid) variation of the JSON guaranteed to be always exactly the same (see above for the ways a JSON can be represented). If some implementation detail changes, for example a newer version now escapes the "solidus" character, your database will break.
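A minimal sketch of that "parse, then re-serialize in one fixed form" idea, again in Python and again only as an illustration. The helper names canonical_json_bytes and json_hash are hypothetical, and the sketch does not implement any published canonicalization standard (RFC 8785, for example, defines considerably more). The output is only as stable as the json module's own behavior, which is exactly the caveat described above:

```python
import hashlib
import json

def canonical_json_bytes(json_text: str) -> bytes:
    """Parse a JSON text and re-serialize it in one fixed form (hypothetical helper)."""
    value = json.loads(json_text)  # for duplicate keys, this parser keeps the last one
    canonical_text = json.dumps(
        value,
        sort_keys=True,          # fixed order of object members
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # write non-ASCII characters literally, not as \uXXXX
    )
    return canonical_text.encode("utf-8")  # one fixed encoding scheme

def json_hash(json_text: str) -> str:
    """SHA-256 over the fixed form; the hash function is an arbitrary choice here."""
    return hashlib.sha256(canonical_json_bytes(json_text)).hexdigest()

first = '{ "b": 1, "a": "\\/" }'
second = '{"a":"\\u002f","b":1}'
print(canonical_json_bytes(first))            # prints b'{"a":"/","b":1}'
print(json_hash(first) == json_hash(second))  # prints True
```

Note what this does not settle: numbers such as 1 and 1.0 still serialize differently, and a different version of the serializer may pick different escaping rules.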
Unfortunately, NSJSONSerialization lacks that kind of "documentation" and also has no options to set these properties (ordering of keys, for example) to create such a "special" JSON representation which would make it appropriate for creating a corresponding hash for the JSON object.
You are left with searching for a third party library which provides the source code, so that you have full control over the generated variant of the JSON and can make it appropriate for hashing. I strongly discourage trying to implement your own parser/serializer, since it is not as easy as it looks at first glance.
To solve your problem ("Canonicalize JSON"), you don't even need a JSON parser/serializer which generates a Foundation representation: any form of representation would suffice (e.g. C++ containers, or any custom container), as long as the library generates a canonical JSON (from any valid JSON) which fulfills your requirements.
I'm pretty sure there are a few third party libraries which are appropriate for a solution to your problem. I've implemented a JSON parser/serializer myself with an Objective-C API which is based on a C++ implementation. It very likely can be a solution to your problem since it has many options to control the output (JPJSONWriter Options: JPJsonWriterSortKeys, JPJsonWriterEscapeSolidus). However, the library isn't that easy to apply, since its source code is quite heavy (an Objective-C API, advanced C++ templates, and optimizations for performance and a low memory footprint add up to a lot of source code).
If it helps: JPJson (my attempt)
JPJson separates the concept of parsing from the associated "semantic actions". A "semantic action", for example, is a "Foundation Representation Generator". That is, you could possibly implement a "HashGenerator" class which creates a hash directly from the received input without building up a representation.
and possibly Andrii Mamchur's JSON library: jsonlite, where the JsonLiteSerializer method serializeDictionary: can be easily modified to sort the keys before generating the output.
and a couple more libraries.
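To give an idea of what the "HashGenerator" semantic action mentioned for JPJson could look like, here is a rough sketch in Python. It is not JPJson's API: the class, its event methods (begin, key, scalar, end) and the replay function are all hypothetical. Since Python's standard library has no event-driven JSON parser, replay merely stands in for one by walking an already decoded value. The handler itself keeps only digests, never the values, and sorts the member digests of an object so that key order does not influence the result:

```python
import hashlib
import json

def _digest(tag: bytes, *parts: bytes) -> bytes:
    """Hash a type tag plus length-prefixed parts, so part boundaries stay unambiguous."""
    h = hashlib.sha256(tag)
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()

class HashGenerator:
    """Hypothetical semantic-actions handler: receives parse events, keeps only digests."""

    def __init__(self):
        self._stack = []    # one frame per open container: [kind, digests, pending key]
        self.result = None  # digest of the top-level value once the input is consumed

    def _value(self, digest: bytes) -> None:
        if not self._stack:
            self.result = digest
            return
        frame = self._stack[-1]
        if frame[0] == "object":
            digest = _digest(b"member", frame[2], digest)  # bind key to value digest
        frame[1].append(digest)

    def begin(self, kind: str) -> None:  # kind is "object" or "array"
        self._stack.append([kind, [], None])

    def key(self, name: str) -> None:
        self._stack[-1][2] = name.encode("utf-8")

    def scalar(self, value) -> None:
        # Strings, numbers, true/false/null: hash their compact JSON spelling.
        text = json.dumps(value, ensure_ascii=False)
        self._value(_digest(b"scalar", text.encode("utf-8")))

    def end(self) -> None:
        kind, digests, _ = self._stack.pop()
        if kind == "object":
            digests.sort()  # member order must not influence the hash
        self._value(_digest(kind.encode("ascii"), *digests))

def replay(value, handler) -> None:
    """Stand-in for an event-driven parser: replays a decoded value as events."""
    if isinstance(value, dict):
        handler.begin("object")
        for name, member in value.items():
            handler.key(name)
            replay(member, handler)
        handler.end()
    elif isinstance(value, list):
        handler.begin("array")
        for item in value:
            replay(item, handler)
        handler.end()
    else:
        handler.scalar(value)

def structural_hash(json_text: str) -> str:
    handler = HashGenerator()
    replay(json.loads(json_text), handler)
    return handler.result.hex()
```

With this, structural_hash('{"a": [1, 2], "b": null}') and structural_hash('{ "b":null, "a":[1,2] }') give the same value, while '[1,2]' and '[2,1]' do not, since array order is significant.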