How should untrusted JSON be sanitized before using JSON.parse?

Question

Given a user-provided JSON string, how can we sanitize it before running JSON.parse(untrustedString)?

My primary concern is about prototype pollution, but I'm also wondering what else I should potentially look out for? If it's just prototype pollution that's a risk, then I assume that could be handled via regex, but I suspect there are additional concerns?

For example, this article on the dangers of parsing untrusted JSON and then creating a copy of the object.:

Now consider some malicious JSON data sent to this endpoint.
{
  "user": {
    "__proto__": {
      "admin": true
    }
  }
} 
If this JSON is sent, JSON.parse will produce an object with a __proto__ property. If the copying library works as described above, it will copy the admin property onto the prototype of req.session.user!

That was my initial impression, but the more I read the more paranoid I became. Can't `JSON.parse()` allow pollution of the object prototype? — Slbox, Sep 16 '20 at 19:21
i dont think it can - but it could create a malformed object — Daniel A. White, Sep 16 '20 at 19:22
I've added an article with and example of one of my concerns to the question text. — Slbox, Sep 16 '20 at 19:25
ah thats just saying additional properties can come thru `__proto__` and you should guard against that - nothing that JSON.parse is doing wrong... — Daniel A. White, Sep 16 '20 at 19:28

score 8 · Accepted Answer · answered Sep 16 '20 at 20:15

My primary concern is about prototype pollution

Note that JSON.parse will not pollute any prototype object. If the JSON string has a "__proto__" key, then that key will be created just as any other key, and what ever the corresponding value is in that JSON, it will end up as that property value, and not in the prototype object (Object.prototype).

The risk is in what you do with that object afterwards. If you perform a (deep) copy, with property assignments or Object.assign, then you would potentially mutate the prototype object.

how can we sanitize it before running JSON.parse(untrustedString)? ... I assume that could be handled via regex

Do not use a regular expression for that. Use the second argument of JSON.parse:

const cleaner = (key, value) => key === "__proto__" ? undefined : value;

// demo
let json = '{"user":{"__proto__":{"admin": true}}}';

console.log(JSON.parse(json));
console.log(JSON.parse(json, cleaner));

zcoop98 · Answer 2 · 2021-09-09T15:58:18.903

Before you do anything with it, userString is just a string, and nothing in that string, by itself, can harm a system, unless the system does something to allow that harm, like processing it in an unsafe way.

Enter JSON.parse().

JSON.parse() is simply a format conversion tool. It doesn't run any methods from the data (which proto pollution exploits rely on), or really even look at the data contained in the stringified object itself, beyond structural syntax it contains, and JavaScript reserved words, for validation purposes (MDN polyfill example). The same principle applies here as with the string; if you don't do anything unsafe with the output object, it can't hurt you or your system.

At the end of the day, preventing abuse simply boils down to validation and safe data handling practices:

Check the object that results from parsing the string, and validate it within strict parameters, ignoring prototype mutations.
Use Object.prototype.hasOwnProperty.call(). Enforce these practices in your code base with tools like eslint (no-prototype-builtins rule).
Don't expect users to send in perfect, safe data.

In the article you linked, the author mentions this exact idea:

...data coming in from users should always be filtered and sanitized.

"Data is always harmless; it is ill-prepared code that does bad things when fed strange input" is a truism. Nevertheless, it is better to be safe than sorry and perform defence-in-depth here as well (sanitize early and often). So using the `cleaner` outlined by @trincot above is a good idea. — Marcel Waldvogel, Jul 26 '21 at 09:43

score 1 · Answer 3 · answered May 10 '23 at 11:55

Probably the most important thing is to simply limit the size. Memory overflows can reach beyond your applications sandbox.

The next most important thing is to concern yourself with a character set and sanitize stuff that you're not going to work with.

If you're not expecting full Unicode support, explicitly don't support it (filter to something simpler like ASCII, after escapes are processed) because that code is complex and hits binary layers that could be corrupted

Finally, have an explicit list of keys that you support and validate the values in each one of them so that you don't have to worry about your application logic going astray.

How should untrusted JSON be sanitized before using JSON.parse?

3 Answers3

...data coming in from users should always be filtered and sanitized.