5

Given a user-provided JSON string, how can we sanitize it before running JSON.parse(untrustedString)?

My primary concern is about prototype pollution, but I'm also wondering what else I should potentially look out for? If it's just prototype pollution that's a risk, then I assume that could be handled via regex, but I suspect there are additional concerns?

For example, this article on the dangers of parsing untrusted JSON and then creating a copy of the object.:

Now consider some malicious JSON data sent to this endpoint.

{
  "user": {
    "__proto__": {
      "admin": true
    }
  }
} 

If this JSON is sent, JSON.parse will produce an object with a __proto__ property. If the copying library works as described above, it will copy the admin property onto the prototype of req.session.user!

trincot
  • 317,000
  • 35
  • 244
  • 286
Slbox
  • 10,957
  • 15
  • 54
  • 106

3 Answers3

8

My primary concern is about prototype pollution

Note that JSON.parse will not pollute any prototype object. If the JSON string has a "__proto__" key, then that key will be created just as any other key, and what ever the corresponding value is in that JSON, it will end up as that property value, and not in the prototype object (Object.prototype).

The risk is in what you do with that object afterwards. If you perform a (deep) copy, with property assignments or Object.assign, then you would potentially mutate the prototype object.

how can we sanitize it before running JSON.parse(untrustedString)? ... I assume that could be handled via regex

Do not use a regular expression for that. Use the second argument of JSON.parse:

const cleaner = (key, value) => key === "__proto__" ? undefined : value;

// demo
let json = '{"user":{"__proto__":{"admin": true}}}';

console.log(JSON.parse(json));
console.log(JSON.parse(json, cleaner));
trincot
  • 317,000
  • 35
  • 244
  • 286
2

Before you do anything with it, userString is just a string, and nothing in that string, by itself, can harm a system, unless the system does something to allow that harm, like processing it in an unsafe way.

Enter JSON.parse().

JSON.parse() is simply a format conversion tool. It doesn't run any methods from the data (which proto pollution exploits rely on), or really even look at the data contained in the stringified object itself, beyond structural syntax it contains, and JavaScript reserved words, for validation purposes (MDN polyfill example). The same principle applies here as with the string; if you don't do anything unsafe with the output object, it can't hurt you or your system.

At the end of the day, preventing abuse simply boils down to validation and safe data handling practices:

In the article you linked, the author mentions this exact idea:

...data coming in from users should always be filtered and sanitized.

zcoop98
  • 2,590
  • 1
  • 18
  • 31
  • "Data is always harmless; it is ill-prepared code that does bad things when fed strange input" is a truism. Nevertheless, it is better to be safe than sorry and perform defence-in-depth here as well (sanitize early and often). So using the `cleaner` outlined by @trincot above is a good idea. – Marcel Waldvogel Jul 26 '21 at 09:43
1

Probably the most important thing is to simply limit the size. Memory overflows can reach beyond your applications sandbox.

The next most important thing is to concern yourself with a character set and sanitize stuff that you're not going to work with.

If you're not expecting full Unicode support, explicitly don't support it (filter to something simpler like ASCII, after escapes are processed) because that code is complex and hits binary layers that could be corrupted

Finally, have an explicit list of keys that you support and validate the values in each one of them so that you don't have to worry about your application logic going astray.

Erik Aronesty
  • 11,620
  • 5
  • 64
  • 44