74

What is the easiest method to parse "relaxed" JSON but avoid evil eval?

The following throws an error:

JSON.parse("{muh: 2}");

since proper JSON should have keys quoted: {"muh": 2}


My use case is a simple test interface I use to write JSON commands to my node server. So far I simply used eval as it's just a test application anyway. However, using JSHint on the whole project keeps bugging me about that eval. So I'd like a safe alternative that still allows relaxed syntax for keys.

PS: I don't want to write a parser myself just for the sake of the test application :-)

Boann
axkibe
  • 3
    If it's a test app, and you have absolute control over your JSON input, there's no problem in just using `eval`. – bfavaretto Mar 09 '12 at 16:29
  • Another option is using proper JSON plus `JSON.parse`. Other than that, I guess it's eval or writing your own parser. – bfavaretto Mar 09 '12 at 16:30
  • 9
    @bfavaretto That is dangerous. We all know how many times "test" code gets into production. You might as well start with a safe foundation. – hspain Mar 09 '12 at 16:30
  • @hspain, I know. I think the best thing to do here would be using proper JSON in the first place. "Relaxed" JSON is also something that shouldn't go into production, right? – bfavaretto Mar 09 '12 at 16:32
  • @bfavaretto Sometimes we don't have control over our input. He may be consuming a service that sends in the improper JSON that he can't do anything about. Besides, there are other options to using eval here and they should be considered long before eval is. The fact that this is a test environment shouldn't be a consideration. – hspain Mar 09 '12 at 16:35
  • The test script takes "relaxed" JSON as input and sends proper JSON to the server. Yes, it's a safe bet to use eval() here, since all you can do is kill the test script, which can do nothing other than send JSON commands to the server. Still, I'd like to get rid of the eval() if that is easily possible, without having to put up with writing proper JSON all the time just to try things out. – axkibe Mar 09 '12 at 16:39
  • try looking at this: http://code.google.com/p/jquery-json/ – lcapra Mar 09 '12 at 16:29
  • It [uses](http://code.google.com/p/jquery-json/source/browse/trunk/src/jquery.json.js#144) `eval` and the secure version doesn't allow the notation the OP uses. – pimvdb Mar 09 '12 at 16:32
  • I see, but substituting the eval you get a "more secured" string anyway. Maybe a step forward to handle it. – lcapra Mar 09 '12 at 16:41
  • @axkibe, it seems to me you're putting too much effort into a workaround for your laziness to write proper JSON for your manual tests? – lanzz May 28 '12 at 12:32

8 Answers

42

You could sanitize the JSON using a regular expression replace:

var badJson = "{muh: 2}";
var correctJson = badJson.replace(/(['"])?([a-z0-9A-Z_]+)(['"])?:/g, '"$2": ');
JSON.parse(correctJson);
Arnaud Weil
  • 2
    This regex works, the one by @kennebec does not work for JSON`value` that contains a boolean – Jan Grz Jun 08 '16 at 11:47
  • 2
    If you add `\s*` before `:/g` then you will also be able to repair JSON strings that have spaces before the colons, like `{muh : 2}` – Malvineous Aug 20 '16 at 04:17
  • 7
    Just realised this regex fails for values that contain colons, e.g. `{muh: "2:30pm"}` – Malvineous Aug 20 '16 at 04:25
  • This will not work if you have a string containing a date. (15/2/2017 17:00:00) – driconmax Feb 15 '17 at 17:52
  • 5
    Doesn't work for any strings which contain `:` (URLs, citations, ...) but still saved me some work. – Felix Dombek Jul 06 '17 at 06:19
  • Here's my slightly more complete version, which replaces single-quoted values with double-quoted ones and also strips out functions. `.replace(/(?:['"])?([a-z0-9A-Z_]+)(?:['"])?:/g, '"$1": ').replace(/:\s*?(?:'([^']*)')/g, ': "$1"').replace(/\s*"[^"]*":\s*[^(,[\]{}]*\([^()]*(?:\([^()]*\)[^()]*)*\)\s*,?/g, "")` – L0laapk3 May 07 '19 at 00:43
  • This looked like it was going to be helpful, but then it didn't work for my purposes (reading props passed in Vue 2): `{ footer: null, visible: { type: Boolean, default: false } }` It got stuck on `Boolean` not being wrapped in double-quotes. – Ryan Dec 04 '20 at 18:35
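
To make the limits discussed in these comments easy to check, here is a minimal sketch that wraps the regex from this answer (with the `\s*` tweak suggested above) in a helper that reports failure instead of throwing. The name `parseRelaxed` and the shape of its return value are assumptions made for illustration, not part of any library.

```javascript
// Hypothetical helper: quote bare keys with the regex from this answer,
// then hand the result to the native JSON.parse.
function parseRelaxed(text) {
  // \s* before the colon also repairs keys written as `muh : 2`.
  const fixed = text.replace(/(['"])?([a-z0-9A-Z_]+)(['"])?\s*:/g, '"$2": ');
  try {
    return { ok: true, value: JSON.parse(fixed) };
  } catch (err) {
    // Return the rewritten text too, so the damage is visible.
    return { ok: false, error: err.message, fixed };
  }
}

// Unquoted key with a space before the colon: parses.
console.log(parseRelaxed('{muh : 2}'));

// Colon inside a string value: the regex rewrites it as if it ended a key,
// so JSON.parse rejects the result.
console.log(parseRelaxed('{muh: "2:30pm"}'));
```

Printing the `fixed` field for the second input shows where the replacement went wrong, which is usually quicker than reading the regex.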
33

You already know this, since you referred me here, but I figure it might be good to document it here:

I'd long had the same desire to be able to write "relaxed" JSON that was still valid JS, so I took Douglas Crockford's eval-free json_parse.js and extended it to support ES5 features:

https://github.com/aseemk/json5

This module is available on npm and can be used as a drop-in replacement for the native JSON.parse() method. (Its stringify() outputs regular JSON.)

Hope this helps!

Andrew Marshall
Aseem Kishore
  • 1
    File was deleted in a Commit for reason "End of Life". Link to the file from commit history: https://github.com/douglascrockford/JSON-js/blob/03157639c7a7cddd2e9f032537f346f1a87c0f6d/json_parse.js – leumasme May 25 '20 at 01:29
16

This is what I ended up having to do. I extended @ArnaudWeil's answer and added support for having : appear in the values:

var badJSON = '{one : "1:1", two : { three: \'3:3\' }}';

var fixedJSON = badJSON

 // Replace ":" with "@colon@" if it's between double-quotes
 .replace(/:\s*"([^"]*)"/g, function(match, p1) {
  return ': "' + p1.replace(/:/g, '@colon@') + '"';
 })

 // Replace ":" with "@colon@" if it's between single-quotes
 .replace(/:\s*'([^']*)'/g, function(match, p1) {
  return ': "' + p1.replace(/:/g, '@colon@') + '"';
 })

 // Add double-quotes around any tokens before the remaining ":"
 .replace(/(['"])?([a-z0-9A-Z_]+)(['"])?\s*:/g, '"$2": ')

 // Turn "@colon@" back into ":"
 .replace(/@colon@/g, ':')
;

console.log('Before: ' + badJSON);
console.log('After: ' + fixedJSON);
console.log(JSON.parse(fixedJSON));

It produces this output:

Before: {one : "1:1", two : { three: '3:3' }}
After: {"one":  "1:1", "two":  { "three":  "3:3" }}
{
  "one": "1:1",
  "two": {
    "three": "3:3"
  }
}
Malvineous
  • 4
    This is virtually guaranteed to break on some edge case in the future. –  Aug 20 '16 at 06:00
  • 2
    @torazaburo: Of course, but perhaps when that happens someone can build on it to fix the problem, just as I built on an earlier solution to solve the issue I found. – Malvineous Aug 20 '16 at 06:27
  • A strange approach to coding--write code that you know will probably break, planning to fix it later. –  Aug 20 '16 at 06:29
  • 7
    @torazaburo: You might think it will break, but I am dealing with a legacy system that won't change so it is no more likely to break for me than any other code. If you think this answer is so bad, why not contribute a better one? I would like to see how you propose to address this issue in a way that won't break on a future edge case. – Malvineous Aug 20 '16 at 06:34
  • This works for me, however it would be great if we could have some comments on what each line does. Just for us mortals who are not experts in regex. – opcode May 14 '17 at 09:59
  • @opcode: I've added some comments, hope this helps. I still recommend learning regexes, they are very useful! :-) – Malvineous May 15 '17 at 04:54
  • Good approach, but the next to last replace() doesn't work in case the quoted key has more funny characters in it. It works better for me with `.replace(/([^'",\}\]\s]+)\s*:/g, '"$1": ')` which just looks for unquoted keys: So the `[^...]` basically looks for any character that can't be part of an unquoted key: `" ' , } ] `. Note that you'll still have to get rid of dangling commas as in `{"x":1,}`, which is perfectly fine for JavaScript. – sjngm Apr 29 '19 at 19:13
  • This is another example of the fact that regular expressions are for regular grammars, which JSON is not. The arbitrary nesting of quotes and colons means that no matter what regex you come up with, it won't work with one nesting deeper. If you're going to write complex code to solve this anyway, write a dedicated DFA instead. – Mike 'Pomax' Kamermans Dec 18 '21 at 17:32
  • @Mike'Pomax'Kamermans What do you mean 'arbitrary nesting of quotes and colons'? You can only nest one level deep in the JSON spec (colons inside a string), unless I am misunderstanding you. – Malvineous Dec 22 '21 at 15:02
  • quoted strings lets you nest things as deep as you want. `{ "key": "{ \"val\": \"{ \\\"etc\\\": [...]` and so on and so forth. – Mike 'Pomax' Kamermans Dec 22 '21 at 15:46
  • I disagree that's nesting at the JSON-compatible level though, it's one key and one value. If your value happens to contain escaped JSON then that's up to you to parse again in your code, it's not the job of the regex (or JSON parser) to handle it. If you passed your example to a standard JSON reader you'd get the same one-level result with no nesting. The regex here is designed to fix missing quotes so that invalid JSON can be fixed up and parsed with standard JSON readers, it is not intended to actually parse the JSON object. – Malvineous Dec 23 '21 at 02:37
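
In the spirit of the "dedicated DFA" suggestion in the comments above, here is a rough sketch of a single-pass scanner that tracks whether it is inside a string, so colons and commas inside values are never touched. The function names `readString` and `quoteBareKeys` are made up for this example. It assumes keys are plain identifiers and does not handle comments, trailing commas, or keys that start with a digit.

```javascript
// Hypothetical helper: read the string literal starting at `start` and
// return it re-quoted with double quotes, plus the index just after it.
function readString(text, start) {
  const quote = text[start];
  let body = '';
  let i = start + 1;
  while (i < text.length && text[i] !== quote) {
    if (text[i] === '\\' && i + 1 < text.length) {
      // Keep escape sequences intact, except \' which JSON does not allow.
      body += text[i + 1] === "'" ? "'" : text[i] + text[i + 1];
      i += 2;
    } else {
      // A bare double quote inside a single-quoted string must be escaped.
      body += text[i] === '"' ? '\\"' : text[i];
      i += 1;
    }
  }
  return { json: '"' + body + '"', end: i + 1 };
}

// Hypothetical helper: quote bare identifier keys, leave everything else alone.
function quoteBareKeys(text) {
  const identifier = /[A-Za-z_$][A-Za-z0-9_$]*/y; // sticky: matches only at lastIndex
  let out = '';
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (ch === '"' || ch === "'") {
      // String state: copy the whole literal without inspecting its contents.
      const str = readString(text, i);
      out += str.json;
      i = str.end;
      continue;
    }
    identifier.lastIndex = i;
    const match = identifier.exec(text);
    if (match) {
      const word = match[0];
      // Look past whitespace: a bare word followed by ':' is a key.
      let j = i + word.length;
      while (j < text.length && /\s/.test(text[j])) j++;
      out += text[j] === ':' ? '"' + word + '"' : word;
      i += word.length;
      continue;
    }
    out += ch;
    i++;
  }
  return out;
}

const fixed = quoteBareKeys('{one : "1:1", two : { three: \'3:3\' }}');
console.log(fixed);
console.log(JSON.parse(fixed));
```

Because strings are consumed as a unit, a value that itself contains escaped JSON is passed through unchanged and stays a single string after `JSON.parse`.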
7

If you can't quote keys when writing the string, you can insert quotes before using JSON.parse:

var s= "{muh: 2,mah:3,moh:4}";
s= s.replace(/([a-z][^:]*)(?=\s*:)/g, '"$1"');

var o= JSON.parse(s);
/*  returned value: [object Object] */
JSON.stringify(o)
/*  returned value: (String)
    {"muh":2,"mah":3,"moh":4}
*/
kennebec
  • 1
    `'{muh: "foo",mah:3,moh:4}'.replace(/([a-z][^:]*)(?=\s*:)/g, '"$1"')` returns `'{"muh": ""foo",mah":3,"moh":4}'`. I was thinking along these lines, but as the example shows, it doesn't quite cut it. It's a tad more complicated. – axkibe Mar 09 '12 at 17:59
  • 1
    Try inserting quotes into invalid JSON like this: `"{muh: true, notMuch: 123, pika:{pika:\"chu\"}}"`. The result will be `{"muh": "true, notMuch": 123, "pika":{"pika":"chu"}}` – Jan Grz Jun 08 '16 at 11:39
  • 1
    You forgot to include /i modifier into your RegExp, like this: `s= s.replace(/([a-z][^:]*)(?=\s*:)/gi, '"$1"');` – alevkon Jul 24 '16 at 07:57
  • Thank you for actually answering the question! – Enkay Oct 13 '16 at 02:14
2

You can also use Flamenco's really-relaxed-json (https://www.npmjs.com/package/really-relaxed-json), which goes a step further and tolerates missing commas, dangling commas, comments, multiline strings, etc.

Here's the specification: http://www.relaxedjson.org

And some online parsers:
http://www.relaxedjson.org/docs/converter.html

Preloaded with the 'bad json'

{one : "1:1", two : { three: '3:3' }}

Bad JSON

Preloaded with even 'worse json' (no commas)

{one : '1:1' two : { three: '3:3' }}

Worse JSON

Preloaded with 'terrible json' (no commas, no quotes, and escaped colons)

{one : 1\:1 two : {three : 3\:3}}

Terrible JSON

Mike 'Pomax' Kamermans
Steven Spungin
  • 10
    Really spooky library: no code on GitHub, unknown what's inside the package and how it could change. – Bogdan Mart Aug 12 '18 at 21:44
  • This library was _almost_ what I needed, except that the npm module is a pre-built/bundled library and by that token, virtually unmodifiable. Unfortunate :( – Austin Burk Jun 09 '20 at 19:54
  • No need to modify a library that works virtually flawlessly. It's not open source; it is Creative Commons, free to use commercially. – Steven Spungin Jun 10 '20 at 03:33
  • If it's CC (not even CC-BY or CC-SA), then not being open source is all that much weirder, requiring even more security scrutiny than normal NPM packages. – Mike 'Pomax' Kamermans Dec 17 '21 at 20:48
2

[EDIT: This solution only serves for pretty simple objects and arrays, but does not do well with more complicated scenarios like nested objects. I recommend using something like jsonrepair to handle more interesting cases.]

I've modified Arnaud's solution slightly to handle periods in the keys, colons in the key values and arbitrary whitespace (although it doesn't deal with JSON object key values):

var badJson = `{
    firstKey: "http://fdskljflksf",
    second.Key: true, thirdKey:
    5, fourthKey: "hello"
}`;


/*
    \s*
        any amount of whitespace
    (['"])?
        group 1: optional quotation
    ([a-z0-9A-Z_\.]+)
        group 2: at least one key character
    (['"])?
        group 3: optional quotation
    \s*
        any amount of whitespace
    :
        a colon literal
    ([^,\}]+)
        group 4: at least one character of the key value (strings assumed to be quoted), ends on the following comma or closing brace
    (,)?
        group 5: optional comma
*/
var correctJson = badJson.replace(/\s*(['"])?([a-z0-9A-Z_\.]+)(['"])?\s*:([^,\}]+)(,)?/g, '"$2": $4$5');
JSON.parse(correctJson);
therightstuff
  • Nice, but has data format problems. ```test { _item123456: { data { type: "Object", id: "XY12246", date: "2018-04-12 16:19:42", shape: "cube", state: "painted" } } }``` – mike Feb 07 '23 at 12:38
  • Yeah, this regex doesn't support nested objects. For those, my initial approach would be a method that creates an array of inner objects from outside to in with each inner object being replaced by a temporary value (eg. uuid), then running the above regex on each inner object and finally replacing the temporary values with their fixed objects – therightstuff Feb 08 '23 at 15:33
  • I've edited my answer to include a reference to an npm package that solves the more complicated cases, safer JSON parsing is not a trivial problem. – therightstuff Feb 09 '23 at 06:31
1

If you are writing Node.js code, you can also use the node:vm module to parse relaxed JSON in a tighter environment than eval. Code run through vm can still be arbitrary JavaScript, but it executes in a pure V8 context, without things such as require or process.

const vm = require('vm'); // or node:vm
const badJson = '{muh: 2}';
try {
    const parsedJson = new vm.Script(`x=${badJson}`).runInNewContext(
        { console: undefined }, // nuke console inside the vm
        { timeout: 1000, displayErrors: true }
    );
    if (typeof parsedJson !== 'object') { // in case you're expecting an object/array
        throw new Error(`Invalid JSON=${badJson}, parsed as: ${parsedJson}`);
    }
    console.log(parsedJson);
} catch (err) {
    throw new Error(`Could not parse JSON: ${err}`);
}

You can use vm2 instead, a module that promises more security and does the same: https://github.com/patriksimek/vm2
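
A possible variation on the vm snippet, offered as a sketch rather than a hardened solution: wrap the input in parentheses so it is evaluated as an expression, then round-trip the result through the host's JSON functions so only plain data comes back. The helper name `parseRelaxedWithVm` is an assumption for this example. Node's documentation states that vm is not a security mechanism, so this is still meant for input you trust, such as a test tool.

```javascript
const vm = require('node:vm');

// Hypothetical helper name; not part of the vm module.
function parseRelaxedWithVm(text) {
  // Parentheses make V8 read `{...}` as an object literal rather than a
  // block, so nothing has to be assigned to a global inside the context.
  const value = vm.runInNewContext(`(${text})`, Object.create(null), {
    timeout: 1000, // milliseconds allowed for the evaluation itself
  });
  // Serialise with the host's JSON so the result is plain data created in
  // this realm; function and undefined members are dropped by stringify.
  const json = JSON.stringify(value);
  if (json === undefined) {
    throw new TypeError('Input did not evaluate to JSON-compatible data');
  }
  return JSON.parse(json);
}

console.log(parseRelaxedWithVm("{muh: 2, when: '2:30pm', list: [1, 2,]}"));
```

Since the input is parsed by the JavaScript engine itself, single quotes, trailing commas and colons inside strings need no special handling.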

ojosilva
  • I had forgotten about this question, which I asked a long time ago, but I agree: if I were working on the same thing today, I would use the vm module, with a proper context. – axkibe Apr 19 '23 at 13:15
0

This is my simple solution that works with nested objects AND semicolons in values:

const finalJson = relaxedJson
  .replace(/([{,])\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*:/g, "$1\"$2\":");
Herobrine