I am trying to match keys in JSON of this type:

define({
  key1: "some text: and more",
  key2 : 'some text ',
  key3: ": more some text",
  key4: 'some text:'
});

with this regexp: (?<=\s|{|,)\s*(\w+)\s*:\s?[\"|\']/g. But currently it also matches the last text: (inside the value of key4), which should be ignored.

An example could be seen here

Could you give me a hint on how to fix this regex so that it matches only the keys?
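For anyone who wants to reproduce the problem, here is a minimal sketch in Python (an assumption on my part, since the question does not name a regex flavor). It runs the pattern from the question over the sample input; the only change is that `{` is escaped, which should not alter what it matches.

```python
import re

# Sample input copied from the question.
source = '''define({
  key1: "some text: and more",
  key2 : 'some text ',
  key3: ": more some text",
  key4: 'some text:'
});'''

# Pattern from the question; "{" is escaped here only for clarity.
pattern = re.compile(r'''(?<=\s|\{|,)\s*(\w+)\s*:\s?["|']''')

# findall returns the contents of the single capture group for each match.
matched = pattern.findall(source)
print(matched)  # ['key1', 'key2', 'key3', 'key4', 'text']
```

The trailing `text` comes from the value of key4, where `text:` is preceded by a space and followed directly by the closing quote.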

user3599444
  • It would definitely be helpful to know the flavor of RegEx you're using. I assume Perl or ECMAScript compatible? – Mario Jun 19 '14 at 06:32
  • Can you tell us what do you want to do with json? If you are using Javascript, getting keys in Json is simply a `for .. in ...` loop or `Object.keys(json)`. Using regex is probably a cumbersome task. – Herrington Darkholme Jun 19 '14 at 06:38

3 Answers

How about this shorter regex:

(?m)^[ ]*([^\r\n:]+?)\s*:

In the demo, look at the Group 1 captures in the right pane.

  • (?m) allows the ^ to match at the beginning of each line
  • ^ asserts that we are positioned at the beginning of the line
  • [ ]* eats up all the space characters
  • ([^\r\n:]+?) lazily matches any characters that are not colons : or newlines, and captures them to Group 1 (this is what we want), up to...
  • \s*: matches optional whitespace characters and a colon
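
As a quick check of the breakdown above, here is a small sketch in Python (my choice of language, not something the answer specifies). The sample text is the one from the question, and the variable names are only illustrative.

```python
import re

# Sample input copied from the question.
source = '''define({
  key1: "some text: and more",
  key2 : 'some text ',
  key3: ": more some text",
  key4: 'some text:'
});'''

# (?m) makes ^ match at the start of every line, so only the first
# "name:" on each line can match; colons inside the values are ignored.
key_pattern = re.compile(r'(?m)^[ ]*([^\r\n:]+?)\s*:')

keys = key_pattern.findall(source)
print(keys)  # ['key1', 'key2', 'key3', 'key4']
```

Because the match is anchored to the start of a line, this relies on each key sitting on its own line, as in the question's sample.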
zx81
  • FYI added explanation. :) – zx81 Jun 19 '14 at 06:40
  • `[ ]*` overcomplicates things. You could just use ` *` and you'd get the same effect. However, this won't match other whitespace characters like line breaks, tabs, etc. so I'd really use `\s*` instead. – Mario Jun 19 '14 at 06:41
  • @Mario I am very aware of the syntax, thanks. :) The `[ ]*` is for readability and can be replaced by a space-star. But I really don't think we want a `\s*` just after the anchor. :) – zx81 Jun 19 '14 at 06:48
  • @user3599444 Don't let this confuse you, please ask if you have any questions. :) – zx81 Jun 19 '14 at 06:51
  • Thanks, your regex is short and does what I need. – user3599444 Jun 19 '14 at 06:53

I wouldn't suggest parsing JSON using regular expressions. There are small libraries for that, some even header-only and with very convenient licensing terms (like rapidjson, which I'm using right now).

But if you really want to, the following expression should find your key/value pairs (note that I'm using Perl, mostly for nice syntax highlighting):

(\w+)\s*:\s*('[^']*'|"[^"]*"|[+\-]?\d+(?:\.\d+)?)
  • Keep in mind that this won't work properly with escaped quotes inside your values or with strings that aren't properly enclosed.
  • (\w+) will match the full key.
  • \s* matches zero or more whitespace characters.
  • : is really just a direct match.
  • '[^']*' will match any characters enclosed by ' (the "[^"]*" alternative does the same for double quotes).
  • [+\-]?\d+(?:\.\d+)? will match any number (with or without decimals). The decimal point is escaped as \. so that it only matches a literal dot.
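
To see the key/value captures in action, here is a rough sketch in Python (chosen only for illustration; the answer itself mentions Perl). It uses the expression with the decimal point written as `\.`, and the `key5` entry is a hypothetical addition to the question's sample so that the number branch is exercised too.

```python
import re

# Sample from the question, plus a hypothetical numeric entry (key5).
source = '''define({
  key1: "some text: and more",
  key2 : 'some text ',
  key3: ": more some text",
  key4: 'some text:',
  key5: -3.14
});'''

# Group 1 is the key; group 2 is a quoted string or a number.
pair_pattern = re.compile(
    r"""(\w+)\s*:\s*('[^']*'|"[^"]*"|[+\-]?\d+(?:\.\d+)?)"""
)

# With two groups, findall returns a list of (key, value) tuples.
pairs = pair_pattern.findall(source)
found_keys = [key for key, _ in pairs]
print(found_keys)  # ['key1', 'key2', 'key3', 'key4', 'key5']
```

Since each match consumes the whole quoted value, a colon inside a value (as in key1 or key4) is never examined as a potential key. As noted above, this breaks down once values contain escaped quotes.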

Edit: Since others provided nice and easy to see online demos, here's mine.

Mario
  • this is nice. can you suggest how to extend this to handle array values [] and json object values {}? – Oleg Shirokikh Oct 10 '16 at 03:00
  • @OlegShirokikh Just duplicate the sub expressions for quotes to match on corresponding brackets. But as I mentioned, better use a real JSON parser to avoid issues from misinterpreting values. – Mario Oct 10 '16 at 05:40
  • thanks - but that wouldn't immediately work for nested json objects - e.g. when the values are itself json objs enclosed in {}? I was looking at your regex b/c i have the JSON with keys not enclosed in the double quotes, which is not parseable by the library. so my goal is to surround all keys with double quotes. I asked a question for it - but haven't got a nice solution yet - http://stackoverflow.com/questions/39928124/c-parsing-json-string-having-keys-not-enclosed-into-double-quotes. I'd appreciate if you can give a hint on how to approach this using your regex sol-n! – Oleg Shirokikh Oct 10 '16 at 05:43
  • @OlegShirokikh That's exactly the problem this regex approach causes. In theory it can be extended to accommodate for that, but it makes the whole thing really complicated. FYI unquoted keys violate the standard so I guess your best bet would be writing a converter/fix-up script or modify one of the parsers yourself. – Mario Oct 10 '16 at 05:56

In your regex, text is matched because it is treated as a key.

Try this regular expression:

(\w+)\s*:\s*(["']).+\2,?

Demo
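
A short sketch of how this behaves, written in Python as an assumption (the answer does not name a language), using the sample from the question:

```python
import re

# Sample input copied from the question.
source = '''define({
  key1: "some text: and more",
  key2 : 'some text ',
  key3: ": more some text",
  key4: 'some text:'
});'''

# Group 2 remembers the opening quote; \2 requires the same quote to close.
# The greedy .+ runs to the end of the line and backtracks to the last
# matching quote, so the whole value is consumed along with the key.
quoted_pattern = re.compile(r'''(\w+)\s*:\s*(["']).+\2,?''')

quoted_keys = [m.group(1) for m in quoted_pattern.finditer(source)]
print(quoted_keys)  # ['key1', 'key2', 'key3', 'key4']
```

One caveat with this sketch: because `.+` is greedy, two key/value pairs on the same line that use the same quote character would likely be swallowed as a single match.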

Stephan