6

Normally I parse a JSON string into a JSON object instead of manipulating the JSON string directly. For example, given a JSON string like

{"number": "1234567"}

if I have to append 000 to the number:

...
{...,"number" : "1234567000",...}
....

I will use Jackson to parse it either as a JSON object or as a POJO.

I understand that, from a readability perspective, parsing to a JSON object or POJO is much better, but I'm curious about the performance. In this case, if I manipulate the JSON string directly, I have to use a regex to extract the number attribute and append 000 to it. With lots of data, is that much more expensive than parsing to a JSON object, because every string modification basically creates a new String object?
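For reference, the direct string manipulation described above could look roughly like this in Java. This is only a minimal sketch: the class name `RegexAppend` and the method `appendZeros` are made up for illustration, and it assumes the value is always a quoted, digits-only string under a key called `number`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jackson or any other library.
class RegexAppend {
    // Matches "number" : "<digits>" with optional whitespace around the colon.
    private static final Pattern NUMBER_FIELD = Pattern.compile(
            "(?<prefix>\"number\"\\s*:\\s*\")(?<digits>\\d+)(?<suffix>\")");

    static String appendZeros(String json) {
        Matcher m = NUMBER_FIELD.matcher(json);
        // Strings are immutable, so this returns a new String;
        // the input is left untouched.
        return m.replaceAll("${prefix}${digits}000${suffix}");
    }

    public static void main(String[] args) {
        System.out.println(appendZeros("{\"number\": \"1234567\"}"));
    }
}
```

Note that this rewrites every `"number"` key it finds, including ones in nested objects, and silently leaves the input unchanged when the value is not digits-only.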

EDIT: Based on @Itai Steinherz's link, I also made a benchmark in JS, and it shows that JSON parsing performs better: https://jsbench.me/93jr1w6k5b/1

Holm
  • 2
    Why don't you check the performance yourself under load or using currentMillis and tell us? It really depends on a number of factors, like JSON size, etc. – deathangel908 Jan 17 '19 at 15:05
  • 5
    Every time you have to decide between parse and regex, go for parse. A parser has strict rules which get maintained (almost always) by a team of professionals, whereas writing your own regex is quite error-prone. – Lino Jan 17 '19 at 15:05
  • `Every time you have to decide between parse and regex, go for parse` Well, that's not true: if performance is really a bottleneck, you can consider doing something crazy. As I said, only if it's really so. – deathangel908 Jan 17 '19 at 15:07
  • @deathangel908 true, with highly specialized code you might be able to squeeze out some performance but string manipulation on Json you can't control is bound to get buggy in a nasty way. And if you can control the Json (structure, contents etc.) then there are probably faster, easier and less error prone ways to implement it by changing the Json/api itself. – Thomas Jan 17 '19 at 15:11
  • Ways regex can break when "just appending a few zeroes to a number" if you don't fully control possible inputs: the string expected to be a number contains non-digits (e.g. `.` or `E`) - what do you do? Silently leave the input unmodified? Throw? Append zeroes anyway? What if the "number" is `0` - is appending a few more zeroes the right thing to do, or did you mean to add `+1000`? – Hulk Jan 17 '19 at 15:15
  • Of course, there are rare occasions where "just appending" something is correct - for example, these numbers might be some kind of serial number, where one of two systems appends some additional digits that the other one omits because they are known to be always zero anyway. Still, I would always prefer parsing, because it gives you more flexibility as to how to react when encountering unexpected input. – Hulk Jan 17 '19 at 15:28

2 Answers

0

Since I'm not very familiar with JSON parsing/manipulation in Java, I'll compare the same operations in JavaScript (which I am more experienced in).

Comparing a basic regex with .replace against JSON.parse & JSON.stringify, the result is that the JSON.parse approach is slower by a small percentage (4.37% to be precise).

However, I don't think the perf gain is worth it, and I would always go with more readable and maintainable code (the JSON.parse approach) rather than the more performant (the .replace approach).

See the complete benchmark I used here.

Itai Steinherz
  • This question is about Java and not JavaScript, though you probably have a valid answer in the second paragraph. – Lino Jan 18 '19 at 08:14
  • Thanks, I didn't even notice. I'll update my answer. – Itai Steinherz Jan 18 '19 at 08:22
  • I added more manipulations based on your link, and it shows JSON.parse performs better: https://jsbench.me/93jr1w6k5b/1 – Holm Jan 18 '19 at 10:21
  • @AkiraSendoh That's because you are comparing two pieces of code which do different things. The first sets `number1`...`number13` to be `{jsonObject.number}000`, and the second sets `number` to be `{number}000` 13 times. – Itai Steinherz Jan 18 '19 at 10:29
  • I changed it to .number instead of number1 ... number13, and the result was the same. If several new attributes are created, shouldn't that slow down the JSON.parse approach even further? – Holm Jan 18 '19 at 10:34
  • @AkiraSendoh In your fork, the `JSON.parse` benchmark does a different thing than the `.replace` one: it sets 13 different properties, whereas the second sets the same property 13 different times. Also, the `.replace` appends `000` to `jsonObj.number` on every line. – Itai Steinherz Jan 18 '19 at 17:36
  • After changing it to set 1 property, the `JSON.parse` approach still performs better, and setting 1 property should be less expensive than setting 13 properties. For your second point, appending 000 stands for one operation on an attribute (or on several attributes). This mimics the case where, after parsing to JSON, several operations are executed on a JSON object. For manipulating a String directly, `.replace` on every line is the only way. – Holm Jan 21 '19 at 08:57
0

I would just use the regular expression.

I imagine the parser uses some form of pattern matching itself, or most likely a for-loop over a character stream.

\"number\"\s*:\s*\"(\d+)\"

You can refer to RFC 8259 for the complete ABNF grammar of the JSON syntax.
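If you go the regex route, it may be worth making the "no match" case explicit instead of silently returning the input. Here is a rough sketch of that idea in Java. The names `StrictRegexAppend` and `tryAppendZeros` are hypothetical, and the pattern is a variation on the one above that assumes a quoted, digits-only value and tolerates optional whitespace around the colon.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class StrictRegexAppend {
    // Group 2 captures the digits between the quotes.
    private static final Pattern NUMBER_FIELD =
            Pattern.compile("(\"number\"\\s*:\\s*\")(\\d+)(\")");

    // Returns the modified JSON text, or empty if the field was not found
    // in the expected form (e.g. the value contains '.' or 'E').
    static Optional<String> tryAppendZeros(String json) {
        Matcher m = NUMBER_FIELD.matcher(json);
        if (!m.find()) {
            return Optional.empty();
        }
        // Copy everything up to the end of the digits, add "000",
        // then copy the rest. Only the first match is changed.
        StringBuilder sb = new StringBuilder(json.length() + 3);
        sb.append(json, 0, m.end(2))
          .append("000")
          .append(json, m.end(2), json.length());
        return Optional.of(sb.toString());
    }
}
```

The caller can then decide whether an empty result means "skip", "log", or "throw", which is roughly the flexibility a real parser gives you for free.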

Reilas