
If you JSON-decode material containing a value that contains a backslashed "n" to indicate a newline, at what point should you replace it with a true newline?

Here's an artificial example:

```swift
import Foundation

let dict = ["key": "value\\nvalue"]
let json = try! JSONEncoder().encode(dict)
let result = try! JSONDecoder().decode([String: String].self, from: json)
print(result["key"])
```

That outputs "value\\nvalue", but for purposes of display in my app, I then call `replacingOccurrences` to change "\\n" (meaning backslash-n) into "\n" (meaning newline) so that an actual newline appears in my interface.
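A minimal sketch of the display-time workaround described above, assuming (as in the artificial example) that the decoded value holds a literal backslash followed by `n`. `displayString` is a hypothetical helper name, not a Foundation API:

```swift
import Foundation

// Hypothetical display helper: turn each backslash-n pair (5c 6e) in an
// already-decoded string into a real newline (0a), as the question describes.
func displayString(_ decoded: String) -> String {
    decoded.replacingOccurrences(of: "\\n", with: "\n")
}

// Same round trip as the artificial example above.
let dict = ["key": "value\\nvalue"]
let json = try! JSONEncoder().encode(dict)
let result = try! JSONDecoder().decode([String: String].self, from: json)

let decoded = result["key"] ?? ""    // backslash + n, not a newline
let shown = displayString(decoded)   // "value", newline, "value"
print(shown)
```

Because the replacement runs after decoding, it has to be applied wherever such strings are displayed, which is exactly the case-by-case handling the question is uneasy about.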

Now, I'm a little surprised that JSONDecoder isn't already doing this for me. Just as it has a configurable policy for decoding a date string value into a date, I would expect it at the least to have a configurable policy for decoding a string value into a string. But it doesn't.

My question is: what do people do about this sort of situation, as a general rule? Dealing with it on a case-by-case basis, as I'm doing in my app, feels wrong; in reality, these JSON data are coming from the server and I want all JSON HTTP response bodies to be treated in this way.

matt
  • *"what do people do about this sort of situation"* - don't escape the newline in the JSON text. This becomes a non-issue if an actual newline is provided in the JSON text from the server. There's no need to support this in JSONDecoder because there's no need to escape newlines. A newline character is no different than any other character. A double-quote is the only character that needs to be escaped (for obvious reasons). – HangarRash Jul 21 '23 at 17:18
  • Your dictionary has \\n, which is two characters, a backslash followed by an `n` (and your string literal is escaping the backslash). The resulting JSON represents that as *three* actual characters, a backslash followed by a backslash followed by an `n`. It would be a serious mistake if `JSONDecoder` took those three characters and replaced them with a single newline character. Bottom line, `JSONEncoder` and `JSONDecoder` are both doing the right thing with regard to both newline characters and backslashes. – Rob Jul 21 '23 at 17:26
  • @HangarRash Well, you tell that to (a) the server people and (b) the JSON people (literal newlines are not permitted in JSON until JSON5). – matt Jul 21 '23 at 17:38
  • So, that having been said, how can we help you? `JSONDecoder` is definitely doing the correct thing. Sounds like your backend (which is outside of your control) is just poorly designed/implemented. If they’re sending you model data that includes actual escape characters, then you really are stuck manually unescaping it. By the way, when you do this, remember that newline is not the only problem. You will probably want to confirm what they’re doing with quotation marks, \u escape codes, etc. – Rob Jul 21 '23 at 18:09
  • @matt I don't understand what you mean here about "literal newlines are not permitted in JSON." This absolutely works fine if the JSON data is `value\nvalue` without an extra backslash. I mean the byte "backslash" followed by the byte `n`. There's no need for a literal newline. And, by the JSON spec, it encodes 0x0a, newline. So I agree that your server folks may be doing a thing, but "JSON people" aren't the problem. JSON totally handles this. If you can't fix the server, then as you say, you'll need some ad hoc workaround for non-standard encoding. – Rob Napier Jul 21 '23 at 18:55
  • "I want all JSON HTTP response bodies to be treated in this way." I absolutely don't want non-spec transformations applied to my JSON. That's what breaks me all the time when I work in PHP, and to a lesser extent JavaScript. Please stop guessing at what the bytes probably mean, and just decode according to the rules. I've seen too many systems that apply "replaceEscapesAndPercentEscapes" repeatedly until the string stops changing. That's bad. – Rob Napier Jul 21 '23 at 19:00
  • @RobNapier That's a pretty good answer: if you pull both those comments together and make an answer it will give me something to sink the company's teeth into. – matt Jul 21 '23 at 19:03
  • But if newlines in dictionary string values are legal, I don't quite get why, e.g., jsonlint.com rejects them. – matt Jul 21 '23 at 19:19
  • jsonlint.com rejects them because it is not validating the strings within your dictionary or model object (which may have `0a` newline characters in them), but rather is validating the strings within the encoded JSON (which should not have `0a`, but rather be encoded as `5c6e`). – Rob Jul 22 '23 at 17:34
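To make the byte-level point in these comments concrete, here is a small sketch that encodes both kinds of string with `JSONEncoder`, using a one-entry dictionary as in the question (`rawJSON` is a hypothetical helper name). A real newline (0a) in the model comes out as backslash-n (5c 6e), while a literal backslash followed by `n` comes out as backslash-backslash-n (5c 5c 6e):

```swift
import Foundation

// Hypothetical helper: encode ["key": value] and return the raw JSON text.
func rawJSON(_ value: String) throws -> String {
    let data = try JSONEncoder().encode(["key": value])
    return String(decoding: data, as: UTF8.self)
}

// Model value holds a real newline character (0a).
let realNewline = try! rawJSON("value\nvalue")
// Model value holds a backslash (5c) followed by n (6e).
let backslashN = try! rawJSON("value\\nvalue")

print(realNewline)  // {"key":"value\nvalue"}
print(backslashN)   // {"key":"value\\nvalue"}
```

Neither payload contains a raw 0a byte inside the quoted string, which is what a JSON validator checks for.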

2 Answers


It looks like the server is sending 5c5c6e (i.e. backslash-backslash-n or \\n). That's valid JSON, but it doesn't mean "newline." It means "backslash-n" (\ followed by n). If the server means to send a newline, that's mis-encoded. It needs to be 5c6e, "backslash-n." Sure, you can fix it on the client side, but there's no "normal" way to do that because it's just wrong.

The right way to fix that is to fix it on the server side. You can double-unescape the strings on the client-side, but that's ambiguous and I don't recommend it unless there's no better way. Repeatedly unescaping tends to mess things up when actual backslashes show up in the string.
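If you really are stuck double-unescaping on the client, one hedged option is to run each already-decoded string through the JSON decoder a second time, so that `\n`, `\t`, `\"`, `\\` and `\u` escapes are all handled by the same rules instead of one `replacingOccurrences` call per escape. This is a sketch, not a recommendation: `unescapeOnce` is a hypothetical name, it assumes the server escapes every string exactly twice, and the last line shows the ambiguity described above, where a genuine backslash in a Windows-style path is silently turned into a newline.

```swift
import Foundation

// Hypothetical helper: undo one extra layer of JSON string escaping by wrapping the
// decoded text in ["..."] and decoding it again. Returns nil if the text is not a
// valid JSON string body (for example, it contains an unescaped quotation mark).
func unescapeOnce(_ s: String) -> String? {
    let wrapped = Data("[\"\(s)\"]".utf8)
    return (try? JSONDecoder().decode([String].self, from: wrapped))?.first
}

let newline = unescapeOnce(#"value\nvalue"#)   // "value", newline, "value"
let quoted = unescapeOnce(#"say \"hi\""#)      // say "hi"
let accented = unescapeOnce(#"caf\u00e9"#)     // café
let invalid = unescapeOnce(#"bare " quote"#)   // nil
// Ambiguity: a path with a real backslash is misread as containing an escaped newline.
let path = unescapeOnce(#"C:\new"#)            // "C:", newline, "ew"
```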

"Literal newlines" are not allowed in JSON strings in that the byte 0a is not allowed between quotation marks. Putting that into JSONLint should fail. But 5c6e (backslash-n or \n) is allowed, and is the correct way to encode a newline.
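The distinction can be checked directly with `JSONDecoder`. In this sketch (the helper `decodeKey` is hypothetical), the first payload contains 5c 6e and decodes to a real newline; the second contains 5c 5c 6e, like the server in the question, and correctly decodes to a backslash followed by `n`.

```swift
import Foundation

// Hypothetical helper: decode {"key": ...} and return the value for "key".
func decodeKey(_ json: String) throws -> String? {
    try JSONDecoder().decode([String: String].self, from: Data(json.utf8))["key"]
}

// Raw JSON bytes ... 5c 6e ...: a backslash-n escape, decoded to a newline (0a).
let correct = try! decodeKey(#"{"key":"value\nvalue"}"#)
// Raw JSON bytes ... 5c 5c 6e ...: an escaped backslash then n, decoded to "\" and "n".
let doubled = try! decodeKey(#"{"key":"value\\nvalue"}"#)
```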

Rob
Rob Napier
  • Yes, it really does contain `5c5c6e`. Really. It really does. I know the difference between printout formats. I can pause at a breakpoint and say `exp response.body.forEach { print(String($0, radix: 16)) }` and see the bytes as hex bytes. The problem I'm having is the problem I described. What I need from you is armament to tell my boss to tell the server people not to do that. – matt Jul 21 '23 at 20:52

The backend is not providing the data correctly. The JSONEncoder and JSONDecoder do properly escape and unescape characters.

> What I need … is armament to tell my boss to tell the server people not to do that.

You can tell your boss that the raw JSON payload should be 5c6e, not 5c5c6e. If you are receiving JSON with 5c5c6e in it, then the server is providing a newline character (0a) that has been incorrectly escaped twice.

Beyond that, we can’t be more specific. That having been said, there are two likely sources of this problem:

  1. The actual server database/model contains 5c6e rather than 0a and the JSON encoding is (as it should) converting that initial 5c to 5c5c, thus resulting in 5c5c6e in the final raw JSON payload; or

  2. The database/model contains 0a, but the backend devs, unaware that standard JSON encoding routines will properly escape this as 5c6e for them, are (whether intentionally or not) manually converting it to 5c6e themselves, and then the server’s JSON encoder (correctly) converts that to 5c5c6e.
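Scenario 2 can be reproduced in a few lines. The backend's language isn't known, so this sketch uses Swift's `JSONEncoder` as a stand-in for the server's encoder, and `manuallyEscape` is a hypothetical name for the unnecessary hand-escaping step. The hex dump shows the 5c 5c 6e that then arrives at the client; the second payload is also what scenario 1 produces, since there the model already holds 5c 6e.

```swift
import Foundation

// Hypothetical, unnecessary hand-escaping step: newline (0a) -> backslash + n (5c 6e).
func manuallyEscape(_ s: String) -> String {
    s.replacingOccurrences(of: "\n", with: "\\n")
}

// Two-digit lowercase hex for each byte, separated by spaces.
func hexDump<S: Sequence>(_ bytes: S) -> String where S.Element == UInt8 {
    bytes.map { byte -> String in
        let digits = String(byte, radix: 16)
        return digits.count == 1 ? "0" + digits : digits
    }.joined(separator: " ")
}

let stored = "a\nb"  // the model holds a real newline (0a)
let correct = try! JSONEncoder().encode(["k": stored])
let doubled = try! JSONEncoder().encode(["k": manuallyEscape(stored)])

print(hexDump(correct))  // 7b 22 6b 22 3a 22 61 5c 6e 62 22 7d
print(hexDump(doubled))  // 7b 22 6b 22 3a 22 61 5c 5c 6e 62 22 7d
```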

The first scenario is likely what is going on, but we don’t have enough information to diagnose it further. You will likely need to have some backend dev look at hex representations of what is actually in the database to figure out where the problem rests.

For what it is worth, if the first scenario applies, you may need to go back further in the process to figure out how the 5c6e got into the database/model in the first place (if that is indeed the case).

We should recognize that it might not be a server bug at all, but a matter of garbage-in, garbage-out. I.e., some client app may have over-escaped the original input. Perhaps a client app had input with a 0a newline character, added escapes itself, and provided 5c6e to its JSON encoder, resulting in sending the server 5c5c6e in the raw JSON, and the server dutifully unescaped it and stored 5c6e in the database/model.

Bottom line, you have to determine what the server really has in its model/database, and figure out whether it was a bug in the process of storing the model data on the server or in the process of retrieving the model data from the server.

Rob