-3

I am writing a JavaScript function that generates a Python program. Can I use JavaScript's JSON.stringify() on a string and expect a valid Python string every time or are there edge cases that mean I have to write my own toPythonString() function?

Boris Verkhovskiy
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/240484/discussion-on-question-by-boris-are-all-json-strings-also-syntactically-valid-py). – deceze Dec 27 '21 at 19:58

2 Answers

2

The short answer is no.

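The long answer is that in practice, yes, almost. The difference is that JSON strings can contain backslash-escaped forward slashes. The string "\/" is read by JSON (and JavaScript) as a single forward slash, '/', whereas Python reads it as a backslash followed by a forward slash, '\\/'. Python also treats \/ as an unrecognized escape sequence, which produces a DeprecationWarning since Python 3.6 and a SyntaxWarning since Python 3.12.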

However, it's not mandatory to serialize forward slashes this way in JSON. If you have control over the JSON serializer and can be sure that it doesn't backslash escape forward slashes then your JSON string should always be a valid Python string.

Looking at the source code of the JSON serializer in V8 (Google Chrome's and Node.js's JavaScript engine), it always serializes '/' as "/", as you'd expect.
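You can check both halves of this in your own runtime. The snippet below is only a sketch with illustrative variable names, and it shows the behavior of `JSON.parse` and `JSON.stringify`, not of third-party serializers:

```javascript
// JSON accepts an escaped forward slash on input: the JSON text "\/"
// (quote, backslash, slash, quote) parses to a single "/".
const parsed = JSON.parse('"\\/"');

// JSON.stringify does not produce that escape on output.
const serialized = JSON.stringify("a/b");

console.log(parsed);                    // /
console.log(serialized);                // "a/b"
console.log(serialized.includes("\\")); // false
```

If your JSON comes from a different serializer, it may be worth running the same kind of check against its output.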


JSON string syntax is documented on json.org and Python string syntax is documented in the Lexical analysis section of its docs.

We can see that a regular JSON string character is "Any codepoint except ", \ or control characters" whereas Python string characters are "Any source character except \ or newline or the quote". Since newline is a control character, this means JSON is more restrictive than Python, which is good. Then the question is what is the overlap between "Any codepoint" and "Any source character"? I don't know how to answer that, but I'd guess they're probably the same, assuming both the JSON and Python are encoded in UTF-8. (If you're interested in JavaScript strings instead of JSON strings, then JavaScript is generally encoded in UTF-16, so there could be some incompatibilities arising from that here.)
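As a small illustration of the "more restrictive" point, here is how `JSON.stringify` handles characters that could not appear raw in a single-line Python string literal. The sample string is made up for the example:

```javascript
// A string with a newline, a tab, double quotes, a backslash and the
// control character U+001F.
const sample = 'line1\nline2\t"quoted" \\ \u001f';
const literal = JSON.stringify(sample);

console.log(literal);
// No raw control characters (U+0000 to U+001F) remain in the output, so it
// fits on one line of generated Python source.
console.log(/[\u0000-\u001f]/.test(literal)); // false
// The result still round-trips through a JSON parser.
console.log(JSON.parse(literal) === sample);  // true
```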

We also see that JSON has some backslash escapes, which are all supported by Python except one, the escaped forward slash, \/. This is the one part of JSON strings that isn't shared by Python.

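Finally, in JSON we can use the \uXXXX syntax to escape any character, which is also supported in Python. One caveat: JSON writes a character above the Basic Multilingual Plane as a pair of surrogate escapes (for example \ud83d\ude00), and Python reads that as two separate surrogate code points rather than one character. The literal is still syntactically valid, but its value differs.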
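The comments under this answer bring up characters outside the Basic Multilingual Plane. `JSON.stringify` writes those out raw rather than as `\u` escapes, so the question mostly matters if you want ASCII-only output. One possible approach, sketched here under that assumption, is to post-process the JSON text into Python's `\uXXXX` and `\UXXXXXXXX` escapes. `toAsciiPythonLiteral` is a hypothetical helper name, not part of any library:

```javascript
// Hypothetical helper: JSON.stringify, then replace every non-ASCII code
// point with a Python escape (\uXXXX in the BMP, \UXXXXXXXX above it).
function toAsciiPythonLiteral(s) {
  // The "u" flag makes the character class match whole code points, so an
  // astral character is seen once, not as two surrogate halves.
  return JSON.stringify(s).replace(/[^\x00-\x7f]/gu, (ch) => {
    const hex = ch.codePointAt(0).toString(16);
    return hex.length <= 4
      ? "\\u" + hex.padStart(4, "0")
      : "\\U" + hex.padStart(8, "0");
  });
}

// JSON.stringify keeps the emoji raw: quote, two UTF-16 code units, quote.
console.log(JSON.stringify("\u{1F600}").length);          // 4
console.log(toAsciiPythonLiteral("caf\u00e9 \u{1F600}")); // "caf\u00e9 \U0001f600"
```

Python reads `"\U0001f600"` as a single character, which is why the sketch uses the eight-digit form instead of a surrogate pair.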

Boris Verkhovskiy
  • JSON encodes Unicode characters above the BMP differently (as surrogate pairs), which is probably the most significant difference for strings! – deceze Dec 27 '21 at 19:14
  • @deceze can I get a specific example character to try? [Shavian letters](https://en.wikipedia.org/wiki/Shavian_(Unicode_block)) are in the Supplementary Multilingual Plane and `JSON.stringify("")` returns `'""'`, so I'm not sure what you mean. – Boris Verkhovskiy Dec 27 '21 at 19:39
  • If they’re encoded to plain UTF-8, fine. But if you’re using ASCII-safe `\u` escape sequences, you’ll see a difference. Good test characters are emoji. – deceze Dec 27 '21 at 19:58
  • @deceze so then, as I said "they're probably the same, assuming both the JSON and Python are encoded in UTF-8"? – Boris Verkhovskiy Dec 27 '21 at 20:00
  • Tentatively agree, there isn’t a lot to those encodings otherwise that I’m aware of. – deceze Dec 27 '21 at 20:22
0

If you want to serialize a string as Python source code in JavaScript, do it like this:

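```javascript
// Characters to escape: the active quote, the backslash, and everything in
// the Unicode categories "Other" (\p{C}: control, format, surrogate,
// private use, unassigned) and "Separator" (\p{Z}: space, line, paragraph).
const regexSingleEscape = /'|\\|\p{C}|\p{Z}/gu;
const regexDoubleEscape = /"|\\|\p{C}|\p{Z}/gu;

function asPythonStr(s) {
  // Default to single quotes; use double quotes only when the string
  // contains a single quote and no double quote, so nothing needs escaping.
  let quote = "'";
  if (s.includes("'") && !s.includes('"')) {
    quote = '"';
  }
  const regex = quote === "'" ? regexSingleEscape : regexDoubleEscape;
  return (
    quote +
    s.replace(regex, (c) => {
      switch (c) {
        // An ordinary space matches \p{Z} but is kept as-is.
        case " ":
          return " ";
        // Short escapes that Python has a name for.
        case "\x07":
          return "\\a";
        case "\b":
          return "\\b";
        case "\f":
          return "\\f";
        case "\n":
          return "\\n";
        case "\r":
          return "\\r";
        case "\t":
          return "\\t";
        case "\v":
          return "\\v";
        case "\\":
          return "\\\\";
        case "'":
        case '"':
          return "\\" + c;
      }
      // Everything else becomes the shortest hex escape that fits:
      // \xXX, \uXXXX or \UXXXXXXXX.
      const hex = c.codePointAt(0).toString(16);
      if (hex.length <= 2) {
        return "\\x" + hex.padStart(2, "0");
      }
      if (hex.length <= 4) {
        return "\\u" + hex.padStart(4, "0");
      }
      return "\\U" + hex.padStart(8, "0");
    }) +
    quote
  );
}
```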
Boris Verkhovskiy