0

I have a JSON string containing keys that have $.

In databricks notebook, using select prefixed with r to denote a raw-literal:

SELECT
  r'{
         "key1": "value1",
          "$version": 74501,
          "$metadata": {
              "$modified": "2022-08-10T15:32:36.2808234Z"
          }
      }' AS jsonStr

Returns invalid str:

{
    "key1": "value1",
    "": 74501,
    "": {
        "": "2022-08-10T15:32:36.2808234Z"
    }
}

Whereas escaping special characters with \, AND removing the r:

SELECT
  '{
         "key1": "value1",
          "\$version": 74501,
          "\$metadata": {
              "\$modified": "2022-08-10T15:32:36.2808234Z"
          }
      }' AS jsonStr

Returns correct string:

{
    "key1": "value1",
    "$version": 74501,
    "$metadata": {
        "$modified": "2022-08-10T15:32:36.2808234Z"
    }
}

I thought r substituted the need to escape special characters with \ ? I prefer not having to replace $ with \$ on source data as that's expensive.

Anyway, I'm trying to extract the keys into a struct:

SELECT
  jsonStr,
  jsonStr:key1 AS value1,
  struct(
    get_json_object(jsonStr, '$.key1') AS key1,
    get_json_object(jsonStr, '$.\$version') AS version,
    get_json_object(jsonStr, '$.\$metadata.\$modified') AS modified
  ) AS extracted
FROM
  (
    -- Using manually escaped json str for now
    SELECT
      '{ "key1" :"value1",
      "\$version" :74501,
      "\$metadata": { "\$modified" :"2022-08-10T15:32:36.2808234Z" } }' AS jsonStr
  )

This works. However, it is not efficient.

I believe from_json is preferred, but does not work with '$':

from_json(jsonStr, 'key1 STRING, \$version INTEGER')

Appreciate any help with handling $.

alhazen
  • 1,907
  • 3
  • 22
  • 43

1 Answers1

0

I think this is a bug. r or no R I see no difference. $ also breaks in other places, like setting a variable.

It is worth testing whether this applies only to literals entered in code, or whether data read from file is affected.