Normally,

F.get_json_object(name, "$.element_name")

works fine to extract the element_name from a JSON object like this

{"element_name" : 1}

But what if the name contains a space? How do I quote the name?

{"element name" : 1}

Obviously, this doesn't work:

F.get_json_object(name, "$.elementname")

This is not really a PySpark-specific problem, but it seems that PySpark (and maybe Java) follows a slightly different spec for JSONPath.

xiaodai

2 Answers

For JSON keys whose names are not valid as property names in dot notation, you'll need to use the indexer (bracket) syntax.

$["element name"]

(Single quotes should also work.)

gregsdennis

In Spark, one of the following two notations should work: (1) dot notation, .name, where name contains no dot (.) or opening bracket ([); or (2) bracket notation, ['name'], where name contains no single quote (') or question mark (?). For example:

F.get_json_object('name', "$['element name']")
F.get_json_object('name', "$.element name")

See the following excerpt from Spark's Scala source code, in JsonPathParser:

// parse `.name` or `['name']` child expressions
def named: Parser[List[PathInstruction]] =
  for {
    name <- '.' ~> "[^\\.\\[]+".r | "['" ~> "[^\\'\\?]+".r <~ "']"
  } yield {
    Key :: Named(name) :: Nil
  }

Thus, if the name contains a dot or an opening bracket, use ['name']; if it contains a single quote or a question mark, use .name. Otherwise, either notation works. More examples of working expressions:

F.get_json_object('name', "$.Trader Joe's")
F.get_json_object('name', "$['amazon.com']")
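
The selection rule above can be sketched in plain Python. The helper below is hypothetical (json_path_for_key is not part of PySpark); it only mirrors the two character classes from the parser excerpt and handles a single top-level key. It assumes those regexes match the Spark version in use, which may differ between releases.

```python
import re

# Character classes copied from the `named` rule quoted above.
DOT_NAME = re.compile(r"[^.\[]+")      # what may follow `.`
BRACKET_NAME = re.compile(r"[^'?]+")   # what may appear inside `['...']`


def json_path_for_key(key: str) -> str:
    """Hypothetical helper: build a path for one top-level key.

    Prefers bracket notation and falls back to dot notation when the
    key contains a single quote or a question mark.
    """
    if BRACKET_NAME.fullmatch(key):
        return f"$['{key}']"
    if DOT_NAME.fullmatch(key):
        return f"$.{key}"
    # Neither branch of the quoted grammar can express this key.
    raise ValueError(f"key cannot be expressed in either notation: {key!r}")


print(json_path_for_key("element name"))
print(json_path_for_key("Trader Joe's"))
print(json_path_for_key("amazon.com"))
```

The returned string can then be passed as the second argument, e.g. F.get_json_object('name', json_path_for_key("element name")).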
jxc