1

I need the count of rows in a JSON text where A.adList.optionalField=null

The JSON looks like :

{  
   "A":{  
      "adList":[  
         {  
            "a":"qwfqw"
         },
         {  
            "b":"fqw",
            "c":23423,
            "optionalField":null
         }
      ]
   }
}

This works:

df.select(df("id")).where(array_contains(df("A.adList.optionalField"),4)).registerTempTable("hb")


select count(*) from hb

However, I can't do the same for NULL

df.select(df("id")).where(array_contains(df("A.adList.optionalField"),"null")).registerTempTable("hb")

Any idea how i can do this easily? The question Check if arraytype column contains null here talks about possible NULLs in a Seq[Int], where i am dealing with a possible NULL field in a Struct in a Seq[Struct] as posted above.

GothamGirl
  • 299
  • 1
  • 2
  • 15
  • 1
    Possible duplicate of [Check if arraytype column contains null](https://stackoverflow.com/questions/44307744/check-if-arraytype-column-contains-null) – vdep Jan 30 '19 at 10:10
  • In the question you posted above there is a Seq[Integer], I am looking for NULL values for a particular field in a struct; Seq[Struct] – GothamGirl Jan 30 '19 at 10:53
  • can you paste the schema of `df`? – vdep Jan 30 '19 at 14:23

1 Answers1

-1

The array_contains() doesn't allow null in the second parameter. To check if the array has null, you can do sort_array(), by setting ascending=true. Then if the first element is null, you can test it again isnull(sort_array(col(a),true)(0))

Check this out:

scala> val df = spark.read.format("json").option("multiLine","true").load("/tmp/stack/tanvi.json").toDF("id")
df: org.apache.spark.sql.DataFrame = [id: struct<adList: array<struct<a:string,b:string,c:bigint,optionalField:string>>>]

scala> df.printSchema
root
 |-- id: struct (nullable = true)
 |    |-- adList: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- a: string (nullable = true)
 |    |    |    |-- b: string (nullable = true)
 |    |    |    |-- c: long (nullable = true)
 |    |    |    |-- optionalField: string (nullable = true)


scala> df.select(sort_array(df("id.adList.optionalField"),true)(0),size(df("id.adList.optionalField"))).show(false)
+---------------------------------------------------------------+------------------------------------------------+
|sort_array(id.adList.optionalField AS `optionalField`, true)[0]|size(id.adList.optionalField AS `optionalField`)|
+---------------------------------------------------------------+------------------------------------------------+
|null                                                           |2                                               |
+---------------------------------------------------------------+------------------------------------------------+


scala> df.select(sort_array(df("id.adList.optionalField"),true)(1),size(df("id.adList.optionalField"))).show(false)
+---------------------------------------------------------------+------------------------------------------------+
|sort_array(id.adList.optionalField AS `optionalField`, true)[1]|size(id.adList.optionalField AS `optionalField`)|
+---------------------------------------------------------------+------------------------------------------------+
|null                                                           |2                                               |
+---------------------------------------------------------------+------------------------------------------------+


scala> df.select(isnull(sort_array(df("id.adList.optionalField"),true)(0)),size(df("id.adList.optionalField"))).show(false)
+-------------------------------------------------------------------------+------------------------------------------------+
|(sort_array(id.adList.optionalField AS `optionalField`, true)[0] IS NULL)|size(id.adList.optionalField AS `optionalField`)|
+-------------------------------------------------------------------------+------------------------------------------------+
|true                                                                     |2                                               |
+-------------------------------------------------------------------------+------------------------------------------------+


scala>
stack0114106
  • 8,534
  • 3
  • 13
  • 38