
I'm using PySpark 3.2.1 and I'm trying to find the number of missing values in each column of my PySpark DataFrame, so I used the following code:

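from pyspark.sql.functions import count, when, isnan

dataColumns = ['columns in my data frame']  # placeholder for the column names in my DataFrame
df.select([count(when(isnan(c), c)).alias(c) for c in dataColumns]).show(truncate=False)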

But I got the following error message:

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<ipython-input-56-6c7766e33c77> in <module>()
      1 dataColumns=['myDate']
----> 2 df.select([count(when(isnan(c), c)).alias(c) for c in dataColumns]).show(truncate=False)

/usr/local/spark/python/pyspark/sql/dataframe.py in select(self, *cols)
   1667         [Row(name='Alice', age=12), Row(name='Bob', age=15)]
   1668         """
-> 1669         jdf = self._jdf.select(self._jcols(*cols))
   1670         return DataFrame(jdf, self.sql_ctx)
   1671 

/usr/local/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1303         answer = self.gateway_client.send_command(command)
   1304         return_value = get_return_value(
-> 1305             answer, self.gateway_client, self.target_id, self.name)
   1306 
   1307         for temp_arg in temp_args:

/usr/local/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    115                 # Hide where the exception came from that shows a non-Pythonic
    116                 # JVM exception message.
--> 117                 raise converted from None
    118             else:
    119                 raise

AnalysisException: cannot resolve 'isnan(`myDate`)' due to data type mismatch: argument 1 requires (double or float) type, however, '`myDate`' is of timestamp type.;
'Aggregate [count(CASE WHEN isnan(myDate#1994) THEN myDate END) AS myDate#5831]

Can you please help me resolve this issue?
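The error says that isnan only accepts double or float arguments, but myDate is a timestamp. A timestamp can be null but can never be NaN. The sketch below assumes the goal is to count nulls in every column and to count NaN values only in floating-point columns. It checks each column's type in df.dtypes before applying isnan. The helper names nan_checkable_columns and count_missing are hypothetical and are not part of PySpark.

```python
# Spark SQL type names (as reported by DataFrame.dtypes) that isnan() accepts.
NAN_CAPABLE_TYPES = {"double", "float"}


def nan_checkable_columns(dtypes):
    """Return the names of columns whose type allows an isnan() check.

    `dtypes` is a list of (column_name, type_string) pairs, e.g. df.dtypes.
    """
    return [name for name, dtype in dtypes if dtype in NAN_CAPABLE_TYPES]


def count_missing(df):
    """Return a one-row DataFrame with the number of missing values per column.

    Nulls are counted in every column. NaN is additionally counted only in
    double/float columns, so timestamp columns such as myDate never reach isnan().
    """
    # Imported here only so the pure-Python helper above works without PySpark.
    from pyspark.sql.functions import col, count, isnan, lit, when

    nan_cols = set(nan_checkable_columns(df.dtypes))
    exprs = []
    for name in df.columns:
        missing = col(name).isNull()
        if name in nan_cols:
            missing = missing | isnan(col(name))
        # when() without otherwise() yields null when the condition is false,
        # and count() skips nulls, so only the missing rows are counted.
        exprs.append(count(when(missing, lit(1))).alias(name))
    return df.select(exprs)
```

With df as in the question, count_missing(df).show(truncate=False) should print one row containing the missing-value count for each column, including myDate. The literal lit(1) is used instead of the column itself because when(col(name).isNull(), col(name)) would return null for exactly the rows being counted.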

JDoe
    Does this answer your question? [PySpark - Resolving isnan errors with TimeStamp datatype](https://stackoverflow.com/questions/70458759/pyspark-resolving-isnan-errors-with-timestamp-datatype) – blackbishop Feb 16 '22 at 09:58
    Check this out https://stackoverflow.com/questions/44413132/count-the-number-of-missing-values-in-a-dataframe-spark – JAdel Feb 16 '22 at 10:42
