I have a DataFrame with several columns, and before doing any analysis I'd like to understand how complete it is. So I want to count, for each column, the number of non-null values, ideally getting the result back as a DataFrame.
Basically, I am trying to achieve the same result as expressed in this question but using Scala instead of Python.
Say you have:
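// toDF needs `import spark.implicits._` (or `sqlContext.implicits._`) in scope; Option models the nulls
val rows = Seq[(Option[Int], Option[Int], Option[Int])]((Some(0), Some(4), Some(3)), (None, Some(3), Some(4)), (None, None, Some(5)))
val df = sc.parallelize(rows).toDF("x", "y", "z")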
How can you summarize the number of non-null values for each column and return a dataframe with the same number of columns and just a single row with the answer?
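To make the desired result concrete, here is a minimal sketch of one approach that seems to fit. It relies on the fact that Spark's `count(column)` aggregate skips nulls, and applies it to every column in a single `select`. The local `SparkSession`, the app name, and the name `counts` are assumptions made for illustration only; in `spark-shell` the session already exists as `spark`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count}

// Assumption: a local session purely for illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("non-null-counts")
  .getOrCreate()
import spark.implicits._

// Same sample data as above, with Option modelling the missing values.
val df = Seq[(Option[Int], Option[Int], Option[Int])](
  (Some(0), Some(4), Some(3)),
  (None, Some(3), Some(4)),
  (None, None, Some(5))
).toDF("x", "y", "z")

// count(col) ignores nulls, so one aggregate per column gives the
// non-null count; alias keeps the original column names.
val counts = df.select(df.columns.map(c => count(col(c)).alias(c)): _*)
counts.show()
```

For the sample data this should give a single row with 1, 2 and 3 for `x`, `y` and `z`, which is the shape of result I am after: same columns, one row.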