
I have a specific requirement: I need to check whether a DataFrame is empty and, if it is, populate a default value. Here is what I tried, but it is not giving me what I want.

def checkNotEmpty(df: org.apache.spark.sql.DataFrame, col: String): org.apache.spark.sql.DataFrame = {
  if (!df.rdd.isEmpty()) df
  else df.na.fill(0, Seq(col))
}

val age = checkNotEmpty(w_feature_md.filter("age='22'").select("age_index"),"age_index")

The idea is to return the df if it is not empty; if it is empty, fill in a default value of zero. This doesn't seem to work. The following is what I am getting:

scala> age.show
+---------+
|age_index|
+---------+
+---------+

Please help.

Balaji Krishnan

1 Answer

def checkNotEmpty(df: org.apache.spark.sql.DataFrame, col: String): org.apache.spark.sql.DataFrame = {
  if (!df.rdd.isEmpty()) df
  else df.na.fill(0, Seq(col))
}

In your method, control goes to the if branch when the df is not empty, and to the else branch when the df is empty.

df.na (org.apache.spark.sql.DataFrameNaFunctions) provides functionality for working with missing data in DataFrames: it rewrites null values inside existing rows.
Since you are calling df.na on an empty DataFrame, there are no rows and therefore nothing to replace, so the result is always empty.
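To make the point above concrete, here is a minimal sketch, assuming Spark 2.x or later and a local `SparkSession`. The data and the `age`/`age_index` values are made up for illustration. It compares `na.fill` on a DataFrame that has a null cell with the same call on a DataFrame that has no rows at all.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("na-fill-demo").getOrCreate()
import spark.implicits._

// Hypothetical data: one row with an age_index, one row where it is null.
val df: DataFrame = Seq[(String, Option[Double])](("30", Some(1.0)), ("41", None))
  .toDF("age", "age_index")

// na.fill rewrites null cells in rows that exist: the null becomes 0.0.
val filled = df.na.fill(0.0, Seq("age_index"))

// A filter that matches nothing leaves no rows at all,
// so na.fill has nothing to rewrite and the result is still empty.
val noMatch = df.filter("age='22'").select("age_index")
val filledEmpty = noMatch.na.fill(0.0, Seq("age_index"))

println(filled.count())      // 2
println(filledEmpty.count()) // 0
```

The same reasoning applies to `na.replace`: it also only touches values in rows that already exist.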

Check this question for more on replacing null values in a DataFrame.

bob
  • Thanks @p2. Is there a way to fill in a default value of 0 when it is empty? – Balaji Krishnan Sep 16 '16 at 08:32
  • Thanks again. It is still not working as I expected. I also tried ` def checkNotEmpty(df: org.apache.spark.sql.DataFrame, col: String): org.apache.spark.sql.DataFrame = { if (df.rdd.isEmpty()) { println("here"); df.na.fill(0.0, Seq(col)) } else df } `. The value is not NULL but empty, and hence I don't think **df.na.fill** works in this case. – Balaji Krishnan Sep 16 '16 at 09:19
  • Check this question: http://stackoverflow.com/questions/33376571/replace-null-value-in-spark-dataframe – bob Sep 16 '16 at 09:21
  • You can try something like this: df.na.replace("age", Map(35 -> 61, 24 -> 12)).show() – bob Sep 16 '16 at 09:23
  • Thanks once more. This did not help either. I was not sure what we are doing, but I did what you suggested. Here is what I tried; each call prints only the age_index header with no rows: `age.na.replace("age", Map(35 -> 61, 24 -> 12)).show()`, `age.na.replace("age_index", Map(35 -> 61, 24 -> 12)).show()` and `age.na.replace("", Map(35 -> 61, 24 -> 12)).show()`. – Balaji Krishnan Sep 16 '16 at 10:10
  • I guess it should be: age.na.replace("age_index", Map("" -> "EMPTY_VALUE")).show() – bob Sep 16 '16 at 10:27
  • That is not working either. Thank you so much for all the help so far. – Balaji Krishnan Sep 16 '16 at 11:34
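Because `na.fill` and `na.replace` only rewrite values inside rows that already exist, one way to get a default when the DataFrame is empty is to build a new one-row DataFrame instead. The sketch below is an assumption, not the answer's method. It keeps the original `checkNotEmpty` name but adds a hypothetical `default` parameter. It also uses `df.head(1)` rather than `df.rdd.isEmpty()` to test for rows. Every column other than `col` is set to null, and each value is cast to that column's existing type. The `w_feature_md` contents in the usage part are invented stand-ins.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

// Sketch: return `df` unchanged if it has at least one row; otherwise return a
// one-row DataFrame with the same columns, where `col` holds `default` (cast to
// that column's type) and every other column is null.
// `default` is a hypothetical parameter added for illustration.
def checkNotEmpty(df: DataFrame, col: String, default: Any = 0): DataFrame =
  if (df.head(1).nonEmpty) df
  else {
    val cols = df.schema.fields.toSeq.map { f =>
      val value = if (f.name == col) lit(default) else lit(null)
      value.cast(f.dataType).as(f.name)
    }
    // range(1) supplies exactly one row; select replaces its `id` column.
    df.sparkSession.range(1).select(cols: _*)
  }

// Usage with a hypothetical stand-in for w_feature_md (age as a string,
// age_index as a double). In spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("checkNotEmpty-demo").getOrCreate()
import spark.implicits._

val w_feature_md = Seq(("30", 1.0), ("41", 2.0)).toDF("age", "age_index")

// No row has age '22', so this returns a single row with age_index = 0.0.
val age = checkNotEmpty(w_feature_md.filter("age='22'").select("age_index"), "age_index")
age.show()

// A non-empty input is passed through untouched.
val all = checkNotEmpty(w_feature_md.select("age_index"), "age_index")
all.show()
```

Casting the literal to the column's existing type means the same call should work whether `age_index` is an integer or a double column, and the result keeps the same column names as the input.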