I have a DataFrame and I want to use one of the replace() functions of org.apache.spark.sql.DataFrameNaFunctions on it.

Problem: these methods don't show up in the IDE's autocomplete suggestions on the DataFrame instance, even though I imported the class explicitly.

I can't find any demonstration of how to use these functions, or of how to cast a DataFrame to DataFrameNaFunctions.

I tried casting it with asInstanceOf[], but that throws an exception.

eliasah
Parth Vishvajit

1 Answer

This can be a bit confusing, but it's quite straightforward to be honest. Here is a small example:

scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("na_test.csv")
// df: org.apache.spark.sql.DataFrame = [name: string, age: int]

scala> df.show()
// +-----+----+
// | name| age|
// +-----+----+
// |alice|  35|
// |  bob|null|
// |     |  24|
// +-----+----+

scala> df.na.fill(10.0,Seq("age"))
// res4: org.apache.spark.sql.DataFrame = [name: string, age: int]

scala> df.na.fill(10.0,Seq("age")).show
// +-----+---+
// | name|age|
// +-----+---+
// |alice| 35|
// |  bob| 10|
// |     | 24|
// +-----+---+

scala> df.na.replace("age", Map(35 -> 61, 24 -> 12)).show()
// +-----+----+
// | name| age|
// +-----+----+
// |alice|  61|
// |  bob|null|
// |     |  12|
// +-----+----+
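The same DataFrameNaFunctions object has other overloads worth knowing: fill also accepts a per-column Map, and replace works on string columns too. That matters for the row with an empty name, because "" is a value rather than a null, so fill leaves it alone while replace can change it. Below is a minimal sketch, assuming Spark 2.x or later (a SparkSession instead of the sqlContext above) and an in-memory copy of the na_test.csv data; the object name NaOverloadsDemo and its helper methods are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NaOverloadsDemo {
  // Local session for the sketch (assumes Spark 2.x or later on the classpath).
  lazy val spark: SparkSession =
    SparkSession.builder().master("local[1]").appName("na-overloads-demo").getOrCreate()

  // Hypothetical in-memory stand-in for na_test.csv; None becomes null in the nullable "age" column.
  def sample(): DataFrame = {
    import spark.implicits._
    Seq[(String, Option[Int])](("alice", Some(35)), ("bob", None), ("", Some(24)))
      .toDF("name", "age")
  }

  // fill(Map[String, Any]) takes one replacement value per column; only nulls are filled.
  def filledAges(): Seq[Option[Int]] = {
    val filled = sample().na.fill(Map("age" -> 0))
    filled.select("age").collect().toSeq
      .map(r => if (r.isNullAt(0)) None else Some(r.getInt(0)))
  }

  // replace on a string column: swaps the empty name "" for "unknown".
  def replacedNames(): Seq[String] =
    sample().na.replace("name", Map("" -> "unknown"))
      .select("name").collect().toSeq.map(_.getString(0))

  def main(args: Array[String]): Unit = {
    println(filledAges())    // ages after the null in "age" is filled with 0
    println(replacedNames()) // names after "" is replaced with "unknown"
  }
}
```

The Map form of fill is handy when different columns need different defaults, since the Double and String overloads apply one value to every listed column.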

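To access org.apache.spark.sql.DataFrameNaFunctions, call .na on the DataFrame: df.na returns a DataFrameNaFunctions object bound to that DataFrame, so there is nothing to cast. Casting with asInstanceOf fails because DataFrame does not extend DataFrameNaFunctions.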
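To make the type relationship concrete, here is a minimal sketch, assuming Spark 2.x or later and the same in-memory stand-in for na_test.csv. It shows that df.na can be assigned to a value typed DataFrameNaFunctions without any cast, uses drop() from that object, and wraps the asInstanceOf cast from the question in a Try to show that it fails. The object name NaAccessDemo and its methods are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, DataFrameNaFunctions, SparkSession}
import scala.util.Try

object NaAccessDemo {
  // Local session for the sketch (assumes Spark 2.x or later on the classpath).
  lazy val spark: SparkSession =
    SparkSession.builder().master("local[1]").appName("na-access-demo").getOrCreate()

  // Hypothetical in-memory stand-in for na_test.csv.
  def sample(): DataFrame = {
    import spark.implicits._
    Seq[(String, Option[Int])](("alice", Some(35)), ("bob", None), ("", Some(24)))
      .toDF("name", "age")
  }

  // .na is an ordinary method on the DataFrame; its result type is DataFrameNaFunctions.
  def naOf(df: DataFrame): DataFrameNaFunctions = df.na

  // drop() removes every row with a null in any column; "" is not null, so that row stays.
  def namesAfterDrop(): Seq[String] =
    naOf(sample()).drop().select("name").collect().toSeq.map(_.getString(0))

  // The cast from the question: a DataFrame is not a DataFrameNaFunctions,
  // so asInstanceOf compiles but throws ClassCastException at run time.
  def castFails(): Boolean =
    Try(sample().asInstanceOf[DataFrameNaFunctions]).isFailure

  def main(args: Array[String]): Unit = {
    println(namesAfterDrop())
    println(castFails())
  }
}
```

Because .na is just a method returning a separate helper object, autocomplete lists drop, fill and replace after typing df.na., not directly on df.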

eliasah
    Ya, you are right. Sorry for such a silly question; I had no idea that .na gives access to the DataFrameNaFunctions methods. Thank you for answering, @eliasah – Parth Vishvajit Apr 08 '16 at 13:21