17

I saw a solution here, but when I tried it, it didn't work for me.

First, I import a cars.csv file:

val df = sqlContext.read
              .format("com.databricks.spark.csv")
              .option("header", "true")
              .load("/usr/local/spark/cars.csv")

It looks like the following:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|

Then I do this:

df.na.fill("e",Seq("blank"))

But the null values didn't change.

Can anyone help me?

eliasah
Gavin Niu
  • The statement `df.na.fill("e",Seq("blank"))` returns a new `DataFrame`, so `df` will not be modified. Are you assigning the result to a new `DataFrame`? – Rohan Aletty Oct 27 '15 at 19:27

3 Answers

32

This is basically very simple. You'll need to create a new DataFrame. I'm using the DataFrame df that you have defined earlier.

val newDf = df.na.fill("e",Seq("blank"))

DataFrames are immutable structures. Each time you perform a transformation whose result you need to keep, you'll need to assign the transformed DataFrame to a new value.
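To make the immutability point concrete, here is a minimal sketch. It assumes Spark 2.x or later (a `SparkSession` rather than the `sqlContext` used in the question), and the two rows are a hypothetical stand-in for `cars.csv`, not the real file.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[1]").appName("na-fill-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for cars.csv: all columns are strings, as in a CSV read without schema inference.
val rows: Seq[(String, String, String, String, String)] = Seq(
  ("2012", "Tesla", "S", "No comment", ""),
  ("2015", "Chevy", "Volt", null, null)
)
val cars = rows.toDF("year", "make", "model", "comment", "blank")

// The result is discarded here, so `cars` still contains the null.
cars.na.fill("e", Seq("blank"))

// The filled rows exist only in the DataFrame that fill returns.
val filled = cars.na.fill("e", Seq("blank"))

val nullsBefore = cars.filter($"blank".isNull).count()
val nullsAfter = filled.filter($"blank".isNull).count()
```

Calling `filled.show()` instead of `cars.show()` should then display `e` in place of the null in `blank`.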

eliasah
  • There is no error message. The null values come from a 0/0 division, but I cannot replace them with `val newDf = outputDF.na.fill("0", Seq("blank"))` – mathema Feb 25 '19 at 12:10
  • @mathema do you mind asking a question that describes your problem with a reproducible example? I can't figure out what's wrong from your current description, and I don't think your problem fits nicely into comments – eliasah Feb 25 '19 at 12:14
  • My dataframe also has null values that come from a 0/0 division. The field type is a kind of string. I tried to replace the null values using `val newDf = outputDF.na.fill("0", Seq("blank"))` and displaying the result with `newDf.show()`, but it doesn't work. Dataframe example: https://i.imgur.com/qrWZXg8.png – mathema Feb 25 '19 at 12:19
  • This doesn't answer the question that I have asked @mathema – eliasah Feb 25 '19 at 12:20
  • 1
    Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/188992/discussion-between-eliasah-and-mathema). – eliasah Feb 25 '19 at 12:21
  • So, while the above solution works for me, can't we use this approach? It didn't work for me, so I thought I must be doing something wrong with the syntax or usage: `.withColumn("blank", when(col("blank") === null, 0).otherwise(col("blank")))`. It doesn't take effect even after assigning the result to another DF variable. – Ak777 Oct 06 '20 at 18:14
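
On the `=== null` attempt in the last comment: in Spark SQL, an equality comparison against null evaluates to null rather than true, so the `when` branch is never taken. `Column.isNull` is the null test. A minimal sketch, assuming Spark 2.x or later and a hypothetical two-row DataFrame with a string `blank` column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[1]").appName("is-null-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data: one empty string and one null in `blank`.
val rows: Seq[(String, String)] = Seq(("Tesla", ""), ("Chevy", null))
val cars = rows.toDF("make", "blank")

// `=== null` yields null for every row, so `otherwise` always wins and the null stays.
val viaEquals = cars.withColumn("blank",
  when(col("blank") === null, lit("0")).otherwise(col("blank")))

// `isNull` is true exactly for the null row, so that row gets the replacement.
val viaIsNull = cars.withColumn("blank",
  when(col("blank").isNull, lit("0")).otherwise(col("blank")))

val nullsViaEquals = viaEquals.filter(col("blank").isNull).count()
val nullsViaIsNull = viaIsNull.filter(col("blank").isNull).count()
```

The replacement is written as `lit("0")` here so that both branches of `when` have the same string type as the column.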
3

You can achieve the same in Java this way:

Dataset<Row> filteredData = dataset.na().fill(0);
Bhagwati Malav
0

If the column were of string type,

val newdf = df.na.fill("e", Seq("blank"))

would work.

Since it is a float type (as the image shows), the replacement value must be numeric as well, so you need to use:

val newdf = df.na.fill(0.0, Seq("blank"))
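
The type dependence can be checked with a small sketch. It assumes Spark 2.x or later and a hypothetical DataFrame whose `blank` column is a nullable double. A string replacement value is ignored for non-string columns, while a numeric one is applied.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[1]").appName("na-fill-types").getOrCreate()
import spark.implicits._

// Hypothetical data: `blank` is a nullable double column with one missing value.
val ratios: Seq[(String, Option[Double])] = Seq(("a", Some(1.5)), ("b", None))
val df = ratios.toDF("id", "blank")

// String replacement: skipped because `blank` is not a string column.
val stringFill = df.na.fill("e", Seq("blank"))

// Numeric replacement: matches the column type, so the null becomes 0.0.
val doubleFill = df.na.fill(0.0, Seq("blank"))

val leftAfterString = stringFill.filter($"blank".isNull).count()
val leftAfterDouble = doubleFill.filter($"blank".isNull).count()
```

If a fill appears to do nothing even after assigning the result, running `df.printSchema()` is a quick way to see which type the column actually has.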

Y. Yazarel