I have a Spark DataFrame in which some rows have missing values in column B. I want to replace each null in B with the value of column A from the same row.

A       B
2017    209
2019    208
2016    NA
2016    NA
2018    209

expected output:

A       B
2017    209
2019    208
2016    2016
2016    2016
2018    209

I have tried using

na.replace

ifelse(is.na(df$B), df$A, df$B)

df$B[is.na(df$B)] = as.character(df$A[is.na(df$B)])

but the output comes back unchanged.

Vasudha Jain
  • In R, that would be `df$B[is.na(df$B)] <- df$A[is.na(df$B)]`. Not sure if it would be any different in Spark. – Ronak Shah Aug 14 '19 at 11:36
  • It seems you are trying to merge these DataFrames in Spark. See whether this works: https://stackoverflow.com/questions/53587175/spark-incremental-loading-overwrite-old-record/53590644#53590644 – vikrant rana Aug 14 '19 at 11:53

2 Answers

Using dplyr:

library(dplyr)
df <- df %>%
  mutate(B = ifelse(is.na(B), A, B))
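
For reference, here is a minimal self-contained sketch of this approach on a local R data frame rebuilt from the question's sample data. It assumes the dplyr package is installed. Whether the same pipeline runs unchanged against a Spark table depends on how the data was loaded (for example through sparklyr rather than SparkR), which the question does not state.

```r
library(dplyr)

# Sample data from the question, as a local (non-Spark) data frame
df <- data.frame(A = c(2017, 2019, 2016, 2016, 2018),
                 B = c(209, 208, NA, NA, 209))

# Replace each NA in B with the value of A from the same row
df <- df %>%
  mutate(B = ifelse(is.na(B), A, B))

print(df)
```
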
Aaron Parrilla

You need to use SparkR's own column functions on a Spark DataFrame, in this case isNull and ifelse:

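library(SparkR)
sparkR.session()  # start or attach to a Spark session

# Build the sample data locally, then convert it to a Spark DataFrame
df = data.frame('A' = c(2017, 2019, 2016, 2016, 2018), 'B' = c(209, 208, NA, NA, 209))
spark_df = as.DataFrame(df)

# isNull() and ifelse() are applied to SparkR columns here
spark_df$B = ifelse(isNull(spark_df$B), spark_df$A, spark_df$B)
head(spark_df)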
     A    B
1 2017  209
2 2019  208
3 2016 2016
4 2016 2016
5 2018  209
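
As a possible alternative, not part of the original answer, SparkR also provides a column-level coalesce(), which returns for each row the first of its arguments that is not null. The sketch below assumes a working local Spark installation and that the NA values in the local data frame arrive in Spark as nulls, as they do in the isNull example above.

```r
library(SparkR)
sparkR.session()  # assumes Spark is installed and reachable locally

# Sample data from the question, converted to a Spark DataFrame
df <- data.frame(A = c(2017, 2019, 2016, 2016, 2018),
                 B = c(209, 208, NA, NA, 209))
spark_df <- as.DataFrame(df)

# Take B where it is not null, otherwise fall back to A
spark_df$B <- coalesce(spark_df$B, spark_df$A)

# Bring the result back to a local data frame for inspection
result <- collect(spark_df)
print(result)
```

This avoids spelling out the null test by hand, and it extends naturally to more than one fallback column by passing additional arguments.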
liamvt