Replace null values in a column with the value from another column in a Spark DataFrame in R

Question

I have a Spark DataFrame that contains null values in some rows of one column. I want to replace each null with the value from another column in the same row.

A       B
2017    209
2019    208
2016    NA
2016    NA
2018    209

expected output:

A       B
2017    209
2019    208
2016    2016
2016    2016
2018    209

I have tried using

na.replace

ifelse(is.na(df$B), df$A, df$B)

df$B[is.na(df$B)] = as.character(df$A[is.na(df$B)])

but the output comes back unchanged.

In R, that would be `df$B[is.na(df$B)] <- df$A[is.na(df$B)]`; not sure if it would be any different in Spark. — Ronak Shah, Aug 14 '19 at 11:36
It seems you are trying to merge these DataFrames in Spark. See whether the approach below works: https://stackoverflow.com/questions/53587175/spark-incremental-loading-overwrite-old-record/53590644#53590644 — vikrant rana, Aug 14 '19 at 11:53

score 1 · Accepted Answer · answered Aug 14 '19 at 11:41

Using dplyr:

library(dplyr)
df <- df %>%
  mutate(B = ifelse(is.na(B), A, B))

Aaron Parrilla

score 1 · Answer 2 · answered Aug 14 '19 at 15:23

You need to use the SparkR-specific functions on a Spark DataFrame, in this case isNull and ifelse:

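library(SparkR)
sparkR.session()  # start or attach to a Spark session

# Build a local data frame and convert it to a Spark DataFrame
df <- data.frame(A = c(2017, 2019, 2016, 2016, 2018), B = c(209, 208, NA, NA, 209))
spark_df <- as.DataFrame(df)

# Where B is null, take the value from A; otherwise keep B
spark_df$B <- ifelse(isNull(spark_df$B), spark_df$A, spark_df$B)
head(spark_df)
#      A    B
# 1 2017  209
# 2 2019  208
# 3 2016 2016
# 4 2016 2016
# 5 2018  209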
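
The same fill can probably be written more compactly with SparkR's column-level coalesce, which picks the first non-null value among its arguments. This is only a sketch under assumptions: it is not part of the original answer, it assumes a local Spark installation that SparkR can start, and it assumes a SparkR version recent enough to provide coalesce for columns. The names spark_df and result are just illustrative.

```r
library(SparkR)

# Assumption: a local Spark installation is available to SparkR.
sparkR.session()

# Same example data as in the question.
df <- data.frame(A = c(2017, 2019, 2016, 2016, 2018),
                 B = c(209, 208, NA, NA, 209))
spark_df <- as.DataFrame(df)

# coalesce() returns, row by row, the first argument that is not null,
# so B is kept where it is present and A is used where B is null.
spark_df$B <- coalesce(spark_df$B, spark_df$A)

# Bring the result back to a local R data frame for inspection.
result <- collect(spark_df)
```

If dplyr is loaded in the same session, its own coalesce may mask the SparkR one, so calling SparkR::coalesce explicitly is the safer choice in that case.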
