Is there a dplyr equivalent of the following? I'm after a 'replace all' that sets every value matching the string "xxx" to NA.

is.na(df) <- df=="xxx" 

I want to run this as a sparklyr command on a Spark data frame, using the pipe operator from R:

tbl(sc,"df") %>%

and appending the first script above to the pipe doesn't work.

Gregor Thomas
Choc_waffles
  • Do you want to replace NA values with a specified string? Or the other way around? Your first statement is confusing to me. – Dale Kube Jul 14 '17 at 04:56
  • Replace all values equal to the string "xxx" with NA. The first script assigns NA to every value in df that matches "xxx". – Choc_waffles Jul 14 '17 at 04:58

1 Answer

Replace "XXX" with the string you want to look for:

# Using dplyr piping (the pipe passes df to lapply as its first argument)
library(dplyr)
df[] <- df %>% lapply(function(x) ifelse(grepl("XXX", x), NA, x))

# Using only the base package
df[] <- lapply(df, function(x) ifelse(grepl("XXX", x), NA, x))

This method processes each column of the data frame in turn, looks for "XXX" in its values, and replaces every matching value with NA.
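
As a small illustration of the column-by-column approach, the sketch below runs it on a made-up local data frame (the names `df`, `by_pattern`, and `by_equality` and the sample values are assumptions for the example, not taken from the question). It also contrasts the pattern match done by `grepl`, which flags any value containing "XXX", with the exact comparison used in the question's `is.na(df) <- df == "xxx"`.

```r
# Hypothetical example data; stringsAsFactors = FALSE keeps the columns as character
df <- data.frame(a = c("XXX", "foo", "XXXL"),
                 b = c("bar", "XXX", "baz"),
                 stringsAsFactors = FALSE)

# Pattern match, as in the answer: any value containing "XXX" becomes NA
# (fixed = TRUE treats "XXX" as a literal string rather than a regular expression)
by_pattern <- df
by_pattern[] <- lapply(by_pattern, function(x) ifelse(grepl("XXX", x, fixed = TRUE), NA, x))

# Exact match, as in the question: only values equal to "XXX" become NA
by_equality <- df
by_equality[] <- lapply(by_equality, function(x) ifelse(x == "XXX", NA, x))
```

With this data, the value "XXXL" is set to NA by the pattern version but left alone by the equality version, so the choice between the two depends on whether partial matches should count.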

Gregor Thomas
Dale Kube
  • Running the pipe followed by collect (from Spark to a local R data frame) gave me the error "Error in UseMethod("collect") : no applicable method for 'collect' applied to an object of class "list"". So I improvised with as.data.frame(do.call(cbind, lapply(.,function(x)ifelse(grepl("xxx",x)==T,NA,x))), stringsAsFactors=FALSE), and none of the "xxx" values were converted to NA. – Choc_waffles Jul 18 '17 at 02:48
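
The error quoted in the comment above arises because `lapply` returns a plain list, and `collect` has no method for lists. One way to stay within dplyr verbs is to express the replacement with `mutate` and `across`. The sketch below defines a hypothetical helper, `replace_with_na` (not part of dplyr or sparklyr), and runs it on a made-up local data frame. Whether the same pipeline translates cleanly to Spark SQL when applied to `tbl(sc, "df")` is an assumption that has not been verified here.

```r
library(dplyr)

# Hypothetical helper: set every value equal to `target` to NA, in all columns
replace_with_na <- function(data, target) {
  data %>%
    mutate(across(everything(), ~ ifelse(.x == target, NA, .x)))
}

# Made-up local data standing in for the Spark table
local_df <- data.frame(a = c("xxx", "foo"),
                       b = c("bar", "xxx"),
                       stringsAsFactors = FALSE)

result <- replace_with_na(local_df, "xxx")
```

If the translation holds, the Spark version would presumably read `tbl(sc, "df") %>% replace_with_na("xxx") %>% collect()`, keeping the replacement on the Spark side and collecting only the finished table.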