I am using sparklyr and it seems to be working well. However, some of my existing code no longer works when run against Spark.

When I use

complete.cases

I get

Error: org.apache.spark.sql.AnalysisException: undefined function COMPLETE.CASES

I get the same error for the quantile function.

Furthermore, it seems that is.na is not computed the same way on Spark DataFrames. So when I do

filter(!is.na(V1) & is.na(V2))

I get an empty dataframe instead of all rows where V1 is populated and V2 is empty.

Any advice on how these functions can be used or modified for sparklyr, or how wrappers for them can be constructed?

Levi Brackman
  • Maybe you want `filter(!is.na(V1) & !is.na(V2))`? The `!` will take precedence over the `&` as you have it, giving rows where `V1` is not missing and `V2` is missing. – Gregor Thomas Nov 03 '16 at 20:50
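
To make the precedence point in the comment above concrete, here is a small base R sketch on two made-up local vectors. The names `v1` and `v2` are illustrative stand-ins for the columns, not the asker's data. It only shows how R evaluates the two expressions on in-memory vectors; how Spark represents missing values after a copy may differ.

```r
# Hypothetical stand-ins for the columns V1 and V2
v1 <- c(1, NA, 3, NA)
v2 <- c(NA, NA, 5, 6)

# `!` binds tighter than `&`, so this reads as (!is.na(v1)) & is.na(v2):
# TRUE only where v1 is present and v2 is missing
present_v1_missing_v2 <- !is.na(v1) & is.na(v2)

# Negating both tests keeps positions where both values are present
both_present <- !is.na(v1) & !is.na(v2)

print(present_v1_missing_v2)
print(both_present)
```

The second expression is the per-column analogue of `complete.cases` restricted to these two columns.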

1 Answer

You can use na.omit as in:

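library(sparklyr)
library(dplyr)
library(nycflights13)  # provides the flights data set

sc <- spark_connect(master = "local")
tbl_flights <- copy_to(sc, flights)

tbl_flights %>% na.omit()  # drops rows containing any missing value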
Javier Luraschi
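
As a possible follow-up to the answer above, the sketch below shows one way to cover the other two cases from the question without `complete.cases` or base R's `quantile`. It assumes a local Spark installation and a toy table with columns `V1` and `V2`; the data, the table name `toy_df`, and the variable names are hypothetical. The `filter` call relies on dplyr's SQL translation of `is.na`, and `sdf_quantile` is sparklyr's approximate quantile function, so its results may differ slightly from `quantile` on local data.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Hypothetical toy data mirroring the V1/V2 columns in the question
df <- data.frame(V1 = c(1, NA, 3, 4), V2 = c(NA, NA, 5, 6))
toy_tbl <- copy_to(sc, df, "toy_df", overwrite = TRUE)

# Complete cases on selected columns, expressed as a filter that Spark can run
complete_rows <- toy_tbl %>% filter(!is.na(V1) & !is.na(V2))
complete_local <- collect(complete_rows)

# Approximate quantiles computed inside Spark rather than with quantile()
q <- sdf_quantile(complete_rows, "V1", probabilities = c(0.25, 0.5, 0.75))

spark_disconnect(sc)
```

Filtering on named columns keeps the work inside Spark, whereas `na.omit` applies to every column of the table.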