is.na and quantile with sparklyr

Question

I am using sparklyr and it seems to be working well. However, some of my existing code does not work with it.

When I use

complete.cases

I get

Error: org.apache.spark.sql.AnalysisException: undefined function COMPLETE.CASES

I get the same kind of error for the quantile function.

Furthermore, it seems that is.na is not computed the same way on Spark dataframes. So when I do

filter(!is.na(V1) & is.na(V2))

I get an empty dataframe instead of all rows where V1 is populated and V2 is empty.

Any advice on how these functions can be used or modified for sparklyr, or how wrappers for them can be constructed?

Maybe you want `filter(!is.na(V1) & !is.na(V2))`? The `!` will take precedence over the `&` as you have it, giving rows where `V1` is not missing and `V2` is missing. — Gregor Thomas, Nov 03 '16 at 20:50

score 0 · Answer 1 · answered Nov 16 '16 at 16:25


You can use na.omit in place of complete.cases, as in:

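library(sparklyr)
library(dplyr)
library(nycflights13)  # assumed source of the flights data
sc <- spark_connect(master = "local")
tbl_flights <- copy_to(sc, flights)

tbl_flights %>% na.omit()  # drop rows with a missing value in any column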
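
For the quantile part of the question, one possible approach is sketched below. It assumes a sparklyr version that provides sdf_quantile, and that flights comes from the nycflights13 package. The dep_delay column and the chosen probabilities are only illustrative. The second call relies on dplyr passing unknown functions through to Spark SQL, where percentile_approx is a built-in aggregate. Both results are approximate, not exact quantiles.

```r
library(sparklyr)
library(dplyr)
library(nycflights13)  # assumption: flights is taken from this package

sc <- spark_connect(master = "local")
tbl_flights <- copy_to(sc, flights, "flights", overwrite = TRUE)

# Drop missing delays first so the quantile estimate ignores them.
delays <- tbl_flights %>% filter(!is.na(dep_delay))

# Approximate quartiles of one column, computed inside Spark.
quartiles <- sdf_quantile(delays, "dep_delay",
                          probabilities = c(0.25, 0.5, 0.75))

# The same idea through dplyr: percentile_approx is not an R function,
# so it is passed through untranslated to Spark SQL.
median_delay <- delays %>%
  summarise(median_dep_delay = percentile_approx(dep_delay, 0.5)) %>%
  collect()

spark_disconnect(sc)
```

Here quartiles is an ordinary R numeric vector and median_delay is a one-row local data frame, so both can be used with regular R code afterwards.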


Javier Luraschi
