I want to calculate the correlation matrix of a Spark table in R. I tried using cor() as in base R, but it does not work. Here is the code:

library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")

flights_tbl <- copy_to(sc, nycflights13::flights, "flights")
data = flights_tbl

numeric_data = select_if(datos,function(col) is.numeric(col))

Then I tried cor(numeric_data) and this is what I get:

>cor(numeric_data)
Error in cor(numeric_data) : supply both 'x' and 'y' or a matrix-like 'x'

I am using

Spark 2.0.2
dplyr 0.7.2
sparklyr 0.7.0-9000

So how can I get the correlation matrix?

Joe
  • What do you expect by calculating correlation of `flights` data.frame? Have you tried doing this in R? Does it work? What do you think the error is trying to tell you? You are using `cor(data)` and `cor(numeric_data)`. Which one is giving you which error? Notice that there is no creation of the `datos` object in your code. – Roman Luštrik Sep 27 '17 at 14:38
  • @RomanLuštrik I just want to know how to calculate the correlation of a Spark data frame in R. It is cor(numeric_data); sorry, that was a typing error. – Joe Oct 03 '17 at 19:48
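
For context on the error: base R's cor() expects a local matrix or data frame, while a table created with copy_to() is only a reference to data that lives in Spark. One possible workaround, sketched below, is to bring the data into R with collect() and call cor() on the local copy. This assumes the data fits in local memory; the names local_numeric and cor_local are illustrative and do not appear in the question.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
flights_tbl <- copy_to(sc, nycflights13::flights, "flights", overwrite = TRUE)

# Assumption: the table is small enough to fit in local memory.
# collect() downloads the Spark table into an ordinary local tibble.
local_numeric <- flights_tbl %>%
  collect() %>%
  select_if(is.numeric)

# cor() now receives a local data frame, so the original error does not apply.
# pairwise.complete.obs skips missing values pair by pair instead of returning NA.
cor_local <- cor(local_numeric, use = "pairwise.complete.obs")

spark_disconnect(sc)
```

Columns that never vary (for example, year in this data set) have zero standard deviation, so their correlations come back as NA with a warning. For tables too large to collect, newer sparklyr releases provide ml_corr(), which computes the matrix inside Spark, but as far as I know it relies on a Spark version later than the 2.0.2 listed in the question.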

0 Answers