I'm working with some tables that I want to join. Because of the table sizes, I use sparklyr with dplyr's left_join. Here is the code sample:

query.1 <- left_join(pa11, pa12, by = c("CODIGO_HAB_D", "ID_EST", "ID_ME", "ID_PARTE_D", "ID_PAR", "ID_REP")) %>%
  left_join(pa13, by = "ID_SINI")

query.1 <- left_join(query.1, a14, by = "ID_REP")
query.1 <- left_join(query.1, a16, by = c("ID_MEJ" = "ID_ME"))
query.1 <- left_join(query.1, a17, by = c("ID_EST" = "ID_ESTE"))
query.1 <- left_join(query.1, a18, by = "ID_PARTE_D")
query.1 <- left_join(query.1, a19, by = "CODI")
query.1 <- left_join(query.1, a110, by = c("ID_PROF.x" = "ID_PROF"))
query.1 <- left_join(query.1, a111, by = c("ID_COM.x" = "ID_COM"))
query.1 <- left_join(query.1, a113, by = c("ID_GRANDES.x" = "ID_GRANDES"))
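
For reference, each of the pa/a objects above is a dplyr reference to a table already in Spark. Roughly how they are created (simplified sketch; the actual master settings and source tables differ):

library(sparklyr)
library(dplyr)

# Simplified connection; real cluster settings differ
sc <- spark_connect(master = "yarn-client")

# Each paXX / aXX object is a dplyr reference to a table registered in Spark
pa11 <- tbl(sc, "pa11")
pa12 <- tbl(sc, "pa12")
pa13 <- tbl(sc, "pa13")
# ... and so on for a14, a16, a17, a18, a19, a110, a111, a113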

When I left_join the first 5 tables, everything goes as expected. When I repeat this with more tables, I get this error:

Error in as.vector(x, "character") : 
cannot coerce type 'environment' to vector of type 'character'

Then, when I try to take a look at the Spark table, I get an error in RStudio.

1 Answer


I get these errors from time to time, for various reasons.

In my experience, increasing the sparklyr driver/executor memory and the executor memory overhead helps:

    config <- spark_config()
    config$`sparklyr.shell.driver-memory` <- "8G"
    config$`sparklyr.shell.executor-memory` <- "8G"
    config$spark.yarn.executor.memoryOverhead <- "2g"
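
The config only takes effect if it is passed to spark_connect() when the connection is opened, for example (the master value is just a placeholder for your cluster):

    # Re-open the connection with the updated configuration
    sc <- spark_connect(master = "yarn-client", config = config)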