9

I would like to remove a single data table from the Spark context ('sc'). I know a cached table can be un-cached, but that isn't the same as removing the object from the sc -- as far as I can tell.

library(sparklyr)
library(dplyr)
library(titanic)
library(Lahman)

spark_install(version = "2.0.0")
sc <- spark_connect(master = "local")

batting_tbl <- copy_to(sc, Lahman::Batting, "batting")
titanic_tbl <- copy_to(sc, titanic_train, "titanic", overwrite = TRUE)
src_tbls(sc) 
# [1] "batting" "titanic"

tbl_cache(sc, "batting") # Speeds up computations -- loaded into memory
src_tbls(sc) 
# [1] "batting" "titanic"

tbl_uncache(sc, "batting")
src_tbls(sc) 
# [1] "batting" "titanic"

To disconnect the complete sc, I would use spark_disconnect(sc), but in this example that would destroy both the "titanic" and "batting" tables stored inside the sc.

Instead, I would like to delete a single table, e.g. "batting", with something like spark_disconnect(sc, tableToRemove = "batting"), but that doesn't seem possible.

eyeOfTheStorm

2 Answers

18
dplyr::db_drop_table(sc, "batting")

I tried this function and it seems to work.

Sonic
  • That looks right to me! Will mark this as correct, unless someone else can prove otherwise. Even by caching without uncaching, it appears the table gets deleted after calling `src_tbls(sc)`. Thanks! – eyeOfTheStorm Dec 20 '16 at 18:28
  • 1
    It returned: Error in UseMethod("db_drop_table"): no applicable method for 'db_drop_table' applied to an object of class "c('spark_connection', 'spark_shell_connection', 'DBIConnection')" – Onur Demir Jul 07 '22 at 18:33
  • 1
    This doesn't work anymore, see https://stackoverflow.com/questions/70710414/in-sparklyr-how-to-drop-a-existing-object-table-thanks – Frank Nov 15 '22 at 19:27
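As the comments above note, db_drop_table has since been removed from dplyr, so on recent versions that call errors out. A hedged alternative sketch, assuming sc is an open sparklyr connection (which supports DBI) and that a table named "batting" exists: issue the DROP statement directly with DBI::dbExecute. Note that copy_to may register the data as a temporary view rather than a managed table, in which case DROP VIEW may be needed instead of DROP TABLE.

```r
library(DBI)

# Assumption: `sc` is a live sparklyr connection and "batting" was
# registered by copy_to(). Drop it via a SQL statement.
dbExecute(sc, "DROP TABLE IF EXISTS batting")
# If copy_to() created a temporary view instead of a table, use:
# dbExecute(sc, "DROP VIEW IF EXISTS batting")

src_tbls(sc)  # "batting" should no longer be listed
```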
8

A slightly lower-level alternative is

tbl_name <- "batting"
DBI::dbGetQuery(sc, paste("DROP TABLE", tbl_name))
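A small variation on the above, assuming the same connection: DBI::dbExecute is the DBI call intended for statements that return no result set (like DDL), so it can be substituted for dbGetQuery here, and DBI::sqlInterpolate or paste can build the statement from a variable table name.

```r
# Assumption: `sc` is a live sparklyr connection holding a "batting" table.
tbl_name <- "batting"

# dbExecute() is meant for statements with no result set (DDL/DML);
# IF EXISTS makes the call safe to repeat.
DBI::dbExecute(sc, paste("DROP TABLE IF EXISTS", tbl_name))
```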
Richie Cotton