I have an internal package that uses S3 method dispatch to enable lazy operations on what are likely large tables. Currently, I support:

  • "data.frame", in the case where I already downloaded a satisfying subset;
  • "Dataset", derived from arrow::open_dataset; I chose this (out of several, see below) because it seemed the most agnostic to whatever arrow could throw my way (e.g., local parquet file or Arrow Flight);
  • "arrow_dplyr_query", where the arrow connection has been "touched" by a dplyr lazy operation;
  • for legacy reasons, I also dispatch on database connections (e.g., "Microsoft SQL Server") and deal with the SQL literally.
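For concreteness, the dispatch layout above can be sketched roughly as follows. This is only an illustration: `lazy_op` is a hypothetical generic name (the real package's functions are not shown here), and the mock objects carry nothing but the class vectors, so neither arrow nor a database is needed to run it.

```r
# Hypothetical generic standing in for the package's real one.
lazy_op <- function(x, ...) UseMethod("lazy_op")

lazy_op.data.frame <- function(x, ...) "data.frame: already in memory"
lazy_op.Dataset <- function(x, ...) "Dataset: arrow dataset"
lazy_op.arrow_dplyr_query <- function(x, ...) "arrow_dplyr_query: lazy arrow query"

# Class names containing spaces need backticks in the method name.
`lazy_op.Microsoft SQL Server` <- function(x, ...) "connection: literal SQL"

lazy_op.default <- function(x, ...) {
  stop("unsupported class: ", paste(class(x), collapse = "/"))
}

# Mock objects: only the class attribute matters for S3 dispatch.
mock_ds  <- structure(list(), class = c("FileSystemDataset", "Dataset", "ArrowObject", "R6"))
mock_q   <- structure(list(), class = "arrow_dplyr_query")
mock_con <- structure(list(), class = "Microsoft SQL Server")

lazy_op(mtcars)    # data.frame method
lazy_op(mock_ds)   # Dataset method ("FileSystemDataset" has no method, so dispatch moves on)
lazy_op(mock_q)    # arrow_dplyr_query method
lazy_op(mock_con)  # connection method
```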

I'd like to add support for a dbplyr lazy table (e.g., tbl(con, "TableName")), which inherits from multiple classes:

class(tb)
# [1] "tbl_Microsoft SQL Server" "tbl_dbi"                  "tbl_sql"                  "tbl_lazy"                
# [5] "tbl"                     

I'm having a difficult time determining which class is best to dispatch on (by definition or intent). tbl might be okay, but it is fairly close to data.frame; will it fail me? Since several DBMS types are actually in use (beyond mssql), I don't want to dispatch on the first, backend-specific class. I believe tbl_dbi, tbl_sql, and tbl_lazy would all likely satisfy the intent.

I believe a class is often defined formally only by the functions written to dispatch on it. Since I don't know how the authors intend these classes to be used (other than by inferring from the methods that exist), are there risks in dispatching on the (seemingly) top-level "tbl"?
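One way to probe this risk without a live connection is to mock the class vectors and watch which method S3 picks. The sketch below is only an experiment, not a statement of the dbplyr authors' intent: `probe` is a hypothetical generic, and the tibble class vector `c("tbl_df", "tbl", "data.frame")` is included because it also contains "tbl".

```r
# Hypothetical generic used only to observe dispatch order.
probe <- function(x, ...) UseMethod("probe")
probe.data.frame <- function(x, ...) "data.frame method"
probe.tbl <- function(x, ...) "tbl method"

# Mock objects: S3 dispatch only looks at the class attribute.
mock_tibble <- structure(list(), class = c("tbl_df", "tbl", "data.frame"))
mock_dbtbl <- structure(
  list(),
  class = c("tbl_Microsoft SQL Server", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")
)

# "tbl" comes before "data.frame" in a tibble's class vector,
# so a tbl method intercepts in-memory tibbles as well.
probe(mock_tibble)  # "tbl method"
probe(mock_dbtbl)   # "tbl method"

# A method on a more specific class separates the two cases,
# regardless of which DBMS-specific class comes first.
probe.tbl_lazy <- function(x, ...) "tbl_lazy method"
probe(mock_dbtbl)   # "tbl_lazy method"
probe(mock_tibble)  # "tbl method"
```

In this mock-up, dispatching on "tbl" would catch in-memory tibbles before the data.frame method, while "tbl_lazy" would not; whether "tbl_lazy", "tbl_sql", or "tbl_dbi" best matches the intended semantics is still the open question.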

Some example classes:

library(dplyr)
# con <- DBI::dbConnect(...)
tb <- tbl(con, "TableName")
pq <- arrow::open_dataset("myfile.pq")

class(pq)
# [1] "FileSystemDataset" "Dataset"           "ArrowObject"       "R6"               
pq %>%
  filter(Name == "ABC") %>%
  class()
# [1] "arrow_dplyr_query"

class(tb)
# [1] "tbl_Microsoft SQL Server" "tbl_dbi"                  "tbl_sql"                  "tbl_lazy"                
# [5] "tbl"                     
tb %>%
  filter(Name == "ABC") %>%
  class()
# [1] "tbl_Microsoft SQL Server" "tbl_dbi"                  "tbl_sql"                  "tbl_lazy"                
# [5] "tbl"                     
r2evans