Say I have a function called boop. It has different behaviour depending on the class of its argument, so I use generics, like so:
library(dplyr)
df <- data.frame(a = c("these", "are", "some", "strings"),
                 b = 1:4)
boop <- function(x, ...) UseMethod("boop", x)
boop.numeric <- function(x) mean(x, na.rm = TRUE)
boop.character <- function(x) mean(nchar(x), na.rm = TRUE)
df %>% summarise(across(everything(), boop))
#      a   b
# 1 4.75 2.5
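To make the dispatch explicit, the same generic can be called on plain vectors without dplyr. This is only a minimal sketch in base R that repeats the definitions above; the `res_*` names are hypothetical, and the methods accept `...` here purely to match the generic's signature.

```r
# S3 generic and methods, mirroring the definitions above
boop <- function(x, ...) UseMethod("boop")
boop.numeric <- function(x, ...) mean(x, na.rm = TRUE)
boop.character <- function(x, ...) mean(nchar(x), na.rm = TRUE)

# Dispatch follows the class of the argument
res_chr <- boop(c("these", "are", "some", "strings"))  # uses boop.character
res_int <- boop(1:4)               # integer has no method of its own, so boop.numeric is used
res_dbl <- boop(c(1.5, NA, 2.5))   # uses boop.numeric; the NA is dropped

print(c(character = res_chr, integer = res_int, double = res_dbl))
```

The point is that the method is chosen from the class of an R vector, which is exactly the information that is not available in the same form before the data is collected.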
Perfect! Now, say I want to use boop with a Parquet file before collecting the data. I can write dplyr code similar to the above for the summarise, but first I need to register my functions. For example:
library(arrow)
register_scalar_function(
  "boop.numeric",
  function(context, x) {
    mean(x, na.rm = TRUE)
  },
  in_type = schema(x = float64()),
  out_type = float64(),
  auto_convert = TRUE
)
But how do I first define boop as a generic? If I translate my original boop directly into an arrow format, I need to define the input schema. However, unlike boop.numeric or boop.character, it's generic, so x doesn't have a class.
Question: How do I use generics, such as the one shown above, with Apache Arrow prior to collecting data?