There is no `on` argument in the dplyr `*_join` functions. The usage according to `?dplyr::full_join` is:
full_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = NULL
)
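As a quick check of the usage above, the argument names of the generic can be inspected directly. This is a minimal sketch; `arg_names` is just an illustrative variable name.

```r
library(dplyr)

# Argument names of the full_join generic
arg_names <- names(formals(dplyr::full_join))
print(arg_names)

# `by` is a formal argument, `on` is not
"by" %in% arg_names
"on" %in% arg_names
```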
`on` is a data.table join argument. Here we need `by`:
library(arrow)
library(dplyr)
full_join(xa1, xa2, by = "x") %>%
  collect() %>%
  compute()
-output
# A tibble: 2 × 1
      x
  <int>
1     1
2     2
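The objects `xa1` and `xa2` come from the question and are not defined here. For a self-contained run, a minimal sketch with two hypothetical one-column Arrow tables (the values are assumed, chosen only to match the shape of the output above) could look like this:

```r
library(arrow)
library(dplyr)

# Hypothetical stand-ins for the question's xa1 and xa2
xa1 <- arrow_table(x = 1L)
xa2 <- arrow_table(x = 2L)

# `by` names the key column; collect() pulls the result into a tibble
res <- full_join(xa1, xa2, by = "x") %>%
  collect()

# Row order after an Arrow join is not guaranteed, so sort before inspecting
res <- arrange(res, x)
print(res)
```

Sorting with `arrange()` is only there to make the result easy to compare; it is not needed for the join itself.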
This can be confirmed by looking at the methods and the source code:
> methods("full_join")
[1] full_join.arrow_dplyr_query* full_join.ArrowTabular* full_join.data.frame* full_join.Dataset* full_join.RecordBatchReader*
> getAnywhere(full_join.ArrowTabular)
function (x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"),
    ..., keep = FALSE)
{
    query <- do_join(x, y, by, copy, suffix, ..., keep = keep,
        join_type = "FULL_OUTER")
    if (!keep) {
        query$selected_columns <- post_join_projection(names(x),
            names(y), handle_join_by(by, x, y), suffix)
    }
    query
}
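`by` is passed on to the functions called inside the method (`do_join()` and `handle_join_by()`), so it is the argument that specifies the join columns.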