Assume two data sets that are linked by common columns; let's call them "recipient" and "donor". One can combine the data sets using merge (R) or LEFT OUTER JOIN (SQL).
Often, in my applications, there is supposed to be one (and exactly one) matching donor record for each recipient record. If this is violated, the result contains either missing values (no matching donor record) or duplicated recipient records (several matching donor records).
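To make the two failure modes concrete, here is a small sketch with made-up data (the names `don_missing` and `don_dup` are purely illustrative). It shows that a plain `merge` with `all.x = TRUE` raises no error in either case:

```r
# Illustrative data: three recipient rows keyed by `id`.
rec <- data.frame(id = letters[1:3], a = 1)

# Donor with no record for id "c": the merge silently yields NA in `b`.
don_missing <- data.frame(id = letters[1:2], b = 2)
res_missing <- merge(rec, don_missing, all.x = TRUE)

# Donor with two records for id "a": the merge silently duplicates that row.
don_dup <- data.frame(id = c("a", "a", "b", "c"), b = 2)
res_dup <- merge(rec, don_dup, all.x = TRUE)

anyNA(res_missing$b)        # a missing donor shows up only as NA
nrow(res_dup) > nrow(rec)   # a duplicate donor shows up only as extra rows
```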
Question
Is there a special name for this merge/join operation that enforces this additional existence+uniqueness constraint? I want the merge/join operation to fail if this constraint is violated, since in this case something's wrong either with the data or with some earlier code that generated the data.
Sketch of R function
enrich <- function(x, y, ...) {
  # Capture the argument expressions for use in the error messages
  xd <- deparse(substitute(x))
  yd <- deparse(substitute(y))
  # Marker column: if it is NA after the merge, no donor row matched
  stopifnot(!("..watchdog.." %in% colnames(y)))
  y$..watchdog.. <- TRUE
  res <- merge(x, y, sort = FALSE, all.x = TRUE, ...)
  if (any(is.na(res$..watchdog..)))
    stop("At least one row in x=", xd, " has no corresponding row in y=", yd)
  # More result rows than recipient rows: some recipient matched several donors
  if (nrow(res) > nrow(x))
    stop("At least one row in x=", xd, " has duplicate corresponding rows in y=", yd)
  res$..watchdog.. <- NULL
  res
}
Results of R function
rec <- data.frame(id=letters[1:3], a=1)
don <- data.frame(id=letters[1:3], b=2)
enrich(rec, don)
##   id a b
## 1  a 1 2
## 2  b 1 2
## 3  c 1 2
enrich(rec, don[1:2,])
## Error in enrich(rec, don[1:2, ]): At least one row in x=rec has no corresponding row in y=don[1:2, ]
enrich(rec, rbind(don, don))
## Error in enrich(rec, rbind(don, don)): At least one row in x=rec has duplicate corresponding rows in y=rbind(don, don)
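As a variation on the function above, the same constraint could also be checked before merging, which makes it possible to report the offending keys. This is only a minimal sketch: it assumes a single key column, and `check_exactly_one` is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: verifies that every key in x occurs exactly once in y.
# Assumes a single key column whose name is passed in `key`.
check_exactly_one <- function(x, y, key) {
  # Recipient keys that never occur in the donor (existence violated)
  missing_keys <- setdiff(x[[key]], y[[key]])
  # Recipient keys that occur more than once in the donor (uniqueness violated)
  dup_keys <- intersect(x[[key]], y[[key]][duplicated(y[[key]])])
  if (length(missing_keys) > 0)
    stop("No donor row for key(s): ", paste(missing_keys, collapse = ", "))
  if (length(dup_keys) > 0)
    stop("Several donor rows for key(s): ", paste(dup_keys, collapse = ", "))
  invisible(TRUE)
}

rec <- data.frame(id = letters[1:3], a = 1)
don <- data.frame(id = letters[1:3], b = 2)

# Passes silently, so the merge that follows is safe
check_exactly_one(rec, don, "id")
res <- merge(rec, don, by = "id", sort = FALSE)
```

Compared with the watchdog column, this costs one extra pass over the keys, but the error message names the keys that violate the constraint.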