Context
My question is similar to these two:
But I want to keep multiple rows, if there is a 'conflict' between two rows. (Also these questions start AFTER the merging of data frames. For my question it would be possible to provide an answer that modifies the merging process. I can guarantee, that the IDs are unique for each data frame. Of course it is also possible to provide a more general solution, that starts after the merge.)
The Problem
I want to merge two data frames but ignore included NAs.
Example
Given
> my_data1
ID m n o p q
1 A K U <NA> <NA> K
2 B L V Q A L
3 C M W <NA> S M
4 D N X W D N
5 E O Y <NA> <NA> O
and
> my_data2
ID m n o p r
1 A K U E <NA> U
2 B L V <NA> B V
3 C M W R <NA> W
4 D N X <NA> <NA> X
5 E O Y T F Y
Merging them returns
> my_data
ID m n o p q r
1 A K U E <NA> <NA> U
2 A K U <NA> <NA> K <NA>
3 B L V Q A L <NA>
4 B L V <NA> B <NA> V
5 C M W R <NA> <NA> W
6 C M W <NA> S M <NA>
7 D N X W D N <NA>
8 D N X <NA> <NA> <NA> X
9 E O Y T F <NA> Y
10 E O Y <NA> <NA> O <NA>
But instead I want to obtain
> my_data
ID m n o p q r
1 A K U E <NA> K U
2 B L V Q A L <NA>
3 B L V <NA> B <NA> V
4 C M W R S M W
5 D N X W D N X
6 E O Y T F O Y
ID == B
can not be merged, because of columnp
.ID == A
is quite special because of both NAs in columnp
.IDs
C, D, and E are relatively easy and can be solved by the code below.
Code
my_data1 <- data.frame(ID = LETTERS[1:5],
m = LETTERS[11:15],
n = LETTERS[21:25],
o = c(NA, 'Q', NA, 'W', NA),
p = c(NA, 'A', 'S', 'D', NA),
q = LETTERS[11:15])
my_data2 <- data.frame(ID = LETTERS[1:5],
m = LETTERS[11:15],
n = LETTERS[21:25],
o = c('E', NA, 'R', NA, 'T'),
p = c(NA, 'B', NA, NA, 'F'),
r = LETTERS[21:25])
my_data <- merge(my_data1, my_data2, all = TRUE)
# or
my_data <- bind_rows(my_data1, my_data2) |>
arrange(ID)
# Solution (incomplete)
my_data <- my_data |>
filter(ID != 'A', ID != 'B') |> # Error without this line
select(-m, -n) |> # Error without this line
group_by(ID) |>
summarise(across(.cols = everything(),
.fns = ~ na.omit(.x)))
my_data |>
view()