My problem: I had missing values in a data frame (original_df). After a while I received new data for the original observations for some columns (new_df). I wanted to update the original_df accordingly. Data that wasn't missing priorly should be untouched.
As I understand, this is neither a job for row_bind nor for joins. Here is what I did (and was sufficient), but it seems very repetitive. I wanted to write a function for this, but this didn't work somehow.
#overwrite cells with new values in old table
#join with renaming of new columns (+ .y)
original_df <- left_join(original_df, new_df, by="ID", suffix = c("", ".y"),)
#overwrite old data with data from new columns
original_df <- original_df %>% mutate(col_a = ifelse(!is.na(col_a.y),
col_a.y, col_a)) %>%
mutate(col_b = ifelse(!is.na(col_b.y),
col_b.y, col_b)) %>%
mutate(col_c = ifelse(!is.na(col_c.y),
col_c.y,col_c))
#delete new columns
original_df <- original_df %>% select(-contains(".y"))
#more concise try but not working
update_cell <- function(new_data){
old_data <- paste(new_data, ".y", sep="")
if_else(!is.na(new_data), new_data, old_data)
}
original_df %>% mutate(across(update_cell(c(col_a, col_b, col_c)), update_cell(.)))
Created on 2022-11-06 by the reprex package (v2.0.1)
Sorry for not contributing the original data, which should be kept private.
Thanks in advance! Best regards.