0

My problem: I had missing values in a data frame (original_df). After a while I received new data for the original observations for some columns (new_df). I wanted to update the original_df accordingly. Data that wasn't missing priorly should be untouched.

As I understand, this is neither a job for row_bind nor for joins. Here is what I did (and was sufficient), but it seems very repetitive. I wanted to write a function for this, but this didn't work somehow.

#overwrite cells with new values in old table

#join with renaming of new columns (+ .y)
original_df <- left_join(original_df, new_df, by="ID", suffix = c("", ".y"),)

#overwrite old data with data from new columns
original_df <- original_df %>% mutate(col_a = ifelse(!is.na(col_a.y),
                                                  col_a.y, col_a)) %>%
  mutate(col_b = ifelse(!is.na(col_b.y),
                             col_b.y, col_b)) %>%
  mutate(col_c = ifelse(!is.na(col_c.y),
                           col_c.y,col_c))

#delete new columns
original_df <- original_df %>% select(-contains(".y"))


#more concise try but not working
update_cell <- function(new_data){
  old_data <- paste(new_data, ".y", sep="")
  if_else(!is.na(new_data), new_data, old_data)
}

original_df %>% mutate(across(update_cell(c(col_a, col_b, col_c)), update_cell(.)))

Created on 2022-11-06 by the reprex package (v2.0.1)

Sorry for not contributing the original data, which should be kept private.

Thanks in advance! Best regards.

0 Answers0