Updating data.frame from another data.frame with different columns in R

Question

Given two data frames

old.df = data.frame(SampleNo=c('A1', 'B4', 'C5', 'D4'), Result=c(rep("Successful",4)), NoUnit = c(rep(4,4)))
new.df = data.frame(SampleNo=c('A1', 'C5', 'D4', 'E4'), Result=c(rep("Successful",2),rep( "Failure",2)),State=c(rep("California",2),rep("New York",2)))

such that it has the following format:

> old.df
  SampleNo     Result      NoUnit
1       A1     Successful      4
2       B4     Successful      4
3       C5     Successful      4
4       D4     Successful      4


> new.df
  SampleNo     Result      State
1       A1 Successful California
2       C5 Successful California
3       D4    Failure   New York
4       E4    Failure   New York

I would like to update contents of old.df with new data from new.df maintaining the row succession of old.df and adding new columns from new.df. The resulting data.frame would be:

 SampleNo     Result   NoUnit      State
1       A1 Successful      4 California
2       B4 Successful      4       <NA>
3       C5 Successful      4 California
4       D4    Failure      4   New York
5       E4    Failure     NA   New York

Roland · Answer 1 · 2012-11-24T10:20:57.433

merge(old.df,new.df,all=TRUE)

  SampleNo     Result NoUnit      State
1       A1 Successful      4 California
2       B4 Successful      4       <NA>
3       C5 Successful      4 California
4       D4    Failure      4   New York
5       E4    Failure     NA   New York

Edit after the rules were changed by OP:

df <- merge(old.df,new.df,all=TRUE,by="SampleNo")
df$Result <- with(df,factor(ifelse(is.na(Result.y),
                             as.character(Result.x),as.character(Result.y))))
df$Result.x <- NULL; df$Result.y <- NULL

  SampleNo NoUnit      State     Result
1       A1      4 California Successful
2       B4      4       <NA> Successful
3       C5      4 California Successful
4       D4      4   New York    Failure
5       E4     NA   New York    Failure

I checked the answer too soon. The old.df was supposed to be all "Successful". This answer is wrong as you get two D4's. — geodex, Nov 24 '12 at 01:28

score 1 · Answer 2 · edited Nov 24 '12 at 09:33

1

Merge won't do this by itself. But you don't really want to merge on the "Result" column, only on the "SampleNo" column, then combine the "Result" values, using new values if available, otherwise old values.

Here is some code that does that, for all columns in the intersection other than "SampleNo"

merge.by.sample <- function(old.df, new.df, by='SampleNo') {
  r <- merge(old.df, new.df,all=T,by=by)

  merge.col <- function(r, col) {
    xname <- paste0(col, '.x')
    yname <- paste0(col, '.y')

    r[col] <- factor(r[,yname], levels=union(levels(r[,xname]), levels(r[,yname])))
    r[col][is.na(r[col])] <- r[xname][is.na(r[col])]
    r[!(names(r) %in% c(xname, yname))]
  }

  i <- intersect(names(old.df), names(new.df))
  i <- i[!i %in% by]

  for (col in i) {
    r <- merge.col(r, col)
  }
  r
}

edited Nov 24 '12 at 09:33

Roland

127,288
10
191
288

answered Nov 24 '12 at 03:17

Matthew Lundberg

42,009
6
90
112

This might work on a small dataset like above but if I have to deal with >100 columns then this would become too 'tasky'. Is there a simpler way to do this? – geodex Nov 24 '12 at 04:42
A function can be created to do this, then it can be applied to each desired column. I'll edit the answer. – Matthew Lundberg Nov 24 '12 at 05:01
its not possible to do this: factor(r[yname]) – geodex Nov 25 '12 at 23:05

Updating data.frame from another data.frame with different columns in R

2 Answers2