2

Given two data frames

old.df = data.frame(SampleNo=c('A1', 'B4', 'C5', 'D4'), Result=c(rep("Successful",4)), NoUnit = c(rep(4,4)))
new.df = data.frame(SampleNo=c('A1', 'C5', 'D4', 'E4'), Result=c(rep("Successful",2),rep( "Failure",2)),State=c(rep("California",2),rep("New York",2)))

such that it has the following format:

> old.df
  SampleNo     Result      NoUnit
1       A1     Successful      4
2       B4     Successful      4
3       C5     Successful      4
4       D4     Successful      4


> new.df
  SampleNo     Result      State
1       A1 Successful California
2       C5 Successful California
3       D4    Failure   New York
4       E4    Failure   New York

I would like to update contents of old.df with new data from new.df maintaining the row succession of old.df and adding new columns from new.df. The resulting data.frame would be:

 SampleNo     Result   NoUnit      State
1       A1 Successful      4 California
2       B4 Successful      4       <NA>
3       C5 Successful      4 California
4       D4    Failure      4   New York
5       E4    Failure     NA   New York
geodex
  • 1,219
  • 3
  • 13
  • 22

2 Answers2

3
merge(old.df,new.df,all=TRUE)

  SampleNo     Result NoUnit      State
1       A1 Successful      4 California
2       B4 Successful      4       <NA>
3       C5 Successful      4 California
4       D4    Failure      4   New York
5       E4    Failure     NA   New York

Edit after the rules were changed by OP:

df <- merge(old.df,new.df,all=TRUE,by="SampleNo")
df$Result <- with(df,factor(ifelse(is.na(Result.y),
                             as.character(Result.x),as.character(Result.y))))
df$Result.x <- NULL; df$Result.y <- NULL

  SampleNo NoUnit      State     Result
1       A1      4 California Successful
2       B4      4       <NA> Successful
3       C5      4 California Successful
4       D4      4   New York    Failure
5       E4     NA   New York    Failure
Roland
  • 127,288
  • 10
  • 191
  • 288
  • I checked the answer too soon. The old.df was supposed to be all "Successful". This answer is wrong as you get two D4's. – geodex Nov 24 '12 at 01:28
1

Merge won't do this by itself. But you don't really want to merge on the "Result" column, only on the "SampleNo" column, then combine the "Result" values, using new values if available, otherwise old values.

Here is some code that does that, for all columns in the intersection other than "SampleNo"

merge.by.sample <- function(old.df, new.df, by='SampleNo') {
  r <- merge(old.df, new.df,all=T,by=by)

  merge.col <- function(r, col) {
    xname <- paste0(col, '.x')
    yname <- paste0(col, '.y')

    r[col] <- factor(r[,yname], levels=union(levels(r[,xname]), levels(r[,yname])))
    r[col][is.na(r[col])] <- r[xname][is.na(r[col])]
    r[!(names(r) %in% c(xname, yname))]
  }

  i <- intersect(names(old.df), names(new.df))
  i <- i[!i %in% by]

  for (col in i) {
    r <- merge.col(r, col)
  }
  r
}
Roland
  • 127,288
  • 10
  • 191
  • 288
Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112