2

I have two data.table(dt1 & dt2). dt1 is past product data and dt2 is present product data. I want to create a third data.table that inserts new rows from dt2 into dt1 only when product characteristics(Level or Color) are different or Product itself is different.

library(data.table)
dt1 <- fread('
    Product  Level   Color     ReviewDate
    A          0     Blue      9/7/2016
    B          1     Red       9/7/2016
    C          1     Purple    9/7/2016 
    D          2     Blue      9/7/2016
    E          1     Green     9/7/2016 
    F          4     Yellow    9/7/2016  
')
dt2 <- fread('
    Product  Level   Color     ReviewDate
    A          1     Black     9/8/2016
    B          1     Red       9/8/2016
    C          5     White     9/8/2016 
    D          2     Blue      9/8/2016
    E          1     Green     9/8/2016 
    F          4     Yellow    9/8/2016 
    G          3     Orange    9/8/2016 
')

My final data.table(dt3) should have the following changes:A and C are both different in dt2 than dt1, thats why the new(different) rows from dt2 gets inserted into the final table alongside all rows from dt1. G is a totally new product that was not in dt1, thats why it makes it into the final table.

    Product  Level   Color     ReviewDate
    A          0     Blue      9/7/2016
    A          1     Black     9/8/2016
    B          1     Red       9/7/2016
    C          1     Purple    9/7/2016 
    C          5     White     9/8/2016
    D          2     Blue      9/7/2016
    E          1     Green     9/7/2016 
    F          4     Yellow    9/7/2016
    G          3     Orange    9/8/2016 

I have tried:

setkey(dt1, Product)
setkey(dt2, Product)
dt3<- dt1[dt2]
setkey(dt3,Product,ReviewDate)
gibbz00
  • 1,947
  • 1
  • 19
  • 31

3 Answers3

4

You can stack and uniqify:

unique(rbind(dt1, dt2), by=c("Product", "Level", "Color"))
Frank
  • 66,179
  • 8
  • 96
  • 180
3

Another alternative is to only rbind the subset of the data which is different (avoids the creation of one big data.table which contains dt1 and dt2)

dt3 <- rbind(dt1, setDT(dt2)[!dt1, on=c("Product", "Level", "Color")])
dt3[order(Product, ReviewDate),]
Aeck
  • 543
  • 7
  • 11
1

Using merge...

d<-merge(dt1, dt2, by=c("Product","Level","Color"), all.x=T,all.y=TRUE)
d$ReviewDate <-ifelse(is.na(d$ReviewDate.x), d$ReviewDate.y, d$ReviewDate.x)
as.data.frame(select(d, 1,2,3,6))

   Product Level  Color ReviewDate
1       A     0   Blue   9/7/2016
2       A     1  Black   9/8/2016
3       B     1    Red   9/7/2016
4       C     1 Purple   9/7/2016
5       C     5  White   9/8/2016
6       D     2   Blue   9/7/2016
7       E     1  Green   9/7/2016
8       F     4 Yellow   9/7/2016
9       G     3 Orange   9/8/2016
Cyrus Mohammadian
  • 4,982
  • 6
  • 33
  • 62