0

I need to merge two datasets, but in the second one, there can be duplicate id, for example several id of 1,1,1. If there are duplicate id, how to merge to the very first row of them?

To be more clear, here's a reproducible example:

df1
structure(list(id = 1:2, y = 10:11), .Names = c("id", "y"), class = "data.frame", row.names = c(NA, 
-2L))

df2
structure(list(id = c(1L, 1L, 1L, 2L), x1 = 435:438, x2 = c(435L, 
436L, 436L, 438L), x3 = c(435L, 436L, 436L, 438L)), .Names = c("id", 
"x1", "x2", "x3"), class = "data.frame", row.names = c(NA, -4L
))

Eaxample: In output i expect this format

id  y   x1  x2  x3
1   10  435 435 435
2   11  438 438 438

I.E. 2 and 3 rows (1 id) do not participate in merging.

psysky
  • 3,037
  • 5
  • 28
  • 64

1 Answers1

1

You can do it using data.table. You can retain only first occurrence where id == 1 from your second data set and then merge both the data sets.

Here is the solution:

library(data.table)
setDT(df2)
df2[, idx := 1:.N, by = id]
df2 <- df2[idx == 1, ]
df2[, idx := NULL]
output <- merge(df1, df2, by = "id")
output

It'll give you your desired output:

 id  y  x1  x2  x3
1  1 10 435 435 435
2  2 11 438 438 438
sm925
  • 2,648
  • 1
  • 16
  • 28