
I have two data frames EMOJ and EYETRACK and I need to merge them by "session", without duplicating rows.

Dataframes:

> EMOJ
  session age attitude
1    s001  18        2
2    s002  22        4


> EYETRACK
  session stimuli response_time
1    s001       A          1023
2    s001       B          1009
3    s001       C          1832
4    s002       A          1092
5    s002       B          1076

What I want:

  session age attitude stimuli response_time
1    s001  18        2       A          1023
2    s001                    B          1009
3    s001                    C          1832
4    s002  22        4       A          1092
5    s002                    B          1076

What I am getting:

df <- merge(EMOJ, EYETRACK, by.x = 'session', by.y = 'session')

  session age attitude stimuli response_time
1    s001  18        2       A          1023
2    s001  18        2       B          1009
3    s001  18        2       C          1832
4    s002  22        4       A          1092
5    s002  22        4       B          1076
Olivia
This might be helpful: https://stackoverflow.com/questions/37749412/select-only-the-first-row-when-merging-data-frames-with-multiple-matches – AdroMine Mar 25 '22 at 19:41

3 Answers


I came up with this, but I had to add a 'stimuli' column to the EMOJ data frame first:

EMOJ$stimuli <- 'A'

df1 <- merge(EMOJ, EYETRACK, by = c('session','stimuli'), all = TRUE)
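Hard-coding 'A' only works when every session's first stimulus is 'A'. A minimal base R sketch of the same idea that looks up the first stimulus per session instead; the names first_stim and EMOJ_keyed are made up for illustration, and the data frames are rebuilt from the question so the example runs on its own:

```r
# Rebuild the question's data (column types are an assumption)
EMOJ <- data.frame(session = c("s001", "s002"),
                   age = c(18, 22),
                   attitude = c(2, 4),
                   stringsAsFactors = FALSE)
EYETRACK <- data.frame(session = c("s001", "s001", "s001", "s002", "s002"),
                       stimuli = c("A", "B", "C", "A", "B"),
                       response_time = c(1023, 1009, 1832, 1092, 1076),
                       stringsAsFactors = FALSE)

# First stimulus seen in each session (hypothetical helper table)
first_stim <- EYETRACK[!duplicated(EYETRACK$session), c("session", "stimuli")]

# Give EMOJ the stimulus it should be attached to
EMOJ_keyed <- merge(EMOJ, first_stim, by = "session")

# Merge on both keys; EYETRACK rows without a match keep NA in age/attitude
df1 <- merge(EMOJ_keyed, EYETRACK, by = c("session", "stimuli"), all = TRUE)
```

Only the row carrying each session's first stimulus picks up age and attitude; the other rows are kept with NA in those columns.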
stefan_aus_hannover

Using the dplyr package:

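library(dplyr)
EMOJ %>%
  # attach only the first EYETRACK row per session, then add back the rest
  left_join(distinct(EYETRACK, session, .keep_all = TRUE), by = "session") %>%
  full_join(EYETRACK, by = c("session", "stimuli", "response_time"))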

OUTPUT

  session age attitude stimuli response_time
1    s001  18        2       A          1023
2    s002  22        4       A          1092
3    s001  NA       NA       B          1009
4    s001  NA       NA       C          1832
5    s002  NA       NA       B          1076

EDIT: Following Ruam's recommendation, I included the output. My earlier code did not reproduce the output I got the first time I ran it, so I updated the answer. It should work now.

0

One approach is to number the rows within each session consecutively, then merge on both the session and this index number. Rows are matched only when they share both the session and the position within it, so with one EMOJ row per session only the first EYETRACK row of each session receives the EMOJ values. If there is only one row per session in EMOJ, you can simply use EMOJ$i <- 1.

library(data.table)

EMOJ$i <- rowid(EMOJ$session)
EYETRACK$i <- rowid(EYETRACK$session)

merge(EYETRACK, EMOJ, by = c("session", "i"), all.x = TRUE)

Output

  session i stimuli response_time age attitude
1    s001 1       A          1023  18        2
2    s001 2       B          1009  NA       NA
3    s001 3       C          1832  NA       NA
4    s002 1       A          1092  22        4
5    s002 2       B          1076  NA       NA
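If data.table is not available, the same within-session index can be built in base R with ave(). The following is a sketch under that assumption, not part of the approach above as originally written; it rebuilds the question's data so it runs on its own, then drops the helper column and reorders the columns to match the layout asked for (the name res is made up for illustration):

```r
# Rebuild the question's data (column types are an assumption)
EMOJ <- data.frame(session = c("s001", "s002"),
                   age = c(18, 22),
                   attitude = c(2, 4),
                   stringsAsFactors = FALSE)
EYETRACK <- data.frame(session = c("s001", "s001", "s001", "s002", "s002"),
                       stimuli = c("A", "B", "C", "A", "B"),
                       response_time = c(1023, 1009, 1832, 1092, 1076),
                       stringsAsFactors = FALSE)

# Base R stand-in for data.table::rowid(): 1, 2, 3, ... within each session
EMOJ$i <- ave(seq_along(EMOJ$session), EMOJ$session, FUN = seq_along)
EYETRACK$i <- ave(seq_along(EYETRACK$session), EYETRACK$session, FUN = seq_along)

# Keep every EYETRACK row; EMOJ values land only where session and i both match
res <- merge(EYETRACK, EMOJ, by = c("session", "i"), all.x = TRUE)

# Drop the helper index and put the columns in the requested order
res$i <- NULL
res <- res[, c("session", "age", "attitude", "stimuli", "response_time")]
```

The unmatched rows hold NA rather than blanks, which keeps age and attitude numeric.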
Ben