I am new to R. I have collected eye-tracking data with the following structure:

Participant Trial Condition Fixation.Start  Fixation.End Fixated.Area
P01         T01   Early     4               206          Outside
P01         T01   Early     258             476          Competitor
P01         T01   Early     496             882          Target
P01         T02   Late      4               794          Outside
P01         T02   Late      838             1026         Target
P01         T02   Late      1046            1328         Target
P02         T01   Early     4               168          Outside
P02         T01   Early     232             452          Competitor
P02         T01   Early     494             738          Target
P02         T02   Late      4               176          Outside
P02         T02   Late      238             466          Target
P02         T02   Late      524             632          Competitor

Each row is one fixation. The time spent fixating each area shown on screen was measured in milliseconds, from the beginning of the fixation (Fixation.Start) to its end (Fixation.End).

I would like to reshape the data into 50 ms time bins in a new data frame, so that each time bin (row) shows which area was being fixated at that moment. In other words, I want the new data frame to look like this:

Participant Trial   Condition   Time.Bin    Fixated.Area
P01         T01     Early       50          Outside
P01         T01     Early       100         Outside
P01         T01     Early       150         Outside
P01         T01     Early       200         Outside
P01         T01     Early       250         Competitor
P01         T01     Early       300         Competitor
P01         T01     Early       350         Competitor
P01         T01     Early       400         Competitor
P01         T01     Early       450         Competitor
P01         T01     Early       500         Target
P01         T01     Early       550         Target
P01         T01     Early       600         Target
P01         T01     Early       650         Target  

I think this should be pretty easy to do in R. Any ideas?

Miguel
  • Sample data, please? I can't work on an image, and choose to not transcribe your data into something usable. The gold-standard for sample data is usually `dput(x)` where `x` is enough rows/columns to get the point across (and show sufficient variability, etc) without clobbering us with too much data. Thanks. – r2evans Jun 22 '21 at 12:15
  • @r2evans I edited the post to make the data look as text. I hope it helps. – Miguel Jun 22 '21 at 12:57
  • You're extrapolating time 250 for P01, right? The data shows the fixated area is "Outside" up until time 206 and starts "Competitor" at time 258, but you're reporting that Competitor is active at time 250. Can you explain this? (It seems like dirty-data to me.) – r2evans Jun 22 '21 at 12:59
  • @r2evans, actually the Fixated Area for time bin 250 that appears above is a typo I made while typing the data into the post. You are right, it should say NA. – Miguel Jun 22 '21 at 14:39
  • @r2evans, by the way, I tried to run the code on the full data set but an error came up: `wrong sign in 'by' argument`. I think this error is happening because some of the fixations lasted less than 50 ms. – Miguel Jun 22 '21 at 15:02

1 Answer


Here's a technique that expands each fixation into 50 ms time bins: round the start time up to the next multiple of 50, then step by 50 until the end time.
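As a quick illustration of the arithmetic, here is the second fixation from the sample data (258 to 476 ms) worked through by hand. The variable names `start`, `end`, `first_bin`, and `bins` are only for this sketch and do not appear in the code below.

```r
# One fixation from the sample data: P01, T01, Competitor
start <- 258
end   <- 476

# Round the start up to the next multiple of 50
first_bin <- ceiling(start / 50) * 50
first_bin
# [1] 300

# Step by 50 from that boundary up to (at most) the end of the fixation
bins <- seq(first_bin, end, by = 50)
bins
# [1] 300 350 400 450
```

This is why the 250 bin is absent for this trial: the previous fixation ends at 206 and this one only begins at 258, so no fixation covers 250.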

base R

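# For each fixation, list the multiples of 50 between its start and end.
Time.Bins <- Map(
  function(a, b) seq(a, b, by = 50),
  ceiling(dat$Fixation.Start / 50) * 50,  # first bin boundary at or after the start
  dat$Fixation.End)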

out <- cbind(
  dat[, c("Participant", "Trial", "Condition", "Fixated.Area")
      ][ rep(seq_len(nrow(dat)), lengths(Time.Bins)),],
  Time.Bin = unlist(Time.Bins)
)
head(out, 20)
#     Participant Trial Condition Fixated.Area Time.Bin
# 1           P01   T01     Early      Outside       50
# 1.1         P01   T01     Early      Outside      100
# 1.2         P01   T01     Early      Outside      150
# 1.3         P01   T01     Early      Outside      200
# 2           P01   T01     Early   Competitor      300
# 2.1         P01   T01     Early   Competitor      350
# 2.2         P01   T01     Early   Competitor      400
# 2.3         P01   T01     Early   Competitor      450
# 3           P01   T01     Early       Target      500
# 3.1         P01   T01     Early       Target      550
# 3.2         P01   T01     Early       Target      600
# 3.3         P01   T01     Early       Target      650
# 3.4         P01   T01     Early       Target      700
# 3.5         P01   T01     Early       Target      750
# 3.6         P01   T01     Early       Target      800
# 3.7         P01   T01     Early       Target      850
# 4           P01   T02      Late      Outside       50
# 4.1         P01   T02      Late      Outside      100
# 4.2         P01   T02      Late      Outside      150
# 4.3         P01   T02      Late      Outside      200

dplyr

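library(dplyr)
# rowwise() treats each fixation as its own group, so seq() runs once per row.
# Note: dplyr >= 1.1.0 deprecates multi-row summarize(); reframe() replaces it there.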
out <- dat %>%
  rowwise() %>%
  summarize(
    Participant, Trial, Condition, Fixated.Area,
    Time.Bin = seq(ceiling(Fixation.Start / 50) * 50, Fixation.End, by = 50),
    .groups = "drop"
  )
out
# # A tibble: 64 x 5
#    Participant Trial Condition Fixated.Area Time.Bin
#    <chr>       <chr> <chr>     <chr>           <dbl>
#  1 P01         T01   Early     Outside            50
#  2 P01         T01   Early     Outside           100
#  3 P01         T01   Early     Outside           150
#  4 P01         T01   Early     Outside           200
#  5 P01         T01   Early     Competitor        300
#  6 P01         T01   Early     Competitor        350
#  7 P01         T01   Early     Competitor        400
#  8 P01         T01   Early     Competitor        450
#  9 P01         T01   Early     Target            500
# 10 P01         T01   Early     Target            550
# # ... with 54 more rows

Fixing time=250

Your expected output shows "Competitor" at time=250, but the data does not support that: no fixation covers 250. If you need a row for 250 (with or without an area), you can fill in the missing bins this way.

expbins <- do.call(rbind, by(out, out[,c("Participant", "Trial", "Condition")],
   FUN = function(z) {
     rng <- seq(min(z$Time.Bin), max(z$Time.Bin), by = 50)
     transform(z[rep(1, length(rng)),], Fixated.Area = NULL, Time.Bin = rng)
   }))
out2 <- merge(expbins, out, by = c("Participant", "Trial", "Condition", "Time.Bin"), all = TRUE)
head(out2, 10)
#    Participant Trial Condition Time.Bin Fixated.Area
# 1          P01   T01     Early       50      Outside
# 2          P01   T01     Early      100      Outside
# 3          P01   T01     Early      150      Outside
# 4          P01   T01     Early      200      Outside
# 5          P01   T01     Early      250         <NA>
# 6          P01   T01     Early      300   Competitor
# 7          P01   T01     Early      350   Competitor
# 8          P01   T01     Early      400   Competitor
# 9          P01   T01     Early      450   Competitor
# 10         P01   T01     Early      500       Target

which presents the time=250 as NA, an unknown state (which is better, imo).

The same in dplyr:

out %>%
  group_by(Participant, Trial, Condition) %>%
  summarize(
    Time.Bin = seq(min(Time.Bin), max(Time.Bin), by = 50),
    .groups = "drop"
  ) %>%
  full_join(out, by = c("Participant", "Trial", "Condition", "Time.Bin"))
# # A tibble: 69 x 5
#    Participant Trial Condition Time.Bin Fixated.Area
#    <chr>       <chr> <chr>        <dbl> <chr>       
#  1 P01         T01   Early           50 Outside     
#  2 P01         T01   Early          100 Outside     
#  3 P01         T01   Early          150 Outside     
#  4 P01         T01   Early          200 Outside     
#  5 P01         T01   Early          250 <NA>        
#  6 P01         T01   Early          300 Competitor  
#  7 P01         T01   Early          350 Competitor  
#  8 P01         T01   Early          400 Competitor  
#  9 P01         T01   Early          450 Competitor  
# 10 P01         T01   Early          500 Target      
# # ... with 59 more rows

Data:

dat <- structure(list(Participant = c("P01", "P01", "P01", "P01", "P01", "P01", "P02", "P02", "P02", "P02", "P02", "P02"), Trial = c("T01", "T01", "T01", "T02", "T02", "T02", "T01", "T01", "T01", "T02", "T02", "T02"), Condition = c("Early", "Early", "Early", "Late", "Late", "Late", "Early", "Early", "Early", "Late", "Late", "Late"), Fixation.Start = c(4L, 258L, 496L, 4L, 838L, 1046L, 4L, 232L, 494L, 4L, 238L, 524L), Fixation.End = c(206L, 476L, 882L, 794L, 1026L, 1328L, 168L, 452L, 738L, 176L, 466L, 632L), Fixated.Area = c("Outside", "Competitor", "Target", "Outside", "Target", "Target", "Outside", "Competitor", "Target", "Outside", "Target", "Competitor")), class = "data.frame", row.names = c(NA, -12L))
r2evans
  • (Regarding your `wrong sign in 'by' argument` error.) That means you have a time range that is less than 50. One technique would be to filter those out before aggregating, as in `subset(dat, Fixation.End - Fixation.Start > 50)` or similar in dplyr. Another technique would be to check before the call to `seq` that the start is not greater than the end. Either way, are you comfortable with losing that area of fixation? – r2evans Jun 22 '21 at 15:15
  • For the study we are conducting, it is fine to lose that area of fixation only if the range between the start and the end of the fixation does not include one of the time bins. For example, if a fixation ranges from 245 to 268, then we would like to include that area of fixation for the 250 ms time bin. – Miguel Jun 22 '21 at 15:22