
I have pupillometry data for 24 participants, each with thousands of rows of pupil size measurements (in a column I have named PupilAvg). The time column is called TrialTimestamp and is measured in ms. I also have trial.number and trial.type columns. The head of my data frame (mydata1) is shown below.

RecordingName trial.number trial.type TrialTimestamp PupilAvg
1    Mix_20_S04            1       same              0    3.910
2    Mix_20_S04            1       same             17    3.815
3    Mix_20_S04            1       same            133    3.545
4    Mix_20_S04            1       same            150    3.460
5    Mix_20_S04            1       same            167    3.410
6    Mix_20_S04            1       same            183    3.345

My question is: how can I obtain an average baseline per trial per participant, where the baseline is the mean pupil size between 5400 ms and 5500 ms? I would like to subtract these baselines from the pupil measurements within my analysis window (to correct them for individual differences).

I have written code that does this for one trial (trial 3) of one participant (S04):

library(dplyr)
S04data <- filter(mydata1, RecordingName == "Mix_20_S04")
S04data1 <- filter(S04data, trial.number == 3)
baselineS04 <- with(S04data1, mean(PupilAvg[TrialTimestamp >= 5400 & TrialTimestamp <= 5500]))

This returns 3.1225, so the baseline for participant 04, trial 3 is 3.1225.

I would very much appreciate it if someone could help me write code that gets the baseline for every participant and every trial (without having to write out separate code for each participant and each trial!).
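For reference, a minimal sketch of how the single-trial code might be generalized with dplyr's group_by() and summarise(), assuming dplyr 1.0 or later (for the .groups argument). It rebuilds the six posted rows, uses a placeholder window of 130 to 155 ms because the sample has no rows near 5400 ms, and then joins each baseline back onto its trial and subtracts it. The names win_start, win_end, baselines, corrected and PupilCorrected are illustrative, not from the original post.

```r
library(dplyr)

# The six rows posted above (first column becomes row names)
mydata1 <- read.table(text = "RecordingName trial.number trial.type TrialTimestamp PupilAvg
1    Mix_20_S04            1       same              0    3.910
2    Mix_20_S04            1       same             17    3.815
3    Mix_20_S04            1       same            133    3.545
4    Mix_20_S04            1       same            150    3.460
5    Mix_20_S04            1       same            167    3.410
6    Mix_20_S04            1       same            183    3.345", header = TRUE)

# Placeholder baseline window so the sample rows give a result;
# set these to 5400 and 5500 for the real data
win_start <- 130
win_end   <- 155

# One baseline per participant and trial (inclusive bounds, as in the question)
baselines <- mydata1 %>%
  filter(TrialTimestamp >= win_start, TrialTimestamp <= win_end) %>%
  group_by(RecordingName, trial.number) %>%
  summarise(baseline = mean(PupilAvg), .groups = "drop")

# Attach each trial's baseline to all of its rows and subtract it;
# trials with no samples in the window get an NA baseline
corrected <- mydata1 %>%
  left_join(baselines, by = c("RecordingName", "trial.number")) %>%
  mutate(PupilCorrected = PupilAvg - baseline)
```

Keeping the baselines in their own table makes it easy to inspect them before doing the subtraction.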

user20650



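I think you can use aggregate on a subset of the data that only includes observations within your TrialTimestamp range. I use 130 and 155 here (so that the posted data give a result), but you can change these to 5400 and 5500 ms. Note that I use strict inequalities (> and <), whereas your own code includes the endpoints (>= and <=).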

# your data
mydata1 <- read.table(text="RecordingName trial.number trial.type TrialTimestamp PupilAvg
1    Mix_20_S04            1       same              0    3.910
2    Mix_20_S04            1       same             17    3.815
3    Mix_20_S04            1       same            133    3.545
4    Mix_20_S04            1       same            150    3.460
5    Mix_20_S04            1       same            167    3.410
6    Mix_20_S04            1       same            183    3.345", header=TRUE)


# Find the mean: subset the data so that only values within
# the required TrialTimestamp range are used
aggregate(PupilAvg ~ RecordingName + trial.number,
          data = mydata1[mydata1$TrialTimestamp > 130 & mydata1$TrialTimestamp < 155, ],
          FUN = mean)

EDIT

As Michael mentioned in the comments, aggregate has a subset argument, so you may find this easier on the eye:

aggregate(PupilAvg ~ RecordingName + trial.number, data = mydata1, mean,
          subset = TrialTimestamp > 130 & TrialTimestamp < 155)
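The aggregate calls above return one baseline per participant and trial but do not yet subtract it. A minimal base-R sketch of that second step, assuming merge() is used to join the baselines back on the grouping columns; the names baselines, corrected and PupilCorrected are illustrative.

```r
# The six rows posted in the question
mydata1 <- read.table(text = "RecordingName trial.number trial.type TrialTimestamp PupilAvg
1    Mix_20_S04            1       same              0    3.910
2    Mix_20_S04            1       same             17    3.815
3    Mix_20_S04            1       same            133    3.545
4    Mix_20_S04            1       same            150    3.460
5    Mix_20_S04            1       same            167    3.410
6    Mix_20_S04            1       same            183    3.345", header = TRUE)

# Baseline per participant and trial, using the subset argument
baselines <- aggregate(PupilAvg ~ RecordingName + trial.number, data = mydata1,
                       FUN = mean,
                       subset = TrialTimestamp > 130 & TrialTimestamp < 155)
names(baselines)[names(baselines) == "PupilAvg"] <- "baseline"

# Join on the grouping columns; all.x = TRUE keeps trials that have
# no samples in the window (their baseline becomes NA)
corrected <- merge(mydata1, baselines,
                   by = c("RecordingName", "trial.number"), all.x = TRUE)
corrected$PupilCorrected <- corrected$PupilAvg - corrected$baseline
```

merge() sorts its result by the key columns, so reorder by TrialTimestamp afterwards if row order matters.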
user20650
  • Could take advantage of `subset` argument on `aggregate`: `aggregate(PupilAvg ~ RecordingName + trial.number, data=mydata1, mean, subset = TrialTimestamp > 130 & TrialTimestamp < 155)` – Michael Lawrence Oct 22 '14 at 23:00
  • @MichaelLawrence; thanks Michael - you know i have never noticed the subset argument before – user20650 Oct 23 '14 at 01:35

See if you like this data.table option:

library(data.table)
setDT(mydata1) # set data frame to data table
mydata1[TrialTimestamp > 130 & TrialTimestamp < 155,  ## i arg  - subset
             list(PupilAvg = mean(PupilAvg)),         ## j arg  - aggregate
       by = c("RecordingName", "trial.number")]       ## by arg - group by
#    RecordingName trial.number      PupilAvg
# 1:    Mix_20_S04            1        3.5025
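The same data.table grouping can also add the baseline-corrected values directly, without a separate join. A minimal sketch, assuming by-group assignment with := (which modifies mydata1 in place); the column names baseline and PupilCorrected are illustrative. A trial with no samples in the window gets NaN, because the mean of an empty vector is NaN.

```r
library(data.table)

# The six rows posted in the question
mydata1 <- read.table(text = "RecordingName trial.number trial.type TrialTimestamp PupilAvg
1    Mix_20_S04            1       same              0    3.910
2    Mix_20_S04            1       same             17    3.815
3    Mix_20_S04            1       same            133    3.545
4    Mix_20_S04            1       same            150    3.460
5    Mix_20_S04            1       same            167    3.410
6    Mix_20_S04            1       same            183    3.345", header = TRUE)
setDT(mydata1)

# Within each participant/trial group, average only the samples in the
# window; the single value is recycled to every row of that group
mydata1[, baseline := mean(PupilAvg[TrialTimestamp > 130 & TrialTimestamp < 155]),
        by = .(RecordingName, trial.number)]

# Subtract the per-trial baseline from every sample
mydata1[, PupilCorrected := PupilAvg - baseline]
```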

Also check out ?between in the data.table package.
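A minimal sketch of the subset written with data.table's between(). By default it includes both endpoints (incbounds = TRUE), which matches the >= / <= comparison in the question's own code rather than the strict inequalities used above; with the posted rows both versions select the same samples.

```r
library(data.table)

# The six rows posted in the question
mydata1 <- read.table(text = "RecordingName trial.number trial.type TrialTimestamp PupilAvg
1    Mix_20_S04            1       same              0    3.910
2    Mix_20_S04            1       same             17    3.815
3    Mix_20_S04            1       same            133    3.545
4    Mix_20_S04            1       same            150    3.460
5    Mix_20_S04            1       same            167    3.410
6    Mix_20_S04            1       same            183    3.345", header = TRUE)
setDT(mydata1)

# Inclusive window (use 5400 and 5500 for the real data)
baselines <- mydata1[between(TrialTimestamp, 130, 155),
                     .(baseline = mean(PupilAvg)),
                     by = .(RecordingName, trial.number)]
```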

Arun
KFB