I have data on gaze behavior during Question and Answer sequences. Gazes are recorded for each speaker A, B, and C in columns A_aoi, B_aoi, and C_aoi; gaze durations are recorded in columns A_aoi_dur, B_aoi_dur, and C_aoi_dur:
df <- data.frame(
  Speaker = c("ID01.A", NA, "ID01.B", "ID33.B", "ID33.A", "ID33.C"),
  Utterance = c("Who did it?", NA, "Peter did.", "So you're coming?", "erm", "Yes, sure."),
  Sequ = c(1,1,1,2,2,2),
  Q = c("q_wh", "", "", "q_decl", "", ""),
  A_aoi = c("C*B*B", "B", "B*", "B*C", "*C", "B*"),
  A_aoi_dur = c("1,2,3,4,5", "1", "1,2", "1,2,3", "1,2", "1,2"),
  B_aoi = c("C*A", "*A", "A*", "A*C", "C", "*C"),
  B_aoi_dur = c("1,2,3", "1,2", "1,2", "1,2,3", "1", "1,2"),
  C_aoi = c("A*A", "A", "A*", "B*C*B", "*B", "B*A"),
  C_aoi_dur = c("1,2,3", "1", "1,2", "1,2,3,4,5", "1,2", "1,2,3")
)
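To make the encoding explicit, here is a minimal base-R sketch that unpacks a single cell. It assumes that every character in an _aoi string corresponds to exactly one comma-separated value in the matching _dur string, which holds for every cell in the sample above. The helper name pair_gazes is made up for illustration.

```r
# Illustrative helper (hypothetical name, not from any package):
# split one AOI string into single-character gaze events and pair
# each event with its duration.
pair_gazes <- function(aoi, dur) {
  targets   <- strsplit(aoi, "")[[1]]               # one element per character
  durations <- as.integer(strsplit(dur, ",")[[1]])  # one element per duration
  # Assumption: exactly one duration per gaze event
  stopifnot(length(targets) == length(durations))
  data.frame(Gaze_to = targets, Gaze_dur = durations,
             stringsAsFactors = FALSE)
}

# Speaker A's gazes during the first utterance
pair_gazes("C*B*B", "1,2,3,4,5")
```

The last row of such a table is the speaker's final gaze during that utterance, which is the value I am after for the questioner.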
What I need to find out is which person the questioner, upon finishing their question, is gazing at last.
I've been trying to get there with this sequence of operations but have got stuck:
library(tidyr)
library(dplyr)
library(stringr)
df0 <- df %>%
  # add a row index `Line` (used later to get back to one row per utterance):
  mutate(Line = row_number(), .before = 1) %>%
  # for each `Sequ`...
  group_by(Sequ) %>%
  mutate(
    # Who is the question by?
    Quest_by = sub(".*(.)$", "\\1", first(Speaker)),
    # Who is the answer by?
    Answ_by = sub(".*(.)$", "\\1", last(Speaker))
  ) %>%
  # rename to create column names that are processable by `names_pattern` for `pivot_longer`:
  rename_with(~ str_c(., "_AOI"), ends_with("_aoi")) %>%
  # collect all AOI gazes by A, B, and C into one column:
  pivot_longer(cols = contains("_aoi"),
               names_to = c("Gaze_by", ".value"),
               names_pattern = "^(.*)_([^_]+$)"
  ) %>%
  # rename `AOI` and `dur` columns:
  rename(Gaze_to = AOI, Gaze_dur = dur) %>%
  # edit `Gaze_by` and `Gaze_to` values for upcoming analysis:
  mutate(
    # simplify `Gaze_by` values to speaker labels:
    Gaze_by = sub("^(.).*", "\\1", Gaze_by),
    # insert comma into `Gaze_to` as splitting pattern for `separate_rows` command below:
    Gaze_to = str_replace_all(Gaze_to, "(?<=.)(?=.)", ",")
  ) %>%
  # assign each `Gaze_to` and `Gaze_dur` value its own row based on comma as splitting pattern:
  separate_rows(c(Gaze_to, Gaze_dur), sep = ",", convert = TRUE)
Desired output (in this or a similar form):
  Speaker         Utterance Sequ      Q Quest_by Answ_by Last_Gaze_to Last_Gaze_dur
1  ID01.A       Who did it?    1   q_wh        A       B            B             5
2    <NA>              <NA>    1
3  ID01.B        Peter did.    1
4  ID33.B So you're coming?    2 q_decl        B       C            C             3
5  ID33.A               erm    2
6  ID33.C        Yes, sure.    2
EDIT:
I've come up with this solution (where df0 is the result of the above operations):
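df0 %>%
  # keep only the gazes produced by the questioner:
  filter(Quest_by == Gaze_by) %>%
  # within each `Q`/`Sequ` group, pick the final gaze target and its duration:
  group_by(Q, Sequ) %>%
  mutate(Last_Gaze_to = last(Gaze_to),
         Last_Gaze_dur = last(Gaze_dur)) %>%
  ungroup() %>%
  # collapse back to one row per original line:
  group_by(Line) %>%
  slice_head() %>%
  # drop the helper columns `Gaze_by`, `Gaze_to`, and `Gaze_dur`:
  select(-matches("^G")) %>%
  ungroup() %>%
  # set `Q` and all columns after it to NA in rows that are not questions:
  mutate(across(Q:Last_Gaze_dur,
                ~ ifelse(Q == "", NA, .)))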
# A tibble: 6 × 9
   Line Speaker Utterance          Sequ Q      Quest_by Answ_by Last_Gaze_to Last_Gaze_dur
  <int> <chr>   <chr>             <dbl> <chr>  <chr>    <chr>   <chr>                <int>
1     1 ID01.A  Who did it?           1 q_wh   A        B       B                        5
2     2 NA      NA                    1 NA     NA       NA      NA                      NA
3     3 ID01.B  Peter did.            1 NA     NA       NA      NA                      NA
4     4 ID33.B  So you're coming?     2 q_decl B        C       C                        3
5     5 ID33.A  erm                   2 NA     NA       NA      NA                      NA
6     6 ID33.C  Yes, sure.            2 NA     NA       NA      NA                      NA
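A possibly shorter route to the same two columns, sketched in base R without reshaping to long format. It rests on three assumptions: the question is the row where Q is non-empty, the questioner is that row's Speaker (in the sample this is the same as the first speaker of the Sequ), and "last gaze" means the last character of the questioner's AOI string, even if that character is `*`. Only the columns the sketch needs are recreated here.

```r
# Sample data from above, reduced to the columns used in this sketch
df <- data.frame(
  Speaker   = c("ID01.A", NA, "ID01.B", "ID33.B", "ID33.A", "ID33.C"),
  Sequ      = c(1, 1, 1, 2, 2, 2),
  Q         = c("q_wh", "", "", "q_decl", "", ""),
  A_aoi     = c("C*B*B", "B", "B*", "B*C", "*C", "B*"),
  A_aoi_dur = c("1,2,3,4,5", "1", "1,2", "1,2,3", "1,2", "1,2"),
  B_aoi     = c("C*A", "*A", "A*", "A*C", "C", "*C"),
  B_aoi_dur = c("1,2,3", "1,2", "1,2", "1,2,3", "1", "1,2"),
  C_aoi     = c("A*A", "A", "A*", "B*C*B", "*B", "B*A"),
  C_aoi_dur = c("1,2,3", "1", "1,2", "1,2,3,4,5", "1,2", "1,2,3"),
  stringsAsFactors = FALSE
)

# Rows that contain a question
rows <- which(!is.na(df$Q) & df$Q != "")

# Questioner label = last character of Speaker (assumes IDs end in A, B, or C)
quest_by <- sub(".*(.)$", "\\1", df$Speaker[rows])

# Fetch the questioner's own AOI and duration cell in each question row
aoi <- mapply(function(r, p) df[r, paste0(p, "_aoi")],     rows, quest_by)
dur <- mapply(function(r, p) df[r, paste0(p, "_aoi_dur")], rows, quest_by)

# Last gaze target = last character; last duration = value after the final comma
df$Last_Gaze_to  <- NA_character_
df$Last_Gaze_dur <- NA_integer_
df$Last_Gaze_to[rows]  <- sub(".*(.)$", "\\1", aoi)
df$Last_Gaze_dur[rows] <- as.integer(sub(".*,", "", dur))

df[, c("Speaker", "Sequ", "Q", "Last_Gaze_to", "Last_Gaze_dur")]
```

This skips the pivot_longer and separate_rows steps entirely, at the cost of no longer having the long-format table available for other analyses.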
Thanks to anybody who took the trouble to look into this difficult question. Tips for improving the solution are welcome!