I have a large job-exposure database, and I want to calculate how long each subject was exposed to each agent. A subject can be exposed to the same agent through several jobs, and for each job I have the start year and the end year. The job periods can overlap, so I want to compute the total duration of exposure to each agent in R while counting overlapping years only once (if a year is counted in job 1, I don't want to count it again in job 2). For example, subject 2 was exposed to agent A through job 1 (1998-2009) and job 2 (1996-2000), and the two jobs overlap by 3 years (1998-2000).
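As a quick check of the result I expect for subject 2 and agent A, the arithmetic can be sketched in base R. The names job1, job2 and n_years are only for illustration, and I am assuming that both YEARIN and YEAROUT count as exposed years:

```r
# Calendar years covered by each job of subject 2 for agent A
job1 <- 1998:2009   # 12 years
job2 <- 1996:2000   # 5 years, 3 of which (1998-2000) overlap with job1

# union() keeps each calendar year once, so the overlap is not double counted
n_years <- length(union(job1, job2))
n_years
# 14
```

So the total I am looking for here is 14 years (12 + 5 - 3), not 17.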
Here is the data, called datatest:
datatest <- structure(list(ID = c(2, 2, 2, 2, 7, 7, 15, 18, 18, 18, 18, 18,
20, 20, 20), JOB = c(1, 2, 7, 8, 1, 1, 1, 1, 2, 4, 2, 3, 3, 4,
6), AGENT = c("A", "A", "B", "B", "B", "A", "A", "D", "D", "D",
"A", "A", "C", "C", "C"), YEARIN = c(1998, 1996, 1979, 1978,
1973, 1973, 1979, 1976, 1980, 1970, 1978, 1984, 1988, 1996, 2000
), YEAROUT = c(2009, 2000, 1985, 1982, 2006, 2006, 2007, 1985,
2008, 2005, 1979, 1995, 1993, 2002, 2008)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -15L))
I'm using this code to find the duration:
library(dplyr)
library(tidyr)

datatest %>%
  group_by(ID, JOB, AGENT) %>%
  summarise(year = seq(YEARIN, YEAROUT, by = 1)) %>%
  unnest(year) %>%
  group_by(ID, AGENT) %>%
  summarise(nyear = length(unique(year)))
When I run the code, I get this warning message:
Warning message:
Returning more (or less) than 1 row per `summarise()` group was
deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that
`reframe()` always returns an ungrouped data frame and adjust
accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning
was generated.
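Going by the warning, I assume the reframe() version would look roughly like the sketch below. It needs dplyr 1.1.0 or later, and it uses only the rows of subject 2 (stored under the made-up name subset2) to stay short. I left out unnest() because, as far as I can tell, year is already a regular column after reframe(), and I added .groups = "drop" to the last step so that it returns an ungrouped result:

```r
library(dplyr)

# Subject 2 only, copied from datatest, to keep the example self-contained
subset2 <- tibble(
  ID      = c(2, 2, 2, 2),
  JOB     = c(1, 2, 7, 8),
  AGENT   = c("A", "A", "B", "B"),
  YEARIN  = c(1998, 1996, 1979, 1978),
  YEAROUT = c(2009, 2000, 1985, 1982)
)

result <- subset2 %>%
  group_by(ID, JOB, AGENT) %>%
  # reframe() may return several rows per group and drops the grouping
  reframe(year = seq(YEARIN, YEAROUT, by = 1)) %>%
  group_by(ID, AGENT) %>%
  # exactly one row per group here, so summarise() is still appropriate
  summarise(nyear = n_distinct(year), .groups = "drop")

result
```

On this subset I would expect 14 years for agent A (1996-2009) and 8 years for agent B (1978-1985).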
Does this mean that using summarise() is not OK and I should use reframe() instead, or can I still use summarise()?