
I have a large job-exposure database, and I want to calculate how long each subject was exposed to each agent. A subject can be exposed to the same agent through several jobs, and for each job I have the start year and end year. The job periods can overlap, and I want to compute the total duration of exposure to each agent in R while counting overlapping years only once (a year counted for job 1 should not be counted again for job 2). For example, subject 2 was exposed to agent A through job 1 and job 2; we have YEARIN and YEAROUT for each job, but the two jobs overlap by 3 years (1998-2000).
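To make the expected result concrete, here is a minimal base R sketch of the counting rule for subject 2 and agent A only. The variable names (`job1`, `job2`, `overlap`, `exposed`) are just for illustration; the years are taken from the data shown further down.

```r
# Years covered by each job for subject 2, agent A
job1 <- 1998:2009   # JOB 1: YEARIN 1998, YEAROUT 2009
job2 <- 1996:2000   # JOB 2: YEARIN 1996, YEAROUT 2000

# Years that appear in both jobs
overlap <- intersect(job1, job2)

# Every exposed year, each counted once
exposed <- union(job1, job2)

length(overlap)  # 3  (1998, 1999, 2000)
length(exposed)  # 14 (1996 through 2009)
```

So the result I am after for subject 2 and agent A is 14 years, not the 17 obtained by adding the two jobs separately.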

Here is the data, called datatest:

structure(list(ID = c(2, 2, 2, 2, 7, 7, 15, 18, 18, 18, 18, 18, 
20, 20, 20), JOB = c(1, 2, 7, 8, 1, 1, 1, 1, 2, 4, 2, 3, 3, 4, 
6), AGENT = c("A", "A", "B", "B", "B", "A", "A", "D", "D", "D", 
"A", "A", "C", "C", "C"), YEARIN = c(1998, 1996, 1979, 1978, 
1973, 1973, 1979, 1976, 1980, 1970, 1978, 1984, 1988, 1996, 2000
), YEAROUT = c(2009, 2000, 1985, 1982, 2006, 2006, 2007, 1985, 
2008, 2005, 1979, 1995, 1993, 2002, 2008)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -15L)) 

I'm using this code to compute the duration:

library(dplyr)
library(tidyr)

datatest %>%
  group_by(ID, JOB, AGENT) %>%
  summarise(year = seq(YEARIN, YEAROUT, by = 1)) %>%
  unnest(year) %>%
  group_by(ID, AGENT) %>%
  summarise(nyear = length(unique(year)))

When I run the code, I get this warning message:

Warning message:
Returning more (or less) than 1 row per `summarise()` group was
deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that
  `reframe()` always returns an ungrouped data frame and adjust
  accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning
was generated. 

Does this mean that using summarise() here is no longer OK and I should switch to reframe(), or can I keep using summarise()?
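If reframe() is the right choice, I assume the rewrite would look roughly like the sketch below. It is untested on my full database and assumes dplyr 1.1.0 or later; `datatest_small` and `exposure` are names I made up here, and the data is only the four rows for subject 2.

```r
library(dplyr)

# Subset of datatest: the four rows for subject 2
datatest_small <- data.frame(
  ID      = c(2, 2, 2, 2),
  JOB     = c(1, 2, 7, 8),
  AGENT   = c("A", "A", "B", "B"),
  YEARIN  = c(1998, 1996, 1979, 1978),
  YEAROUT = c(2009, 2000, 1985, 1982)
)

exposure <- datatest_small %>%
  group_by(ID, JOB, AGENT) %>%
  # reframe() may return several rows per group (one per exposed year)
  # and always returns an ungrouped data frame
  reframe(year = seq(YEARIN, YEAROUT, by = 1)) %>%
  # regroup, then count each year once per subject and agent
  group_by(ID, AGENT) %>%
  summarise(nyear = n_distinct(year), .groups = "drop")

exposure
# agent A: 14 years (1996 to 2009), agent B: 8 years (1978 to 1985)
```

Since reframe() already returns one row per year, I don't think the unnest() step is needed in this version.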
