
I have one df containing individuals' arrival & departure dates and their total length of stay (los):

    arrive <- as.Date(c("2016/08/01","2016/08/03","2016/08/03","2016/08/04"))
    depart <- as.Date(c("2016/08/02","2016/08/07","2016/08/04", "2016/08/06"))
    people <- data.frame(arrive, depart)
    people$los <- people$depart - people$arrive
    View(people)

...and another df containing start & end dates.

    start <-seq(from=as.Date("2016/08/01"), to=as.Date("2016/08/08"), by="days")
    end <-seq(from=as.Date("2016/08/01"), to=as.Date("2016/08/08"), by="days") 
    range <- data.frame(start, end)
    View(range)

How can I add a column range$census to count how many people were present each day? For my example, the values I'm looking for would be as follows:

    range$census <- c(1,1,2,3,2,2,1,0)

What I am not sure of is how to apply a calculation on values from one df to another df of a different length. Here's what I've tried so far:

    people$count <- 1 
    range$census <- sum(people$count[people$arrive <= range$start & people$depart >= range$end])

Note: in the example above, the start and end dates fall on the same day, but I will also need to look at larger ranges, where the start and end dates are a month or a year apart.

jesstme
  • http://stackoverflow.com/q/40831059/4497050 – alistaire Nov 29 '16 at 00:02
  • I'm new to SO but that question didn't have an answer nor a reproducible example so I didn't think mine was duplicating efforts. Is best practice to have waited for that question to receive a response? Edit it to incorporate a reproducible example? or...? Thank you! – jesstme Nov 29 '16 at 00:05
  • Using `data.table`'s non-equi join should work well here. – Gregor Thomas Nov 29 '16 at 00:05
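
A minimal sketch of the non-equi join suggested in the comment above, assuming the `data.table` package is installed. The names `people_dt`, `range_dt`, and `counts` are made up for illustration and do not come from the original post. The idea is to match each range row against every stay that overlaps it, then count the matches per range row.

```r
library(data.table)

# Same sample data as in the question, stored as data.tables
# (people_dt and range_dt are illustrative names).
people_dt <- data.table(
  arrive = as.Date(c("2016/08/01", "2016/08/03", "2016/08/03", "2016/08/04")),
  depart = as.Date(c("2016/08/02", "2016/08/07", "2016/08/04", "2016/08/06"))
)
range_dt <- data.table(
  start = seq(from = as.Date("2016/08/01"), to = as.Date("2016/08/08"), by = "days"),
  end   = seq(from = as.Date("2016/08/01"), to = as.Date("2016/08/08"), by = "days")
)

# Non-equi join: a stay matches a range row when it starts on or before
# the range's end and finishes on or after the range's start.
# by = .EACHI evaluates .N once per row of range_dt, in range_dt's order.
counts <- people_dt[range_dt,
                    on = .(arrive <= end, depart >= start),
                    .N,
                    by = .EACHI]

# Attach the per-range counts as the census column.
range_dt[, census := counts$N]
```

Because the join condition tests for overlap rather than equality, the same call should also cover ranges where `start` and `end` are a month or a year apart.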

1 Answer


Why do you need the 'end' column in range?

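This will work:

    # Start every day at zero, then add 1 for each person present that day
    range$count <- rep(0, nrow(range))
    for (x in seq_len(nrow(people))) {
      # All days this person was present (arrival and departure inclusive)
      days_present <- seq(people[x, "arrive"], people[x, "depart"], by = "day")
      range$count <- range$count + range$start %in% days_present
    }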
code_is_entropy
  • I need the 'end' column in range because I'll occasionally need to look at larger time ranges where start <- as.Date(c("2016/08/01","2016/09/01","2016/10/01")) and end <- as.Date(c("2016/08/31","2016/09/30","2016/10/31")). I'll have to adjust the code a bit for that but I've accepted your answer because it worked great for my current example--thanks! – jesstme Nov 29 '16 at 00:22
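
One possible adjustment for the larger ranges described in the comment above, sketched in base R. It swaps the day-by-day `%in%` test for an interval-overlap test, so it is a different technique from the accepted answer, and the helper name `count_present` is hypothetical. A stay is counted for a range when it begins on or before the range's `end` and finishes on or after the range's `start`.

```r
# Sample stays from the question
arrive <- as.Date(c("2016/08/01", "2016/08/03", "2016/08/03", "2016/08/04"))
depart <- as.Date(c("2016/08/02", "2016/08/07", "2016/08/04", "2016/08/06"))
people <- data.frame(arrive, depart)

# Monthly ranges taken from the comment
start <- as.Date(c("2016/08/01", "2016/09/01", "2016/10/01"))
end   <- as.Date(c("2016/08/31", "2016/09/30", "2016/10/31"))
range <- data.frame(start, end)

# count_present is a hypothetical helper: for each row of `ranges`,
# count the stays in `stays` that overlap [start, end] (inclusive).
count_present <- function(stays, ranges) {
  vapply(seq_len(nrow(ranges)), function(i) {
    sum(stays$arrive <= ranges$end[i] & stays$depart >= ranges$start[i])
  }, integer(1))
}

range$census <- count_present(people, range)
range$census
# 4 0 0  (all four sample stays fall in August)
```

With the single-day ranges from the question (`start` equal to `end`), the same helper reproduces the expected `c(1,1,2,3,2,2,1,0)`.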