0

I have to datasets, that I want to combine:

Dataset 1: Contains time periods for which "perc" is valid:

set.seed(1)

example_df <- data.frame(ID = rep(1:2, each=2),   
start = c(as.Date("2014-01-01"), as.Date("2014-03-05"), as.Date("2014-01-13"), as.Date("2014-03-15")), 
                         end = c(as.Date("2014-03-05"), as.Date("2014-04-12"), as.Date("2014-03-01"), as.Date("2014-04-02")), 
                         perc = rnorm(mean= 30, sd= 10, 4)) 

Dataset 2: Contains pay for each month:

 month_start <- as.Date("2014-01-01") + months(0:3)
    month_end <-  ceiling_date(month_start, "month") - days(1)

set.seed(1)
example_df2 <-  data.frame(month_start, month_end,
                           ID = rep(1:2, each=4),
                           pay = rnorm(mean= 2000, sd= 80, 8))

The goal is to calculate pay for each individual for each month based on how much perc they worked. Important is to take into account the valid time periods for perc, which might change within the month.

e.g.:

Jan 2014 for ID 1: Pay = 1949.884 (pay)*23.73546 (perc)/100

because perc is valid for the entire January.

However, for March, perc is 23.73546 until 5th and perc is 31.83643 for the rest of March.

Thus,

Mar 2014 for ID 1: Pay = 1949.884 (pay)*23.73546 (perc)/100/ 31 (days of March)*5 + 1949.884 (pay)*31.83643 (perc)/100/ 31 (days of March)*26

user138089
  • 99
  • 1
  • 5

1 Answers1

0

Start with a left_join() between your 2 dataframes. Each period of work of an ID will be replicate on each pay month period of this ID. Then, we a succession of ifelse(), you can determine if the total month should be count, only a part, or not at all.

library(tidyverse)

result <- example_df %>% 
  left_join(example_df2, by = 'ID') %>% 
  mutate(
    TEST_MONTH = ifelse(end >= month_start & start < month_end, 1, 0), 
    TEST_DAYS  = ifelse(TEST_MONTH == 1,
                        ifelse(end > month_end,
                               ifelse(start >= month_start, month_end - start + 1, month_end - month_start + 1), 
                               end - month_start + 1), 
                        0),
    PAID = pay * perc/100 * TEST_DAYS / as.numeric(month_end - month_start + 1)
  )

result %>% filter(ID == 1)

# ID      start        end     perc month_start  month_end      pay TEST_MONTH TEST_DAYS      PAID
# 1  1 2014-01-01 2014-03-05 23.73546  2014-01-01 2014-01-31 1949.884          1        31 462.81390
# 2  1 2014-01-01 2014-03-05 23.73546  2014-02-01 2014-02-28 2014.691          1        28 478.19633
# 3  1 2014-01-01 2014-03-05 23.73546  2014-03-01 2014-03-31 1933.150          1         5  74.00678
# 4  1 2014-01-01 2014-03-05 23.73546  2014-04-01 2014-04-30 2127.622          0         0   0.00000
# 5  1 2014-03-05 2014-04-12 31.83643  2014-01-01 2014-01-31 1949.884          0         0   0.00000
# 6  1 2014-03-05 2014-04-12 31.83643  2014-02-01 2014-02-28 2014.691          0         0   0.00000
# 7  1 2014-03-05 2014-04-12 31.83643  2014-03-01 2014-03-31 1933.150          1        27 536.03354
# 8  1 2014-03-05 2014-04-12 31.83643  2014-04-01 2014-04-30 2127.622          1        12 270.94364
demarsylvain
  • 2,103
  • 2
  • 14
  • 33
  • Thank you very much!!! Yes, this works. I would not have come up with the left_join. Makes perfect sense. :-) – user138089 Mar 21 '19 at 17:43