2

I am struggling to create a new variable that captures the stock price (prc) at the begin of a fiscal year (fyear) for a given firm.

In the data the fiscal year is defined with a start and end date, complemented with the monthly stock price. The stock price is based on the price of the last trading day of the month, and therefore is not always on the last day of the month.

For example: as the fiscal year starts on the first of January of 2001 I would like to get the stock price at the end of December 2000.

Here's a sample of the data:

dt <- data.table(id = rep(c(59328, 61241), each = 36), fyear = c(rep(2001, 
    each = 12), rep(2002, each = 12), rep(2003, each = 12), rep(2001, 
    each = 12), rep(2002, each = 12), rep(2003, each = 12)), 
    fyear_start = as.Date(c(rep("2001-01-01", each = 12), rep("2002-01-01", 
        each = 12), rep("2003-01-01", each = 12), rep("2000-07-01", 
        each = 12), rep("2001-07-01", each = 12), rep("2002-07-01", 
        each = 12))), fyear_end = as.Date(c(rep("2001-12-31", 
        each = 12), rep("2002-12-31", each = 12), rep("2003-12-31", 
        each = 12), rep("2001-06-30", each = 12), rep("2002-06-30", 
        each = 12), rep("2003-06-30", each = 12))), prc_month_end = as.Date(c("2001-01-31", 
        "2001-02-28", "2001-03-30", "2001-04-30", "2001-05-31", 
        "2001-06-29", "2001-07-31", "2001-08-31", "2001-09-28", 
        "2001-10-31", "2001-11-30", "2001-12-31", "2002-01-31", 
        "2002-02-28", "2002-03-28", "2002-04-30", "2002-05-31", 
        "2002-06-28", "2002-07-31", "2002-08-30", "2002-09-30", 
        "2002-10-31", "2002-11-29", "2002-12-31", "2003-01-31", 
        "2003-02-28", "2003-03-31", "2003-04-30", "2003-05-30", 
        "2003-06-30", "2003-07-31", "2003-08-29", "2003-09-30", 
        "2003-10-31", "2003-11-28", "2003-12-31", "2000-07-31", 
        "2000-08-31", "2000-09-29", "2000-10-31", "2000-11-30", 
        "2000-12-29", "2001-01-31", "2001-02-28", "2001-03-30", 
        "2001-04-30", "2001-05-31", "2001-06-29", "2001-07-31", 
        "2001-08-31", "2001-09-28", "2001-10-31", "2001-11-30", 
        "2001-12-31", "2002-01-31", "2002-02-28", "2002-03-28", 
        "2002-04-30", "2002-05-31", "2002-06-28", "2002-07-31", 
        "2002-08-30", "2002-09-30", "2002-10-31", "2002-11-29", 
        "2002-12-31", "2003-01-31", "2003-02-28", "2003-03-31", 
        "2003-04-30", "2003-05-30", "2003-06-30")), prc = c(37, 
        28.56, 26.31, 30.91, 27.01, 29.25, 29.81, 27.96, 20.44, 
        24.42, 32.66, 31.45, 35.04, 28.55, 30.41, 28.61, 27.62, 
        18.27, 18.79, 16.67, 13.89, 17.3, 20.88, 15.57, 15.7, 
        17.26, 16.28, 18.37, 20.82, 20.81, 24.89, 28.59, 27.52, 
        32.95, 33.54, 32.05, 24.6, 21.5, 26.54, 31, 28.25, 28.9, 
        18.26, 13.55, 8.15, 9.84, 13.56, 15.86, 16.05, 13.5, 
        14.71, 11.18, 11.43, 9.72, 8.03, 8.85, 5.34, 6.14, 9, 
        6.46, 5.24, 5.49, 6.18, 7.44, 7.28, 6.41, 7.3, 11.29, 
        11.11, 15.2, 17.97, 14.9))

First three rows:

       id fyear fyear_start  fyear_end prc_month_end   prc
 1: 59328  2001  2001-01-01 2001-12-31    2001-01-31 37.00
 2: 59328  2001  2001-01-01 2001-12-31    2001-02-28 28.56
 3: 59328  2001  2001-01-01 2001-12-31    2001-03-30 26.31

I've read the following posts for guidance, but I didn't get the expected result.

  1. How to loop lapply to create LAG terms over multiple variables in R

    • Using the solution of thelatemail, I can make a lagged variable of the stock price. However, it takes the previous month stock price, and dit not take into account the fiscal year.
    vars <- c("prc")
    rpv <- rep(1:2, each=length(vars))
    dt_test <- dt[, paste(vars, "lag", rpv, sep="_") := Map(shift, .SD, rpv), by=id, .SDcols=vars]
    
  2. How to create lag variables

    • Same as above, lag variable stock price is based on the previous month.
  3. Create lead and lag variables in R
    • Same as above, lag variable stock price is based on the previous month.

Can't use the .SD[1]/.N statement of data.table, as it returns the first/last month of the fiscal year, and not the last of the previous fiscal year.

Is there a way to return for a fiscal year the last monthly stock price at previous fiscal year?

The desired result is as follows:

output <- data.table(id = rep(c(59328, 61241), each = 3), fyear = c(2001, 
    2002, 2003, 2001, 2002, 2003), fyear_start = as.Date(c("2001-01-01", 
    "2002-01-01", "2003-01-01", "2000-07-01", "2001-07-01", "2002-07-01")), 
    fyear_end = as.Date(c("2001-12-31", "2002-12-31", "2003-12-31", 
        "2001-06-30", "2002-06-30", "2003-06-30")), begin_prc = c(NA, 
        31.45, 15.57, NA, 15.86, 6.46))

      id fyear fyear_start  fyear_end begin_prc
1: 59328  2001  2001-01-01 2001-12-31        NA
2: 59328  2002  2002-01-01 2002-12-31     31.45
3: 59328  2003  2003-01-01 2003-12-31     15.57
4: 61241  2001  2000-07-01 2001-06-30        NA
5: 61241  2002  2001-07-01 2002-06-30     15.86
6: 61241  2003  2002-07-01 2003-06-30      6.46

I'd appreciate some assistance. Thanks in advance.

Frank
  • 66,179
  • 8
  • 96
  • 180
Patrick
  • 106
  • 1
  • 7

2 Answers2

3

Is there a way to return for a fiscal year the last monthly stock price at previous fiscal year?

out = unique(dt[, .(id, fyear, fyear_start, fyear_end)])

out[, prc_end := {
  dt[.(id = .SD$id, prc_month_end = .SD$fyear_start - 1L), on=.(id, prc_month_end), roll=TRUE, x.prc]
}]

      id fyear fyear_start  fyear_end prc_end
1: 59328  2001  2001-01-01 2001-12-31      NA
2: 59328  2002  2002-01-01 2002-12-31   31.45
3: 59328  2003  2003-01-01 2003-12-31   15.57
4: 61241  2001  2000-07-01 2001-06-30      NA
5: 61241  2002  2001-07-01 2002-06-30   15.86
6: 61241  2003  2002-07-01 2003-06-30    6.46

This is a rolling update join: For rows of table out

  • Construct the lookup vectors .(id, fyear_start - 1) using .SD = out, the subset of data
  • Lookup rows of dt, "rolling" the last vector, fyear_start - 1, to the nearest earlier date
  • Take matched values of x.prc, the prc column from dt

The notation x.* comes from the x[i] join/lookup syntax. For more details, see ?data.table.

Frank
  • 66,179
  • 8
  • 96
  • 180
  • Thank you for your answer and explanation. Just to be sure that I understand your code, the rolling vector looks for `prc_month_end` based on the nearest earlier date of fyear_start. I validated it on my dataset and it works great! – Patrick Jul 12 '19 at 21:51
  • 1
    @Patrick Glad to hear it works :) Yes, it looks for the date fyear_start - 1 and if it is not found, finds the nearest earlier date by id (possibly none, leading to the NA). The roll option always applies to the last vector/column in the `on=` join rules. – Frank Jul 12 '19 at 22:29
2

This works for your example but you'll want to double-check the logic--feels a little hackish to me. I will revisit later and think through it more. Hopefully this gets you started!

dt[, test := (shift(fyear_start, -1) - prc_month_end) > 0, by = id]
out <- dt[test == T | is.na(test)][, prc := shift(prc, 1), by = id]
out[, c("test", "prc_month_end") := NULL]

dt
      id fyear fyear_start  fyear_end   prc
1: 59328  2001  2001-01-01 2001-12-31    NA
2: 59328  2002  2002-01-01 2002-12-31 31.45
3: 59328  2003  2003-01-01 2003-12-31 15.57
4: 61241  2001  2000-07-01 2001-06-30    NA
5: 61241  2002  2001-07-01 2002-06-30 15.86
6: 61241  2003  2002-07-01 2003-06-30  6.46
Andrew
  • 5,028
  • 2
  • 11
  • 21
  • Thanks Andrew for your answer. It works for the reprex. However in my dataset (1,4 M observations) there are some observations with missing fyears. For those observations the `begin_prc` is not correct. With the answer of @Frank those get a 'NA' which is desired. – Patrick Jul 12 '19 at 21:56
  • No worries, @Patrick, glad you found a solution that works for you! Frank's solution is much more elegant!! Thanks for the explanation, too! – Andrew Jul 13 '19 at 01:25