The key issue of this question is how Period
is mapped to Date
. From OP's decription I have understood that each period includes the last day of the actual month plus the first three days into the next month, 4 days in total.
This can be solved with some date arithmetic and a right join:
library(data.table)
result <-
# coerce to data.table
setDT(dailycds)[
# compute period by subtracting 3 days of date
, Period := format(as.IDate(Date, "%d-%m-%Y") - 3L, "%Y%m")][
# right join, dropping all rows from dailycds without matching period
periodicassets, on = "Period"][
# change column order to be in line with expected result df
, setcolorder(.SD, names(df))]
result
Period Date Assets CDS
1: 201506 30-06-2015 1314 194
2: 201506 01-07-2015 1314 195
3: 201506 02-07-2015 1314 198
4: 201506 03-07-2015 1314 198
5: 201606 30-06-2016 2134 165
6: 201606 01-07-2016 2134 172
7: 201606 02-07-2016 2134 213
8: 201606 03-07-2016 2134 123
There are only 4 rows per period as requested and the result is in line with the expected result df
:
all.equal(df, as.data.frame(result[, lapply(.SD, forcats::fct_drop)]))
[1] TRUE
Unused levels have to be dropped to pass the strict checks of all.equal()
Caveat
The code has been tested to work with the sample data provided. In case of continuous daily as well as periodic data it might be necessary to add code to remove days which do not belong to the 4 days period.
Edit: More realistic sample data
The OP has updated his question and is providing more realistic sample data via dropbox. Now, dailycds
contains daily data (except for weekends). As already mentioned in the Caveat above, this requires dailycds
to be filtered for the relevant days.
The OP isn't clear how the days to be considered before and after the turn of the month are defined. Here, we assume that 3 days before the end of the month and 6 days afterwards refers to calendar days and not business days.
# define day range of interest relativ to turn of the month
days_before <- 3L
days_after <- 6L
stopifnot(days_before + days_after < 28)
# read data from dropbox links, note ?dl=1
dailycds <- readRDS(url("https://www.dropbox.com/s/r7v5dq6la0mnn71/dailycds.RDS?dl=1"))
periodicassets <-
readRDS(url("https://www.dropbox.com/s/gdflcngwp8nm552/periodicassets.RDS?dl=1"))
library(data.table)
# coerce to data.table
setDT(dailycds)[
# filter calendar dates
mday(Date) <= days_after | mday(Date) > lubridate::days_in_month(Date) - days_before][
# compute period by shifting dates from next month into actual month
# coersion to IDate is required because Date is of class POSIXct
, Period := format(as.IDate(Date) - days_after, "%Y%m")][
# right join, dropping all rows from dailycds without matching period
setDT(periodicassets), on = "Period"][]
Date CDS Period BankName value
1: 2015-01-06 48.633000000000003 201412 BPCE 112189.50
2: 2015-01-05 46.670999999999999 201412 BPCE 112189.50
3: 2015-01-02 45.158000000000001 201412 BPCE 112189.50
4: 2015-01-01 47.32 201412 BPCE 112189.50
5: 2014-12-31 47.658000000000001 201412 BPCE 112189.50
6: 2014-12-30 45.843000000000004 201412 BPCE 112189.50
7: 2014-12-29 47.588999999999999 201412 BPCE 112189.50
8: 2015-02-06 47.265000000000001 201501 BPCE 103142.06
9: 2015-02-05 47.073999999999998 201501 BPCE 103142.06
10: 2015-02-04 46.634999999999998 201501 BPCE 103142.06
11: 2015-02-03 46.405000000000001 201501 BPCE 103142.06
12: 2015-02-02 47.567 201501 BPCE 103142.06
13: 2015-01-30 47.396000000000001 201501 BPCE 103142.06
14: 2015-01-29 48.448999999999998 201501 BPCE 103142.06
15: 2015-01-06 48.633000000000003 201412 Credit Agricole 81618.76
16: 2015-01-05 46.670999999999999 201412 Credit Agricole 81618.76
...
26: 2015-02-02 47.567 201501 Credit Agricole 73987.36
27: 2015-01-30 47.396000000000001 201501 Credit Agricole 73987.36
28: 2015-01-29 48.448999999999998 201501 Credit Agricole 73987.36
Date CDS Period BankName value
Edit 2: Using business days instead of calendar dates.
The OP has clarified that he is using buiness days instead of calendar days. This seemingly minor change of the specification has a severe impact on the way the dates to be included are selected.
Now, always the first 6 entries in each month are picked as well as the last 3 entries before the last trading day of the month (ultimo) and the ultimo itself which results in 3 + 1 + 6 = 10 business days to pick.
# define range of business days relative to the last trading day (ultimo)
days_before <- 3L
days_after <- 6L
stopifnot(days_before + days_after < 28)
library(data.table)
# read data from dropbox links, note ?dl=1
dailycds <- readRDS(url("https://www.dropbox.com/s/r7v5dq6la0mnn71/dailycds.RDS?dl=1"))
periodicassets <- readRDS(url("https://www.dropbox.com/s/gdflcngwp8nm552/periodicassets.RDS?dl=1"))
# coerce to data.table
setDT(dailycds)[
# filter business dates:
# for each month pick the first days_after business days into the month
# and the last days_before biz days before and including ultimo
dailycds[, c(head(.I, days_after), tail(.I, days_before + 1L)),
by = .(year(Date), month(Date))]$V1][
# compute period by shifting dates from next month into actual month
# coersion to IDate is required because Date is of class POSIXct
, Period := format(as.IDate(Date) - days_after, "%Y%m")][
# right join, dropping all rows from dailycds without matching period
setDT(periodicassets), on = "Period"][]
Date CDS Period BankName value
1: 2015-01-06 48.633000000000003 201412 BPCE 112189.50
2: 2015-01-05 46.670999999999999 201412 BPCE 112189.50
3: 2015-01-02 45.158000000000001 201412 BPCE 112189.50
4: 2015-01-01 47.32 201412 BPCE 112189.50
5: 2014-12-31 47.658000000000001 201412 BPCE 112189.50
6: 2014-12-30 45.843000000000004 201412 BPCE 112189.50
7: 2014-12-29 47.588999999999999 201412 BPCE 112189.50
8: 2014-12-26 47.625999999999998 201412 BPCE 112189.50
9: 2014-12-25 47.697000000000003 201412 BPCE 112189.50
10: 2014-12-24 47.414999999999999 201412 BPCE 112189.50
11: 2015-02-05 47.073999999999998 201501 BPCE 103142.06
12: 2015-02-04 46.634999999999998 201501 BPCE 103142.06
13: 2015-02-03 46.405000000000001 201501 BPCE 103142.06
14: 2015-02-02 47.567 201501 BPCE 103142.06
15: 2015-01-30 47.396000000000001 201501 BPCE 103142.06
16: 2015-01-29 48.448999999999998 201501 BPCE 103142.06
17: 2015-01-28 49.442 201501 BPCE 103142.06
18: 2015-01-27 49.502000000000002 201501 BPCE 103142.06
19: 2015-01-26 49.73 201501 BPCE 103142.06
20: 2015-01-23 50.917000000000002 201501 BPCE 103142.06
21: 2015-01-06 48.633000000000003 201412 Credit Agricole 81618.76
22: 2015-01-05 46.670999999999999 201412 Credit Agricole 81618.76
...
39: 2015-01-26 49.73 201501 Credit Agricole 73987.36
40: 2015-01-23 50.917000000000002 201501 Credit Agricole 73987.36
Date CDS Period BankName value
Note that the result data set contains (3 + 1 + 6) * 2 months * 2 banks = 40 rows.
Data from dropbox
In case the dropbox links break:
dailycds <-
structure(list(Date = structure(c(1424649600, 1424390400, 1424304000,
1424217600, 1424131200, 1424044800, 1423785600, 1423699200, 1423612800,
1423526400, 1423440000, 1423180800, 1423094400, 1423008000, 1422921600,
1422835200, 1422576000, 1422489600, 1422403200, 1422316800, 1422230400,
1421971200, 1421884800, 1421798400, 1421712000, 1421625600, 1421366400,
1421280000, 1421193600, 1421107200, 1421020800, 1420761600, 1420675200,
1420588800, 1420502400, 1420416000, 1420156800, 1420070400, 1419984000,
1419897600, 1419811200, 1419552000, 1419465600, 1419379200, 1419292800,
1419206400, 1418947200, 1418860800, 1418774400, 1418688000, 1418601600,
1418342400, 1418256000, 1418169600, 1418083200, 1417996800, 1417737600,
1417651200, 1417564800, 1417478400, 1417392000, 1417132800, 1417046400,
1416960000, 1416873600, 1416787200, 1416528000, 1416441600, 1416355200,
1416268800, 1416182400, 1415923200, 1415836800, 1415750400, 1415664000,
1415577600, 1415318400, 1415232000, 1415145600, 1415059200, 1414972800
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), CDS = c("44.259",
"44.555999999999997", "45.076999999999998", "44.951000000000001",
"45.762", "45.573", "45.634999999999998", "45.956000000000003",
"47.064", "47.51", "48.576999999999998", "47.265000000000001",
"47.073999999999998", "46.634999999999998", "46.405000000000001",
"47.567", "47.396000000000001", "48.448999999999998", "49.442",
"49.502000000000002", "49.73", "50.917000000000002", "51.37",
"52.536999999999999", "49.188000000000002", "47.893999999999998",
"46.728000000000002", "46.634999999999998", "46.366999999999997",
"47.012999999999998", "46.869", "48.121000000000002", "48.625999999999998",
"48.801000000000002", "48.633000000000003", "46.670999999999999",
"45.158000000000001", "47.32", "47.658000000000001", "45.843000000000004",
"47.588999999999999", "47.625999999999998", "47.697000000000003",
"47.414999999999999", "48.075000000000003", "48.085999999999999",
"47.496000000000002", "46.534999999999997", "48.149000000000001",
"49.421999999999997", "48.223999999999997", "47.100999999999999",
"47.484999999999999", "47.491999999999997", "47.052", "46.697000000000003",
"44.670999999999999", "47.706000000000003", "46.835000000000001",
"48.66", "46.841999999999999", "48.069000000000003", "49.49",
"50.155000000000001", "50.155000000000001", "50.49", "52.024000000000001",
"50.33", "50", "50.67", "53.15", "52.994999999999997", "55.31",
"50.82", "50.49", "50.832999999999998", "52.241", "51.97", "52.8",
"50.667000000000002", "51.134999999999998")), .Names = c("Date",
"CDS"), row.names = c(NA, -81L), class = c("tbl_df", "tbl", "data.frame"))
periodicassets <-
structure(list(BankName = c(" BPCE", " BPCE", " Credit Agricole",
" Credit Agricole"), Period = c("201412", "201501", "201412",
"201501"), value = c(112189.50293406, 103142.064337463, 81618.762099507,
73987.36251389)), .Names = c("BankName", "Period", "value"), row.names = c(10L,
11L, 18L, 19L), class = "data.frame")