Align dates in R date.table for linear regression

Question

I am having a data.table with returns on n dates for m securities. I would like to do a multiple linear regression in the form of lm(ReturnSec1 ~ ReturnSec2 + ReturnSec3 + ... + ReturnSecM). The problem that I am having is that there might be dates missing for some of the securities and the regression should be on aligned dates. Here is what I have come up with so far:

#The data set
set.seed(1)
dtData <- data.table(SecId = rep(c(1,2,3), each= 4), Date = as.Date(c(1,2,3,5,1,2,4,5,1,2,4,5)), Return = round(rnorm(12),2))

#My solution so far
dtDataAligned <- merge(dtData[SecId == 1,list(Date, Return)], dtData[SecId == 2, list(Date, Return)], by='Date', all=TRUE)
dtDataAligned <- merge(dtDataAligned, dtData[SecId == 3,list(Date, Return)], by='Date',  all=TRUE)
setnames(dtDataAligned, c('Date', 'Sec1', 'Sec2', 'Sec3'))
dtDataAligned[is.na(dtDataAligned)] <- 0

#This is what i want to do
fit <- lm(dtDataAligned[, Sec1] ~ dtDataAligned[, Sec2] + dtDataAligned[, Sec3])

Is there a better (more elegant, possibly faster) way of doing this without having to loop and merge the data.table to perform a regression on the values with aligned dates?

G. Grothendieck · Answer 1 · 2014-04-23T15:52:38.767

3

If the question is how to reproduce the output from the code shown in the question in a more compact fashion then try this:

library(zoo)

z <- read.zoo(dtData, split = 1, index = 2)
z0 <- na.fill(z, fill = 0)
lm(`1` ~., z0)

ADDED

Regarding the comment about elegance we could create a magrittr package pipeline out of the above like this:

library(magrittr)

dtData %>% 
    read.zoo(split = 1, index = 2) %>%
    na.fill(fill = 0) %>%
    lm(formula = `1` ~.)

edited Apr 23 '14 at 15:52

answered Apr 23 '14 at 12:25

G. Grothendieck

254,981
17
203
341

+1 for `.` in `lm`, keep forgetting that is an option. – BrodieG Apr 23 '14 at 12:50
Thanks for the answer. This is exactly what I intended to do. Sorry for being unclear about the loop. In my example there are only 3 securities, however in my problem I have many more, therefore requiring a loop to merge them together! – Wolfgang Wu Apr 23 '14 at 14:17

score 3 · Accepted Answer · answered Apr 23 '14 at 12:48

Here is a data.table solution using dcast.data.table, which takes data in the long format (your input) and converts it to the wide format required for the lm call.

lm(`1` ~ ., dcast.data.table(dtData, Date ~ SecId, fill=0))

Here is the output of the dcast call:

         Date     1     2     3
1: 2014-01-02 -0.63  0.33  0.58
2: 2014-01-03  0.18 -0.82 -0.31
3: 2014-01-04 -0.84  0.00  0.00
4: 2014-01-05  0.00  0.49  1.51
5: 2014-01-06  1.60  0.74  0.39

I stole the lm piece from @G.Grothendieck. Note that if you have more than three columns in your real data you will need to specify the value.var parameter for dcast.data.table.

Great stuff! I didn't know about the dcast function. I just updated my data.table library and this works perfect. — Wolfgang Wu, Apr 23 '14 at 14:18

Align dates in R date.table for linear regression

2 Answers2