0

I am using R to connect to WRDS. Now, I would like to link compustat and crsp tables. In SAS, this would be achieved using macros and the CCM link table. What would be the best way to approach this topic in R?

PROGRESS UPDATE:

I downloaded crsp, compustat and ccm_link tables from wrds.

sql <- "select * from CRSP.CCMXPF_LINKTABLE"
res <- dbSendQuery(wrds, sql)
ccmxpf_linktable <- fetch(res, n = -1)
ccm.dt <- data.table(ccmxpf_linktable)
rm(ccmxpf_linktable)

I am then converting the suggested matching routine from the wrds event study sas file into R:

ccm.dt[,typeflag:=linktype %in% c("LU","LC","LD","LN","LS","LX") & USEDFLAG=="1"]
setkey(ccm.dt, gvkey, typeflag)
for (i in 1:nrow(compu.dt)) {
  gvkey.comp = compu.dt[i, gvkey]
  endfyr.comp = compu.dt[i,endfyr]
  PERMNO.val <- ccm.dt[.(gvkey.comp, TRUE),][linkdt<=endfyr.comp & endfyr.comp<=linkenddt,lpermno]
  if (length(PERMNO.val)==0) PERMNO.val <- NA
  suppressWarnings(compu.dt[i, "PERMNO"] <- PERMNO.val)
}

However, this code is fantastically inefficient. I started out with data.table, but do not really understand how to apply the logic in the for-loop. I am hoping that some could point me to a way how to improve the for-loop.

FlipperPA
  • 13,607
  • 4
  • 39
  • 71

2 Answers2

0

Matching fields in stages works better. maybe someone finds this useful. Any suggestions for further improvement are of course very welcome!!!

# filter on ccm.dt
ccm.dt <- ccm.dt[linktype %in% c("LU","LC","LD","LN","LS","LX") & USEDFLAG=="1"]
setkey(ccm.dt, gvkey)
setkey(compu.dt, gvkey)
compu.merged <- merge(compu.dt, ccm.dt, all.x = TRUE, allow.cartesian = TRUE)
# deal with NAs in linkenddt - set NAs to todays date, assuming they still exist.
today <- as.character(Sys.Date())
compu.merged[is.na(linkenddt), "linkenddt":=today]
# filter out date mismatches
compu <- compu.merged[linkdt <= endfyr & endfyr<=linkenddt]
  • Are you sure `compu.merged <- merge(compu.dt, ccm.dt, all.x = TRUE, allow.cartesian = TRUE)` is the correct way to merge them? Do you have any helpful update? – Konstantinos Jun 02 '16 at 18:15
  • no, not sure. I also have not revisited this subject in favor of wrds/sas macro. – Timo Predoehl Jul 04 '16 at 07:45
0

No loops should be needed. Just use SQL (perhaps using dbplyr inside R). See here.

Ian Gow
  • 3,098
  • 1
  • 25
  • 31