
I am using xts and am loading 100 to 1000 files in a loop. Each file has between 50k and 300k lines. I am using R 2.15.1, the latest version, on 64-bit Windows 7. I have the same issue on Ubuntu Linux with R version 2.14.X.

The code below will regularly crash R:

library(xts)
N <- 1e6
for(i in 1:1000) {
  allTimes <- Sys.time()-N:1
  x <- NULL
  x <- xts(,allTimes)
  sampTimes <- allTimes[seq(1,length(allTimes),by=2)]
  y <- merge(xts(seq_along(sampTimes), sampTimes), allTimes)
  y <- na.locf(y)
  y <- to.period(y, 'seconds', 10)
  index(y) <- index(to.period(x, 'seconds', 10))
}
Joshua Ulrich
Dave
  • It sounds like you're running out of memory. You say you tried this on Ubuntu, but you don't say whether crashes occurred. Do you have swap space on your Ubuntu machine? – Joshua Ulrich Jul 22 '12 at 00:39
  • Care to share the code you're using? You can try debugging with browser(). That way, you "get into" the function and get to poke around. It's a lot of fun. – Roman Luštrik Jul 22 '12 at 07:25
  • A couple of things. It seems that R does not like combining print with reading files. Using a progress bar has helped R behave. RStudio still has issues. This applies to both Linux and Windows. – Dave Jul 22 '12 at 18:34
  • I just upgraded R to the latest version on the Linux box. Now seeing this:
    > DATTick <- parseTickDataFromDir(tickerDirSecond, "seconds",10, fun)
    |======= | 10%
    *** caught segfault ***
    address 0x10, cause 'memory not mapped'
    Traceback:
    1: scan(file = inputFile, sep = ",", skip = 1, what = list(Date = "", Time = "", Close = 0, Volume = 0), quiet = T)
    2: parseTickData(tickerAbsFilenames[i])
    3: parseTickDataFromDir(tickerDirSecond, "seconds", 10, fun)
    – Dave Jul 22 '12 at 19:05
  • [Cross-posted on R-devel](https://stat.ethz.ch/pipermail/r-devel/2012-July/064466.html) – Joshua Ulrich Jul 22 '12 at 20:46

1 Answer


This was answered on R-devel. The issue was that calling to.period on a zero-width xts object returned OHLC data read from random memory locations. For example:

library(xts)
x <- xts(,Sys.time()-10:1)
y <- to.period(x)
y
#                           x.Open       x.High         x.Low       x.Close
# 2012-07-23 15:47:30 4.25426e-314 2.36246e-300 1.428936e-316 1.428936e-316

Since aggregating "no data" doesn't make sense, I have patched to.period to throw an error when run on zero-width/length objects (revision 690 on R-Forge).
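If you are on a version of xts without that patch, a similar guard can be added in user code. The following is only a rough sketch, not the actual patch: `safe_to_period` is a hypothetical helper name, and it assumes that `length()` returns 0 for a zero-width or zero-length xts object.

```r
library(xts)

# Hypothetical helper (not part of xts): refuse to aggregate an object
# that carries no data, instead of passing it on to to.period().
safe_to_period <- function(x, ...) {
  if (length(x) == 0L) {
    stop("cannot aggregate a zero-width or zero-length object")
  }
  to.period(x, ...)
}

times  <- Sys.time() - 10:1
empty  <- xts(, times)                        # zero-width: index only, no data
filled <- xts(rep(1, length(times)), times)   # one column of ones

# The guard turns the zero-width case into an ordinary R error ...
res <- tryCatch(safe_to_period(empty, "seconds", 10),
                error = function(e) conditionMessage(e))

# ... while objects that do contain data are aggregated as usual.
agg <- safe_to_period(filled, "seconds", 10)
```

Here `res` holds the error message rather than an OHLC object built from uninitialized memory, and `agg` is a regular OHLC aggregation.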

Instead of running to.period on a zero-width object, just create a temporary xts object full of ones and run to.period on that. This will work with the xts currently on CRAN.

library(xts)
N <- 1e6
for(i in 1:100) {
  allTimes <- Sys.time()-N:1
  # every second timestamp carries an observation
  sampTimes <- allTimes[seq(1,length(allTimes),by=2)]
  # merge onto the full time grid and carry the last observation forward
  y <- merge(xts(seq_along(sampTimes), sampTimes), allTimes)
  y <- na.locf(y)
  y <- to.period(y, 'seconds', 10)
  # aggregate a temporary object full of ones, not a zero-width object
  tmp <- xts(rep(1,length(allTimes)), allTimes)
  index(y) <- index(to.period(tmp, 'seconds', 10))
}
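To see on a small scale what the temporary object contributes, here is an illustrative check. The fixed timestamps and the 100-second span are assumptions chosen only to make the run repeatable. It compares the period index taken from the ones-filled object with the index of the aggregated data.

```r
library(xts)

# Assumed fixed timestamps (not from the original post) for a repeatable run
times <- as.POSIXct("2012-07-23 15:00:00", tz = "UTC") + 0:99

# Aggregate real data into 10-second bars
y <- to.period(xts(seq_along(times), times), "seconds", 10)

# Take the period index from a ones-filled object, never a zero-width one
tmp <- xts(rep(1, length(times)), times)
idx <- index(to.period(tmp, "seconds", 10))

# Both aggregations share the same time grid, so the indexes should line up
same_length <- length(idx) == nrow(y)
same_values <- all(idx == index(y))
```

Only the index of `tmp` is used, so any constant fill value would do equally well.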
Joshua Ulrich