R - create Stock data from data frame with begin and end date

Question

I need to create stock data from a data frame. The data frame contains a begin and end date, which signify the period an item was in stock. I want to aggregate stock levels per item, then create a time series with the data.

I have data of the form:

A <- c("a","b","a","b","c")
begindate <- as.Date(c("2014-01-01", "2014-01-03", "2014-01-03", "2014-01-02", "2014-01-02"))
enddate <- as.Date(c("2014-01-04", "2014-01-05", "2014-01-06", "2014-01-04", "2014-01-06"))
source <- data.frame(A, begindate, enddate)
source
  A  begindate    enddate
1 a 2014-01-01 2014-01-04
2 b 2014-01-03 2014-01-05
3 a 2014-01-03 2014-01-06
4 b 2014-01-02 2014-01-04
5 c 2014-01-02 2014-01-06

what I want to create from this data is a time series like

    2014-01-01  2014-01-02  2014-01-03  2014-01-04  2014-01-05  2014-01-06
a   1   1   2   2   1   1
b       1   2   2   1   
c       1   1   1   1   1

the original data is rather large, about 180k rows. what is an effective way to do this?

EDIT

The answer given by David Arenburg worked well

library(data.table)
library(reshape2)
setDT(mydata)[, indx := .I]
mydata <- mydata[, list(A = A, seq(begindate, enddate, by = 1)), by = indx]

but for my data it was rather slow. Adding an intermediate step significantly sped up the casting operation.

# intermediate step (pre-aggregation)
mydata_aggregated <- mydata[, list(number_cases = length(indx)), by = list(A, V2)]
# casting over the aggregated list
mydata_series <- dcast(mydata_aggregated, V2 ~ A, value.var = "number_cases") # note
# that I switched the rows and columns, since I found that its easier to pass this
# data to zoo or xts

# creating the zoo object
mydata_zoo <- zoo(mydata_series[,-1],mydata_series[,1])

I am rather new to time series within R. What I have tried is to create seq(begindate, enddate) objects, but I don't know how to aggregate them, since they all have different starting and end dates. — Chris, Jun 23 '14 at 08:34
This question isn't too broad and is very clear what he is asking for. So no need to close it for those two reasons — David Arenburg, Jun 23 '14 at 10:06
No need to. When you accepted an answer it means it's solved — David Arenburg, Jun 25 '14 at 07:51

David Arenburg · Accepted Answer · 2014-06-28T20:14:19.977

2

If your data set is big, I would use data.table

library(data.table)
library(reshape2)
setDT(source)[, indx := .I]
source <- source[, list(A = A, seq.int(begindate, enddate, by = 1)), by = indx]
dcast.data.table(source, A ~ V2, value.var = "V2", length)

##  A 2014-01-01 2014-01-02 2014-01-03 2014-01-04 2014-01-05 2014-01-06
##1 a          1          1          2          2          1          1
##2 b          0          1          2          2          1          0
##3 c          0          1          1          1          1          1

Just a side note, source is a stored function in R, thus try to use other name for your data set

edited Jun 28 '14 at 20:14

answered Jun 23 '14 at 08:50

David Arenburg

91,361
17
137
196

the solution seems to work - seems because the process is running since about 2 hours and still isn't done calculating. my original data has ~18k different indices (the intermediate table created by the list function is 6 million rows long). – Chris Jun 23 '14 at 11:30
I know that R can handle multivariate time series formats. xts and zoo are able to handle multivariate time series, but they require that each row of data has the same date (a transposed version of the matrix I was asking for). is there a more effective way to create the time series object, without the intermediate matrix? the data has ~18k unique indexes and in concentrated form almost 75k observations. the date range required is at least one year, with daily data. do you know of a more effective way to handle this calculation? or is there no way around the intermediate matrix? – Chris Jun 23 '14 at 12:10
I'm not sure I understand your question. Matrices and lists are usually the fastest ways to handle data in R (compared to data frame). – David Arenburg Jun 23 '14 at 13:31
I want to bild a time series object from the original data frame, but the functions generating the series (xts or zoo) require a specific input format as far a I know. The building of the intermediate matrix with the dcast operator takes quite a long time. I was wondering if there is a faster way to get the time series object from the original data frame. – Chris Jun 24 '14 at 10:47
`dcast` is fully optimized function. I think that the `seq` part in the `data.table` takes the most time. Try to play around with `xts` or `zoo` and see if you can achieve something. If not, post another question, provide a reproducible example, show what you've tried and the desired output – David Arenburg Jun 24 '14 at 10:51
I found that pre-aggregating the list significantly sped up the casting operation. Edited my original post. Thanks for your help! – Chris Jun 25 '14 at 07:46
Replace `seq` with `seq.int()`. The latter is a primitive (much faster) while the former is a generic. – Arun Jun 28 '14 at 19:51

R - create Stock data from data frame with begin and end date

1 Answers1