-2

I need to create stock data from a data frame. The data frame contains a begin and end date, which signify the period an item was in stock. I want to aggregate stock levels per item, then create a time series with the data.

I have data of the form:

A <- c("a","b","a","b","c")
begindate <- as.Date(c("2014-01-01", "2014-01-03", "2014-01-03", "2014-01-02", "2014-01-02"))
enddate <- as.Date(c("2014-01-04", "2014-01-05", "2014-01-06", "2014-01-04", "2014-01-06"))
source <- data.frame(A, begindate, enddate)
source
  A  begindate    enddate
1 a 2014-01-01 2014-01-04
2 b 2014-01-03 2014-01-05
3 a 2014-01-03 2014-01-06
4 b 2014-01-02 2014-01-04
5 c 2014-01-02 2014-01-06

what I want to create from this data is a time series like

    2014-01-01  2014-01-02  2014-01-03  2014-01-04  2014-01-05  2014-01-06
a   1   1   2   2   1   1
b       1   2   2   1   
c       1   1   1   1   1

the original data is rather large, about 180k rows. what is an effective way to do this?

EDIT

The answer given by David Arenburg worked well

library(data.table)
library(reshape2)
setDT(mydata)[, indx := .I]
mydata <- mydata[, list(A = A, seq(begindate, enddate, by = 1)), by = indx]

but for my data it was rather slow. Adding an intermediate step significantly sped up the casting operation.

# intermediate step (pre-aggregation)
mydata_aggregated <- mydata[, list(number_cases = length(indx)), by = list(A, V2)]
# casting over the aggregated list
mydata_series <- dcast(mydata_aggregated, V2 ~ A, value.var = "number_cases") # note
# that I switched the rows and columns, since I found that its easier to pass this
# data to zoo or xts

# creating the zoo object
mydata_zoo <- zoo(mydata_series[,-1],mydata_series[,1])
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
Chris
  • 55
  • 5

1 Answers1

2

If your data set is big, I would use data.table

library(data.table)
library(reshape2)
setDT(source)[, indx := .I]
source <- source[, list(A = A, seq.int(begindate, enddate, by = 1)), by = indx]
dcast.data.table(source, A ~ V2, value.var = "V2", length)

##  A 2014-01-01 2014-01-02 2014-01-03 2014-01-04 2014-01-05 2014-01-06
##1 a          1          1          2          2          1          1
##2 b          0          1          2          2          1          0
##3 c          0          1          1          1          1          1

Just a side note, source is a stored function in R, thus try to use other name for your data set

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • the solution seems to work - seems because the process is running since about 2 hours and still isn't done calculating. my original data has ~18k different indices (the intermediate table created by the list function is 6 million rows long). – Chris Jun 23 '14 at 11:30
  • I know that R can handle multivariate time series formats. xts and zoo are able to handle multivariate time series, but they require that each row of data has the same date (a transposed version of the matrix I was asking for). is there a more effective way to create the time series object, without the intermediate matrix? the data has ~18k unique indexes and in concentrated form almost 75k observations. the date range required is at least one year, with daily data. do you know of a more effective way to handle this calculation? or is there no way around the intermediate matrix? – Chris Jun 23 '14 at 12:10
  • I'm not sure I understand your question. Matrices and lists are usually the fastest ways to handle data in R (compared to data frame). – David Arenburg Jun 23 '14 at 13:31
  • I want to bild a time series object from the original data frame, but the functions generating the series (xts or zoo) require a specific input format as far a I know. The building of the intermediate matrix with the dcast operator takes quite a long time. I was wondering if there is a faster way to get the time series object from the original data frame. – Chris Jun 24 '14 at 10:47
  • `dcast` is fully optimized function. I think that the `seq` part in the `data.table` takes the most time. Try to play around with `xts` or `zoo` and see if you can achieve something. If not, post another question, provide a reproducible example, show what you've tried and the desired output – David Arenburg Jun 24 '14 at 10:51
  • I found that pre-aggregating the list significantly sped up the casting operation. Edited my original post. Thanks for your help! – Chris Jun 25 '14 at 07:46
  • Replace `seq` with `seq.int()`. The latter is a primitive (much faster) while the former is a generic. – Arun Jun 28 '14 at 19:51