I am having an extraordinarily difficult time dealing with -any- time series objects of some budget data.

The original data is 14,460 rows of payments on ~1800 contracts, where each row has a date (DD/MM/YYYY) and an Amount. There are 5296 days between 7/1/2000 and 12/31/2014, but only 3133 of these days actually had payments. The days are therefore irregularly spaced, with more than one contract payment showing up on some days and zero payments on others.

The main issue I'm having is the brutal stubbornness these time series objects exhibit when fed daily data that occurs at irregular intervals. I've even merged the payments onto a continuous date vector and am still having the same issue, namely with frequency, periodicity, or order.by.

CTS_date_V <- data.frame(Date = seq(as.Date("2000/07/01"), as.Date("2014/12/31"), "days"))
exp_d <- merge(exp, CTS_date_V, by = "Date", all.y = TRUE)
exp_d$Amount[is.na(exp_d$Amount)] <- 0

head(exp_d[,c("Amount","Date")],20)
      Amount       Date
1        0.0 2000-07-01
2        0.0 2000-07-02
3        0.0 2000-07-03
4        0.0 2000-07-04
5   269909.4 2000-07-05
6   130021.9 2000-07-06
7  1454135.3 2000-07-06
8   140065.5 2000-07-07
9        0.0 2000-07-08
10       0.0 2000-07-09
11       0.0 2000-07-10
12  274147.2 2000-07-11
13  106959.2 2000-07-11
14  119208.6 2000-07-12
15       0.0 2000-07-13
16       0.0 2000-07-14
17       0.0 2000-07-15
18  125402.5 2000-07-16
19 1170603.1 2000-07-16
20 1908463.3 2000-07-16
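
Note that the merged output above still repeats some dates (for example 2000-07-06 appears twice). A minimal sketch of how to check for that and collapse to one row per day, using only base R; the small data frame here is a made-up stand-in for exp_d, and summing discards the per-contract detail:

```r
# Hypothetical stand-in for the merged data frame exp_d (values are made up).
exp_d <- data.frame(
  Date   = as.Date(c("2000-07-05", "2000-07-06", "2000-07-06",
                     "2000-07-07", "2000-07-08")),
  Amount = c(100, 200, 300, 400, 0)
)

# A non-zero result means at least one date appears more than once.
anyDuplicated(exp_d$Date)

# Collapse to one row per calendar day by summing that day's payments.
daily <- aggregate(Amount ~ Date, data = exp_d, FUN = sum)
daily
```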

Most of the forecasting packages I am familiar with (as well as any of the questions I have found on SO so far), such as fpp, forecast, timeSeries, tseries, and xts, require a much more orderly Date feature to pass to order.by, or have some similar requirement.

My concern is over the appropriateness of the R package, not the statistical method. For example, I've tried a few different ways of building the time series objects needed for the forecasting packages, including xts and ts, and all of them have issues with either the frequency or the periodicity, or ask for order.by.

UPDATE:

I build my xts object with

exp_xts <- xts(exp_d$Amount, start = min(exp$Date), end = max(exp$Date), order.by=exp_d$Date, colnames = "Amount", frequency = "") 

head(exp_xts,15)
                [,1]
2000-07-01       0.0
2000-07-02       0.0
2000-07-03       0.0
2000-07-04       0.0
2000-07-05  269909.4
2000-07-06  130021.9
2000-07-06 1454135.3
2000-07-07  140065.5
2000-07-08       0.0
2000-07-09       0.0
2000-07-10       0.0
2000-07-11  274147.2
2000-07-11  106959.2
2000-07-12  119208.6
2000-07-13       0.0

without an issue, and that object can be plot.xts()ed, but when I try

fit_xts <- stl(exp_xts, s.window="periodic",robust = T) 

it says

Error in if (frequency > 1 && abs(frequency - round(frequency)) < ts.eps) frequency <- round(frequency) : missing value where TRUE/FALSE needed
d8aninja
  • What is your goal? Forecasting or some sort of time series models? `xts` can handle irregularly spaced data but `ts` can't. – Metrics Feb 19 '15 at 23:18
  • I don't think that error is associated with `stl` function (you can confirm that by just typing `stl` in R console). – Metrics Feb 19 '15 at 23:46
  • @Metrics, not `stl` immediately, but probably something called by `stl`. (Running `traceback()` would tell us for sure). – Gregor Thomas Feb 19 '15 at 23:53
  • @D8Amonk, you're trying a new and relatively advanced technique on large and complicated data. You could do a lot for yourself and for us by trying to whittle down your examples so that they are **minimal**. You show us the head of a bunch of columns of a data frame, but then most of them are irrelevant because you make an `xts` object out of two of them. Your question boils down to "I have irregular time-series data. I'd like to do a seasonal decomposition. Is this possible with `stl`? `dput(head(exp_xts, 20))`." Instead we get four paragraphs and an error we can't reproduce. – Gregor Thomas Feb 20 '15 at 00:04
  • I've never worked with `xts` objects, but `stl` is built to work with regular `ts` objects that have a frequency. Without a frequency, you're not really giving `stl` anything to go on in terms of the length of the "season". You could maybe resample to try to create an evenly-spaced time series to deseasonalize, but this is sounding like a question that should be on Cross Validated. – Gregor Thomas Feb 20 '15 at 00:07
  • I have tried this with an evenly-spaced time series (`CTS_date_V <- data.frame(Date = seq(as.Date("2000/07/01"), as.Date("2014/12/31"), "days"))`) to which I merge my data.frame of payments, which fills "no payment" days with NAs. In the Amount column of this data.frame, I have replaced NAs with 0s, so that I have a continuous date vector with the correct payments on the right dates, and 0s on days when no payments were made. (The original data includes 0s on some dates, too.) Same error. @Gregor – d8aninja Feb 20 '15 at 00:43
  • There is a package, `its`, for irregular time series. – IRTFM Feb 20 '15 at 00:51
  • Why don't you just sum all contract payments for each day and use 0 for the days on which you don't have any payments? – Metrics Feb 20 '15 at 01:33
  • I'd like to be able to keep the individual contract payments separated but maybe this is just where I break my task up into two different models. @Metrics – d8aninja Feb 20 '15 at 01:47
  • @BondedDust my main concern about using anything other than the ts objects is that it doesn't look like stl, seasonplot, and the like take anything but these neatly ordered ts objects. – d8aninja Feb 20 '15 at 05:55
  • This stl error shows in case of duplicated dates. You can aggregate with `x<- period.apply( x, endpoints(x,'days'), sum)` – bergant Feb 20 '15 at 07:47
  • It's always a good idea to read the help file, `?stl`. `stl` is intended to be used with a `"ts"` class time series, so it requires a regularly spaced time series in which the time scale is such that a complete cycle is 1 and the frequency exceeds 1. Normally it is used for monthly or quarterly time series. It seems it can sometimes be used with other time series classes, even though this was not its intention when written, and here is an example using stl with an hourly zoo time series: http://stackoverflow.com/questions/4833008/feeding-an-hourly-zoo-time-series-into-function-stl – G. Grothendieck Apr 29 '15 at 14:47
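
The comments above suggest that `stl` wants a regular `ts` object with one value per time step and an explicit frequency. A minimal sketch of that setup in base R, using simulated data (the series, the seed, and the choice of `frequency = 365` are assumptions for illustration, not the asker's data):

```r
set.seed(1)

# Hypothetical daily series: one value per calendar day, no duplicates, no gaps.
dates  <- seq(as.Date("2000-07-01"), as.Date("2014-12-31"), by = "day")
amount <- 1000 + 200 * sin(2 * pi * as.numeric(format(dates, "%j")) / 365) +
  rnorm(length(dates), sd = 50)

# ts() ignores calendar dates, so the season length is stated via `frequency`.
# 365 is an approximation that ignores leap days.
daily_ts <- ts(amount, frequency = 365)

# Seasonal decomposition on the regular series.
fit <- stl(daily_ts, s.window = "periodic", robust = TRUE)
colnames(fit$time.series)
```

With real payment data, the duplicate dates would need to be aggregated to one value per day first, as bergant's comment suggests.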

2 Answers

I tried using time series objects in R for a Kaggle competition. What I found was that predictions from the various time series forecasting methods didn't work well for me. What did work was to create a standard R data frame and train a neural network on contextual data, such as temperature, day of the week, day of the year, whether today is a holiday, and so on.

What this could mean for you, since you're doing simple statistical analysis rather than prediction, is that maybe you don't need the time series functionality at all and could simply use a standard R data frame.

I came 9th in the end, using a standard dataframe, and a neural net, no time series stuff :-)

Hugh Perkins
  • if I wanted to produce by hand the same type of estimates / outputs that the seasonplot() and stl() methods were giving me, how would I do that? – d8aninja Feb 20 '15 at 05:56

I think this might be related to a problem I encountered recently.

I tried to run the autocorrelation function, acf(), on a time series. The data had been converted into a time series format using the xts/zoo packages. However, acf() ships with base R (no extra package needed), so it expects data converted into a time series by the more 'traditional' function, which in this case is ts(). So this code returned the same error as in your case:

ts<- xts(dane.filtered$CRO, dane.filtered$Date_xts)
acf(ts, col="red")

The solution is to create the time series using the default time series function built into R (this code runs perfectly fine):

ts <- ts(dane.filtered$CRO)
acf(ts, col="red")
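
One caveat worth keeping in mind: ts() stores only the values and their order, so any Date index is dropped and the observations are treated as equally spaced. A small sketch with made-up data (the weekly pattern and `frequency = 7` are assumptions for illustration):

```r
set.seed(1)

# Hypothetical daily values with a weekly pattern (made-up data).
x <- rep(c(5, 1, 1, 1, 1, 1, 3), times = 20) + rnorm(140, sd = 0.1)

# No dates are stored; the season length is given by `frequency`.
x_ts <- ts(x, frequency = 7)

# plot = FALSE returns the estimates instead of drawing them.
a <- acf(x_ts, lag.max = 14, plot = FALSE)

# Lags are expressed in periods (weeks here), not in days.
a$lag[1:3]
```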

Hope it helps.

ZygD