I have a data frame (DF) with several columns; the relevant ones are SITE, date and index. A subset of the table is here: https://www.dropbox.com/s/48165ey5rsv628c/DATA.csv?dl=0

SITE    date        index
A       2006.001    0.394
A       ..          1.408
A       2015.353    1.295
B       2006.001    0.176
B       ..          2.354
B       2015.353    0.417
C       2006.001    0.232
C       ..          1.733
C       2015.353    0.653

The time series starts in 2006 on Julian day 1 and ends in 2015 on Julian day 353, with 23 observations per year.

INDEX_TS <- ts(DF$index, start = c(2006,1), end = c(2015,23), frequency = 23)

Then I decompose it with stl and obtain the seasonal, trend and remainder components for each date.

stl(INDEX_TS, 12)

 Call:
 stl(x = INDEX_TS, s.window = 12)

Components
Time Series:
Start = c(2008, 18) 
End = c(2017, 16) 
Frequency = 23 
             seasonal     trend   remainder
2006.000  0.244352688 0.9678620 -0.34804205
...       ...         ...       ...
2015.957  0.191399568 1.5224135  0.57215711

To extract the seasonal, trend and remainder components into a table:

STL12 <- stl(INDEX_TS, 12)
DF_STL <- data.frame(INDEX_TS, STL12$time.series)

But this only results in a data frame with index, seasonal, trend and remainder, without SITE or date.

I can do it for each site separately by subsetting DF, but the real DF has many different site names.

The final DF that I need is one with the decomposed values for each site, like:

SITE    date        index    seasonal    trend     remainder
A       2006.001    0.394    x1        y1        z1
A       ..          1.408    x2        y2        z2
A       2015.353    1.295    x3        y3        z3
B       2006.001    0.176    x4        y4        z4
B       ..          2.354    x5        y5        z5
B       2015.353    0.417    x6        y6        z6
C       2006.001    0.232    x7        y7        z7
C       ..          1.733    x8        y8        z8
C       2015.353    0.653    x9        y9        z9
OSCAR_P

1 Answer

Try the following. It splits the data frame by SITE, stores the index of each SITE as a ts, decomposes it, and combines the components with the original subset. Finally, do.call with rbind stacks all subsets and their decompositions into one final data frame:

Reproducible example:

AirData = data.frame(AirPassengers, SITE = rep(c("A", "B", "C"), each = 48))

do.call(rbind, lapply(split(AirData, AirData$SITE), function(x) {
  INDEX_TS <- ts(x$AirPassengers, frequency = 12)
  STL12 <- stl(INDEX_TS, 12)$time.series
  return(data.frame(x, STL12))
}))

Result:

     AirPassengers SITE   seasonal    trend  remainder
A.1            112    A -13.986104 123.5683  2.4177707
A.2            118    A  -7.759212 124.1061  1.6531607
A.3            132    A   8.325181 124.6438 -0.9689496
A.4            129    A  -1.887274 125.1815  5.7057890
A.5            121    A  -5.517268 125.7871  0.7302135
A.6            135    A  12.098461 126.3926 -3.4910836
A.7            148    A  27.559203 126.9982 -6.5573953
A.8            148    A  28.502489 127.5898 -8.0922545
A.9            136    A   9.726413 128.1813 -1.9077517
A.10           119    A -12.472175 128.7729  2.6992637
A.11           104    A -31.553871 129.7343  5.8195429
A.12           118    A -13.061798 130.6957  0.3660530
A.13           115    A -13.978583 131.6572 -2.6785793
A.14           126    A  -7.772715 133.1337  0.6389980
A.15           141    A   8.281701 134.6103 -1.8919729
A.16           135    A  -2.206362 136.0868  1.1195345
A.17           125    A  -5.580592 137.8077 -7.2271040
A.18           149    A  12.368207 139.5286 -2.8967712
A.19           170    A  27.747586 141.2494  1.0029824
A.20           170    A  28.926157 143.7331 -2.6593066
...            ...  ...        ...      ...        ...
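
As a quick sanity check on the result above, here is a minimal sketch (not part of the original output) that stores the result and confirms the three components add back up to the original series. The names `RES` and `recon` are chosen here purely for illustration.

```r
# Rebuild the reproducible example and keep the result
# (RES is a hypothetical name used only in this sketch)
AirData <- data.frame(AirPassengers, SITE = rep(c("A", "B", "C"), each = 48))

RES <- do.call(rbind, lapply(split(AirData, AirData$SITE), function(x) {
  INDEX_TS <- ts(x$AirPassengers, frequency = 12)
  STL12 <- stl(INDEX_TS, 12)$time.series
  data.frame(x, STL12)
}))

# stl is an additive decomposition, so
# seasonal + trend + remainder should reproduce the input series
recon <- RES$seasonal + RES$trend + RES$remainder
all.equal(as.numeric(RES$AirPassengers), recon)

# Optional: drop the "A.1", "A.2", ... row names created by rbind
rownames(RES) <- NULL
```

Since `data.frame(x, STL12)` keeps every column of the subset `x`, any other columns (such as date) carry through to the combined table unchanged.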

OP's example:

do.call(rbind, lapply(split(DF, DF$SITE), function(x) {
  INDEX_TS <- ts(x$index, start = c(2006,1), end = c(2015,23), frequency = 23)
  STL12 <- stl(INDEX_TS, 12)$time.series
  return(data.frame(x, STL12))
}))
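
The call above assumes that every site has exactly 10 × 23 = 230 values and that `index` contains only finite numbers; `stl` uses `na.action = na.fail` by default, so missing values stop it. A minimal sketch for checking both per site before decomposing follows. The `DF` built here is a made-up stand-in (random values, two sites) because the real data is not reproduced in the question, and `n_obs`, `n_bad` and `bad` are illustrative names.

```r
# Toy stand-in for DF: made-up values, NOT the OP's data
set.seed(1)
DF <- data.frame(
  SITE  = rep(c("A", "B"), each = 230),
  index = runif(460)
)
DF$index[5] <- NA  # simulate a gap in site A

# Per-site number of observations and of non-finite values (NA, NaN, Inf)
by_site <- split(DF$index, DF$SITE)
n_obs <- sapply(by_site, length)
n_bad <- sapply(by_site, function(v) sum(!is.finite(v)))
data.frame(n_obs, n_bad)

# Sites that do not match the assumed 10 years x 23 observations,
# or that contain values stl cannot use
bad <- names(n_obs)[n_obs != 10 * 23 | n_bad > 0]
bad
```

Any site listed in `bad` would need its gaps filled or its length fixed before running the decomposition; how to do that depends on the data and is left open here.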
acylam
  • `Error in ts(x$LAI, start = c(2006, 1), end = c(2017, 17), frequency = 23) : 'ts' object must have one or more observations` I think this could be useful, but it gives me an error. – OSCAR_P Dec 01 '17 at 16:58
  • @OSCAR_P Are you sure that each `SITE` has the same number of observations? The code that you ran is not the same as the code in my solution. Also, please provide a reproducible example by copying and pasting the output of `dput(DF)` into your question. – acylam Dec 01 '17 at 17:01
  • I corrected that. The example does run very well, but on closer inspection the main difference is in the AirData table, where AirPassengers is already a time series. Now I get `Error in stl(INDEX_TS, 12) : NA/NaN/Inf in foreign function call (arg 1) Called from: stl(INDEX_TS, 12)` – OSCAR_P Dec 01 '17 at 20:35