Testing multiple columns in a time series simultaneously

Question

library("xts")
data1<- cbind(a = c(1,2,3,4,5,6,5,4,3,4,5,6,5,4,3,5),
              b = c(1,2,3,4,5,6,5,4,3,4,5,6,5,4,3,5),
              c = c(1,2,3,4,5,6,5,4,5,4,5,4,5,4,5,2),
              d = c(1,2,3,4,5,6,5,4,1,1,1,1,1,2,3,2))
data<- xts(data1, Sys.Date() - (16:1))

data

           a b c d
2013-07-09 1 1 1 1
2013-07-10 2 2 2 2
2013-07-11 3 3 3 3
2013-07-12 4 4 4 4
2013-07-13 5 5 5 5
2013-07-14 6 6 6 6
2013-07-15 5 5 5 5
2013-07-16 4 4 4 4
2013-07-17 5 3 5 1
2013-07-18 4 4 4 1
2013-07-19 5 5 5 1
2013-07-20 4 6 4 1
2013-07-21 5 5 5 1
2013-07-22 4 4 4 2
2013-07-23 3 3 5 3
2013-07-24 5 5 2 2

I have a data set which contains 100 such columns. I need a method or to define a function which can tell me how many such columns are, say above the 5 days SMA (moving average) on a given day. If I give a specific date and 5 days SMA, I should get the number of columns above that SMA and, if possible, the column names too.

shadow · Accepted Answer · 2013-07-26T08:48:17.973

2

You can use whichand then tabulate, order, etc.

all <- which(data>5, arr.ind=TRUE)
table(all[,"row"])
all[order(all[,"row"]),]
split(all, all[,"row"])

EDIT: For the rolling mean, you can calculate the rolling mean first and then procede as mentioned above.

sra <- apply(data, 2, rollmean, k=5)
all <- which(sra>5, arr.ind=TRUE)

EDIT2: You can also get the dates, if you use rownames(all).

table(rownames(all))
split(all, rownames(all))

EDIT3: Apparently I missunderstood the question. The problem with the names comes from the apply function. If you use lapply instead, you get the desired rownames. Then you can cbind it with the data to get NA's for the first and last 2 days.

sra <- do.call(cbind, lapply(data, rollmean, k=5))
sra <- cbind(sra, data)[, 1:ncol(sra)]
all <- which(sra>data, arr.ind=TRUE)
all

EDIT4: Note that rollmean has an align-argument. You apparently want to right-align (default is center).

sra <- do.call(cbind, lapply(data, rollmean, k=5, align="right"))
sra <- cbind(sra, data)[, 1:ncol(sra)]
all <- which(sra>data, arr.ind=TRUE)
all

EDIT 5: If sra is of class xts, it does not have rownames and the matrix all consequently does not either. You can just use as.matrix(sra) to get rownames again. The final line I added is just in case you want to know the names of the columns instead of the number.

sra <- do.call(cbind, lapply(data, rollmean, k=5, align="right"))
sra <- as.matrix(cbind(sra, data)[, 1:ncol(sra)])
all <- which(sra>data, arr.ind=TRUE)
table(rownames(all))
split(all[,"col"], rownames(all))
lapply(split(all[,"col"], rownames(all)), function(x) colnames(data)[x])

EDIT 6: To look at one particular date, save the final list and specify the date and then extract the date from your list. For instance:

lst <- lapply(split(all[,"col"], rownames(all)), function(x) colnames(data)[x])
dat <- "2013-07-23"
lst[dat]

edited Jul 26 '13 at 08:48

answered Jul 25 '13 at 11:13

shadow

21,823
4
63
77

thanx for the reply! I think it will work but i need 5 day simple moving average of the column.if less than data.i.e. data>sma(5)..... – Ashish Jalan Jul 25 '13 at 11:18
also how can i write this date "2013-07-21" as row name instead of all row. – Ashish Jalan Jul 25 '13 at 11:26
sra is calculating rollmean but why does it show dates from "2013-07-11", rather it should show "2013-07-13", as we are here calculating 5 days moving average. As it is calculating it perfectly but it should show under starting date"2013-07-13". – Ashish Jalan Jul 25 '13 at 11:43
also i think there should a be a bit correction in the formual all <- which(sra>data, arr.ind=TRUE). please check i think this gives the solution. – Ashish Jalan Jul 25 '13 at 11:48
i thing still i'm not getting that when we are calculating rollmean for 5 days why the data does not start from 5th day rather than 3rd day in your given formula. if it comes from 5th data the problem would become much easier to solve ( i believe). – Ashish Jalan Jul 25 '13 at 12:15
a b c d 2013-07-13 3.0 3.0 3.0 3.0 2013-07-14 4.0 4.0 4.0 4.0 2013-07-15 4.6 4.6 4.6 4.6 2013-07-16 4.8 4.8 4.8 4.8 2013-07-17 4.6 4.6 5.0 4.2 2013-07-18 4.4 4.4 4.8 3.4 2013-07-19 4.2 4.2 4.6 2.4 2013-07-20 4.4 4.4 4.4 1.6 2013-07-21 4.6 4.6 4.6 1.0 2013-07-22 4.8 4.8 4.4 1.2 2013-07-23 4.6 4.6 4.6 1.6 2013-07-24 4.6 4.6 4.0 1.8 it should be like this. – Ashish Jalan Jul 25 '13 at 12:18
now its properly working, but i don't know why table(rownames(all)) split(all, rownames(all)) are not working. – Ashish Jalan Jul 26 '13 at 03:25
Oh its working perfectly fine now. though it solves my problem, can you tell me if there is a way by which if i can see only for a perticular date as i'm new and curious to R. – Ashish Jalan Jul 26 '13 at 08:41
what if i need to count the number of columns in lst[dat], i.e. if [1] "a" "b" "c" .................., – Ashish Jalan Jul 29 '13 at 06:00
Try `length(lst[dat])` or if you want all counts `sapply(lst, length)` – shadow Jul 29 '13 at 08:04
i've tried length(lst[dat]) earlier but its not working. for the date dat<- "2013-07-26", lst[dat] will give "a" "b" "c" and hence i want 3. but length function gives me 1. – Ashish Jalan Jul 29 '13 at 11:43

Testing multiple columns in a time series simultaneously

1 Answers1