I have a dataframe:

table <- data.frame(timestamp = c(1,1,1,3,4,6), category= c("one", "two",
          "one", "one", "one", "one"))

I would like to print a rolling count of how many unique categories there are, using a window of width 3 over timestamps 1 to 10.

I figured zoo and rollapply might do the trick, but it turned out not to be so easy. I was thinking of something like this:

data.zoo<- zoo(table$category, table$timestamp)
rollapply(data.zoo, 3, function(x) length(unique(x["category"])),
            by.column = FALSE,   align = "left")

How could I get the rolling values which would give me the number of unique categories within that timeframe? Thanks!
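To make the desired result concrete, here is a minimal base R sketch of the computation described above. It assumes a left-aligned window, so a window starting at `s` covers timestamps `s`, `s + 1` and `s + 2`, and it assumes empty windows should count as 0. The names `width`, `starts`, `unique_counts` and `result` are only illustrative.

```r
# Sample data from the question
table <- data.frame(timestamp = c(1, 1, 1, 3, 4, 6),
                    category  = c("one", "two", "one", "one", "one", "one"))

# Assumption: left-aligned window of width 3, evaluated at starts 1..10
width  <- 3
starts <- 1:10

# For each window start, keep the rows whose timestamp falls in
# [s, s + width) and count the distinct categories among them
unique_counts <- sapply(starts, function(s) {
  in_window <- table$timestamp >= s & table$timestamp < s + width
  length(unique(table$category[in_window]))
})

result <- data.frame(start = starts, n_unique = unique_counts)
print(result)
```

Because the window is selected by comparing timestamp values rather than row positions, repeated timestamps and gaps in the time axis are both handled.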

Update: Here's a more realistic sample.

data <- data.frame(timestamps = c(1346429301, 1343935647, 1343304074, 1340206043,  
1337597294, 1337416388, 1336990994, 1232115485, 1315389473, 1211613521, 1211613504, 
1211613457, 1211613444, 1211613422, 1211613406, 1211613393, 1211613373, 
1211613360, 1211613241, 1199875788, 1199706375, 1143890762, 1320996636, 
1320956547, 1320649756, 1320592969, 1320591789, 1320588556, 1320400058, 
1320399855, 1320399821, 1320399477, 1320342831, 1320341877, 1320314749, 
1320314579, 1320312309, 1320312211, 1319621394, 1319621260, 1319619285, 
1322403580, 1322230068, 1341092455, 1438358681, 1282068591, 1282068581, 
1175778515, 1191177180, 1191176811, 1191176666, 1191176399, 1191176371, 
1191176265, 1191176203, 1191176111, 1191176086, 1191176007, 1191175963, 
1191175858, 1191175740, 1191175260, 1191175082, 1191174006, 1191173957, 
1191173684, 1191173560, 1191173443, 1356995639, 1208845102, 1451824878, 
1451032348, 1446370725, 1440909807, 1439615035, 1437893303, 1434297250, 
1432450870, 1424677011, 1423417238, 1422110879, 1420222870, 1236413141, 
1232212455, 1281281933, 1281281776, 1281281703, 1281281609, 1259508927, 
1259508842, 1259508558, 1259508530, 1259508351, 1259508279, 1259508256, 
1259508208, 1259508171, 1259508108, 1259507703, 1259507657, 1259507145, 
1259506397, 1259506298, 1259506268), categories = structure(c(9L, 9L, 9L,
        9L, 9L, 9L, 9L, 10L, 11L, 11L, 11L, 11L, 
         11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 1L, 1L, 1L, 
         1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
         2L, 2L, 3L, 4L, 5L, 5L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
         1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 4L, 5L, 5L, 5L, 
         5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 7L, 8L, 8L, 8L, 8L, 8L, 
         8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), .Label 
         =  c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K"), class = "factor"))
puslet88
  • Depending on precisely what you want maybe: `rollapply(table$category, 3, function(x) length(unique(x)), align = "left", partial = TRUE)` – G. Grothendieck Jan 13 '16 at 17:09
  • It seems to be quite close, however it doesn't address the timestamp by name? Does rollapply choose the column titled "timestamp" by default? – puslet88 Jan 13 '16 at 17:23
  • You can alternately apply it to your zoo object. – G. Grothendieck Jan 13 '16 at 17:34
  • I get this: `Error in seq.default(start.at, NROW(data), by = by) : wrong sign in 'by' argument` if I try `rollapply(data.zoo$user, 3, function(x) length(unique(x)), align = "left", partial = TRUE) ` – puslet88 Jan 13 '16 at 17:35
  • Oh, I was stupid, `rollapply(data.zoo$category, 3, function(x) length(unique(x)), align = "left", partial = TRUE) ` works of course. However, it does not take the time variable as time then, but just considers the sequence of items. – puslet88 Jan 13 '16 at 18:47
  • data.zoo does not have a category column. Also you can't have a zoo object with multiple equal time stamps. – G. Grothendieck Jan 13 '16 at 21:36
  • Any idea what object/method I could use to have both of those features, a category column and multiple equal time stamps? – puslet88 Jan 14 '16 at 09:11

0 Answers