I have a data frame with 1,000,000 rows. I would like to calculate the mean and variance of Tor over time for each SID, to see if I can predict when Tor is starting to go out of limits. The low limit is 0.4 and the high limit is 0.7. Below is a small example of my data.
dat <- structure(list(timestamp = c("29-06-2021-06:00", "29-06-2021-06:01",
"29-06-2021-06:02", "29-06-2021-06:03", "29-06-2021-06:04", "29-06-2021-06:05",
"29-06-2021-06:06", "29-06-2021-06:07", "29-06-2021-06:08", "29-06-2021-06:09",
"29-06-2021-06:10", "29-06-2021-06:11", "29-06-2021-06:12", "29-06-2021-06:13",
"29-06-2021-06:14", "29-06-2021-06:15", "29-06-2021-06:16", "29-06-2021-06:17",
"29-06-2021-06:18", "29-06-2021-06:19", "29-06-2021-06:20", "29-06-2021-06:21",
"29-06-2021-06:22", "29-06-2021-06:23", "29-06-2021-06:24", "29-06-2021-06:25",
"29-06-2021-06:26"), SID = c(301L, 351L, 304L, 357L, 358L, 302L,
303L, 309L, 356L, 304L, 308L, 351L, 304L, 357L, 358L, 302L, 303L,
352L, 307L, 353L, 304L, 308L, 352L, 307L, 304L, 354L, 356L),
Tor = c(0.70161919, 0.639416295, 0.288282073, 0.932362166,
0.368616626, 0.42175565, 0.409735918, 0.942170196, 0.381396521,
0.818102394, 0.659391671, 0.246387978, 0.196001777, 0.632630259,
0.66618385, 0.440625167, 0.639759498, 0.050001835, 0.775660271,
0.762934189, 0.516830196, 0.244674975, 0.38620466, 0.970792903,
0.752674581, 0.190366737, 0.56596405), Lowt = c(0L, 0L, 1L,
0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L), Hit = c(1L, 0L, 0L,
1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-27L))
head(dat)
# timestamp SID Tor Lowt Hit
#1 29-06-2021-06:00 301 0.7016192 0 1
#2 29-06-2021-06:01 351 0.6394163 0 0
#3 29-06-2021-06:02 304 0.2882821 1 0
#4 29-06-2021-06:03 357 0.9323622 0 1
#5 29-06-2021-06:04 358 0.3686166 1 0
#6 29-06-2021-06:05 302 0.4217556 0 0
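One way to follow each SID over time is an expanding (cumulative) mean and variance: at every reading, the statistics of all readings of that SID so far. The sketch below is a minimal version under a few assumptions. It rebuilds the sample (the timestamps are consecutive minutes, so they are regenerated with sprintf()). It parses the "%d-%m-%Y-%H:%M" text with as.POSIXct() and sorts by time. It then uses ave() with cumulative sums, so the cost stays linear in the number of rows. The helpers running_mean and running_var are hypothetical names introduced here.

```r
# Sketch: expanding (running) mean and variance of Tor per SID, in time order.
# The sample timestamps are consecutive minutes, so they are regenerated here.
dat <- data.frame(
  timestamp = sprintf("29-06-2021-06:%02d", 0:26),
  SID = c(301L, 351L, 304L, 357L, 358L, 302L, 303L, 309L, 356L, 304L, 308L,
          351L, 304L, 357L, 358L, 302L, 303L, 352L, 307L, 353L, 304L, 308L,
          352L, 307L, 304L, 354L, 356L),
  Tor = c(0.70161919, 0.639416295, 0.288282073, 0.932362166, 0.368616626,
          0.42175565, 0.409735918, 0.942170196, 0.381396521, 0.818102394,
          0.659391671, 0.246387978, 0.196001777, 0.632630259, 0.66618385,
          0.440625167, 0.639759498, 0.050001835, 0.775660271, 0.762934189,
          0.516830196, 0.244674975, 0.38620466, 0.970792903, 0.752674581,
          0.190366737, 0.56596405),
  stringsAsFactors = FALSE
)

# Parse the text timestamps, then sort by time so "running" means "so far".
dat$time <- as.POSIXct(dat$timestamp, format = "%d-%m-%Y-%H:%M", tz = "UTC")
dat <- dat[order(dat$time), ]

# Hypothetical helpers: statistics of x[1..i] for every position i.
running_mean <- function(x) cumsum(x) / seq_along(x)
running_var <- function(x) {
  n <- seq_along(x)
  v <- (cumsum(x^2) - cumsum(x)^2 / n) / (n - 1)  # sample variance, n - 1 divisor
  v[n == 1] <- NA                                  # undefined for a single reading
  v
}

# ave() applies each helper within every SID group and keeps the row order.
dat$run_mean <- ave(dat$Tor, dat$SID, FUN = running_mean)
dat$run_var  <- ave(dat$Tor, dat$SID, FUN = running_var)
head(dat)
```

The last run_mean and run_var of each SID are its overall mean and variance, which tapply(dat$Tor, dat$SID, var) also returns directly. An expanding window weights old and new readings equally. A fixed-width window over the most recent readings would react faster to drift, but choosing its width is a modelling decision this sketch does not make.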
- timestamp is when the sample is recorded.
- SID is the ID of the part taking the reading. Values range from 301 to 310 and from 351 to 360.
- Tor is the actual reading; its data type is <dbl>.
- Lowt is a binary variable showing that the Tor reading is below the lower limit.
- Hit is a binary variable showing that the Tor reading is above the upper limit.
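If Lowt and Hit are simple threshold flags, they can be recomputed from Tor directly. In the sample, Hit is 1 exactly where Tor exceeds 0.7. The sketch below assumes both limits are strict (Tor < 0.4 and Tor > 0.7). That assumption is not confirmed: row 7 of the sample (SID 303, Tor 0.4097) has Lowt = 1, which a strict Tor < 0.4 rule would not produce, so the real definition may differ.

```r
# Sketch: recompute the out-of-limit flags from Tor.
# Assumption: both limits are strict (a reading exactly at 0.4 or 0.7 is in limits).
low_limit  <- 0.4
high_limit <- 0.7

# First six Tor readings from the sample data.
Tor <- c(0.70161919, 0.639416295, 0.288282073, 0.932362166, 0.368616626, 0.42175565)

flags <- data.frame(
  Tor  = Tor,
  Lowt = as.integer(Tor < low_limit),   # 1 = below the lower limit
  Hit  = as.integer(Tor > high_limit)   # 1 = above the upper limit
)
print(flags)
```

For these six readings the computed Lowt and Hit match the columns shown by head(dat).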
I have read up on variance, but I can't seem to get my head around it. Any help would be great.
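Variance measures how spread out the readings are around their mean. The sample variance that R's var() returns is the sum of squared deviations from the mean divided by n - 1. The worked example below computes it by hand for the five SID 304 readings in the sample and checks the result against var() and sd().

```r
# Worked example: sample variance of the SID 304 readings, by hand and with var().
x <- c(0.288282073, 0.818102394, 0.196001777, 0.516830196, 0.752674581)

n <- length(x)
x_bar <- mean(x)                  # the mean: centre of the readings
dev <- x - x_bar                  # how far each reading is from the mean
by_hand <- sum(dev^2) / (n - 1)   # squared deviations, summed, divided by n - 1

print(all.equal(by_hand, var(x)))  # TRUE: var() computes the same quantity
print(sqrt(by_hand))               # standard deviation, in the same units as Tor
print(sd(x))                       # same value via sd()
```

A large or growing variance for an SID means its readings are spreading out, which can be a sign that readings may start crossing 0.4 or 0.7 even while the mean is still inside the limits.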