I have a data frame that I want to remove any week that contains an outlier. I will be happy if I can indicate the entire week as an outlier, as I understand how to do the subset from there. I have not been able to come up with an appropriate solution. I keep thinking that I am going to need to loop through subsets of weeks to achieve the desired goal, or create a separate function to handle the individual outlier week and use sapply. I have yet to make either of these solutions viable.
date <- seq(as.Date("2015-01-01"), length=365, by="1 day")
dow <- as.factor(weekdays(as.Date(date))
df <- data.frame(cbind(date, dow))
df$date <- as.Date(df$date,format="%m/%d/%Y",origin="01/01/1970")
df$dow <- as.factor(weekdays(as.Date(df$date)))
set.seed(1115)
df$var1 <- rnorm(365, 1912, 40795)
stdev <- sd(df$var1, na.rm=TRUE)
avg <- mean(df$var1, na.rm=TRUE)
df$LB <- avg-(2.75*stdev)
df$UB <- avg+(2.75*stdev)
df$outlier <- ifelse(df$var1<df$LB | df$var1>df$UB, 1,0)
df$weeknum <- as.numeric(format(df$date, "%U"))
head(df, 17)
> head(df, 17)
date dow var1 LB UB outlier weeknum
1 2015-01-01 Thursday -7828.412 -114675.6 120479.8 0 0
2 2015-01-02 Friday 25674.456 -114675.6 120479.8 0 0
3 2015-01-03 Saturday -33588.871 -114675.6 120479.8 0 0
4 2015-01-04 Sunday -54418.175 -114675.6 120479.8 0 1
5 2015-01-05 Monday -10002.002 -114675.6 120479.8 0 1
6 2015-01-06 Tuesday 34050.390 -114675.6 120479.8 0 1
7 2015-01-07 Wednesday -37584.648 -114675.6 120479.8 0 1
8 2015-01-08 Thursday 84048.878 -114675.6 120479.8 0 1
9 2015-01-09 Friday -24801.346 -114675.6 120479.8 0 1
10 2015-01-10 Saturday 33974.637 -114675.6 120479.8 0 1
11 2015-01-11 Sunday 77432.088 -114675.6 120479.8 0 2
12 2015-01-12 Monday 128196.236 -114675.6 120479.8 1 2
13 2015-01-13 Tuesday 9740.418 -114675.6 120479.8 0 2
14 2015-01-14 Wednesday 26539.887 -114675.6 120479.8 0 2
15 2015-01-15 Thursday 12172.834 -114675.6 120479.8 0 2
16 2015-01-16 Friday 1032.544 -114675.6 120479.8 0 2
17 2015-01-17 Saturday 76870.095 -114675.6 120479.8 0 2
In the above example, the desired output would be a 1 the outlier column in each row that corresponds with weeknum = 2.