This is really a duplicate of Difference between `%in%` and `==`, since you're trying to use equality for a set-membership operation, even if you aren't (yet) trying `%in%`. (Unless I've completely misinterpreted your question.)
Basic equality of vectors `vec1` and `vec2` in R works in a few ways:
- If `vec2` (or `vec1`) is length 1, then each element of `vec1` is compared against it, as in `vec1[1] == vec2`, `vec1[2] == vec2`, and so on:
1:10 == 3
# [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
- If `length(vec1) == length(vec2)`, then the comparison is element-wise:
1:10 == c(1, 2, 3, 99, 99, 6, 7, 99, 99, 99)
# [1] TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
- If `length(vec1)` is an even multiple of `length(vec2)`, then R silently recycles the shorter vector, and this is where most of the confusion and problems occur. This means that
1:10 == c(3, 2)
# [1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
### which is effectively
1:10 == c(3, 2, 3, 2, 3, 2, 3, 2, 3, 2)
# [1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
This seems right so far, but that is only by chance. When we type `1:10 == c(2, 3)`, we're really asking whether the 1st, 3rd, 5th, ... elements of `vec1` are `2`, and whether the 2nd, 4th, 6th, ... elements of `vec1` are `3`. Typically that's not what is intended; usually the intent is set-membership instead. If it were doing set-membership, then we would expect that reversing the numbers in `vec2` would have no effect ... but that's not true.
1:10 == c(2, 3)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
### which is effectively
1:10 == c(2, 3, 2, 3, 2, 3, 2, 3, 2, 3)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
- If `length(vec1)` is not an even multiple of `length(vec2)`, something close to the above still occurs, but at least we see a warning:
1:10 == c(3, 2, 1)
# Warning in 1:10 == c(3, 2, 1) :
# longer object length is not a multiple of shorter object length
# [1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
### which is effectively
1:10 == c(3, 2, 1, 3, 2, 1, 3, 2, 1, 3) # uneven recycling
# [1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
To sum up vector `==` operations: it is intended (and safe!) to compare vectors of the same length, or when one of the vectors is length 1. Any other combination might not warn or error, but the results are often not what is intended.
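One way to guard against accidental recycling is a small wrapper that refuses every length combination other than the two safe ones. This is only a sketch: `safe_eq` is a hypothetical helper name, not something base R provides.

```r
# safe_eq() is a hypothetical helper (not part of base R): it allows `==`
# only when the lengths match or when one side has length 1.
safe_eq <- function(vec1, vec2) {
  n1 <- length(vec1)
  n2 <- length(vec2)
  if (n1 != n2 && n1 != 1L && n2 != 1L) {
    stop("lengths differ (", n1, " vs ", n2, "); did you mean %in%?")
  }
  vec1 == vec2
}

safe_eq(1:10, 3)                    # fine: comparison against a length-1 vector
res <- try(safe_eq(1:10, c(3, 2)), silent = TRUE)
inherits(res, "try-error")          # TRUE: the silent-recycling case is rejected
```

Failing loudly here turns the silent-recycling case into an error that points at `%in%`.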
When you want to know which elements of `vec1` are contained within `vec2`, you need the `%in%` operator:
1:10 %in% c(2, 3)
# [1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
### order in vec2 is not important
1:10 %in% c(3, 2)
# [1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
This is effectively asking, for each element in `vec1`, whether that element is `==` to any of the elements in `vec2`, which is effectively our first bullet above: the element is length 1, and `vec2` is length 1 or more. Bad pseudo-code loop demonstrating this:
for (el in vec1)          # el is length 1
  if (any(el == vec2))    # this works as intended per bullet 1 above
    then true
    else false
done
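The pseudo-code above can be written as runnable R to confirm that it matches `%in%`. This is a sketch for illustration only (`by_hand` is just a name chosen here); in practice `%in%` is the thing to use.

```r
vec1 <- 1:10
vec2 <- c(3, 2)

# For each element of vec1 (length 1), ask whether it equals any element of vec2.
by_hand <- vapply(vec1, function(el) any(el == vec2), logical(1))

# The hand-rolled loop agrees with the built-in operator.
identical(by_hand, vec1 %in% vec2)   # TRUE
```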
If your `excluded_years` is truly an integer vector, as in
excluded_years <- c(1957, 1960:1970, 1987)
excluded_years
# [1] 1957 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1987
(Technically, this vector is `numeric`, not `integer`, but we'll ignore that distinction for now.)
then we can simply filter on it with `%in%`, demonstrated here on `mtcars`:
library(dplyr)
filter(mtcars, ! cyl %in% c(4, 8))
#                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
# Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
# Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
# Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
and see that the data no longer contains the excluded `cyl` values of 4 and 8 (`cyl` in `mtcars` takes only the values 4, 6, and 8). With this, you could replace your function with one of:
# option 1: add a temporary Year column, filter on it, then drop it
remove_years <- function(daily_mean_Q, excluded_years) {
  daily_mean_Q %>%
    mutate(Year = as.integer(stringr::str_sub(Date, 1, 4))) %>%
    filter(! Year %in% excluded_years) %>%
    select(-Year)
}
# option 2: extract the year inline, without a temporary column
remove_years <- function(daily_mean_Q, excluded_years) {
  daily_mean_Q %>%
    filter(! as.integer(stringr::str_sub(Date, 1, 4)) %in% excluded_years)
}
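To see the filter in action without the original data, here is a small sketch on made-up rows. It deliberately uses base R only (`substr` and logical indexing) instead of `dplyr` and `stringr`, so it is an equivalent of the functions above and not the functions themselves. The `"YYYY-MM-DD"` string format of `Date` and the `Q` column are assumptions.

```r
# Toy data: the Date format ("YYYY-MM-DD" strings) and the Q column are assumptions.
daily_mean_Q <- data.frame(
  Date = c("1956-05-01", "1957-05-01", "1965-05-01", "1971-05-01", "1987-05-01"),
  Q    = c(10.2, 11.5, 9.8, 12.1, 8.7),
  stringsAsFactors = FALSE
)
excluded_years <- c(1957, 1960:1970, 1987)

# Base-R equivalent of the dplyr filter: keep rows whose year is NOT excluded.
yr   <- as.integer(substr(daily_mean_Q$Date, 1, 4))
kept <- daily_mean_Q[!yr %in% excluded_years, ]
kept$Date   # "1956-05-01" "1971-05-01"
```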
However, if your `excluded_years` is a string, as `shiny` fields tend to return, then we have a few options to convert it:
- We might be tempted to structure it like R code and then `eval` it. This works, but it opens your app up to "injection" security problems:
excluded_years <- "1957, 1960:1970, 1987"
eval(parse(text = paste("c(", excluded_years, ")")))
# [1] 1957 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1987
### PROBLEM
excluded_years <- "1957, 1960:1970); message('gotcha'); c("
eval(parse(text = paste("c(", excluded_years, ")")))
# gotcha
# NULL
- We should likely write a home-grown function to split and split again, ensuring that the users know the rules:
excluded_years <- "1957, 1960:1970, 1987"
strsplit(excluded_years, "[, ]+")
# [[1]]
# [1] "1957" "1960:1970" "1987"
unlist(lapply(strsplit(excluded_years, "[, ]+")[[1]],
              function(a) {
                # split "1960:1970" into its endpoints and convert explicitly
                a <- as.integer(strsplit(a, "[: ]+")[[1]])
                if (length(a) == 1) return(a)
                if (length(a) == 2) return(seq(a[1], a[2]))
                stop("unrecognized sequence")
              }))
# [1] 1957 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1987
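The same idea can be wrapped in a reusable function that validates each token before converting it, so that anything resembling the "gotcha" string is rejected and never evaluated. This is a sketch under assumptions: `parse_years` is a hypothetical name, and the accepted grammar (years separated by commas or spaces, with `a:b` ranges) is just one possible set of rules to document for users.

```r
# parse_years() is a hypothetical helper; the accepted grammar (comma- or
# space-separated years, with "a:b" ranges) is an assumption.
parse_years <- function(txt) {
  tokens <- strsplit(trimws(txt), "[, ]+")[[1]]
  # Reject anything that is not "digits" or "digits:digits" before converting.
  bad <- !grepl("^[0-9]+(:[0-9]+)?$", tokens)
  if (any(bad)) {
    stop("unrecognized input: ", paste(tokens[bad], collapse = ", "))
  }
  unlist(lapply(strsplit(tokens, ":"), function(a) {
    a <- as.integer(a)
    if (length(a) == 1) a else seq(a[1], a[2])
  }))
}

parse_years("1957, 1960:1970, 1987")
res <- try(parse_years("1957); message('gotcha'); c("), silent = TRUE)
inherits(res, "try-error")   # TRUE: the malicious string is rejected, never evaluated
```

In a `shiny` app, the error could be caught and shown to the user as a validation message.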