Here are some approaches.
1) sqldf In SQL one can do a join on complex conditions. The syntax using between
matches if the year
is greater than or equal to the lower bound and less than or equal to the upper bound. For a particular year the left join will result in NA being used if there is no match (although such situation does not arise in the example in the question).
library(sqldf)
sqldf("select year, century_name from x
left join century_bands on year between min_year and max_year")
## year century_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
2) findInterval This approach uses only base R. For each component of its first argument findInterval
returns the number of values in its second argument which are less or equal to it. The second argument is assumed to be sorted in ascending order. The number returned by findInterval
can be used to index into century_name
. findInterval
tends to be quite efficient.
transform(x, year_name =
with(century_bands, century_name[findInterval(year, min_year)]))
## year year_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
Although this does not occur in the question if it were possible that year
were outside all bands then we could extend this without changing the code by adding additional rows to century_bands
associated with a century_name
of NA or else we could extend findInterval
liek this:
FindInterval <- function(x, vec, upper) {
ifelse(x < vec[1] | x > upper, NA, findInterval(x, vec))
}
transform(x, year_name =
with(century_bands, century_name[FindInterval(year, min_year, max(max_year))]))
We could replace transform
with mutate
if using dplyr anyways; otherwise, using transform
eliminates that dependency.
3) sapply Another base solution is
Match <- function(x) with(century_bands, century_name[x >= min_year & x <= max_year])
transform(x, century_name = sapply(year, Match))
## year century_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
This should be sufficient if all years are within the bands. This is the case in the example in the question but if this cannot be guaranteed then extend Match
like this:
Match <- function(x) {
Name <- with(century_bands, century_name[x >= min_year & x <= max_year])
if (length(Name)) Name else NA
}
4) cut This base solution is similar to findInterval
but it returns NA if year
is not within any of the bands.
transform(x, year_name = with(century_bands, century_name[
cut(year, c(min_year, max(max_year)), label = FALSE, include.lowest = TRUE)
]))
5) car::recode This function allows recoding of values as follows.
library(car)
recodes <-
"1700:1799='18th'; 1800:1899='19th'; 1900:1999='20th'; 2000:2099='21st'; else=NA"
transform(x, year_name = recode(year, recodes))
## year year_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
To avoid hard coding the recodes
string it could be derived from century_bands
like this
recodes <- with(century_bands,
paste(sprintf("%d:%d='%s'", min_year, max_year, century_name), collapse = ";")
)
recodes <- paste0(recodes, "; else=NA")
6) expand bands We could expand the bands into individual years in which case we can simply perform a match. years which do not match any band result in an NA in the century_name
.
century_bands2 <- with(century_bands,
stack(setNames(Map(seq, min_year, max_year), century_name)))
transform(x, century_name = with(century_bands2, ind[match(year, values)]))
## year century_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
6a) A tidyverse variation which is largely along the lines of this would be:
library(dplyr)
library(purrr)
library(tibble)
library(tidyr)
century_bands2 <- century_bands %>%
{ set_names(map2(.$min_year, .$max_year, seq), .$century_name) %>%
as_tibble %>%
pivot_longer(everything(), names_to = "century_name", values_to = "year")
}
x %>% left_join(century_bands2, by = "year")
## year century_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
7) case_when. We could hard code the band definitions into a case_when
:
library(dplyr)
x %>% mutate(century_name = case_when(
year %in% 1700:1799 ~ "18th",
year %in% 1800:1899 ~ "19th",
year %in% 1900:1999 ~ "20th",
year %in% 2000:2099 ~ "21st",
TRUE ~ NA_character_))
## year century_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st
7a) Another way to express this with case_when
is:
x %>% mutate(century_name = case_when(
year < 1700 ~ NA_character_,
year < 1800 ~ "18th",
year < 1900 ~ "19th",
year < 2000 ~ "20th",
year < 2100 ~ "21st",
TRUE ~ NA_character_))
## year century_name
## 1 1730 18th
## 2 1860 19th
## 3 1941 20th
## 4 2011 21st