I've been trying to mutate a dataset with a user-defined function that calls str_locate and str_sub. The aim is to locate the first run of three digits in each string, extract the first digit of that run, and add it (as a character) to a new column called Hundreds.
For example:
- Given the string '821', the string '8' is added to Hundreds.
- Given the string 'Af823.22', the string '8' is added to Hundreds.
Here is my function:
library(dplyr)
library(stringr)

get_hundred <- function(s) {
  match_pos <- str_locate(s, "[0-9]{3}")
  return(str_sub(s, match_pos[1], match_pos[1]))
}
The first 20 rows of my data look like this:
df1 <- structure(list(call.number = c("372.35044 L4383", "344.049 C235",
"344.410415 DIM", "346.944043 NEI", "808.0667 B2616", "363.6909945 CAST",
"ABS 2015.0", "371.38 MACK", "372.1102 PRAW", "A823.3 WRIG/T",
"havmf test", "[DENTISTRY] CROW", "[DENTISTRY] JAWS", "[DENTISTRY] LOWE",
"[DENTISTRY] MOLA", "[DENTISTRY] SERI", "[DENTISTRY] SKUL", "[DENTISTRY] TEET",
"[HEALTH]ANKL", "[HEALTH]FOOT"), num.items = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
Filtering the data
In fact I'm only looking for particular forms of string within a large list of call.number values. I believe the str_detect call below is detecting the forms of string I want.
df2 <- df1 %>%
  filter(str_detect(call.number, "^[A-Z]?[A-Z|a-z]?[0-9]{3}.*"))
What am I doing wrong?
Now I do this:
df2 %>%
  mutate(Hundreds = get_hundred(call.number))
Doing this, however, puts an 'A' in the Hundreds column for row 9, where I expect to see an '8'. Yet if I call get_hundred on "A823.3 WRIG/T" (the equivalent string), the function does return an '8'.
get_hundred("A823.3 WRIG/T")
What is it I'm not understanding here?
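To narrow this down, here is a minimal sketch that compares the single-string call with a vector call, since mutate passes the whole call.number column to the function at once. The names x and pos are only illustrative, and the two strings are taken from the data above. It assumes str_locate returns a matrix with one row per input string and columns named start and end, which is how stringr documents it.

```r
library(stringr)

# Two of the call numbers from df1, passed together as a vector,
# which is how mutate() hands a column to a function.
x <- c("372.35044 L4383", "A823.3 WRIG/T")

pos <- str_locate(x, "[0-9]{3}")

# pos is a matrix with one row per string and columns "start" and "end".
print(pos)

# pos[1] is a single number: the start position for the FIRST string only.
print(pos[1])

# Reusing that one position for every string, as get_hundred does:
print(str_sub(x, pos[1], pos[1]))

# Using each string's own start position instead:
print(str_sub(x, pos[, "start"], pos[, "start"]))
```

If the per-row version prints the digits I expect, the difference between the two calls would seem to come from indexing the matrix with match_pos[1] rather than from the regular expression.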