
I have been trying to get this right. I want to extract a year from a string. For example, the string can look like this:

Toy Story (1995)

or like this:

Twelve Monkeys (a.k.a. 12 Monkeys) (1995)

To extract the year, I currently use

year <- gsub("(?<=\\()[^()]*(?=\\))(*SKIP)(*F)|.", "", x, perl = TRUE)

This works for strings in the first form, but my list also contains strings in the second form, and for those it keeps all of the parenthesized text:

[1] 1995
[2] a.k.a. 12 Monkeys1995

I only want the year, not the other text in parentheses. How do I get just the year?
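For reference, a minimal change to the pattern above is to replace `[^()]*` (any text between parentheses) with `[0-9]+` (only digits between parentheses). The sketch below assumes each title contains exactly one parenthesized run of digits, namely the year; if a title also had something like a hypothetical "(11)", those digits would be glued onto the year.

```r
x <- c("Toy Story (1995)", "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)")

# Keep only runs of digits that sit directly between "(" and ")":
# (*SKIP)(*F) protects them, and every other character is matched by "."
# and deleted.
year <- as.numeric(gsub("(?<=\\()[0-9]+(?=\\))(*SKIP)(*F)|.", "", x, perl = TRUE))
year
#[1] 1995 1995
```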

R User

3 Answers


We can use

library(stringr)
as.numeric(str_extract(x, "(?<=\\()[0-9]+(?=\\))"))
#[1] 1995 1995

data

x <- c("Toy Story (1995)", "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)")
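The same lookaround pattern also works in base R without stringr. This is a sketch that uses `regexpr()` with `perl = TRUE` plus `regmatches()`. One assumption is worth flagging: `regmatches()` silently drops strings with no match, so the result is only aligned with `x` when every title contains a parenthesized number.

```r
x <- c("Toy Story (1995)", "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)")

# regexpr() finds the first match in each string; perl = TRUE is needed
# for the lookbehind (?<=\\() and lookahead (?=\\)).
m <- regexpr("(?<=\\()[0-9]+(?=\\))", x, perl = TRUE)

# regmatches() extracts the matched substrings (non-matches are dropped).
years <- as.numeric(regmatches(x, m))
years
#[1] 1995 1995
```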
akrun
stringi::stri_match_last_regex(x, "\\(([[:digit:]]+)\\)")[,2]

Escaping the parens is still a pain, but it's a far more readable regex IMO.
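Taking the *last* parenthesized number matters when a title contains another number in parentheses. A base R sketch of the same "last match" idea uses a greedy `.*`, which runs to the end of the string and backtracks to the final `(digits)` group. The third title below is hypothetical and is included only to show the difference. Note that `sub()` returns a string unchanged when nothing matches.

```r
# The third title is made up, to show why the last match is the one we want.
x <- c("Toy Story (1995)",
       "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)",
       "Ocean's Eleven (11) (2001)")

# Greedy ".*" backtracks to the last "(digits)"; "\\1" keeps the digits only.
last_year <- sub(".*\\(([[:digit:]]+)\\).*", "\\1", x)
last_year
#[1] "1995" "1995" "2001"
```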

hrbrmstr

If the year is always at the end of each string, enclosed in parentheses, you can do this in base R:

as.numeric(gsub("\\(|\\)", "", substr(x, nchar(x)-5,nchar(x))))
#[1] 1995 1995

Call trimws(x) first in case there is any leading or trailing whitespace.
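Combining the `trimws()` step with the fixed-width approach, a small helper might look like the sketch below. The function name `extract_trailing_year` is hypothetical. It assumes every title ends in a four-digit year wrapped in parentheses (after trimming), because it simply takes the last six characters.

```r
# Hypothetical helper: assumes each title ends in "(YYYY)" once trimmed.
extract_trailing_year <- function(x) {
  x <- trimws(x)                # drop leading/trailing whitespace first
  n <- nchar(x)
  tail6 <- substr(x, n - 5, n)  # last 6 characters, e.g. "(1995)"
  as.numeric(gsub("[()]", "", tail6))
}

yrs <- extract_trailing_year(c("Toy Story (1995) ",
                               " Twelve Monkeys (a.k.a. 12 Monkeys) (1995)"))
yrs
#[1] 1995 1995
```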

989