
I have a regular expression that grepl parses without complaint but that raises an error when used as the pattern of str_extract_all.

I'm using stringr v1.0.0, R v3.2.3 under OSX.

In a related question, a regex passed to a stringr function generates a similar error, yet the solution proposed there doesn't apply in my case.

require(stringr)

string <- "Decreto Legislativo 6 marzo 1992, n. 248; G.U. n. 77 del 1° aprile 1992"

it_months <- c("gennaio","febbraio","marzo","aprile","maggio","giugno","luglio",
               "agosto","settembre","ottobre","novembre","dicembre")
grep_it_date <- paste0("\\d{1-2}(º?) (", paste(it_months, collapse="|") ,") \\d{4}$")

grepl(grep_it_date, string)
# [1] TRUE

dates_from_string <- str_extract_all(tolower(string), grep_it_date, simplify = TRUE)
# Error in stri_extract_all_regex(string, pattern, simplify = simplify,  : 
#                                   Error in {min,max} interval. (U_REGEX_BAD_INTERVAL)
CptNemo

2 Answers


You need to change \\d{1-2} to \\d{1,2}, as the error message (U_REGEX_BAD_INTERVAL) indicates: the separator in a {min,max} interval is a comma, not a hyphen.
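
A minimal sketch of the fix, assuming a simplified single-month pattern and a shortened input (fixed_pattern and sample_text are names introduced here for illustration, not the asker's full pattern):

```r
library(stringr)

# Interval quantifiers are written {min,max}; stringr's regex engine
# rejects {1-2}, as the error in the question shows.
# Simplified pattern: a 1-2 digit day, the month "marzo", a 4-digit year.
fixed_pattern <- "\\d{1,2} marzo \\d{4}"

sample_text <- "decreto legislativo 6 marzo 1992, n. 248"
fixed_matches <- str_extract_all(sample_text, fixed_pattern)[[1]]
print(fixed_matches)
# [1] "6 marzo 1992"
```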

Thomas Ayoub

To extract both 6 marzo 1992 and 1° aprile 1992 from 'string', drop the trailing $ anchor and allow an optional non-digit, non-space character (such as °) after the day:

 grep_it_date <- paste0("[0-9]{1,2}([^0-9 ]?)\\s+(", paste(it_months, collapse="|") ,")\\s+\\d{4}")
 str_extract_all(tolower(string), grep_it_date)[[1]]
 #[1] "6 marzo 1992"   "1° aprile 1992"
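
The question's pattern ends in $, while the one above does not. The sketch below compares the two variants. The names input_text, base_pattern and anchored_pattern are introduced here for illustration, and the degree sign is written as \u00b0 only to sidestep source-encoding issues.

```r
library(stringr)

it_months <- c("gennaio", "febbraio", "marzo", "aprile", "maggio", "giugno", "luglio",
               "agosto", "settembre", "ottobre", "novembre", "dicembre")

input_text <- tolower("Decreto Legislativo 6 marzo 1992, n. 248; G.U. n. 77 del 1\u00b0 aprile 1992")

# Same pattern as in the answer above: day, optional ordinal mark, month, year.
base_pattern <- paste0("[0-9]{1,2}([^0-9 ]?)\\s+(",
                       paste(it_months, collapse = "|"), ")\\s+\\d{4}")
# Hypothetical variant with the end-of-string anchor restored.
anchored_pattern <- paste0(base_pattern, "$")

unanchored <- str_extract_all(input_text, base_pattern)[[1]]
anchored <- str_extract_all(input_text, anchored_pattern)[[1]]

print(length(unanchored))  # 2: both dates are found
print(length(anchored))    # 1: only the date at the very end of the string
```

With the anchor in place, only a date at the end of the input can match, so a pattern meant to collect every date in a string should leave it out.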
akrun