
I am getting "Error: unclosed character class" from a Rust regex. The regex works fine in an online tester for PCRE-compliant regexes, but the regex crate on the Rust Playground rejects it.

The character class must include a minus sign. I tried putting the minus sign in first position, last position and leaving it out altogether, but always get an error.

For most of the expected inputs, the source string will be "op(number)" for some op and some non-negative integer. For a few, I expect "op(number/number/number)".

If there is a superior way to extract the named captures, I am all ears.

use lazy_static::lazy_static;
use regex::Regex;

fn main() {
    lazy_static! {
        static ref FANCY_OPCODE_RE: Regex = Regex::new(r"(?x)
            ^                              # Match start of string
            (?P<opname>[-a-zA-Z#+]+)       # Match abbreviated name of OpCode as 'opname'
            \(                             # Open parentheses
            (?P<arg1>[0-9]+)               # Match first number as 'arg1'
            (/                             # Delimiter
            (?P<arg2>[0-9]+)               # Optionally match second number as 'arg2'
            /                              # Delimiter
            (?P<arg3>[0-9]+))?             # Optionally match third number as 'arg3'
            \)                             # Closing parenthesis
            $                              # Match end of string
        ").unwrap();
    }
    let s = "+loop(3)";
    let opname: String; 
    let arg1: String;
    let arg2: String;
    let arg3: String;
    match FANCY_OPCODE_RE.captures(s) {
        Some(cap) => { 
            opname = format!("{:?}", cap.name("opname")); 
            arg1 = format!("{:?}", cap.name("arg1"));
            arg2 = format!("{:?}", cap.name("arg2"));
            arg3 = format!("{:?}", cap.name("arg3"));
        },
        None => { 
            opname = "No match".to_string(); 
            arg1 = String::new();
            arg2 = String::new();
            arg3 = String::new();
        }
    }

    println!("opname = {}, arg1 = {}, arg2 = {}, arg3 = {}", opname, arg1, arg2, arg3);
}

Here is the error message:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1: (?x)
 2:             ^                              # Match start of string
 3:             (?P<opname>[-a-zA-Z#+]+)       # Match abbreviated name of OpCode as 'opname'
                           ^^
 4:             \(                             # Open parentheses
 5:             (?P<arg1>[0-9]+)               # Match first number as 'arg1'
 6:             (/                             # Delimiter
 7:             (?P<arg2>[0-9]+)               # Optionally match second number as 'arg2'
 8:             /                              # Delimiter
 9:             (?P<arg3>[0-9]+))?             # Optionally match third number as 'arg3'
10:             \)                             # Closing parenthesis
11:             $                              # Match end of string
12:         
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
error: unclosed character class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)', src/main.rs:17:12
Shepmaster
Paul Chernoch
  • 1
    *with PCRE-compliant regexes*. The regex crate doesn't adhere to PCRE syntax, so this doesn't really matter. From the second sentence of the crate's README (emphasis mine): *syntax is **similar** to Perl-style regular expressions, **but lacks a few features like*** – Shepmaster Jul 02 '21 at 17:04
  • 1
    As a totally unrelated comment, the positioning of the `^^` in the error message looks off. I think there should be only one `^` and it should be under the opening `[`. Bugs... bugs... bugs everywhere! – BurntSushi5 Jul 02 '21 at 22:53
  • 1
    Filed a bug: https://github.com/rust-lang/regex/issues/792 – BurntSushi5 Jul 02 '21 at 22:55
  • @Shepmaster - I am well aware that Rust’s Regexes lack some PCRE features, but I was confident that I was not employing any of those features. However, the only online Rust Regex tester I could find lacks features I needed to debug my Regex (related to named captures) – Paul Chernoch Jul 05 '21 at 18:25

1 Answer


When debugging a problem, it's useful to create a minimal, reproducible example. By deleting parts of your regex that don't cause the problem, you can quickly reduce to:

Regex::new(r"(?x)(?P<opname>[-a-zA-Z#+]+)").unwrap();

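The problem is that the `(?x)` flag turns on verbose mode, in which `#` starts a comment that runs to the end of the line, even inside a character class. The `#` in your class therefore swallows the closing `]`, leaving the class unclosed. Escape it: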

[-a-zA-Z\#+]
Shepmaster