I need to match sequence='999' inside a <noteinfo> tag in an XML document using a Java regex (an XML parser is not an option).

Snippet of the XML:

<xmltag sequence='11'>
  <noteinfo noteid='1fe' unid='25436AF06906885A8525840B00805DBC' sequence='3'/>
</xmltag>

I am using this: (?<=<noteinfo.*)sequence='[0-9999]'(?=/>)

I am expecting a match on this: sequence='3'

Getting error: java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length

I understand the issue is with the .* in the look-behind part. Any alternatives to avoid the error?

user2287359

2 Answers

Never use a lookbehind unless it is absolutely necessary.

You can bound the length of a lookbehind by replacing the * with a curly-brace quantifier, e.g. {1,255}.
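For illustration, here is a minimal sketch of the bounded-lookbehind variant. It assumes that at most 255 characters separate <noteinfo from the attribute; that limit, the class name BoundedLookbehind and the helper firstMatch are made up for this example. It also uses [^>] instead of . so the lookbehind cannot run past the end of the tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehind {
    // Assumption: "<noteinfo" is followed by 1 to 255 non-'>' characters
    // before the attribute starts, so the lookbehind has a known maximum length.
    static final Pattern SEQ = Pattern.compile(
            "(?<=<noteinfo[^>]{1,255})sequence='[0-9]+'(?=/>)");

    // Returns the first sequence attribute inside a <noteinfo> tag, or null.
    static String firstMatch(String xml) {
        Matcher m = SEQ.matcher(xml);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String xml = "<xmltag sequence='11'>\n"
                + "  <noteinfo noteid='1fe' unid='25436AF06906885A8525840B00805DBC' sequence='3'/>\n"
                + "</xmltag>";
        System.out.println(firstMatch(xml)); // prints sequence='3'
    }
}
```

The sequence='11' on <xmltag> is skipped because it is not preceded by <noteinfo and is not followed by />.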
Your problem is solvable without the use of a lookbehind:
static final Pattern seqpat = Pattern.compile( "<noteinfo[^>]+(?<seq>sequence\\s*=\\s*'[\\d]*')", Pattern.MULTILINE );

read through the file with:

Matcher m = seqpat.matcher( s );
while( m.find() )
  System.err.println( m.group( "seq" ) );

Pattern.MULTILINE is not actually required: it only changes how ^ and $ match, and the pattern uses neither. A wrapped noteinfo tag is still found because [^>]+ also matches line breaks.
seqpat finds (not matches!) each <noteinfo tag that has a sequence attribute before its closing >.
The requested sequence is captured in group( "seq" ).
You may have to deal with spaces or newlines between sequence, = and the sequence-id '3'; hence the \\s*=\\s*.

The above Pattern finds each sequence-id (even an empty one).
To find only the '999' sequence-id, use this Pattern:
Pattern.compile( "<noteinfo[^>]+(?<seq>sequence\\s*=\\s*'999')", Pattern.MULTILINE );
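To see the approach on a wrapped tag, here is a small self-contained sketch. It is a variant of seqpat, not the exact pattern above: it captures only the digits in a group named id, and the class name NoteinfoSequence, the helper sequenceIds and the sample input are made up for illustration. No flags are passed, since [^>] already matches line breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoteinfoSequence {
    // Variant of seqpat: the named group "id" holds only the digits.
    static final Pattern SEQ = Pattern.compile(
            "<noteinfo[^>]+?\\bsequence\\s*=\\s*'(?<id>\\d+)'");

    // Collects the sequence value of every <noteinfo> tag in the input.
    static List<String> sequenceIds(String xml) {
        List<String> ids = new ArrayList<>();
        Matcher m = SEQ.matcher(xml);
        while (m.find()) {
            ids.add(m.group("id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        // The first noteinfo tag is wrapped and has spaces around '='.
        String xml = "<xmltag sequence='11'>\n"
                + "  <noteinfo noteid='1fe'\n"
                + "      sequence = '999'/>\n"
                + "  <noteinfo noteid='2ab' sequence='3'/>\n"
                + "</xmltag>";
        System.out.println(sequenceIds(xml)); // prints [999, 3]
    }
}
```

The sequence='11' on <xmltag> is not reported because the pattern has to start at <noteinfo.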

Kaplan

My guess is that you might want to design an expression similar to:

(?=<noteinfo).*(sequence='[0-9]'|sequence='[1-9][0-9]{0,3}')

Test

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SequenceTest {
    public static void main(String[] args) {
        // Group 1 captures sequence='N' for N in 0..9999 on a line that contains <noteinfo.
        final String regex = "(?=<noteinfo).*(sequence='[0-9]'|sequence='[1-9][0-9]{0,3}')";
        final String string = "<xmltag sequence='11'>\n"
             + "  <noteinfo noteid='1fe' unid='25436AF06906885A8525840B00805DBC' sequence='3'/>\n"
             + "</xmltag>\n"
             + "<xmltag sequence='11'>\n"
             + "  <noteinfo noteid='1fe' unid='25436AF06906885A8525840B00805DBC' sequence='9999'/>\n"
             + "</xmltag>\n"
             + "<xmltag sequence='11'>\n"
             + "  <noteinfo noteid='1fe' unid='25436AF06906885A8525840B00805DBC' sequence='10000'/>\n"
             + "</xmltag>\n"
             + "<xmltag sequence='11'>\n"
             + "  <noteinfo noteid='1fe' unid='25436AF06906885A8525840B00805DBC' sequence='-1'/>\n"
             + "</xmltag>";

        // MULTILINE is harmless here: the pattern contains neither ^ nor $.
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        // Print the full match and every capture group for each hit.
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
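As a quick check of which values the expression accepts, here is a compact sketch that collects only group 1. The class name SequenceRangeCheck, the helper matches and the shortened sample lines are made up for illustration. It relies on . not matching line breaks by default, so each <noteinfo> tag has to sit on a single line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SequenceRangeCheck {
    // Same expression as in the answer: accepts 0..9999 without leading zeros.
    static final Pattern P = Pattern.compile(
            "(?=<noteinfo).*(sequence='[0-9]'|sequence='[1-9][0-9]{0,3}')");

    // Returns group 1 (the sequence attribute) of every match.
    static List<String> matches(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "<noteinfo sequence='3'/>\n"
                + "<noteinfo sequence='9999'/>\n"
                + "<noteinfo sequence='10000'/>\n"
                + "<noteinfo sequence='-1'/>\n"
                + "<xmltag sequence='11'>";
        // 10000 and -1 are out of range; the xmltag line lacks <noteinfo.
        System.out.println(matches(input)); // prints [sequence='3', sequence='9999']
    }
}
```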
Emma