5

This is my test-string:

<img rel="{objectid:498,newobject:1,fileid:338}" width="80" height="60" align="left" src="../../../../files/jpg1/Desert1.jpg" alt="" />

I want to extract each of the JSON-style elements inside the rel attribute. It already works for the first element (objectid).

Here is my regex, which works fine:

(?<=(rel="\{objectid:))\d+(?=[,|\}])

But I want to do something like this, which doesn't work:

(?<=(rel="\{.*objectid:))\d+(?=[,|\}])

That way I could parse every element of the search string.

I'm using Java regex.

mpneuried

3 Answers

2

Java (like nearly all regex flavors except .NET and JGSoft) doesn't support infinite repetition inside lookbehinds.

You could use capturing groups instead. Also, it's better to use [^{]* instead of .*, and to ensure word boundaries with \b.

rel="\{[^{]*\bobjectid:(\d+)

should be sufficient (then look at capturing group 1 for the value of the attribute).
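As a minimal sketch of how that capturing-group regex might be applied in Java: the class name RelAttributeGroupDemo, the helper extract, and the shortened sample tag are made up for illustration and are not part of the original answer. The key name is spliced into the pattern so the same regex shape can be reused for newobject and fileid.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelAttributeGroupDemo {
    // Hypothetical helper: returns the digits that follow "key:" inside
    // rel="{...}", or null when the key is not present.
    static String extract(String tag, String key) {
        // Same shape as the regex above; Pattern.quote guards against
        // regex metacharacters in the key name.
        Pattern p = Pattern.compile(
            "rel=\"\\{[^{]*\\b" + Pattern.quote(key) + ":(\\d+)");
        Matcher m = p.matcher(tag);
        // Capturing group 1 holds the numeric value.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened version of the tag from the question (assumption).
        String tag =
            "<img rel=\"{objectid:498,newobject:1,fileid:338}\" width=\"80\" />";
        for (String key : new String[] {"objectid", "newobject", "fileid"}) {
            System.out.println(key + " = " + extract(tag, key));
        }
    }
}
```

The \b matters here: without it, a lookup for "object" would match the tail of "newobject". With it, extract(tag, "object") finds nothing and returns null.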

Tim Pietzcker
1

Do you want to iterate through all the key/value pairs? You don't need lookbehind for that:

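import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = 
    "<img rel=\"{objectid:498,newobject:1,fileid:338}\" " +
    "width=\"80\" height=\"60\" align=\"left\" " +
    "src=\"../../../../files/jpg1/Desert1.jpg\" alt=\"\" />";
// Either start at rel="{ or continue at a comma right after the previous match (\G)
Pattern p = Pattern.compile(
    "(?:\\brel=\"\\{|\\G,)(\\w+):(\\w+)");
Matcher m = p.matcher(s);
while (m.find())
{
  // group(1) is the key, group(2) is the value
  System.out.printf("%s = %s%n", m.group(1), m.group(2));
}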

The first time find() is called, the first part of the regex matches rel="{. On subsequent calls, the second alternative (\G,) takes over to match a comma, but only if it immediately follows the previous match. In either case it leaves you lined up for (\w+):(\w+) to match the next key/value pair, and it can never match anywhere outside the rel attribute.

I'm assuming you're applying the regex to an isolated IMG tag, as you posted it, not to a whole HTML file. Also, the regex may need a little tweaking to match your actual data. For example, you might want the more general ([^:]+):([^,}]+) instead of (\w+):(\w+).
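For illustration, here is a sketch of the same \G technique with the more general ([^:]+):([^,}]+) sub-pattern, collecting the pairs into a map instead of printing them. The class name RelAttributeMapDemo and the method parseRel are hypothetical names, and the sample tag is a shortened version of the one in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelAttributeMapDemo {
    // First alternative anchors at rel="{; the second (\G,) continues at a
    // comma that directly follows the previous match.
    private static final Pattern PAIR =
        Pattern.compile("(?:\\brel=\"\\{|\\G,)([^:]+):([^,}]+)");

    // Hypothetical helper: returns the key/value pairs in document order.
    static Map<String, String> parseRel(String tag) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(tag);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String tag =
            "<img rel=\"{objectid:498,newobject:1,fileid:338}\" width=\"80\" />";
        // prints {objectid=498, newobject=1, fileid=338}
        System.out.println(parseRel(tag));
    }
}
```

A LinkedHashMap is used only so the pairs come back in the order they appear in the attribute. Values are kept as strings, since the general pattern no longer guarantees they are numeric.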

Alan Moore
0

Lookbehinds may not contain arbitrary regular expressions in general: most engines (Java’s included) require that their maximum length be bounded, so you can’t use unbounded quantifiers like * in them. Lookaheads have no such restriction.

Why are you using lookaheads and lookbehinds here, anyway? Just use capture groups instead; that’s much simpler.

rel="\{.*objectid:(\d+)

Now the first capture group will contain the ID.

Konrad Rudolph