I'm trying to recover two positions using Java regex.
The first one is given by the regex:
val r = """(?=(?<=[ ]|^)[^ ]{1,21474836}(?=[ ]|$)(?<=[^A-Z]|^)[A-Z]{1,21474836}(?=[^A-Z]|$))""".r
The second one is given by the regex:
val p = """(?<=(?<=[ ]|^)[^ ]{1,21474836}(?=[ ]|$)(?<=[^A-Z]|^)[A-Z]{1,21474836}(?=[^A-Z]|$))""".r
Note that the two expressions are identical, except that the leading "?=" (look-ahead) of the first is replaced by "?<=" (look-behind) in the second. I am not using nested quantifiers here.
My command to test it is the following:
r.findAllMatchIn("a <b/>"*100) //.... some long string of size 600...
p.findAllMatchIn("a <b/>"*100) //.... some long string of size 600...
The first example runs almost instantly, whereas the second takes dozens of seconds. If I run the same examples in a REPL, both are very fast.
Where does that come from? How can I make the second expression faster?
Update: Why this matters
Note that in general, I can have expressions of the type:
[^ ]+[^.]+
and I would like to know at which positions a match of this regular expression can end, that is, where it can be found immediately to the left of a given position. Suppose I have the following data, with the positions written below it:
abc145A
0123456
I would like the end of the previous expression to match positions 1, 2, 3, 4, 5 and 6. If I use non-greedy (reluctant) quantifiers, it matches 1, 3 and 5. If I use greedy quantifiers, it matches only 6. This is why I need look-behind assertions. Alternatively, is there another way to find the positions I am looking for?
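To make the three cases concrete, here is a minimal sketch of what I mean. The object name EndPositions and the helpers lastCharIndices and endPositions are hypothetical names invented for this illustration. The brute-force endPositions is not what I want to use in practice: it tries every (start, end) pair, so it makes a quadratic number of match attempts, and it assumes the pattern itself contains no look-around that needs to see outside the tested region. It only shows the result I expect.

```scala
import java.util.regex.Pattern

object EndPositions {
  // Index of the last character of each match found by an ordinary left-to-right scan.
  def lastCharIndices(regex: String, input: String): List[Int] =
    regex.r.findAllMatchIn(input).map(_.end - 1).toList

  // Hypothetical brute-force reference: index of the last character of every
  // substring input[start, end) that matches `regex` in full.
  def endPositions(regex: String, input: String): List[Int] = {
    val m = Pattern.compile(regex).matcher(input)
    (for {
      end <- 1 to input.length
      // region(start, end) restricts the matcher to that slice; matches() must consume all of it
      if (0 until end).exists(start => m.region(start, end).matches())
    } yield end - 1).toList
  }

  def main(args: Array[String]): Unit = {
    val input = "abc145A"
    println(lastCharIndices("[^ ]+[^.]+", input))   // greedy:     List(6)
    println(lastCharIndices("[^ ]+?[^.]+?", input)) // non-greedy: List(1, 3, 5)
    println(endPositions("[^ ]+[^.]+", input))      // wanted:     List(1, 2, 3, 4, 5, 6)
  }
}
```

Position 0 is never reported because the expression needs at least two characters to match.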