I am extracting <a> tags and their surrounding text from HTML source code. This is the regex pattern I use in my code:

String a_pattern = "(.*?)<a (.*?)</a>(.*)";

I tried using "?" to make the quantifiers lazy. However, this pattern takes a long time to match against a long string.

Can you please give me some hints on optimizing this pattern?

I should note that I need all three groups of text (before, within, and after the <a> tag).

Thank you

n4z4nin

1 Answer

If you only want the content of those tags, you can use this regex:

<a.*?>(.*?)<\/a>

Working demo
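
Since the question needs all three pieces and not just the tag content, here is a minimal Java sketch of one way to get them with a short pattern: let the regex match only the tag, and recover the text before and after it from the match offsets (Matcher.start() and Matcher.end()). The class name AnchorSplitter, the split helper, and the sample HTML are made up for illustration. The pattern is also a slight variation on the one above: it uses <a\b[^>]*> for the opening tag so that tags such as <abbr> are not matched, and it drops the \/ escape, which Java does not need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorSplitter {
    // Matches one <a ...>...</a> element; group 1 is the text inside it.
    // DOTALL lets '.' cross line breaks; CASE_INSENSITIVE also accepts <A ...>.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*>(.*?)</a>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns {before, within, after} for every anchor found in html. */
    static List<String[]> split(String html) {
        List<String[]> parts = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            String before = html.substring(0, m.start()); // everything before the tag
            String within = m.group(1);                   // text between <a ...> and </a>
            String after = html.substring(m.end());       // everything after the tag
            parts.add(new String[] {before, within, after});
        }
        return parts;
    }

    public static void main(String[] args) {
        String html = "Visit <a href=\"https://example.com\">our site</a> today.";
        for (String[] p : split(html)) {
            System.out.println("before=[" + p[0] + "] within=[" + p[1] + "] after=[" + p[2] + "]");
        }
        // prints: before=[Visit ] within=[our site] after=[ today.]
    }
}
```

Note that "within" here is only the link text. The original pattern's second group also contained the attributes (for example href="..."), so if you need those, add a capturing group around [^>]*. For anything beyond simple, well-formed markup, an HTML parser is usually a safer choice than a regex.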

You can compare the performance of both regexes at this link:

Regex performance

As you can see, the regex you are using is 97% slower.

This result comes from the JavaScript regex engine rather than Java's, but it still gives a useful indication of relative regex performance.
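
Because that measurement was not taken with java.util.regex, it may be worth repeating on your own input. The following is only a rough sketch of a timing harness: the RegexTiming class, its helpers, and the generated sample document are all hypothetical, and a loop around System.nanoTime() is far less reliable than a proper benchmarking tool such as JMH, so treat the numbers as indicative at best.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTiming {
    // The pattern from the question and the shorter one from this answer.
    static final Pattern QUESTION = Pattern.compile("(.*?)<a (.*?)</a>(.*)");
    static final Pattern ANSWER = Pattern.compile("<a.*?>(.*?)</a>");

    /** Runs find() until no match is left and returns the number of matches. */
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Wall-clock nanoseconds spent on `rounds` full scans of the input. */
    static long timeNanos(Pattern p, String input, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            countMatches(p, input);
        }
        return System.nanoTime() - start;
    }

    /** Builds a made-up test document with one link per line. */
    static String sample(int links) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < links; i++) {
            sb.append("some text <a href=\"/page").append(i).append("\">link ")
              .append(i).append("</a> more text\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = sample(2000);
        // Warm-up runs so the JIT compiler has a chance to optimize both paths.
        timeNanos(QUESTION, input, 5);
        timeNanos(ANSWER, input, 5);
        System.out.println("question pattern: " + timeNanos(QUESTION, input, 20) / 1_000_000 + " ms");
        System.out.println("answer pattern:   " + timeNanos(ANSWER, input, 20) / 1_000_000 + " ms");
    }
}
```

Replace sample(2000) with your real HTML string to see how the two patterns behave on the input that is actually slow for you.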

Federico Piazza