I have a long HTML string and need to keep all of its content except for certain tag sections.

Example:

Consider the following string:

;decreasing'>1</a>&nbsp;<span class='active'>2</span><a href='&#2;F;search&

I need to select everything except the span section, that is, remove the following:

<span class='active'>2</span>

and end up with only the following:

;decreasing'>1</a>&nbsp;<a href='&#2;F;search&

I tried the following negative lookaround pattern on regex101.com, but with no luck:

^(?!=(<span class='active'>(.*?)<\/span>)).*$

[Additional Info]

If I could combine the following two selections, it would solve the problem:

1. Selects everything up to the span tag:

.*?(?=<span)

2. Selects everything from the closing span tag onward:

(?<=span>).*
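
For illustration only, here is a minimal Python sketch that checks what the two selections match on the sample string and joins the results. Python is an assumption made for the demonstration (the screen-scraper tool is not Python), and the variable names are hypothetical.

```python
import re

# Sample string from the question
s = ";decreasing'>1</a>&nbsp;<span class='active'>2</span><a href='&#2;F;search&"

# Selection 1: everything up to the opening span tag (lazy match + lookahead)
before = re.search(r".*?(?=<span)", s).group(0)

# Selection 2: everything after the closing span tag (lookbehind)
after = re.search(r"(?<=span>).*", s).group(0)

# Joining the two pieces gives the string without the span section
combined = before + after
print(combined)
```

The two patterns have to run as separate searches here, because a single regex match is always one contiguous stretch of the input and cannot skip over the span section in the middle.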

Thanks in advance for your help.

polarized

1 Answer

If your programming language lets you make a regex split or replace call, you can match the unwanted section with this pattern and replace each match with an empty string:

~<(span).*?(/\1)>~ or, to cover more tags, expand the tag list like this: ~<(span|div).*?(/\1)>~

Processing HTML with regex patterns carries risks, but whether they come into play depends on the structure of your HTML.
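
As a rough sketch of the replace approach, assuming Python is available (the question does not name a language), the pattern can be passed to `re.sub` without the `~` delimiters. The helper name `strip_tag_sections` and its `tags` parameter are hypothetical and exist only for this illustration.

```python
import re

def strip_tag_sections(html, tags=("span",)):
    """Remove every <tag ...>...</tag> section for the given tag names.

    Hypothetical helper: it builds the pattern <(span|div).*?(/\\1)> from
    the tag list and replaces each match with an empty string.
    """
    alternation = "|".join(re.escape(tag) for tag in tags)
    # \1 refers back to whichever tag name matched, so <span> pairs with </span>
    pattern = re.compile(r"<(" + alternation + r").*?(/\1)>")
    return pattern.sub("", html)

sample = ";decreasing'>1</a>&nbsp;<span class='active'>2</span><a href='&#2;F;search&"
print(strip_tag_sections(sample))
print(strip_tag_sections("<div>x</div>y<span>z</span>", tags=("span", "div")))
```

The lazy `.*?` stops at the first matching closing tag, so nested tags of the same name would be cut short. That is one of the risks mentioned above.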

mickmackusa
  • If my method is not suitable for any reason, please leave me a comment and consider improving your question. – mickmackusa Jul 04 '17 at 12:11
  • Hi mickmackusa, thanks for the suggestion but if I am not wrong your regex selects the text I need to eliminate. What I need to do is select all the remaining text with exception of that. – polarized Jul 04 '17 at 12:49
  • @polarized That's why I am suggesting that you use this with a split or replace function. What language are you using? – mickmackusa Jul 04 '17 at 12:50
  • Hi, I need to use it with the screen-scraper program to create a sub-extractor pattern. As the position of the tag section constantly changes across the multiple pages being scraped, I need to ignore it and consider only the rest of the string. Unfortunately I cannot use a replace function, only RegEx. I managed to do it once but really cannot remember the solution. :-( – polarized Jul 04 '17 at 12:58
  • @polarized What is the name of the screen-scraper program? – mickmackusa Jul 04 '17 at 13:00