
I want to create a regular expression that extracts:

<p class="MyClass">
   <p> something 1 </p>
   <p> something 2 </p>
   <span>         <span>  // or more html tag here
   something
</p>
something's here, not in any tag!

from:

<p class="MyClass">
   <p> something 1 </p>
   <p> something 2 </p>
   <span>         <span>  // or more html tag here
   something
</p>
something's here, not in any tag!

<p class="MyClass">
   <p> another thing 1</p>
   <p> another thing 2</p>
   <p> another thing 3</p>
   another thing
</p>
...

I think I can use a regex to match everything between one <p class="MyClass"> and the next. The regex /(<p class="MyClass">[\s\S]*)<p class="MyClass">/ works correctly in this case, but it doesn't work when I try to extract a notification from this page: http://daotao.dut.udn.vn/sv/G_Thongbao_LopHP.aspx. The DOM there is quite strange.

Sorry for my bad English.

Thengocphan

1 Answer


The regex should be:

(<p class="MyClass">[\s\S]*?)(?=<p class="MyClass">|$)
  • [\s\S]*? : *? is a lazy quantifier, so it matches the shortest possible string; the default quantifier * is greedy and matches the longest.
  • (?=<p class="MyClass">|$) : a lookahead, so the next opening tag is checked but does not belong to the match; the |$ alternative also captures the last block, which has no opening tag after it.
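A minimal sketch of the difference, assuming Python's re module (the question does not say which regex engine is used) and a made-up three-block input modeled on the question's markup. With more than two blocks, the greedy pattern from the question runs on to the last opening tag, while the lazy pattern with the lookahead returns one match per block:

```python
import re

# Hypothetical sample input: three blocks, with loose text after the first one.
html = (
    '<p class="MyClass">\n   <p> first </p>\n</p>\nloose text\n\n'
    '<p class="MyClass">\n   <p> second </p>\n</p>\n\n'
    '<p class="MyClass">\n   <p> third </p>\n</p>\n'
)

# Pattern from the question: greedy [\s\S]* backtracks only as far as the LAST opening tag.
greedy = re.findall(r'(<p class="MyClass">[\s\S]*)<p class="MyClass">', html)

# Pattern from the answer: lazy body, lookahead for the next opening tag or end of input.
lazy = re.findall(r'(<p class="MyClass">[\s\S]*?)(?=<p class="MyClass">|$)', html)

print(len(greedy))  # one match that swallows the first and second blocks together
print(len(lazy))    # one match per block
for block in lazy:
    print(repr(block))
```

Because the lookahead does not consume the next opening tag, the following match can start exactly there, which is why consecutive blocks are all found.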
Nahuel Fouilleul