Given the following text:
<p style="color: blue">Some text</p>
<p style="color:blue; margin-left: 10px">* Item 1</p> // Should match
<p style="margin-left: 10px">* Item 2</p>
<p style="margin-left: 20px">* Sub Item 1a</p> // Should match
<p style="margin-left: 20px">* Sub Item 2a</p>
<p style="margin-left: 10px">* Item 3</p>
<p style="margin-left: 20px">* Sub Item 1b</p> // Should match
<p style="margin-left: 20px">* Sub Item 2b</p>
<p style="margin-left: 30px">* Sub Item 1c</p> // Should match
<p>Some text</p>
<p style="color:blue; margin-left: 10px">* Item 1</p> // Should match
I am trying to find any p elements which match the following criteria:
- They begin with an asterisk character
- They have a margin-left inline style
- The preceding content is either:
  - A p element which has no margin-left
  - A p element with a margin-left which is lower than the matched element's
  - Any other element
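The three criteria can be written as a small predicate. This is a minimal sketch in Python, used only for illustration. `should_match` is a hypothetical name, and the sketch assumes the caller has already extracted each element's text and margin-left. It also assumes that an element with no preceding element at all should match, which the list above does not state.

```python
from typing import Optional


def should_match(text: str, margin: Optional[int], prev_margin: Optional[int]) -> bool:
    """Apply the three criteria to one <p> element.

    text        -- the element's text content
    margin      -- its margin-left in px, or None if it has none
    prev_margin -- the preceding element's margin-left in px, or None when the
                   preceding element is not a <p>, is a <p> without margin-left,
                   or (assumption) does not exist
    """
    if not text.startswith("*"):
        return False  # criterion 1: must begin with an asterisk
    if margin is None:
        return False  # criterion 2: must have a margin-left
    # criterion 3: preceding margin absent, or strictly lower than this one
    return prev_margin is None or prev_margin < margin


# Pairs taken from the sample: (text, margin, preceding margin)
print(should_match("* Sub Item 1a", 20, 10))  # True
print(should_match("* Item 3", 10, 20))       # False
```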
So in the example, I need to match the following elements:
<p style="color:blue; margin-left: 10px">* Item 1</p> (preceding element is a p but doesn't have any margin-left)
<p style="margin-left: 20px">* Sub Item 1a</p> (preceding element is a p but has a lower margin-left value)
<p style="margin-left: 20px">* Sub Item 1b</p> (preceding element is a p but has a lower margin-left value)
<p style="margin-left: 30px">* Sub Item 1c</p> (preceding element is a p but has a margin-left value lower than the current matched element)
<p style="color:blue; margin-left: 10px">* Item 1</p> (preceding element is a p but has no margin-left value)
I cannot use DomDocument because the markup I receive is not always valid (it generally comes from a Microsoft Office-to-HTML conversion), so I am using regular expressions to solve the problem.
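For comparison, an event-based HTML parser can read markup whose tags are not balanced. The sketch below swaps in Python's standard-library `html.parser` (not what the question uses) to collect each p element's margin-left and text from input containing an unclosed `<b>` and an Office-style `<o:p>` element. `ParagraphCollector` and `MARGIN_RE` are hypothetical names, and only p elements are recorded here.

```python
import re
from html.parser import HTMLParser

# Same margin-left pattern as the question's regex.
MARGIN_RE = re.compile(r"margin-left:\s?([0-9]{1,3})px")


class ParagraphCollector(HTMLParser):
    """Collects (margin or None, text) for every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._margin = None
        self._text = None  # None while outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            style = dict(attrs).get("style") or ""
            m = MARGIN_RE.search(style)
            self._margin = int(m.group(1)) if m else None
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._text is not None:
            self.paragraphs.append((self._margin, "".join(self._text)))
            self._text = None


collector = ParagraphCollector()
collector.feed('<p style="color:blue; margin-left: 10px">* Item 1<o:p></o:p></p>'
               '<p>Some <b>text</p>')
collector.close()
print(collector.paragraphs)  # [(10, '* Item 1'), (None, 'Some text')]
```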
My current regex is:
(?!<p style=".*?(margin-left:\s?(?!\k'margin')px;).*?">\* .*?<\/p>)<p style="(?P<styles>.*?)margin-left:\s?(?P<margin>[0-9]{1,3})px;?">\* (?P<listcontent>.*)<\/p>
But this only matches based on the existence of a preceding p element with a margin-left.
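The capturing part of the regex above (everything after the leading negative lookahead) already extracts the margin of each element. The sketch below runs that part with Python's `re`, which uses the same `(?P<name>...)` syntax as PCRE but writes backreferences as `(?P=name)` instead of `\k'name'`. It also shows the limitation: a backreference only re-matches the identical text, so it can test "same margin" but not "greater margin". `ITEM_RE` and `SAME_RE` are hypothetical names.

```python
import re

# Capturing part of the question's regex; the leading lookahead is omitted.
ITEM_RE = re.compile(
    r'<p style="(?P<styles>.*?)margin-left:\s?(?P<margin>[0-9]{1,3})px;?">'
    r'\* (?P<listcontent>.*)<\/p>'
)

m = ITEM_RE.search('<p style="color:blue; margin-left: 10px">* Item 1</p>')
print(m.groupdict())
# The captured margin is text; a numeric comparison needs a conversion in code.
margin = int(m.group("margin"))

# A backreference matches only the same digits again, never "a larger number".
SAME_RE = re.compile(r"(?P<margin>\d+)px (?P=margin)px")
print(bool(SAME_RE.search("10px 10px")))  # True
print(bool(SAME_RE.search("10px 20px")))  # False
```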
How can I factor in the captured margin-left group so that only elements whose margin-left is greater than that of the preceding element are returned?
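Because a regex has no direct way to test that one captured number is greater than another, one workaround is to let a regex tokenize the elements and do the numeric comparison in code, remembering the previous element's margin. This is a minimal sketch, assuming elements are never nested inside another element with the same tag name (true of the flat sample above). `ELEMENT_RE`, `MARGIN_RE` and `find_list_starts` are hypothetical names; in PHP, `preg_match_all` with `PREG_SET_ORDER` would be a comparable starting point for the tokenizing step.

```python
import re
from typing import List

# Hypothetical tokenizer: one top-level element per match, assuming no
# same-tag nesting. The lookahead stops a tag name from being cut short.
ELEMENT_RE = re.compile(
    r"<(?P<tag>[a-zA-Z][\w:]*)(?=[\s/>])(?P<attrs>[^>]*)>(?P<body>.*?)</(?P=tag)>",
    re.S,
)
MARGIN_RE = re.compile(r"margin-left:\s?(?P<margin>[0-9]{1,3})px")


def find_list_starts(html: str) -> List[str]:
    """Return the markup of each <p> whose margin exceeds the preceding one's."""
    matches = []
    prev_margin = None  # None: no previous element, a non-<p>, or no margin-left
    for el in ELEMENT_RE.finditer(html):
        if el.group("tag").lower() != "p":
            prev_margin = None
            continue
        m = MARGIN_RE.search(el.group("attrs"))
        margin = int(m.group("margin")) if m else None
        # The numeric comparison that the regex alone cannot express.
        if (margin is not None
                and el.group("body").startswith("*")
                and (prev_margin is None or prev_margin < margin)):
            matches.append(el.group(0))
        prev_margin = margin
    return matches


SAMPLE = """<p style="color: blue">Some text</p>
<p style="color:blue; margin-left: 10px">* Item 1</p>
<p style="margin-left: 10px">* Item 2</p>
<p style="margin-left: 20px">* Sub Item 1a</p>
<p style="margin-left: 20px">* Sub Item 2a</p>
<p style="margin-left: 10px">* Item 3</p>
<p style="margin-left: 20px">* Sub Item 1b</p>
<p style="margin-left: 20px">* Sub Item 2b</p>
<p style="margin-left: 30px">* Sub Item 1c</p>
<p>Some text</p>
<p style="color:blue; margin-left: 10px">* Item 1</p>"""

for item in find_list_starts(SAMPLE):
    print(item)
```

On the sample this returns the five elements listed as expected matches, in document order.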
I have created an online regex demo showing the problem, with sample data and my current output.