Regexp to match a tag that contains a specific word in its class and has a specific id

Question

I'm trying to match a section tag whose class contains a specific word and which also has an id, and possibly some other attributes.

<section id="footer-widget-wysija-2" class="widget footer-widget widget_wysija">Some html</section>

I want to remove this section from the HTML before rendering it.

I've tried a lot of things, but no luck.
Any help would be appreciated.
Thanks

You say you've tried a lot of things, so what exactly have you tried? You don't want answers you've already tried. — GAntoine, Feb 19 '16 at 14:43

score 0 · Answer 1 · answered Feb 19 '16 at 15:06

As long as you don't have nested sections:

$html = preg_replace('#<section.+id="footer-widget-wysija-2".+</section>#is', '', $html);

Brad Kent

Thanks for the answer. I just tried it and again no luck: I'm getting only the page background and the top menu. Anyway, I want to make it a little better, because there may be cases where the id is something like `footer-widget-wysija-3`. The format of the id should be `footer-widget-wysija-number`. – Vlad Barseghyan Feb 19 '16 at 15:25
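A likely reason for the result described in this comment: with the `s` flag, the greedy `.+` can run from the first `<section` on the page to the last `</section>`, so everything in between is removed. The following is only a sketch of a narrower pattern, assuming sections are not nested and the id is written with double quotes; the sample markup and the variable names `$pattern` and `$cleaned` are made up for illustration. It does not check the class attribute.

```php
<?php
// Hypothetical sample markup: two unrelated sections around the widget.
$html = '<section id="intro" class="hero">Intro</section>'
      . '<section id="footer-widget-wysija-2" class="widget footer-widget widget_wysija">Some html</section>'
      . '<section id="outro" class="hero">Outro</section>';

// [^>]* cannot leave the opening tag, \d+ accepts any number in the id,
// and the lazy .*? stops at the first closing </section>.
$pattern = '#<section\b[^>]*\bid="footer-widget-wysija-\d+"[^>]*>.*?</section>#is';

$cleaned = preg_replace($pattern, '', $html);

echo $cleaned, "\n";
```

This still breaks on nested sections or unusual attribute quoting, which is the usual argument for using a parser instead.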

score 0 · Accepted Answer · answered Feb 19 '16 at 15:12

The best way to work with HTML documents is to use a parser.

In these examples I will use the built-in DOMDocument.

First, initialize a DOMDocument and load the HTML string:

$dom = new DOMDocument();
libxml_use_internal_errors( True );
$dom->loadHTML( $html );
libxml_use_internal_errors( False );

I use ->loadHTML to load a string, but if your original HTML is in a file, you can directly use

$dom->loadHTMLFile( $yourFilePath );

To avoid annoying warnings about invalid HTML syntax,
I set libxml_use_internal_errors( True ).
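If you want to see what the parser complained about instead of discarding it, the collected errors can be read back. This is a minimal sketch, assuming you only want to print them; the variable names `$previous` and `$errors` are illustrative, and whether any errors are reported at all depends on the markup and the libxml version.

```php
<?php
// Hypothetical sample markup.
$html = '<section id="footer-widget-wysija-2">Some html</section>';

// Remember the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Fetch the errors collected during parsing, then empty the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    echo 'line ', $error->line, ': ', trim($error->message), "\n";
}
```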

Example 1: Delete all nodes with ‘section’ tag:

$nodes = $dom->getElementsByTagName( 'section' );
while( $nodes->length )
{
    $nodes->item(0)->parentNode->removeChild( $nodes->item(0) );
}

With ->getElementsByTagName( 'section' ) I get all of the document's nodes with the tag section; then, in the while loop, I delete each node. Note that I use while instead of foreach because the returned node list is live: if I have two section nodes, for example, then when I delete the first node the second one becomes the first, and a foreach loop will skip nodes. As an alternative, I can use a decrementing for loop.
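The decrementing for loop mentioned above could look roughly like this. It is a sketch with made-up sample markup; walking the list from the end means that removing a node never shifts the positions of the nodes still to be visited.

```php
<?php
// Hypothetical sample markup with two sections and one paragraph to keep.
$html = '<div><section>One</section><p>Keep me</p><section>Two</section></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);

$nodes = $dom->getElementsByTagName('section');

// Iterate backwards: the list is live, so only indexes above $i change.
for ($i = $nodes->length - 1; $i >= 0; $i--) {
    $node = $nodes->item($i);
    $node->parentNode->removeChild($node);
}

echo $dom->saveHTML();
```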

Example 2: Delete node by ID:

if( $node = $dom->getElementById( 'footer-widget-wysija-1' ) )
{
    $node->parentNode->removeChild( $node );
}

An ID is unique by definition, so ->getElementById() returns only one element: if it is found, I can delete it using ->removeChild().
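For the case raised in the question and comments (an id of the form `footer-widget-wysija-number` plus a specific word in the class), one possible approach is DOMXPath. This is a sketch under assumptions: the class word is taken to be `widget_wysija`, the sample markup is invented, and since XPath 1.0 has no regex support the numeric suffix is checked in PHP with preg_match.

```php
<?php
// Hypothetical sample markup: only the first section should be removed.
$html = '<div id="page">'
      . '<section id="footer-widget-wysija-2" class="widget footer-widget widget_wysija">Some html</section>'
      . '<section id="footer-widget-wysija-x" class="widget widget_wysija">Not numeric</section>'
      . '<section id="footer-widget-text-3" class="widget footer-widget">Other widget</section>'
      . '</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);

// Match sections whose id starts with the prefix and whose class list
// contains widget_wysija as a whole word (padding with spaces avoids
// matching substrings of other class names).
$query = '//section[starts-with(@id, "footer-widget-wysija-")'
       . ' and contains(concat(" ", normalize-space(@class), " "), " widget_wysija ")]';

// Copy the result into an array before removing nodes.
foreach (iterator_to_array($xpath->query($query)) as $node) {
    // Require the id suffix to be digits only.
    if (preg_match('/^footer-widget-wysija-\d+\z/', $node->getAttribute('id'))) {
        $node->parentNode->removeChild($node);
    }
}

echo $dom->saveHTML();
```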

Output HTML:

Finally, to output the resulting HTML, use:

echo $dom->saveHTML();
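Note that when the input is only a fragment, ->loadHTML() wraps it in a doctype and html/body elements, and ->saveHTML() without arguments prints those too. If you only want the fragment back, one option is to serialize the children of body individually. This is a sketch with invented sample markup; `$fragment` is an illustrative name.

```php
<?php
// Hypothetical fragment: one element to keep and the widget to remove.
$html = '<div id="keep">Keep</div>'
      . '<section id="footer-widget-wysija-2" class="widget">Some html</section>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);

if ($node = $dom->getElementById('footer-widget-wysija-2')) {
    $node->parentNode->removeChild($node);
}

// Serialize each child of <body> instead of the whole document,
// so the added doctype/html/body wrapper is left out.
$body = $dom->getElementsByTagName('body')->item(0);
$fragment = '';
foreach ($body->childNodes as $child) {
    $fragment .= $dom->saveHTML($child);
}

echo $fragment, "\n";
```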

I've just tried to solve this with regexp, as it should be 1-2 lines of code. Anyway, yes, your solution should work. — Vlad Barseghyan, Feb 19 '16 at 15:50
Read [this famous answer](http://stackoverflow.com/a/1732454/3294262) about parsing html with regex. Regex is **never** a good practice to parse HTML. If in your `section` in future there are complex html (or a nested `section`) your regex will fail. — fusion3k, Feb 19 '16 at 16:39
