
I'm trying to match a section tag whose class contains a specific word and which also has an id, and possibly some other attributes.

<section id="footer-widget-wysija-2" class="widget footer-widget widget_wysija">Some html</section>

I want to remove this section from the HTML before rendering it.

I've tried a lot of things, but no luck.
Any help will be appreciated.
Thanks

  • You say you've tried a lot of things, so what exactly have you tried? You don't want answers you've already tried. – GAntoine Feb 19 '16 at 14:43

2 Answers


As long as you don't have nested sections:

$html = preg_replace('#<section\b[^>]*\sid="footer-widget-wysija-2"[^>]*>.*?</section>#is', '', $html);
Brad Kent
  • Thanks for the answer. I just tried it and again no luck: I get only the page background and the top menu. Anyway, I'd like to make it a bit more general, because the id may also be something like `footer-widget-wysija-3`. The id format should be `footer-widget-wysija-number`. – Vlad Barseghyan Feb 19 '16 at 15:25
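
With the `s` flag, a greedy `.+` can run from the first `<section` on the page to the last `</section>`, which would be consistent with most of the page disappearing. A minimal sketch of a tighter pattern that also accepts any numeric suffix in the id, under the same no-nested-sections assumption; the sample `$html` and the ids `top` and `bottom` are made up for illustration:

```php
<?php
// Hypothetical sample markup; only the middle section should be removed.
$html = '<section id="top" class="widget">Top menu</section>'
      . '<section id="footer-widget-wysija-3" class="widget footer-widget widget_wysija">Some html</section>'
      . '<section id="bottom" class="widget">Footer</section>';

// [^>]* keeps the attribute match inside a single opening tag,
// \d+ accepts any numeric suffix, and the lazy .*? stops at the
// first closing </section> instead of the last one.
$pattern = '#<section\b[^>]*\sid="footer-widget-wysija-\d+"[^>]*>.*?</section>#is';

$clean = preg_replace($pattern, '', $html);

echo $clean, PHP_EOL;
```

This still breaks as soon as a matching section contains another section, so it is only a stopgap compared with a real parser.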

The best way to work with HTML documents is to use a parser.

In these examples I will use the built-in DOMDocument.

First of all, you have to initialize DOMDocument and load the HTML string:

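$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them
libxml_use_internal_errors( true );
$dom->loadHTML( $html );
// Discard the collected warnings and restore normal error reporting
libxml_clear_errors();
libxml_use_internal_errors( false );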

I use ->loadHTML to load a string, but if your original HTML is in a file, you can load it directly with

$dom->loadHTMLFile( $yourFilePath ); 

To avoid annoying warnings about invalid HTML syntax,
I call libxml_use_internal_errors( true ) before loading the HTML.

Example 1: Delete all nodes with ‘section’ tag:

$nodes = $dom->getElementsByTagName( 'section' );
while( $nodes->length )
{
    $nodes->item(0)->parentNode->removeChild( $nodes->item(0) );
}

With ->getElementsByTagName( 'section' ) I get all of the document's nodes with the tag section, then, in the while loop, I delete each node. Note that I use while instead of foreach because the returned node list is live: if I have two section nodes, for example, and I delete the first one, the second becomes the first, so a foreach loop would skip nodes. As an alternative, I can use a decrementing for loop.
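
The decrementing loop mentioned above could look like this. It is a minimal sketch, with a made-up `$html` sample standing in for the real page:

```php
<?php
// Hypothetical sample document: three sections and one paragraph to keep.
$html = '<html><body><section>A</section><p>keep</p>'
      . '<section>B</section><section>C</section></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings quiet while loading
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

$nodes = $dom->getElementsByTagName('section');

// Walk the live list from the end: removing item $i does not shift
// the positions of the items still to be visited (0 .. $i - 1).
for ($i = $nodes->length - 1; $i >= 0; $i--) {
    $node = $nodes->item($i);
    $node->parentNode->removeChild($node);
}

$remaining  = $dom->getElementsByTagName('section')->length;
$paragraphs = $dom->getElementsByTagName('p')->length;
```

Going backwards also removes nested sections before their parents, since descendants come later in document order.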

Example 2: Delete node by ID:

if( $node = $dom->getElementById( 'footer-widget-wysija-2' ) )
{
    $node->parentNode->removeChild( $node );
}

An ID is unique by definition, so ->getElementById() returns at most one element: if it is found, I can delete it using ->removeChild().
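
When the numeric suffix of the id is not known in advance (ids of the form footer-widget-wysija-number) and the class should contain a given word, DOMXPath can select the candidates. This is a sketch under assumptions: the sample markup, the id `footer-widget-text-1` and the choice of `widget_wysija` as the class word are illustrative, not taken from the real page.

```php
<?php
// Hypothetical sample markup: two numbered widget sections and one unrelated section.
$html = '<html><body>'
      . '<section id="footer-widget-wysija-2" class="widget footer-widget widget_wysija">Some html</section>'
      . '<section id="footer-widget-wysija-13" class="widget widget_wysija">More html</section>'
      . '<section id="footer-widget-text-1" class="widget footer-widget">Keep me</section>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings quiet while loading
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath  = new DOMXPath($dom);
$prefix = 'footer-widget-wysija-';

// Candidates: section elements whose id starts with the prefix and whose
// class list contains the whole word "widget_wysija".
$query = "//section[starts-with(@id, '$prefix')"
       . " and contains(concat(' ', normalize-space(@class), ' '), ' widget_wysija ')]";

// Copy the matches into an array first so removal cannot disturb iteration.
foreach (iterator_to_array($xpath->query($query)) as $node) {
    // XPath 1.0 has no regex support, so the numeric suffix is checked in PHP.
    $suffix = substr($node->getAttribute('id'), strlen($prefix));
    if (preg_match('/^\d+$/D', $suffix)) {
        $node->parentNode->removeChild($node);
    }
}

echo $dom->saveHTML();
```

The concat/normalize-space idiom matches the class as a whole word, so a class such as `widget_wysija_extra` would not be selected.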

Output HTML:

Finally, to output the resulting HTML, use

echo $dom->saveHTML();
fusion3k
  • I was just trying to solve this with a regexp, since it should be 1-2 lines of code. Anyway, yes, your solution should work. – Vlad Barseghyan Feb 19 '16 at 15:50
  • Read [this famous answer](http://stackoverflow.com/a/1732454/3294262) about parsing HTML with regex. Regex is **never** a good practice for parsing HTML. If your `section` ever contains complex HTML (or a nested `section`), your regex will fail. – fusion3k Feb 19 '16 at 16:39