I know several similar questions have already been asked, but I can't fix this issue with regex.

In any post, I have headers like:

<h1><a href="#hello">link text</a>Title with header tag </h1>
<h2><a href="http://so.com">link text</a>Title with header tag</h2>

I am trying to remove the anchor tag (both the link and its text) from the header tag, while keeping the header's title text.

Here is my regex, which also removes my title text:

(<h[1-2].*?>)<a.*?>

And

(<h([1-6])[^>]*>)\s?<a>(.*)?<\/a>\s?(<\/h\2>)

Here is URL

My final result should look like this:

<h1>Title with header tag </h1>
<h2>Title with header tag</h2>
Дтдця
  • Haven't posted [this link](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) for a while... (*hint: don't attempt to parse HTML with RegEx*) – CD001 Sep 08 '17 at 15:48
  • @CD001 I haven't clicked the link yet, although I know what it is, and already upvoted it... – Right leg Sep 08 '17 at 15:59
  • Are you trying to remove the anchor tag only on specific header tags? Just `<h1>` and `<h2>`? – Aaron K. Sep 08 '17 at 16:01
  • @AaronK. Yes, I would like to remove the anchor tag from the `<h1>` or `<h2>` tag. – Дтдця Sep 08 '17 at 16:54
  • My final result will remove the anchor tag with the link, like `<h1>Title with header tag</h1>`. – Дтдця Sep 08 '17 at 16:59

2 Answers

The DOM approach loads your string into a DOMDocument object and uses a DOMXPath object to find the links inside the headers, which are then removed.

<?php

$html = <<<DATA
<body>
    <h1><a href="#hello">link text</a>Title with header tag </h1>
    <h2><a href="http://so.com">link text</a>Title with header tag</h2>
</body>
DATA;

// Parse the markup without adding an implied <html> wrapper or a doctype.
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

// Select every <a> whose direct parent is an <h1> or an <h2>.
$links = $xpath->query("//a[parent::h1|parent::h2]");
foreach ($links as $link) {
    // Removing the node also removes the link text it contains.
    $link->parentNode->removeChild($link);
}

echo $dom->saveHTML();

?>

Don't use regular expressions for everything.
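
If you need this in more than one place, the same idea can be wrapped in a function. This is only a sketch: the name `removeHeaderAnchors` and the wrapping `<div>` are assumptions for illustration, not part of the code above. The wrapper gives the fragment a single root element, which `LIBXML_HTML_NOIMPLIED` needs when the input has several top-level nodes.

```php
<?php

// Hypothetical helper: removes <a> elements that are direct children
// of the given header tags and returns the remaining markup.
function removeHeaderAnchors(string $html, array $tags = ['h1', 'h2']): string
{
    $dom = new DOMDocument();

    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Assumption: wrap the fragment in one <div> so it has a single root.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Build a predicate such as "parent::h1 or parent::h2".
    $conditions = [];
    foreach ($tags as $tag) {
        $conditions[] = 'parent::' . $tag;
    }
    $xpath = new DOMXPath($dom);
    $links = $xpath->query('//a[' . implode(' or ', $conditions) . ']');

    foreach ($links as $link) {
        $link->parentNode->removeChild($link);
    }

    // Serialize the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = '<h1><a href="#hello">link text</a>Title with header tag </h1>'
      . '<h2><a href="http://so.com">link text</a>Title with header tag</h2>';

$result = removeHeaderAnchors($html);
echo $result;
```

Passing `['h1', 'h2', 'h3']` as the second argument would extend it to more header levels.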

Jan

Here is a regex that will get you that particular "hello":

(?<=<h[12]><a href="#)[^"]*

    <h[12]><a                  Search for a <a> tag inside of a <h1> or a <h2>...
              href="#          ... that has a href attribute starting with a hash
(?<=                 )         If a string matches that...
                      [^"]*    ... then take all that follows until the closing quotes
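
To try that lookbehind from PHP, a minimal sketch could look like the following. The `~` delimiter and the variable names are just choices for this example.

```php
<?php

$html = '<h1><a href="#hello">link text</a>Title with header tag </h1>';

// "~" is used as the delimiter so the pattern needs no extra escaping.
$pattern = '~(?<=<h[12]><a href="#)[^"]*~';

// preg_match() returns 1 when the pattern matches; $m[0] holds the matched text.
if (preg_match($pattern, $html, $m)) {
    echo $m[0], PHP_EOL; // prints "hello"
}
```

The lookbehind itself is not part of the match, so only the text after the hash is captured.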

To delete the anchor, the following regex matches the whole href attribute:

(?<=<h[12]><a) *href="#.*"    

    <h[12]><a                 Search a <a> tag inside of a header tag
(?<=         )                If a string matches that...
               *              ... take all the spaces...
                href="#       ... then the href attribute, the opening quotes, and the hash...
                       .*     ... then whatever...
                         "    ... until the closing quotes  

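One thing to watch for: `.*` is greedy, so if the tag carries a second quoted attribute the match runs up to the last quote on the line. The sketch below compares the pattern above with a stricter variant that uses `[^"]*`. The variant and the `class="x"` sample are assumptions for illustration, not part of the original post.

```php
<?php

// Hypothetical input with an extra attribute after href.
$html = '<h1><a href="#hello" class="x">link text</a>Title</h1>';

// Pattern as given above: ".*" runs to the last double quote on the line.
$greedy = '~(?<=<h[12]><a) *href="#.*"~';

// Assumed stricter variant: stop at the first closing quote.
$strict = '~(?<=<h[12]><a) *href="#[^"]*"~';

$strippedGreedy = preg_replace($greedy, '', $html);
$strippedStrict = preg_replace($strict, '', $html);

echo $strippedGreedy, PHP_EOL; // <h1><a>link text</a>Title</h1>
echo $strippedStrict, PHP_EOL; // <h1><a class="x">link text</a>Title</h1>
```
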
You can remove the complete opening <a> tag with the following regex (the negated classes keep the match from running past the end of the tag):

(?<=<h[12]>)<a *href="#[^"]*"[^>]*>

Then remove the closing tag with this straightforward regex:

</a>
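
Keep in mind that stripping only the opening and closing tags leaves the link text in the header, and a pattern that requires `href="#` will not touch the `http://so.com` link. If the goal is the output shown in the question, a single-pass alternative might look like the sketch below. The pattern is an assumption for illustration, not one of the regexes above, and it shares the usual fragility of regex on HTML (nested tags, a `>` inside attribute values, and so on).

```php
<?php

$html = <<<DATA
<h1><a href="#hello">link text</a>Title with header tag </h1>
<h2><a href="http://so.com">link text</a>Title with header tag</h2>
DATA;

// Capture the opening <h1>/<h2> tag, then match an <a> element that
// follows it directly, including its text, up to the first </a>.
$pattern = '~(<h[12][^>]*>)\s*<a\b[^>]*>.*?</a>~s';

// Keep only the captured header tag; the anchor and its text are dropped.
$clean = preg_replace($pattern, '$1', $html);

echo $clean, PHP_EOL;
// <h1>Title with header tag </h1>
// <h2>Title with header tag</h2>
```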

Here is a link to phpliveregex.com where you can check the replacement result.

Right leg