I want to find all links in a string of HTML that do not have a target attribute, so that one can be added.

Here is some code that captures the attributes. I could search the output to see whether a target is present, but is there a simpler way to detect whether a link has a target attribute or not?

$content = '<p>This is some <a href="http://www.google.com">sample text</a> with
<a href="htttp://bing.com" target="_blank" class="test">links</a>.</p>';

preg_match_all('/<a([^>]*)href="([^"]*)"([^>]*)>([^<]*)<\/a>/', $content, $matches);

print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] => <a href="http://www.google.com">sample text</a>
            [1] => <a href="htttp://bing.com" target="_blank" class="test">links</a>
        )

    [1] => Array
        (
            [0] =>  
            [1] =>  
        )

    [2] => Array
        (
            [0] => http://www.google.com
            [1] => htttp://bing.com
        )

    [3] => Array
        (
            [0] => 
            [1] =>  target="_blank" class="test"
        )

    [4] => Array
        (
            [0] => sample text
            [1] => links
        )

)
halfer
Ben Sinclair

3 Answers

Instead of regex, another way to tackle this problem is the PHP DOM extension, which lets you operate on XML and HTML documents through the DOM API. Here is an example:

$content = '<p>This is some <a href="http://www.google.com">sample text</a> 
with <a href="htttp://bing.com" target="_blank" class="test">links</a>.</p>'; 

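$doc = new DOMDocument();
$doc->loadHTML($content);
$links = $doc->getElementsByTagName('a');
foreach ($links as $item) {
    // Add a target only to links that do not already have one.
    if (!$item->hasAttribute('target')) {
        $item->setAttribute('target', '_blank');
    }
}
// Note: saveHTML() returns a full document (doctype, <html>, <body>).
$content = $doc->saveHTML();
echo $content;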

This is better than using an intricate regex, which is hard to maintain and debug.

Hope it helps. Good luck!
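
Because loadHTML() wraps a fragment in a doctype, <html> and <body>, it can help to get the fragment back without that wrapper. The following is only a minimal sketch of one way to do it, not part of the original answer: the function name addMissingTargets and the temporary <div> wrapper are assumptions made for illustration, and it assumes the input is ASCII or otherwise safe for libxml's default encoding handling.

```php
<?php
// Hypothetical helper (name is illustrative, not from the original answer).
function addMissingTargets(string $html, string $target = '_blank'): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about imperfect markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    // Wrap the fragment in one <div> so there is a single root element, and
    // ask libxml not to add a doctype or implied <html>/<body> elements.
    $doc->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $link) {
        // Leave links alone if they already declare a target.
        if (!$link->hasAttribute('target')) {
            $link->setAttribute('target', $target);
        }
    }

    // Serialize only the children of the temporary wrapper <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling addMissingTargets($content) on the sample string from the question should add target="_blank" to the Google link only and leave the Bing link's existing attributes untouched.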

Mihai Crăiță

When I addressed a similar problem, I solved the issue in two steps:

  1. Search for all the anchor tags in the HTML document (like you did).

  2. For each anchor found, apply a second regexp that lists all of its attributes.

Then it was easy to discover which ones don't specify the target attribute. A useful regular expression to start from for step 2 is

(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?

which I've found here
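
The two steps above could be sketched in PHP roughly as follows. This is an illustration under stated assumptions, not the code I used: the function name anchorsWithoutTarget is hypothetical, and the attribute pattern is a simplified stand-in for the expression quoted above. It will miss tags whose attribute values contain a ">" character.

```php
<?php
// Hypothetical helper: returns the opening <a ...> tags that lack a target.
function anchorsWithoutTarget(string $html): array
{
    $missing = [];

    // Step 1: capture every opening anchor tag and its raw attribute string.
    preg_match_all('/<a\b([^>]*)>/i', $html, $tags, PREG_SET_ORDER);

    foreach ($tags as $tag) {
        // Step 2: list the attribute names found in that attribute string
        // (simplified pattern: name=value with quoted or unquoted values).
        preg_match_all(
            '/([^\s=\/]+)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/',
            $tag[1],
            $attrs
        );
        $names = array_map('strtolower', $attrs[1]);

        if (!in_array('target', $names, true)) {
            $missing[] = $tag[0];
        }
    }
    return $missing;
}
```

Because the check compares whole attribute names, a URL or class value that merely contains the word "target" does not count as a target attribute.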

FabioDch

I am not sure whether PHP supports it, but this regexp matches an a element whose opening tag does not contain target:

 <a ((?!target)[^>])+?>

Found solution/explanation here https://stackoverflow.com/a/406408/1692632
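
For what it is worth, PHP's preg_* functions use PCRE, which does support negative lookahead, so the pattern can be tried with preg_replace. The snippet below is a rough sketch rather than a tested production solution: the function name addTargetWithLookahead is made up for illustration, and the pattern is slightly adapted (a capture group and \s after "<a"). Note that it skips any tag containing the text "target" anywhere inside it, including inside a URL.

```php
<?php
// Hypothetical helper built around the negative-lookahead pattern.
function addTargetWithLookahead(string $html): string
{
    // Match an opening <a ...> tag in which no position begins the text
    // "target"; the attribute string is captured as group 1.
    $pattern = '/<a\s((?:(?!target)[^>])+?)>/i';

    // Re-emit the tag with target="_blank" appended to its attributes.
    return preg_replace($pattern, '<a $1 target="_blank">', $html);
}

$content = '<p>This is some <a href="http://www.google.com">sample text</a> with
<a href="htttp://bing.com" target="_blank" class="test">links</a>.</p>';

echo addTargetWithLookahead($content);
```

Only the first link in the sample should gain the attribute; the second already contains target and is left unchanged.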

Darka