I have a big string containing a lot of URLs. I need to replace the links that match:

<a href="../plugins/re_records/somefile.php?page=something&id=X">important_name</a>

(where X is any integer and important_name is any string) with:

<a href="/map/important_name">important_name</a>

I'm using preg_match_all() to match all URLs:

preg_match_all('/\/plugins\/re\_records\/somefile\.php\?page\=something\&id\=*(\d+)/', $bigString, $matches, PREG_OFFSET_CAPTURE);

The problem is that I don't understand how to take important_name from the hyperlink's visible text and make it part of the new URL once the URL has matched.

Is it a good idea to use preg_match_all()?

mickmackusa
Adoc

3 Answers

2

Don't use regex. Use DOMDocument, which is made specifically to parse HTML/XML documents.

Get all anchor elements, check the value of each href attribute, and change the attribute accordingly with the setAttribute() method.

Snippet:

<?php

libxml_use_internal_errors(true); // to disable warnings if HTML is not well formed 
$o = new DOMDocument();
$o->loadHTML('<a href="../plugins/re_records/somefile.php?page=something&id=45">important_name</a>');

foreach($o->getElementsByTagName('a') as $anchor_tag){
    $href = $anchor_tag->getAttribute('href');
    if(strpos($href,'/plugins/re_records/somefile.php?page=something&id=') !== false){
        $anchor_tag->setAttribute('href','/map/'.$anchor_tag->nodeValue);
    }
}

echo $o->saveHTML();

Demo: https://3v4l.org/5GPXA

nice_dev
  • Thanks for the answer! This solves my problem and I think it is the cleanest and most beautiful way, but unfortunately (and I know it's silly) this adds the <html> and <body> tags. Is there a way (without more code) to avoid that? Maybe some DOMDocument method? – Adoc Aug 01 '20 at 17:16
  • Well, you can use str_replace to trim off those. – nice_dev Aug 01 '20 at 17:29
  • Yes, I have been seeing that this has been a problem for a long time. I can use a lot of things to remove them. Thanks! – Adoc Aug 01 '20 at 17:41
0

If I understand you correctly, you are trying to get the matched important_name?

Then just add parentheses around it and you can get it in the $matches.

<?php
$s = '<a href="../plugins/re_records/somefile.php?page=something&id=123">important_name</a>';

preg_match_all('/\<a href\=\"\.\.\/plugins\/re\_records\/somefile\.php\?page\=something\&id\=*(\d+)\"\>(.*?)\<\/a\>/', $s, $matches, PREG_OFFSET_CAPTURE);

var_dump($matches[2][0][0]); // second capture group, first match: the link text
?>
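
If you do stay with regex, the captured link text can be reused directly in the replacement string, so the matching and the rewriting happen in one step. The following is only a minimal sketch using preg_replace(). It assumes the markup always has exactly this attribute order and quoting and that the link text contains no nested tags. $bigString and the second, non-matching sample link are placeholders made up for illustration.

```php
<?php
// Sample input; the second link is a hypothetical URL that must stay untouched.
$bigString = '<a href="../plugins/re_records/somefile.php?page=something&id=123">important_name</a>'
           . ' <a href="other.php?id=5">leave_me</a>';

// Using # as the delimiter means the slashes in the path need no escaping.
// \d+ matches the id; group 1 captures the link text up to the next tag.
$pattern = '#<a href="\.\./plugins/re_records/somefile\.php\?page=something&id=\d+">([^<]*)</a>#';

// $1 in the replacement refers to the captured link text.
$result = preg_replace($pattern, '<a href="/map/$1">$1</a>', $bigString);

echo $result;
```

Only the first link is rewritten. Any variation in the markup (extra attributes, single quotes, whitespace) would make the pattern miss, which is the usual weakness of regex here.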

Jack Song
0
  • Definitely get into the habit of parsing HTML with a legitimate DOM parser. Using regex will set you up for headaches eventually. When a DOM parser fails you, then consider using regex.

  • I prefer to filter the parsed document with XPath because the expressions can be very powerful and flexible.

  • To silence any warnings when loading your string into DOMDocument, call libxml_use_internal_errors(true);.

  • Use the LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED flags to omit the <DOCTYPE>, <HTML>, and <BODY> tags that you don't want/need.

  • starts-with() will do nicely since you are not trying to extract the id number from the end of the querystring.

  • Don't be put off by the encoded & (shown as &amp;) in the output. It's a good thing and part of a more modern standard.

Code: (Demo)

$html = <<<HTML
<div>
    <p> some text <a href="../plugins/re_records/somefile.php?page=something&id=345">find_me_1</a></p>
    <br>
    <a href="../plugins/re_records/somefile.php?page=something&id=99">find_me_2</a>
    <div>
        <div>
            <a href="example.com?page=something&id=55">don't even think about it!</a>
            <a href="../plugins/re_records/somefile.php?page=something&id=90210">find_me_3</a>
        </div>
    </div>
</div>
HTML;

$hrefStartsWith = '../plugins/re_records/somefile.php?page=something&id=';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//a[starts-with(@href, '$hrefStartsWith')]") as $a) {
    $a->setAttribute('href', '/map/' . $a->nodeValue);
}
echo $dom->saveHTML();

Output:

<div>
    <p> some text <a href="/map/find_me_1">find_me_1</a></p>
    <br>
    <a href="/map/find_me_2">find_me_2</a>
    <div>
        <div>
            <a href="example.com?page=something&amp;id=55">don't even think about it!</a>
            <a href="/map/find_me_3">find_me_3</a>
        </div>
    </div>
</div>
mickmackusa