Using DOMDocument(), I'm replacing links in a $message and adding some things, like [@MERGEID]. When I save the changes with $dom_document->saveHTML(), the links get "sort of" URL-encoded: [@MERGEID] becomes %5B@MERGEID%5D.
Later in my code I need to replace [@MERGEID] with an ID, so I search for urlencode('[@MERGEID]'). However, urlencode() changes the commercial at symbol (@) to %40, while saveHTML() has left it alone. So there is no match: '%5B@MERGEID%5D' != '%5B%40MERGEID%5D'.
Now, I know I can run str_replace('%40', '@', urlencode('[@MERGEID]')) to get what I need to locate the merge variable in $message.
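As a minimal sketch of that workaround, the snippet below compares the built-in encoders with a helper that restores the at sign. The helper name dom_style_token() and the ID value 12345 are hypothetical, and the href string is simply the form saveHTML() was observed to produce above, not something the snippet generates.

```php
<?php
// Hypothetical helper (not from the original code): builds the search token
// in the form observed in the saveHTML() output, i.e. brackets escaped, "@" kept.
function dom_style_token(string $token): string
{
    // rawurlencode() escapes "[", "]" and "@"; put the "@" back afterwards.
    return str_replace('%40', '@', rawurlencode($token));
}

$token = '[@MERGEID]';

// Both built-in encoders escape the at sign, so neither matches the saved HTML.
echo urlencode($token), PHP_EOL;       // %5B%40MERGEID%5D
echo rawurlencode($token), PHP_EOL;    // %5B%40MERGEID%5D
echo dom_style_token($token), PHP_EOL; // %5B@MERGEID%5D

// Link as it appeared after saveHTML(); 12345 is a made-up ID for illustration.
$href = 'http://www.example.com/click/%5B@MERGEID%5D?url=http://www.google.com?ref=abc';
echo str_replace(dom_style_token($token), '12345', $href), PHP_EOL;
// http://www.example.com/click/12345?url=http://www.google.com?ref=abc
```

For this token urlencode() and rawurlencode() give the same result, so either can sit inside the helper; they only differ in how they treat spaces and, on older PHP versions, the tilde.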
My question is: which RFC is DOMDocument following, and why does its encoding differ from urlencode() or even rawurlencode()? Is there anything I can do about that to avoid the str_replace()?
Demo code:
$message = '<a href="http://www.google.com?ref=abc" data-tag="thebottomlink">Google</a>';
$dom_document = new \DOMDocument();
libxml_use_internal_errors(true); // Suppress content errors
$dom_document->loadHTML(mb_convert_encoding($message, 'HTML-ENTITIES', 'UTF-8'));
$elements = $dom_document->getElementsByTagName('a');
foreach ($elements as $element) {
    $link = $element->getAttribute('href'); // http://www.google.com?ref=abc
    $tag = $element->getAttribute('data-tag'); // thebottomlink
    if ($link) {
        $newlink = 'http://www.example.com/click/[@MERGEID]?url=' . $link;
        if ($tag) {
            $newlink .= '&tag=' . $tag;
        }
        $element->setAttribute('href', $newlink);
    }
}
$message = $dom_document->saveHTML();
$urlencodedmerge = urlencode('[@MERGEID]');
die($message . ' and url encoded version: ' . $urlencodedmerge);
// <a data-tag="thebottomlink" href="http://www.example.com/click/%5B@MERGEID%5D?url=http://www.google.com?ref=abc&tag=thebottomlink">Google</a> and url encoded version: %5B%40MERGEID%5D
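One possible way to make the later replacement step less sensitive to how the token was encoded is to match every form it might take. This is only a sketch: replace_merge_token() and the value 42 are hypothetical names chosen for illustration, and it assumes the token is non-empty and the ID is already URL-safe.

```php
<?php
// Hypothetical helper: replaces a merge token whether it appears raw,
// in the form seen in the saveHTML() output ("@" kept), or fully percent-encoded.
function replace_merge_token(string $html, string $token, string $value): string
{
    $encoded = rawurlencode($token);
    return strtr($html, [
        $token => $value,                                  // [@MERGEID]
        str_replace('%40', '@', $encoded) => $value,       // %5B@MERGEID%5D
        $encoded => $value,                                // %5B%40MERGEID%5D
    ]);
}

$html = '<a href="http://www.example.com/click/%5B@MERGEID%5D">A</a> '
      . '<a href="/click/%5B%40MERGEID%5D">B</a> [@MERGEID]';

echo replace_merge_token($html, '[@MERGEID]', '42'), PHP_EOL;
// <a href="http://www.example.com/click/42">A</a> <a href="/click/42">B</a> 42
```

Passing an array to strtr() replaces all three forms in a single pass, so the result does not depend on which encoder touched the link.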