I used PHP's DOMDocument class to extract all the a tags from over 3000 posts, and used its C14N() function to fill the existing_link column. The links are collected in a database table as follows:
id | existing_link | replacement_link
1 | <a class="class1" href="domain1.com" rel="nofollow">Domain1.com</a> | <a href="domain2.com">Domain2.com</a>
My initial thought was to simply use Laravel's Str::replace() to find and replace the links using the table above. But C14N() did something I had not anticipated: it put each link's attributes in alphabetical order. That is, while the link in my post exists as:
<a href="domain1.com" class="class1" rel="nofollow">Domain1.com</a>
The C14N() function saved it with the attribute order changed (class -> href -> rel). See the existing_link column in the table above.
As a result, I cannot use Laravel's Str::replace() to quickly replace the links: even though they are technically the same links, they are not the same strings.
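The reordering can be reproduced in isolation. The snippet below is a minimal sketch, assuming a sample string that mirrors the link from my post; the variable names are only illustrative.

```php
<?php
// Sample markup (assumed for illustration), attributes in the order used in the post.
$html = '<a href="domain1.com" class="class1" rel="nofollow">Domain1.com</a>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_NOERROR);

// Grab the first <a> element and canonicalize it.
$anchor = $doc->getElementsByTagName('a')->item(0);
$canonical = $anchor->C14N();

// Canonical XML sorts attributes, so class now comes before href.
echo $canonical, PHP_EOL;

// The two strings no longer match, which is why a plain string replace fails.
var_dump($canonical === $html);
```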
Each post in my DB can have multiple links to be replaced, based on the table I've prepared. My best attempt so far is as follows:
$new_links = DB::table('links')->get();

foreach ($new_links as $new_link) {
    $post = Post::where('id', $new_link->id)->first();
    $post_body = $post->body;

    $domDocument = new \DOMDocument();
    $domDocument->loadHTML($post_body, LIBXML_NOERROR);

    // Pull the links in the post body
    $old_links = $domDocument->getElementsByTagName('a');

    foreach ($old_links as $old_link) {
        // Compare canonical forms, since that is what the table stores
        if ($old_link->C14N() == $new_link->existing_link) {
            // Perform the replacement. I can't figure out how to do this using DOMDocument.
        }
    }
}
How can I achieve the final replacement of links using DOMDocument? I am open to any other approach.
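One possible direction, as a minimal sketch rather than a definitive solution: compare each link's canonical form against the table, then swap the matching node with replaceChild(). The function names replaceLinks() and htmlToNode() and the $map array (existing_link => replacement_link, which would be built from the links table) are hypothetical. The sketch also assumes ASCII content; character encoding is not handled here.

```php
<?php
// Parse an HTML snippet and import its first node into $target.
// Hypothetical helper; returns null when the snippet is empty.
function htmlToNode(DOMDocument $target, string $html): ?DOMNode
{
    $tmp = new DOMDocument();
    $tmp->loadHTML('<div>' . $html . '</div>', LIBXML_NOERROR);
    $first = $tmp->getElementsByTagName('div')->item(0)->firstChild;

    return $first ? $target->importNode($first, true) : null;
}

// Replace every <a> in $body whose C14N() string is a key of $map.
function replaceLinks(string $body, array $map): string
{
    $doc = new DOMDocument();
    // Wrap the body in a div so its contents can be serialized back
    // without the <html> and <body> elements that loadHTML() adds.
    $doc->loadHTML('<div>' . $body . '</div>', LIBXML_NOERROR);
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    // Snapshot the live node list first; replacing nodes while
    // iterating the live list could skip elements.
    $anchors = iterator_to_array($doc->getElementsByTagName('a'));

    foreach ($anchors as $old) {
        $canonical = $old->C14N();
        if (!isset($map[$canonical])) {
            continue;
        }
        $new = htmlToNode($doc, $map[$canonical]);
        if ($new !== null) {
            $old->parentNode->replaceChild($new, $old);
        }
    }

    // Serialize only the wrapper's children, i.e. the original body.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

// Usage with sample data (assumed for illustration).
$map = [
    '<a class="class1" href="domain1.com" rel="nofollow">Domain1.com</a>'
        => '<a href="domain2.com">Domain2.com</a>',
];
$body = 'See <a href="domain1.com" class="class1" rel="nofollow">Domain1.com</a>'
      . ' and <a href="other.com">Other</a>.';

$result = replaceLinks($body, $map);
echo $result, PHP_EOL;
```

Because the comparison happens on the canonical form, the attribute order in the post no longer matters. Each post is parsed once and all of its links are handled in that single pass.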