I used PHP's DOMDocument class to extract all the <a> tags from over 3,000 posts and stored them in a links table. The existing_link column was filled using DOMDocument's C14N() function. The table looks like this:

id | existing_link | replacement_link

1 | <a class="class1" href="domain1.com" rel="nofollow">Domain1.com</a> | <a href="domain2.com">Domain2.com</a> 

My initial thought was to simply use Laravel's Str::replace() to find and replace the links using the table above. But C14N() did something I had not anticipated: it put each link's attributes in alphabetical order. That is, the link in my post exists as -

<a href="domain1.com" class="class1" rel="nofollow">Domain1.com</a>

but the C14N() function saved it with the attribute order changed (class -> href -> rel)! See the existing_link column in the table above.

As a result, I cannot use Laravel's Str::replace() to quickly replace the links. Even though they are technically the same links, they are not the same strings.
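To make the mismatch concrete, here is a minimal sketch in plain PHP (no Laravel) that loads the example link and compares the original string with the C14N() output. The variable names are only for illustration, and the comment about saveHTML() reflects typical behaviour rather than a guarantee.

```php
<?php
// The link exactly as it appears in the post body.
$html = '<a href="domain1.com" class="class1" rel="nofollow">Domain1.com</a>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_NOERROR);

// First (and only) <a> element in the parsed document.
$anchor = $doc->getElementsByTagName('a')->item(0);

// Canonical XML form: attributes come out sorted (class, href, rel).
$canonical = $anchor->C14N();

// HTML serialisation of the same node; this normally keeps the
// attribute order of the source, but that is not guaranteed here.
$asHtml = $doc->saveHTML($anchor);

var_dump($canonical === $html); // false, so a plain string replace finds nothing
echo $canonical, PHP_EOL;
echo $asHtml, PHP_EOL;
```

Because the comparison has to go through C14N() on both sides, the match needs to happen on DOM nodes rather than on the raw post body string.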

Each post in my DB can have multiple links that need replacing, based on the table I've prepared. My best attempt so far is as follows -

$new_links = DB::table('links')->get();

foreach ($new_links as $new_link)
{
    $post = Post::where('id', $new_link->id)->first();
    $post_body = $post->body;

    $domDocument = new \DOMDocument();
    $domDocument->loadHTML($post_body, LIBXML_NOERROR);

    // Pull the links in the post body
    $old_links = $domDocument->getElementsByTagName('a');

    foreach ($old_links as $old_link)
    {
        if ($old_link->C14N() == $new_link->existing_link)
        {
            // Perform the replacement. I can't figure out how to do this using DOMDocument.
        }
    }
}

How can I achieve the final replacement of links using DOMDocument? I am open to any other approach.
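One direction that might work, shown as a hedged sketch rather than a tested solution: parse the replacement HTML into a temporary document, import its <a> node into the post's document, and swap it in with replaceChild(). The function name replaceLinks() and the $map array (canonical old link => replacement HTML, which would be built from the links table) are hypothetical names introduced for this example. The sketch uses only ASCII content and does not handle character encoding.

```php
<?php
declare(strict_types=1);

/**
 * Replace <a> elements in an HTML fragment (hypothetical helper).
 *
 * @param string               $html Post body.
 * @param array<string,string> $map  C14N() form of the old link => replacement HTML.
 */
function replaceLinks(string $html, array $map): string
{
    if (trim($html) === '') {
        return $html; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    $doc->loadHTML($html, LIBXML_NOERROR);

    // getElementsByTagName() returns a live list, so copy it to an
    // array before replacing nodes while iterating.
    $anchors = iterator_to_array($doc->getElementsByTagName('a'));

    foreach ($anchors as $anchor) {
        $key = $anchor->C14N();
        if ($key === false || !isset($map[$key])) {
            continue; // not one of the links to replace
        }

        // Parse the replacement HTML in a scratch document.
        $tmp = new DOMDocument();
        $tmp->loadHTML($map[$key], LIBXML_NOERROR);
        $newAnchor = $tmp->getElementsByTagName('a')->item(0);
        if ($newAnchor === null) {
            continue;
        }

        // A node must belong to the target document before insertion.
        $imported = $doc->importNode($newAnchor, true);
        $anchor->parentNode->replaceChild($imported, $anchor);
    }

    // loadHTML() wraps fragments in <html><body>; return only the
    // body's children so the wrappers are not written back.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage with the example row from the table above.
$map = [
    '<a class="class1" href="domain1.com" rel="nofollow">Domain1.com</a>'
        => '<a href="domain2.com">Domain2.com</a>',
];
$postBody = '<p>Visit <a href="domain1.com" class="class1" rel="nofollow">Domain1.com</a> today.</p>';
$result = replaceLinks($postBody, $map);
echo $result, PHP_EOL;
```

Keying the map by the canonical string means each post is parsed once and every anchor is looked up directly, instead of re-parsing the post for every row in the table. Note that the serialised result can differ from the original body in whitespace or markup normalisation, so it would be worth diffing a few posts before saving anything back.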

halfer
TheBigK
  • I don't understand why you can't use str_replace if you need to replace the link in the body: `str_replace($old_link->C14N(), $new_link->existing_link, $post->body);` – Manuel Eduardo Romero Jan 12 '23 at 15:16
  • You can dump the result of C14N() to see which methods are available: `get_class_methods($old_link->C14N())` if it's an object. Maybe it has a method called ->getRel() or something similar. – Manuel Eduardo Romero Jan 12 '23 at 15:17
  • @ManuelEduardoRomero Because `str_replace()` cannot find the string in the text. The post body still holds the link in its original attribute order; only the `C14N()` output has the attributes sorted. – TheBigK Jan 13 '23 at 05:42

0 Answers