
Some of my content contains the same link again and again, so I want to remove all duplicate links and keep only a single one. Does anyone have an idea how to do this?

Here is my code, which removes all links:

function anchor_remover($page) {
    $filter_text = preg_replace("|<a *href=\"(.*)\">(.*)</a>|","\\2",$page); 
    return $filter_text; 
}

add_filter('the_content', 'anchor_remover');

Basically, I need this for WordPress: filter the content so that each duplicated link appears only once.

George Brighton
Bheem
  • I have no idea what you are talking about :( Please be more specific in your question. You should at least provide some relevant information, such as where you store those links. – Nemoden May 12 '11 at 09:36
  • I want to remove all identical links from a page, but one copy of each link should remain. I hope that makes sense. – Bheem May 12 '11 at 09:55
  • Can you provide a code sample or a sample entry? – Narcis Radu May 12 '11 at 09:56
  • Do you want to keep the HTML nodes, or do you want to process the links just in PHP? That is the question. If you don't provide more information, the simple answer to your problem would be `array_unique`. – Fidi May 12 '11 at 10:20
  • **missing** preg_replace("|(.*)|","\\2",$page); – Bheem May 12 '11 at 10:29
  • Do you have an array of links? If so, try `array_unique`. – xkeshav May 12 '11 at 10:58

1 Answer

Using preg_replace_callback:

<?php
/*
 * vim: ts=4 sw=4 fdm=marker noet
 */
$page = file_get_contents('./dupes.html');

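// Callback: $matches[0] is the whole <a> tag, [1] the href, [2] the link text.
function do_strip_link($matches)
{
        // Remembers every href seen so far (persists across calls).
        static $seen = array();

        if( in_array($matches[1], $seen) )
        {
                // Duplicate: keep only the link text.
                return $matches[2];
        }
        else
        {
                // First occurrence: remember it and keep the link.
                $seen[] = $matches[1];
                return $matches[0];
        }
}
function strip_dupe_links($page)
{
        return preg_replace_callback(
                '|<a\s+href="(.*?)">(.*?)</a>|',
                'do_strip_link', // callback name must be a string
                $page
        );
}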

$page = strip_dupe_links($page);
echo $page;

Input:

<html>
        <head><title>Hi!</title></head>
        <body>
                <a href="foo.html">foo</a>
                <a href="foo.html">foo</a>
                <a href="foo.html">foo</a>
                <a href="foo.html">foo</a>
                <a href="foo.html">foo</a>
                <a href="foo.html">foo</a>
                <a href="foo.html">foo</a>
                <a href="foo.html">foo</a>
                <a href="foo.html">foo</a>
                <a href="foo.html">foo</a>
                <a href="bar.html">bar</a>
        </body>
</html>

Output:

<html>
        <head><title>Hi!</title></head>
        <body>
                <a href="foo.html">foo</a>
                foo
                foo
                foo
                foo
                foo
                foo
                foo
                foo
                foo
                <a href="bar.html">bar</a>
        </body>
</html>
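
Since the question is about WordPress, here is a rough sketch of how the same idea could be hooked into `the_content`. It is only an illustration under a few assumptions: the function name `strip_dupe_links_per_call` is made up, the pattern is loosened to tolerate extra attributes and single quotes, and the list of seen hrefs is kept per call instead of in a `static` variable. It is still a regex, so unusual markup (nested tags with `>` inside attributes, unquoted hrefs) is not handled.

```php
<?php
/**
 * Replace every repeated link (same href) in $html with its plain text,
 * keeping the first occurrence. Hypothetical helper, not part of WordPress.
 */
function strip_dupe_links_per_call(string $html): string
{
    $seen = array(); // hrefs already encountered during this call only

    $result = preg_replace_callback(
        // <a ...href="..."...>text</a>; group 2 = href, group 3 = link text.
        '~<a\s+(?:[^>]*?\s)?href\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>~is',
        function (array $m) use (&$seen) {
            $href = $m[2];
            if (isset($seen[$href])) {
                return $m[3];    // duplicate: keep only the link text
            }
            $seen[$href] = true; // first occurrence: keep the whole tag
            return $m[0];
        },
        $html
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result === null ? $html : $result;
}

// Only when running inside WordPress: register as a content filter.
if (function_exists('add_filter')) {
    add_filter('the_content', 'strip_dupe_links_per_call');
}
```

The reason for the local `$seen` array: a `static` variable keeps its contents for the whole request, so if the filter runs for several posts on one page, a link that appeared in the first post would be stripped from every later post. Whether that is wanted depends on the site, so treat this as one possible choice.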
Mel