Possible Duplicate:
How to parse and process HTML/XML with PHP?

I want to check whether a page's HTML source contains a backlink to my site (see the code below), but I only want to match anchor tags that do not have a rel='nofollow' attribute.

Example of a link that should not match:

<a href='http://domain.com/abd/ff/' rel='nofollow'>

Regex:

if(preg_match("/<a(.*)href=[\"']".$match_pattern."(\/?)[\"'](.*)>(.*)<\/a>/", $part)){...}

Function:

function check_back_link($remote_url, $your_link) {
  $match_pattern = preg_quote(rtrim($your_link, "/"), "/"); 
  $found = false;
  if($handle = @fopen($remote_url, "r")){
    while(!feof($handle)){
      $part = fread($handle, 1024);
      if(preg_match("/<a(.*)href=[\"']".$match_pattern."(\/?)[\"'](.*)>(.*)<\/a>/", $part)){
        $found = true;
        break;
      }
    }
    fclose($handle);
  }
  return $found;
}
Navneet Singh
  • Use an XHTML parser. Regex is not well suited for this. – user229044 Jan 06 '13 at 16:06
  • Thanks for your advice, but it's not too complicated a task; I just need to parse the anchor tag. – Navneet Singh Jan 06 '13 at 16:17
  • So, you're saying you need to *parse HTML*, but you don't want to use an HTML parser? Regex is not the best tool for this particular job, so why do you insist on using it? – user229044 Jan 06 '13 at 16:18
  • [Please see this](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) and [this](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-xml-with-php?lq=1) – Leigh Jan 06 '13 at 16:51
  • @NavneetSingh: You say "it's not too complicated task", but what we are telling you is that it **is** too complicated of a task to do with regular expressions. All of us saying "Don't use regular expressions to parse HTML" aren't just saying it to bug you. We're saying it because we know what we're talking about. See http://htmlparsing.com/regexes for more about why regexes can't do the job reliably. – Andy Lester Jan 06 '13 at 18:05
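
For reference, the parser-based approach suggested in the comments could look roughly like the sketch below. It is only an illustration, not a tested answer: the function name `has_followed_backlink` is hypothetical, it assumes PHP's bundled DOM extension is available, and it assumes the link should match on the exact href (ignoring a trailing slash, as the original function does).

```php
<?php
// Hypothetical helper (not from the question): returns true when $html
// contains an <a> whose href equals $your_link (ignoring a trailing slash)
// and whose rel attribute does not include the "nofollow" token.
function has_followed_backlink(string $html, string $your_link): bool
{
    // An empty document cannot contain a link; loadHTML() also rejects it.
    if (trim($html) === '') {
        return false;
    }

    $target = rtrim($your_link, '/');

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; buffer parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if (rtrim($anchor->getAttribute('href'), '/') !== $target) {
            continue;
        }
        // rel is a space-separated token list, e.g. rel="external nofollow".
        $rel = strtolower(trim($anchor->getAttribute('rel')));
        $rel_tokens = preg_split('/\s+/', $rel, -1, PREG_SPLIT_NO_EMPTY);
        if (!in_array('nofollow', $rel_tokens, true)) {
            return true;
        }
    }

    return false;
}
```

To use it, the whole page would need to be loaded first, for example with `file_get_contents($remote_url)`, rather than read in 1024-byte chunks: an anchor tag that straddles a chunk boundary would otherwise never be seen in one piece.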