I'm using PHPCrawl (http://phpcrawl.cuab.de) as a web crawler for one of my projects, and it's working fine so far, except that I don't know how to exclude or skip links that contain a specific name.

I already use a rule to ignore specific file types:

$crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png|css|js|pdf|swf|ico)$# i");

See http://phpcrawl.cuab.de/classreferences/PHPCrawler/overview.html

But how can I add a filter for names that appear within a link?

E.g. ignore links that include %feed% or %imprint%, etc.

Oliver
  • `addURLFilterRule("#feed|imprint#i");` maybe. It appears to just be a regex. Not sure what the `%` are for, but if needed: `addURLFilterRule("#%(feed|imprint)%#i");` – AbraCadaver Jun 05 '17 at 19:35
  • Does it really work? When I print out all the links with links_found, I get an unfiltered array. Example: `addURLFilterRule("#\.(css)$# i");` – melkawakibi Jul 14 '17 at 08:00
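
One way to sanity-check the pattern suggested in the first comment, independently of PHPCrawl, is to run it through `preg_match` against a few sample URLs. The sketch below assumes that `addURLFilterRule` treats its argument as an ordinary PCRE pattern tested against the whole URL, which is how the class reference reads; the sample URLs are made up for illustration.

```php
<?php
// Hypothetical sample URLs, invented only to exercise the pattern.
$urls = [
    'http://example.com/blog/feed/',
    'http://example.com/imprint.html',
    'http://example.com/products/42',
    'http://example.com/FEED.xml',
];

// Pattern from the first comment: matches "feed" or "imprint"
// anywhere in the URL, case-insensitively (the "i" modifier).
$pattern = '#feed|imprint#i';

// Keep only the URLs the pattern does NOT match, i.e. the ones
// that would remain if matching URLs were filtered out.
$kept = array_values(array_filter($urls, function ($url) use ($pattern) {
    return preg_match($pattern, $url) !== 1;
}));

print_r($kept);

// With PHPCrawl loaded, the same pattern would be registered like this
// (left commented out so the snippet runs without the library):
// $crawler->addURLFilterRule($pattern);
```

If the pattern behaves as expected here but the crawler output still looks unfiltered, the regex itself is probably not the cause, and it may be worth checking how and where the rule is applied.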
