PHPCRAWL - How to add a filter for specific link names?

Asked Jun 05 '17 at 19:32

Active Jun 05 '17 at 19:32

Viewed 78 times

I'm using http://phpcrawl.cuab.de as a web crawler for one of my projects, and it's working fine so far, except that I don't know how to exclude or skip links with a specific name.

I already use a rule to ignore specific file types:

$crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png|css|js|pdf|swf|ico)$# i");

See http://phpcrawl.cuab.de/classreferences/PHPCrawler/overview.html

But how can I add a filter for names within a link?

For example, I want to ignore links that contain %feed% or %imprint%.

Oliver

`addURLFilterRule("#feed|imprint#i");` maybe. Appears to just be regex. Not sure what the `%` are for, but if needed: `addURLFilterRule("#%(feed|imprint)%#i");` – AbraCadaver Jun 05 '17 at 19:35
Does it really work? When I print out all the links with `links_found`, I get an unfiltered array. Example: `addURLFilterRule("#\.(css)$# i");` – melkawakibi Jul 14 '17 at 08:00
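
One way to check a candidate pattern before handing it to the crawler is to run it against a few URLs with plain `preg_match`. The following is only a minimal sketch: the sample URLs are made up for illustration, and it assumes that `addURLFilterRule` treats its argument as an ordinary PCRE pattern and skips URLs that match it, as the comment above suggests. It does not use PHPCrawl itself.

```php
<?php
// Hypothetical sample URLs, for illustration only.
$urls = [
    'http://example.com/blog/post-1',
    'http://example.com/feed/',
    'http://example.com/Imprint.html',
    'http://example.com/logo.png',
];

// Candidate pattern from the comment above: matches "feed" or "imprint"
// anywhere in the URL, case-insensitively (the trailing "i" modifier).
$pattern = '#feed|imprint#i';

// Keep only the URLs the pattern does NOT match, i.e. the ones that
// would remain if matching URLs were ignored.
$kept = array_values(array_filter($urls, function ($url) use ($pattern) {
    return preg_match($pattern, $url) !== 1;
}));

print_r($kept);
```

If the pattern keeps and drops the URLs you expect, the same string could then be tried with `$crawler->addURLFilterRule($pattern);`. Note that an unanchored pattern like this also matches longer words such as "feedback", so a stricter pattern may be needed if that matters.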

0 Answers