I really want to make a website crawler that goes to a website, scans it for links, puts the links in a database, and moves on to another website. I found one website with example code, but the code was really buggy. Have you seen anything like this, or have you written one yourself?
- How many sites are you wanting to crawl? Unless you are spawning multiple PHP processes on the server, you are going to have trouble. PHP is single-threaded, and you won't be efficiently crawling pages. – Brad Jan 19 '11 at 15:14
- `please post the code, not the website!` I highly discourage/disagree with that; the website will be of much greater use than pre-cooked code, also for future reference. – orlp Jan 19 '11 at 15:14
- Is there any other language that is more efficient? I just want a web crawler. – Jan 19 '11 at 15:16
- You'll find more ready-made crawlers in the Perl area. WWW::Mechanize comes to mind. – mario Jan 19 '11 at 15:19
- I don't really know Perl, so if possible make it in PHP/Python/JS. – Jan 19 '11 at 15:21
- A GUI for Mac would also work. – Jan 19 '11 at 15:21
- Begging won't get you anywhere, have some dignity. – RobertPitt Jan 19 '11 at 15:32
- You should read this similar item: http://stackoverflow.com/questions/1733599/is-there-a-list-of-known-web-crawlers – Christa Jan 19 '11 at 15:56
2 Answers
You probably won't find anything suitable in PHP, as it is generally used for short-running pages. Many servers, for example, are set to time out after 30 seconds. You can write PHP for command-line scripts, but I suspect that's not what you want.

Anyway, if you want a pre-packaged solution, why care about the language?

I would recommend something like wget to crawl the sites and save them to disk. Then you can iterate over the files and directories and pull out links. The hard bit is crawling the sites (that part is not simple); writing the code to pull out the links is not too difficult.
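Since the asker mentioned PHP/Python/JS, here is a minimal sketch in Python of the "scan for links, put them in a database" step described above, using only the standard library. It assumes the page HTML has already been fetched (e.g. via wget, as suggested), and the `links(source, target)` table layout is just an illustrative assumption, not a prescribed schema.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links like "/about" against the page URL.
                    self.links.append(urljoin(self.base_url, value))

def store_links(db_path, page_url, html):
    """Extract links from one page's HTML and insert them into SQLite."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS links (source TEXT, target TEXT)")
    conn.executemany("INSERT INTO links VALUES (?, ?)",
                     [(page_url, link) for link in parser.links])
    conn.commit()
    conn.close()
    return parser.links
```

To turn this into an actual crawler you would wrap it in a loop: fetch a URL, call `store_links`, then pick unvisited targets from the table and repeat, which is where the hard parts (politeness, deduplication, error handling) come in.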

Joe