What is the best way (in PHP) to get the page title and metatag contents of potentially millions of remote URLs in realtime?

Also, is this feasible to accomplish using a single shared server?

So far I'm looking into four possibilities (I'm also using CodeIgniter):

fopen, get_meta_tags, file_get_contents, cURL
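As a quick sketch of two of those options: `file_get_contents` plus a regex can pull the `<title>`, and the built-in `get_meta_tags()` returns the meta tags directly (it accepts a URL or a local path). Both calls block per URL, so this only makes sense for small batches, not millions.

```php
<?php
// Minimal sketch: page title via file_get_contents() + regex.
// For meta tags, the built-in get_meta_tags($urlOrPath) returns
// an array of name => content pairs.

function page_title(string $urlOrPath): ?string
{
    $html = @file_get_contents($urlOrPath); // blocking; one request at a time
    if ($html !== false && preg_match('#<title[^>]*>(.*?)</title>#is', $html, $m)) {
        return trim(html_entity_decode($m[1]));
    }
    return null; // fetch failed or no <title> found
}
```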

Charles
  • 50,943
  • 13
  • 104
  • 142
Chamilyan
  • 9,347
  • 10
  • 38
  • 67

3 Answers


You can't do millions in real time on a shared server; you'll very likely get shut down for using too much CPU. But if you are using PHP, your best bet would be multi-curl. See a very similar question, which contains a code sample:

Status checker for hundreds IP addresses
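A hedged sketch of the multi-curl approach this answer recommends: add a batch of handles to a `curl_multi` handle and drive them all concurrently, instead of fetching one URL at a time. Batch size and timeout here are illustrative, not tuned values.

```php
<?php
// Fetch a batch of URLs concurrently with curl_multi.
// Returns url => response body (false on failure).

function fetch_batch(array $urls, int $timeout = 10): array
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_TIMEOUT        => $timeout,
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Run all transfers until none remain active.
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh); // wait for activity instead of busy-looping
        }
    } while ($active && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $url => $ch) {
        $bodies[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $bodies;
}
```

For millions of URLs you would call this in chunks (say, 50–100 handles at a time) from a queue, rather than adding every handle at once.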

Brent Baisley
  • how does node.js stack up in this scenario? – Chamilyan Mar 09 '12 at 06:27
  • I haven't tried node for something like this, but node seems to perform pretty awesome for most things I've tried with it. Node should also have smaller memory requirements, and perhaps less CPU load. – Brent Baisley Mar 09 '12 at 13:03

It depends on your purpose. In any case you should use an asynchronous approach. In PHP you can try curl with async sockets, or the pcntl extension (from CGI mode). Or you can use the now very popular node.js (but it's not PHP at all :) )
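The pcntl route mentioned above can be sketched as forking worker processes, each fetching a slice of the URL list. This is a CLI-only sketch (pcntl is unavailable under typical web SAPIs), and the worker body is a placeholder, not code from the post.

```php
<?php
// Sketch: split $urls across forked workers via pcntl_fork().
// Requires the pcntl extension and the CLI SAPI.

function run_workers(array $urls, int $workers = 4): void
{
    $chunks = array_chunk($urls, max(1, (int) ceil(count($urls) / $workers)));

    $pids = [];
    foreach ($chunks as $chunk) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            exit("fork failed\n");
        }
        if ($pid === 0) {
            // Child: fetch its slice (placeholder work), then exit.
            foreach ($chunk as $url) {
                // e.g. curl or file_get_contents($url) here
            }
            exit(0);
        }
        $pids[] = $pid; // parent records child PIDs
    }

    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status); // reap each child
    }
}
```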

user1016265

You can try the PHP Simple HTML DOM Parser. With this DOM parser you fetch the whole page content and then parse out the head title and meta tags.
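If you'd rather avoid the third-party library, PHP's built-in DOMDocument can do the same parse; this is a dependency-free alternative sketch, not the library's API:

```php
<?php
// Parse title and meta name => content pairs from an HTML string
// using the built-in DOMDocument (tolerant of malformed markup).

function extract_title_and_meta(string $html): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ silences warnings from real-world HTML

    $title = null;
    $node = $doc->getElementsByTagName('title')->item(0);
    if ($node !== null) {
        $title = trim($node->textContent);
    }

    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $name = strtolower($tag->getAttribute('name'));
        if ($name !== '') {
            $meta[$name] = $tag->getAttribute('content');
        }
    }

    return ['title' => $title, 'meta' => $meta];
}
```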

Jilani J