
Let's assume my URL is mysite.com/myuri. There are several well-known ways to check whether a local URI myuri exists on my site mysite.com.

  1. The easiest and fastest method is:
    $uri_exists = file_exists($_SERVER['DOCUMENT_ROOT']."/myuri");
  2. The next method is:
    $headers = @get_headers("http://mysite.com/myuri");
    $uri_exists = ($headers !== false && $headers[0] != "HTTP/1.1 404 Not Found");
  3. Another (a little faster than the previous one) method is:
    $curl = curl_init('http://mysite.com/myuri');
    curl_setopt($curl, CURLOPT_NOBODY, true);
    curl_exec($curl);
    $info = curl_getinfo($curl);
    $uri_exists = ($info['http_code'] == 200);
    curl_close($curl);
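As a side note on method 2, an exact string comparison against "HTTP/1.1 404 Not Found" breaks for HTTP/1.0 responses or servers with different reason phrases. A hypothetical helper (the function name is my own, not from the question) that extracts the numeric status code is more robust:

```php
<?php
// Hypothetical helper: pull the numeric status code out of an HTTP
// status line, so the check works for "HTTP/1.0", "HTTP/1.1", and any
// reason phrase, not just one exact string.
function status_code_from_line($statusLine)
{
    if (preg_match('~^HTTP/\d+(?:\.\d+)?\s+(\d{3})~', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null; // not a valid status line
}

// With get_headers(), the existence check would then read:
// $headers = @get_headers("http://mysite.com/myuri");
// $uri_exists = $headers !== false
//     && status_code_from_line($headers[0]) === 200;
```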

These are the methods I am aware of. The first method checks only whether a file exists, not whether a URI exists; when mod_rewrite is used, it is highly inaccurate.

The next two methods are accurate (even when mod_rewrite is used) but slow, since they perform remote HTTP requests against the site itself. They are also inefficient: just to learn whether a URI exists, the scripts on mysite.com are executed unnecessarily, and they generate needless traffic to the MySQL server.

Coming to what I am trying to ask: I want the PHP scripts at mysite.com to check whether a URL mysite.com/myuri exists using a method that is more efficient and accurate than the three above.

Thank you

Peace...

Tabrez Ahmed
  • Er, correct me if I'm wrong but shouldn't you know if a URL exists on your own site? ;) It might just be easier to do some kind of introspection on either the file system or with the framework you're using (if you are) rather than trying to figure it out via HTTP. – enygma Mar 09 '12 at 12:46
  • Thanks for replying. Yes, the first method I mentioned in the question does exactly that, but it cannot check rewritten (mod_rewrite) URLs. – Tabrez Ahmed Mar 09 '12 at 12:49
  • Sounds like you'd need two checks then...one for the file system then, if that fails, check via HTTP (to see if it's rewritten). I'm still confused as to why you wouldn't know what URLs would be on your own site though. – enygma Mar 09 '12 at 12:51
  • There is no other way, so the only solution to alleviate the overhead, is to cache the results for say 12 hours or something. – Gerben Mar 09 '12 at 16:05
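The caching idea in the last comment could be sketched as follows. Everything here is an assumption on my part, not from the question: the function name, the file-based cache in the temp directory, and the injected $check callback standing in for whichever of the three methods is used.

```php
<?php
// Hypothetical sketch of the caching suggestion: remember each URI-exists
// result in a small file cache so the expensive check (HTTP request,
// database hit, etc.) runs at most once per TTL window.
function cached_uri_exists($uri, callable $check, $ttl = 43200) // 12 hours
{
    $cacheFile = sys_get_temp_dir() . '/uri_cache_' . md5($uri) . '.txt';
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile) === '1'; // cache hit
    }
    $exists = $check($uri);                           // expensive check
    file_put_contents($cacheFile, $exists ? '1' : '0');
    return $exists;
}
```

A second call within the TTL reads the cached answer instead of invoking $check again, which addresses the overhead the question worries about at the cost of up to 12 hours of staleness.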

2 Answers


Instead of sending a GET request, which returns the contents of the URI to you, you could send a HEAD request, which, depending on the web application's implementation, may return only metadata about the URI instead of its body.

You could modify your third example using the following code:

curl_setopt($curl, CURLOPT_CUSTOMREQUEST, 'HEAD');
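Folding this into the question's third example, a sketch might look like the following. The function name and the notion of "exists" (any 2xx/3xx status) are my assumptions; note also that CURLOPT_NOBODY alone already causes cURL to issue a HEAD request, so the CURLOPT_CUSTOMREQUEST line makes the intent explicit rather than changing behavior.

```php
<?php
// Sketch: the question's cURL example reworked as an explicit HEAD request.
function uri_exists_via_head($url)
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_NOBODY, true);          // do not download the body
    curl_setopt($curl, CURLOPT_CUSTOMREQUEST, 'HEAD'); // send HEAD explicitly
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  // keep output off stdout
    curl_exec($curl);
    $code = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    return $code >= 200 && $code < 400;                // assumed success range
}

// Usage (URL is the question's placeholder):
// $uri_exists = uri_exists_via_head('http://mysite.com/myuri');
```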
Edward Dale
  • Thanks for replying.... But that still performs a remote request. The request will again create an extra load... correct me if I am wrong... – Tabrez Ahmed Mar 09 '12 at 12:51

If you are using PHP and Apache, depending on the SAPI you might be able to use apache_lookup_uri($uri) which may be slightly more efficient.

http://ca2.php.net/manual/en/function.apache-lookup-uri.php

Martin
  • Thanks for replying... I use PHP and Apache... apache_lookup_uri($uri) returned status code 200 even for non-existent URLs when using mod_rewrite. – Tabrez Ahmed Mar 09 '12 at 14:07