I display favicons from other sites on my page.

About half the time they are here:

hostname.com/favicon.ico

But the other half of the time they are not. For example, on my own site I link to my .ico file like this, where FAVICON is just a PHP constant holding the path.

<link rel="SHORTCUT ICON" href="<?php echo FAVICON ?>" />

How do I get the URL of a site's favicon using the link in the HTML?

This site says you can query Google like this, substituting the domain you need the favicon for:

http://www.google.com/s2/favicons?domain=domain

That is one solution, but it seems less efficient than just reading the path from the site's HTML.

I think Google has cached "all" icons in .png format and made them searchable, per this site.

  • Possible Duplicate: http://stackoverflow.com/questions/1276688/php-getting-a-sites-favicon-and-converting-it-to-png-if-necessary – CodeJoust Dec 17 '11 at 16:15

2 Answers

Load the page using Ajax and a proxy page. For the Ajax:

// Create a request object:
var rq = new XMLHttpRequest(); // Not IE6-compatible, by the way.

// Set up the request:
rq.open('GET', 'proxy.php?url=' + encodeURIComponent(thePageURL), true);

// Handle when it's loaded:
rq.onreadystatechange = function() {
    if(rq.readyState === 4) {
        // The request is complete:
        if(rq.status < 400) {
            // The HTML is stored in rq.responseText; you could use a
            // regular expression to extract the favicon, like
            // /shortcut icon.+?href="(.+?)"/i.
        } else {
            // There was an error fetching the page; fall back?
        }
    }
};

And the proxy page (you'll probably want to add some security):

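<?php
// Minimal guard: fetch only http(s) URLs so local files cannot be read.
$url = isset($_REQUEST['url']) ? $_REQUEST['url'] : '';
if (!is_string($url) || !preg_match('#^https?://#i', $url)) { exit; }
echo file_get_contents($url);
?>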

Google "Ajax" and you'll find lots of information on how to do that sort of thing.

The reason you need to proxy the page is that browsers don't allow Ajax requests from JavaScript to go across domains unless the target allows it, which it must do explicitly. This is for security reasons, since the JavaScript could be maliciously impersonating the user. So instead, you proxy the content using a server-side script and avoid such problems.
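
Once the proxied HTML is available as a string, the extraction step could be sketched roughly as follows. This is only an illustration, not a complete parser: `extractFaviconUrl` is a hypothetical helper name, it assumes quoted attribute values, it assumes the environment provides the standard `URL` constructor (modern browsers and Node.js), and it falls back to the conventional `/favicon.ico` location when no icon `<link>` is found.

```javascript
// Hypothetical helper (not part of the original answer): find a favicon URL
// in raw HTML, falling back to the conventional /favicon.ico location.
function extractFaviconUrl(html, pageUrl) {
    // Look at each <link ...> tag separately instead of the whole page.
    var linkTags = html.match(/<link\b[^>]*>/gi) || [];
    for (var i = 0; i < linkTags.length; i++) {
        var tag = linkTags[i];
        // Assumption: attribute values are wrapped in single or double quotes.
        var rel = /\brel\s*=\s*["']([^"']*)["']/i.exec(tag);
        var href = /\bhref\s*=\s*["']([^"']*)["']/i.exec(tag);
        // Accept rel="icon" and rel="shortcut icon" in any letter case.
        if (rel && href && /(^|\s)icon(\s|$)/i.test(rel[1])) {
            // Resolve relative paths such as "/img/fav.ico" against the page URL.
            return new URL(href[1], pageUrl).href;
        }
    }
    // No icon <link> found: assume the default location at the site root.
    return new URL('/favicon.ico', pageUrl).href;
}
```

Inside the `onreadystatechange` handler this might be called as `extractFaviconUrl(rq.responseText, thePageURL)`. Matching the `rel` and `href` attributes separately means the order of the attributes inside the tag does not matter.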

Ry-
  • When you say "load the page and proxy the page"...what do you mean? –  Dec 17 '11 at 16:12
  • I think you are saying to run a regex on the entire page contents after pulling it in via Ajax. –  Dec 17 '11 at 16:13
  • You really don't need to proxy the page, as you can search for the string remotely. – CodeJoust Dec 17 '11 at 16:14
  • @stack.user.0 Sorry, I added that to the answer. :P – Ry- Dec 17 '11 at 16:15
  • I would guess that is what the google search is doing or similar...but I thought there would be a programmable way to do this...to hit the DOM ...something like document.getIcon() ? –  Dec 17 '11 at 16:16
  • @CodeJoust - How do you do a remote search? –  Dec 17 '11 at 16:17
  • The Google code would work...a simple one-liner...perhaps I should keep it dynamic like this in case the link changes...but it would be nice to understand the remote search process –  Dec 17 '11 at 16:18
  • There really isn't a way to do it in raw javascript due to security concerns, you'll need a proxy or a library such as in my answer below. – CodeJoust Dec 17 '11 at 16:19
  • @stack.user.0: Google doesn't show favicons, does it? Anyway, no, you can't do that because the websites aren't guaranteed to be valid XHTML. – Ry- Dec 17 '11 at 16:23
  • Google already has the page cached...the best trick would be to get their search to return the actual path...instead of the icon...with the "google path" –  Dec 17 '11 at 16:28

Parsing HTML is nasty - you probably want to use a library like: http://www.controlstyle.com/articles/programming/text/php-favicon/ or let google do it for you: http://www.google.com/s2/favicons?domain=domain (much more efficient - you don't have to parse all the HTML on your server, and it's just one tag). If you want something like google's functionality on your server, check out the link above.
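
For the Google option, building the image URL from a page address could look like the sketch below. `googleFaviconUrl` is a hypothetical helper name, the URL pattern is the one quoted above, and the standard `URL` constructor is assumed to be available (modern browsers and Node.js).

```javascript
// Hypothetical helper: build the Google favicon-service URL for a page.
function googleFaviconUrl(pageUrl) {
    // The service is keyed by domain, so keep only the hostname.
    var hostname = new URL(pageUrl).hostname;
    return 'http://www.google.com/s2/favicons?domain=' + encodeURIComponent(hostname);
}
```

The result can be used directly as the `src` of an `<img>` element, so nothing has to be fetched or parsed on your own server.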

CodeJoust
  • Great, for `let google do it for you` this should be the answer. – Ricardo Souza Dec 17 '11 at 16:23
  • That PHP is overkill...it's one tag in the HTML...i.e....one regex...that you can run to find it...and of course spend about 8 hours modifying for the myriad of favicon link types. I'm just going to use the Google code...as they most certainly already have the pages cached...they don't incur any overhead. –  Dec 17 '11 at 16:25