
My site uses some aggressive caching techniques to keep requests to a minimum, among them being:

  • .htaccess redirects to cached HTML files;
  • Automatic merging of content images into CSS sprites.

This works great for human traffic, but when an article is posted on Facebook, Pinterest, Google+, Reddit, etc., the service's bot fails to find a suitable thumbnail because the page images are all big sprite JPEGs.

One solution would be .htaccess rules that bypass the cache when a bot is making the request, preferably without having to name every possible bot user-agent. I am unsure how to accomplish that.

Another solution would be to embed one good thumbnail image on every page that a bot would download but a real web browser would not. Any ideas how to accomplish that?

Other suggestions are welcome. If all else fails, I'll rework my script to exclude the first image of every post from the autosprites, but that will effectively double the number of image requests my poor overworked server must accommodate.

Alan Bellows
  • You should accept more answers to your questions. Not only will it make people more likely to help you, but it will also give you rep (+2 for each answer you accept). – Joshua Dwire Nov 06 '12 at 16:43
  • @jdwire: I do try to always accept acceptable answers, but unfortunately oftentimes none of the suggestions adequately address my query. I'll look through my history and see if there are any exceptions. Thanks for the guidance. – Alan Bellows Nov 06 '12 at 17:10

1 Answer

Showing bots something different from what humans see is a very bad approach regardless of the problem you're trying to solve. Google will sometimes even penalize sites that do this with a lower search ranking. A better way would be to check each service's website for a way to tell its bot which image is relevant to the page.

For example, Facebook accepts the following meta tag in the head of your HTML, which tells it which image is relevant to your page:

<meta property="og:image" content="[url to the image]">
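
As a rough sketch, a fuller head section might look like the following. The example.com URLs, the title, and the image dimensions are placeholders, not values from the question; og:title, og:type, og:url, and og:image are the basic properties described by the Open Graph protocol. The point is that og:image references a standalone thumbnail instead of a sprite, and since it is only metadata, a browser does not fetch it when rendering the page.

```html
<head>
  <!-- Placeholder values: substitute the real title and URLs for each page -->
  <meta property="og:title" content="Example Article Title">
  <meta property="og:type" content="article">
  <meta property="og:url" content="https://example.com/articles/example-article">
  <!-- A standalone image (not a sprite), given as an absolute URL -->
  <meta property="og:image" content="https://example.com/images/example-article-thumb.jpg">
  <!-- Optional hints about the image size, in pixels -->
  <meta property="og:image:width" content="1200">
  <meta property="og:image:height" content="630">
</head>
```

Other services may read the same tags, but check each one's documentation before relying on that.
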
Joshua Dwire
  • I was not aware of this meta tag. I'll research and see if other services offer similar approaches. It'll add some maintenance issues, but it's better than some alternatives. Thanks! – Alan Bellows Nov 06 '12 at 17:13