My site uses some aggressive caching techniques to keep requests to a minimum, among them:
- .htaccess redirects to cached HTML files;
- Automatic merging of content images into CSS sprites.
This works great for human traffic, but when an article is posted on Facebook, Pinterest, Google+, Reddit, etc., the bot fails to find a suitable thumbnail because the page's images are all large sprite JPEGs.
One solution would be .htaccess rules that bypass the cache when a bot makes the request, preferably without having to name every possible bot user-agent explicitly. I am unsure how to accomplish that.
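The closest I've come is a sketch like the following. It assumes my cached copies live under `/cache/<path>.html`, and the user-agent pattern is only a heuristic; the named scrapers are examples, not an exhaustive list:

```apache
# Sketch only: serve the cached copy to ordinary visitors, but let
# anything that looks like a crawler fall through to the dynamic page.
# "bot|crawl|spider" is a heuristic; facebookexternalhit and pinterest
# are example scraper names, not a complete list.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !(bot|crawl|spider|facebookexternalhit|pinterest) [NC]
RewriteCond %{DOCUMENT_ROOT}/cache%{REQUEST_URI}.html -f
RewriteRule ^(.*)$ /cache/$1.html [L]
```

I'm not confident that pattern catches every scraper that matters, which is why I'd rather not rely on user-agent sniffing at all.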
Another solution would be to embed on every page one good thumbnail image that a bot would download but a real web browser would not. Any ideas on how to accomplish that?
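The only mechanism I know of along these lines is Open Graph tags, which Facebook and Pinterest at least will read, and which browsers ignore when rendering. Something like this (the URL is a placeholder):

```html
<!-- Crawlers that understand Open Graph take their thumbnail from
     og:image; browsers never download the referenced file just to
     render the page. The URL below is a placeholder. -->
<meta property="og:image" content="http://example.com/thumbs/post-123.jpg" />
```

I don't know whether Reddit's thumbnailer honors og:image, though, so this might not cover every site I care about.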
Other suggestions are welcome. If all else fails, I'll rework my script to exclude the first image of every post from the autosprites, but that will effectively double the number of image requests my poor overworked server must accommodate.