13

I'm looking for ways to prevent indexing of parts of a page. Specifically, comments on a page, since they weigh heavily in how entries are indexed based on what users have written. This makes a Google search on the page return lots of irrelevant pages.

Here are the options I'm considering so far:

1) Load comments using JavaScript to prevent search engines from seeing them.

2) Use user agent sniffing to simply not output comments for crawlers.

3) Use search engine-specific markup to hide parts of the page. This solution seems quirky at best, though. Allegedly, this can be done to prevent Yahoo! from indexing specific content:

<div class="robots-nocontent">
This content will not be indexed!
</div>

That is a very ugly way to do it, though. I've read about a Google solution that looks better, but I believe it only works with the Google Search Appliance (can someone confirm this?):

<!--googleoff: all-->
This content will not be indexed!
<!--googleon: all-->

Does anyone have other methods to recommend? Which of the three above would be the best way to go? Personally, I'm leaning towards #2 since, while it might not work for all search engines, it's easy to target the biggest ones, and it has no side effects for users unless they're deliberately trying to impersonate a web crawler.
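For illustration, #2 could look roughly like this (a Node/Express-style sketch; the user-agent pattern is far from a complete bot list, and the route and template variable are made up):

var express = require('express');
var app = express();

// Crude user-agent sniff: known crawlers get the page without comments.
var BOT_PATTERN = /googlebot|msnbot|slurp/i; // illustrative only

app.get('/post/:id', function (req, res) {
  var isBot = BOT_PATTERN.test(req.get('User-Agent') || '');
  // The view would simply omit the comment markup when showComments is false.
  res.render('post', { postId: req.params.id, showComments: !isBot });
});

app.listen(3000);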

Blixt
  • 49,547
  • 13
  • 120
  • 153
  • What about displaying your comments in an iframe (essentially another page altogether)? – Klaus Byskov Pedersen Dec 29 '09 at 09:40
  • It might work, if that page is specified as not to be indexed by search engines... But it feels like a very roundabout way of doing it... I was never a fan of iframes. – Blixt Dec 29 '09 at 09:42
  • "This makes a Google search on the page return lots of irrelevant pages." What do you mean by "Google search on the page"? The page should show up in Google results when it matches the search query - are you worried that your page will show up too often? – Jørn Schou-Rode Dec 29 '09 at 09:45
  • 1
    Ah, maybe I should have clarified. I'm doing a domain-limited search on the page. It's for use with Google's Search API. The problem is that since comments constitute 95% of the content on the site (there are several hundred comments on every blog post), they mess up the search results. Even if I make separate searches for blog posts and the rest of the site, the blog search will still be pretty bad. – Blixt Dec 29 '09 at 09:55

4 Answers

7

I would go with your JavaScript option. It has two advantages:

1) Bots don't see the comments at all.

2) It can speed up your page load time if you load the comments asynchronously and unobtrusively (e.g. via jQuery). Page load time has a much-underrated positive effect on your search rankings.
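A minimal sketch of the jQuery approach (the endpoint URL and container id are placeholders):

// After the initial page render, fetch the comments and inject them.
// Crawlers indexing the raw HTML never see this content.
$(function () {
  $.get('/comments', { post: 123 }, function (html) {
    $('#comments').html(html);
  });
});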

autonomatt
  • 4,393
  • 4
  • 27
  • 36
  • 2
    Not strictly true any more - the major search engines now can and do crawl JavaScript. – Bob C Jul 04 '13 at 14:21
  • 1
    Provided you load the comments async, this has to be the way to go. As Bob said, a lot of bots (including Google's) do run some limited JavaScript. But I bet they won't do the ajax and index the result as part of the page. – T.J. Crowder Jul 13 '13 at 13:12
4

JavaScript is an option, but engines are getting better at reading JavaScript. To be honest, I think you're reading too much into it. Engines love unique content: the more content you have on each page the better, and if the users are providing it... it's the holy grail.

Just because a commenter made a reference to Star Wars on your toaster review doesn't mean you're not going to rank for the toaster model; it just means you might also rank for "Star Wars toaster".

Another idea: only show comments to people who are logged in. CollegeHumor does the same, I believe; they show the number of comments a post has, but you have to log in to see them.
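As a rough sketch (Node/Express style; loadPost is a hypothetical data-access helper, and a session middleware is assumed to be set up):

var express = require('express');
var app = express();

app.get('/post/:id', function (req, res) {
  var post = loadPost(req.params.id); // hypothetical helper
  res.render('post', {
    post: post,
    commentCount: post.comments.length, // everyone sees the count
    // only logged-in users get the comments themselves
    comments: req.session.user ? post.comments : null
  });
});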

Dom Hodgson
  • 508
  • 2
  • 5
  • 18
  • I don't think you see just how big the comment/page content ratio is. If you'd search for, for example, "how to register", you'd get lots of comment hits on irrelevant pages before actually getting the page that has information about how to register, simply because out of the hundreds of comments that some of the pages have, several of them will be talking about registering. – Blixt Jan 02 '10 at 08:59
2

googleoff and googleon are for the Google Search Appliance, which is a search engine they sell to companies that need to search through their own internal documents. It's not effective for the live Google site.

I think number 1 is the best solution, actually. Search engines don't like it when you serve them different material than you serve your users (that's cloaking), so number 2 could get you kicked out of the search listings altogether.

Emil Vikström
  • 90,431
  • 16
  • 141
  • 175
1

This is the first I have heard that search engines provide a method for informing them that part of a page is irrelevant.

Google has facilities for webmasters to declare which parts of their site a web search engine should use to find pages when crawling:

  1. http://www.google.com/webmasters/
  2. http://www.sitemaps.org/protocol.php
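For reference, a minimal sitemap following that protocol looks like this (the URL and date are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/some-page</loc>
    <lastmod>2009-12-29</lastmod>
  </url>
</urlset>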

You might be able to relatively de-emphasize some things on the page by specifying the most relevant keywords using META tag(s) in the HEAD section of your HTML pages. I think that is more in line with the engineering philosophy used to architect search engines in the first place.
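For example (the title, description, and keywords shown are placeholders):

<head>
  <title>Toaster X1000 review</title>
  <meta name="description" content="An in-depth review of the Toaster X1000.">
  <meta name="keywords" content="toaster, review, X1000">
</head>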

Look at Google's Search Engine Optimization tips. They spell out clearly what they will and will not let you do to influence how they index your site.

JohnnySoftware
  • 2,053
  • 16
  • 15