2

Pretty much that is the question. Is there a way that is more efficient than the standard sitemap.xml to add entries, force a recrawl, or remove entries, i.e. to manage your website's index entries in Google?

I remember reading an article a few years ago, by a blogger whose name I can't recall, who said that when he wrote a news item on his website, its URL would appear in Google's search results immediately. I think he mentioned something special, though I don't remember exactly what: perhaps some automatic re-crawling system offered by Google itself? I'm not sure about it. So I ask: am I mistaken, and is there really NO OTHER way to manage indexed content besides sitemap.xml? I just need to be sure about this.

Thank you.

PatlaDJ
  • This probably belongs on http://webmasters.stackexchange.com. – cHao Jan 23 '11 at 10:25
  • For reference, though, SE's google-fu is strong. Google "more efficient than sitemap" and see what comes up. :) – cHao Jan 23 '11 at 10:28
  • ah.. :( I guess you are right – PatlaDJ Jan 23 '11 at 10:28
  • LOL, it is my question that shows up there as result no. 1 :) – PatlaDJ Jan 23 '11 at 10:29
  • That is exactly what I am talking about: an immediate index update. How does this happen? I have a PR4 website. How do those dudes from Stack Overflow do this? Do they only update their sitemap.xml, or is there something else that is more efficient? – PatlaDJ Jan 23 '11 at 10:32

2 Answers

1

I don't think you will find that magical "silver bullet" answer you're looking for, but here's some additional information and tips that may help:

  • Depth of crawl and rate of crawl are directly influenced by PageRank (one of the few things it does influence). So increasing the number and quality of back-links to your site's homepage and internal pages will assist you.
  • QDF - this Google algorithm factor, "Query Deserves Freshness", does have a real impact and is one of the core reasons behind the Google Caffeine infrastructure project to allow much faster finding of fresh content. This is one of the main reasons that blogs and sites like SE do well - because the content is "fresh" and matches the query.
  • XML sitemaps do help with indexation, but they won't result in better ranking. Use them to assist search bots to find content that is deep in your architecture.
  • Pinging, especially by blogs, to services that monitor site changes like ping-o-matic, can really assist in pushing notification of your new content - this can also ensure the search engines become immediately aware of it.
  • Crawl Budget - be mindful of wasting a search engine's time on parts of your site that don't change or don't deserve a place in the index - using robots.txt and the robots meta tags can herd the search bots to different parts of your site (use with caution so as to not remove high value content).
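
To make the sitemap point above concrete, here is a minimal sketch of generating a sitemap whose `<lastmod>` values tell crawlers which pages changed recently. The URLs, dates, and the `build_sitemap` helper are hypothetical, and `<lastmod>` is only a hint: search engines are free to ignore it, so this does not force a recrawl.

```python
from datetime import date
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod is a hint to crawlers, not a command.
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: one changed recently, one untouched for months.
pages = [
    ("http://www.example.com/news/1", date(2011, 1, 23)),
    ("http://www.example.com/about", date(2010, 6, 1)),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Regenerating this file whenever a page changes keeps the dates accurate, which is about as close as a plain sitemap gets to saying "only these pages changed".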

Many of these topics are covered online, but there are other intrinsic things like navigational structure, internal linking, site architecture etc. that contribute just as much as any "trick" or "device".
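
For the crawl-budget tip in the list above, it can help to check locally which paths a set of robots.txt rules would block before deploying them. This is only a sketch: the rules and paths are made up for illustration, and Python's standard `urllib.robotparser` may not match Googlebot's own rule matching in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep bots out of low-value, auto-generated pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path, user_agent="Googlebot"):
    """Return True if the rules above allow user_agent to fetch path."""
    return parser.can_fetch(user_agent, path)


for path in ("/news/1", "/search/results", "/print/news/1"):
    print(path, is_crawlable(path))
```

Running a check like this over your high-value URLs before publishing a new robots.txt is a cheap guard against the "use with caution" risk of blocking content you want indexed.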

Mike Hudson
  • Thank you. To conclude: 1. Is there NO OTHER elegant way to just tell Google which URLs from your website to add/edit/delete, let's say like a command? I was hoping there was :( 2. "XML sitemaps do help with indexation, but they won't result in better ranking" - The question is not about ranking at all. Just indexing efficiency. – PatlaDJ Jan 23 '11 at 23:53
  • 3. "Pinging, especially by blogs, to services that monitor site changes like ping-o-matic, can really assist in pushing notification of your new content" - I had never heard of this. I will explore these suggestions, thank you! 4. "Crawl Budget - be mindful of wasting a search engine's time on parts of your site that don't change or don't deserve a place in the index - using robots.txt and the robots meta tags can herd the search bots to different parts of your site" - that is also useful. Thank you, I'll research it. – PatlaDJ Jan 24 '11 at 00:01
  • It's worth pointing out there are other tools at your disposal - like Google Webmaster Tools - that allow you to get better visibility into indexation issues and status, but there's no other method than what I have outlined. I only mentioned ranking to head off any comments that might suggest otherwise. Good luck. – Mike Hudson Jan 24 '11 at 00:20
0

Getting many links, from good sites, to your website will make the Google "spiders" reach your site faster.

Also links from social sites like Twitter can help the crawlers visit your site (although the Twitter links do not pass "link juice" - the spiders still go through them).

One last thing: update your content regularly. Think of content as "Google Spider Food". If the spiders come to your site and do not find new food, they will not come back again soon; if there is new food each time they come, they will come a lot. Article directories, for example, get indexed several times a day.

Mo_S
  • Thank you, but your answer is not relevant. – PatlaDJ Jan 23 '11 at 13:46
  • What you say is useful, but it has little relevance to the question. I'm asking about the technical side of this matter. I know very well about the "things" that you told me. I am not questioning Google's methods of evaluating the importance of a website, and I realize it is logical for the re-crawl frequency to be tied to this. My website is PR4 and has about a thousand unique human visitors per day; together we produce about 50 changed pages and 10 new pages per day. I'd guess that is a common situation for many websites on the internet, right? – PatlaDJ Jan 23 '11 at 14:01
  • Google crawls on average about 4000 pages of my website every day, and that is not efficient!!! My site is small to average, and that much crawl traffic is simply inefficient, both for me and for Google, in terms of bandwidth and CPU consumption. Agree? Well... with all this in mind, the question about reindexing efficiency still stands. Is there a more efficient way, e.g. to tell Google to recrawl only the changed pages when they change (I know which ones they are) and not to recrawl the others in vain? – PatlaDJ Jan 23 '11 at 14:02