2

Googlebot is constantly trying to index URLs that don't exist on our server, so it gets a 404 error every time. We have no reference to the website those URLs seem to belong to (I think it's a blog from Nigeria), so I don't know why Google is trying to access those pages.

The strange thing is that I cannot find that website on the Internet; it's as if it doesn't exist anywhere.

This is an example of an entry in my logs:

66.249.72.201 - - [17/Sep/2011:10:08:10 +0200] "GET /main.php/v/Agadez+2006/Tagama/IMG_1214.JPG.html?g2_imageViewsIndex=3&g2_fromNavId=x50ca95f2 HTTP/1.1" 404 245 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Any idea about what is happening?

mailq
Curri

4 Answers

2

Googlebot is known to try URLs that existed at some time in the past. For example, I recently did a complete overhaul of my website. Old URLs that were indexed in the past are still getting hit (404) by Googlebot months later. I know for a fact that my website does not use those URLs internally in any way. Some are linked by external sites; some are not even linked externally.

You may want to use Google Webmaster Tools, if you are not already. You can use it to see what was indexed and what returned a 404. You can also see which pages are linked to from which external locations.
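The server's own access log can give a rough list of the paths Googlebot is failing on, to compare against what Webmaster Tools reports. The following is a minimal sketch, assuming the Apache combined log format shown in the question. The names `LOG_PATTERN` and `googlebot_404_paths` are hypothetical, and the user-agent check is a plain substring match that does not verify the request really came from Google.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format, as in the question's sample.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404_paths(lines):
    """Count 404 responses per path for requests whose agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if match["status"] == "404" and "Googlebot" in match["agent"]:
            # Drop the query string so variants of one page are grouped together.
            counts[match["path"].split("?", 1)[0]] += 1
    return counts


# The log entry quoted in the question, used here as sample input.
sample = (
    '66.249.72.201 - - [17/Sep/2011:10:08:10 +0200] '
    '"GET /main.php/v/Agadez+2006/Tagama/IMG_1214.JPG.html'
    '?g2_imageViewsIndex=3&g2_fromNavId=x50ca95f2 HTTP/1.1" 404 245 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
result = googlebot_404_paths([sample])
```

In practice the function would be fed the lines of the real access log file rather than a single sample string.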

J. M. Becker
  • But the domain is completely new and the content is in Spanish, not French. We are the only ones who enter content, so we are sure that we never entered it. In Google Analytics I cannot see any access from external websites – Curri Sep 17 '11 at 20:58
  • Google Analytics is not the same as http://www.google.com/webmasters/. In addition, if your domain was previously owned by someone else and was indexed by Googlebot, that could explain where the old URLs came from. – J. M. Becker Sep 18 '11 at 20:47
  • Hi again Technilla :). In webmaster tools I can only confirm what I said, Google is indexing wrong URLs. We are the first owners of that domain. Thanks :) – Curri Sep 19 '11 at 21:19
  • How did you conclude you were the first to register that domain? Additionally, did you see the list of your external incoming links? You can also show which ones are 404s. – J. M. Becker Sep 22 '11 at 18:15
1

Google's claim to fame is crawling the Internet and discerning relevant content that provides value to searchers. In doing so, Google relies heavily on inbound links from other websites as a sort of "vote of confidence" in your site. Provided there are links on other websites floating about the Net, Google will follow them in search of content to index.

I suspect that the previous owner of your domain name (prior to your registration) has inbound links elsewhere to content authored some time ago. Now that you've taken custody of the domain name and the content no longer exists, Google gets a 404 error.

In a perfect world, Google would remember receiving the 404 error and never crawl those links again. Unfortunately, Googlebot is complex and ever-changing, so it's hard to guess what might happen.

I had a similar experience with a newly registered domain name, and you can safely ignore this behavior. It won't have any lasting impact on your rankings.

Trent Scott
  • Hi Trenton, thanks for your answer. I am 100% sure that we are the first owners, the domain is not an easy word – Curri Sep 19 '11 at 21:20
  • That's very weird. I wonder if someone else is forwarding their domain to yours (e.g. 301 redirect). – Trent Scott Sep 19 '11 at 22:10
0

What happens? Google is accessing your site. Nothing to worry about.

If you are worried, read the page linked in the user agent string: http://www.google.com/bot.html

If you don't want Google to access your site, you can block its IP range. In that case none of your pages will be indexed.
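Before blocking anything, it helps to check which addresses a candidate range would actually cover. Below is a minimal sketch using Python's standard `ipaddress` module. The CIDR block `66.249.64.0/19` is an assumption picked only because it contains the address from the log entry in the question; it is not taken from any authoritative list, and the names `EXAMPLE_NETWORKS` and `in_blocked_range` are hypothetical.

```python
import ipaddress

# Assumption: an illustrative block that happens to contain 66.249.72.201.
# Check the crawler's currently published ranges before blocking for real.
EXAMPLE_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]


def in_blocked_range(address, networks=EXAMPLE_NETWORKS):
    """Return True if the address falls inside any of the given networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


# The client address from the log entry in the question.
covered = in_blocked_range("66.249.72.201")
```

The same membership test can be run over every client address in the access log to see which requests a firewall rule for that range would have dropped.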

mailq
  • I don't want to block Googlebot and I'm not really worried :) but I cannot understand why Google is trying to index pages that don't exist and never existed. The domain is brand new, so it didn't have that content before. – Curri Sep 17 '11 at 20:54
  • It is brand new for _you_, but there could be a former owner that had these pages. Others linked to them, and Google now follows those external links. So what? You shouldn't care. Google is cleaning its index and the hits will decrease over time. – mailq Sep 17 '11 at 23:21
  • Trust me, we are the first owners :) – Curri Sep 19 '11 at 21:21
  • @Curri Let's bet. It is support-fr.org and you are the owner since 2007? – mailq Sep 19 '11 at 21:26
0

It's not possible to tell from a single URL whether this is practical, but the first thing I'd look at is adding some part of the URL to the robots.txt file.
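A candidate rule can be tried out before it is deployed. The sketch below uses Python's standard `urllib.robotparser` to test a hypothetical robots.txt against the URL from the question. The `/main.php/` prefix is an assumption drawn from that single logged URL, and the names `ROBOTS_TXT` and `is_allowed` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: disallow everything under the prefix seen in the log.
ROBOTS_TXT = """\
User-agent: *
Disallow: /main.php/
"""


def is_allowed(path, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if the given rules let user_agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The request path from the log entry in the question.
logged_path = (
    "/main.php/v/Agadez+2006/Tagama/IMG_1214.JPG.html"
    "?g2_imageViewsIndex=3&g2_fromNavId=x50ca95f2"
)
blocked = not is_allowed(logged_path)
```

Checking a few legitimate paths with the same function helps confirm the rule is not broader than intended.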

John Gardeniers