
I have a WordPress blog on my own server, which used permanent links containing Chinese characters in the URLs, like http://techblog.zellux.czm.cn/2008/03/ics-lab4-%E7%BB%8F%E9%AA%8C/.

Several months ago I changed all the URLs to English descriptions, like http://techblog.zellux.czm.cn/2009/05/page-coloring/, installed a plugin to generate sitemap.xml automatically, and submitted it to Google Webmaster Tools.

Now, several months later, I checked Google Webmaster Tools again. Under Diagnostics -> Web Crawl it says it could not find 41 URLs, all of which contain percent-encoded Chinese characters like the one above, and in the Linked From column they are all listed as unavailable. Under Sitemaps -> Sitemap details, it says that only 15 out of 115 URLs were indexed.

So my question is: how can I make Google stop trying to crawl the deprecated URLs and crawl only those specified in sitemap.xml?


1 Answer


You can use a robots.txt file to exclude specified pages from being crawled. It would look like this:

User-Agent: Googlebot
# Replace these placeholder paths with the URLs you want to block
Disallow: /file-1
Disallow: /file-2
Disallow: /file-3

Create this file in any text editor and upload it to your site's root directory (or edit the existing robots.txt file if one is already there).
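For the URLs in your question, a minimal sketch might look like the following. The first Disallow path is the real example you posted; the second is a hypothetical placeholder for the remaining deprecated permalinks, which you would copy exactly as they appear (percent-encoded) in the Web Crawl error report:

User-Agent: Googlebot
# Deprecated permalinks containing percent-encoded Chinese characters
Disallow: /2008/03/ics-lab4-%E7%BB%8F%E9%AA%8C/
# ...one line per remaining deprecated URL (hypothetical placeholder below)
Disallow: /2008/03/some-other-old-post/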

Webmaster Tools also has a tool to generate a robots.txt file (Tools > Generate robots.txt).