First, sign up for a Google Webmaster Tools account. This will allow you to view statistics from Google about how it crawls your site, and lets you request removal of pages from the index (more on that later).
Next, set up a robots.txt file for your site. You do not need to block your entire site from Google to use robots.txt. All major search engines honor robots.txt, so this will also prevent sites like Bing or Yahoo from indexing these pages.
To set this up, create robots.txt as a plain text file in the root directory of your site (e.g. http://www.example.com/robots.txt). The syntax is very simple: you specify the user-agent the rules should apply to, using * as a wildcard for all robots, and you list the paths the robots shouldn't crawl. Note that you should not list any pages you want to keep completely "secret", as this is a publicly visible file. The syntax for robots.txt is as follows:
User-agent: user agent name
Disallow: directory name
Disallow: another directory
Disallow: (etc)
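If you would rather generate the file than write it by hand, the syntax above can be sketched in a few lines of Python. This is only an illustration: the function name build_robots_txt and its parameters are made up for this example and are not part of any library.

```python
def build_robots_txt(user_agent, disallowed_paths):
    """Render one robots.txt record: a User-agent line, then one Disallow line per path.

    Hypothetical helper for illustration only.
    """
    lines = [f"User-agent: {user_agent}"]
    lines.extend(f"Disallow: {path}" for path in disallowed_paths)
    # End with a newline so the result can be written to disk as-is.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example paths are placeholders; substitute the directories on your own site.
    print(build_robots_txt("*", ["/private/", "/tmp/"]), end="")
```

Save the returned text as robots.txt in the root directory of your site.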
If you want to block all search engines from indexing data in certain subdirectories of your images directory, you might do something like this:
User-agent: *
Disallow: /images/foo/bar/
Disallow: /images/foo/baz/
You can even disallow just a specific file:
User-agent: *
Disallow: /images/foo/bar/qux.jpg
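Before uploading the file, you may want to check that your rules block what you expect. One way to do that, sketched here with Python's standard urllib.robotparser module, is to parse the rules locally and ask whether a given URL may be fetched. The helper name is_allowed and the example URLs are assumptions made for this illustration, and real crawlers may interpret edge cases slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The directory rules from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/foo/bar/
Disallow: /images/foo/baz/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper for illustration; it parses the text locally
    and does not make any network requests.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/images/foo/bar/qux.jpg",
        "http://www.example.com/images/logo.png",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

A URL under one of the disallowed directories should come back as not allowed, while anything else on the site remains crawlable.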
Setting up robots.txt will prevent the specified directories and files from being indexed in the future. Over time, these pages will be removed from the search index, but it will not be immediate. To expedite this process, use your Webmaster Tools account to submit a request to remove a URL from the index. Click on the site you want to remove the URL from, then open "Site configuration" on the left. Click on "Crawler access", then open the "Remove URL" tab. Click on "New removal request", type in the URL you want to have removed, and hit Enter. The page should ask you to confirm that you've already blocked the URL via robots.txt (which you've just done). Click OK, and it should submit the request. Google will usually take 1-3 days to process the request. You can check the status of the request by logging into your Webmaster Tools account at any time.
