I have this robots.txt:

User-Agent: *
Disallow: /files/

User-Agent: ia_archiver
Allow: /

User-agent: Googlebot 
Disallow: 

User-agent: googlebot-image 
Disallow: 

User-agent: googlebot-mobile 
Disallow: 

I am finding that PDF files in the /files/ directory are being indexed by Google.

Should I move the first entry to the bottom?

While working with Google's Webmaster Tools, I moved the /files/ disallow to the bottom and ran a test on one PDF file in the files directory; it returned Success.

How can I fix this issue? We do not want anything in this directory being indexed.

EDITED

Even if I remove everything except the first clause,

User-Agent: *
Disallow: /files/

Google is still able to see PDFs in the /files/ directory. What am I doing wrong here?

In Bing's webmaster tools, it shows as blocked but Google's still shows Success.

    How long are you waiting after changing the `robots.txt` before analysing your logs? And which Google bot is retrieving the PDFs? – Ladadadada May 06 '14 at 16:43

1 Answer

Edit: I re-read the standard. A robot uses the first group whose name token matches it, and falls back to `*` only if none does. For each bot you want to deny access to /files/, you'll need to add a matching disallow:

User-agent: *
Disallow: /files/

User-agent: Googlebot 
Disallow: /files/

http://www.robotstxt.org/ is a great resource, if you haven't seen it.
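You can sanity-check this matching behavior locally with Python's standard-library `urllib.robotparser`, which implements the same "pick the matching group, otherwise fall back to `*`" rule. This is just a sketch; the example URL is hypothetical, and Google's production parser may differ in edge cases:

```python
import urllib.robotparser

# Simplified version of the original file: Googlebot has its own group
# with an empty Disallow, which means "allow everything" for Googlebot.
ORIGINAL = """\
User-Agent: *
Disallow: /files/

User-agent: Googlebot
Disallow:
"""

# Fixed version: Googlebot's group repeats the /files/ disallow.
FIXED = """\
User-Agent: *
Disallow: /files/

User-agent: Googlebot
Disallow: /files/
"""

def can_fetch(robots_txt, agent, url):
    """Parse a robots.txt string and ask whether `agent` may fetch `url`."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)

url = "http://example.com/files/report.pdf"  # hypothetical PDF URL

# Googlebot matches its own (empty) group, so the * disallow is ignored.
print(can_fetch(ORIGINAL, "Googlebot", url))     # True
# With the per-bot disallow added, Googlebot is blocked.
print(can_fetch(FIXED, "Googlebot", url))        # False
# A bot with no group of its own falls back to * and is blocked.
print(can_fetch(ORIGINAL, "SomeOtherBot", url))  # False
```

This reproduces exactly what the asker saw: with the original file, Googlebot is allowed into /files/ because its own group overrides the `*` group rather than stacking with it.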

pete
  • `*` should work for all bots that adhere to the standards. – MB34 May 07 '14 at 18:29
  • 1
    thats true, but the standard also says a bot only has to use the first, closest matching rule. so if googlebot finds its token, it will only process that and not bother with the `*`. google's faq confirms this behavior: https://support.google.com/webmasters/answer/156449 under *'create a robots.txt file'*. i too used to think it 'stacked', but re-read it the other day and apparently not.... – pete May 07 '14 at 18:35