How to let Google crawl PDF files but not index them?

Question

If I understand it right, you can only tell Google to crawl or not crawl PDF files via robots.txt. I want Google to crawl the files, but not list them in the search results pages.

Is this possible?

OK, so what would Google do with the information? Google: Ah! Here's a PDF file. Nice... but what do you want me to do with this? I provide search results, and you're asking me to NOT list this file in any search results... er... – Pure.Krome, May 12 '12 at 12:55
@Pure.Krome: If it's a pay-for-download resource, it is a valid move. Otherwise Google could cache it. – memo, May 12 '12 at 12:59
That means Google should index the PDF (i.e. return it in the list of results) but not make the contents available from the Google cache? – Daan, May 12 '12 at 17:01

score 2 · Answer 1 · answered May 12 '12 at 16:30

You can add robots directives to any file via the X-Robots-Tag HTTP response header. Setting it to "noindex, follow" sounds like what you want.
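As a rough illustration of the header approach, here is a minimal sketch of a WSGI app that attaches the header to PDF responses. The helper name `extra_headers`, the app itself, and the placeholder body are assumptions made for this example, not something from the question; a real site would more likely set the header in its web server configuration. The PDFs must stay crawlable (not blocked in robots.txt), otherwise the crawler never sees the header.

```python
def extra_headers(path):
    """Return extra response headers for a request path (hypothetical helper)."""
    # Only PDF responses get the robots directive; other files are left alone.
    if path.lower().endswith(".pdf"):
        return [("X-Robots-Tag", "noindex, follow")]
    return []


def app(environ, start_response):
    """Minimal WSGI app: serves a placeholder body with the right headers."""
    path = environ.get("PATH_INFO", "/")
    is_pdf = path.lower().endswith(".pdf")
    headers = [("Content-Type", "application/pdf" if is_pdf else "text/plain")]
    headers += extra_headers(path)
    start_response("200 OK", headers)
    # A real app would stream the file contents here.
    return [b"placeholder body"]
```

To try it locally, the app can be passed to `wsgiref.simple_server.make_server` from the standard library, and the response headers for a `.pdf` URL inspected with any HTTP client.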

Tony McCreath

score 0 · Answer 2 · answered May 12 '12 at 13:02

I'm not sure, but isn't this: <meta name="robots" content="noindex"> a good solution for your problem?

memo

Sure, I meant the link to the PDF file in the HTML. – memo May 12 '12 at 16:29
