
I'm trying to stop web crawlers from indexing PDF files on a website. I know how to do this with an .htaccess file, but not in a web.config file. Am I correct that this snippet will stop crawlers from indexing the whole site? What do I need in order to block only PDFs from being crawled? Is that possible?

<httpProtocol>
    <customHeaders>
        <add name="X-Robots-Tag" value="noindex" />
    </customHeaders>
</httpProtocol>
robb
  • Learn IIS URL Rewrite module https://docs.microsoft.com/en-us/iis/extensions/url-rewrite-module/creating-rewrite-rules-for-the-url-rewrite-module . Then write a rule to redirect search engine bots to a harmless page when they try to access your PDF links. – Lex Li Nov 06 '20 at 23:47
    @LexLi: This isn't about redirection, but setting a header that tells the bots not to index this resource. – Esa Jokinen Nov 08 '20 at 10:40

1 Answer


Setting response headers is possible with the IIS URL Rewrite Module: an outbound rule can add the X-Robots-Tag: noindex header only to responses for .pdf requests, leaving the rest of the site indexable.

<system.webServer>
  <rewrite>
    <outboundRules>
      <!-- Adds "X-Robots-Tag: noindex" to responses for .pdf requests only -->
      <rule name="X-Robots-Tag: noindex to .pdf">
        <!-- The RESPONSE_ prefix targets the response header named X-Robots-Tag -->
        <match serverVariable="RESPONSE_X_Robots_Tag" pattern=".*" />
        <conditions>
          <!-- Apply only when the requested file name ends in .pdf -->
          <add input="{REQUEST_FILENAME}" pattern="\.pdf$" />
        </conditions>
        <!-- Set the header value, telling crawlers not to index the PDF -->
        <action type="Rewrite" value="noindex" />
      </rule>
    </outboundRules>
  </rewrite>
</system.webServer>
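For completeness, the <system.webServer> section above belongs directly under the <configuration> root of the site's web.config. A minimal sketch of the full file (the rule itself is unchanged):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <outboundRules>
        <rule name="X-Robots-Tag: noindex to .pdf">
          <match serverVariable="RESPONSE_X_Robots_Tag" pattern=".*" />
          <conditions>
            <add input="{REQUEST_FILENAME}" pattern="\.pdf$" />
          </conditions>
          <action type="Rewrite" value="noindex" />
        </rule>
      </outboundRules>
    </rewrite>
  </system.webServer>
</configuration>
```

Note that the URL Rewrite Module is an add-on, not part of a default IIS install; without it, IIS will reject the <rewrite> section with a configuration error. You can check that the rule works by requesting a PDF (for example with `curl -I`) and looking for the `X-Robots-Tag: noindex` header in the response.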
Esa Jokinen
  • @EsaJokinen: Thank you. This is exactly what I was trying to figure out. Working like a charm. – robb Nov 10 '20 at 22:18