
I'm trying to stop web crawlers from indexing PDF files on a website. I know how to do this with an .htaccess file, but not in a web.config file. Am I correct that this snippet will stop crawlers from indexing the whole site? What do I need in order to block only PDFs from being crawled? Is that possible?

<httpProtocol>
    <customHeaders>
        <add name="X-Robots-Tag" value="noindex" />
    </customHeaders>
</httpProtocol>
robb
  • Learn IIS URL Rewrite module https://docs.microsoft.com/en-us/iis/extensions/url-rewrite-module/creating-rewrite-rules-for-the-url-rewrite-module . Then write a rule to redirect search engine bots to a harmless page when they try to access your PDF links. – Lex Li Nov 06 '20 at 23:47
    @LexLi: This isn't about redirection, but setting a header that tells the bots not to index this resource. – Esa Jokinen Nov 08 '20 at 10:40

1 Answer


Setting response headers is possible with the IIS URL Rewrite Module: an outbound rule can add the X-Robots-Tag: noindex header only to responses for .pdf requests, leaving the rest of the site indexable.

<system.webServer>
  <rewrite>
    <outboundRules>
      <!-- Adds "X-Robots-Tag: noindex" to responses for .pdf requests only -->
      <rule name="X-Robots-Tag: noindex to .pdf">
        <!-- The RESPONSE_ prefix targets the response header named X-Robots-Tag -->
        <match serverVariable="RESPONSE_X_Robots_Tag" pattern=".*" />
        <conditions>
          <!-- Apply only when the requested file name ends in .pdf -->
          <add input="{REQUEST_FILENAME}" pattern="\.pdf$" />
        </conditions>
        <!-- Set the header value, telling crawlers not to index the PDF -->
        <action type="Rewrite" value="noindex" />
      </rule>
    </outboundRules>
  </rewrite>
</system.webServer>
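For completeness, the <system.webServer> section above belongs directly under the <configuration> root of the site's web.config. A minimal sketch of the full file (the rule itself is unchanged):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <outboundRules>
        <rule name="X-Robots-Tag: noindex to .pdf">
          <match serverVariable="RESPONSE_X_Robots_Tag" pattern=".*" />
          <conditions>
            <add input="{REQUEST_FILENAME}" pattern="\.pdf$" />
          </conditions>
          <action type="Rewrite" value="noindex" />
        </rule>
      </outboundRules>
    </rewrite>
  </system.webServer>
</configuration>
```

Note that the URL Rewrite Module is an add-on, not part of a default IIS install; without it, IIS will reject the <rewrite> section with a configuration error. You can check that the rule works by requesting a PDF (for example with `curl -I`) and looking for the `X-Robots-Tag: noindex` header in the response.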
Esa Jokinen
  • @EsaJokinen: Thank you. This is exactly what I was trying to figure out. Working like a charm. – robb Nov 10 '20 at 22:18