I'm using Nutch 1.6 and Solr 4.3 on Ubuntu Server 12.04. I would like to switch indexing on and off for parts of a page's content. Is there a way to specify this behaviour in my HTML pages so that Solr behaves accordingly?
As an example, when using Google Search Appliance I would put "googleoff" and "googleon" tags around the content on the page that I don't want indexed (headers, footers, copyright strings, etc.).
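
To make the behaviour I'm after concrete, here is a minimal Python sketch of the effect I'd like to get. It assumes the GSA comment syntax `<!--googleoff: index-->` / `<!--googleon: index-->`. The function name `strip_unindexed` is hypothetical, and I'm not assuming that Nutch or Solr understand these markers; it only shows which text should be left for indexing.

```python
import re

# Matches everything between a GSA-style "googleoff: index" comment and the
# next "googleon: index" comment, including the two markers themselves.
_OFF_ON = re.compile(
    r"<!--\s*googleoff:\s*index\s*-->.*?<!--\s*googleon:\s*index\s*-->",
    re.DOTALL | re.IGNORECASE,
)

def strip_unindexed(html: str) -> str:
    """Return html with every googleoff/googleon region removed (hypothetical helper)."""
    return _OFF_ON.sub("", html)

page = (
    "<!--googleoff: index--><div>Site header</div><!--googleon: index-->"
    "<p>Article text to index.</p>"
    "<!--googleoff: index--><div>Copyright footer</div><!--googleon: index-->"
)

print(strip_unindexed(page))
```

So only the article paragraph would end up in the index, while the header and footer blocks would be skipped.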
Thank you.