This section of the documentation states that Apache Tika can be configured using a dedicated configuration file: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

<str name="tika.config">/my/path/to/tika.config</str>

The obvious question is: where can I find a sample tika.config and instructions on how to modify it?

What's my goal? I want to configure Tika NOT to parse the content of media files.

illegal-immigrant
  • There's a bunch of Tika Configs used in unit testing the Tika configuration code that ship in the source tree - do [any of these help](https://svn.apache.org/repos/asf/tika/trunk/tika-core/src/test/resources/org/apache/tika/config/)? – Gagravarr Jan 30 '14 at 11:03
  • Well, that's better than nothing, but there is still no hint of which options are supported or how to achieve different goals using the config file – illegal-immigrant Jan 30 '14 at 13:12
  • I mean, that's strange: the application supports configuration, but there is literally no sign of a sample config on the internet – illegal-immigrant Jan 30 '14 at 13:13
  • I think most people either rely on the auto-discovery of parsers and detectors, so don't need a config file, or do their configuration in code – Gagravarr Jan 30 '14 at 15:11

1 Answer

You have to add these lines to the solrconfig.xml file:

<lib dir="../../../../contrib/extraction/lib/" regex="tika-core-\d.*\.jar" />
<lib dir="../../../../contrib/extraction/lib/" regex="tika-parsers-\d.*\.jar" />

Add these lines too:

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
    <str name="capture">body</str>
  </lst>
  <str name="tika.config">html-config.xml</str>
</requestHandler>

The html-config.xml file must be in the conf directory. For its contents, see this sample from the Tika source tree: https://github.com/apache/tika/blob/master/tika-parsers/src/test/resources/org/apache/tika/parser/html/tika-config.xml
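For the goal stated in the question (not parsing media files), a Tika config along the following lines may work. This is only a sketch under assumptions: it relies on the `parser-exclude` element, which is documented for later Tika 1.x releases and may not be supported by the Tika version bundled with an older Solr, and the parser class names listed are the Tika 1.x audio/video parsers, which should be checked against the tika-parsers jar actually in use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Use the default (auto-discovered) parsers for everything,
         except the media parsers excluded below. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <!-- Assumed Tika 1.x class names; verify against your Tika version. -->
      <parser-exclude class="org.apache.tika.parser.audio.AudioParser"/>
      <parser-exclude class="org.apache.tika.parser.audio.MidiParser"/>
      <parser-exclude class="org.apache.tika.parser.mp3.Mp3Parser"/>
      <parser-exclude class="org.apache.tika.parser.mp4.MP4Parser"/>
      <parser-exclude class="org.apache.tika.parser.video.FLVParser"/>
    </parser>
  </parsers>
</properties>
```

If excluding by type is more convenient than excluding by parser class, the same `DefaultParser` element also accepts `<mime-exclude>` children (for example `<mime-exclude>audio/mpeg</mime-exclude>`) in Tika versions that support this syntax.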

sajju