Solr: indexing fb2 files

Question

I want to use Solr to index a library of books in fb2 format. fb2 is just XML with an accompanying XSD schema. However, post.jar ignores *.fb2 files, and I don't understand how to map values in an fb2 file to index fields, for example:

<book-title>some book</book-title>

...to a "book-title" field in the index. Should I write a plug-in, or is there another way?

David George · Accepted Answer · 2016-09-16T07:20:12.243

You should look at the Solr Data Import Handler (DIH).

https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

The Solr examples folder contains an RSS import example. Its rss-data-config.xml file shows how the XPathEntityProcessor is used to map XML elements to Solr fields.

Here is some more information: http://www.andornot.com/blog/post/Sample-Solr-DataImportHandler-for-XML-Files.aspx

I have also written Tika parsers in the past to work with specific file formats.

https://lucidworks.com/blog/2010/06/18/extending-apache-tika-capabilities/

For more flexibility you can just read your files using your favorite programming language and send the data to Solr using an API. We had to do this for a recent application as the DIH wasn't flexible enough for what we wanted to achieve.
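As a rough sketch of that last approach, assuming Python with only the standard library: the code below parses an fb2 file's title-info block and builds a flat document that could be posted to Solr's JSON update endpoint. The core name `books`, the field names (`id`, `book-title`, `author`, `lang`) and the helper names `fb2_to_doc` and `post_to_solr` are illustrative assumptions, not something from our application; the fields would need to exist in your schema.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Namespace declared by FictionBook 2.0 files.
FB2_NS = {"fb": "http://www.gribuser.ru/xml/fictionbook/2.0"}

# Assumed Solr host and core name; adjust to your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/books/update?commit=true"


def fb2_to_doc(fb2_bytes, doc_id):
    """Map a few <title-info> elements of an fb2 file to index fields."""
    # Parsing bytes lets the parser honour the file's own encoding declaration.
    root = ET.fromstring(fb2_bytes)
    info = root.find("fb:description/fb:title-info", FB2_NS)
    if info is None:
        raise ValueError("no <title-info> element found")

    authors = []
    for author in info.findall("fb:author", FB2_NS):
        parts = [
            author.findtext("fb:first-name", "", FB2_NS),
            author.findtext("fb:last-name", "", FB2_NS),
        ]
        name = " ".join(p.strip() for p in parts if p and p.strip())
        if name:
            authors.append(name)

    return {
        "id": doc_id,
        # Field names are assumptions; they must match your Solr schema.
        "book-title": info.findtext("fb:book-title", "", FB2_NS).strip(),
        "author": authors,
        "lang": info.findtext("fb:lang", "", FB2_NS).strip(),
    }


def post_to_solr(docs, url=SOLR_UPDATE_URL):
    """Send a list of documents to Solr as a JSON array; returns the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

To index a directory you would read each *.fb2 file in binary mode, call `fb2_to_doc` with some unique id (the file name, for instance), and pass the resulting list to `post_to_solr` in batches.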
