
I am trying to index ~1 million XML files into Solr 5. There are a few ways I can think of:

  1. Dump all the XML files into a directory and then use post.jar.
  2. It seems to me that the DataImportHandler can also be used to import XML files recursively.

Are there any other ways?

cheffe
user2073131
  • You could write your own indexer in your favourite language that parses the XML files, makes any modifications you want, and sends them to the Solr server using a Solr client library. – James Doepp - pihentagyu Jan 26 '16 at 18:17
  • Check this blog post http://www.andornot.com/blog/post/Sample-Solr-DataImportHandler-for-XML-Files.aspx – cheffe Jan 27 '16 at 07:19
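A minimal sketch of the parsing step of such a custom indexer, in Python with only the standard library. The field mapping is hypothetical: it assumes each direct child element of the XML root is one Solr field, uses the file path as the unique `id`, and turns repeated tags into multi-valued fields. A real schema will most likely need a different mapping.

```python
import xml.etree.ElementTree as ET


def xml_file_to_solr_doc(path):
    """Flatten one XML file into a Solr document (a dict of field -> value).

    Hypothetical mapping: each direct child element of the root becomes a
    field named after its tag, and the file path serves as the unique id.
    Repeated tags become multi-valued fields (lists).
    """
    root = ET.parse(path).getroot()
    doc = {"id": str(path)}
    for child in root:
        text = (child.text or "").strip()
        if not text:
            continue  # skip empty or purely structural elements
        if child.tag in doc:
            # Second occurrence of a tag: promote the field to a list.
            existing = doc[child.tag]
            if not isinstance(existing, list):
                doc[child.tag] = [existing]
            doc[child.tag].append(text)
        else:
            doc[child.tag] = text
    return doc
```

This is where "any modifications you might want" would go, for example renaming tags to match the Solr schema or dropping fields you do not want indexed.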


Your question is how to index one million XML files with Solr.

You can use the `bin/post` tool, which also works with a recursive folder structure.

If that provides enough functionality, fine. If you need more specialised features, build your own indexer; with SolrJ this is quite easy.
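SolrJ is the Java client. As a rough equivalent without any client library, the sketch below posts a batch of documents to Solr's JSON update handler using only Python's standard library. It assumes a core reachable at `<solr_base>/<core>/update` that accepts a JSON array of documents; the core name `xmlcore` in the usage comment is hypothetical.

```python
import json
import urllib.request


def build_update_request(solr_base, core, docs, commit=False):
    """Build (but do not send) a JSON update request for a batch of docs.

    Assumes Solr's JSON update format on /solr/<core>/update, where the
    request body is a JSON array of documents.
    """
    url = f"{solr_base.rstrip('/')}/{core}/update"
    if commit:
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_batch(solr_base, core, docs, commit=False, timeout=60):
    """POST one batch of documents to Solr and return the HTTP status code."""
    req = build_update_request(solr_base, core, docs, commit)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Usage (hypothetical core name, requires a running Solr instance):
# send_batch("http://localhost:8983/solr", "xmlcore",
#            [{"id": "doc1", "title": "Hello"}], commit=True)
```

Sending documents in batches and committing once at the end, rather than after every file, is usually much faster for a bulk load of this size.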

If you have enough main memory, you can use the DataImportHandler with `FileListEntityProcessor`. `FileListEntityProcessor` first collects all files and only then runs the actual indexing, so in your case the first step will put one million `File` instances into main memory.
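A custom indexer can avoid that up-front cost by walking the directory tree lazily and handing files on in fixed-size batches, so memory use is roughly bounded by the batch size and the number of pending directories rather than by the total number of files. A minimal Python sketch, assuming the files end in `.xml`; the helper names `iter_xml_files` and `batched` are hypothetical.

```python
import os


def iter_xml_files(root_dir):
    """Lazily yield paths of *.xml files under root_dir, recursively.

    os.scandir returns an iterator, so directory entries are streamed
    instead of being collected into one big list first.
    """
    stack = [root_dir]  # directories still to visit
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file() and entry.name.lower().endswith(".xml"):
                    yield entry.path


def batched(iterable, size):
    """Group an iterable into lists of at most `size` items."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter batch
```

Each batch of paths can then be parsed and sent to Solr before the next batch is read from disk.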

Karsten R.