I have 40 million files stored in a file system and would like some suggestions. There are several ways to do the indexing, such as DIH, Solr, and SolrJ. How many cores should I use to index 40 million documents?
I have decided to use SolrJ. Is this a good way to do it? If so, how many cores should I use?
The files are named like ARIA_SSN10_0007_LOCATION_0000129.pdf
- I have to split each filename on the underscores and index each of the resulting values in Solr.
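To make the split concrete, here is a minimal sketch in plain Java of what I have in mind. It is only an illustration: the class name `FilenameFields` and the field names `part_1`, `part_2`, ... are hypothetical placeholders, not my real schema, and I am assuming the extension should be dropped before splitting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FilenameFields {

    /**
     * Drops the extension, splits the remaining name on underscores,
     * and maps each piece to a placeholder field name (part_1, part_2, ...).
     */
    public static Map<String, String> parse(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = (dot > 0) ? fileName.substring(0, dot) : fileName;

        String[] parts = base.split("_");

        // LinkedHashMap keeps the pieces in filename order.
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            fields.put("part_" + (i + 1), parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse("ARIA_SSN10_0007_LOCATION_0000129.pdf");
        fields.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

With SolrJ, I assume each entry could then be copied into a `SolrInputDocument` with `addField(name, value)` before the document is sent to Solr.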
Is this possible using DIH? If so, how would I do the split with DIH? Please share a link if you have one.
Please suggest.
Thanks