I have 40 million files stored on a file system and would like some suggestions. There are several ways to index them, such as DIH or SolrJ. How many cores should I use to index this many documents?

I have decided to use SolrJ. Is that a good way to do this? If so, I still don't know how many cores I should use.

The files are stored on a file system and are named like ARIA_SSN10_0007_LOCATION_0000129.pdf

  1. I have to split each filename on the underscores and index each resulting value into its own field in Solr.

That is the operation I need to perform. Is it possible with DIH? If yes, how would I do the split in DIH? Please share a link if you have one.
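To make the requirement concrete, here is a minimal sketch in plain Java of the split I mean. The class name `FilenameFields` and the field names `part1` to `part5` are placeholders I made up, not anything defined by Solr, and I am assuming every filename has exactly five underscore-separated values as in the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FilenameFields {
    // Hypothetical field names; replace them with the fields defined in your schema.
    private static final String[] FIELD_NAMES = {"part1", "part2", "part3", "part4", "part5"};

    /** Splits a name like ARIA_SSN10_0007_LOCATION_0000129.pdf into named values. */
    public static Map<String, String> parse(String fileName) {
        // Drop the extension, if there is one.
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;

        // Split on underscores and check the assumed five-part layout.
        String[] parts = base.split("_");
        if (parts.length != FIELD_NAMES.length) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }

        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", fileName); // assumes the file name is unique
        for (int i = 0; i < parts.length; i++) {
            fields.put(FIELD_NAMES[i], parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("ARIA_SSN10_0007_LOCATION_0000129.pdf"));
    }
}
```

With SolrJ, I imagine each entry of the returned map would be copied into a `SolrInputDocument` with `addField(name, value)` before the document is sent to Solr, but I have not settled on that part yet.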

Please advise.
Thanks

YoungHobbit
Mugeesh Husain
  • Check whether this is of any help to you: http://stackoverflow.com/questions/31691606/solr-multicore-vs-sharding-vs-1-big-collection/31691754#31691754 – Abhijit Bashetti Aug 04 '15 at 12:28
  • @Abhijit I saw your post. Did you index those cores using SolrJ? – Mugeesh Husain Aug 04 '15 at 13:12
  • No, I am not using SolrJ. I am using Solr's HTTP/REST APIs and indexing with DIH. – Abhijit Bashetti Aug 04 '15 at 13:18
  • My files are on a file system and are named like ARIA_SSN10_0007_LOCATION_0000129.pdf. I have to split each filename on the underscores and index the resulting values into Solr. How would I do things like the split using DIH? Is this possible with DIH, and if yes, how? – Mugeesh Husain Aug 04 '15 at 16:48
  • Splitting the text is what is called tokenizing the text, and that is done with the help of analyzers, tokenizers, filters, etc. So build a fieldType, or use an existing one, for your field. – Abhijit Bashetti Aug 05 '15 at 06:11
  • That is not the kind of splitting I mean; I think you didn't read my requirement. I have to split the filename into several values and index each value into its corresponding field. Please read my requirement: I want to extract the values before indexing and then index them into Solr. How do I do that with DIH? – Mugeesh Husain Aug 05 '15 at 17:07

0 Answers