3

All my XML files are stored on one server, and I have installed and configured Solr 4 on a different server. How can I index those XML files into Solr? I have looked at Nutch, but its main purpose is to crawl HTML pages and index them, and I don't need to crawl: all the files are at a specific path on the other server. I just need to index those XML files in Solr.

If anyone has done something like this, please let me know how to do it. Thank you.

Anand Khatri

3 Answers

2

Why not mount the drive from your Solr server, and do something like:

java -jar post.jar "Z:\home\data\delivery\textarticles.xml"

post.jar is in the exampledocs folder. You could also use it as an example and build your own application to post those XML files from the other server.
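
If you go the route of writing your own poster, a minimal Python sketch might look like the following. It assumes the XML files are already in Solr's `<add><doc>` update format, that Solr's update handler is reachable at the hypothetical URL `http://localhost:8983/solr/update`, and that the files sit on a mounted path. The function names are illustrative and are not part of Solr.

```python
import glob
import urllib.request

# Assumption: update handler URL of a local Solr install; adjust for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_bytes, solr_url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying one Solr XML update message."""
    return urllib.request.Request(
        solr_url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_files(pattern, solr_url=SOLR_UPDATE_URL):
    """POST every file matching the glob pattern, then send a commit."""
    paths = sorted(glob.glob(pattern))
    for path in paths:
        with open(path, "rb") as handle:
            request = build_update_request(handle.read(), solr_url)
        with urllib.request.urlopen(request) as response:
            response.read()
    # A commit makes the newly added documents searchable.
    commit = build_update_request(b"<commit/>", solr_url)
    with urllib.request.urlopen(commit) as response:
        response.read()
    return len(paths)
```

With the drive mounted as above, a call such as `post_files(r"Z:\home\data\delivery\*.xml")` would send each file in that folder and then commit.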

Chris Warner
1

Take a look at the DataImportHandler. I think you should be able to access a network file if it has the proper permissions set up.

Alexandre Rafalovitch
Shane Andrade
  • Shane: I have the proper permissions and also a username and password for that server, but the approach you suggest won't work because the files are on an entirely different server. I have a question for you: in the data-config file, the entity tag has an attribute called url. Can I set it like url="10.20.30.40 /home/data/delivery/textarticles.xml"? Does that work? And where do I specify the username and password for that server? – Anand Khatri Jan 23 '13 at 21:49
0

Based on your comment on Shane Andrade's answer, you will need to use the URLDataSource option of the DataImportHandler to retrieve the file via a URL. Additionally, you will need to incorporate the patch from SOLR-1490 to allow for authentication support.
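
As a rough illustration of the URLDataSource setup, the Python sketch below generates a hypothetical DataImportHandler `data-config.xml`. It assumes the other server exposes the file over HTTP at the address from the comment above, and that the file is already in Solr's `<add><doc>` format (hence `useSolrAddSchema`). The entity name is made up, and the sketch does not cover authentication, which is what SOLR-1490 is about.

```python
import xml.etree.ElementTree as ET


def build_data_config(file_url):
    """Return DataImportHandler config XML that reads one remote XML file by URL."""
    root = ET.Element("dataConfig")
    # URLDataSource fetches the entity's url over HTTP.
    ET.SubElement(root, "dataSource", {"type": "URLDataSource", "encoding": "UTF-8"})
    document = ET.SubElement(root, "document")
    ET.SubElement(
        document,
        "entity",
        {
            "name": "textarticles",  # hypothetical entity name
            "processor": "XPathEntityProcessor",
            "url": file_url,
            # Assumption: the file already uses Solr's <add><doc> schema.
            "useSolrAddSchema": "true",
        },
    )
    return ET.tostring(root, encoding="unicode")


# Assumption: the file is served over HTTP from the other server.
config_xml = build_data_config(
    "http://10.20.30.40/home/data/delivery/textarticles.xml"
)
```

The resulting string would be saved as the data-config file that the DataImportHandler request handler in solrconfig.xml points to, after which a full-import can be triggered through that handler.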

Paige Cook
  • Thank you for your reply, Paige Cook. How can I apply the SOLR-1490 patch to my existing Solr 4 installation? Do you know exactly what I need to do to apply it? The URLDataSource you mentioned is labeled for Solr 1.4; I just want to make sure it works with Solr 4. – Anand Khatri Jan 24 '13 at 14:37
  • Yes, URLDataSource works with Solr 4 (the Solr 1.4 label in the wiki indicates that it has been around since Solr 1.4). The patch consists of source code changes to the URLDataSource.java file. You will need to get the Solr source from http://lucene.apache.org/solr/versioncontrol.html, make the required changes, and recompile Solr. – Paige Cook Jan 24 '13 at 15:34
  • Thanks @PaigeCook: I have downloaded the source code from [lucene.apache.org/solr/versioncontrol.html](http://lucene.apache.org/solr/versioncontrol.html). I guess the change is already in there, so I don't need to do anything. Now let me try the URLDataSource in my Solr. I would give a thumbs up, but I don't have enough reputation to do that; I really appreciate your help. – Anand Khatri Jan 24 '13 at 15:53