
Hi, I am new to Solr. Please guide me on the following hurdles.

1) Index PDF documents in Solr

Solution tried

I used tika-app-0.9.jar to extract the content of the input PDF files into text files. Now I am trying to write Java code to index the documents into Solr.

2) Post them to a remote server

I need to post either the documents or the index to a central remote server. Can the curl command be used for this?

Regards, Balaji.

Balaji.N.S

2 Answers


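1) Index PDF documents in Solr: I believe Solr does this for you through its Tika-based extracting request handler (Solr Cell), so you should not need to run Tika yourself. You can send the files through Solr's HTTP interface or with SolrJ.

2) Post the index to a remote server: Solr replication may fit the bill.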
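
As a rough illustration of the HTTP route for point 1, here is a minimal Java sketch that posts a raw PDF to Solr's extracting request handler. It assumes Java 11 or later, that the handler is enabled at `/update/extract` on your server, and that your schema uses an `id` field. The class name `SolrPdfPoster`, the core name `mycore`, and the host are hypothetical placeholders, not something from your setup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

class SolrPdfPoster {

    // Builds the extract-handler URL for one document.
    // literal.id sets the document's id field; commit=true asks Solr to commit after the add.
    // Assumption: the handler is mapped to /update/extract in solrconfig.xml.
    static URI buildExtractUri(String solrUrl, String docId) {
        String base = solrUrl.endsWith("/")
                ? solrUrl.substring(0, solrUrl.length() - 1)
                : solrUrl;
        String id = URLEncoder.encode(docId, StandardCharsets.UTF_8);
        return URI.create(base + "/update/extract?literal.id=" + id + "&commit=true");
    }

    // Sends the PDF bytes as the request body and returns the HTTP status code.
    static int postPdf(String solrUrl, String docId, Path pdf)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(buildExtractUri(solrUrl, docId))
                .header("Content-Type", "application/pdf")
                .POST(HttpRequest.BodyPublishers.ofFile(pdf))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            // Example (hypothetical values): http://localhost:8983/solr/mycore doc1 sample.pdf
            System.out.println("usage: SolrPdfPoster <solrUrl> <docId> <pdfFile>");
            return;
        }
        int status = postPdf(args[0], args[1], Path.of(args[2]));
        System.out.println("Solr responded with HTTP " + status);
    }
}
```

Because this is a plain HTTP POST, the same request can in principle be sent with curl against a remote Solr URL, which also covers the "post the documents to a central server" case. SolrJ would be the more idiomatic choice if you are already adding a Solr client dependency.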

Yuval F

Assuming the PDFs are on a web server, you can use Nutch to fetch and parse them, and then push the index to Solr via its HTTP interface.

Butifarra