Is there a way, via configuration, to use spring-data-solr with Tika? If not, is there an alternative in spring-data-solr to SolrJ's ContentStreamUpdateRequest and its addFile method?

Currently I am using Solrj + Tika in this manner:

SolrServer server = new HttpSolrServer(URL);
...
Tika tika = new Tika();
...
String fileType = tika.detect(path.toFile());
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(path.toFile(), fileType);
up.setParam("literal.id", idField);
...
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList<Object> request = server.request(up);

I arrived at this method by successfully following this ExtractingRequestHandler guide.

Using Solr 4.3.0, is it possible to get the same result via spring-data-solr, instead of having to invoke SolrJ directly?

fish2000
Osy

1 Answer

spring-data-solr has no direct support for ContentStreamUpdateRequest. The fallback is to send the request from within a SolrCallback executed by SolrTemplate.

NamedList<Object> result = solrTemplate.execute(new SolrCallback<NamedList<Object>>() {

  @Override
  public NamedList<Object> doInSolr(SolrServer solrServer) throws SolrServerException, IOException {
    Tika tika = new Tika();
    // ...
    String fileType = tika.detect(path.toFile());
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    up.addFile(path.toFile(), fileType);
    up.setParam("literal.id", idField);
    // ...
    up.setAction(org.apache.solr.client.solrj.request.AbstractUpdateRequest.ACTION.COMMIT, true, true);
    // Return the response so that execute(...) can hand it back to the caller.
    return solrServer.request(up);
  }

});
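
The `literal.id` parameter in the callback above follows the ExtractingRequestHandler convention of `literal.<fieldname>`. If more than one field has to be set per file, the parameter names could be built by a small helper. This is only a minimal sketch that uses nothing but the JDK; `ExtractParams` and `literalParams` are hypothetical names and are not part of SolrJ or spring-data-solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (assumption), not part of SolrJ or spring-data-solr.
final class ExtractParams {

    private ExtractParams() {
    }

    // Prefixes every field name with "literal." and keeps the insertion order
    // of the input map, so "id" -> "42" becomes "literal.id" -> "42".
    static Map<String, String> literalParams(Map<String, String> fields) {
        Map<String, String> params = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            params.put("literal." + field.getKey(), field.getValue());
        }
        return params;
    }
}
```

Inside doInSolr, each entry of the returned map could then be passed to `up.setParam(entry.getKey(), entry.getValue())` in place of the single hard-coded `literal.id` call.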

If you need this behavior in several repositories, this post about adding custom methods to all repositories might help.

Christoph Strobl