My understanding is that indexing a PDF, Word, Excel, etc. document through Solr will allow searching but not highlighting. I have this code to perform the indexing:

        // `files` (the uploaded MultipartFile objects) and `destPath` come from the enclosing upload handler.
        String urlString = "http://localhost:8983/solr";
        SolrServer solr = new HttpSolrServer(urlString);

        try {
            for (MultipartFile file : files) {
                if (file.getOriginalFilename().equals("")) {
                    continue; // skip empty upload slots
                }
                File destFile = new File(destPath, file.getOriginalFilename());
                file.transferTo(destFile);

                // Build a fresh request per file so earlier files are not sent again
                // and each document gets its own literal.id.
                ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
                up.addFile(destFile);
                up.setParam("literal.id", destFile.getAbsolutePath());
                up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

                try {
                    solr.request(up);
                } catch (SolrServerException sse) {
                    sse.printStackTrace();
                }
            }
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

I have read that in order to enable highlighting I will need to "store/parse the content?" How can this be done? Thanks for your help.

James

1 Answer

You will need to modify the schema file for your Solr instance and set stored="true" for the content field. I am assuming that you are using the default field settings for the ExtractingRequestHandler and want to return highlight results against that field.
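
As a rough sketch of that schema change, assuming the extracted content ends up in a field named text (as the comments below suggest) and that the field type is called text_general (an assumption; keep whatever type your schema.xml already declares), the field definition might look like this:

```xml
<!-- schema.xml: field name and type are assumptions; only stored="true" is the point here -->
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>
```

Stored values are captured at index time, so documents indexed before the change have to be reindexed before they can return highlight snippets.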

Please see Field Options By Use Case for a matrix and notes on which field options must be enabled for highlighting and other features to work correctly.
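
Once the field is stored, highlighting is requested at query time with the hl and hl.fl parameters. Here is a minimal sketch that only builds the request URL, assuming the base URL from the question, the standard /select handler, and text as the highlighted field. The class name HighlightQuery and the method buildUrl are hypothetical names used for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class HighlightQuery {

    // URL-encodes a single query-string value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    // Builds a select URL that asks Solr for highlighted snippets from one field.
    // hl=true turns highlighting on; hl.fl names the field to highlight.
    public static String buildUrl(String baseUrl, String userQuery, String field) {
        return baseUrl + "/select?q=" + encode(userQuery)
                + "&hl=true&hl.fl=" + encode(field);
    }

    public static void main(String[] args) {
        // Assumed values: base URL from the question, "text" as the stored field.
        System.out.println(buildUrl("http://localhost:8983/solr", "annual report", "text"));
    }
}
```

If you query through SolrJ instead, SolrQuery exposes setHighlight(true) and addHighlightField("text"), which set the same two parameters.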

Paige Cook
  • Thanks Paige. I did not find a content field in the schema. I could create one, but it seems that the text field is already indexing the data. Is there any issue with changing its stored from false to true? Also, I have not changed any settings on the ExtractingRequestHandler. – James Oct 09 '12 at 21:34
  • No, there is no issue with changing the stored value on the text field, since I am assuming this is the one that you want to highlight against. – Paige Cook Oct 10 '12 at 00:36
  • That's correct. It appears that the default setting for ExtractingRequestHandler is to store its content into the text field (see below). So, I'll just keep that default and change the stored value on the text field. ( text) – James Oct 10 '12 at 16:02