I'm using Solr with Solr Cell (Tika) in schemaless mode. When I send files to be indexed, none of the literals I provide are stored in the resulting documents. Here is what is returned when I execute a *:* query in the Solr Admin UI. Each document contains only the id and _version_ fields, with none of the other literals and none of the content from the file.

    {
      "responseHeader": {
        "status": 0,
        "QTime": 2,
        "params": {
          "indent": "true",
          "q": "*:*",
          "_": "1432606194712",
          "wt": "json"
        }
      },
      "response": {
        "numFound": 3,
        "start": 0,
        "docs": [
          {
            "id": "fa8ab118-4fd2-45db-81ea-d38d533a85bd",
            "_version_": 1502169638339870700
          },
          {
            "id": "550b56ad-fd1f-4340-9a94-4c3cd7491e8d",
            "_version_": 1502191400586838000
          },
          {
            "id": "587b4c68-7a9f-4844-9829-a7d92b6bc98d",
            "_version_": 1502196460453625900
          }
        ]
      }
    }

Here is the POST I'm sending:

    POST /solr/archive/update/extract?literal.id=587b4c68-7a9f-4844-9829-a7d92b6bc98d&literal.employeeNumber=3855&literal.name=Monthly+Workforce+Report.pdf&literal.url=http%3A%2F%2Flocalhost%3A8060%2Fapp%2Fhistory%2Fdocument%2F587b4c68-7a9f-4844-9829-a7d92b6bc98d&literal.archivedDate=2015-05-25T22%3A09%3A39.000-0400&wt=javabin&version=2

I'm using SolrJ to send this request with the following code:

    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(getLocation(), null);
    ModifiableSolrParams params = new ModifiableSolrParams();
    if( id != null ) params.add(ExtractingParams.LITERALS_PREFIX + "id", id.toString() );
    params.add(ExtractingParams.LITERALS_PREFIX + "employeeNumber", employeeNumber);
    params.add(ExtractingParams.LITERALS_PREFIX + "name", name);
    params.add(ExtractingParams.LITERALS_PREFIX + "url", url.toString());
    params.add(ExtractingParams.LITERALS_PREFIX + "archivedDate", format.format(archiveDate));
    if( imageUrl != null ) params.add(ExtractingParams.LITERALS_PREFIX + "imageUrl", imageUrl.toString());
    if( categories != null ) {
        for( String cat : categories ) {
            params.add(ExtractingParams.LITERALS_PREFIX + "category", cat);
        }
    }
    req.setParams( params );
    NamedList<Object> result = server.request( req );
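
To make the mapping between these params and the POST shown earlier explicit, here is a minimal, stdlib-only sketch of how `literal.*` parameters end up URL-encoded in the query string. It does not use SolrJ and is not how SolrJ builds the request internally. The class name `LiteralQuerySketch` and the method `toQuery` are hypothetical, and the decoded sample values (for example the file name with spaces) are assumed from the encoded POST line. The only fact taken from Solr is that `ExtractingParams.LITERALS_PREFIX` is the string `"literal."`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only: it mimics the shape of the
// query string, not SolrJ's actual request-building code.
final class LiteralQuerySketch {

    // Same value as ExtractingParams.LITERALS_PREFIX in Solr Cell.
    static final String LITERALS_PREFIX = "literal.";

    // Form-encodes each entry as literal.<field>=<value>, joined with '&'.
    static String toQuery(Map<String, String> literals) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : literals.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(LITERALS_PREFIX + e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample values assumed by decoding the POST line in the question.
        Map<String, String> literals = new LinkedHashMap<>();
        literals.put("employeeNumber", "3855");
        literals.put("name", "Monthly Workforce Report.pdf");
        literals.put("archivedDate", "2015-05-25T22:09:39.000-0400");
        System.out.println(toQuery(literals));
    }
}
```

With these assumed inputs, the output matches the corresponding part of the POST, which suggests the literals do reach Solr as request parameters and that the values are lost somewhere on the server side.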

It creates the document but fails to store the literal values I supply. I'm also pretty sure that none of the metadata or content it extracts from the file is stored either.

I can confirm that if I build the document myself with the following code, everything is stored fine:

    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", id.toString());
    doc.addField("employeeNumber", employeeNumber);
    doc.addField("name", name);
    doc.addField("url", url.toString());
    if( imageUrl != null ) doc.addField("imageUrl", imageUrl.toString());
    doc.addField("location", location.getAbsolutePath());
    doc.addField("archivedDate", format.format(archiveDate) );
    if( categories != null ) {
        for( String cat : categories ) {
            doc.addField("category", cat);
        }
    }
    server.add(doc);

So why aren't these being stored?

chubbsondubs
