I'm working with Solr 4.3. I have a setup with two Solr cores: userCore and mainCore.

userCore has its own schema.xml and solrconfig.xml and is hosted on localhost:8983.

mainCore has its own, different schema and solrconfig, and has a SolrCloud setup with one shard running at localhost:8080 and the other at localhost:7574.

I post documents to a userToMain update chain defined in userCore, which indexes the document and then forwards it on to another update chain in mainCore. Documents are processed here and indexed into mainCore, and then we're done.

All this worked well until distributed search got involved. Documents were indexed successfully, as I could tell by querying the indices of the different cores and shards via Luke. However, distributed Solr queries weren't working for this setup because, as it turns out, my mainCore (i.e. the one with the SolrCloud setup) did not have a uniqueKey defined.

So I tried to remedy this. I already had the following field in the mainCore schema:

<field name="doc-id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>

I wanted this to be used as the uniqueKey, so I specified in the schema:

<uniqueKey>doc-id</uniqueKey>

Now when I post a document to userCore via

`java -Durl=http://localhost:8983/solr/userCore/update?update.chain=userToCoref -jar \
$(SOLR_HOME)/example/exampledocs/post.jar example/examplesdocs/test_doc0.xml`

I receive the error

Document is missing mandatory uniqueKey field: doc-id

not only in mainCore, in whose schema uniqueKey is actually defined, but also in userCore in whose schema there is not even a mention of a uniqueKey!

Specifically, here's part of the error for mainCore:

127578 [qtp1733460569-16] INFO  org.apache.solr.update.processor.LogUpdateProcessor  – [corefCore] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 619
127579 [qtp1733460569-16] ERROR org.apache.solr.core.SolrCore  – org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: doc-id
at org.apache.solr.update.AddUpdateCommand.getHashableId(AddUpdateCommand.java:132)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:389)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)

Part of Error for userCore:

135506 [qtp1733460569-19] INFO  org.apache.solr.update.processor.LogUpdateProcessor  – [userCore] webapp=/solr path=/update params={update.chain=userToCoref} {} 0 628
135507 [qtp1733460569-19] ERROR org.apache.solr.core.SolrCore  – org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Document is missing mandatory uniqueKey field: doc-id
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)

In summary, what baffles me is two-fold:

1) Why, when I have the doc-id field defined in my mainCore schema, point uniqueKey to it, and see it indexed under all other circumstances, does Solr complain that "Document is missing mandatory uniqueKey field: doc-id"?

2) Even if something is indeed wrong with mainCore in terms of this field, why does userCore also seem to complain about it? They are on totally different servers, with totally different configs. All userCore does is post the documents it receives to mainCore, at the URL of that core.

Any help would be much appreciated!

EDIT: I wanted to provide some answers to the comments. The original document posted to userCore, test_doc0.xml, looks like this:

<add><doc>

<field name="docid">docid0</field>

<field name="coref_input">Bill Clinton was the 42nd president. Clinton's wife Hillary is currently Secretary of State. Hillary Clinton ran for president unsuccessfully.</field>

</doc></add>

After it is indexed in userCore, it is sent to mainCore for processing via this logic in the relevant update request processor, UserToMainUpdateRequestProcessor.java:

public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument userDoc = cmd.getSolrInputDocument();

    // If the document carries the input field, it must also carry a single-valued doc ID field.
    SolrInputField userInputField = userDoc.getField(inputField);
    if (userInputField != null) {
        SolrInputField userDocIdField = userDoc.getField(docIdField);
        if (userDocIdField == null || userDocIdField.getValueCount() > 1) {
            throw new RuntimeException(docIdField + " must be present and single-valued");
        }
    }

    // Forward the unmodified document to mainCore and commit there.
    try {
        mainServer.add(userDoc);
        mainServer.commit();
    } catch (SolrServerException e) {
        throw new RuntimeException(e);
    }

    // Continue with the rest of the userCore update chain.
    super.processAdd(cmd);
}

where mainServer is defined in UserToMainUpdateRequestProcessorFactory.java as:

mainServer = new HttpSolrServer("http://localhost:8080/solr/mainCore");

Thus userCore posts a doc to mainCore, and mainCore does a bunch of processing to produce some more fields like this (I can't include the full document):

Name_Data: hillary clinton
Name_FullnameOverrides: enghillary clinton
Name_CompletedData: hillary clinton
name-token-count: 2
doc-id: docid0
doc-language: eng
indoc-chain-id: 5
longest-mention: Hillary Clinton
confidence: 0.9443013649773926

Can you add what your document (test_doc0.xml) looks like? The error might be caused by it. – Fuxi Aug 02 '13 at 22:18

userCore logs the REMOTE exception from mainCore, no surprises here. You mentioned that you process on userCore and then forward processing to mainCore. How do you accomplish this? Are you sure that the very same document is passed to mainCore? I think it's worth debugging the update processor in mainCore to see what enters. – lexk Aug 04 '13 at 16:01

1 Answer

Your schema defines doc-id and your document contains a field named docid (no dash).

These fields need to match exactly.
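
To make the mismatch concrete, here is a minimal sketch in plain Java. It assumes a Map as a stand-in for the Solr document, and the names UniqueKeyCheck, hasUniqueKey and renameField are hypothetical helpers for illustration, not Solr or SolrJ APIs. It only shows that a lookup under doc-id finds nothing when the value was posted under docid.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-in for the uniqueKey check; a Map models the document's fields.
// This is an illustration only and does not call any Solr code.
class UniqueKeyCheck {

    // True when the document carries a non-null value for the schema's uniqueKey field.
    static boolean hasUniqueKey(Map<String, Object> doc, String uniqueKey) {
        return doc.get(uniqueKey) != null;
    }

    // Returns a copy of the document with one field renamed; other fields keep their order.
    static Map<String, Object> renameField(Map<String, Object> doc, String from, String to) {
        Map<String, Object> renamed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            renamed.put(e.getKey().equals(from) ? to : e.getKey(), e.getValue());
        }
        return renamed;
    }

    public static void main(String[] args) {
        // Same field names as test_doc0.xml in the question.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("docid", "docid0");
        doc.put("coref_input", "Bill Clinton was the 42nd president.");

        System.out.println(hasUniqueKey(doc, "doc-id"));   // false: "docid" is not "doc-id"

        Map<String, Object> fixed = renameField(doc, "docid", "doc-id");
        System.out.println(hasUniqueKey(fixed, "doc-id")); // true
    }
}
```

In the actual setup, the equivalent fix would presumably be either to post the field as doc-id in test_doc0.xml or to rename it in the update processor before the document is forwarded to mainCore.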

Mason G. Zhwiti