I'm working with Solr 4.3.
I have a setup with two Solr cores: userCore and mainCore.
userCore has its own schema.xml and solrconfig.xml and is hosted on localhost:8983.
mainCore has its own, different schema and solrconfig, and has a SolrCloud setup with two shards: one running at localhost:8080, the other at localhost:7574.
I post documents to a userToMain update chain defined in userCore, which indexes the document and then forwards it to another update chain in mainCore. Documents are processed there and indexed into mainCore, and then we're done.
All this worked well until distributed search got involved:
Documents were indexed successfully, as I could tell by querying the indexes of the different cores and shards via Luke. However, distributed Solr queries weren't working for this setup because, as it turns out, my mainCore (i.e. the one with the SolrCloud setup) did not have a uniqueKey defined.
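Why a uniqueKey matters for distributed search can be sketched in plain Java. Roughly, the coordinating node first collects IDs and scores from every shard, merges them, and then asks the shards for the stored fields of the winning IDs, so each hit needs a uniqueKey value both to be merged and to be fetched. The model below covers only that merge step. The Hit record and ShardMergeSketch class are hypothetical, and keeping the higher-scoring copy of a duplicate ID is this sketch's simplification, not necessarily Solr's own rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ShardMergeSketch {
    // What a shard reports in the first phase: only the uniqueKey value and a score.
    record Hit(String id, double score) {}

    // Merge per-shard hit lists into a global top-k keyed on the uniqueKey value.
    // If the same id comes back from several shards, the higher-scoring copy is kept.
    static List<Hit> mergeTopK(List<List<Hit>> shardResults, int k) {
        Map<String, Hit> byId = new LinkedHashMap<>();
        for (List<Hit> shard : shardResults) {
            for (Hit h : shard) {
                if (h.id() == null) {
                    // No key: nothing to merge on, and no way to fetch stored fields later.
                    throw new IllegalStateException("hit has no uniqueKey value");
                }
                byId.merge(h.id(), h, (a, b) -> a.score() >= b.score() ? a : b);
            }
        }
        List<Hit> merged = new ArrayList<>(byId.values());
        merged.sort(Comparator.comparingDouble(Hit::score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }

    public static void main(String[] args) {
        List<Hit> shard1 = List.of(new Hit("docid0", 0.9), new Hit("docid1", 0.5));
        List<Hit> shard2 = List.of(new Hit("docid2", 0.7));
        // Second phase (not modelled): fetch stored fields for these ids from their shards.
        System.out.println(mergeTopK(List.of(shard1, shard2), 2));
    }
}
```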
So I tried to remedy this. I already had the following field in the mainCore schema:
<field name="doc-id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
I wanted this to be used as the uniqueKey, so I specified in the schema:
<uniqueKey>doc-id</uniqueKey>
Now when I post a document to userCore via
`java -Durl="http://localhost:8983/solr/userCore/update?update.chain=userToCoref" -jar \
$SOLR_HOME/example/exampledocs/post.jar example/examplesdocs/test_doc0.xml`
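For reference, a request similar in spirit to the post.jar call above can be built with the JDK's java.net.http client (Java 11+). This is a sketch, not what post.jar does internally: the buildUpdateRequest helper is hypothetical and the application/xml content type is an assumption. Building the request needs nothing; sending it needs a running Solr.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class PostToUpdateChain {
    // Build a POST of an XML <add> body to <coreUrl>/update, selecting the update
    // chain via the update.chain request parameter (the same parameter used in -Durl).
    static HttpRequest buildUpdateRequest(String coreUrl, String chain, String xmlBody) {
        String url = coreUrl + "/update?update.chain=" + URLEncoder.encode(chain, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/xml; charset=utf-8") // assumed content type
                .POST(HttpRequest.BodyPublishers.ofString(xmlBody, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<add><doc><field name=\"docid\">docid0</field></doc></add>";
        HttpRequest req = buildUpdateRequest("http://localhost:8983/solr/userCore", "userToCoref", xml);
        // Requires a running Solr; on failure the response body typically carries Solr's error message.
        HttpResponse<String> resp = HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```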
I receive the error
Document is missing mandatory uniqueKey field: doc-id
not only in mainCore, in whose schema uniqueKey is actually defined, but also in userCore, in whose schema there is not even a mention of a uniqueKey!
Specifically, here's part of the error for mainCore:
127578 [qtp1733460569-16] INFO org.apache.solr.update.processor.LogUpdateProcessor – [corefCore] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 619
127579 [qtp1733460569-16] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: doc-id
at org.apache.solr.update.AddUpdateCommand.getHashableId(AddUpdateCommand.java:132)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:389)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
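The mainCore trace fails inside AddUpdateCommand.getHashableId, called from DistributedUpdateProcessor.processAdd, which suggests the document is hashed on its uniqueKey to choose a shard before it is indexed. A toy version of that routing step is below. It rests on simplifying assumptions: String.hashCode() stands in for Solr's real hash function, and modulo stands in for the per-shard hash ranges SolrCloud actually assigns. ShardRouterSketch and shardFor are hypothetical names.

```java
import java.util.Map;

final class ShardRouterSketch {
    // Pick a shard for a document from a hash of its uniqueKey value.
    // Simplifications: String.hashCode() replaces Solr's hash function, and
    // modulo replaces the per-shard hash ranges SolrCloud actually uses.
    static int shardFor(Map<String, ?> doc, String uniqueKeyField, int numShards) {
        Object id = doc.get(uniqueKeyField);
        if (id == null) {
            // Routing needs the key, so a document without it is rejected before indexing.
            throw new IllegalArgumentException(
                    "Document is missing mandatory uniqueKey field: " + uniqueKeyField);
        }
        return Math.floorMod(id.toString().hashCode(), numShards);
    }

    public static void main(String[] args) {
        System.out.println(shardFor(Map.of("doc-id", "docid0"), "doc-id", 2));
        try {
            shardFor(Map.of("docid", "docid0"), "doc-id", 2);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same key always maps to the same shard, which is why the key has to exist at the moment routing happens.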
Part of the error for userCore:
135506 [qtp1733460569-19] INFO org.apache.solr.update.processor.LogUpdateProcessor – [userCore] webapp=/solr path=/update params={update.chain=userToCoref} {} 0 628
135507 [qtp1733460569-19] ERROR org.apache.solr.core.SolrCore – org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Document is missing mandatory uniqueKey field: doc-id
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
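The userCore trace shows the same message arriving as HttpSolrServer$RemoteSolrException from inside SolrServer.add, i.e. it was raised on the mainCore side and reported back to the caller over HTTP. Because that forwarding call blocks until mainCore answers, a failure there surfaces in userCore's own request. The JDK-only model below illustrates this; ForwardingSketch, RemoteException and the Map-as-document representation are hypothetical stand-ins for the SolrJ classes, and the forward-then-index order mirrors the processAdd code in the EDIT section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class ForwardingSketch {
    // Hypothetical stand-in for an error that a remote core reported back over HTTP.
    static final class RemoteException extends RuntimeException {
        RemoteException(String message) { super(message); }
    }

    // Forward to the remote core first, then hand the document to the local chain,
    // in the same order as the processAdd method quoted in the EDIT.
    static void processAdd(Map<String, ?> doc,
                           Consumer<Map<String, ?>> remoteAdd,
                           List<Map<String, ?>> localIndex) {
        remoteAdd.accept(doc);  // synchronous: a remote failure is thrown right here
        localIndex.add(doc);    // only reached if the remote add succeeded
    }

    public static void main(String[] args) {
        List<Map<String, ?>> local = new ArrayList<>();
        // A "mainCore" that rejects documents without doc-id.
        Consumer<Map<String, ?>> strictRemote = d -> {
            if (!d.containsKey("doc-id")) {
                throw new RemoteException("Document is missing mandatory uniqueKey field: doc-id");
            }
        };
        try {
            processAdd(Map.of("docid", "docid0"), strictRemote, local);
        } catch (RemoteException e) {
            System.out.println("caller sees: " + e.getMessage() + " (local docs: " + local.size() + ")");
        }
    }
}
```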
In summary, what baffles me is twofold:
1) Why, when I actually have the doc-id field defined in my mainCore schema, when I point uniqueKey to it, and when it does get indexed under all other circumstances, does Solr complain that "Document is missing mandatory uniqueKey field: doc-id"?
2) Even if something is indeed wrong with mainCore in terms of this field, why oh why does userCore also seem to be complaining about it? They are on totally different servers, with totally different configs. All userCore does is post the documents it receives to mainCore, using the mainCore URL it is configured with.
Any help would be much appreciated!
EDIT: I wanted to provide some answers to the comments.
The original document posted to userCore, test_doc0.xml, looks like this:
<add><doc>
<field name="docid">docid0</field>
<field name="coref_input">Bill Clinton was the 42nd president. Clinton's wife Hillary is
currently Secretary of State. Hillary Clinton ran for president
unsuccessfully.</field>
</doc></add>
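Solr matches field names exactly, so one quick sanity check before forwarding is to list which fields the target schema requires (here the uniqueKey doc-id) that the outgoing document does not carry. A minimal sketch, with documents modelled as plain maps; RequiredFieldCheck and missingRequired are hypothetical names, not SolrJ API, and treating the uniqueKey as the only required field is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RequiredFieldCheck {
    // Return, in sorted order, the required field names that the document lacks.
    // Names are compared exactly: "docid" and "doc-id" are different fields.
    static List<String> missingRequired(Map<String, ?> doc, Set<String> requiredFields) {
        List<String> missing = new ArrayList<>();
        for (String field : new TreeSet<>(requiredFields)) {
            if (!doc.containsKey(field)) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Field names taken from test_doc0.xml; required set assumed to be just the uniqueKey.
        Map<String, String> doc = Map.of("docid", "docid0", "coref_input", "Bill Clinton was the 42nd president.");
        System.out.println(missingRequired(doc, Set.of("doc-id")));
    }
}
```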
After it is indexed into userCore, it is sent to mainCore for processing via the following logic in the relevant update request processor, userToMainUpdateRequestProcessor.java:
public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument userDoc = cmd.getSolrInputDocument();
    // Only documents that carry the input field are validated
    // (inputField and docIdField hold field names).
    SolrInputField userInputField = userDoc.getField(inputField);
    if (userInputField != null) {
        SolrInputField userDocIdField = userDoc.getField(docIdField);
        if (userDocIdField == null || userDocIdField.getValueCount() > 1) {
            throw new RuntimeException(docIdField + " must be present and single-valued");
        }
    }
    try {
        // Forward the document unchanged to mainCore and commit immediately.
        mainServer.add(userDoc);
        mainServer.commit();
    } catch (SolrServerException e) {
        throw new RuntimeException(e);
    }
    // Hand the command to the rest of userCore's chain (local indexing).
    super.processAdd(cmd);
}
where mainServer is defined in UserToMainUpdateRequestProcessorFactory.java as:
mainServer = new HttpSolrServer("http://localhost:8080/solr/mainCore");
Thus userCore posts a doc to mainCore, and mainCore does a bunch of processing to produce some more fields like these (I can't include the full document):
Name_Data: hillary clinton
Name_FullnameOverrides: enghillary clinton
Name_CompletedData: hillary clinton
name-token-count: 2
doc-id: docid0
doc-language: eng
indoc-chain-id: 5
longest-mention: Hillary Clinton
confidence: 0.9443013649773926
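In the output above, doc-id shows up among the fields present after mainCore's processing, while the mainCore trace shows DistributedUpdateProcessor asking for it. Whether a field exists when a given processor runs depends on that processor's position in the chain. The toy chain below illustrates the ordering effect only; DERIVE_DOC_ID (copying docid into doc-id) is a hypothetical stand-in for whichever mainCore processor actually produces doc-id, and ROUTE is a hypothetical stand-in for the routing step.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class ChainOrderSketch {
    // Hypothetical enrichment step: copies "docid" into "doc-id", standing in for
    // whichever mainCore processor really produces doc-id.
    static final UnaryOperator<Map<String, Object>> DERIVE_DOC_ID = doc -> {
        Map<String, Object> out = new HashMap<>(doc);
        if (doc.containsKey("docid")) {
            out.put("doc-id", doc.get("docid"));
        }
        return out;
    };

    // Stand-in for a routing step that, like DistributedUpdateProcessor in the trace, needs doc-id.
    static final UnaryOperator<Map<String, Object>> ROUTE = doc -> {
        if (!doc.containsKey("doc-id")) {
            throw new IllegalStateException("doc-id missing when routing step ran");
        }
        return doc;
    };

    // Apply the steps in list order, feeding each step's output to the next.
    static Map<String, Object> run(List<UnaryOperator<Map<String, Object>>> chain, Map<String, Object> doc) {
        Map<String, Object> current = doc;
        for (UnaryOperator<Map<String, Object>> step : chain) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> input = Map.of("docid", "docid0");
        System.out.println(run(List.of(DERIVE_DOC_ID, ROUTE), input)); // enrich, then route: succeeds
        try {
            run(List.of(ROUTE, DERIVE_DOC_ID), input);                  // route first: fails
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```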