
I am running into an issue where a query to our Solr search returns different values on repeated runs, even though I am querying on the id, which is set as the Unique Key Field.


In the Solr Admin UI I run a query for a single id.

The relevant response info is below.

 "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": 7.4537606,
    "docs": [
      {
        "title": [
          "ICARDA forced to move"
        ],
        "moduleid_s": "58",
        "id": "client1.com.58.1673",
        "enddate_dt": "2015-09-25T23:59:00Z",
        "url": "mysite.com/item.aspx?id=1673",
        "startdate_dt": "2015-09-25T00:00:00Z",

Now running that query a few times will eventually lead to a different response.

 "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": 7.453251,
    "docs": [
      {
        "title": [
          "ICARDA forced to move"
        ],
        "moduleid_s": "58",
        "id": "client1.com.58.1673",
        "enddate_dt": "2015-09-25T23:59:00Z",
        "url": "mysiteNewUrl.com/item.aspx?id=1673",
        "startdate_dt": "2015-09-25T00:00:00Z",

Notice that the url is different.

With Debug Query checked, you can see that the different urls show up in the GET_FIELDS section.

Why/how can I get different information? I'm querying on the id, which is marked as the unique field. From my understanding there should never be more than one of those. Could this be a synchronization issue? I'm using the Solr Admin UI query with a single core selected.

Is there a way to check whether only one document with that id is in the index?

UPDATE:

I ran a facet query, and that unique id returns a count of 2:

<lst name="facet_fields">
 <lst name="id">
<int name="client1.com.58.1673">2</int>

vs one that isn't having the issue.

<lst name="facet_fields">
 <lst name="id">
<int name="client1.com.58.163">1</int>

Is this right? Does this explain my issue, in that there are duplicate documents? If that's the case, why aren't two documents being returned instead of one document with different data?

Adam
  • This can happen only if you have re-indexed, or indexed another document with the same id. – YoungHobbit Sep 23 '15 at 17:39
  • @YoungHobbit: So you can in fact have two documents with the same Unique Key Field. I was under the impression that Solr would just overwrite if the Unique Key Field was the same. – Adam Sep 23 '15 at 18:53
  • It should, unless you explicitly tell it to ignore the uniqueKey constraint. I'm guessing there might have been a change in the cluster layout (or a changed document router) between the two indexing jobs, resulting in documents having been routed differently. – MatsLindh Sep 23 '15 at 20:56
  • @MatsLindh, that's what I was thinking. We recently upgraded to Solr 5.3, from 4.6 I believe. Could that have caused the issue? I ran some facet queries, see my update in the question. – Adam Sep 23 '15 at 21:13

1 Answer

Is this a SolrCloud setup or a single-collection one? If it is cloud, you most likely ended up with one record in two different cores, possibly due to a router or an upgrade bug.
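One way to test the "one record in two cores" theory is to ask each core separately. The sketch below only builds the request URLs; it assumes that `distrib=false` keeps the request on the core it is sent to, and the core URLs are hypothetical placeholders for whatever your cluster actually exposes.

```python
from urllib.parse import urlencode


def per_core_query_urls(core_urls, doc_id):
    """Build one non-distributed select URL per core for a single id.

    core_urls are hypothetical; substitute the real core URLs of your cluster.
    """
    params = urlencode({
        "q": f'id:"{doc_id}"',   # exact match on the uniqueKey field
        "distrib": "false",      # assumption: restricts the query to this core only
        "fl": "id,url",          # only the fields that differed in the question
        "wt": "json",
    })
    return [f"{core.rstrip('/')}/select?{params}" for core in core_urls]


if __name__ == "__main__":
    # Hypothetical core names, for illustration only.
    cores = [
        "http://localhost:8983/solr/collection1_shard1_replica1",
        "http://localhost:8983/solr/collection1_shard2_replica1",
    ]
    for url in per_core_query_urls(cores, "client1.com.58.1673"):
        print(url)
```

If more than one core returns a document for the same id, the two copies live in different cores, which would fit the alternating `url` values.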

The good news is that you should be able to find all the records that have this problem by faceting with facet.field=id and facet.mincount=2. You could then delete and reinsert them for consistency.
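As a rough sketch of that facet check, the helpers below build the query URL and pull the duplicated ids out of a JSON response. The base URL is a hypothetical placeholder, the function names are made up for illustration, and the parser assumes the JSON writer's flat `[term, count, term, count, ...]` facet layout, so adjust it if your `json.nl` setting differs.

```python
from urllib.parse import urlencode


def duplicate_id_query(base_url, field="id"):
    """Build a select URL that facets on `field` and keeps counts >= 2."""
    params = {
        "q": "*:*",
        "rows": 0,              # only the facet counts are needed
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 2,    # hide ids that occur only once
        "facet.limit": -1,      # no cap on facet values; may be slow on a large index
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


def duplicated_ids(response, field="id"):
    """Return {id: count} for ids seen more than once.

    Assumes the flat list layout: [term1, count1, term2, count2, ...].
    """
    flat = response["facet_counts"]["facet_fields"][field]
    pairs = zip(flat[0::2], flat[1::2])
    return {term: count for term, count in pairs if count >= 2}


if __name__ == "__main__":
    # Hypothetical base URL, for illustration only.
    print(duplicate_id_query("http://localhost:8983/solr/collection1"))

    # Hand-written sample using the two ids from the question.
    sample = {"facet_counts": {"facet_fields": {
        "id": ["client1.com.58.1673", 2, "client1.com.58.163", 1]}}}
    print(duplicated_ids(sample))
```

The ids this returns are the ones to delete and reindex.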

And no, you should not be able to end up in this state, so there is either a misconfiguration, an upgrade failure, or some forced command that ignores the unique requirement.

Alexandre Rafalovitch