I'm searching documents in my index and then trying to fetch some of them by _id. Although the search returns a set of results, some of those documents cannot be retrieved with a simple get. Worse still, I CAN get the same document with a URI search using ?q=_id:<the id>.

Just for example, running a simple GET

curl -XGET 'http://localhost:9200/keepbusy_process__issuer_application/KeepBusy__Activities__Activity/neHSKSBCSv-OyAYn3IFcew'

Gives me the result:

{
  "_index" : "keepbusy_process__issuer_application",
  "_type" : "KeepBusy__Activities__Activity",
  "_id" : "neHSKSBCSv-OyAYn3IFcew",
  "exists" : false
}

But if I do a search with the same _id:

curl -XGET 'http://localhost:9200/keepbusy_process__issuer_application/KeepBusy__Activities__Activity/_search?q=_id:neHSKSBCSv-OyAYn3IFcew'

I get the expected result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "keepbusy_process__issuer_application",
        "_type": "KeepBusy__Activities__Activity",
        "_id": "neHSKSBCSv-OyAYn3IFcew",
        "_score": 1.0,
        "_source": {
          "template_uid": "KeepBusy__Activities__Activity.create application",
          "name": "create application",
          "updated_at": "2014-01-08T10:02:33-05:00",
          "updated_at_ms": 1389193353975
        }
      }
    ]
  }
}

I'm indexing documents through the Stretcher Ruby API, and I do a refresh immediately after indexing. My local setup has 2 nodes. I'm running v0.90.9.

Nothing in the logs obviously explains why this fails. I've restarted the cluster and everything appears to start correctly, but the result is the same.

Is there something I'm missing or some way I can further diagnose this issue?

Mark van Straten
Phil

1 Answer

This issue typically occurs when documents are indexed with non-default routing (either set explicitly or derived from the parent's id in the case of parent/child documents). If this is the case, try specifying the correct routing in your get request.
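
As a minimal sketch of what specifying routing on a get looks like: the get API accepts a `routing` query parameter, and for a child document the routing value is normally the parent's id. The helper name `doc_url`, the `ES_HOST` variable, and the value `some_parent_id` are hypothetical, chosen only for illustration; the index, type, and id are the ones from the question. The script prints the commands instead of running them, so it works without a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not from the question): build a get-by-id URL,
# optionally appending a routing value as a query parameter.
ES_HOST="${ES_HOST:-http://localhost:9200}"

doc_url() {
  # $1 = index, $2 = type, $3 = document id, $4 = optional routing value
  url="$ES_HOST/$1/$2/$3"
  if [ -n "${4:-}" ]; then
    url="$url?routing=$4"
  fi
  printf '%s\n' "$url"
}

index=keepbusy_process__issuer_application
type=KeepBusy__Activities__Activity
id=neHSKSBCSv-OyAYn3IFcew

# Plain get: the shard is chosen from the _id alone.
echo "curl -XGET '$(doc_url "$index" "$type" "$id")'"

# Routed get: "some_parent_id" is a placeholder for the real routing value.
echo "curl -XGET '$(doc_url "$index" "$type" "$id" some_parent_id)'"
```

If the routed request returns the document while the plain one reports "exists" : false, the document was indexed with that routing value.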

Artem Bernatskyi
imotov
  • Unfortunately this is not the case. I haven't specified the routing at any time. Equally, I've avoided parent/child relationships, as they have bitten me in the past. Thinking along these lines though, could there be any issue with having two nodes running on the same machine? – Phil Jan 08 '14 at 19:07
  • Running two nodes on the same machine shouldn't be a problem. I missed the fact that after restart the problem disappeared, which means it wasn't routing, but most likely a transaction log-related issue. If you are able to reproduce it again, please try executing the get request with the `realtime` flag set to `false` to see if the record shows up in get. – imotov Jan 08 '14 at 21:31
  • I'll try the realtime flag. Actually, I just edited my question to confirm that restarting the services does not make any difference to the result. – Phil Jan 08 '14 at 22:09
  • If restarting doesn't make a difference, please try running GET with different values for the `routing` parameter (0,1,2,...10) to see if the record shows up. – imotov Jan 08 '14 at 23:51
  • Having finally rebuilt the test environment, it does seem that there was a routing issue. I'm not sure what the cause was, so I've accepted your answer for pointing me in the right direction. – Phil Jan 21 '14 at 17:28
  • I'm seeing similar issues. Here is my question: http://stackoverflow.com/questions/24060094/elasticsearch-get-by-id-doesnt-work-but-document-exists – lukewm Jun 05 '14 at 12:22
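
The two diagnostic steps suggested in the comments above (a get with `realtime` set to `false`, then a sweep over `routing` values 0 through 10) can be sketched as a small script. The helper name `probe_urls` and the `ES_HOST` variable are hypothetical and not from the thread; the index, type, and id are the ones from the question. The script only prints the curl commands, so it can be read or tested without a running cluster.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): list the get URLs worth trying
# when get-by-id reports "exists": false although a search finds the document.
ES_HOST="${ES_HOST:-http://localhost:9200}"

probe_urls() {
  # $1 = index, $2 = type, $3 = document id
  base="$ES_HOST/$1/$2/$3"
  # Step 1: non-realtime get, to rule out a transaction log-related issue.
  printf '%s\n' "$base?realtime=false"
  # Step 2: sweep routing values 0 through 10.
  i=0
  while [ "$i" -le 10 ]; do
    printf '%s\n' "$base?routing=$i"
    i=$((i + 1))
  done
}

# Print one curl command per candidate URL.
probe_urls keepbusy_process__issuer_application KeepBusy__Activities__Activity neHSKSBCSv-OyAYn3IFcew |
while IFS= read -r url; do
  printf "curl -XGET '%s'\n" "$url"
done
```

Piping the printed commands to `sh` runs them against the cluster. A response with "exists" : true for one of the routing values would point to a routing problem rather than a transaction log one.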