I'm using Elasticsearch 6.2.3 and indexing documents whose IDs are URLs. When I query the index:

GET /ecm_sync/_search
{
  "query": {"match_all": {}}
}

I get:

...
"hits": [
      {
        "_index": "ecm_sync",
        "_type": "doc",
        "_id": "workspace://SpacesStore/07dfa82d-c6ce-469d-b881-4fab6cd9a277",
        "_score": 1,
...

Now, if I take this ID, percent-encode it, and try to GET the document directly:

GET /ecm_sync/_doc/workspace%3A%2F%2FSpacesStore%2F07dfa82d-c6ce-469d-b881-4fab6cd9a277

I get:

{
  "_index": "ecm_sync",
  "_type": "_doc",
  "_id": "workspace://SpacesStore/07dfa82d-c6ce-469d-b881-4fab6cd9a277",
  "found": false
}

The same thing happens with both Kibana and curl. I've seen that an issue about this was opened a long time ago, but it was closed, so I don't know whether I'm doing something wrong.

Bade
  • Have you tried an ID that doesn't include special characters? Just the UUID part? Does that happen to work? – Phil May 09 '18 at 07:45
  • Yes and it doesn't work. I'm aware of ES tokenizing fields by special characters so I wasn't sure if it does the same for IDs. – Bade May 09 '18 at 07:47
  • Take a look at this question / answer, related to routing with non-default IDs: https://stackoverflow.com/questions/21003370/elasticsearch-returns-document-in-search-but-not-in-get ... I'm not offering it as an answer as it is old, but maybe there is something in there you can use. In the past I had to specify a mapping to prevent tokenization. That may not be the case any longer. – Phil May 09 '18 at 09:01
  • Sorry, but I don't understand the routing. I've never set it directly. Is there any way to list all routings? – Bade May 09 '18 at 12:56
  • Possibly, but I haven't used elasticsearch in a while, so I'd need to search for it. – Phil May 09 '18 at 13:57

1 Answer

An _id in this format (containing special/reserved characters) will likely produce a warning or an error when used in a "query-string search". Use a request body search instead:

GET ecm_sync/doc/_search
{
  "query": {
    "term": {
      "_id": {
        "value": "workspace://SpacesStore/07dfa82d-c6ce-469d-b881-4fab6cd9a277"
      }
    }
  }
}

If you still need to search for such a tricky string via query-string search (which is not recommended), you have to escape all reserved characters manually:

GET ecm_sync/doc/_search?q=_id:workspace\:\/\/SpacesStore\/07dfa82d\-c6ce\-469d\-b881\-4fab6cd9a277

The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
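
The manual escaping above can also be done programmatically. What follows is only a minimal sketch in Python, not part of any Elasticsearch client: the helper name `escape_query_string` is hypothetical, and it assumes the reserved-character list given above. It escapes each `&` and `|` individually to cover `&&` and `||`, and it removes `<` and `>` because the linked documentation states those two cannot be escaped at all.

```python
import re

# Reserved characters from the list above. "&&" and "||" are covered by
# escaping every "&" and "|" individually.
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_query_string(value: str) -> str:
    """Backslash-escape query_string reserved characters (hypothetical helper)."""
    # "<" and ">" cannot be escaped, so they are dropped entirely.
    value = value.replace("<", "").replace(">", "")
    # Prefix each remaining reserved character with a backslash.
    return _RESERVED.sub(r"\\\1", value)


doc_id = "workspace://SpacesStore/07dfa82d-c6ce-469d-b881-4fab6cd9a277"
print("_id:" + escape_query_string(doc_id))
```

The printed value is the `q=` parameter for the query-string search; it still has to be URL-encoded by whatever HTTP client sends the request.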

RomanPerekhrest
  • This works, thanks, but I guess there has to be a correct way to refer to a document by ID (so I'll wait a bit before accepting your answer). – Bade May 09 '18 at 08:41
  • @Bade, see my update. But you should prefer a "request body search" in such cases. – RomanPerekhrest May 09 '18 at 09:34
  • Yes, this works too. Too bad I can't escape it the same way in the ID part of a _doc GET. – Bade May 09 '18 at 12:57