I'm seeing some odd behavior in Elasticsearch when searching on grandchild documents. My grandchildren don't seem to be recognized for every parent document. When I ask Elasticsearch for the children of a parent, it returns all the expected hits. But when I ask for the children that have grandchildren, I get incorrect results: sometimes no hits, sometimes fewer than expected. I checked the routing and parent id of each grandchild, and the parent documents they point to do exist, so I can't understand why the results are wrong. Has anybody encountered this kind of issue? I checked my code three times and didn't find any error. Here are the steps to reproduce it.

Here is my mapping:

PUT /test_index

{
   "mappings":{
        "parentDoc":{
            "properties":{
                "id":{
                    "type":"integer"
                },
                "name":{
                    "type":"text"
                }
            }
        },
        "childDoc": {
            "_parent": {
                "type": "parentDoc"
            },
            "properties":{
                "id":{
                    "type":"integer"
                },
                "name":{
                   "type":"text"
                },
                "contact": {
                    "type":"text"
                }
            }
        },
        "grandChildDoc": {
            "_parent": {
                "type": "childDoc"
            },
            "properties":{
                "id":{
                    "type":"integer"
                },
                "description":{
                   "type":"text"
                }
            }
        }
    }
}

Indexing parentDoc:

PUT /test_index/parentDoc/1

{
    "pdId":1,
    "name": "First parentDoc"
}

PUT /test_index/parentDoc/2

{
    "pdId":2,
    "name": "Second parentDoc"
}

Indexing childDoc:

PUT /test_index/childDoc/10?parent=1

{
    "cdId":10,
    "name": "First childDoc",
    "contact" : "+XX0000000000"
}

PUT /test_index/childDoc/101?parent=1

{
    "cdId":101,
    "name": "Second childDoc",
    "contact" : "+XX0000000111"
}

PUT /test_index/childDoc/20?parent=2

{
    "cdId":20,
    "name": "Third childDoc",
    "contact" : "+XX0011100000"
}

Indexing grandChildDoc:

PUT /test_index/grandChildDoc/100?parent=10

{
    "gcdId":100,
    "name": "First grandChildDoc"
}

PUT /test_index/grandChildDoc/200?parent=10

{
    "gcdId":200,
    "name": "Second grandChildDoc"
}

PUT /test_index/grandChildDoc/300?parent=20

{
    "gcdId":300,
    "name": "Third grandChildDoc"
}

Now I ask Elasticsearch to show me the parentDoc documents that have a childDoc:

POST /test_index/parentDoc/_search

{
    "query": {
        "has_child": {
            "type": "childDoc",
            "query": {
                "match_all": {}
            }
        }
    }
}

Result (this seems fine):

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 1,
        "hits": [
            {
                "_index": "test_index",
                "_type": "parentDoc",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "pdId": 2,
                    "name": "Second parentDoc"
                }
            },
            {
                "_index": "test_index",
                "_type": "parentDoc",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "pdId": 1,
                    "name": "First parentDoc"
                }
            }
        ]
    }
}

Next I ask Elasticsearch to show me the childDoc documents that have a grandChildDoc:

POST /test_index/childDoc/_search

{
    "query": {
        "has_child": {
            "type": "grandChildDoc",
            "query": {
                "match_all": {}
            }
        }
    }
}

Result (here you will notice that a hit is missing: childDoc with id 10 is not returned, even though grandChildDoc 100 and 200 were indexed with parent=10):

{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "test_index",
                "_type": "childDoc",
                "_id": "20",
                "_score": 1,
                "_routing": "2",
                "_parent": "2",
                "_source": {
                    "cdId": 20,
                    "name": "Third childDoc",
                    "contact": "+XX0011100000"
                }
            }
        ]
    }
}

Any idea what mistake I'm making? Or is it a bug? Is there any workaround or solution?

[Note: I'm using elasticsearch v5.4]

1 Answer

I have got the same setup working. I am using Logstash to index the documents into Elasticsearch.

Root Cause:

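I looked into the root cause. By default, Elasticsearch creates an index with 5 primary shards, and all documents of one parent-child-grandchild family must be located on the same shard. In your case the data is spread across the shards: a grandchild indexed with only parent=10 is routed by the value 10, while its parent childDoc 10 was routed by the value 1, so the two can end up on different shards. Elasticsearch only returns the records whose related documents are on the same shard.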
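To make the routing mismatch concrete, here is a minimal sketch of how a routing value is chosen and mapped to a shard, following shard_num = hash(_routing) % num_primary_shards. The helper names `shard_for` and `effective_routing` are made up for illustration, and `zlib.crc32` is only a stand-in for Elasticsearch's own hash, so the shard numbers it produces will not match a real cluster.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # default number of primary shards in Elasticsearch 5.x


def shard_for(routing, num_shards=NUM_PRIMARY_SHARDS):
    """shard_num = hash(_routing) % num_primary_shards.

    zlib.crc32 is an assumption used as a stand-in hash; Elasticsearch
    uses a different hash function internally.
    """
    return zlib.crc32(str(routing).encode("utf-8")) % num_shards


def effective_routing(doc_id, parent=None, routing=None):
    """Explicit routing wins, then the parent id, then the document's own id."""
    if routing is not None:
        return str(routing)
    if parent is not None:
        return str(parent)
    return str(doc_id)


# First family from the question: parentDoc 1 -> childDoc 10 -> grandChildDoc 100
parent_routing = effective_routing(1)                            # routed by its own id
child_routing = effective_routing(10, parent=1)                  # routed by parentDoc id
grandchild_default = effective_routing(100, parent=10)           # routed by childDoc id
grandchild_fixed = effective_routing(100, parent=10, routing=1)  # routed by parentDoc id

print("parent     ->", parent_routing, "shard", shard_for(parent_routing))
print("child      ->", child_routing, "shard", shard_for(child_routing))
print("grandchild ->", grandchild_default, "shard", shard_for(grandchild_default), "(default)")
print("grandchild ->", grandchild_fixed, "shard", shard_for(grandchild_fixed), "(explicit routing)")
```

With the default, the grandchild's routing value differs from the one used for its parent childDoc, so whether they share a shard is down to chance. With explicit routing, all three documents use the same value and therefore always map to the same shard.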

Solution:

For parent-child-grandchild to work, the grandchild document must use the grandparent document id as its routing value.

For a single level (parent-child), the parent id is the default routing value, which works fine. For three levels, you need to set the routing explicitly on each grandchild document.

Here is an example using Logstash:

  1. Parent

    "index" => "search"
    "document_type" => "parent"
    "document_id" => "%{appId}"
    
  2. Child: works by default, since the routing value defaults to the parent document id. Routing formula: shard_num = hash(_routing) % num_primary_shards

    "index" => "search"
    "document_type" => "child"
    "document_id" => "%{lId}"
    "parent" => "%{appId}"
    
  3. Grandchild: note that routing is appId, which is the grandparent document id

    "index" => "search"
    "document_type" => "grandchild"
    "document_id" => "%{lBId}"
    "parent" => "%{lId}"
    "routing" => "%{appId}"
    

This indexes all the documents of one family to the same shard, and the search works fine in this use case.
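Applied to the test data in the question, the same idea would mean indexing each grandChildDoc with both `parent` and `routing` query parameters. The sketch below only builds the request paths; `index_path` and the `GRANDCHILDREN` table are hypothetical helpers written for this example, with the ids copied from the question.

```python
from urllib.parse import urlencode


def index_path(index, doc_type, doc_id, parent=None, routing=None):
    """Build the path of an index request, adding parent/routing when given."""
    params = {}
    if parent is not None:
        params["parent"] = parent
    if routing is not None:
        params["routing"] = routing
    path = f"/{index}/{doc_type}/{doc_id}"
    return f"{path}?{urlencode(params)}" if params else path


# grandChildDoc id -> (childDoc id, parentDoc id), taken from the question
GRANDCHILDREN = {100: (10, 1), 200: (10, 1), 300: (20, 2)}

paths = [
    index_path("test_index", "grandChildDoc", gc_id, parent=child_id, routing=grandparent_id)
    for gc_id, (child_id, grandparent_id) in GRANDCHILDREN.items()
]

for path in paths:
    print("PUT", path)
```

Grandchildren that were already indexed without the routing value would probably need to be deleted and indexed again, since a document sent with a different routing value can land on a different shard than the existing copy.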

Anh Pham
sunil bhardwaj