My data in Elasticsearch looks like this: each document corresponds to one person id and contains a list of objects such as

`{
  "dummy_name": "abc",
  "dummy_id": "44850642"
}`

A full document is shown below. I am querying on the field dummy_id and I get a number of matching documents. I want to aggregate on the dummy_id field so that I get the number of docs for each specific dummy_id. What actually happens is that I also get buckets for dummy_id values that are not mentioned in the query itself, because each person contains a list of objects in which other dummy_id values are present.

`{
  "person_id": 1234,
  "Properties": {
    "Property1": [
      {
        "dummy_name": "abc",
        "dummy_id": "44850642"
      },
      {

      },
      {

      }
    ]
  }
},
{
  "person_id": 1235,
  .........
}`

Query I am using:

`{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "Properties.Property1.dummy_id": "453041 23234324 124324 "
          }
        }
      ]
    }
  },
  "aggregations": {
    "group_by_concept": {
      "terms": {
        "field": "Properties.Property1.dummy_id",
        "order": {
          "_count": "desc"
        },
        "size": 10
      }
    }
  }
}`
glady
  • I don't understand: are you querying for dummy_name or dummy_id? You mentioned querying on dummy_name and getting wrong dummy_ids; perhaps there are other dummy_ids with those dummy names – aclowkay Jan 25 '18 at 11:56
  • I've updated the question accordingly. Let me know whether the question is clear now. – glady Jan 29 '18 at 07:22

1 Answer

The problem comes from how the data is stored. For example, take this document:

`{
  "person_id": 1234,
  "Properties": {
    "Property1": [
      {
        "dummy_name": "abc",
        "dummy_id": "44850642"
      },
      {
        "dummy_name": "dfg",
        "dummy_id": "876468"
      },
      {

      }
    ]
  }
}`

The tokens generated for this document would be:

  • dummy_id tokens: 44850642, 876468. This is how the data is stored in Lucene on the backend.

So when you query for dummy_id:44850642,

you get the document back, but the aggregation runs over all the terms produced by the documents matching the query, not only over the terms you searched for.

As a result, you see buckets for 44850642 as well as 876468.
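
The behaviour described above can be imitated in a few lines of plain Python. This is only a rough sketch of the idea, not how Elasticsearch is implemented: `flatten_ids` and `terms_buckets` are hypothetical helper names, and the second document (person 1235 with dummy_id "999") is made-up sample data.

```python
from collections import Counter

# Sample documents shaped like the ones in the question.
# The second document is invented for illustration only.
docs = [
    {"person_id": 1234, "Properties": {"Property1": [
        {"dummy_name": "abc", "dummy_id": "44850642"},
        {"dummy_name": "dfg", "dummy_id": "876468"},
        {},  # empty object, as in the original example
    ]}},
    {"person_id": 1235, "Properties": {"Property1": [
        {"dummy_name": "xyz", "dummy_id": "999"},
    ]}},
]


def flatten_ids(doc):
    """Collect all dummy_id values of one document into a single set,
    roughly like an array of objects ends up as one multi-valued field."""
    objects = doc["Properties"]["Property1"]
    return {obj["dummy_id"] for obj in objects if "dummy_id" in obj}


def terms_buckets(docs, queried_ids):
    """Count, per dummy_id, the documents that match the query.
    Every id of a matching document is counted, not only the queried ones."""
    counts = Counter()
    for doc in docs:
        ids = flatten_ids(doc)
        if ids & queried_ids:   # the document matches the query
            counts.update(ids)  # all of its ids become buckets
    return dict(counts)


buckets = terms_buckets(docs, {"44850642"})
print(buckets)
```

Only person 1234 matches the query, yet both of its ids end up as buckets, which is the effect seen in the question.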

For more information on how Elasticsearch stores a list of objects, see https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html
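
One possible way to keep only the queried ids in the result, which is a suggestion and not something tested against the asker's index, is the `include` parameter of the terms aggregation: it limits the buckets to the listed values. The sketch below only builds the request body as a Python dict and prints it as JSON. The variable names `queried_ids` and `body` are made up here, and it assumes the indexed terms are exactly the id strings.

```python
import json

# The ids from the question's match query.
queried_ids = ["453041", "23234324", "124324"]

body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"Properties.Property1.dummy_id": " ".join(queried_ids)}}
            ]
        }
    },
    "aggregations": {
        "group_by_concept": {
            "terms": {
                "field": "Properties.Property1.dummy_id",
                # Assumption: restricting buckets to the queried ids is the goal.
                "include": queried_ids,
                "order": {"_count": "desc"},
                "size": 10,
            }
        }
    },
}

# Print the JSON that would be sent as the search request body.
print(json.dumps(body, indent=2))
```

Note that this filters the buckets only; a document is still counted under a queried id whenever that id appears anywhere in its list of objects.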

gaurav9620