How to aggregate only on the field:value pairs specified in the query in Elasticsearch

Question

My data in Elasticsearch looks like this: each document corresponds to one person ID and contains a list of objects like

`{
  "dummy_name": "abc",
  "dummy_id": "44850642"
}`

A full document is shown below. I query on the field dummy_id and get a number of matching documents. I want to aggregate on dummy_id so that I get the number of docs for each specific dummy_id. What happens instead is that I also get buckets for dummy_id values that are not mentioned in the query itself, because each person contains a list of objects in which other dummy_id values are present.

`{
  "person_id": 1234,
  "Properties": {
    "Property1": [
      {
        "dummy_name": "abc",
        "dummy_id": "44850642"
      },
      {

      },
      {

      }
    ]
  }
},
{
  "person_id": 1235,
  .........
}`

Query I am using:

`{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "Properties.Property1.dummy_id": "453041 23234324 124324 "
          }
        }
      ]
    }
  },
  "aggregations": {
    "group_by_concept": {
      "terms": {
        "field": "Properties.Property1.dummy_id",
        "order": {
          "_count": "desc"
        },
        "size": 10
      }
    }
  }
}`

I don't understand: are you querying for dummy_name or dummy_id? You mentioned querying on dummy_name and getting wrong dummy_ids; perhaps there are other dummy_ids with those dummy names — aclowkay, Jan 25 '18 at 11:56
I've updated the question accordingly. Let me know if the question is still unclear. — glady, Jan 29 '18 at 07:22

score 0 · Answer 1 · answered Jan 25 '18 at 18:02

The problem comes from how the data is stored. For example, take this document:

{
  "person_id": 1234,
  "Properties": {
    "Property1": [
      {
        "dummy_name": "abc",
        "dummy_id": "44850642"
      },
      {
        "dummy_name": "dfg",
        "dummy_id": "876468"
      },
      {

      }
    ]
  }
}

The tokens generated for this document would be:

dummy_id tokens: 44850642, 876468. This is how the data is stored in Lucene on the back end: the array of objects is flattened, so all dummy_id values of a document end up in one multi-valued field.

So when you query for dummy_id:44850642

you get this document back, but the aggregation runs over all the terms of the documents matching the query, not only the terms you searched for.

As a result you see buckets for 44850642 as well as 876468.
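To make the effect concrete, here is a small plain-Python simulation of the behaviour described above. It does not call Elasticsearch at all; the second document, the helper names `flattened_ids` and `search_and_aggregate`, and the empty object are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical in-memory documents; only person 1234 mirrors the example above.
docs = [
    {"person_id": 1234, "Properties": {"Property1": [
        {"dummy_name": "abc", "dummy_id": "44850642"},
        {"dummy_name": "dfg", "dummy_id": "876468"},
        {},
    ]}},
    {"person_id": 1235, "Properties": {"Property1": [
        {"dummy_name": "xyz", "dummy_id": "555"},
    ]}},
]


def flattened_ids(doc):
    """Collect every dummy_id of a document into one multi-valued field."""
    objects = doc["Properties"]["Property1"]
    return [obj["dummy_id"] for obj in objects if "dummy_id" in obj]


def search_and_aggregate(docs, queried_ids):
    """Match docs holding any queried id, then bucket ALL ids of the hits."""
    queried = set(queried_ids)
    hits = [d for d in docs if queried & set(flattened_ids(d))]
    buckets = Counter()
    for d in hits:
        # Like doc_count: a document counts once per distinct term it holds.
        buckets.update(sorted(set(flattened_ids(d))))
    return hits, dict(buckets)


hits, buckets = search_and_aggregate(docs, ["44850642"])
print([d["person_id"] for d in hits])
print(buckets)
```

Only person 1234 matches, yet the buckets contain both of that person's ids, because the aggregation sees every term of the matching document rather than only the term that was searched for.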

For more information on how Elasticsearch stores a list of objects, see https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html
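One possible way to keep only the buckets for the queried ids is the `include` option of the terms aggregation, which accepts an array of exact values. The sketch below only builds the request body as a Python dict and prints it; it does not send anything to a cluster. It assumes the indexed terms are exactly the id strings from the question's query, which depends on your mapping and analyzer.

```python
import json

# Ids taken from the match query in the question.
queried_ids = "453041 23234324 124324 ".split()

body = {
    "query": {"bool": {"must": [
        {"match": {"Properties.Property1.dummy_id": " ".join(queried_ids)}}
    ]}},
    "aggregations": {"group_by_concept": {"terms": {
        "field": "Properties.Property1.dummy_id",
        # Assumption: exact-value filter so only queried ids get a bucket.
        "include": queried_ids,
        "order": {"_count": "desc"},
        "size": 10,
    }}},
}

print(json.dumps(body, indent=2))
```

The query part is unchanged; only the aggregation is restricted, so other ids stored in the same documents no longer show up as buckets.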
