1

Hello I'm having trouble deciding the correctness of the following query for multiple OR in Elasticsearch. I want to select all the unique data (not count, but select all rows)

My best try for this in elastic query is

GET mystash/_search
{
    "aggs": {
        "uniques":{
             "filter":
            {
                "or":
                        [
                            { "term": { "url.raw" : "/a.json" } },
                            { "term": { "url.raw" : "/b.json" } },
                            { "term": { "url.raw" : "/c.json"} },
                            { "term": { "url.raw" : "/d.json"} }
                        ]
            },
            "aggs": {
                "unique" :{
                    "terms" :{
                        "field" : "id.raw",
                        "size" : 0
                    }
                }
            }
        }
    }
}

The equivalent SQL would be

SELECT DISTINCT id
FROM json_record
WHERE 
    json_record.url = 'a.json' OR
    json_record.url = 'b.json' OR
    json_record.url = 'c.json' OR
    json_record.url = 'd.json'

I was wondering whether the query above is correct, since the data will be needed for report generations.

Xyten
  • 33
  • 1
  • 5

1 Answers1

2

Some remarks:

  • You should use a query filter instead of an aggregation filter. Your query loads all documents.
  • You can replace your or+term filter by a single terms filter
  • You could use a size=0 at the root of the query to get only agg result and not search results

Example code:

{"size":0,
  "query" :{"filtered":{"filter":{"terms":{"url":["a", "b", "c"]}}}}, 
   "aggs" :{"unique":{"term":{"field":"id", "size" :0}}}
} 
G Quintana
  • 4,556
  • 1
  • 22
  • 23