0

Say I have a type type1 for which one of the fields is an array:

curl -XPUT localhost:9200/index1/type1/1
{
  'field1': ['A', 'B', 'C'],
  'field2': 1
}


curl -XPUT localhost:9200/index1/type1/2
{
  'field1': ['A', 'E', 'D'],
  'field2': 2
}

I'd like to query for the values in field1 which are common to all documents of that type. So in this case, the query will return 'A'.

Is there a way to do it with an elasticsearch query? What if I want to also add a condition about another field?

Finally, Is there any way to define such query also in Kibana 4?

Dolev
  • 85
  • 7

1 Answers1

1

I don't think it's exactly what you asked for but it can be misused for it: you can use Term Aggregation on field1 and by knowing how many docs you have in total know which one every document has.

I'm re-pasting your queries, easier to work with them in Sense

Indexing:

PUT i30288948/type1/1
{
  "field1": ["A", "B", "C"],
  "field2": 1
}
PUT i30288948/type1/2
{
  "field1": ["A", "E", "D"],
  "field2": 2
}

Query:

POST i30288948/_search?search_type=count
{
  "aggs": {
    "terms": {
      "terms": {
        "field": "field1",
        "size": 1
      }
    }
  }
}

The result:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "terms": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 4,
         "buckets": [
            {
               "key": "a",
               "doc_count": 2
            }
         ]
      }
   }
}

By knowing your total of 2 and comparing doc_count, it would be possible to answer your question.

Whether this is suitable as you want for Kibana I unfortunately cannot answer.

Update to first comment:

I understand the solution is not satisfying in general, but it could be solved. You said:

First, I need to know the count beforehand.

You can get the total count as part of the hits.total; I didn't include it in the original result paste; it's now the complete response. By comparing, you would know if it affects all or not.

But more importantly - it doesn't get me all items. Say instead of E I have B again in item 2. Running the query then returns A, but I would expect to get both A and B.

You can increase the size of the aggregation; but I understand the solution is not really good, because you don't know how many of aggregation match your totals. Maybe looking at sum_other_doc_count helps to figure out whether there's more data.

mark
  • 6,308
  • 8
  • 46
  • 57
  • Thanks, but that doesn't work.. First, I need to know the count beforehand. But more importantly - it doesn't get me *all* items. Say instead of `E` I have `B` again in item 2. Running the query then returns `A`, but I would expect to get both `A` and `B`. – Dolev May 18 '15 at 07:56
  • @Dolev I've expanded my answer; solvable, but probably neither *pretty* nor *good*. – mark May 18 '15 at 09:16
  • Thanks for the effort! In Splunk, this is simple, since you can term you get not only the equivalent of doc_count, but also the percentage. Then you can filter for those with 100% only... Unfortunately, I couldn't find a way to do this in elasticsearch. – Dolev May 18 '15 at 13:03