60

Is it possible to query for all of the values of a specific field? Say I have "articles" and each article has an author; is there a query I can perform to find a list of all authors?

Yu Hao
eric
  • 1
    This can help :[Query all unique values of a field with Elasticsearch](http://stackoverflow.com/a/26647301/1145750). – Gary Gauh Oct 30 '14 at 07:35
  • 4
    The selected answer is pretty outdated. Please refer to [Query all unique values of a field with Elasticsearch](http://stackoverflow.com/questions/14466274/query-all-unique-values-of-a-field-with-elasticsearch) – Mohan Kumar Jan 06 '16 at 13:46
  • 2
    FYI the selected answer has changed since @MohanKumar made their helpful comment above. – eric Jul 28 '20 at 19:43

7 Answers

64

How to get all possible values for field author?

curl -XGET 'http://localhost:9200/articles/_search?pretty' -H 'Content-Type: application/json' -d '
{
    "aggs" : {
        "whatever_you_like_here" : {
            "terms" : { "field" : "author", "size":10000 }
        }
    },
    "size" : 0
}'

Note

  • "size":10000 (inside "terms") returns at most 10000 unique values. The default is 10.

  • "size":0 (top level) suppresses the search hits. By default, "hits" contains 10 documents, which we don't need.

  • By default, the buckets are ordered by doc_count in decreasing order.


Reference: bucket terms aggregation

Also note, according to this page, facets have been replaced by aggregations in Elasticsearch 1.0, which are a superset of facets.
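As a rough sketch of how a client might consume this, the Python below builds the same request body and pulls the distinct values out of a response. The helper names (build_terms_query, extract_keys), the aggregation name unique_values, and the sample response are illustrative assumptions, not part of any Elasticsearch client library; the response layout follows the terms aggregation format (aggregations, then the aggregation name, then buckets).

```python
import json


def build_terms_query(field, max_values=10000, agg_name="unique_values"):
    """Return a search body asking only for a terms aggregation on `field`."""
    return {
        "size": 0,  # skip the search hits, only the aggregation is needed
        "aggs": {agg_name: {"terms": {"field": field, "size": max_values}}},
    }


def extract_keys(response, agg_name="unique_values"):
    """Collect the bucket keys (the distinct field values) from a response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Hypothetical response, trimmed to the part that matters here.
sample_response = {
    "aggregations": {
        "unique_values": {
            "buckets": [
                {"key": "alice", "doc_count": 12},
                {"key": "bob", "doc_count": 3},
            ]
        }
    }
}

body = build_terms_query("author")
print(json.dumps(body, indent=2))
print(extract_keys(sample_response))
```

The body printed here is what would be sent as the -d payload of the curl command above.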

kgf3JfUtW
  • I am using an aggregation when I filter on multiple condition. Is it possible to have values that don't match my filter as a 0 count ? – g.lahlou Jan 17 '18 at 17:43
25

I think what you want is a faceted search. Have a look at this example from the documentation:

http://www.elasticsearch.org/guide/reference/api/search/facets/index.html

curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
  {
    "query" : { "query_string" : {"query" : "*"} },
    "facets" : {
      "tags" : { "terms" : {"field" : "author"} }
    }
  }
'

See if you can tailor this to work for you.

Willi Mentzel
MatthewJ
  • 8
    Just a quick note. When searching for all, prefer the matchAll Query. – dadoonet Jan 17 '13 at 20:47
  • That is exactly what I needed. I had looked at facets, but thought they were really for counts, I didn't realize I could get terms from them as well. This is actually really awesome, thanks! – eric Jan 17 '13 at 22:08
  • listen to @dadoonet, avoid using wildcards as they negate the value of using a fancy inverted-index like elasticsearch. – eric Feb 24 '14 at 01:44
  • 14
    Just a note (for people as me that come here by Google search and don't immediately see that this good answer is a bit outdated): I just learnt that ES 1.4 deprecates facets by aggregations... so refer here: http://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html – Pierpaolo Cira Mar 16 '15 at 09:49
  • 3
    @PierpaoloCira In addition to the documentation you linked, here is also a nice answer with an up-to-date example: http://stackoverflow.com/a/26647301/1666398. Ftr: I like the [`?search_type=count` param](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html#_returning_only_aggregation_results) a lot :) – dtk Jun 12 '15 at 09:28
  • It has been mentioned already that facets were replaced by aggregations in ES 1, but in ES 5 they seem to have removed the ability to get all results for an aggregation, so we seem to be back to square one, not knowing how to get all terms... – Hakanai Apr 05 '17 at 04:09
  • @Trejkaz My updated answer should be able to retrieve up to 10000 unique values for one field. If more is needed, [configuration changes may be necessary](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html) – kgf3JfUtW Dec 01 '17 at 22:33
  • Error 400 in Elasticsearch 7.3 – Yin Aug 14 '19 at 10:35
3

another example

request

curl -X POST "http://localhost:9200/_search?pretty=true" -d '
{
  "facets" : {
    "tags" : { "terms" : {"field" : "network.platform", "size" : 60} }
  },
  "size" : 0
}
'

response

{
  "took" : 266,
  "timed_out" : false,
  "_shards" : {
    "total" : 650,
    "successful" : 650,
    "failed" : 0
  },
  "hits" : {
    "total" : 41,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "facets" : {
    "tags" : {
      "_type" : "terms",
      "missing" : 15,
      "total" : 26,
      "other" : 0,
      "terms" : [ {
        "term" : "platform name 1",
        "count" : 20
      }, {
        "term" : "platform name 2",
        "count" : 6
      } ]
    }
  }
}
C Würtz
2

I think the best way is to use an Elasticsearch terms aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html

Setting the top-level "size" to 0 keeps the search hits out of the response, so only the aggregation is returned.

GET {index}/{type}/_search
{
  "size": 0,
  "aggs": {
    "{aggregation_name}": {
      "terms": {
        "field": "{field_name}",
        "size": 10
      }
    }
  }
}
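If a field has more distinct values than one terms aggregation comfortably returns, one option on Elasticsearch versions that support it is to page through a composite aggregation, passing each response's after_key back as the next request's after. The sketch below assumes that approach; `search` is a hypothetical callable that sends a request body to the cluster and returns the parsed JSON, and fake_search is only a stand-in so the loop can be run without a cluster.

```python
def iter_distinct_values(search, field, page_size=1000):
    """Yield every distinct value of `field`, one composite page at a time.

    `search` is a hypothetical callable: it takes a request body (dict)
    and returns the parsed JSON response (dict).
    """
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{"value": {"terms": {"field": field}}}],
        }
        if after is not None:
            composite["after"] = after  # resume after the previous page
        body = {"size": 0, "aggs": {"distinct": {"composite": composite}}}
        result = search(body)["aggregations"]["distinct"]
        for bucket in result["buckets"]:
            yield bucket["key"]["value"]
        after = result.get("after_key")
        if after is None or not result["buckets"]:
            break


def fake_search(body):
    """Stand-in for a real cluster: pages through a fixed, sorted list."""
    authors = ["alice", "bob", "carol"]
    composite = body["aggs"]["distinct"]["composite"]
    after = composite.get("after")
    start = 0 if after is None else authors.index(after["value"]) + 1
    page = authors[start:start + composite["size"]]
    result = {"buckets": [{"key": {"value": a}, "doc_count": 1} for a in page]}
    if page:
        result["after_key"] = {"value": page[-1]}
    return {"aggregations": {"distinct": result}}


values = list(iter_distinct_values(fake_search, "author", page_size=2))
print(values)
```

Because the function is a generator, callers can stop early without fetching the remaining pages.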
slisnychyi
1

You don't mention the Elasticsearch Version, but for ES 1.6, the preferred method is using aggregations. Here is an example of what I use.

Get all the STATUS values. The field lives in a nested object, so the terms aggregation is wrapped in a nested aggregation.

GET path for data/_search?size=200
{
  "aggs": {
    "something": {
      "nested": {
        "path": "NESTED_PATH"
      },
      "aggs": {
        "somethingCodes": {
          "terms": {
            "field": "NESTED_PATH.STATUS",
            "size": 50
          }
        }
      }
    }
  }
}

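and an example response (the aggregation names differ from the request above: here `something` appears as `panels` and `somethingCodes` as `panelCodes`):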

"aggregations": {
      "panels": {
         "doc_count": 5029693,
         "panelCodes": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
               {
                  "key": "M",
                  "doc_count": 1943107
               },
               {
                  "key": "W",
                  "doc_count": 137904
               },
               {
                  "key": "E",
                  "doc_count": 69080
               },
               {
                  "key": "Y",
                  "doc_count": 4081
               },
               {
                  "key": "N",
                  "doc_count": 1063
               },
               {
                  "key": "T",
                  "doc_count": 483
               },
               {
                  "key": "",
                  "doc_count": 1
               }
            ]
         }
      }
   }
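To read the values out of a response like this one, a client has to step through the outer nested aggregation before reaching the buckets. The Python below is a minimal sketch of that lookup; the helper name nested_bucket_counts is hypothetical, and the sample data is a shortened copy of the response above.

```python
def nested_bucket_counts(aggregations, outer, inner):
    """Map each bucket key to its doc_count for a terms aggregation
    that sits inside a nested aggregation."""
    buckets = aggregations[outer][inner]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Shortened copy of the "aggregations" section shown above.
aggregations = {
    "panels": {
        "doc_count": 5029693,
        "panelCodes": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "M", "doc_count": 1943107},
                {"key": "W", "doc_count": 137904},
                {"key": "E", "doc_count": 69080},
            ],
        },
    }
}

counts = nested_bucket_counts(aggregations, "panels", "panelCodes")
print(list(counts))   # the distinct STATUS values
print(counts["M"])    # how many nested documents carry that value
```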
James Drinkard
1

Fastest way of checking the existing values of a field in a single document:

GET myindex/mytype/<id>/_termvectors?fields=Product.Material.Code
  • myindex = index
  • mytype = type
  • <id> = document id
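A short sketch of reading the result, assuming the usual term vectors response layout in which terms are listed under term_vectors, then the field name, then terms. The helper name and the sample response are hypothetical. Keep in mind that term vectors list the analyzed terms of that one document, which may differ from the raw stored value.

```python
def terms_from_termvectors(response, field):
    """Return the sorted terms recorded for `field` in one document."""
    field_vectors = response.get("term_vectors", {}).get(field, {})
    return sorted(field_vectors.get("terms", {}))


# Hypothetical, trimmed response for a single document.
sample = {
    "found": True,
    "term_vectors": {
        "Product.Material.Code": {
            "terms": {
                "steel": {"term_freq": 1},
                "oak": {"term_freq": 2},
            }
        }
    },
}

terms = terms_from_termvectors(sample, "Product.Material.Code")
print(terms)
```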
andrew.fox
0

Use the request below to return only the 'articles' field of each hit in the index.

curl 'http://localhost:9200/my_index/_search?pretty=true&_source=articles'

Note that this returns the field from each matching document (10 hits by default), not a list of distinct values.

  • 1
    Please [Take the Tour](https://stackoverflow.com/tour) , and be sure with your [answer link](https://meta.stackexchange.com/questions/8231/are-answers-that-just-contain-links-elsewhere-really-good-answers/8259#8259) While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. [how to answer](https://stackoverflow.com/help/how-to-answer) – Agilanbu Jun 20 '19 at 10:37
  • While this may work, using an aggregation would be better since this will return at most 10k docs, not 10k *distinct* values. – sox supports the mods Dec 18 '19 at 13:50