
What is the most efficient way of testing whether a document with a given _id exists?

I could obviously do:

curl -XGET 'localhost:9200/my_index/my_doctype/<_id>?fields=_id'

Or I could follow the approach in "How do I check for duplicate data on ElasticSearch?" and send an empty document, I guess.

Anything more efficient?

eran

3 Answers


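You can use the HTTP HEAD verb to retrieve the headers only. With curl, send it with -I (--head); -XHEAD only changes the method name, so curl may hang waiting for a body that never arrives.

curl -I localhost:9200/index/type/doc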

It returns status 200 if the document exists or 404 if it does not, without any part of the document body.
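The HEAD check is easy to wrap in a small shell function. This is a minimal sketch, assuming a POSIX shell with curl on the PATH; `doc_exists` is a hypothetical name, and the `<base>/<index>/<type>/<id>` URL layout simply mirrors the commands in this thread. curl's `-w '%{http_code}'` prints only the status code, which the function maps to an exit status.

```shell
#!/usr/bin/env bash
# Sketch: check whether a document exists using an HTTP HEAD request.
# doc_exists is a hypothetical helper; the URL layout
# <base>/<index>/<type>/<id> follows the examples in this thread.
# Returns 0 if found (HTTP 200), 1 if missing (HTTP 404), 2 otherwise.
doc_exists() {
  local url="$1/$2/$3/$4"
  local code
  # -s: no progress output, -I: send a HEAD request,
  # -o /dev/null: discard the headers, -w: print only the status code
  code=$(curl -s -I -o /dev/null -w '%{http_code}' "$url")
  case "$code" in
    200) return 0 ;;
    404) return 1 ;;
    *)   echo "doc_exists: unexpected HTTP status '$code' for $url" >&2
         return 2 ;;
  esac
}
```

For example, `doc_exists http://localhost:9200 twitter tweet 1 && echo found`. Keeping a separate status for anything other than 200/404 stops a connection failure (curl reports 000) from being mistaken for a missing document.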

Louis-Philippe Huberdeau

If you are on a version prior to 2.1, you can use the "Search Exists API".

An example:

Search the twitter index for documents of type tweet by the user "kimchy":

$ curl -XGET 'http://localhost:9200/twitter/tweet/_search/exists?q=user:kimchy'

The response body contains true or false, depending on whether there are any tweets by that user:

{
  "exists" : true
} 

You can also send the query in the request body like so (POST or GET both work):

curl -XGET 'http://localhost:9200/twitter/tweet/_search/exists' -d '
{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}'

The response will be the same.
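On pre-2.1 clusters, the query-string form can be wrapped in a shell function as well. This is a minimal sketch, assuming curl is available; `search_exists` is a hypothetical helper, the query string is passed through without URL encoding (so it should not contain spaces or other special characters), and the check is a plain text match on the `exists` flag rather than real JSON parsing.

```shell
#!/usr/bin/env bash
# Sketch for clusters older than 2.1 only: the Search Exists API
# was deprecated in 2.1.0. search_exists is a hypothetical helper.
# Usage: search_exists <base_url> <index> <type> <query_string>
# Returns 0 if the body reports "exists" : true, 1 otherwise.
search_exists() {
  local body
  # -s: silent; the body is captured whatever the HTTP status is
  body=$(curl -s -XGET "$1/$2/$3/_search/exists?q=$4")
  # Naive text match tolerating whitespace around the colon;
  # a real JSON parser would be more robust.
  printf '%s' "$body" | grep -Eq '"exists"[[:space:]]*:[[:space:]]*true'
}
```

For example, `search_exists http://localhost:9200 twitter tweet user:kimchy && echo "kimchy has tweets"`.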

Micah
  • Just a reminder: the official docs say the Search Exists API was "Deprecated in 2.1.0". `HEAD` is the recommended way: https://www.elastic.co/guide/en/elasticsearch/guide/current/doc-exists.html – coderz Mar 07 '16 at 01:42

I would just use the GET API, which returns 404 if the document doesn't exist and the document itself otherwise. If you use the Java API, the GetResponse object has an isExists method.

If the _id field you are referring to is not included in your documents, specifying fields=_id gives you back neither the _source nor any specific field under fields. You would still get the _id in the metadata at the top of the response anyway.

If you are using the REST API, you can use the following:

curl -I 'http://localhost:9200/twitter/tweet/1'

It won't return the document, just 404 if it is not found and 200 otherwise. A HEAD response carries no body; a plain GET on the same URL returns the document together with an exists flag that has the same meaning.
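If you would rather use GET and read the flag from the body, a sketch along these lines should work, assuming curl is available. `doc_exists_get` is a hypothetical name. The flag is called `exists` in the version discussed here and `found` in later Elasticsearch releases, so both spellings are accepted. The URL can carry whatever field-limiting parameter your version supports, such as the `fields=_id` from the question. HEAD remains cheaper, because no body is sent at all.

```shell
#!/usr/bin/env bash
# Sketch: GET-based existence check that reads the flag from the body.
# doc_exists_get is a hypothetical helper. Older versions report
# "exists", newer ones "found"; both spellings are accepted here.
# Usage: doc_exists_get <full_document_url>
# Returns 0 if the flag is true, 1 otherwise (including errors).
doc_exists_get() {
  local body
  # -s: silent; a 404 still yields a body with the flag set to false
  body=$(curl -s -XGET "$1")
  # Naive text match, not real JSON parsing
  printf '%s' "$body" | grep -Eq '"(exists|found)"[[:space:]]*:[[:space:]]*true'
}
```

For example, `doc_exists_get 'http://localhost:9200/twitter/tweet/1?fields=_id' && echo found`.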

Interestingly, the HEAD method maps to a get request internally, which is why it isn't exposed directly in the Java API. You can get the same behaviour by creating a GetRequest that asks for no fields:

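import org.elasticsearch.action.get.GetRequest;
GetRequest getRequest = new GetRequest("index", "type", "id");
// don't get any fields back...
getRequest.fields(new String[0]);
// client is an existing org.elasticsearch.client.Client instance
boolean exists = client.get(getRequest).actionGet().isExists();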
javanna
  • Are you sure this will happen? I mean, the "_id" is returned anyway, no? The other concern is just higher network traffic if the document is large – eran Jun 12 '13 at 07:47
  • Maybe I misunderstood the question, but why do you care about the id in the response when you get a document by id? The concern about network traffic is reasonable if you have big documents, then it could be worth paying the cost of parsing the source on server side. – javanna Jun 12 '13 at 07:56
  • I don't care about the _id in the response, I'm just pointing out that since the `_id` is included in the response anyway, I thought stating `fields=_id` would serve to tell the server to ONLY get this field, and not add work (i.e., a need to parse the `_source`) – eran Jun 12 '13 at 08:12
  • I see what you mean...updating my answer – javanna Jun 12 '13 at 09:01
  • Hmm... What do you mean by "the _id field you are referring to is not included in your documents"? Does this mean that it is not included in "_source"? Come to think of it, I can just include a field that doesn't exist, no? :) `fields=some_madeup_field` – eran Jun 12 '13 at 09:28
  • Exactly you could do that, in the end what you do is the same since the _id field is not stored by default and not present in your documents. – javanna Jun 12 '13 at 09:31
  • I meant that you could have the id as part of your documents and configure its path in the mapping. That way you would get it back because it's in your source. – javanna Jun 12 '13 at 09:33
  • @eran I just figured out I missed something in my answer ;) so I updated it. In fact, the other answer you got (just saw it) is correct. – javanna Jul 27 '13 at 09:39