5

I have an Elasticsearch index populated with JSON documents, each of which carries a timestamp. I want to delete data from that index.

curl -XDELETE http://localhost:9200/index_name

The command above deletes the whole index. My requirement is to delete only the data that is older than a certain period (for example, 1 week). Can I automate this deletion process?

I tried Curator, but I think it deletes whole indices selected by timestamp, not data within an index. Can Curator be used to delete data within an index?

I would be glad to know whether any of the following would work:

  • Can curl automate deleting data from an index after a period?
  • Can Curator automate deleting data from an index after a period?
  • Is there any other way, such as a Python script, to do the job?

The references are taken from the official Elasticsearch site.

Thanks a lot in advance.

kahveci
ADARSH K

3 Answers

4

You can use the DELETE BY QUERY API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

Basically it will delete all the documents matching the provided query:

POST twitter/_delete_by_query
{
  "query": { 
    "match": {
      "message": "some message"
    }
  }
}

However, the recommended approach is to create a separate index per period (per day, for example) and use Curator to drop old indices periodically, based on their age:

...
logs_2019.03.11
logs_2019.03.12
logs_2019.03.13
logs_2019.03.14
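
To illustrate the kind of name-based age selection that is applied to dated indices like these, here is a minimal Python sketch. It does not call Curator or Elasticsearch at all; the helper name `indices_older_than`, the `logs_` prefix, and the 7-day cutoff are assumptions chosen for illustration only.

```python
from datetime import date, datetime, timedelta


def indices_older_than(names, prefix, days, today):
    """Return index names whose date suffix is more than `days` days before `today`.

    Hypothetical helper: names are expected to look like <prefix>YYYY.MM.DD.
    """
    cutoff = today - timedelta(days=days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of our dated indices
        try:
            stamp = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave this index alone
        if stamp < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = ["logs_2019.03.11", "logs_2019.03.12", "logs_2019.03.13", "logs_2019.03.14"]
    # With "today" fixed at 2019-03-20, only indices dated before 2019-03-13 are selected.
    print(indices_older_than(names, "logs_", 7, today=date(2019, 3, 20)))
```

Dropping a whole index is a cheap operation compared with deleting individual documents, which is the usual argument for this layout.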
Enrichman
  • I have only one index and I am storing my _JSON_ data in it. If I create indices by date, how can I group them all under a single index pattern so that I can search for particular data in **Kibana**? – ADARSH K Mar 14 '19 at 11:30
  • Suppose I have an index named **animals** and I am inserting data into it; I can then create an index pattern named **animals** and search on it. But what will the index pattern be if I use the _date_ as the index name? Would I need to create _n index patterns_, one per index name? – ADARSH K Mar 14 '19 at 11:39
  • Adarsh, you can use aliases to group indices together. Index patterns can use wildcards, e.g. `animals-*`, to accommodate dated indices. – untergeek Mar 14 '19 at 14:05
  • @untergeek, I got your point and it makes sense. If my index is named like **animals-%{+YYYY.MM.dd}**, I can certainly create an index pattern like _animals-*_. My question is: can I use Curator to delete these indices once they are older than a given date? I found a Curator command at the following link, and I don't know whether it is correct. I am attaching the part that I think is correct. https://discuss.elastic.co/t/delete-indices-older-than-30-days/96630/5 What should the timestring be for deletion if my index is named like **animals-%{+YYYY.MM.dd}**? – ADARSH K Mar 15 '19 at 04:45
  • You can define the pattern as you want, just do some tests and see. :) We are doing like that! – Enrichman Mar 15 '19 at 07:34
  • It works exactly like that, Adarsh. Curator's original use case was deleting indices by date stamp. It uses a different syntax, e.g. `%Y.%m.%d` for what in Logstash looks like `%{+YYYY.MM.dd}`, but the functionality is there. – untergeek Mar 16 '19 at 05:24
  • @Enrichman, I ran some tests and can now create multiple indices as you suggested. My indices are now named like **logstash-2019.03.21**, so I am going to use Curator as you said. From the documentation I found that `- filtertype: pattern kind: prefix value: logstash-` can be used to select indices starting with **logstash-**. But how can I append the _timestamp_ to it and check whether it is older than 7 days from the current timestamp? – ADARSH K Mar 19 '19 at 04:06
  • @untergeek, since I am new to this technology, I went through the Curator documentation at the following link: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/filtertype_pattern.html#_prefix My question is how to select the indices older than 7 days from today when my indices are named like **logstash-2019.03.10**, **logstash-2019.03.15**, **logstash-2019.03.16**, and how to use the timestring in Curator. https://www.elastic.co/guide/en/elasticsearch/client/curator/current/filtertype_pattern.html#_timestring – ADARSH K Mar 19 '19 at 04:10
  • Guys, I was wrong to assume that indices are deleted based on their name concatenated with a timestamp, so I was wondering how to split off the timestamp and check whether it is older than 7 days. Actually, I think Curator can delete indices based on the time they were created. Thank you all for the valuable information. Curator seems to work fine now. – ADARSH K Mar 19 '19 at 04:49
  • @Enrichman, thanks for the help. I can now delete indices older than 7 days using Curator. I referred to the following link for more on this topic: https://discuss.elastic.co/t/indices-deleting-using-curator-different-retention-period-for-different-index/77792 Can Curator be automated for deletion? At the moment I need to run Curator every day. Is there any reference on automating this tool so that it deletes the older data automatically? – ADARSH K Mar 19 '19 at 12:30
  • @ADARSHK we used Jenkins to automate this, but a simple cron job will work. :) Glad you made it! – Enrichman Mar 23 '19 at 23:19
4

Simple example using Delete By Query API:

POST index_name/_delete_by_query
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "timestamp": {
            "lte": "2019-06-01 00:00:00.0",
            "format": "yyyy-MM-dd HH:mm:ss.S"
          }
        }
      }
    }
  }
}

This deletes every record whose "timestamp" field (the date/time, stored within the record, at which it occurred) is on or before the given date. To see how many records would be deleted, run the same query as a search first and check the total hit count in the response.

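GET index_name/_search
{
  "size": 1,
  "query": {
    "bool": { "filter": { "range": { "timestamp": {
      "lte": "2019-06-01 00:00:00.0",
      "format": "yyyy-MM-dd HH:mm:ss.S"
    } } } }
  }
}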

It is also convenient to use date math for a relative offset instead of a fixed date:

         "lte": "now-30d",

which would delete all records older than 30 days.
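
Since the question also asks about scripting, here is a minimal Python sketch of the same relative-offset request using only the standard library. The function names, the `timestamp` field default, and the `http://localhost:9200` / `index_name` placeholders are assumptions; the network call is left commented out so that the script only prints the request body unless it is pointed at a real cluster.

```python
import json
import urllib.request


def build_delete_body(field="timestamp", older_than="now-30d"):
    """Build a delete-by-query body matching documents at or before the offset."""
    return {"query": {"bool": {"filter": {"range": {field: {"lte": older_than}}}}}}


def delete_old_documents(base_url, index, body):
    """POST the body to <base_url>/<index>/_delete_by_query and return the parsed reply."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_delete_by_query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    body = build_delete_body(older_than="now-7d")
    print(json.dumps(body, indent=2))
    # Uncomment to run against a real cluster (URL and index name are placeholders):
    # print(delete_old_documents("http://localhost:9200", "index_name", body))
```

Running a script like this once a day from a scheduler such as cron would cover the automation part of the question.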

georgep68
0

You can always delete single documents by using the HTTP request method DELETE.

To find the IDs of the documents you want to delete, you first need to query your data, probably with a range filter/query on your timestamp.

As you are interacting with the REST API, you can do this with Python or any other language. There is also a Java client if you prefer a more direct API.
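
As a rough Python sketch of the deletion step: once a range query has returned the matching IDs, they can be deleted one DELETE request at a time, or batched. The example below swaps in the `_bulk` API rather than individual DELETE requests. The helper name `build_bulk_delete` and the index name `website` are illustrative assumptions, and the resulting payload would be POSTed to `/_bulk` with the `Content-Type: application/x-ndjson` header. Clusters that still use mapping types may also need a `_type` entry in each action.

```python
import json


def build_bulk_delete(index, doc_ids):
    """Return an NDJSON payload with one delete action per document ID."""
    lines = [json.dumps({"delete": {"_index": index, "_id": doc_id}}) for doc_id in doc_ids]
    if not lines:
        return ""
    # The bulk format is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # IDs would normally come from the hits of a range query on the timestamp.
    print(build_bulk_delete("website", ["123", "124"]), end="")
```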

neun24
  • If I use the timestamp as the filter, how can I delete documents from the last 7 days? I would appreciate any reference for this task. `{ "found" : true, "_index" : "website", "_type" : "blog", "_id" : "123", "_version" : 3 }` This is the code snippet from the link you gave. If I want to match documents with a timestamp older than the one I give, what should my request look like in the code above? Can I automate this process with a Python script or something similar? – ADARSH K Mar 14 '19 at 11:43
  • No, that's the response of the given example DELETE request. Have a look at the [range query on dates](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#ranges-on-dates) and make sure you have a date to use it on. Then use that with _delete_by_query as suggested above. – neun24 Mar 14 '19 at 16:47