2

Using elasticsearch-py, I would like to remove all documents from a specific index without removing the index itself. Given that delete_by_query was moved to a separate plugin, what is the best way to go about this?

zanderle
  • You can't just delete and recreate the index? – OneCricketeer Feb 18 '16 at 17:17
  • @cricket_007 I could, but I'd rather do it by removing the documents. Otherwise, I'd have to check the index settings and mappings and use them when recreating index. I think it's easier to remove the documents. – zanderle Feb 18 '16 at 17:45
  • A simple backup of the mappings and such shouldn't be that difficult. A full index scan and a bulk delete doesn't seem "easier", IMO – OneCricketeer Feb 18 '16 at 18:00

2 Answers

2

Deleting every document with delete by query is highly inefficient. A more direct approach is to:

  • Get the current mappings (assuming you are not using index templates)
  • Drop the index with DELETE /indexname
  • Create the new index with those mappings

This will take about a second; deleting the documents one by one will take much, much longer and cause unnecessary disk I/O.
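In elasticsearch-py, the three steps above could look roughly like the sketch below. The function names `build_create_body` and `reset_index` are my own, not part of the library. It assumes `index` is a concrete index name (not an alias) and that your client version still accepts the `body` argument to `indices.create`. The list of read-only settings to strip is also an assumption based on what recent Elasticsearch versions return, so check it against your own `get_settings` output.

```python
# Read-only keys that Elasticsearch adds to an index's settings.
# Assumption: these are the ones that must not be sent back on create.
READ_ONLY_SETTINGS = ("creation_date", "uuid", "version", "provided_name")


def build_create_body(index, mapping_response, settings_response):
    """Turn get_mapping/get_settings responses into a create-index body."""
    mappings = mapping_response[index]["mappings"]
    # Copy so the response returned by the client is not modified.
    index_settings = dict(settings_response[index]["settings"]["index"])
    for key in READ_ONLY_SETTINGS:
        index_settings.pop(key, None)
    return {"settings": {"index": index_settings}, "mappings": mappings}


def reset_index(es, index):
    """Drop `index` and recreate it empty with the same mappings and settings."""
    # Read everything first, so nothing is deleted if either lookup fails.
    body = build_create_body(
        index,
        es.indices.get_mapping(index=index),
        es.indices.get_settings(index=index),
    )
    es.indices.delete(index=index)
    es.indices.create(index=index, body=body)
    return body
```

You would call it as `reset_index(Elasticsearch(), "my-index")`, where `"my-index"` is a placeholder. Note that this only carries over mappings and settings; aliases pointing at the old index are not included in either response and would need to be recreated separately.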

Hkntn
  • Thank you. It's what I did in the end. I'll leave the other answer as accepted, since it answers the question more directly (even if it is the wrong approach). – zanderle Feb 22 '16 at 08:22
0

Use a Scroll/Scan API call to gather all document IDs, then issue a bulk delete for those IDs. According to the official documentation, this is the recommended replacement for the Delete By Query API.

EDIT: As requested, here is how to do this specifically in elasticsearch-py. Here is the documentation for the helpers. Use the scan helper to scan through all documents, then use the bulk helper with the delete action to delete all the IDs.
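A minimal sketch of the scan-then-bulk-delete approach is below. `delete_actions` and `delete_all_documents` are names I made up for illustration; `helpers.scan` and `helpers.bulk` are the library's helpers. Copying `_type` only when the hit has one is an assumption meant to cover both older clusters (where it is required) and newer ones (where types no longer exist).

```python
def delete_actions(hits):
    """Yield one bulk 'delete' action per search hit."""
    for hit in hits:
        action = {"_op_type": "delete", "_index": hit["_index"], "_id": hit["_id"]}
        # Assumption: older clusters return (and require) a document type.
        if "_type" in hit:
            action["_type"] = hit["_type"]
        yield action


def delete_all_documents(es, index):
    """Scroll through every document in `index` and bulk-delete it."""
    # Imported here so delete_actions can be used without the package installed.
    from elasticsearch import helpers

    # Only IDs are needed, so ask Elasticsearch not to return the sources.
    query = {"query": {"match_all": {}}, "_source": False}
    hits = helpers.scan(es, index=index, query=query)
    # bulk() returns a tuple: (number of successes, list of errors).
    return helpers.bulk(es, delete_actions(hits))
```

Both helpers work lazily on generators, so the IDs are streamed in batches rather than collected in memory first. The index itself, along with its mappings and settings, is left in place.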

Chris Franklin
  • Can you provide more information about how to do this in elasticsearch-py? – zanderle Feb 18 '16 at 17:04
  • Added a link to the specific docs for the Python helpers you need to perform the scan and then call the bulk deletion. Everything you need should be there! – Chris Franklin Feb 18 '16 at 17:22