Fast and efficient way to filter an Elasticsearch index by the IDs from another index

Asked Feb 17 '21 at 11:38

Active Jul 03 '22 at 09:01

Viewed 364 times

I'm looking for an efficient and fast way to compare an Elasticsearch index, index A (about 40M documents), against index B (about 3M documents) by ID, and then delete from index B every document whose ID does not appear in index A.

An ID in index A looks like ABC1D2:XXX (where XXX is a number). An ID in index B looks like ABC1D2.

What I've tried so far is to:

1. Cache all IDs from index B.
2. Cache all IDs from index A.
3. Filter the index B IDs against the index A IDs, then bulk delete from index B the documents whose IDs have no match in index A.

However, it takes 24+ hrs.
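
For reference, the comparison step above can be sketched roughly as follows. This is only a minimal illustration, assuming both ID lists fit in memory and that an index A ID maps to an index B ID by dropping everything from the first colon onward. The function names and the index name "index_b" are hypothetical, and the action dictionaries follow the format that the elasticsearch-py bulk helper accepts, which is an assumption about the client in use.

```python
from typing import Iterable, Iterator, Set


def base_id(a_id: str) -> str:
    """Drop the ':XXX' suffix of an index A ID, e.g. 'ABC1D2:123' -> 'ABC1D2'."""
    return a_id.split(":", 1)[0]


def orphaned_b_ids(a_ids: Iterable[str], b_ids: Iterable[str]) -> Set[str]:
    """Return the index B IDs that no index A ID refers to."""
    # Collapse the A IDs to their distinct prefixes first, so the set
    # holds at most one entry per B-style ID rather than one per A document.
    referenced = {base_id(a_id) for a_id in a_ids}
    return set(b_ids) - referenced


def delete_actions(index: str, ids: Iterable[str]) -> Iterator[dict]:
    """Yield one bulk delete action per ID (hypothetical helper)."""
    for doc_id in ids:
        yield {"_op_type": "delete", "_index": index, "_id": doc_id}


# Tiny made-up sample: only QRS7T6 has no counterpart in index A.
sample_a = ["ABC1D2:001", "ABC1D2:002", "XYZ9K8:001"]
sample_b = ["ABC1D2", "XYZ9K8", "QRS7T6"]
to_delete = orphaned_b_ids(sample_a, sample_b)
actions = list(delete_actions("index_b", sorted(to_delete)))
```

The set difference itself is cheap at this scale, so most of the time is presumably spent reading the IDs out of the two indices rather than comparing them.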

What is the best approach to achieve the same result faster? As far as I know, Elasticsearch has nothing like a SQL left join.

edited Jul 03 '22 at 09:01

Anonymous Creator

asked Feb 17 '21 at 11:38

Georgi Georgiev

I found in this thread https://stackoverflow.com/questions/17497075/efficient-way-to-retrieve-all-ids-in-elasticsearch that stored_fields can be used to retrieve only the metadata. This significantly improved the speed. However, if someone can advise on further improvements, I would appreciate it. Thank you! – Georgi Georgiev Feb 17 '21 at 15:04
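
A rough sketch of what the stored_fields approach from that thread might look like as a request body. It assumes a recent Elasticsearch version in which an empty stored_fields list makes each hit carry only its metadata, such as _id. The helper names are hypothetical, and the use of the elasticsearch-py scan helper mentioned in the comment is an assumption about the client.

```python
from typing import Any, Dict, Iterable, Iterator


def id_only_query() -> Dict[str, Any]:
    """Build a match-all search body that asks for no stored fields.

    Assumption: with an empty stored_fields list, hits come back with
    metadata such as _id but without _source.
    """
    return {"query": {"match_all": {}}, "stored_fields": []}


def ids_from_hits(hits: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield the _id of every hit from a search or scroll response."""
    for hit in hits:
        yield hit["_id"]


# With elasticsearch-py (an assumption), the body could be streamed with
# the scan helper, for example:
#   helpers.scan(client, index="index_a", query=id_only_query())
# Here, made-up hits stand in for a real response.
fake_hits = [{"_index": "index_a", "_id": "ABC1D2:001"},
             {"_index": "index_a", "_id": "XYZ9K8:001"}]
ids = list(ids_from_hits(fake_hits))
```

Because only the IDs travel over the wire, each scroll page is much smaller than one that carries full documents, which is presumably where the speedup comes from.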

0 Answers