Best practice to remove stale documents in Elasticsearch


I’m currently doing it like this:

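for dt, updated_ids in self.updated.items():
    # Collect the ids of every document currently in the index.
    existing_ids_in_index = [d.id for d in dt.search().scan()]
    # Stale = present in the index but not part of this update run.
    stale_ids = list(set(existing_ids_in_index) - set(updated_ids))
    # Delete the stale documents one at a time (separate requests per id).
    for stale_id in stale_ids:
        dt.find_one('id', stale_id).delete()
    print("... {}: Removed {}.".format(dt.get_model().__name__, len(stale_ids)))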

I could probably optimize this further with delete_by_query, but I’m unsure about the details.
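One way to collapse the per-id loop into a single request is a delete_by_query whose query matches everything except the ids that were just indexed: a `bool` query containing only a `must_not` clause matches all documents, and the `ids` clause inside it excludes the fresh ones. This is a minimal sketch, assuming the elasticsearch-py 7.x client, where `delete_by_query(index=..., body=...)` is the call shape (newer clients accept `query=` directly), and assuming `updated_ids` holds the Elasticsearch `_id` values. `build_stale_query` and `delete_stale` are hypothetical helper names.

```python
def build_stale_query(updated_ids):
    """Build a delete_by_query body matching every document whose _id
    is NOT in updated_ids (i.e. the stale ones)."""
    # Deduplicate while keeping the original order of the ids.
    ids = list(dict.fromkeys(updated_ids))
    # A bool query with only must_not matches all documents except
    # those matched by the ids clause.
    return {"query": {"bool": {"must_not": {"ids": {"values": ids}}}}}


def delete_stale(client, index, updated_ids):
    """Hypothetical helper: delete all stale documents in one request.

    `client` is assumed to be an elasticsearch-py 7.x Elasticsearch
    instance; its response includes a "deleted" count.
    """
    return client.delete_by_query(index=index, body=build_stale_query(updated_ids))
```

It would be called as `delete_stale(es, "my-index", updated_ids)`, where `es` and `"my-index"` are placeholders. An empty `updated_ids` makes the query match, and delete, every document in the index, which matches what the original loop does when nothing was updated.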


In the master branch (to be released soon), you can just call Search().delete() to invoke the delete_by_query API.
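With elasticsearch-dsl, the stale-document query can be expressed on the `Search` object itself and sent through `Search.delete()`. A minimal sketch, assuming a release that includes `Search.delete()` and that `search` is something like `dt.search()` from the question; `delete_stale_with_search` is a hypothetical name. `exclude("ids", values=...)` adds a negated `ids` filter, so only documents outside `updated_ids` remain in the search.

```python
def delete_stale_with_search(search, updated_ids):
    """Hypothetical helper: delete every document matched by `search`
    whose _id is not in updated_ids, in a single delete_by_query request.

    `search` is assumed to be an elasticsearch_dsl Search bound to the
    index, e.g. dt.search(). Search objects are immutable, so exclude()
    returns a new Search instead of modifying the one passed in.
    """
    # Deduplicate while keeping the original order of the ids.
    ids = list(dict.fromkeys(updated_ids))
    # Keep only the stale documents: everything except the fresh ids.
    stale = search.exclude("ids", values=ids)
    # Search.delete() sends the query to the delete_by_query API.
    return stale.delete()
```

In the question's code this would replace both the scan and the per-id deletes with a call like `delete_stale_with_search(dt.search(), updated_ids)`.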
