I used elasticsearch-py to move millions of records represented by a Django model from PostgreSQL to Elasticsearch. I used the name of the model for doctype (which was in CamelCase).

I then switched to Elasticsearch DSL and noticed that by default it creates doctypes with lowercase names with underscores (snake_case).
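To illustrate the naming mismatch, here is a minimal sketch of a CamelCase to snake_case conversion. The helper name `camel_to_snake` and the regular expression are assumptions made for illustration; the exact rule Elasticsearch DSL applies may differ, particularly for names with consecutive capitals.

```python
import re


def camel_to_snake(name):
    """Insert an underscore before each inner capital letter, then lowercase.

    Hypothetical helper: an approximation of the default doc type naming,
    not the library's own code.
    """
    return re.sub(r"(.)([A-Z])", r"\1_\2", name).lower()


# Example model names (made up for this sketch).
old_name = "BlogPost"
new_name = camel_to_snake(old_name)
```

With this rule a model called `BlogPost` maps to `blog_post`, which is the kind of pair that would be used as the old and new doc type names.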

I don't want to redefine doc_type in my document meta, so I need to rename it in Elasticsearch. What would be the fastest way to do this?

utapyngo

1 Answer

My own solution using elasticsearch_dsl:

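from elasticsearch.helpers import bulk
from elasticsearch_dsl import Search
from elasticsearch_dsl.connections import connections

# `index`, `old_name` and `new_name` are assumed to be defined beforehand.
connection = connections.get_connection()
s = Search(index=index, doc_type=old_name)

# Lazily build one index action per document, changing only the doc type.
actions = (dict(
    _index=hit.meta.index, _type=new_name,
    _id=hit.meta.id, _source=hit.to_dict()
) for hit in s.scan())
# `bulk` consumes the generator and splits it into chunks itself.
bulk(connection, actions, request_timeout=300)
# Delete the documents still stored under the old doc type.
s.params(request_timeout=600).delete()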
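The action-building step can be checked without a running cluster. The following is a minimal sketch, assuming each hit is represented as a plain dictionary; the function name `retype_actions`, the keys `index`, `id` and `source`, and the sample data are all hypothetical stand-ins for the `hit.meta` attributes and `hit.to_dict()`.

```python
def retype_actions(hits, new_name):
    """Yield one bulk index action per hit, changing only the doc type."""
    for hit in hits:
        yield {
            "_index": hit["index"],    # keep the document in the same index
            "_type": new_name,         # the only field that changes
            "_id": hit["id"],          # keep the original document id
            "_source": hit["source"],  # copy the document body unchanged
        }


# Made-up sample data standing in for the results of a scan.
sample_hits = [
    {"index": "records", "id": "1", "source": {"title": "first"}},
    {"index": "records", "id": "2", "source": {"title": "second"}},
]
actions = list(retype_actions(sample_hits, "blog_post"))
```

Because the function is a generator, it could be passed straight to `bulk` in place of the generator expression, and documents would still be produced one at a time.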
utapyngo
  • Note that you don't need to group actions into chunks yourself: the `bulk` helper already does that. You can feed it an iterator directly (a generator in this case) that consumes the `scan` results and yields the modified docs. – Honza Král Oct 18 '17 at 08:16
  • Thank you @HonzaKrál. Updated. – utapyngo Oct 18 '17 at 10:13