
We have two indexes:

  1. Conversation - contains participant details, the last message, etc.

    {
      "_id": 123,
      "last_message": "Hi",
      "from_phone": "+919899988888",
      "to_phone": "+919899988889",
      ......
    }

  2. ConversationDetails - the list of messages sent/received between the given participants

    [
      {
        "conv_id": 123,
        "message": "Hi",
        "channel": "SMS",
        "comm_dir": "SENT",
        "created": 1592992160480,
        ......
      },
      {
        "conv_id": 123,
        "message": "Hi",
        "channel": "SMS",
        "comm_dir": "RECEIVED",
        "created": 1592992160480,
        ......
      }
    ]

We need to add a field 'lastReceivedMessageSource' to every Conversation document, derived from the ConversationDetails documents for that conversation.

We need to migrate this data for millions of conversations. What is the fastest way to do that?

My approach is to fetch the values for 'n' conversations at a time and bulk-upsert them into the Conversation documents.
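
A minimal sketch of that batch approach, under several assumptions that are not confirmed above: the value for 'lastReceivedMessageSource' is taken from the `channel` of the newest `RECEIVED` message, `conv_id` and `comm_dir` are mapped so that exact `terms`/`term` filters work, and the index and type names (`conversation`, `doc`) are placeholders. It only builds the request bodies (a search with a `terms` aggregation plus `top_hits`, and an NDJSON `_bulk` body of partial updates); sending them with an HTTP client is left out.

```python
import json

# Hypothetical names, not taken from the post: adjust to the real cluster.
CONV_INDEX = "conversation"
CONV_TYPE = "doc"  # ES 5.6 bulk actions still need a mapping type


def build_latest_received_query(conv_ids):
    """Search body: newest RECEIVED message per conversation id."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [
            {"terms": {"conv_id": conv_ids}},
            {"term": {"comm_dir": "RECEIVED"}},
        ]}},
        "aggs": {"by_conv": {
            "terms": {"field": "conv_id", "size": len(conv_ids)},
            "aggs": {"latest": {"top_hits": {
                "size": 1,
                "sort": [{"created": {"order": "desc"}}],
                # Assumption: the "source" is the message channel.
                "_source": {"includes": ["channel"]},
            }}},
        }},
    }


def extract_sources(search_response):
    """Map conv_id -> channel of its newest RECEIVED message."""
    result = {}
    for bucket in search_response["aggregations"]["by_conv"]["buckets"]:
        hits = bucket["latest"]["hits"]["hits"]
        if hits:
            result[bucket["key"]] = hits[0]["_source"]["channel"]
    return result


def build_bulk_body(sources):
    """NDJSON body for _bulk: one partial update per conversation."""
    lines = []
    for conv_id, source in sources.items():
        action = {"update": {"_index": CONV_INDEX, "_type": CONV_TYPE,
                             "_id": str(conv_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"doc": {"lastReceivedMessageSource": source}}))
    # The bulk API expects a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""
```

Each batch would then be one search against ConversationDetails followed by one `_bulk` request against Conversation; the batch size 'n' is something to tune by measurement.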

Note: There are millions of Conversations

Elasticsearch version: 5.6

Sahil Gupta
  • I'd suggest using an approach like this one: https://stackoverflow.com/questions/58952346/using-a-search-template-in-an-ingest-pipeline/59047019#59047019 (hint: enrich processor) – Val Jun 24 '20 at 09:58
  • I would suggest using `Logstash` with multiple file `input plugin`s. It is really fast: in a test run, 5m records with 10 file plugins were processed in an hour on a single-node machine. You can increase the parallelism; it depends on the number of nodes, shards, and resources. Please add the calculation so that I can suggest whether this can be done with Logstash. – Gibbs Jun 24 '20 at 10:06
  • @Val: thanks for the suggestion but the E.S version we are using doesn't support that. – Sahil Gupta Jun 24 '20 at 18:15
  • Then you could leverage the [`elasticsearch` filter plugin](https://www.elastic.co/guide/en/logstash/current/plugins-filters-elasticsearch.html) but the lookups might not be fast enough... to be tested – Val Jun 24 '20 at 18:16
