
I am trying to use the Reindex API for Elasticsearch:

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html

Here is the mapping of my source index:

"testtype": {
        "_all": {
          "enabled": false
        },
        "dynamic_templates": [
          {
            "message_field": {
              "mapping": {
                "fielddata": {
                  "format": "disabled"
                },
                "index": "analyzed",
                "omit_norms": true,
                "type": "string"
              },
              "match": "message",
              "match_mapping_type": "string"
            }
          },
          {
            "string_fields": {
              "mapping": {
                "fielddata": {
                  "format": "disabled"
                },
                "index": "analyzed",
                "omit_norms": true,
                "type": "string",
                "fields": {
                  "raw": {
                    "ignore_above": 256,
                    "index": "not_analyzed",
                    "type": "string"
                  }
                }
              },
              "match": "*",
              "match_mapping_type": "string"
            }
          }
        ],
        "properties": {
          "@timestamp": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "@version": {
            "type": "string",
            "index": "not_analyzed"
          },
          "app_code": {
            "type": "string"
          },
          "data": {
            "properties": {
              "action": {
                "type": "string",
                "norms": {
                  "enabled": false
                },
                "fielddata": {
                  "format": "disabled"
                },
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed",
                    "ignore_above": 256
                  }
                }
              },
              "level": {
                "type": "string",
                "norms": {
                  "enabled": false
                },
                "fielddata": {
                  "format": "disabled"
                },
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed",
                    "ignore_above": 256
                  }
                }
              },
              "message": {
                "type": "string",
                "norms": {
                  "enabled": false
                },
                "fielddata": {
                  "format": "disabled"
                }
              },
              "timestamp": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis"
              }
            }
          },
          "header": {
            "properties": {
              "@timestamp": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis"
              },
              "_id": {
                "type": "long"
              },
              "app_code": {
                "type": "string",
                "norms": {
                  "enabled": false
                },
                "fielddata": {
                  "format": "disabled"
                },
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed",
                    "ignore_above": 256
                  }
                }
              },
              "host": {
                "type": "string",
                "norms": {
                  "enabled": false
                },
                "fielddata": {
                  "format": "disabled"
                },
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed",
                    "ignore_above": 256
                  }
                }
              },
              "meta_host": {
                "type": "string",
                "norms": {
                  "enabled": false
                },
                "fielddata": {
                  "format": "disabled"
                },
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed",
                    "ignore_above": 256
                  }
                }
              },
              "name": {
                "type": "string",
                "norms": {
                  "enabled": false
                },
                "fielddata": {
                  "format": "disabled"
                },
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed",
                    "ignore_above": 256
                  }
                }
              },
              "pid": {
                "type": "long"
              },
              "source_id": {
                "type": "string",
                "norms": {
                  "enabled": false
                },
                "fielddata": {
                  "format": "disabled"
                },
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed",
                    "ignore_above": 256
                  }
                }
              },
              "source_name": {
                "type": "string",
                "norms": {
                  "enabled": false
                },
                "fielddata": {
                  "format": "disabled"
                },
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed",
                    "ignore_above": 256
                  }
                }
              },
              "timestamp": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis"
              },
              "user": {
                "type": "string",
                "norms": {
                  "enabled": false
                },
                "fielddata": {
                  "format": "disabled"
                },
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "source_id": {
            "type": "string"
          },
          "timestamp": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          }
        }
      }

So it has some string fields that also have corresponding raw sub-fields. The problem is that the main fields are analyzed by default, so I want the new index to use this mapping instead:

{
  "mappings": {
    "test": {
        "dynamic_templates": [
            { "notanalyzed": {
                  "match": "*",
                  "path_unmatch":"data.message",
                  "match_mapping_type": "string",
                  "mapping": {
                      "type":        "string",
                      "index":       "not_analyzed",
                      "fielddata": {
                      "format": "disabled"
                    },
                    "fields": {
                      "raw": {
                        "ignore_above": 256,
                        "index": "not_analyzed",
                        "type": "string"
                      }
                    }
                  }
               }
            }
          ]
       }
   }
}

The old index already has some data, so I tried to reindex it with:

POST /_reindex
{
  "source": {
    "index": "oldindex",
    "type": ["testtype"]
  },
  "dest": {
    "index": "newindex"
  }
}

After I do this, I see that the new index looks like this:

{
  "newindex": {
    "aliases": {},
    "mappings": {
      "testtype": {
        "properties": {
          "data": {
            "properties": {
              "action": {
                "type": "string"
              },
              "level": {
                "type": "string"
              },
              "message": {
                "type": "string"
              },
              "timestamp": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis"
              }
            }
          },
          "header": {
            "properties": {
              "@timestamp": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis"
              },
              "_id": {
                "type": "long"
              },
              "app_code": {
                "type": "string"
              },
              "host": {
                "type": "string"
              },
              "meta_host": {
                "type": "string"
              },
              "name": {
                "type": "string"
              },
              "pid": {
                "type": "long"
              },
              "source_id": {
                "type": "string"
              },
              "source_name": {
                "type": "string"
              },
              "timestamp": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis"
              },
              "user": {
                "type": "string"
              }
            }
          }
        }
      },
      "test": {
        "dynamic_templates": [
          {
            "notanalyzed": {
              "mapping": {
                "fielddata": {
                  "format": "disabled"
                },
                "index": "not_analyzed",
                "type": "string",
                "fields": {
                  "raw": {
                    "ignore_above": 256,
                    "index": "not_analyzed",
                    "type": "string"
                  }
                }
              },
              "match": "*",
              "match_mapping_type": "string",
              "path_unmatch": "data.message"
            }
          }
        ]
      }
    },
    "settings": {
      "index": {
        "creation_date": "1461792130202",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "nho7V2PpTbqzfsUVWVdLkA",
        "version": {
          "created": "2030099"
        }
      }
    },
    "warmers": {}
  }
}

I can't understand what happened here! It looks like the reindexed data was simply auto-mapped under testtype instead of using my test mapping. This is not what I intended at all.

I even tried:

POST /_reindex
{
  "source": {
    "index": "oldindex",
    "type": ["testtype"]
  },
  "dest": {
    "index": "newindex",
    "type": ["test"]
  }
}

but now I get:

{
   "error": "org.elasticsearch.ElasticsearchParseException: Unknown array field [type]"
}

What am I doing wrong? I can't use elasticdump or knapsack, as they are third-party plugins.

Andrei Stefan
AbtPst

1 Answer


You're almost there.

  • Delete the wrongly created index: DELETE newindex
  • Create the newindex index again with the intended mapping:
PUT /newindex
{
  "mappings": {
    "test": {
      "dynamic_templates": [
        {
          "notanalyzed": {
            "match": "*",
            "path_unmatch": "data.message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed",
              "fielddata": {
                "format": "disabled"
              },
              "fields": {
                "raw": {
                  "ignore_above": 256,
                  "index": "not_analyzed",
                  "type": "string"
                }
              }
            }
          }
        }
      ]
    }
  }
}
  • Then use this slightly modified _reindex command:
POST /_reindex
{
  "source": {
    "index": "oldindex",
    "type": [
      "testtype"
    ]
  },
  "dest": {
    "index": "newindex"
  },
  "script": {
    "inline": "ctx._type='test'"
  }
}

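The important bit is the script, which tells the _reindex API to change the _type of the documents written to newindex from testtype to test. Without it, _reindex keeps the source type, so the documents land under testtype and never match the dynamic template you defined under test.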
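If enabling inline scripting is not an option, the comments below report that setting the type directly in dest also works. Here is a minimal Python sketch of that request body, assuming the index and type names from the question; the helper name build_reindex_body is made up for illustration. Note that source.type is an array while dest.type must be a plain string, which is why the second attempt in the question failed with "Unknown array field [type]".

```python
import json


def build_reindex_body(source_index, source_type, dest_index, dest_type):
    """Build a _reindex body that renames the type via dest.type.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    return {
        # source.type accepts an array of types to read from
        "source": {"index": source_index, "type": [source_type]},
        # dest.type must be a single string, not an array
        "dest": {"index": dest_index, "type": dest_type},
    }


body = build_reindex_body("oldindex", "testtype", "newindex", "test")
print("POST /_reindex")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into Sense as the body of POST /_reindex.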

Andrei Stefan
  • Thanks, that makes more sense. However, now I get ` "type": "script_exception", "reason": "scripts of type [inline], operation [update] and lang [groovy] are disabled"`. What does this mean? – AbtPst Apr 28 '16 at 12:44
  • Forgot about that. You need to enable scripting in `elasticsearch.yml`: `script.engine.groovy.inline.update: true` and restart. – Andrei Stefan Apr 28 '16 at 12:47
  • OK, it also seems to work if I specify `"dest": { "index": "abtclm2", "type": "test" }`. The type in dest is a string, unlike the type in source, which is an array. Now I see that not all documents are transferred. After a while, the POST /_reindex gets terminated and I see `502 Bad Gateway: Registered endpoint failed to handle the request.` What does this mean? – AbtPst Apr 28 '16 at 13:14
  • Can the reindex operation be disturbed if I try to get the stats before it is finished? – AbtPst Apr 28 '16 at 13:27
  • To be honest, I am not sure. I don't think the stats can interfere with the reindex operation... – Andrei Stefan Apr 28 '16 at 13:32
  • I think it does. I have the reindex operation running, and if I try to do anything else in Sense, the operation terminates with a 502 error. – AbtPst Apr 28 '16 at 13:33
  • Interesting. Do you see any errors in logs? How many documents do you have? – Andrei Stefan Apr 28 '16 at 13:34
  • I have not checked the logs yet. We have about 22M documents. – AbtPst Apr 28 '16 at 13:35
  • Nope, hold on. It crashed again. Let me look at the logs. – AbtPst Apr 28 '16 at 13:36
  • I get `[ERROR][license.plugin.core ] [node] # # LICENSE EXPIRED ON`. Could this be causing the 502? – AbtPst Apr 28 '16 at 13:43
  • Cluster health, cluster stats and indices stats operations are blocked if the license expired. – Andrei Stefan Apr 28 '16 at 14:25
  • Makes sense, but could it be causing the reindex to crash? I don't see any other errors. – AbtPst Apr 28 '16 at 14:30
  • According to the documentation, it shouldn't. – Andrei Stefan Apr 28 '16 at 14:41
  • Yeah, you are right. I updated my license and I still get the 502 error. The reindex runs for a minute or so and then I get the error. Any idea what might be causing this? – AbtPst Apr 28 '16 at 15:19
  • So, here is the deal. Even after I get the error, it still continues to put the docs in, as I see the doc count increase. I don't know how long it takes to finish or whether I get any confirmation when it is done. Still weird to see the 502. – AbtPst Apr 28 '16 at 15:22
  • How does reindex handle duplicates? What if I start a reindex, stop it, and then start it again? Will the older documents be deleted? – AbtPst Apr 28 '16 at 15:27
  • Does reindex increase the size of the data on disk? – AbtPst Apr 28 '16 at 17:27
  • It shouldn't increase the size unless the new mapping makes it larger (additional fields, fields with additional terms etc). – Andrei Stefan Apr 28 '16 at 18:56
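
On the 502 and the duplicates questions, which the thread leaves open: the cause of the 502 is never established here, but since the doc count keeps growing after the error, one plausible reading is that something between the client and the cluster gives up on the long-running HTTP request while the reindex itself carries on. The sketch below is a guess at a more robust request, assuming the 2.3-era _reindex options as I remember them from the documentation (wait_for_completion=false, conflicts, and dest.op_type); please check them against the docs for your version. The helper name build_resumable_reindex is hypothetical.

```python
import json


def build_resumable_reindex(source_index, source_type, dest_index, dest_type):
    """Return (path, body) for a reindex that can be safely re-run.

    Hypothetical helper. Assumes these _reindex options exist in your
    Elasticsearch version: wait_for_completion, conflicts, dest.op_type.
    """
    # Return immediately with a task instead of holding the HTTP request open
    path = "/_reindex?wait_for_completion=false"
    body = {
        # Count version conflicts instead of aborting on the first one
        "conflicts": "proceed",
        "source": {"index": source_index, "type": [source_type]},
        "dest": {
            "index": dest_index,
            "type": dest_type,
            # Only create documents that are missing in the destination
            "op_type": "create",
        },
    }
    return path, body


path, body = build_resumable_reindex("oldindex", "testtype", "newindex", "test")
print("POST " + path)
print(json.dumps(body, indent=2))
```

As far as I know, _reindex keeps each document's _id, so re-running it with the default settings overwrites documents that were already copied rather than duplicating them, and it does not delete anything from the destination. With op_type set to create, already-copied documents are skipped and reported as version conflicts instead.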