
I have been migrating one of the indexes from our self-hosted Elasticsearch to Amazon Elasticsearch Service using Logstash. We have around 1812 documents in the self-hosted Elasticsearch, but only about 637 documents in Amazon Elasticsearch Service. More than half of the documents are missing after the migration.

Our Logstash config file:

input {
  elasticsearch {
    hosts => ["https://staing-example.com:443"]
    user => "userName"
    password => "password"
    index => "testingindex"
    size => 100
    scroll => "1m"
  }
}

filter {
}

output {
  amazon_es {
    hosts => ["https://example.us-east-1.es.amazonaws.com:443"]
    region => "us-east-1"
    aws_access_key_id => "access_key_id"
    aws_secret_access_key => "access_key_id"
    index => "testingindex"
  }
  stdout {
    codec => rubydebug
  }
}

We have tried this with some of the other indexes as well, but it still migrates only about half of the documents.

Thilak
  • Any errors in the Logstash logs? – Val Oct 15 '19 at 08:41
  • There are no errors in the Logstash logs. – Thilak Oct 15 '19 at 08:42
  • Are you sure that some documents are not overwriting themselves because they have the same id? You didn't specify the ID strategy in your config file, are they autogenerated? – Val Oct 15 '19 at 08:43
  • @Val I am not sure whether some documents are being overwritten, but the `_id` field is autogenerated. How can I specify the ID strategy in the config file? – Thilak Oct 15 '19 at 10:11
  • How many results do you get in your console (from the `stdout` output)? – Val Oct 15 '19 at 11:19
  • Each document is very large, so counting them manually is not practical. Is there any way to find the total number of results in the `stdout` output? – Thilak Oct 15 '19 at 11:32
  • You can use the [`dots` codec](https://www.elastic.co/guide/en/logstash/current/plugins-codecs-dots.html) instead of the `rubydebug` one, so you can count dots instead ;-) – Val Oct 15 '19 at 11:33
  • @Val A total of 637 dots, the same as the document count (637). – Thilak Oct 15 '19 at 12:39
  • Ok, at least the count is consistent... How did you get the source index count (1812)? Can you show the command you're using? – Val Oct 15 '19 at 12:45
  • Yes, got it: the total source document count is 1272. Is that because replicas are counted as well? – Thilak Oct 15 '19 at 12:50
  • 1272 = 636 * 2 so yes, it looks like you're counting the replica documents as well. You should compare the counts you get from `GET index/_count` – Val Oct 15 '19 at 12:53
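
To illustrate the two suggestions from the comments above (an explicit ID strategy and counting with the `dots` codec), here is a minimal sketch of how the original pipeline could be adapted. The `docinfo` and `docinfo_target` options belong to the `elasticsearch` input plugin. The sketch assumes that the `amazon_es` output accepts `document_id` in the same way as the standard `elasticsearch` output it is derived from, so check that against the plugin version you run. Hosts and credentials are the placeholders from the question.

```
input {
  elasticsearch {
    hosts => ["https://staing-example.com:443"]
    user => "userName"
    password => "password"
    index => "testingindex"
    size => 100
    scroll => "1m"
    # Copy each source document's _index, _type and _id into event metadata
    docinfo => true
    docinfo_target => "[@metadata][doc]"
  }
}

output {
  amazon_es {
    hosts => ["https://example.us-east-1.es.amazonaws.com:443"]
    region => "us-east-1"
    aws_access_key_id => "access_key_id"
    aws_secret_access_key => "access_key_id"
    index => "testingindex"
    # Assumption: document_id is supported here as in the elasticsearch output.
    # Reusing the source _id means a re-run overwrites documents instead of duplicating them.
    document_id => "%{[@metadata][doc][_id]}"
  }
  stdout {
    # Print one dot per event, so the dots can be counted instead of full documents
    codec => dots
  }
}
```

With the source `_id` preserved, running the migration more than once should leave the destination count unchanged, which makes the comparison with the source index easier to trust.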

1 Answer


Make sure to compare apples to apples by running `GET index/_count` against your index on both sides.

You might see more or fewer documents depending on where you look (Elasticsearch HEAD plugin, Kibana, Cerebro, etc.) and on whether replicas are included in the count.

In your case you had more replicas in your local environment than in your AWS Elasticsearch service, hence the different count.

Val