I had a similar need and what I did was a small script that would query the DB and ES and compare the number of records.
That script can be as simple as a shell script like the one below. It queries the DB for the nunmber of records in MY_TABLE
and stores that number in the DB_COUNT
variable. Then it queries ES for the number of records in the my_table
index and stores that number in the ES_COUNT
variable. Finally, it outputs both numbers:
#!/bin/sh
# retrieve DB count
DB_QUERY="SELECT COUNT(1) FROM MY_TABLE"
DB_COUNT=$(mysql -u root --quick --disable-column-names -B -r -e "$DB_QUERY" -h localhost db_name)
# retrieve ES count
ES_QUERY="curl -s -XGET localhost:9205/my_table/_search?filter_path=hits.total"
ES_COUNT=$($ES_QUERY | python -c 'import json,sys;obj=json.load(sys.stdin);print obj["hits"]["total"]')
# compare
echo "$DB_COUNT records in DB and $ES_COUNT docs in ES"
If all goes well you'll see an output like this:
1267437 records in DB and 1267437 docs in ES
If the number of records is different then I would investigate further by using dichotomy, i.e. do the same queries but retrieve the number of records per year/month/week/day in order to see where the differences were.