0

I am currently using logstash-jdbc-plugin to pull data from DB and put it to an index in ES. How to check if the entire data pulled from DB is getting inserted into Elastic Search index.

The data that is pulled is in millions so can't keep checking manually

Val
  • 207,596
  • 13
  • 358
  • 360
sri
  • 331
  • 1
  • 4
  • 11

1 Answers1

0

I had a similar need and what I did was a small script that would query the DB and ES and compare the number of records.

That script can be as simple as a shell script like the one below. It queries the DB for the nunmber of records in MY_TABLE and stores that number in the DB_COUNT variable. Then it queries ES for the number of records in the my_table index and stores that number in the ES_COUNT variable. Finally, it outputs both numbers:

#!/bin/sh

# retrieve DB count
DB_QUERY="SELECT COUNT(1) FROM MY_TABLE"
DB_COUNT=$(mysql -u root --quick --disable-column-names -B -r -e "$DB_QUERY" -h localhost db_name)

# retrieve ES count
ES_QUERY="curl -s -XGET localhost:9205/my_table/_search?filter_path=hits.total"
ES_COUNT=$($ES_QUERY | python -c 'import json,sys;obj=json.load(sys.stdin);print obj["hits"]["total"]')

# compare
echo "$DB_COUNT records in DB and $ES_COUNT docs in ES"

If all goes well you'll see an output like this:

1267437 records in DB and 1267437 docs in ES

If the number of records is different then I would investigate further by using dichotomy, i.e. do the same queries but retrieve the number of records per year/month/week/day in order to see where the differences were.

Val
  • 207,596
  • 13
  • 358
  • 360
  • Is there any way to compare the records ? like picking some random records from es and same records from db and compare them ?? – sri Dec 01 '15 at 19:00
  • Yes, provided you are using the same ids on both sides. – Val Dec 01 '15 at 19:14