
We have a sync pipeline between MySQL and BigQuery running, and I want to compare some critical tables, but they're quite big: millions of rows and up to 50+ columns each. I have tried dumping to CSV and comparing that way, but there are also data type incompatibilities (e.g. date representation).

How do I compare data at this scale?

Adrian

1 Answer


You can use endly to compare any two databases: the directives section of the dsunit:compare task lets you specify an index expression (a unique column combination), the desired date format, and other comparison options.

You can then run it simply by the workflow name, like the one below:

endly -r=compare

@compare.yaml

pipeline:
  register:
    mysqldb:
      action: dsunit:register
      datastore: myMySQLdataset
      config:
        driverName: mysql
        descriptor: '[username]:[password]@tcp(127.0.0.1:3306)/[dbname]?parseTime=true'
        parameters:
          dbname: myMySQLdataset
          username: myUsername
          password: myPassword

    bigquerydb:
      action: dsunit:register
      datastore: myBQdataset
      config:
        driverName: bigquery
        parameters:
          datasetId: myBQdataset

  compare:
    action: dsunit:compare
    maxRowDiscrepancy: 1000000

    directives:
      "@indexBy@": id,otherUniqueColumn
      "@numericPrecisionPoint@": 7
      "@coalesceWithZero@": true
      "@caseSensitive@": false
      "@timeFormat@myDateColumn": 'yyyy-MM-dd'
    source1:
      datastore: myMySQLdataset
      SQL: SELECT *
           FROM table1

    source2:
      datastore: myBQdataset
      SQL: SELECT *
           FROM table1
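
If it is specifically the date representation that breaks the comparison, you can also normalize the column in the comparison SQL itself rather than relying only on the @timeFormat@ directive. The fragment below is just a sketch assuming a date column named myDateColumn: DATE_FORMAT is MySQL's formatter and FORMAT_DATE is its BigQuery counterpart, so both sides return the same 'yyyy-MM-dd' string.

    source1:
      datastore: myMySQLdataset
      # MySQL side: format the date as a plain string
      SQL: SELECT id, DATE_FORMAT(myDateColumn, '%Y-%m-%d') AS myDateColumn
           FROM table1

    source2:
      datastore: myBQdataset
      # BigQuery side: produce the same string representation
      SQL: SELECT id, FORMAT_DATE('%Y-%m-%d', myDateColumn) AS myDateColumn
           FROM table1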
Adrian