
One of our applications is installed in 50-odd locations, each with its own local database. We need to sync the data of one table in this database to a centralized location. We want to use a queue mechanism between the local and the centralized database, so that if there is a network problem during a sync, the recent updates are stored in the queue; whenever the network comes back, the queued records are inserted into the remote database and, once that is done, deleted from the temporary queue. That way the local database can keep working without any locks even in the absence of the network. We can't use MySQL replication, because the source is 50 different databases and the target is one single database that will hold the records of all of them, and there is no primary key in my source table.

Can anyone suggest a suitable approach for the above problem? Our source and target databases are MySQL.

user3123372

1 Answer


This is pretty much what I have been doing for a living the past few years, and my gut instinct is that reading 500,000 items from the source database and syncing them into the destination will not take as much time as one might think, while reading the "key" fields, computing the MD5 hash, and cross-checking against your table to avoid syncing items that haven't changed won't save much time and may even run longer. I'd simply read all and update all. If that results in a runtime that is too long, then I'd compress the runtime by making the ETL multi-threaded, with each thread operating on only a segment of the table but working in parallel.
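To illustrate both ideas, here is a rough SQL sketch; every table and column name in it (source_table, order_no, qty, amount, status) is an assumption, since the actual schema isn't shown:

-- Change detection (the step I'd skip): fingerprint each row with MD5 so it
-- can be compared against the hash recorded on the previous sync run.
SELECT
    order_no,
    MD5(CONCAT_WS('|', order_no, qty, amount, status)) AS row_hash
FROM source_table;

-- Multi-threaded ETL: give each worker thread a disjoint slice of the table
-- by hashing a key into N buckets (here N = 4; each thread binds its own
-- bucket number, 0 through 3).
SELECT order_no, qty, amount, status
FROM source_table
WHERE MOD(CRC32(order_no), 4) = ?;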

It would be important to ensure that your destination table has a primary key or unique index. Otherwise, each of your updates/inserts could lock the entire table. That would be bad if you take the multi-threaded approach, but it matters even if you stay single-threaded, because your job could lock the destination table and interfere with the application that rides on top of that DB.
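Since your source table has no primary key, one option is a composite unique key on the destination that combines a per-location identifier with the row's natural key fields. A sketch under assumed names (central_table, location_id, order_no):

-- Composite unique key so rows from the 50 sites can't collide.
ALTER TABLE central_table
    ADD UNIQUE KEY uq_location_row (location_id, order_no);

-- With the unique key in place, each sync write can be an idempotent
-- "upsert", so re-running a partially failed batch is safe.
INSERT INTO central_table (location_id, order_no, qty, amount, status)
VALUES (?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
    qty    = VALUES(qty),
    amount = VALUES(amount),
    status = VALUES(status);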

You say the source DB "may be DB2". The word "may" implies that the DB is still being designed/planned? DB2 9 or above does have built-in tracking of the last update time, and the ability to query and get back only the items that have changed since a point in time. Perhaps this is why the DB was designed without a column indicating the last-updated time, e.g.:

SELECT * FROM t1 WHERE ROW CHANGE TIMESTAMP FOR t1 > CURRENT TIMESTAMP - 1 HOUR;

The timestamp cutoff in the above query would be the time your sync last ran.
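In practice you would persist that cutoff between runs; a sketch, assuming a hypothetical sync_state bookkeeping table:

-- Pull only the rows changed since the last successful sync; sync_state
-- (with columns job_name and last_sync_ts) is an assumed bookkeeping table
-- that the job updates after each successful run.
SELECT * FROM t1
WHERE ROW CHANGE TIMESTAMP FOR t1 >
      (SELECT last_sync_ts FROM sync_state WHERE job_name = 't1_sync');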

If this is the case, that should solve your problem. But your solution would end up tied very tightly to DB2, and in the future they may want to move to another DB platform and expect your sync job to not need revisiting. So it would be important to make sure all the right people know that your product depends on remaining on DB2, or that any planned migration would include restructuring the DB to add a "last changed timestamp" column, plus whatever changes are necessary at the app level to populate that field.

Hope it helps! Thanks

Ankur Makadia
  • Thanks a ton for your time. Both my source and target databases are MySQL. What I have planned is to create another table in my local database that acts as a queue. I will write triggers on my source table that dump data into this new table whenever there is an insert into the source table. I will then treat this new table as the source and simply insert all its rows into the remote table, and upon successful completion delete those rows from the new table. Please let me know if you think this seems suitable. – user3123372 Apr 25 '18 at 05:04
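For reference, a minimal MySQL sketch of the trigger-fed queue described in that comment; all table and column names (source_table, sync_queue, order_no, and so on) are illustrative assumptions:

-- Local queue table; queue_id gives a stable ordering and a handle for
-- deleting rows after a successful push.
CREATE TABLE sync_queue (
    queue_id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    location_id INT NOT NULL,        -- identifies this site among the ~50
    order_no    VARCHAR(50),         -- payload columns copied from the source row
    qty         INT,
    amount      DECIMAL(10,2),
    status      VARCHAR(20),
    queued_at   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

DELIMITER //
CREATE TRIGGER trg_source_after_insert
AFTER INSERT ON source_table
FOR EACH ROW
BEGIN
    -- 42 stands in for this site's configured location id
    INSERT INTO sync_queue (location_id, order_no, qty, amount, status)
    VALUES (42, NEW.order_no, NEW.qty, NEW.amount, NEW.status);
END//
DELIMITER ;

-- Drain step, run by the sync job whenever the network is up: push a batch
-- to the central DB, then delete exactly the rows that were pushed.
DELETE FROM sync_queue WHERE queue_id <= ?;  -- ? = max queue_id of the pushed batch

Note that this only captures inserts; if rows in the source table can also be updated or deleted, corresponding AFTER UPDATE and AFTER DELETE triggers would be needed.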