At my office we have a legacy accounting system that stores all of its data in plaintext files (TXT extension) with fixed-width records. Each data file is named e.g., FILESALE.TXT. My goal is to bring this data into our MySQL server for read-only usage by many other programs that can't interface with the legacy software. Each file is essentially one table.
There are about 20 files in total that I need to access, roughly 1gb of total data. Each line might be 350-400 characters wide and have 30-40 columns. After pulling the data in, no MySQL table is much bigger than 100mb.
The legacy accounting system can modify any row in the text file, delete old rows (it has a deleted record marker -- 0x7F), and add new rows at any time.
For several years now I have been running a cron job every 5 minutes that:
- Checks each data file for last modification time.
- If the file is not modified, skip it. Otherwise:
- Parse the data file, clean up any issues (very simple checks only), and spit out a tab-delimited file of the columns I need (some of the columns I just ignore).
TRUNCATE the table and imports the new data into our MySQL server like this:
START TRANSACTION; TRUNCATE legacy_sales; LOAD DATA INFILE '/tmp/filesale.data' INTO TABLE legacy_sales; COMMIT;
The cron script runs each file check and parse in parallel, so the whole updating process doesn't really take very long. The biggest table (changed infrequently) takes ~30 seconds to update, but most of the tables take less than 5 seconds.
This has been working ok, but there are some issues. I guess it messes with database caching, so each time I have to TRUNCATE and LOAD a table, other programs that use the MySQL database are slow at first. Additionally, when I switched to running the updates in parallel, the database can be in a slightly inconsistent state for a few seconds.
This whole process seems horribly inefficient! Is there a better way to approach this problem? Any thoughts on optimizations or procedures that might be worth investigating? Any neat tricks from anyone who faced a similar situation?
Thanks!