I have 2 columns in the table which have incorrect entries. The size of the table runs in to billions of records. I had like to swap data between two columns (c1 and C2) The approach taken is to export the data in small chunks in to CSV files and then import it back with corrected entries. For example, below is the data set
--------
|C1 | C2 |
|3 | 4 |
|4 | 6 |
I would then export the data in to a semicolon delimted CSV file (complete command NOT shown) as below
SELECT C2,C1 FROM TABLE temp INTO OUTFILE /tmp/test.csv
The output of such command would be
4;3
6;4
When I import back the data (after deleting the data in question), the data will be corrected as follows
| C1 C2 |
| 3 4 |
| 4 6 |
It is really a matter of OUTFILE and INFILE operation, I believe
Question
- Does the approach makes sense? The real data also expects NULL, int values in some of the columns apart from data swaps.
- The other complexity is in the production database, I will need to use the WHERE clause. The table name would also be fetched dynamically.
- With reference to point 2, how do I add dynamicity to the queries. Should I use a STORED procedure or SHELL SCRIPT? STORED Procedure does not seem to support LOAD DATA INFILE functionality.
- If I am left with shell, any sample script that I can reuse? The CSV filename, table name and WHERE clause will have to be built at run time.
- Also the size of the chunk to be exported and imported will be calculated dynamically.
- Any other approach?
Note - This is a INFOBRIGHT column based table on top of mysql. The UPDATE query is non-performant and ALTER TABLE is not supported by INFOBRIGHT.