I'm working on a project where users could upload excel files into a MySQL database. Those files are the main source of our data as they come directly from the contractors working with the company. They contain a large number of rows (23000 on average for each file) and 100 columns for each row!
The problem I am facing currently is that the same file could be changed by someone (either the contractor or the company) and when re-uploading it, my system should detect changes, update the actual data, and save the action (The fact that the cell went from a value to another value :: oldValue -> newValue) so we can go back and run a versions comparison (e.g 3 re-uploads === 3 versions). (oldValue Version1 VS newValue Version5)
I developed a tiny mechanism for saving the changes => I have a table to save Imports data (each time a user import a file a new row will be inserted in this table) and another table for saving the actual changes
I save the id of the row that have some changes, as well as the id and the table where the actual data was modified (Uploading a file results in a insertion in multiple tables, so whenever a change occurs, I need to know in which table that happened). I also save the new value and the old value which is gonna help me with restoring the "archives data".
- To restore a version : SELECT * FROM 'Archive' WHERE idImport = ${versionNumber}
- To restore a version for one row : SELECT * FROM 'Archive' WHERE idImport = ${versionNumber} and rowId = ${rowId}
- To restore all version for one row : SELECT * FROM 'Archive' WHERE rowId = ${rowId}
- To restore version for one table : SELECT * FROM 'Archine' WHERE tableName = ${table}
- Etc.
Now with this structure, I'm struggling to restore a version or to run a comparaison between two versions, which makes think that I've came up with a wrong approach since it makes it hard to do the job! I am trying to know if anyone had done this before or what a good approach would look like?
Cases when things get really messy :
- The rows that have changed in a version might not have changed in the other version (I am working on a time machine to search in other versions when this happens)
- The rows have changed in both versions but not the same fields. (Say we have a user table, the data of the user with id 15 have changed in 2nd and 5th upload, great! Now for the second version only the name was changed, but for the fifth version his address was changed! When comparing these two versions, we will run into a problem constrcuting our data array. name went from "some"-> NULL (Name was never null. No name changes in 5th version) and address went from NULL -> "some' is which obviously wrong).
My actual approach (php)
<?php
//Join records sets and Compare them
foreach ($firstRecord as $frecord) {
//Retrieve first record fields that have changed
$fFields = $frecord->fieldName;
//Check if the same record have changed in the second version as well
$sId = array_search($frecord->idRecord, $secondRecord);
if($sId) {
$srecord = $secondRecord[$sId];
//Retrieve straversee fields that have changed
$sFields = $srecord->fieldName;
//Compare the two records fields
foreach ($fFields as $fField) {
$sfId = array_search($fField, $sFields);
//The same field for the same record was changed in both version (perfect case)
if($sfId) {
$sField = $sFields[$sfId];
$deltaRow[$fField]["oldValue"] = $frecord->deltaValue;
$deltaRow[$fField]["newValue"] = $srecord->deltaValue;
//Delete the checked field from the second version traversee to avoid re-checking
unset($sField[$sfId]);
}
//The changed field in V1 was not found in V2 -> Lookup for a value
else {
$deltaRow[$fField]["oldValue"] = $frecord->deltaValue;
$deltaRow[$fField]["newValue"] = $this->valueLookUp();
}
}
$dataArray[] = $deltaRow;
//Delete the checked record from the second version set to avoid re-checking
unset($secondRecord[$srecord]);
}
I don't know how to deal with that, as I said I m working on a value lookup algorithm so when no data found in a version I will try to find it in the versions between theses two so I can construct my data array. I would be very happy if anyone could give some hints, ideas, improvements so I can go futher with that.
Thank you!