0

My question is kind of hard to explain in title so I'll show the data and goal. There is a MySQL table with following structure:

CREATE TABLE customerProjectData(
    idCustomer INT NOT NULL,
    idProject INT DEFAULT NULL,
    comePersons SMALLINT DEFAULT NULL,
    comePairs SMALLINT DEFAULT NULL,
    comment VARCHAR(255) DEFAULT NULL,
    idCity INT DEFAULT NULL,
    idStreet INT DEFAULT NULL,
    name VARCHAR(64) DEFAULT NULL,
    surname VARCHAR(64) DEFAULT NULL,
    homeNum VARCHAR(10) DEFAULT NULL,
    postCode CHAR(6) DEFAULT NULL,
    postCity VARCHAR(64) DEFAULT NULL,
    cellPhone VARCHAR(12) DEFAULT NULL
)

The thing is, there shold be also PRIMARY KEY(idCustomer, idProject) defined and it's not. As a result There are kind of duplicates (with the same primary key) but with different data.

I could run ALTER IGNORE TABLE but data loss would probably be unacceptable and unpredicatable. Finally we decided to try to fill null fields with values from duplicates if they contain data and after that run the ALTER IGNORE TABLE. Much less data will be lost that way and it's acceptable in this case (it's better than leaving it as it is now in longer time perspective).

The question is how to fill those fields from each duplicate.

web-nomad
  • 6,003
  • 3
  • 34
  • 49
Joe
  • 2,551
  • 6
  • 38
  • 60

2 Answers2

1

Here is a rough try.

First try to find out the no. of rows which have the same key.

<?php
// $link is the database identifier

$sql = 'SELECT COUNT(*) AS num, * FROM `customerProjectData` GROUP BY `idCustomer`, `idProject` HAVING COUNT(*) > 1 ORDER BY COUNT(*) ASC;';
$run = mysql_query( $sql, $link );

$rows = array();
if( $run && mysql_num_rows( $run ) ) {
    while( ( $fetch = mysql_fetch_assoc( $run ) ) !== false ) {
        $rows[] = $fetch;
    }
}
?>

Now $rows contains a list of all rows which have the same key and a count of how many times this key has been repeated in the table.

You can write a function which then iterates count times and see which rows has the complete data and use that to populates other records with this record's data.

A bit of trial and error.

web-nomad
  • 6,003
  • 3
  • 34
  • 49
0

I used @web-nomad suggestion and did something similar, but in sql procedure:

DROP PROCEDURE IF EXISTS correctCPD$$
CREATE PROCEDURE correctCPD()
BEGIN

    DECLARE currentCustomerId INT;
    DECLARE currentProjectId INT;
    DECLARE cur_idCustomer INT; 
    DECLARE cur_idProject INT;
    DECLARE cur_comePersons SMALLINT;
    DECLARE cur_comePairs SMALLINT;
    DECLARE cur_comment VARCHAR(255);
    DECLARE cur_idCity INT;
    DECLARE cur_idStreet INT;
    DECLARE cur_name VARCHAR(64);
    DECLARE cur_surname VARCHAR(64);
    DECLARE cur_homeNum VARCHAR(10);
    DECLARE cur_postCode CHAR(6);
    DECLARE cur_postCity VARCHAR(64);
    DECLARE cur_cellPhone VARCHAR(12);

    CREATE TEMPORARY TABLE ids (
        idCustomer INT,
        idProject INT
    ) ENGINE = InnoDB;  

    INSERT INTO ids
        SELECT idCustomer, idProject FROM customerprojectdata group by idCustomer, idProject having count(*) > 1;

    BLOCK1: BEGIN
        DECLARE done INT DEFAULT FALSE;
        DECLARE itemCur CURSOR FOR SELECT idCustomer, idProject FROM ids;   
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

        OPEN itemCur;

        itemCurLoop: LOOP
            FETCH itemCur INTO currentCustomerId, currentProjectId;
            IF done THEN
                LEAVE itemCurLoop;
            END IF;

            BLOCK2: BEGIN
                DECLARE doneIn INT DEFAULT FALSE;   
                DECLARE cpdCur CURSOR FOR SELECT idCustomer, idProject, comePersons, comePairs, comment, idCity, idStreet, name, surname, homeNum, postCode, postCity, cellPhone FROM customerProjectData WHERE idCustomer = currentCustomerId AND idProject = currentProjectId;    
                DECLARE CONTINUE HANDLER FOR NOT FOUND SET doneIn = TRUE;

                OPEN cpdCur;
                cpdCurLoop: LOOP
                    FETCH cpdCur INTO 
                        cur_idCustomer, cur_idProject, cur_comePersons, cur_comePairs, 
                        cur_comment, cur_idCity, cur_idStreet, cur_name, cur_surname, 
                        cur_homeNum, cur_postCode, cur_postCity, cur_cellPhone;
                    IF doneIn THEN
                        LEAVE cpdCurLoop;
                    END IF;

                    UPDATE CustomerProjectData SET 
                        comePersons = IF((comePersons IS NULL OR comePersons = '') AND cur_comePersons > 0, cur_comePersons, comePersons),
                        comePairs = IF((comePairs IS NULL OR comePairs = '') AND cur_comePairs > 0, cur_comePairs, comePairs),
                        comment = IF((comment IS NULL OR comment = '') AND cur_comment > 0, cur_comment, comment),
                        idCity = IF((idCity IS NULL AND idStreet IS NULL) AND cur_idCity > 0, cur_idCity, idCity),
                        idStreet = IF(((idCity IS NULL OR idCity = cur_idCity) AND idStreet IS NULL) AND cur_idStreet > 0, cur_idStreet, idStreet),
                        name = IF((name IS NULL OR name = '') AND cur_name > 0, cur_name, name),
                        surname = IF((surname IS NULL OR surname = '') AND cur_surname > 0, cur_surname, surname),
                        homeNum = IF((homeNum IS NULL OR homeNum = '') AND cur_homeNum > 0, cur_homeNum, homeNum),
                        postCode = IF((postCode IS NULL OR postCode = '') AND cur_postCode > 0, cur_postCode, postCode),
                        postCity = IF((postCity IS NULL OR postCity = '') AND cur_postCity > 0, cur_postCity, postCity),
                        cellPhone = IF((cellPhone IS NULL OR cellPhone = '') AND cur_cellPhone > 0, cur_cellPhone, cellPhone)
                    WHERE idCustomer = currentCustomerId AND idProject = currentProjectId;

                END LOOP;
                CLOSE cpdCur;
            END BLOCK2;
        END LOOP;

        CLOSE itemCur;

    END BLOCK1;

    DROP TEMPORARY TABLE ids;

END$$

Thanks for help!

Joe
  • 2,551
  • 6
  • 38
  • 60