I have multiple processes running the exact same script, and the problem is that they are selecting the same data. I have a flag "in_use", which is set to 1 after being selected, but the other processes are too fast and selects the same data before the script itself has been able to update all the data to in_use = 1. How can I, perhaps inside of the query, make sure that the same data is never selected? Locking it somehow?
Right now I update in_use for each process, but it's not fast or secure enough.
DELIMITER //
DROP PROCEDURE IF EXISTS `getDomainsForWhois`;
CREATE PROCEDURE `getDomainsForWhois`(
IN in_tld VARCHAR(10),
IN in_max_limit INT
)
BEGIN
SELECT
d.domainID,
CONCAT(SUBSTRING_INDEX(REPLACE(d.domainName, LOWER(d.tld), ''), '.' , -1), LOWER(d.tld)) as domainName,
d.tld
FROM
Domains d
WHERE (
d.parentDomainID IS NULL
AND
d.tld = in_tld
AND
d.dateFetched <= DATE_SUB(CURRENT_TIMESTAMP(),INTERVAL 1 DAY)
AND
d.in_use = 0
)
GROUP BY
domainName
ORDER BY
d.dateFetched ASC
LIMIT
0,in_max_limit
FOR
UPDATE;
END//
This is basically how I run the script. This script runs simaltaneously from multiple processes (cron):
domains=`mysql -u USER -pPASSWORD DBNAME --skip-column-names -e "LOCK TABLE Domains WRITE; CALL getDomainsForWhois('.$1', $2)"`;
while read domainID domainName domainTld
do
mysql -u USER -pPASSWORD DBNAME -e "UPDATE Domains SET in_use = 1 WHERE domainID = $domainID";
done << EOF
$domains
EOF
mysql -u USER -pPASSWORD DBNAME -e "UNLOCK TABLES;";