I am investigating a slow running SQL query on a production mySQL database, and looking for options on improving performance. I did not design or implement this, but I do need to fix it.
The intended purpose of the SQL is to check if the same datapacket has previously been inserted, and if so return the IDs of those previously inserted rows so the data inserted is not a duplicate. It attempts to do this with an INNER JOIN on itself via the LONGTEXT 'datapacket' column (which contain up to 60,000 characters of JSON data). There are currently close to 1 million records in this table, the SQL takes approx 30-60s to run each time, and this query runs hundreds-thousands of times each day.
CREATE TABLE `T_Upload` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(30) NOT NULL,
`datapacket` longtext,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_uploadtype` (`type`),
KEY `idx_timestamp` (`timestamp`)
) ENGINE=InnoDB CHARSET=ascii;
EXPLAIN
SELECT priorDuplicate.id
FROM T_Upload u INNER JOIN T_Upload priorDuplicate ON priorDuplicate.files = u.files
AND u.id > priorDuplicate.id
WHERE u.id = 3277515
AND u.type = 'mobile'
When I Run EXPLAIN on the SQL, I get..."Impossible WHERE noticed after reading const tables".
So, my Questions are:
Is this SQL always returning an empty recordset as per the 'EXPLAIN', and therefore a complete waste of system time & resources?
Is converting the LONGTEXT to a VARCHAR(65000) with an INDEX on first 20 characters (which contain a unique datapacket ID) a viable alternative?