Best Practice for Skipping Duplicate Entries in MySQL

Question

I have written a feed aggregator before but am trying to optimize it a bit. In the past, using simplepie (php class) to parse the feeds, I have used the get_id() function for each feed item to return a hash (an md5 mix of link + title). I store this "id" as the "remote_id" in MySQL. However to ensure that I have no duplicates I've been doing a SELECT query for each feed item to ensure that the "remote_id" does not exist. This seems inefficient considering I am looking at 1000's of feeds.

Is it most efficient to just turn remote_id into a unique key and then let the database fail to write the new record on each pass? Any other way to engineer this that is better?

score 1 · Accepted Answer · answered Jun 07 '11 at 09:23

Yes, if a key should be unique in mysql, it's generally a good idea to define it as a unique key.

When inserting possible duplicates you may use PDO and try {} catch () {} statements to filter them out, they will throw an exception. You won't have to check beforehand.

I use something like this in a similar situation (pseudocode alert):

        $stmnt = $this->dbh->prepare('INSERT QUERY');  

        try {
            $this->dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
            $this->dbh->beginTransaction();

            $stmnt->execute($celss);

            $this->dbh->commit();
        } catch (Exception $e)
        {
            $this->dbh->rollback();
            $this->invalidRows[] = array($cells,$e->getMessage());
            continue;
        }

Best Practice for Skipping Duplicate Entries in MySQL

1 Answers1