
I have written a feed aggregator before but am trying to optimize it a bit. In the past, using SimplePie (a PHP class) to parse the feeds, I have used the get_id() function for each feed item to return a hash (an md5 of link + title). I store this "id" as the "remote_id" in MySQL. However, to ensure that I have no duplicates, I've been running a SELECT query for each feed item to check that the "remote_id" does not already exist. This seems inefficient considering I am looking at thousands of feeds.

Would it be more efficient to just make remote_id a unique key and let the database reject duplicate records on each pass? Is there a better way to engineer this?

Justin Johnson
phirschybar

1 Answer
Yes, if a column's values should be unique, it's generally a good idea to define it as a UNIQUE key in MySQL.

When inserting possible duplicates, you can use PDO with try {} catch () {} blocks to filter them out: duplicate inserts will throw an exception, so you won't have to check beforehand with a SELECT.

I use something like this in a similar situation (pseudocode alert):

    // Make PDO throw exceptions on errors (including duplicate-key violations)
    $this->dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $stmnt = $this->dbh->prepare('INSERT QUERY');

    try {
        $this->dbh->beginTransaction();
        $stmnt->execute($cells);
        $this->dbh->commit();
    } catch (Exception $e) {
        $this->dbh->rollback();
        // Record the failed row and move on to the next one
        $this->invalidRows[] = array($cells, $e->getMessage());
        continue; // this runs inside a loop over rows
    }
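Alternatively, once remote_id is defined as a unique key, MySQL can skip or upsert duplicates for you without any application-side error handling. A sketch (the table and column names here are illustrative, not from the question):

    ALTER TABLE feed_items ADD UNIQUE KEY (remote_id);

    -- Silently skip rows whose remote_id already exists
    -- (note: INSERT IGNORE downgrades other errors to warnings too)
    INSERT IGNORE INTO feed_items (remote_id, title, link)
    VALUES (:remote_id, :title, :link);

    -- Or refresh the existing row instead of skipping it
    INSERT INTO feed_items (remote_id, title, link)
    VALUES (:remote_id, :title, :link)
    ON DUPLICATE KEY UPDATE title = VALUES(title), link = VALUES(link);

Either form avoids a round trip per item, which matters when you're processing thousands of feeds.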
Arend