-1

I want to implement an email blacklist for my system. In that list I would like to store addresses that are not deliverable.

When my system can't deliver an email, I would like to save the address and never send to it again.

The two simplest solutions I can see are these:

When the recipient does not exist,

  1. I could set a flag for that in the User table
  2. Or collect such bad emails in a dedicated table

The 2nd solution seems easier to me (because it can be cached more easily), but there is a question about the chosen approach...

In my MyMail function I will first search for the current email in the bad-emails list (table), and send the message only if it is not there; otherwise sending is cancelled.

So the main question is which would be faster:

  1. Querying the DB each time to look up the particular key, e.g. SELECT 1 FROM table WHERE email='checking_email'. That table would have only one field, email, which would be indexed (unique), and I would use only strict comparison.
  2. Or caching the whole table's contents in ONE array and checking that array for whether a particular element/key exists?
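For reference, the table-lookup variant (option 1) could look like this; the table and column names are my own illustration:

```sql
-- One-column table; the PRIMARY KEY creates the unique index automatically
CREATE TABLE bad_email (
    email VARCHAR(255) NOT NULL,
    PRIMARY KEY (email)
) ENGINE=InnoDB;

-- Point lookup before sending: an index seek, not a table scan
SELECT 1 FROM bad_email WHERE email = 'checking_email';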

Assume the bad-email list can grow very long, for example 10 million records.

Which will be faster? And which is better for a high-load project?

P.S. I know that PHP arrays eat a lot of memory, but running a lot of DB queries while a newsletter send is in progress is not good either.

P.P.S. I'm going to cache the bad emails in one variable, which is why in PHP it will be an array.

P.P.P.S. Another approach is caching each bad email under its own cache key and only checking whether such a key exists in the cache, but in that case purging the cache is more complicated. I'm thinking this is the best solution. Cache keys could use a prefix like bad_email_. The logic of the pre-sending call could be: check whether the required email exists in the cache; if it doesn't, check whether it is in the DB; if it is in neither, then it is a good email. On the other hand, when the system detects a bad email, it will be saved in the DB and in the cache at the same time.
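That pre-sending check could be sketched roughly like this (my own illustration; the lookup callables are injected here so the logic can be shown without a live memcache/MySQL connection — in production they would wrap `Memcached::get()`/`Memcached::set()` and the `SELECT 1 ...` query):

```php
<?php
// Returns true if $email is blacklisted. Cache is consulted first;
// on a cache miss the DB is checked, and a DB hit warms the cache.
function isBadEmail(string $email, callable $cacheGet, callable $cacheSet, callable $dbHas): bool
{
    $key = 'bad_email_' . $email;    // key prefix as described in the P.P.P.S.

    if ($cacheGet($key) !== false) { // cache hit: known bad address
        return true;
    }
    if ($dbHas($email)) {            // cache miss: fall back to the DB ...
        $cacheSet($key, 1);          // ... and warm the cache for next time
        return true;
    }
    return false;                    // in neither store: good email, send it
}
```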

user1016265

2 Answers

0

Rather than worrying about whether the lookup in an array is faster or not, worry about where you're going to store that array. It needs to be serialized somewhere while the script is not running, then loaded into memory when you need it. In the time it takes to do that, you could have run maybe thousands of queries against the database. If your array may really grow into the millions of entries, a pure PHP array is a bad solution.

If you're worried about too many queries to the database, batch them. Query several hundred or thousand emails at once. Use prepared statements, which makes this even faster.
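A batched lookup could be sketched like this (my own illustration, assuming the one-column `bad_email` table from the question and a PDO connection): instead of one query per recipient, each chunk of addresses is checked with a single `IN (...)` query.

```php
<?php
// Returns the subset of $emails that appear in the bad_email table,
// querying in chunks of up to 1000 addresses at a time.
function badEmailsInBatch(PDO $pdo, array $emails): array
{
    $bad = [];
    foreach (array_chunk($emails, 1000) as $chunk) {
        // build one '?' placeholder per address in this chunk
        $placeholders = implode(',', array_fill(0, count($chunk), '?'));
        $stmt = $pdo->prepare("SELECT email FROM bad_email WHERE email IN ($placeholders)");
        $stmt->execute($chunk);
        $bad = array_merge($bad, $stmt->fetchAll(PDO::FETCH_COLUMN));
    }
    return $bad;
}
```

The caller can then diff the result against the full recipient list once per batch, rather than paying a round-trip per address.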

deceze
  • But what do you think about caching them in the cache, each email address under its own cache key? – user1016265 Nov 07 '12 at 08:38
  • Depends on the cache. Flat files? No, way too slow. Memcached? It's volatile, so you need another data *store* as well and load the data into Memcached first, same problem. Something like Redis? Well, then you're basically already employing a database... – deceze Nov 07 '12 at 08:39
  • The cache will be memcache, the data will be saved in MySQL, and I described the logic for how memcache gets filled in the P.P.S. – user1016265 Nov 07 '12 at 08:41
  • As with anything performance related: ***test it***. Fill a database with a million randomly generated emails, then make test queries. If this is fast enough (it probably will be), stop looking for other solutions. – deceze Nov 07 '12 at 08:42
  • This is why I've decided to ask here first; maybe someone already has experience with such tasks. – user1016265 Nov 07 '12 at 08:43
  • In my experience databases are faster than you think...! :) Or at least can be made so with some carefully planned queries. – deceze Nov 07 '12 at 08:48
  • That may be true, but my project gets plenty of visitors and the system already makes a lot of other queries; this is why I'm thinking about optimization. – user1016265 Nov 07 '12 at 08:49
0

I would definitely use a cache of some kind. If you can explore further options, I would recommend Redis (an in-memory key/value DB). It was pretty much designed for this kind of purpose, and it is fast, really fast. http://redis.io/
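As a quick sketch of what that would look like (assuming a running Redis server), the blacklist maps naturally onto a Redis set:

```shell
# add a bounced address to the blacklist set
redis-cli SADD bad_emails "bounced@example.com"

# membership test before sending; prints 1 if blacklisted, 0 otherwise
redis-cli SISMEMBER bad_emails "someone@example.com"
```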

bm_i