1

The title of this question might be confusing but the problem is simple.

I'm using Zend_Cache with memcached as a backend. I have two modules called "Last articles" and "Popular articles". Both of these modules appear on every page and use a similar query, such as:

SELECT * FROM table WHERE status = 'published' AND category = '' ORDER BY dateCreated /* or popularity */

My table has 1.5 million rows so far. I have indexes on every field that I'm using in the previous query. I cache the recent articles for 1 hour and the popular ones for 4 hours. I have 4 web servers (PHP 5/Apache 2) and 1 database server (MySQL). The table engine is InnoDB.

The problem is that sometimes my cache expires right in the middle of heavy load, which makes my website unavailable until those modules are cached again. I could add a new MySQL server.

But is there a way to handle the caching in a smarter way? For example, server 1 would refresh the cache while servers 2, 3 and 4 still use the old value from the cache.

I can write some code to do that, but I was wondering if there is a way to do that directly with Zend_Cache? Or if there is a design pattern that I could apply to my problem?

[EDIT] I want something that I could scale up to 100 servers
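The behavior described above is sometimes called a "soft TTL with a refresh lock": the entry never hard-expires, and when its soft TTL elapses, only the server that wins a short memcached `add()` lock recomputes it while the others keep serving the stale value. This is not something Zend_Cache provides out of the box; the following is a minimal sketch, assuming a cache object with memcached-style `get()`/`set()`/`add()` semantics (all names are illustrative):

```php
<?php
// Sketch: serve stale data while exactly one server rebuilds the entry.
// $cache needs get()/set()/add() with memcached semantics: get() returns
// false on a miss, add() fails if the key already exists.
function loadWithSoftTtl($cache, $key, $softTtl, $rebuild)
{
    $entry = $cache->get($key);
    $now = time();

    if ($entry === false || $now >= $entry['soft_expiry']) {
        // Only the server that wins this short lock recomputes the value.
        if ($cache->add($key . ':lock', 1, 60)) {
            $value = $rebuild();
            // No hard expiry (0): the soft expiry drives refreshes instead.
            $cache->set($key, array('value' => $value, 'soft_expiry' => $now + $softTtl), 0);
            return $value;
        }
        if ($entry !== false) {
            return $entry['value']; // stale, but keeps the site up
        }
        return $rebuild(); // cold cache and lost the race: compute anyway
    }
    return $entry['value'];
}
```

With the real `Memcached` extension the same calls apply directly, since `Memcached::get()` returns `false` on a miss and `Memcached::add()` only succeeds for the first caller.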

j0k
zzarbi

4 Answers

1

Instead of relying on the cache expiring and then being repopulated during an HTTP request (or, more problematically, during several concurrent requests), why not have the cache never expire?

Then schedule some utility script to run your expensive queries (just once!) and update the cache in the background.
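Such a script could look roughly like this (a sketch only: the cache ids, the query, and the `$db`/`$categories` setup are placeholders, while the `Zend_Cache::factory()` and `save()` calls are the standard ZF1 APIs):

```php
<?php
// Cron sketch, e.g. "*/30 * * * * php refresh_article_cache.php":
// pre-warms the module caches so web requests never trigger the query.
require_once 'Zend/Cache.php';

$cache = Zend_Cache::factory(
    'Core',
    'Memcached',
    array('lifetime' => null, 'automatic_serialization' => true), // never expires
    array('servers' => array(array('host' => 'localhost', 'port' => 11211)))
);

// $db is a Zend_Db adapter configured elsewhere; $categories is a
// placeholder list of categories to refresh.
foreach ($categories as $category) {
    $rows = $db->fetchAll(
        "SELECT id, title FROM articles
         WHERE status = 'published' AND category = ?
         ORDER BY dateCreated DESC LIMIT 5",
        array($category)
    );
    // md5() keeps the cache id within Zend_Cache's allowed [a-zA-Z0-9_] set.
    $cache->save($rows, 'last_articles_' . md5($category));
}
```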

timdev
  • The problem is that the query is per "category", and I have a few hundred thousand categories, so that's two queries per category... plus the fact that I would be refreshing some categories that are not visited every day. – zzarbi Jul 05 '11 at 21:11
1

Everything is possible :)

Use distributed memcache across serv1, 2, 3 and 4.

Use serv4 only for re-caching.

Set up an "internal only" website (not visible to users).

Strip out the part that "would refresh some categories".

To get the "most read articles", parse the Apache access logs.

And re-submit those URLs to serv4.

The logs include access times, so you can fetch only the part you need, e.g. from 2 to 6 hours ago.

Since all servers share the distributed memcache pool, the values serv4 writes are immediately visible to serv1, 2 and 3.
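For reference, pointing Zend_Cache at a shared pool of memcached nodes is just backend configuration (hostnames here are placeholders; the `servers` option is the standard ZF1 memcached backend option):

```php
<?php
require_once 'Zend/Cache.php';

// All four web servers use the same pool, so an entry written by the
// "ReCache" server is immediately visible to the others. Note that keys
// are sharded across the nodes, not replicated.
$cache = Zend_Cache::factory('Core', 'Memcached',
    array('lifetime' => 14400, 'automatic_serialization' => true),
    array('servers' => array(
        array('host' => 'serv1', 'port' => 11211),
        array('host' => 'serv2', 'port' => 11211),
        array('host' => 'serv3', 'port' => 11211),
        array('host' => 'serv4', 'port' => 11211),
    ))
);
```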

Paul Rysevets
  • One thing I forgot to mention: I already use distributed memcached on servers 1, 2, 3 and 4. Also, if the service that parses the logs doesn't work, I won't have my modules refreshed; if server 4 is not working, I won't have them refreshed either. The whole idea of having 4 web servers is to accommodate the load while also having redundancy across 4 servers, so if one dies I can deal with it until I fix it. With your solution, a single server can break my modules. I need something similar, but where server 2 takes over when server 4 is dead... – zzarbi Jul 06 '11 at 15:19
1

Is that the actual query you're executing?

SELECT * FROM table WHERE status = 'published' AND category = '' ORDER BY dateCreated /* or popularity */

Maybe instead of searching for advanced caching solutions, look into why this query stresses your database server. A table with 1.5M rows is not something unusual.

Did you try adding a LIMIT clause, or selecting only the columns you require?

SELECT col1, col2 FROM table WHERE status = 'published' AND category = '' ORDER BY dateCreated LIMIT 5

It'll reduce the traffic between the database and the web servers significantly.

aporat
  • No, it's not the actual query. The actual query is already optimized and looks more like Select * from (Select field1, field2 where category = '') as tmp order by date limit 5. As I said before, the query itself is pretty light, it's not a slow query, but the fact that all the caches expire at the same time stresses MySQL exponentially. – zzarbi Jul 07 '11 at 16:22
0

I finally implemented a class that inherits from Zend_Cache_Backend_Libmemcached, overriding the load() method.

Each of my servers has a hostname ending with a number, such as serv01, serv02, serv03, serv04. The main idea is that each server will think the cache expires at a different time. For example, serv01 will think the cache expired 20 minutes before it actually does, serv02 15 minutes, serv03 10 minutes and serv04 5 minutes.

By doing so, the cache will never be refreshed at the same time on every server, and if one server is down the cache will be refreshed by another server.
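A sketch of that approach (the offset rule follows the numbers described above; the class name, the wrapped-entry format, and the helper function are my assumptions, not the actual implementation):

```php
<?php
// Derive how early this server should treat the cache as expired from the
// trailing digits of its hostname: serv01 -> 20 min, serv02 -> 15 min,
// serv03 -> 10 min, serv04 -> 5 min (assuming a 5-minute step).
function cacheExpiryOffset($hostname, $step = 300)
{
    if (!preg_match('/(\d+)$/', $hostname, $m)) {
        return 0;                        // unknown hostname: no early expiry
    }
    return max(0, (5 - (int) $m[1]) * $step);
}

// The backend override itself, guarded so this sketch stands alone without
// ZF1. It assumes save() wrapped the data as
// array('data' => ..., 'expiry' => timestamp); load() then pretends the
// entry expired $offset seconds early on this particular server.
if (class_exists('Zend_Cache_Backend_Libmemcached')) {
    class My_Staggered_Backend extends Zend_Cache_Backend_Libmemcached
    {
        public function load($id, $doNotTestCacheValidity = false)
        {
            $entry = parent::load($id, $doNotTestCacheValidity);
            if (!is_array($entry) || !isset($entry['expiry'])) {
                return false;
            }
            if (time() >= $entry['expiry'] - cacheExpiryOffset(gethostname())) {
                return false;            // this server refreshes first
            }
            return $entry['data'];
        }
    }
}
```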

zzarbi