
I have a news site that receives around 58,000 hits a day across 36,000 articles. Of these 36,000 unique stories, 30,000 get only one hit (most of them from search engine crawlers), and only 250 stories get more than 20 impressions. Caching anything other than these 250 articles is a waste of memory.

Currently I am using the MySQL Query Cache and xcache for data caching. The table is updated every 5-10 minutes, so the Query Cache alone is not very useful (any write to a table invalidates all of its cached query results). How can I detect only the frequently visited pages and cache their data?

Joyce Babu
  • Is wasting memory *really* an issue? How large is each story? – Pekka Jan 29 '11 at 11:38
  • A serialized row is about 2000 to 5000 characters long. – Joyce Babu Jan 29 '11 at 11:57
  • It's not that much; storage is cheap. You'll probably spend much more money finding a solution than buying more memory. However, if the data is just plain text, why not compress it? – Boris Guéry Jan 29 '11 at 14:45
  • @Boris - I have enough free memory and there is no chance of running out of memory (OOM) in the near future. I want to know whether there is a logical solution to this. – Joyce Babu Jan 29 '11 at 18:31
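To put the numbers from the question and comments in perspective, here is a back-of-the-envelope estimate of how much memory caching everything would take compared with caching only the popular stories. It assumes one byte per character of the serialized row and ignores the cache's own per-entry overhead; the constant names are just for illustration.

```python
# Back-of-the-envelope cache size. Assumption: one byte per character of the
# serialized row (plain ASCII; multi-byte UTF-8 text would need more), and
# the cache's own per-entry overhead is ignored.
ARTICLES_TOTAL = 36_000  # all stories, from the question
ARTICLES_HOT = 250       # stories with more than 20 impressions
ROW_BYTES_MIN = 2_000    # serialized row size range, from the comments
ROW_BYTES_MAX = 5_000


def cache_bytes(article_count: int, row_bytes: int) -> int:
    """Bytes needed to hold `article_count` rows of `row_bytes` each."""
    return article_count * row_bytes


if __name__ == "__main__":
    for label, count in (("all articles", ARTICLES_TOTAL),
                         ("hot articles", ARTICLES_HOT)):
        low = cache_bytes(count, ROW_BYTES_MIN)
        high = cache_bytes(count, ROW_BYTES_MAX)
        print(f"{label}: {low:,} to {high:,} bytes")
```

That works out to 72,000,000 to 180,000,000 bytes (roughly 72 to 180 MB) for every article, versus 500,000 to 1,250,000 bytes for the 250 popular ones.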

2 Answers


I think you have two options to start with:

  1. You don't cache anything by default.

    You can use an Observer/Observable pattern to trigger an event when an article's view count reaches a threshold, and start caching the page at that point.

  2. You cache every article at creation time.
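Option 1 could look roughly like the sketch below. It is only an illustration of the Observer idea, not code from the site: `ViewCounter` (the observable) counts views and notifies subscribers once, when an article first reaches the threshold; `ArticleCache` (the observer) then loads and caches that article. The class names, the `load_article` callback and the threshold of 20 are all hypothetical, and the plain dict of counts stands in for wherever the real counts would be stored.

```python
from typing import Callable, Dict, List

# Hypothetical names: ViewCounter, ArticleCache, load_article.
Listener = Callable[[int], None]


class ViewCounter:
    """Observable: counts views per article and notifies subscribers
    once, when an article's count first reaches `threshold`."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.views: Dict[int, int] = {}  # stand-in for a persistent counter
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def record_view(self, article_id: int) -> None:
        count = self.views.get(article_id, 0) + 1
        self.views[article_id] = count
        if count == self.threshold:  # fires exactly once per article
            for listener in self._listeners:
                listener(article_id)


class ArticleCache:
    """Observer: starts caching an article when the counter notifies it."""

    def __init__(self, load_article: Callable[[int], str]) -> None:
        self._load = load_article  # e.g. the MySQL query for one article
        self.entries: Dict[int, str] = {}

    def on_threshold(self, article_id: int) -> None:
        self.entries[article_id] = self._load(article_id)

    def get(self, article_id: int) -> str:
        # Cached articles skip the database; everything else is loaded fresh.
        if article_id in self.entries:
            return self.entries[article_id]
        return self._load(article_id)


if __name__ == "__main__":
    cache = ArticleCache(lambda aid: f"<article {aid}>")
    counter = ViewCounter(threshold=20)
    counter.subscribe(cache.on_threshold)
    for _ in range(25):
        counter.record_view(42)
    counter.record_view(7)
    print(sorted(cache.entries))  # only article 42 was promoted
```

Because the notification fires only when the count equals the threshold, later views of an already-cached article do not trigger another database load.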

In both cases, you can use a cron job to purge cached articles that don't reach your defined threshold.
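A purge job of that kind might look like the following sketch, meant to be run periodically (for example from cron). The `Entry` record and `purge` function are hypothetical. The grace period is an added assumption so that, with option 2, a freshly created article is not evicted before it has had a chance to collect views. Idle entries are also dropped using the last access time.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entry:
    """A cached article plus the bookkeeping the purge needs."""
    body: str
    views: int = 0
    created: float = field(default_factory=time.time)
    last_access: float = field(default_factory=time.time)


def purge(cache: Dict[int, Entry], min_views: int, grace_seconds: float,
          max_idle_seconds: float, now: Optional[float] = None) -> int:
    """Drop entries that, after a grace period, still have fewer than
    `min_views` views, or that have not been read for `max_idle_seconds`.
    Returns how many entries were removed."""
    if now is None:
        now = time.time()
    stale = [
        article_id
        for article_id, e in cache.items()
        if (now - e.created > grace_seconds and e.views < min_views)
        or now - e.last_access > max_idle_seconds
    ]
    for article_id in stale:
        del cache[article_id]
    return len(stale)
```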

In either case, you'll probably need some heuristic to determine early enough that an article will need to be cached, and as with any heuristic, you'll get both false positives and false negatives.

It'll depend on how your content is read: if your articles are real-time news, this will probably work well, since a popular story quickly generates high traffic.
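One possible heuristic along those lines, offered only as an assumption to tune against real traffic, is the view rate since publication: an article that collects many views per hour soon after it goes live is likely to become one of the popular ones. The function name and both parameters below are hypothetical; `min_views` is there so a single hit (such as one crawler visit) is never enough on its own.

```python
def likely_popular(views: int, published_at: float, now: float,
                   min_rate_per_hour: float, min_views: int = 3) -> bool:
    """Hypothetical early-popularity heuristic: predict that an article will
    become popular if its views per hour since publication reach
    `min_rate_per_hour`. Timestamps are in seconds."""
    if views < min_views:
        return False  # too few hits to say anything, e.g. one crawler visit
    # Floor the elapsed time at one minute to avoid dividing by zero.
    hours = max((now - published_at) / 3600.0, 1 / 60)
    return views / hours >= min_rate_per_hour
```

Expect false positives and false negatives here too; the rate threshold trades cache size against how many popular articles are missed early.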

The main problem with these methods is that you'll need to store extra information for each article, such as its last access datetime and current page views, which could result in extra queries.
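One common way to limit those extra queries (a technique not mentioned in the answer itself) is to accumulate view counts and write them out in batches instead of issuing an UPDATE on every hit. The sketch below is hypothetical: `flush_fn` stands in for whatever would persist the counts, such as one multi-row UPDATE. In a PHP setup the buffer would have to live in shared storage such as the xcache variable cache rather than in one process, and counts not yet flushed can be lost on a crash, which is usually acceptable for popularity tracking.

```python
from collections import Counter
from typing import Callable, Dict


class BufferedViewCounter:
    """Accumulates page views in memory and hands them to `flush_fn` in one
    batch every `flush_every` hits, instead of one write per hit."""

    def __init__(self, flush_fn: Callable[[Dict[int, int]], None],
                 flush_every: int = 100) -> None:
        self._flush_fn = flush_fn  # hypothetical persistence callback
        self._flush_every = flush_every
        self._pending: Counter = Counter()
        self._hits_since_flush = 0

    def record_view(self, article_id: int) -> None:
        self._pending[article_id] += 1
        self._hits_since_flush += 1
        if self._hits_since_flush >= self._flush_every:
            self.flush()

    def flush(self) -> None:
        # Write all pending counts at once, then start a new batch.
        if self._pending:
            self._flush_fn(dict(self._pending))
        self._pending.clear()
        self._hits_since_flush = 0
```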

Boris Guéry

You can cache only new articles (say, the ones that have been added recently). I'd suggest having a look at memcached and Redis; they are both simple yet powerful caching engines.
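A minimal sketch of the "cache only recent articles" idea, assuming the caller already knows each article's creation time (for instance from the article listing) before fetching the body. `get_article`, `load` and `max_age_seconds` are hypothetical names, and the plain dict stands in for memcached, Redis or xcache, where an expiry time on each entry would normally do the cleanup.

```python
import time
from typing import Callable, Dict, Optional


def get_article(article_id: int, created_at: float,
                load: Callable[[int], str], cache: Dict[int, str],
                max_age_seconds: float, now: Optional[float] = None) -> str:
    """Serve recent articles from the cache; always load older ones from
    the database and drop any copy of them still in the cache."""
    if now is None:
        now = time.time()
    if now - created_at > max_age_seconds:
        cache.pop(article_id, None)  # too old: don't keep it in memory
        return load(article_id)
    if article_id not in cache:
        cache[article_id] = load(article_id)  # first read of a recent article
    return cache[article_id]
```

Since most of the one-hit articles are old stories reached by crawlers, an age cutoff alone should keep most of them out of memory without any per-article view tracking.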

itsmeee
  • Checking the creation date is a good idea; it can reduce the number of articles considerably. From what I hear, since I am not using advanced features, xcache is much faster than memcached and Redis. – Joyce Babu Jan 29 '11 at 18:22