
I have a service that provides HTML pages which, at some point, are not updated anymore. The pages are always generated dynamically from a database with 10 million entries, so each page rendering looks up around 60 or 70 of those entries and then renders the page.

So, for those expired pages, I want to use a caching system that is VERY simple (essentially: store a record with the rendered HTML and, if I need to, remove it).

I tried a file-based approach, but checking for the existence of a file and then passing it through PHP to actually output it seems like too much work for what I want to do.

I was thinking of doing it in MySQL with a table of MEDIUMBLOBs (each page is around 100 KB). It would hold about 150,000 such records (for now, at least).

My question is: would it be faster to let MySQL do the lookup and hand the blob to PHP, or is the file-based approach faster?

The lookup code for the file based version looks like this:

$page = @file_get_contents(getCacheFilename($pageId));
if($page!=NULL) {
    echo $page;
} else {
    renderAndCachePage($pageId);
}

which does a single lookup whether or not the file is found.

The MySQL table would just have an ID (the page id) and the blob entry.
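
For reference, a minimal sketch of what that variant would look like (the table/column names and PDO connection details are placeholders, not existing code):

// Assumed schema: CREATE TABLE page_cache (page_id INT UNSIGNED PRIMARY KEY, html MEDIUMBLOB NOT NULL)
$pdo  = new PDO('mysql:host=localhost;dbname=cache', 'user', 'pass');
$stmt = $pdo->prepare('SELECT html FROM page_cache WHERE page_id = ?');
$stmt->execute(array($pageId));
$page = $stmt->fetchColumn();    // FALSE when the page is not cached yet
if ($page !== false) {
    echo $page;                  // hit: one primary-key lookup
} else {
    renderAndCachePage($pageId); // miss: render and INSERT the blob
}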

The system's disk is a simple SATA RAID 1, and the MySQL daemon can grab up to 2.5 GB of memory (I have a proxy running too, eating the rest of the machine's 16 GB).

In general the disk is quite busy already.

The reason I'm not using PEAR Cache is that I think (please feel free to correct me on this) it adds overhead I do not need: the page rendering code is called about 2M times per day and I wouldn't want to go through the whole code each time (and yes, I have eAccelerator to cache the code too).

Any pointer in the right direction would be greatly welcome.

Thanks!

pataroulis
  • What about using a reverse httpd proxy? Users who request a page will retrieve it from there instead of the database. – jftuga Sep 13 '12 at 12:28
  • Unfortunately the page is rendered via a script that calls the page HTML through AJAX, so it always calls the same script, which in turn reads a JavaScript variable to retrieve the page id. I already run Varnish on the server and I cannot cache anything more than the script file itself. – pataroulis Sep 13 '12 at 12:42

2 Answers


I would recommend using memcached in your particular case, but that is assuming you have some spare GBs of RAM.
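
A minimal sketch of what that could look like with the php-memcached extension (the key prefix and the renderPage() helper below are assumptions):

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key  = 'page_' . $pageId;
$page = $mc->get($key);
if ($page !== false) {
    echo $page;                    // hit: served straight from RAM
} else {
    $page = renderPage($pageId);   // hypothetical helper returning the rendered HTML
    $mc->set($key, $page);         // no explicit expiry; least-recently-used items are evicted when memory fills up
    echo $page;
}

At ~100 KB per page the entries also stay well under memcached's default 1 MB item size limit.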

Logic Wreck
  • Hmm, my free memory currently is some 100M + around a gigabyte of cached server stuff. I will probably need more than that, even though I suppose memcached will be intelligent enough to keep the most used results in the cache. Is that correct? – pataroulis Sep 13 '12 at 12:45
  • Does installing memcached require a recompile of apache/php? Will it be compatible with my varnish installation? – pataroulis Sep 13 '12 at 12:46
  • No need to recompile - you can install it from a package and then just install the php-memcache or php-memcached module for php. – Logic Wreck Sep 13 '12 at 12:50
  • Just tested it and it works like a charm. It seems that I can install the daemon on another server (which has a whole 16GB free) and push the cache there... Going to try it right now!!!!! WHOO HOOO – pataroulis Sep 13 '12 at 15:58
  • I installed memcache and it is super fast-er than the dynamic generation of the pages (Captain Obvious speaking...). I will now proceed with distributing the memcache cache to the other, free server I have!! Thanks for the precious pointer @logic-wreck – pataroulis Sep 13 '12 at 16:52

which, at some point, are not updated anymore

This is the key, really: how do you determine when a particular page is considered frozen? If you don't know, a simple approach would be to set the cache time based on when the file was last modified, e.g.

$minquiet = 86400; // require 1 day of inactivity before caching kicks in
$scale    = 1;     // scale factor applied to the idle time
$ago      = filemtime($file);                     // last modification time of the cached file
$cache    = (time() - $ago - $minquiet) / $scale;
if ($cache < 0) $cache = 0;
// Cache-Control directives are separated by commas, not semicolons
header("Cache-Control: max-age=" . (int)$cache . ", must-revalidate");

The question of whether a MySQL database or the filesystem is faster is a more complex one.

Which filesystem?

Are the files in a directory hierarchy or all in the same dir? If the latter, can the filesystem cope with that many files in one directory?
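
If a hierarchy is needed, a hashed layout keeps each directory small; a sketch reusing the asker's getCacheFilename() (the cache root path is a placeholder):

function getCacheFilename($pageId) {
    $hash = md5((string)$pageId);
    // spreads ~150,000 files over up to 256 * 256 leaf directories
    return sprintf('/var/cache/pages/%s/%s/%d.html',
                   substr($hash, 0, 2), substr($hash, 2, 2), $pageId);
}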

You'll certainly get much better performance if you bypass PHP altogether - but that presupposes there is data in the path which can be parsed to determine cacheability.

$page = @file_get_contents(getCacheFilename($pageId));

OMG - that's going to give really poor performance. At least change it to stat() + readfile().
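
A sketch of that change, keeping the asker's getCacheFilename() and renderAndCachePage() names:

$file = getCacheFilename($pageId);
if (is_file($file)) {            // a single stat(), no copy of the page into a PHP string
    readfile($file);             // streams the file straight to the output
} else {
    renderAndCachePage($pageId);
}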

$page = @file_get_contents(getCacheFilename($pageId));
if($page!=NULL) {

Does that mean that you determine if the page is cacheable based on whether it exists or not? If so, then swap this around so that the URL points to where the file should be, and implement the PHP code in the 404 handler.
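
A rough sketch of that idea (the script name, cache path, and URL pattern are assumptions; Apache would be pointed at it with something like ErrorDocument 404 /render404.php):

// render404.php: only reached when the web server did NOT find /cache/<id>.html itself
$uri = $_SERVER['REQUEST_URI'];                   // e.g. /cache/12345.html
if (preg_match('#^/cache/(\d+)\.html$#', $uri, $m)) {
    header('HTTP/1.1 200 OK');                    // replace the 404 status for a legitimate page
    renderAndCachePage((int)$m[1]);               // render, output, and write the file for next time
} else {
    header('HTTP/1.1 404 Not Found');
}

That way a cached page never touches PHP at all; only misses do.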

symcbean
  • Hi @symcbean! Thanks for taking the time to elaborate!! The point when the page is frozen is known, so after freezing the page I save it and consider it cached. Thanks for the tip on stat + readfile. I will definitely try that! And you surely opened another window with that 404 handler idea! Thanks! – pataroulis Sep 13 '12 at 15:56
  • +1 for the idea of the 404 page. Some url rewrite can surely do the trick! Thanks again! – pataroulis Sep 13 '12 at 16:53