I have a minor problem, hoping someone can shed some light on the situation.
THE SITUATION:
I have a custom partial caching mechanism built into a PHP CMS. In short, when a template in the CMS is processed, the 'cacheable' PHP code is executed and the 'non-cacheable' PHP code is left intact; the resulting code is then saved to a file that is processed on future visits to that page.
THE PROBLEM:
I am experiencing file access delays when the system is 'looking' for the cached file. With 250,000 cached files, using glob() to find the matching file takes about 1/4 s when there is no traffic on the site, and sometimes 10-15 s during traffic spikes. It almost seems as if two separate client sessions cannot run glob() simultaneously, so they bottleneck.
WHAT I AM LOOKING FOR:
... is an alternative method, or optimization to provide block caching, without the bottleneck issues. Must consider my unique concerns (outlined below). I need a faster way to access these files or an alternative partial page cache direction to go :/
==============================================
ABRIDGED CODE:
// var to hold cached page path, if found
$pageCache = NULL;
// get the URL for current page
// there is actually some other code here that could alter the 'theURL4Cache' var for various reasons, but for simplicity in this example let's just keep it as the REQUEST_URI
$GLOBALS['theURL4Cache'] = $_SERVER['REQUEST_URI'];
// check if existing cache file is in place
$filePattern = 'parsed/page_cache/*^' . $_SERVER['SERVER_PORT'] . '^' . $_SERVER['HTTP_HOST'] . '^' . (($_SESSION['isMobile']) ? 'M' : 'D') . '^L' . $language . '^T*^P' . $attributes['pageId'] . '^' . md5($GLOBALS['theURL4Cache']) . sha1($GLOBALS['theURL4Cache']) . '.php';
$fileArr = glob($filePattern);
// possible multiple files found that fit / expired files found that fit the pattern
// lets grab the newest file and try to use it
if(count($fileArr)){
rsort($fileArr);
$file = $fileArr[0]; // get file with latest expire date
if($file > 'parsed/page_cache/' . date('Y-m-d-H-i-s')) $pageCache = $file; // set an attribute to hold the valid file path to the cached file
// remove files that are no longer current
for($i=(($pageCache === NULL) ? 0 : 1); $i<count($fileArr);$i++) unlink($fileArr[$i]);
};
if($pageCache){
// cached page is found, lets process and output this puppy
include($pageCache);
} else {
// cached page is not found, let's build a cacheable page from the CMS template
$newCode = // ..... various code is processed here to isolate the cacheable code blocks and process while leaving the non-cacheable blocks intact ..... //
// create the new file path where the cached code will be placed
// first we need an expiration date
$cacheDate = date_create();
date_add($cacheDate, date_interval_create_from_date_string( $cache_increment . ' ' . $cache_interval)); // $cache_increment and $cache_interval are stored in the CMS DB for each page, giving the content manager control over the expiration of the page in cache
$filePath = $GLOBALS['iProducts']['physicalRoot'] . "/parsed/page_cache/" . date_format($cacheDate,'Y-m-d-H-i-s') . "^" . $_SERVER['SERVER_PORT'] . '^' . $_SERVER['HTTP_HOST'] . '^' . (($_SESSION['isMobile']) ? 'M' : 'D') ."^L" . $language . "^T" . $templateId . "^P" . $attributes['pageId'] . "^" .md5($GLOBALS['theURL4Cache']) . sha1($GLOBALS['theURL4Cache']) . '.php'; // create the file path
if(file_exists($filePath)) unlink($filePath); // delete the file if it already exists
$fp = fopen($filePath,"w"); // create the new file
flock($fp,LOCK_EX);
fwrite($fp,$newCode); // write the cache file
flock($fp,LOCK_UN);
fclose($fp); // close the handle so the file is flushed and released
// now output this puppy
eval($newCode);
};
WHY IS YOUR FILENAME SUCH A MESS, YOU ASK?
Well, I'm glad you asked! Another part of the CMS includes 'smart cache management': if a page or a template is modified by a content manager, all affected cached pages are purged from the system. In addition, the content of a page can vary with the attributes in the URL query string, whether or not a mobile device is rendering it, SSL vs. non-SSL, the domain name, or the current session language (the engine supports multiple-language content all associated with the same page, conditionally output based on the session language).
So here is a cached page file name example: 2014-02-14-10-36-36^80^www.mydomain.com^M^L^T42^P41^a067036ef358f12a0049740f035a7ee688dbb0033c19a70163d6c453dbc5b84f1889ffe2.php
Here are the components of the file name: expire-date^port^domain^mobileOrDesktop^Language^Template^Page^md5+sha1OfURL.php
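To make the layout concrete, here is a minimal sketch of how such a name is assembled. The helper name buildCacheFileName and its parameters are made up for this example (the real CMS builds the name inline, as in the abridged code above), and the URL passed in is arbitrary.

```php
<?php
// Hypothetical helper, for illustration only: assembles a cache file name in the
// layout expire-date^port^domain^mobileOrDesktop^Language^Template^Page^md5+sha1OfURL.php
function buildCacheFileName($expires, $port, $host, $isMobile, $language, $templateId, $pageId, $url)
{
    $parts = array(
        $expires,                 // "Y-m-d-H-i-s" stamp, e.g. "2014-02-14-10-36-36"
        $port,                    // 80 or 443
        $host,                    // e.g. "www.mydomain.com"
        $isMobile ? 'M' : 'D',    // mobile or desktop
        'L' . $language,          // "" when single-language, "-ENG" etc. otherwise
        'T' . $templateId,
        'P' . $pageId,
        md5($url) . sha1($url),   // 32 + 40 hex characters
    );
    return implode('^', $parts) . '.php';
}

// Example call with an arbitrary URL
echo buildCacheFileName('2014-02-14-10-36-36', 80, 'www.mydomain.com', true, '', 42, 41, '/some/page?id=1'), "\n";
```

Everything except the first component (the expire date) and the template id is known at lookup time, which is why the lookup pattern in the abridged code needs wildcards in exactly those two positions.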
Here are the components explained:
- expire-date: the calculated date/time at which this cache file should expire, based on the content manager's entry in the CMS. GLOB can use this to filter out all expired files so that a CRON job can delete them for cleanup. It is also used at the beginning of the code to decide whether the cached page is fresh enough to display.
- port: 80 or 443 to signify if this was retrieved over SSL. Content may be conditionally different based on the SSL state.
- domain: "www.mywebsite.com". Multiple domain names could be attached to a CMS installation, so they need to be differentiated so that two domains with the same REQUEST_URI don't show each other's content.
- mobile or desktop: "M" or "D" - to allow same URL to 'sniff out' client and serve content accordingly.
- language: "L" if not using multiple languages, "L-ENG / L-GER /..." if using multi language
- template: "T#" so templateId 47 would be "T47" - allows for easy filter by file name to identify all cached pages using a given template to remove when the template is modified in the CMS.
- page: "P#" so pageId 12 would be "P12" - allows for easy filter by file name to identify all cached versions of a given page to remove when the page is modified in the CMS.
- md5+sha1OfURL.php: takes $_SERVER['REQUEST_URI'] and hashes it TWICE (once with MD5, once with SHA1), concatenating the results to give a (reasonably) unique ID representing the URL (since query strings can impact content).
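For illustration, here is a rough sketch of how those components are used the other way around: splitting a file name back into its parts, checking expiry by comparing the stamp as a string (as the abridged code does), and building the glob() patterns for purging by template or page. All four function names are hypothetical and exist only in this example; they are not the CMS's actual code.

```php
<?php
// Hypothetical helpers, for illustration only.

// Split a cache file name into its eight components; returns null if the layout does not match.
function parseCacheFileName($path)
{
    $parts = explode('^', basename($path, '.php'));
    if (count($parts) !== 8) {
        return null;
    }
    return array(
        'expires'    => $parts[0],
        'port'       => (int) $parts[1],
        'domain'     => $parts[2],
        'isMobile'   => $parts[3] === 'M',
        'language'   => (string) substr($parts[4], 1), // "" or "-ENG", "-GER", ...
        'templateId' => (int) substr($parts[5], 1),
        'pageId'     => (int) substr($parts[6], 1),
        'urlHash'    => $parts[7],
    );
}

// A file is expired when its stamp is not later than $nowStamp ("Y-m-d-H-i-s").
// Zero-padded stamps in this format sort correctly as plain strings.
function isCacheFileExpired($path, $nowStamp)
{
    $info = parseCacheFileName($path);
    return $info === null || strcmp($info['expires'], $nowStamp) <= 0;
}

// glob() pattern matching every cached file built from a given template.
function templatePurgePattern($templateId)
{
    return 'parsed/page_cache/*^T' . (int) $templateId . '^P*.php';
}

// glob() pattern matching every cached version of a given page.
function pagePurgePattern($pageId)
{
    return 'parsed/page_cache/*^P' . (int) $pageId . '^*.php';
}

$example = 'parsed/page_cache/2014-02-14-10-36-36^80^www.mydomain.com^M^L^T42^P41^a067036ef358f12a0049740f035a7ee688dbb0033c19a70163d6c453dbc5b84f1889ffe2.php';
print_r(parseCacheFileName($example));
var_dump(isCacheFileExpired($example, date('Y-m-d-H-i-s')));
```

The purge patterns rely on the '^' separators on both sides of the id, so that purging T4 does not also match T42.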
Any ideas or advice are welcome. Thanks in advance!