11

Hi, this is more of an information request really.

I'm currently working on a pretty large event listing website and have started thinking about some caching for the data sets being used.

I have been messing with APC this week and have seen some real improvements during testing. However, what I'm struggling to get my head around is the best practices and techniques required when caching data that changes frequently.

Say, for example, the user hits the home page. By default this displays the latest 10 events, and if the user is logged in those events are location specific. Is it possible to deploy some kind of caching system when dealing with logged-in states and data that changes frequently? The system currently also allows the user to "show more events", which is an AJAX request that pulls extra results from the DB.

I haven't really found anything on this, as I'm not sure what to search for, but I'm really interested to know the techniques used in advanced caching systems, especially ones that deal with data that changes and data specific to users.

I mean, is it even worth it? Are there other performance boosters when dealing with this sort of criteria?

Any articles or tips and info on this will be greatly appreciated!! Please let me know if any other info is required!!

user229044
Mike Waites

3 Answers

7

Your basic solutions are:

  • file cache
  • memcached/redis
  • APC

Each is used for a slightly different goal.

File cache is usually something that you use when you can pre-render files or parts of them. It is used in templating solutions, partial views (MVC), and CSS frameworks. That sort of stuff.

Memcached and Redis are more or less equivalent for caching, except that Redis is more of a NoSQL-oriented data store. They are used for distributed caching (multiple servers, same cached data) and for storing sessions if you have a cluster of web servers.

APC is good for two things: opcode cache and data cache. It is faster than memcached, but it works for each server separately.


Bottom line: in a huge project you will use all of them, each for a different task.
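To illustrate how a per-server cache and a shared cache can work together, here is a minimal sketch. It is written in Python for brevity rather than PHP, and `TTLCache` and `tiered_get` are hypothetical names, not part of APC or memcached. The in-memory class only stands in for whichever real cache you use.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for APC or memcached)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (value, time.monotonic() + ttl)


def tiered_get(key, local, shared, load, local_ttl=5, shared_ttl=60):
    """Check the per-server cache, then the shared cache, then the slow source."""
    value = local.get(key)
    if value is not None:
        return value
    value = shared.get(key)
    if value is None:
        value = load()  # e.g. a database query
        shared.set(key, value, shared_ttl)
    local.set(key, value, local_ttl)  # short TTL keeps per-server copies fresh
    return value
```

The local TTL is kept shorter than the shared one in this sketch because each server holds its own copy and cannot see updates made on the others.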

tereško
  • 1
    Hey, thanks for the answer. So I guess the best place to start is to make sure the database is optimized to its full potential, then work up from there. What I still can't get my head around is how you implement caching techniques for logged-in states. For example, if you log in and you are from London, how would you go about implementing a cache for those results when that content is specific to the user? Does that make sense? – Mike Waites May 07 '11 at 09:21
4

So you have opcode caching, which speeds things up by saving already compiled PHP files in cache.

Then you have data caching, where you save variables or objects that take time to get, like data built from SQL queries.

Then you have output caching, which is where you save entire blocks of your webpages in files, and output those files instead of building that block of your webpage on each request.

I once wrote a blog post about how to do output caching:

http://www.spotlesswebdesign.com/blog.php?id=17

If it's location specific and there are a billion locations, your best bet is probably output caching, assuming you have a lot of disc space. You will have to use your head, though, as each situation is very different when it comes to how best to apply caching.
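A minimal sketch of file-based output caching follows. It is not the code from the blog post above, it is written in Python for brevity, and `cached_block`, `CACHE_DIR`, and the block name format are assumptions made for illustration.

```python
import hashlib
import os
import tempfile
import time

# Hypothetical cache location, chosen only for this sketch.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "output_cache_demo")


def cached_block(name, max_age, render):
    """Return the HTML for a block, re-rendering only when the cached file is too old."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    filename = hashlib.sha256(name.encode()).hexdigest() + ".html"
    path = os.path.join(CACHE_DIR, filename)
    try:
        if time.time() - os.path.getmtime(path) < max_age:
            with open(path, encoding="utf-8") as f:
                return f.read()
    except FileNotFoundError:
        pass  # no cached copy yet
    html = render()
    # Write to a temporary file and rename it into place, so a concurrent
    # reader never sees a half-written cache file.
    fd, tmp = tempfile.mkstemp(dir=CACHE_DIR)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp, path)
    return html
```

A call such as `cached_block("home:London:page1", 60, render_events)` would rebuild the block at most once a minute per location. The name can be any concatenation of the values the output depends on.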

dqhendricks
  • Output caching is bad practice, and caching in files is the worst kind of caching you can use. – OZ_ May 06 '11 at 17:37
  • @OZ_ why is it bad practice? if you have a crap load of stuff to cache, you won't be able to cache everything to memory... disc is the next best thing. do you have anything to back up your statement? – dqhendricks May 06 '11 at 17:46
  • OZ, why is output caching bad, and specifically why is caching to a file bad. If not to a file, then where? – Jonathan Beebe May 06 '11 at 17:47
  • @dqhendricks well... because. What do you want to cache? The HTML tags of the page? They will almost never change, so reading them from one file (the template) and writing them to another (the "cache") is just a waste of resources. Only data should be cached, not its representation. Then your cache will not be filled with crap. – OZ_ May 06 '11 at 18:19
  • @somethingkindawierd then memory. Files are the slowest part of the system. In most cases fetching data from the DB will be faster than reading from a file cache (because the DB uses memory for its cache). – OZ_ May 06 '11 at 18:20
  • @OZ_ um no. 1. Outputting a file is faster than a DB call in pretty much every case. Do some benchmarks... 2. In some cases you cannot store all of your caching to memory, as memory is a fairly limited resource. 3. Output is more than moving something from a template file to a cache file, it stops you from having to assemble all of the data used to create that page. Think, instead of loading 20 classes doing 3 db calls, and a bunch of if statements, you could instead simply output the already assembled page. It's not much different than having apache output a static HTML page at that point.. – dqhendricks May 06 '11 at 18:27
  • @OZ_ do some research and some actual benchmarks before you talk about things that you are just guessing at or don't fully understand. – dqhendricks May 06 '11 at 18:28
  • @dqhendricks lol, why do you think I don't understand it? He-he :) Run those benchmarks yourself; I know that files work much slower than memory. Do you know how fast a server works when swap memory is used? Ask sysadmins if they like 'swapping' :) About '1': the DB uses a memory cache, which is why simple queries will be fetched faster from the DB than from files. About '3': if the data is already cached there is no need to fetch it from the DB, so it will just be moving bytes from one file to another. – OZ_ May 06 '11 at 18:37
  • @OZ_ if you have a potential 100000 different queries that could be run for this location based information, your rdbms is not going to cache all of the results... every time you do an include within your code, you are accessing a file. I don't know why you think it is that slow. connecting to a db is much slower. true disc is slower than memory, which is why this technique is only used in situations where you have too much data to cache for the amount of memory you have... your server wouldn't have to do swapping if you weren't already using too much memory... – dqhendricks May 06 '11 at 18:44
  • @dqhendricks `your rdbms is not going to cache all of the results` ... 1) it depends on the DB's cache buffer size; 2) the DB can keep file handles open between requests; 3) it doesn't matter how many queries you have in total (that's just a matter of tuning DB settings), only how many queries you have per request. And in some cases it will be even better to store the full DB data in memory. Read about Redis, for example. `I don't know why you think it is that slow` - try xDebug, for instance. And the server will not run into swapping because of caches; the size of a memory cache is always limited by the caching system. – OZ_ May 06 '11 at 19:08
  • @OZ_ i have benchmarked, and outputting a static file is faster than processing any page that connects to the db in pretty much every case. my db is not stored in memory of course. any mature db that has a decent amount of data will most likely not fit in memory. my point is memory is limited, you cannot use it to cache everything if you have a large system with a lot of data. memory caching should be used for many things. for some things, disc is the better option however. while you may run out of memory when caching, you will almost never run out of disc. – dqhendricks May 06 '11 at 19:19
  • @dqhendricks maybe your DB is configured wrong. To finalize: files can be used as cache storage only when other storage isn't available. If you don't have enough memory for the cache, just buy more memory. Memory works faster than disk, that's an axiom, so use memory as cache storage. I have no time to argue about this, sorry. – OZ_ May 06 '11 at 19:38
  • @OZ_ I just don't understand how you could think processing and generating a page with PHP is going to be faster than outputting a static file. – dqhendricks May 06 '11 at 20:32
  • @dqhendricks you don't understand me. Representation should not be cached, only data. We shouldn't jump over M and C in MVC. If all the data can be fetched quickly (from cache or from the DB, it doesn't matter), generating a page will not take a long time. And even if template generation works so badly that we need to cache the results (as with Smarty, for example), we can use memory to cache the generated templates. Only when we don't have enough memory do we use files, and only if we can't buy more RAM or servers for memcache. – OZ_ May 06 '11 at 20:55
  • @OZ_ you don't jump over m and c, you jump over m and v. – dqhendricks May 06 '11 at 20:59
  • @dqhendricks the code at your link has no mutexes to coordinate concurrent use of the files, so forget about using it with `potential 100000 different queries`. And I guess your use of MVC is wrong: if you can create `$unique_name_for_this_page_output`, it looks more like a 'one script per page' style, not MVC. – OZ_ May 08 '11 at 08:56
  • @OZ_ that code is for learning how it works. add your own mutexes/semaphores if you want production level code. there is no one script per page style. unique name can be any concatenation of userid, variables, and whatever other identifier you choose. figure it out. each block of your page may have its own use of models and views that can be wrapped in a conditional with its own id. – dqhendricks May 08 '11 at 17:56
2

If done correctly, using memcached or similar solutions can give huge boosts to site performance. By altering the cached data directly instead of rehydrating it from the database you can bypass the database entirely for data that either doesn't need to be saved or can be trivially rebuilt. Since the database is often the most critical component in web applications, any load you can take off it is a bonus.

On the other hand, making sure your database queries are as light and efficient as possible will have a much larger impact on performance than most cache tweaks.
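As a rough illustration of altering cached data directly instead of rebuilding it from the database, the sketch below patches a cached event list when a new event is saved. It is Python for brevity; `add_event`, `save_to_db`, and the key format are hypothetical, and the dict stands in for memcached.

```python
cache = {}  # stand-in for memcached: key -> list of events, newest first


def add_event(location, event, save_to_db, limit=10):
    """Persist a new event, then patch the cached list instead of discarding it."""
    save_to_db(event)
    key = f"events:{location}"
    if key in cache:
        # Prepend the new event and keep only the newest `limit` entries.
        cache[key] = ([event] + cache[key])[:limit]
    # If the list is not cached, do nothing: the next read will build it.
```

With a real shared cache, two servers could patch the same key at once, so this read-modify-write step would need some form of protection, for example memcached's check-and-set operation.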

Kaivosukeltaja