7

I'm loading XML files from disk using file_get_contents(), and as a test I find I can load a 156K file 1,000 times in 3.99 seconds. I've subclassed the part that does the loading and replaced it with a memcache layer, and on my dev machine I find I can do 1,000 loads of the same document in 4.54 seconds.

I appreciate that file_get_contents() will do some caching, but it looks like it is actually faster than a well-known caching technique. On a single server, is the performance of file_get_contents() as good as one can get?

I'm on PHP 5.2.17 via Macports, OS X 10.6.8.

Edit: I've found that on XML documents of this size, there is a small benefit to be had in using the MEMCACHE_COMPRESSED flag. 1,500 loads via memcache are done in 6.44 sec (with compression) rather than 6.74 sec (without). However, both are slower than file_get_contents(), which does the same number of loads in 5.71 sec.
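For anyone wanting to reproduce the file side of this comparison, here is a minimal timing sketch. It is not my original test harness: the `timeLoads` helper and the generated temp file are hypothetical stand-ins, and it uses a closure, so it assumes a newer PHP than the 5.2 mentioned above.

```php
<?php
// Time $iterations calls of $loader and return the elapsed seconds.
function timeLoads(callable $loader, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $loader();
    }
    return microtime(true) - $start;
}

// Hypothetical test document: a temp file padded to roughly 156 KB.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, '<root>' . str_repeat('<item>x</item>', 11400) . '</root>');

$seconds = timeLoads(function () use ($path) {
    return file_get_contents($path);
}, 1000);

printf("1000 loads via file_get_contents: %.2f sec\n", $seconds);
unlink($path);
```

Passing a different closure (for example one that calls a memcache client) lets the same helper time the cached path, so both numbers come from an identical loop.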

halfer

3 Answers

11

Because file_get_contents() mmaps the file, it needs only a few file system calls, and the data ends up being served from the file system cache. memcache, by contrast, involves out-of-process calls to memcached (and out-of-server calls in a clustered implementation).

The performance of file_get_contents() depends crucially on the type of file system. For example, a file on an NFS-mounted file system is not mmapped, and access to it can be a LOT slower. Also, on a multi-user server, the file system cache can be rapidly flushed by other processes, whereas the memcached cache will almost certainly stay in memory.
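If the file system cache does turn out to be unreliable on your server, the usual pattern is a read-through loader: ask memcached first and fall back to the file on a miss. The following is only a sketch under assumptions. It uses the `Memcached` extension rather than the older `Memcache` one from the question, and the `loadDocument` name, the key scheme, the 300-second TTL and the local server address are all illustrative.

```php
<?php
// Read-through loader: try the cache first, fall back to the file on a miss.
// $cache is assumed to be a Memcached instance, or null to skip caching.
function loadDocument($path, $cache = null, $ttl = 300)
{
    $key = 'xml:' . md5($path); // hypothetical key scheme

    if ($cache !== null) {
        $hit = $cache->get($key);
        if ($hit !== false) {
            return $hit; // served by memcached, no file access
        }
    }

    // On a warm system this read is served by the OS file system cache.
    $contents = file_get_contents($path);
    if ($contents !== false && $cache !== null) {
        $cache->set($key, $contents, $ttl);
    }
    return $contents;
}

// Only build a client when the memcached extension is available.
$cache = null;
if (class_exists('Memcached')) {
    $cache = new Memcached();
    $cache->addServer('127.0.0.1', 11211); // assumed local daemon, default port
}
```

With `$cache` left as null the function is a plain file_get_contents() call, which makes it easy to benchmark both paths with the same code.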

TerryE
  • Ah, interesting. Are you saying that on a web server in which various XML files are being loaded all the time, the performance of the memcache approach _might_ in some cases be better? (I expect to deploy on a low-end Linux VPS, probably with 512MB of RAM - plenty free for the 64M that memcache reserves by default.) – halfer Mar 04 '12 at 14:56
  • Still, I wonder whether the contention that would knock something out of FS cache would be as equally likely to knock something out of memcache cache `;-)` – halfer Mar 04 '12 at 15:05
  • On a VPS which is dedicated to one app, you should be able to get everything to fit. However, it's worth "right-sizing" your caches: use 32M for memcache, or less if that's enough. Make sure you're using APC or Xcache if your app is PHP based. Don't forget that you can get a good performance dividend from tuning the MySQL caches if you use MySQL, ... 512MB is small enough that you need to allocate wisely. – TerryE Mar 04 '12 at 15:51
  • Good advice, although I'm some way off deploying. It'll be pretty low-traffic anyway, I should think - just a side project! – halfer Mar 04 '12 at 16:01
  • A tick for TerryE, a +1 for @Mantriur - both very helpful. Thank you. – halfer Mar 04 '12 at 17:51
3

file_get_contents is the simplest way to retrieve a file. The underlying operating system (especially Linux) already has efficient caching mechanisms. Anything else you do just creates overhead and slows things down.

Memcache would make sense if you loaded these files from a remote location.

Edit: It is not necessarily true that file_get_contents is the simplest way. fopen/fgets might be even faster - I don't know. But the differences should be minor compared to the complexity of a caching layer.

Mantriur
  • I expect you're right. I guess I was expecting `file_get_contents` to perform _some_ disk activity every time (perhaps to see if a file has changed), whereas `memcache_get` need do none at all. Hence my expectation that the memcache approach would be faster... nevertheless it hasn't been a waste of time, since I think I have just learnt something :) – halfer Mar 04 '12 at 14:48
  • On a very generalized level the FS cache does the same thing as your PHP memcache - except that it is far more specialized and not written in a scripting language. :) The cache doesn't need to access the file a second time unless the file contents changed. A separate caching mechanism would make sense if there's a lot of IO activity on the system flushing the file cache. – Mantriur Mar 04 '12 at 14:53
  • A tick for @TerryE, a +1 for Mantriur - both very helpful. Thank you. – halfer Mar 04 '12 at 17:52
0

Storing XML files in memcache makes very little sense to me.

I'd rather store parsed values, saving me both reading and parsing.
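A rough sketch of what I mean, assuming the SimpleXML extension is available. The `parseToArray` and `cachedParse` names are made up for illustration, and a static array stands in for memcache so the example runs on its own.

```php
<?php
// Parse an XML string once and reduce it to a plain array.
// SimpleXMLElement objects cannot be serialized, so the array is what gets cached.
function parseToArray($xml)
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return null; // malformed XML
    }
    return json_decode(json_encode($doc), true);
}

// Stand-in cache: a static array in place of memcache (assumption for this sketch).
function cachedParse($key, $xml)
{
    static $cache = array();
    if (!isset($cache[$key])) {
        $cache[$key] = parseToArray($xml);
    }
    return $cache[$key];
}

$values = cachedParse('demo', '<book><title>PHP</title><year>2012</year></book>');
echo $values['title'], "\n";
```

A cache hit then skips both the read and the parse, which is where the time usually goes.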

halfer
Your Common Sense
  • Looks like you are confusing storage formats and have no idea how to properly use them. Serializing XML objects is just weird. The same goes for storing arrays in memcache, which is ALREADY AN ARRAY – Your Common Sense Mar 04 '12 at 15:46
  • Abrasive or not, it's merely a description of all that mess of storing XML files in memory. – Your Common Sense Mar 04 '12 at 15:58