The question is quite simple: When it comes to touching the disk, are these two examples equal, or does scenario #2 touch the disk twice?

Scenario #1

include '/path/to/file.php';

Scenario #2

if (file_exists('/path/to/file.php'))
    include '/path/to/file.php';

I know that scenario #1 touches the disk once. As I understand it, `file_exists()` caches the path and whether or not the file exists, and to clear that cache you need to call `clearstatcache()`.

But do `include` and its siblings (`require`, `include_once`, `require_once`) also use that cache? Or is it exclusive to `file_exists()`?
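For reference, here is a minimal sketch of the stat-cache calls mentioned above. The temporary file is purely illustrative, and the comment about which cache `include` goes through is an assumption, not something confirmed in this thread.

```php
<?php
// Illustrative temp file; tempnam() creates it on disk.
$path = tempnam(sys_get_temp_dir(), 'gr_');
file_put_contents($path, '<?php return 1;');

// Stat-family call: PHP may remember the result for the rest of the request.
$before = file_exists($path);

// unlink() clears the cached stat entry for this file on its own.
unlink($path);

// If something other than this script had removed the file, an explicit
// clear would force a fresh lookup. Passing true also clears the realpath
// cache (assumption: that is the cache include-style path resolution uses).
clearstatcache(true, $path);

$after = file_exists($path);

var_dump($before, $after);
```

The second argument limits the clear to a single path, which keeps the cost low if this has to run often.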

KingCrunch
Sverri M. Olsen
  • You shouldn't care about this. `file_exists` is going to issue a `stat`. That `stat` pales in comparison to the I/O load of reading the entire file. It sounds like you're trying to perform a micro-optimization. If you really want to speed things up, profile your code with xdebug or xhprof. – Charles Dec 23 '12 at 01:37
  • I will add that if you really want a *file* and not a *directory* it's preferable to use `is_file()`. Regardless of the above comment, I think it's a fair question - but not so much for "optimization" reasons. – Wesley Murch Dec 23 '12 at 01:39
  • So you ask that question. First thing I would do is to look into PHP source code. Have you tried that? http://lxr.php.net/ - See as well: http://php.net/sites.php - another thing you can do is to run PHP with strace and look what's going on. – hakre Dec 23 '12 at 01:44
  • @Charles I profile all my code, and that is why I am asking this question. Autoloading classes is showing up in my profiling. Database calls and such should take up the lion's share of the profile, but they do not. I know autoloading is slow, almost by definition, but if I can speed it up just a little then it will be worth it. – Sverri M. Olsen Dec 23 '12 at 01:47
  • Only in some cases does it make sense to combine `is_file()/file_exists()` with `include`... autoloading classes is not one of them, unless I'm missing something here. `include` will just silently fail, so why bother checking if the file exists first? Your question seems diluted by your reason for asking it. – Wesley Murch Dec 23 '12 at 01:48
  • ...unless it's because you don't want `include` to check all the available include paths, which I suppose could make sense. – Wesley Murch Dec 23 '12 at 01:53
  • I'm pretty sure the answer is no. The obvious problem here is a race condition. The underlying inode could have changed between the calls to stat() and fopen(). The OS disk cache probably keeps some data structures in memory. It's not really something that can be safely done in user land. – cleong Dec 23 '12 at 01:55
  • @SverriM.Olsen, is the *autoloader* showing up, or is the actual act of *loading* the file showing up? Perhaps it's time for a bytecode cache? – Charles Dec 23 '12 at 01:56
  • @Charles I use APC. The problem is that bytecode caches cannot cache conditional includes/requires very efficiently. It has almost no effect. – Sverri M. Olsen Dec 23 '12 at 02:01
  • Hmm, I think that (APC) is a pretty important precondition you should have mentioned up front in your question. I mean, that really is different from bare-metal PHP. – hakre Dec 23 '12 at 02:06
  • @Wesley Murch: In your autoloader you might even want to use `require`, so that if that magic chunk fails, it's clear why, and so that no files get parsed that should not have been. Bail out early, bail out often. Use the compiler. – hakre Dec 23 '12 at 02:22
  • @hakre: It still does not make sense to check if a file exists and then use `require`. There's no reason to check unless you want to do something else if the file doesn't exist. I'm open to the possibility that I've overlooked something, but I'm not seeing it right now (with so little context). – Wesley Murch Dec 23 '12 at 02:25

2 Answers

Just one little reminder: `include` uses the include path; `file_exists` doesn't. Apart from that, you are obviously looking for problems instead of solutions (which need not be wrong; I'm just saying that my answer might not give you what you are looking for, as it covers only a fragment).
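A small sketch of that difference, assuming a throwaway directory that is added to the `include_path` (the directory and file names here are hypothetical). Note that for an absolute path like the one in the question, the include path is not searched at all, so this only matters for relative names.

```php
<?php
// Hypothetical setup: a temp directory holding one file that returns 42.
$dir  = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'gr_inc_' . uniqid();
$name = 'gr_demo_' . uniqid() . '.php';
mkdir($dir);
file_put_contents($dir . DIRECTORY_SEPARATOR . $name, '<?php return 42;');

// Put the directory at the front of the include_path.
$oldPath = set_include_path($dir . PATH_SEPARATOR . get_include_path());

// file_exists() resolves a relative name against the working directory only.
$existsRelative = file_exists($name);

// stream_resolve_include_path() searches the include_path and returns
// the resolved path, or false if nothing was found.
$resolved = stream_resolve_include_path($name);

// include finds the file through the include_path.
$viaInclude = include $name;

// Clean up.
set_include_path($oldPath);
unlink($dir . DIRECTORY_SEPARATOR . $name);
rmdir($dir);

var_dump($existsRelative, $resolved, $viaInclude);
```

If a pre-check is wanted for relative names, `stream_resolve_include_path()` is the call that mirrors the lookup `include` performs, not `file_exists()`.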

hakre

Both of these examples touch the disk twice: once to read the directory and once to read the file. In the first example, both happen within a single statement; the second splits them across two. It's very unlikely that the `include` will read the directory again, since your OS should keep some sort of disk cache that lasts at least that long.

But you are obviously trying to over-optimize something. Unless you are going to do this more than 100 times in your script, there will not be any performance difference whatsoever between your two options.
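One way to check this on your own setup is a rough micro-benchmark along these lines. It is only a sketch: the temp file and the iteration count are arbitrary assumptions, and the numbers will depend heavily on the OS cache and on whether a bytecode cache is active.

```php
<?php
// Hypothetical benchmark file; tempnam() creates it on disk.
$file = tempnam(sys_get_temp_dir(), 'gr_');
file_put_contents($file, '<?php return 1;');
$iterations = 1000; // arbitrary

// Scenario #1: plain include.
$start = hrtime(true);
for ($i = 0; $i < $iterations; $i++) {
    include $file;
}
$plainNs = hrtime(true) - $start;

// Scenario #2: file_exists() guard before the include.
$start = hrtime(true);
for ($i = 0; $i < $iterations; $i++) {
    if (file_exists($file)) {
        include $file;
    }
}
$guardedNs = hrtime(true) - $start;

unlink($file);

printf(
    "plain: %.2f us/iteration, guarded: %.2f us/iteration\n",
    $plainNs / $iterations / 1000,
    $guardedNs / $iterations / 1000
);
```

`hrtime()` requires PHP 7.3 or later; on older versions `microtime(true)` would be the substitute.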

Chronial
  • I MAY be doing it 100+ times. It is for autoloading classes, and I cannot always be sure how many classes will be loaded. – Sverri M. Olsen Dec 23 '12 at 01:51
  • @SverriM.Olsen: What is the point of the `file_exists()` call? – Wesley Murch Dec 23 '12 at 01:51
  • @WesleyMurch It is primarily to speed up the code. Touching the disk is notoriously slow, so I would rather avoid it. And please do not tell me it is not worth worrying about. The majority of PHP software out there is painfully slow because no one cares about things like this. When you slap a PHP framework or similar on a shared host, and you have 100+ users, it becomes almost unusable. – Sverri M. Olsen Dec 23 '12 at 01:58
  • @SverriM.Olsen: So you have concluded that it's faster to check if the file exists first rather than just blindly attempt an `include`? Doesn't that defeat the point of your question? I'm not trying to tell you performance doesn't matter, I'm just trying to make sense of your question. – Wesley Murch Dec 23 '12 at 02:00
  • @WesleyMurch I am sorry. I must have misunderstood the question. Checking if the file exists obviously adds another function call, which makes it slower. The point of the `file_exists()` is more so that I can catch files that do not exist and do something about it, like logging it or maybe showing an error message appropriate for the environment (i.e. development, production, etc.). My primary concern at this moment, however, is performance. – Sverri M. Olsen Dec 23 '12 at 02:32
  • @SverriM.Olsen: That reason makes sense to me, but still defeats the point of your question, practically speaking. If you *must* check if the file exists first because you want to "do something" if it doesn't - then it seems you cannot use "Scenario #1" - so I'm puzzled why you asked the question to begin with. – Wesley Murch Dec 23 '12 at 02:51
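The use case described in these comments (check first so that a missing file can be logged) might look roughly like the sketch below. The base directory, the class-to-file mapping, the class names, and the `error_log()` call are all hypothetical choices for illustration, not the asker's actual autoloader.

```php
<?php
// Hypothetical class directory with one demo class in it.
$baseDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'gr_classes_' . uniqid();
mkdir($baseDir);
$classFile = $baseDir . DIRECTORY_SEPARATOR . 'GrDemoClass.php';
file_put_contents($classFile, '<?php class GrDemoClass {}');

spl_autoload_register(function (string $class) use ($baseDir): void {
    // Map namespace separators to directory separators.
    $file = $baseDir . DIRECTORY_SEPARATOR
          . str_replace('\\', DIRECTORY_SEPARATOR, $class) . '.php';

    // One explicit check, so a missing file can be reported
    // instead of surfacing as a failed require.
    if (!is_file($file)) {
        error_log("Autoload failed: no file {$file} for class {$class}");
        return;
    }

    require $file;
});

$found   = class_exists('GrDemoClass');    // triggers the autoloader
$missing = class_exists('GrMissingClass'); // logged, then reported as absent

// Clean up.
unlink($classFile);
rmdir($baseDir);

var_dump($found, $missing);
```

This uses `is_file()` rather than `file_exists()`, as suggested in the comments on the question, so a directory with a matching name is not mistaken for a class file.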