3

I have a 'theoretical' question, to see whether a solution I'm planning makes sense or not:

I have a script that reads a lot of data from the database - settings, configuration, etc. - and puts it all together for every registered user. I won't go into too much detail about why or what exactly.

My idea was that I could actually do that only once and cache the result in a .inc file named after the user's ID. If the user changes something, the file is of course recreated.
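Roughly what I have in mind (just a sketch - `build_user_config()` stands in for the real DB reads and calculations, and the `cache/` folder is only an example):

```php
<?php
// Sketch of the idea: cache the expensive per-user result in a .inc file.
function get_user_config($userId)
{
    $cacheFile = __DIR__ . '/cache/user_' . $userId . '.inc';

    if (is_file($cacheFile)) {
        // The cached .inc file simply returns the precomputed result.
        return include $cacheFile;
    }

    $config = build_user_config($userId); // the expensive DB + calculation part

    // Write the result out as valid PHP so it can be include'd next time.
    file_put_contents(
        $cacheFile,
        '<?php return ' . var_export($config, true) . ';',
        LOCK_EX
    );

    return $config;
}
```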

But now, let's suppose I do that with 1,000,000 or even more files. Will I run into issues when including those files? (Always one specific file, never every file at once.) Is that generally a good idea, or am I just stressing the server even more with this?

And I'm planning to put everything in the same cache folder - would I see performance improvements if I split that folder up into multiple ones?

Thanks for the help.

Katai
  • I strongly doubt that the file read time, especially while making a million files, will be less than that of just reading the database. – Waleed Khan Aug 08 '12 at 14:34
  • The read time wouldn't be less, but I actually have to read the database and calculate some stuff according to the results. I'm basically saving the 'result' as a cached file (so that it's not necessary to retrieve, analyze and calculate it every time). – Katai Aug 08 '12 at 14:35
  • Why not store those "results" in an extra field in the database? But unless you're doing heavy and extensive calculations, there's no need for that. – Adi Aug 08 '12 at 14:36
  • This would really be a question for SU/SF, because it's more about how the underlying file system (NTFS, ext3/ext4 and XFS being the most common ones you'll come across in the real world) will handle it than anything else. My gut instinct tells me that 1000000 files in the root of one directory is a *bad* plan though. – DaveRandom Aug 08 '12 at 14:38
  • Good question - I thought that not storing such things in the DB would be better (+ one less DB call), seeing that the end result usually includes a bit of auto-generated code too, and I didn't really want to `eval()` it, since AFAIK `include` uses a caching mechanism, right? And I was looking at other solutions - why do they tend to use cache files? I'm just not sure this type of raw information belongs in a DB. – Katai Aug 08 '12 at 14:45

2 Answers


You will be limited by the file system: with that many files in a single folder, simply reaching one of them becomes a problem. You can do something like this:

  1. Hash the filename: file1.php becomes 3305d5836bea089f2a1758d8e89848c8
  2. Split the hash into several parts: 3/3/0/5/d/5836bea089f2a1758d8e89848c8
  3. Done (see the sketch below).
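A minimal sketch of that scheme, assuming md5 as the hash and a `cache/` base directory (both are only examples):

```php
<?php
// Turn a logical name into a nested cache path: cache/3/3/0/5/d/5836bea0....inc
function cache_path($name, $baseDir = 'cache')
{
    $hash = md5($name); // 32 hex characters

    // The first few characters become nested folders, keeping each one small.
    $dir = $baseDir . '/' . implode('/', str_split(substr($hash, 0, 5)));

    if (!is_dir($dir)) {
        mkdir($dir, 0777, true); // create the nested folders on first use
    }

    return $dir . '/' . substr($hash, 5) . '.inc';
}

// e.g. $file = cache_path('user_12345'); write the cached PHP to it, later include $file;
```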
Florent
  • I'll definitely use this approach if/when I face a million files to organize. Am I right in assuming it would mitigate I/O delays due to the nested folder structure? – Alex Aug 08 '12 at 14:40
  • Thank you, that would be my favorite solution. But I started to worry about going the wrong route with this... – Katai Aug 09 '12 at 12:07

Some filesystems simply won't allow that, and on many filesystems it will be incredibly slow - even looking up a single file will take so long that your program spends all its time seeking the disk while burning almost no CPU. You'll be much better off if you split those files so that you don't have more than something like several thousand of them per folder.
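A minimal sketch of such a split, assuming the cache files are keyed by a numeric user ID (the bucket size of 1000 is arbitrary):

```php
<?php
// Spread user cache files over folders of at most ~1000 entries each.
function user_cache_path($userId, $baseDir = 'cache')
{
    $bucket = (int) floor($userId / 1000); // users 0-999 -> 0, 1000-1999 -> 1, ...
    $dir    = $baseDir . '/' . $bucket;

    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }

    return $dir . '/user_' . $userId . '.inc';
}
```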

Also see this answer.

sharptooth