
I have some files with numbers as names that I want to put into folders in an optimized way, for example with roughly the same number of files per folder.

Would it be good practice to choose the folder from the modulus of the number? Is modulus as expensive as division? How many instructions does it take?

To be more precise, I would like to take the file number modulo the square root of the estimated number of files.

Number of files > 5'000'000
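A minimal Python sketch of the scheme described in the question, assuming files are identified by non-negative integers and the total is only an estimate of at least 1. `folder_count` and `folder_for` are hypothetical helper names, not part of any library.

```python
import math

def folder_count(estimated_total: int) -> int:
    """Number of folders: ceil(sqrt(estimated_total)), so that
    folders * files-per-folder covers the whole estimate.
    Assumes estimated_total >= 1."""
    root = math.isqrt(estimated_total)  # floor of the square root (Python 3.8+)
    return root if root * root == estimated_total else root + 1

def folder_for(file_number: int, estimated_total: int) -> int:
    """Folder index chosen by taking the file number modulo the folder count."""
    return file_number % folder_count(estimated_total)
```

With an estimate of 5'000'000, `folder_count` returns 2237, so each folder ends up with about 2235 files if the numbers are spread evenly. In a real loop you would compute `folder_count` once and reuse it rather than recomputing the square root for every file.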

    Why are you worried about optimizing math operations when you're dealing with the file system? That will be much slower. – Kevin Jan 06 '16 at 14:50
  • because I can't optimize the file system any more than it already is. I have to grab every microsecond I can; string cutting, for example, would be too expensive. – Ludovic Zenohate Lagouardette Jan 06 '16 at 15:03
    @ludovic: that's just silly. Modulus is exactly the same as division, so it might take 20 cycles; a few nanoseconds. Five million of those won't take a blink of an eye in total. Now create five million files... How many blinks did that take? The division operation is not even noise. – rici Jan 06 '16 at 15:16

1 Answer


You can use any hashing solution you like, subject to the usual constraints, the most important of which is that all hash values are equally likely. Modulus might be totally fine if the files are numbered sequentially.
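The "equally likely" condition is cheap to check before committing to a scheme. This hypothetical `bucket_sizes` helper counts how many numbers land in each folder under plain modulus: sequential numbers spread evenly, while numbers that share a common step with the folder count pile into a few folders.

```python
from collections import Counter
from typing import Iterable

def bucket_sizes(numbers: Iterable[int], folders: int) -> Counter:
    """Count how many file numbers land in each folder when the
    folder is chosen as number % folders."""
    return Counter(n % folders for n in numbers)

# Sequential numbers 1..10000 over 100 folders: every folder gets 100 files.
even = bucket_sizes(range(1, 10_001), 100)

# Numbers that are all multiples of 100: every one lands in folder 0.
skewed = bucket_sizes(range(0, 10_000, 100), 100)
```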

Even if you were to use a cryptographic hash (NOT recommended), the cost is trivial compared to what the filesystem needs to do to create a file. Modulus is fine.

But you might also want to think about human users. How will they (you) find a file? Dividing by ranges is much easier to manage. Then you can name each directory by the beginning of the range, and it is a simple task to find the correct directory.
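A sketch of range-based directories, assuming a fixed range size. `range_directory` and the 10,000 default are illustrative choices, not something prescribed above. Each directory is named after the first number in its range, so finding the right directory is just rounding the file number down.

```python
def range_directory(file_number: int, range_size: int = 10_000) -> str:
    """Return the directory name for file_number: the start of the
    range [start, start + range_size) that contains it."""
    if file_number < 0:
        raise ValueError("file numbers are assumed to be non-negative")
    start = (file_number // range_size) * range_size
    return str(start)
```

For example, file 1234567 goes into directory `1230000`. With 5'000'000 sequentially numbered files and ranges of 10,000, that gives 500 directories of 10,000 files each; unlike modulus, the spread is only even if the numbers are dense, as they are with sequential numbering.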

When you use numbers as filenames, you will at some point wish you had zero-padded them all to the same length so that alphabetical order and numerical order are the same. I strongly suggest you get this right from the start. The most common moment to notice the problem is when you need to bulk-retrieve backups.
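A minimal sketch of zero-padded names. The width of 8 digits is an assumption that leaves headroom above 5'000'000; pick it from the largest number you expect. `padded_name` is a hypothetical helper.

```python
def padded_name(file_number: int, width: int = 8) -> str:
    """Format file_number with leading zeros to a fixed width so that
    alphabetical and numerical order agree."""
    if not 0 <= file_number < 10 ** width:
        # A number that doesn't fit would produce a longer name and break the ordering.
        raise ValueError(f"{file_number} does not fit in {width} digits")
    return f"{file_number:0{width}d}"

# Unpadded, "10" sorts before "9"; padded, the order is numeric.
print(sorted(["9", "10", "100"]))                    # ['10', '100', '9']
print(sorted(padded_name(n) for n in (9, 10, 100)))  # ['00000009', '00000010', '00000100']
```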

rici