Using GlusterFS to store and access lots and lots of very small files is a challenge many deployments face, and it seems you're already on a good path to solving the problem: breaking the files up into separate directories.
You could implement a solution like that: just create a bunch of directories, choose a limit for how many files can go in each one, and hope you don't run out of places to put files. In your example you're creating 65k+ directories, so that's not likely to be a problem any time soon.
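As a minimal sketch of that approach, here's one way to bucket files by a hash of the filename, assuming a two-level layout of 256 x 256 directories (which lines up with the 65k+ figure); the /gluster/files root is just a placeholder for your mount point:

import hashlib
import os

ROOT = "/gluster/files"  # hypothetical mount point

def bucket_path(filename):
    """Map a filename to a two-level directory bucket using the first
    two bytes of its MD5 hash (256 * 256 = 65,536 possible buckets)."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return os.path.join(ROOT, digest[0:2], digest[2:4], filename)

print(bucket_path("cust_logo_xad.png"))
# e.g. /gluster/files/ab/cd/cust_logo_xad.png

Hashing keeps the buckets roughly evenly filled without you having to track how many files each directory already holds.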
Another option is to create directories based on the date a file is created. For example, if the file cust_logo_xad.png was created today it would be stored here:
/gluster/files/2015/08/24/cust_logo_xad.png
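A quick sketch of how that path could be built (again assuming /gluster/files as the root):

import datetime
import os

ROOT = "/gluster/files"  # hypothetical mount point

def dated_path(filename, created=None):
    """Place a file under year/month/day directories based on its creation date."""
    created = created or datetime.date.today()
    return os.path.join(ROOT, created.strftime("%Y/%m/%d"), filename)

print(dated_path("cust_logo_xad.png", datetime.date(2015, 8, 24)))
# /gluster/files/2015/08/24/cust_logo_xad.png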
If you're hosting data for different entities (customers, departments, etc.) you could separate files based on ownership, assigning each entity a unique ID of some sort. For example:
/gluster/files/ry/ry7eg4k/cust_logo_xad.png
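Following the layout in that example (a prefix directory taken from the first two characters of the entity ID, then the full ID), a sketch might look like this; the ID format itself is just an assumption:

import os

ROOT = "/gluster/files"  # hypothetical mount point

def owner_path(entity_id, filename):
    """Nest each entity's files under a short prefix directory so the
    top level doesn't grow one entry per entity."""
    return os.path.join(ROOT, entity_id[:2], entity_id, filename)

print(owner_path("ry7eg4k", "cust_logo_xad.png"))
# /gluster/files/ry/ry7eg4k/cust_logo_xad.png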
Beyond that it would be a good idea to take a look at the GlusterFS documentation for tuning the storage cluster for hosting small files. At the very least make sure that:
- The file systems on the GlusterFS storage servers have enough free inodes available (an mkfs option); see the sketch after this list for a quick way to check.
- The drives on the GlusterFS storage servers can handle lots of IOPS.
- You use a file system appropriate for the task (either ext4 or XFS).
- Your application / staff doesn't frequently scan directories that contain lots of small files.
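For the first bullet, a small sketch (assuming Python is available on the storage servers and /gluster/files is the brick or mount path you care about) that reports inode usage so you can see whether you're getting close to the limit:

import os

def inode_usage(path):
    """Report total and free inodes for the file system backing a path."""
    stats = os.statvfs(path)
    used_pct = 100.0 * (stats.f_files - stats.f_ffree) / stats.f_files
    return stats.f_files, stats.f_ffree, used_pct

total, free, used_pct = inode_usage("/gluster/files")  # hypothetical path
print("inodes: %d total, %d free (%.1f%% used)" % (total, free, used_pct))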
If you can (and if you haven't already), it's a good idea to create a database to act as an index for the files, rather than having to scan (e.g. ls) or search (e.g. find) for files all of the time.
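A minimal sketch of such an index using SQLite; the table layout and the entity/filename keys are assumptions for illustration, and in practice you'd probably use whatever database your application already has:

import sqlite3

# One row per stored file, so lookups hit the database instead of
# walking GlusterFS directories with ls/find.
conn = sqlite3.connect("file_index.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS files (
        entity_id TEXT,
        filename  TEXT,
        path      TEXT,
        created   TEXT,
        PRIMARY KEY (entity_id, filename)
    )
""")

def add_file(entity_id, filename, path, created):
    conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                 (entity_id, filename, path, created))
    conn.commit()

def lookup(entity_id, filename):
    row = conn.execute(
        "SELECT path FROM files WHERE entity_id = ? AND filename = ?",
        (entity_id, filename)).fetchone()
    return row[0] if row else None

add_file("ry7eg4k", "cust_logo_xad.png",
         "/gluster/files/ry/ry7eg4k/cust_logo_xad.png", "2015-08-24")
print(lookup("ry7eg4k", "cust_logo_xad.png"))

With an index like this the application never has to enumerate a directory just to find a file, which is exactly the operation that hurts most on GlusterFS with lots of small files.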