We are in the process of building a system which allows users to upload multiple images and videos to our servers.

The team I'm working with have decided to save all the assets belonging to a user in a folder named using the user's unique identifier. This folder in turn will be a sub-folder of our main assets folder on the file server.

The file structure they have proposed is as follows:

[asset_root]/userid1/assets1  
[asset_root]/userid1/assets2  

[asset_root]/userid2/assets1  
[asset_root]/userid2/assets2  

etc.

We expect to have thousands, or possibly more than a million, users over the lifetime of this system.

I always thought that it wasn't a good idea to have many sub-folders in a single location and suggested a year/month/day approach as follows:

[asset_root]/2010/11/04/userid1/assets1  
[asset_root]/2010/11/04/userid1/assets2  

[asset_root]/2010/11/04/userid2/assets1  
[asset_root]/2010/11/04/userid2/assets2  

etc.

Does anyone know which of the above approaches would be better suited for this many assets? Is there a better method to organize images/videos on a server?

The system in question will be a Windows server running IIS 7.5, with a SAN for storage.

Many thanks in advance.

purplemass

2 Answers

In general you are correct, in that many file systems impose a limit on the number of files and folders that can live in one folder. If you hit that limit with the number of users you have, you're in trouble.

I would simply use a UUID for each image, with some degree of partitioning. For example, an ID (or hash) of ABCDEFGH would end up as [asset_root]/ABC/DEFGH. Using a hash gives you a greater degree of assurance about the number of files that will end up in each folder, and it saves you from having to worry about, for example, not knowing which month an image you need was stored in.
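A minimal Python sketch of this kind of partitioning, for illustration only. The function names, the three-character prefix, and the use of `uuid4` are assumptions rather than anything prescribed above, and `PurePosixPath` is used only so the example prints the same separators everywhere.

```python
import uuid
from pathlib import PurePosixPath


def partitioned_path(asset_root: str, asset_id: str, prefix_len: int = 3) -> PurePosixPath:
    """Split an identifier into a short prefix folder plus the remaining name.

    prefix_len is a hypothetical tuning knob: a longer prefix means more,
    smaller folders.
    """
    if len(asset_id) <= prefix_len:
        raise ValueError("identifier is too short to partition")
    return PurePosixPath(asset_root) / asset_id[:prefix_len] / asset_id[prefix_len:]


def new_asset_path(asset_root: str) -> PurePosixPath:
    """Generate a random identifier (32 hex characters) and place it."""
    asset_id = uuid.uuid4().hex
    return partitioned_path(asset_root, asset_id)


# The example from the answer:
print(partitioned_path("[asset_root]", "ABCDEFGH"))  # [asset_root]/ABC/DEFGH
```

With hexadecimal identifiers, a three-character prefix yields at most 16^3 = 4,096 first-level folders, and random IDs should spread files across them fairly evenly.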

Zack Bloom
  • Thanks for the quick response. Using a uuid and partitioning seems like a great idea! One of the reasons the team want to go with their solution is that there'll be no need to store the file paths in the database if all user assets are in a single folder but that obviously has its limitations. – purplemass Nov 04 '10 at 02:10
  • If you don't store the paths in a database, I think you'll regret it. Eventually you'll need to find how much space is used by each user, get stats on usage, or any of a million things. You don't want to be traversing the file system to determine if the "Files" link should show up on a user's profile page. – Zack Bloom Nov 04 '10 at 21:52
  • We've gone with a solution based on your answer: each user ID is a 10-digit number, so the folder created for user 1234567890 is [asset_root]/1234/567/890/. Regarding your point on saving paths in the database: we store a Boolean value to show whether an asset has been saved or not. We thought this might be more efficient in the long run and should be sufficient for getting stats etc. – purplemass Nov 10 '10 at 10:50
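
The ID-splitting scheme described in the last comment could be sketched roughly as follows. This is an illustration, not the asker's actual code: the function name is hypothetical, and zero-padding IDs shorter than 10 digits is an assumption the comment does not cover.

```python
from pathlib import PurePosixPath


def user_folder(asset_root: str, user_id: int) -> PurePosixPath:
    """Map a numeric user ID onto a 4/3/3-digit folder hierarchy."""
    # Assumption: IDs with fewer than 10 digits are left-padded with zeros.
    digits = f"{user_id:010d}"
    if user_id < 0 or len(digits) != 10:
        raise ValueError("user_id must be a non-negative number of at most 10 digits")
    return PurePosixPath(asset_root) / digits[:4] / digits[4:7] / digits[7:]


print(user_folder("[asset_root]", 1234567890))  # [asset_root]/1234/567/890
```

Under this layout the root holds at most 10,000 folders, and each lower level at most 1,000, so no single directory grows with the total number of users.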
I'm presuming your file system is NTFS? If so, you've got a limit of 4,294,967,295 files per volume, and the limit on files in a single folder is the same. With users on the order of millions you should be fine, though you might want to consider having only one folder per user instead of several, as your example indicates.

John Christensen
  • Do you think the file server will perform any differently (slower) if there are thousands of sub-folders in one folder? I remember opening a folder with hundreds of files in Windows 2000 and having to wait for several minutes before the files/folders displayed. – purplemass Nov 04 '10 at 02:56
  • I've never tested it. That said - the server may perform somewhat more slowly, but probably not enough to justify complicating your folder structure, especially when compared to the inefficiencies of having dozens of threads all trying to read/write these assets to the hard drive at the same time. :) – John Christensen Nov 04 '10 at 13:27