
I'd like to create a service that serves several hundred thousand small files (from 5 KB to 500 KB, mostly around 10-100 KB). Think of it as a kind of gravatar.com, which serves those little avatar pictures at URLs like https://secure.gravatar.com/avatar/1545f91437e2576b910dbd1023a44756

I want to use descriptive URLs without any IDs or hashes, for example http://www.server.com/This-is-my-file.ext. There are no duplicate file names.

What would be the most efficient way to organize and serve the files without much overhead?

Just putting everything in one directory and letting nginx serve the files will slow down once the directory holds a certain number of files, depending on the file system.

One idea is to save the files in a simple directory structure based on the first characters of the filename, so the example would be served from T/h/This-is-my-file.ext with a simple rewrite rule in the nginx config. This would result in a very uneven distribution across the directories. Using values from, let's say, an MD5 hash of the filename would give a good distribution but require more computing power...

I guess this sounds like a perfect use case for a key-value store, but isn't it possible to keep it simple with just the file system and nginx?

Joseph Quinsey
user168080
  • "Just putting everything in one directory and let nginx serve the files will slow down after a certain amount of files depending on the file system." Just curious--why is this? I don't think I can answer your question, but I'd like to know why this solution is inefficient? Because the filesystem looks through the file names sequentially? – AlexMA Nov 13 '13 at 21:53
  • Look into consistent ring buffers. – phoebus Nov 13 '13 at 22:27
  • There are a ton of load balancing algorithms out there exactly for this purpose. Your problem sounds exactly like some CS homework I had in the past. :) – jlehtinen Nov 13 '13 at 22:30
  • Not sure why I said buffers, I meant hashes. – phoebus Nov 13 '13 at 23:23

1 Answer


Hash the filenames.

See the documentation for the set_md5 directive, which is provided by the third-party set-misc-nginx-module.

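# You can do something like this (I didn't test it).
# A regex location needs the "~" modifier, and the pattern must be quoted because it contains braces.
location ~ "^/hashed/([0-9a-f]{2})([0-9a-f]*)/(.*)$" {
  # try_files requires a fallback as its last argument.
  try_files /$1/$2/$3 =404;
}
location / {
  # Hash the request path without the query string; files on disk must be stored using the same hash input.
  set_md5 $digest $uri;
  # "last" restarts location matching so the regex location above handles the rewritten URI.
  rewrite ^/(.*)$ /hashed/$digest/$1 last;
}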
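The files also have to be placed on disk under the same layout the config expects. Here is a minimal sketch of that mapping in Python, assuming nginx hashes the request path including its leading slash and that file names are plain ASCII with no percent-encoding involved. The helper name hashed_path is made up for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(filename: str) -> PurePosixPath:
    """Return the relative storage path <2 hex chars>/<30 hex chars>/<filename>.

    Assumption: the hash input is "/" + filename, i.e. the request path
    that nginx sees for a file served from the site root.
    """
    digest = hashlib.md5(("/" + filename).encode("utf-8")).hexdigest()
    # The first two hex characters pick one of 256 top-level directories;
    # the remaining 30 characters form the second directory level.
    return PurePosixPath(digest[:2]) / digest[2:] / filename


if __name__ == "__main__":
    print(hashed_path("This-is-my-file.ext"))
```

When importing files, join the returned path onto the nginx root and create the parent directories before copying. If the hash input in the nginx config changes, this function has to change with it.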
moebius_eye