5

I am building a PHP application that lets users upload photos. To make it manageable, a folder shall have maximum of 5000 photos. Each uploaded photo will be assigned an ID in the database, and the photo will be renamed to the ID.

How do I check if a certain folder has 5000 photos? Should I check the ID of the last row in the photo table?

Folders shall be named incrementally. eg. folder_1, folder_2, etc.

My proposed steps:

  1. Retrieve last insert ID of photo table
  2. (Last_insert_ID + 1) % 5000 = Some_number (The folder number it should be saved in)
  3. Save photo to folder_some_number. If folder not exist, create new one.

Or is there a better way to do this?

PeeHaa
  • 71,436
  • 58
  • 190
  • 262
WebNovice
  • 2,230
  • 3
  • 24
  • 40
  • `(Last_insert_ID + 1) % 5000 == 0` – PeeHaa Apr 16 '12 at 14:47
  • I wouldn't use modulus if I were you. If an insert fails for any reason, then you'll skip that generated number (on MySQL at least) and your count will be incorrect. I'd store it as an explicit column in a table. – halfer Apr 16 '12 at 14:48
  • @RepWhoringPeeHaa, Haven't thought about it. Well if the (last_insert_id + 1) % 5000 = 0, will make it default to 'folder' folder. The next folder will be 'folder1' – WebNovice Apr 16 '12 at 14:49
  • @halfer, I don't really need the photo count to be precise. Few skipped or deleted IDs should be acceptable. By an explicit column, you mean a value for number of Total Photos? – WebNovice Apr 16 '12 at 14:52
  • Yes, explicit column = count per folder, stored in a folders table, as per @Mike's answer. – halfer Apr 16 '12 at 15:09

6 Answers6

5

I'm guessing database reads are going to be faster than file system reads like this.

You'd be better off running an sql query and getting a count per folder, grouped.

// Example Query
SELECT COUNT(file), folderId As Count FROM photos WHERE folderId IN ('1', '2') GROUP BY folder
// It would be beneficial to have a flag on the folders that would enable or disable them
// that way you're not iterating through folders that we already know are > 5k
// You would run this and have seperate query that would pull in these folder names
// and passing them to the above query.
SELECT foldername, folderId FROM folders WHERE countFlag = 0;


//Example Conditional.
if($Count > 5000):
  // Over 5k Do Something
  //Since were over 5k, set this folders flag to 1
  // that way we arent iterating through it again
  $countFlag = 1;
else:
  // Under 5k Do Something else
endif;

Note: If you need actual code samples, i can whip something up real quick. The above examples leave out actual working code and are just for theory purposes. You will need to iterate through the returned rows as they are grouped by folder.

NDBoost
  • 10,184
  • 6
  • 53
  • 73
  • +1 for that. The easies would be to save current ID and current folder-name in the database. That way, you know where you are, without having to do heavy quires either. – OptimusCrime Apr 16 '12 at 14:50
2
$last_id; // for example 9591
$current_folder = floor($last_id/5000);
$picture_num_in_folder = $last_id-($current_folder*5000);
if ($picture_num_in_folder == 5000)
    // new dir and such (new folderid = $current_folder+1 and $picture_num_in_folder = 1)
else
    // just place the picture with unique ID $last_id+1 called image_$picture_num_in_folder+1 in folder $current_folder
OptimusCrime
  • 14,662
  • 13
  • 58
  • 96
1

Don't use autoincrement ids for calculations. When you delete files, you'll get holes in your ID sequences which will throw off your math. Save the filepath to your db table. Then do a simple COUNT:

SELECT filepath, COUNT(filename) AS num FROM mytable
GROUP BY filepath;

You can save new files to filepaths where num < 5000.

dnagirl
  • 20,196
  • 13
  • 80
  • 123
  • Although unlikely in the OPs scenario you'd be doing `count()` not `sum()` its still not a bad idea to not base on an auto incremented ID. I would still use `auto_increment` but just dont base your counts off of the ID – NDBoost Apr 16 '12 at 15:00
  • Unless the OP's got a high percentage of failed/rolledback DB operations, it's unlikely to throw off the count by too much. The intent is to not exceed 5000, not be EXACTLY 5000 every time. – Marc B Apr 16 '12 at 15:35
1

Using the insert ID might not be very accurate, as there a number of error conditions that can cause an ID to be "skipped". You could store the folder number in a separate column, called "folder_number" or similar. That way you can get the highest folder number, and then do a count on the number of records in that folder, if it's less than 5000, add it in the same folder, otherwise run your logic to increment the folder count (creating the physical folder at the same time).

That should be faster than using the file system to check, which could be quite slow for the volume of files you're talking about.

Tim
  • 4,217
  • 1
  • 15
  • 21
1

Let me suggest another approach: Use a hash on the picture's ID and use the first few characters as a path for example, lets assume the image ID is 1 for which the hash is c4ca4238a0b923820dcc509a6f75849b. Create the directory structure /rootpath/images/c/4/c/ and in it save the image. You can use mkdir() to create nested directories like mkdir( '/dir/c/4/c/', 0777, true );.

This way you automatically distribute the load across many folders and the ID itself is the path.

If my memory serves, WordPress uses something similar...

Yaniro
  • 1,595
  • 9
  • 14
  • If I am not mistaken, this way, I would end up with so many folders. Eg. 5000 images might end up into 100 folders? Where as, as per my requirement I would need only 1 folder. – WebNovice Apr 16 '12 at 15:02
  • +1 for this as a scalable solution. It ensures that folders don't get too full, and when the situation demands it, the file storage can be split between physical machines by mapping different paths. @Damchey - you shouldn't worry about having many folders - on the server, you won't be manually exploring them via ssh/gui. – halfer Apr 16 '12 at 15:11
  • @Damchey, you will end up with several directories but this structure will balance the number of files in them in a way which won't require you to manage the number of files per directory. In general, the more files you have in a single directory, the more time it will require the file system to retrieve a file from that directory. – Yaniro Apr 16 '12 at 18:51
-1

if you want to organize your folders and files this way, you dont need to check if the folder has 5000 files, just assign the files to the corresponding folder.

If there are less than 5000 files in a x folder that should be full (because of the last ID is bigger than 5000*x) that means that some images have been removed. You can't reasign that ID to a row in your database so you can't refill the space of the removed files.

Andreu Ramos
  • 2,858
  • 2
  • 25
  • 36