I am developing a site framework in php (codeigniter) and want to introduce image versioning on image uploads so that I can take advantage of image caching. The easiest approach would just be to md5 the image and use that as the file name but I don't like this approach for the following reasons:

1) The image names are not SEO-friendly.

2) MD5 hashes seem unnecessarily long, and therefore require a larger database field.
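For comparison, this is roughly the MD5-based naming that the two objections above apply to. It is a minimal sketch in Python, used only for illustration (in PHP, md5_file() returns the same 32-character hex digest straight from a path); content_hash_filename is a hypothetical name.

```python
import hashlib
from pathlib import PurePosixPath


def content_hash_filename(data: bytes, original_name: str) -> str:
    """Name an upload after the MD5 of its bytes, keeping the original extension.

    Identical bytes always give the same name, and any edit to the image
    gives a new name, so the URL changes exactly when the content does.
    """
    digest = hashlib.md5(data).hexdigest()             # always 32 hex characters
    ext = PurePosixPath(original_name).suffix.lower()  # ".jpg", or "" if none
    return digest + ext


# Usage: read the uploaded file's bytes and name it after its content.
# content_hash_filename(open("Red Car.JPG", "rb").read(), "Red Car.JPG")
# returns "<32 hex characters>.jpg"
```

This makes both points visible: the scheme gives cache busting for free, but every name is 32 characters of opaque hex plus the extension.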

So I am considering using an approach such as the following:

Start the filename with the entered name of the image, with underscores instead of spaces, then append a randomly generated integer, say 8 digits long. This means I have to check whether an image with that name already exists and regenerate the integer if it does (however unlikely that is).
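The naming scheme just described could look roughly like this. It is a Python sketch for illustration only; slugify, versioned_name and the exists callback are hypothetical names, and in a CodeIgniter app the existence check would typically be file_exists() or a database lookup.

```python
import random
import re
from typing import Callable


def slugify(title: str) -> str:
    """'Red Sports Car!' -> 'red_sports_car' (runs of non-alphanumerics become '_')."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_") or "image"


def versioned_name(title: str, ext: str, exists: Callable[[str], bool], digits: int = 8) -> str:
    """Return '<slug>_<random integer>.<ext>' that `exists` reports as unused.

    `exists` is whatever check the application uses (file on disk, DB row, ...).
    """
    slug = slugify(title)
    low, high = 10 ** (digits - 1), 10 ** digits - 1  # e.g. 10000000..99999999
    while True:
        candidate = f"{slug}_{random.randint(low, high)}.{ext.lower()}"
        if not exists(candidate):  # regenerate on the (rare) collision
            return candidate


# Usage (hypothetical upload directory):
# from pathlib import Path
# upload_dir = Path("uploads")
# versioned_name("Red Sports Car", "jpg", lambda n: (upload_dir / n).exists())
```

Check-then-write is not atomic: two simultaneous uploads could pick the same free name. Creating the file with an exclusive-create mode (Python's open(path, "x"), PHP's fopen() with mode 'x') and retrying on failure closes that gap.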

Presumably I will also need a unique filename for every image size, so I guess the solution here would be to add a prefix representing the image size.
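If every size is derived from one base name that has already passed the uniqueness check, only the base needs checking; the per-size names are then unique automatically because the prefixes differ. A small Python sketch of that idea; the size labels, dimensions and sized_names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical size labels and dimensions; substitute the sizes your framework generates.
SIZES: Dict[str, Tuple[int, int]] = {"sm": (150, 150), "md": (600, 400), "lg": (1200, 800)}


def sized_names(base_name: str) -> Dict[str, str]:
    """Derive one filename per size by prefixing the size label to a unique base name."""
    return {label: f"{label}_{base_name}" for label in SIZES}


# sized_names("red_sports_car_48213907.jpg")
# returns {'sm': 'sm_red_sports_car_48213907.jpg',
#          'md': 'md_red_sports_car_48213907.jpg',
#          'lg': 'lg_red_sports_car_48213907.jpg'}
```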

Now, I want to get this right the first time, since it will be a pain to change once the framework is deployed, so I am really just looking for input on:

a) Whether my concerns are justified (in particular, does the filename do anything for SEO, and does the length of a random string of numbers affect it?)

b) Whether there is anything else I should be concerned about or check for with my proposed approach.

c) Whether there is an easier approach, perhaps a hashing algorithm that produces much shorter results.

d) Whether there is already a CodeIgniter library out there that does this.
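On point (c), a special algorithm may not be needed: a common trick is to truncate a standard digest and encode it compactly. A Python sketch under that assumption; short_hash is a hypothetical helper (in PHP, taking the first 10 characters of sha1_file() output would give a hex equivalent carrying the same 40 bits).

```python
import base64
import hashlib


def short_hash(data: bytes, n_bytes: int = 5) -> str:
    """Short, filename-safe fingerprint of `data`.

    Keeps the first `n_bytes` of a SHA-1 digest and base32-encodes them:
    5 bytes = 40 bits = exactly 8 base32 characters, with no '=' padding.
    Base32 has no upper/lower-case distinction, so lower-casing loses nothing
    and keeps names distinct on case-insensitive filesystems.
    """
    digest = hashlib.sha1(data).digest()[:n_bytes]
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


# Combined with a readable prefix, e.g.:
# f"red_sports_car_{short_hash(image_bytes)}.jpg"
```

Truncation trades collision resistance for length: with 40 bits, a collision somewhere becomes likely at around a million images, so an existence check is still worth keeping. Because the suffix is derived from the content, it also changes whenever the image changes, which is what cache busting needs.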

Thank you for your input and advice

SwiftD
  • Make sure you use the ALT attribute in your IMG tags; that's going to be friendlier to Google et al. than human-readable image file names. By the time you add your random 8-digit number to the original file name, you'll be close to the same length as an md5_file() hash anyway, and in today's world a dozen bytes isn't going to make a noticeable difference. – Benny Hill Dec 17 '12 at 14:17
  • @BennyHill So you're saying that as long as the alt attribute is filled in (which I am filling with a short description of the image), the filename has no effect on SEO? I agree about the noticeable-difference comment, but I do wonder how much all these things add up to when you've said the same about a hundred things throughout development. – SwiftD Dec 17 '12 at 14:25
  • I stand corrected: Google says they do pay attention to the filename (I based my previous comment on personal experience; the filename has never seemed to matter for me): http://www.google.com/webmasters/docs/search-engine-optimization-starter-guide.pdf – Benny Hill Dec 17 '12 at 14:43
  • @BennyHill Only for image search. If you don't care about your results in image search (i.e. you're most people), don't worry about filenames. – ceejayoz Dec 17 '12 at 15:07
  • @ceejayoz Why wouldn't you care about image search, though? For the small cost of doing things one way rather than the other, you are creating another doorway into your site. Granted, those visitors are probably only there to steal an image, but a hit is a hit, and they may even stay to have a look around. – SwiftD Dec 17 '12 at 15:25
  • @WebweaverD It's a cost-benefit tradeoff. Image search doesn't get much traffic, and as you noted, most of the people are probably just looking for an image to rip off. In most sites' cases it's not worth more development time to account for - **especially** if you're already using good ALT tags. – ceejayoz Dec 17 '12 at 15:34
  • @ceejayoz Thanks for all your input. I think I would probably go with a hashed approach for user uploads or for a site hosting thousands of images, but for my admin upload the cost has been an extra ten lines of code to implement an SEO-friendly approach and go for the image search hits. However, I'm not even sure how much this helps with Google image search, since a lot of the results I'm getting have generic names such as image45828.png. So even there the filename doesn't seem to be doing many favours, and alt tags seem to reign supreme (possibly along with the content surrounding the images). – SwiftD Dec 17 '12 at 16:29


This answers a few of your questions:

Replacing spaces with underscores is not enough to get a clean filename, since you'd also need to check for other unusual characters, but you can use the sanitize_filename() method in CI's Security library: http://ellislab.com/codeigniter/user-guide/libraries/security.html
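To illustrate what "other unusual characters" can mean in practice, here is a whitelist-style sanitizer. It is a hypothetical Python sketch, not a port of CodeIgniter's sanitize_filename(); clean_filename and its max_len parameter are illustrative names.

```python
import re
import unicodedata


def clean_filename(original: str, max_len: int = 60) -> str:
    """Turn an arbitrary uploaded name into a short, safe, readable filename."""
    # Keep only the last path component, whichever separator the client used.
    name = original.replace("\\", "/").rsplit("/", 1)[-1]
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = name, ""
    # Split accented letters into base letter + accent, then drop non-ASCII (é -> e).
    stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into one underscore; cap the length.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", stem).lower()[:max_len].strip("_") or "image"
    ext = re.sub(r"[^A-Za-z0-9]", "", ext).lower()
    return f"{stem}.{ext}" if ext else stem


# clean_filename("Café Déjà Vu (1).JPG")        returns "cafe_deja_vu_1.jpg"
# clean_filename("C:\\Users\\me\\My Photo.png")  returns "my_photo.png"
```

A whitelist (keep only letters and digits) is harder to get wrong than trying to enumerate every bad character, at the cost of flattening non-Latin names to "image".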

If you do want to preserve the original filename, your approach sounds good to me. However, the 8-digit integer at the end of the filename could be replaced by '-1', '-2', '-3' and so on, using a simple incremental loop that checks whether a file with that ending already exists.
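The incremental loop suggested here might look like the following Python sketch; next_free_name and the exists callback are hypothetical names.

```python
from typing import Callable


def next_free_name(filename: str, exists: Callable[[str], bool]) -> str:
    """Return `filename`, or the first of 'name-1.ext', 'name-2.ext', ... that is unused."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension: number the whole name
        stem, ext = filename, ""
    suffix = f".{ext}" if ext else ""
    candidate, counter = filename, 0
    while exists(candidate):
        counter += 1
        candidate = f"{stem}-{counter}{suffix}"
    return candidate


# Usage (hypothetical upload directory):
# from pathlib import Path
# upload_dir = Path("uploads")
# next_free_name("red_sports_car.jpg", lambda n: (upload_dir / n).exists())
```

One caveat for caching: if "red_sports_car-1.jpg" is deleted and a different image later reuses that name, browsers holding a long-lived cached copy may keep showing the old image. Random integers make such reuse far less likely.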

The File Uploading library is also worth checking out: http://ellislab.com/codeigniter/user-guide/libraries/file_uploading.html. It is flexible and can be configured to keep the original filenames. Using sanitize_filename() from the Security library alongside it should do exactly what you need.

In all my CI applications I always use encrypted filenames (an optional feature of the CI File Uploading class). At the same time, I can configure the library not to overwrite an existing file, either by appending a number to the name (if encryption is off) or by simply generating another encrypted name (if encryption is on). I like it this way because it keeps the filenames consistent and clean (although long and not SEO-friendly; the ALT attribute still gives the images exposure to search engines).
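The "encrypted name, never overwrite" behaviour described above amounts to a random opaque name that is regenerated if it is already taken. The Python sketch below illustrates only that idea; it is not CodeIgniter's code (with CI you simply enable the encrypt_name option), and random_name is a hypothetical helper. In PHP 7+, bin2hex(random_bytes(16)) yields a similar 32-character hex string.

```python
import secrets
from typing import Callable


def random_name(ext: str, exists: Callable[[str], bool], n_bytes: int = 16) -> str:
    """Opaque random filename of 2 * n_bytes lowercase hex characters, retried if taken."""
    ext = ext.lstrip(".").lower()
    while True:
        candidate = f"{secrets.token_hex(n_bytes)}.{ext}"
        if not exists(candidate):
            return candidate
```

Unlike a content hash, a random name says nothing about the file, so uploading the same image twice yields two names; for cache busting that is harmless, since every upload still gets a fresh URL.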

Aidas
  • Thank you for your suggestions. I will check out sanitize_filename(); does it add underscores as well as remove special characters? I have already built my upload script around the image upload class, but I like the simpler versioning you suggest. I am also curious why you encrypt your filenames. What is the advantage, in your mind? – SwiftD Dec 17 '12 at 15:05
  • To be honest, when I chose this approach I found encryption to be the easiest and quickest way to make sure filenames on the server are clean and consistent. My clients would always upload files with loads of invalid characters in their names, and the lengths would vary from very short to very long! Besides that, image search performance has never been something I needed to pay much attention to. However, I understand this could be an issue where a project requires good image search performance. – Aidas Dec 17 '12 at 15:23