Image Classification into Good or Bad Images

Question

We want to tell whether an image is good or bad.

We apply a fixed set of checks to classify an image as good or bad.

Example:

1. Background color.

2. Height-to-width ratio.

3. No watermarks.
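
The first two checks could be sketched roughly as follows. This is only an illustration in plain Python, assuming the image has already been decoded into a row-major grid of (r, g, b) tuples; the function names, the target ratio, the expected white background, and all thresholds are hypothetical values chosen for the example, not rules from the question.

```python
def aspect_ratio_ok(width, height, target=1.0, tolerance=0.1):
    """Return True if width / height is within `tolerance` of `target`."""
    if width <= 0 or height <= 0:
        return False
    return abs(width / height - target) <= tolerance


def border_pixels(pixels):
    """Collect the pixels on the outer edge of a row-major pixel grid."""
    if not pixels or not pixels[0]:
        return []
    if len(pixels) == 1:
        return list(pixels[0])
    # Top and bottom rows in full, then the first and last pixel of each middle row.
    border = list(pixels[0]) + list(pixels[-1])
    for row in pixels[1:-1]:
        border.append(row[0])
        if len(row) > 1:
            border.append(row[-1])
    return border


def background_ok(pixels, expected=(255, 255, 255),
                  max_channel_diff=10, min_fraction=0.9):
    """Return True if most border pixels are close to the expected color.

    Assumption: the background is visible along the image border.
    """
    border = border_pixels(pixels)
    if not border:
        return False
    close = sum(
        1 for p in border
        if all(abs(c - e) <= max_channel_diff for c, e in zip(p, expected))
    )
    return close / len(border) >= min_fraction
```

Sampling only the border keeps the check cheap, but it will misjudge images whose subject touches the edges, so the thresholds would need tuning on real data.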

In general, we want only GOOD images. We fetch the images from a website and run these checks to validate that website's images.

At present, we crawl the website and try to pick out the relevant images (say, product images on an e-commerce site) by excluding images that are common to all pages. An alternative is to search Google with the parameter "site:website name", which reduces the effort of identifying images.

I haven't tried the color histogram approach.

What would be a better approach for this problem? Any research papers or open source libraries (like Mahout) that would be easy to implement would also be useful.

I am not sure what your question is. "I have heard that color histogram for quantized RGB values will be helpful. Still, what would be the better approach for this problem?" What exactly is your problem? I mean, a color histogram won't help you much with the width/height ratio, for example. Would a histogram be another way to find what you call a good image? — jlengrand, Oct 11 '12 at 15:28
Finding a solid watermark (text in a corner, in one color and without transparency) might be possible, but to distinguish a transparent watermark from the rest of the image seems pretty much impossible, unless you know exactly how the watermark looks. I mean how do you even know it's a watermark and not, for example, a text on a shop window? — Philipp, Oct 11 '12 at 15:29
@Philipp there're certain types of watermarks, some of them can be detected by means of statistical methods, but it is no trivial task, you're right — Qnan, Oct 11 '12 at 15:32
*"No water marks etc."* Does 'etc.' mean 'or other things that would get us sued for scraping and reusing obviously copyrighted content'? — Andrew Thompson, Oct 11 '12 at 15:40
@AndrewThompson I second that. I couldn't word it any better; +1 — CosmicGiant, Oct 11 '12 at 15:44
My mistake! I changed the 'etc.' part and also reflected my intent. Basically, I want to know if any open source library is available or/and any good papers for such types of problems would be helpful. I am sure people would have solved such types of problems and I don't want to reinvent the whole thing again. — instanceOfObject, Oct 11 '12 at 17:39
I still don't understand what you want. Are you searching for a library able to detect watermarks, or a library able to perform histogram operations ? — jlengrand, Oct 11 '12 at 17:42
@Philipp & Qnan Yup, I agree! This is not a trivial task to identify them in all the places. — instanceOfObject, Oct 11 '12 at 17:42
@jlengrand For both the things! In fact, I want to use open source libraries to perform as many tasks as possible. — instanceOfObject, Oct 11 '12 at 17:44

score 1 · Accepted Answer · edited May 23 '17 at 11:48

The most advanced library in terms of image processing is (in the opinion of a lot of people, myself included) OpenCV.

It was originally developed by Intel and is now fully open source.

Bindings exist for a wide range of languages and platforms, including C, Python, Java, and Android.

It can definitely be used in a professional context, and a lot of companies use it.

It has several histogram capabilities out of the box and the whole library is usually heavily optimized.
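
To make the histogram idea concrete, here is a minimal, library-free sketch of a color histogram over quantized RGB values, the approach mentioned in the comments. It assumes the image is already a row-major grid of (r, g, b) tuples with channel values in 0..255; the function names and the choice of 4 bins per channel are hypothetical. OpenCV's `calcHist` computes the same kind of histogram far more efficiently on real image arrays.

```python
from collections import Counter


def quantized_histogram(pixels, bins_per_channel=4):
    """Count pixels per quantized (r, g, b) bin.

    Each 0..255 channel value is mapped to a bin index in
    0..bins_per_channel-1.
    """
    counts = Counter()
    for row in pixels:
        for r, g, b in row:
            key = (
                r * bins_per_channel // 256,
                g * bins_per_channel // 256,
                b * bins_per_channel // 256,
            )
            counts[key] += 1
    return counts


def dominant_bin_fraction(pixels, bins_per_channel=4):
    """Return the share of pixels that fall into the most populated bin."""
    hist = quantized_histogram(pixels, bins_per_channel)
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return max(hist.values()) / total
```

A high dominant-bin fraction suggests a large uniform region, such as a plain background. It says nothing about aspect ratio or watermarks, so it would complement the other checks, not replace them.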

You can also find a lot of libraries built on top of it, for tasks such as face recognition or pattern matching.

If you want to calculate mathematical parameters of images, OpenCV is definitely a good way to go :)

Here is a link for the Java bindings.
