Better model for classifying image quality (separate sharp & well lit images from blurry/out of focus/grainy images)

Question

I have a dataset of around 20K human-labelled images. Label = 1 if the image is sharp and well lit, and Label = 0 if it is blurry, out of focus, or grainy.

The images are of documents such as Identity cards.

I want to build a Computer Vision model that can do the classification task.

I tried transfer learning with VGG-16 for this task, but it did not give good results (precision = 0.65, recall = 0.73). My sense is that VGG-16 is not suitable here: it is trained on ImageNet, so its low-level features are very different from what this task needs. Interestingly, the model is under-fitting.

We also tried EfficientNet 7. Although the model performed decently on the training and validation sets, test performance remained poor.

Can someone suggest a more suitable model to try for this task?

Since you have already tried two decent models for transfer learning, I think it would be useful if you shared some more details about your training attempts. What is your training/test split proportion? Which layers of VGG-16 did you freeze? Did you apply differential learning rates or use a learning rate scheduler? What do your images look like? Lastly, feel free to share your training code so others can see the hyperparameters you used. I hope you find the solution. - Bedir Yilmaz, Aug 07 '20 at 07:32

score 1 · Answer 1 · answered Aug 13 '20 at 05:43

I think your problem with VGG and other neural networks is the resizing of images:
VGG expects a 224x224 image as input. I assume your dataset has a much higher resolution, so you significantly downscale the input images before feeding them to your network.

What happens to blur/noise when you downscale an image?
Blurry and noisy images become sharper and cleaner as you decrease the resolution. Therefore, for many of your training examples, the net sees a perfectly good image while its label says "corrupt". This is not good for training.
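
The effect is easy to reproduce on synthetic data. The sketch below is only an illustration, not the asker's pipeline: it assumes NumPy, a flat grey image, Gaussian grain with a made-up sigma of 20, and a hypothetical helper `block_downscale` that shrinks an image by block averaging (real resizing libraries use other filters, but the trend is similar).

```python
import numpy as np

def block_downscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Downscale a 2-D image by averaging non-overlapping factor x factor blocks."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor  # drop rows/cols that do not fill a block
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

rng = np.random.default_rng(0)
clean = np.full((896, 896), 128.0)                   # flat grey stand-in for a document
noisy = clean + rng.normal(0.0, 20.0, clean.shape)   # synthetic grain (assumed sigma = 20)

small = block_downscale(noisy, 4)                    # 896 -> 224 pixels per side

noise_before = (noisy - 128.0).std()                 # grain level at full resolution
noise_after = (small - 128.0).std()                  # grain level after downscaling
print(noise_before, noise_after)
```

Each output pixel averages 16 independent noise samples, so the grain that justified Label = 0 is attenuated by roughly a factor of 4 before the network ever sees it.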

An interesting experiment would be to see which types of degradation your net classifies correctly and on which types it fails. You report 65% precision @ 73% recall. Can you look at the classified images at that operating point and group them by degradation type?
That is, what is the precision/recall for blurry images only? What is it for noisy images? What about grainy images?

What can you do?

Do not resize images at all! If the network needs fixed-size input, then crop rather than resize.
Taking advantage of the "resizing" effect, you can approach the problem using a "discriminator". Train a network that discriminates between an image and its downscaled version. If the image is sharp and clean, this discriminator will find it difficult to succeed. However, for blurred/noisy images the task should be rather easy.
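
As a rough sketch of the first suggestion, cropping can be done directly on the pixel array so that blur and grain are left at their native scale. The helper names `center_crop` and `random_crop`, the 224 crop size, and the 1000x1200 test image are assumptions for illustration; NumPy arrays in height x width x channel layout are assumed.

```python
import numpy as np

def center_crop(img: np.ndarray, size: int) -> np.ndarray:
    """Return the central size x size window of an H x W (x C) image, without resampling."""
    h, w = img.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the requested crop")
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def random_crop(img: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """Return a randomly placed size x size window, e.g. for training-time augmentation."""
    h, w = img.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the requested crop")
    top = int(rng.integers(0, h - size + 1))    # upper bound is exclusive
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]

# Usage on a dummy 1000 x 1200 RGB image
image = np.arange(1000 * 1200 * 3).reshape(1000, 1200, 3)
patch = center_crop(image, 224)
aug = random_crop(image, 224, np.random.default_rng(0))
```

One possible design, not stated in the answer, is to score several crops per image and aggregate the predictions, since a single 224x224 window may land on an empty part of the document.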

score 0 · Answer 2 · answered Aug 12 '20 at 18:03

For this task, I think OpenCV is sufficient to solve the issue. Comparing the variance of the Laplacian of the image (cv2.Laplacian(image, cv2.CV_64F).var()) with a threshold gives a decision on whether an image is blurred or not.

You can find an explanation of the method and the code in the following tutorial : detection with opencv

I think that training a classifier that takes the output of one of your neural network models and the variance of the Laplacian as features will improve the classification results.

I also recommend experimenting with ResNet and DenseNet.
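
For readers without OpenCV at hand, the same idea can be sketched in plain NumPy. This is an approximation of the cv2 call above, not a replacement for it: it applies the 4-neighbour Laplacian stencil to interior pixels only, so border handling differs. The function names and any threshold value are hypothetical, and the threshold would have to be tuned on labelled images.

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels of a 2-D image."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1]       # pixels above and below
           + g[1:-1, :-2] + g[1:-1, 2:]     # pixels left and right
           - 4.0 * g[1:-1, 1:-1])           # centre pixel
    return float(lap.var())

def is_blurry(gray: np.ndarray, threshold: float) -> bool:
    """Flag an image as blurry when its Laplacian variance falls below the threshold."""
    return laplacian_variance(gray) < threshold

# Synthetic check: a random texture versus the same texture after a 2x2 box blur
rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, (64, 64)).astype(np.float64)
blurred = (sharp[:-1, :-1] + sharp[1:, :-1] + sharp[:-1, 1:] + sharp[1:, 1:]) / 4.0

sharp_score = laplacian_variance(sharp)
blurred_score = laplacian_variance(blurred)
```

The blurred texture scores far lower than the sharp one, which is the gap the threshold is meant to exploit.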
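
A minimal sketch of that stacking idea follows, assuming NumPy only. Everything numeric here is synthetic and invented for illustration: `cnn_score` stands in for a network's predicted probability and `log_lap_var` for the log of the Laplacian variance. The logistic regression is fitted by plain gradient descent through a hypothetical helper `fit_logistic`; in practice any off-the-shelf classifier could take its place.

```python
import numpy as np

def add_bias(X: np.ndarray) -> np.ndarray:
    """Append a constant column so the last weight acts as the intercept."""
    return np.column_stack([X, np.ones(len(X))])

def fit_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.5, steps: int = 500) -> np.ndarray:
    """Fit logistic regression by batch gradient descent; returns weights with bias last."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))     # predicted probability of label 1
        w -= lr * Xb.T @ (p - y) / len(y)     # gradient of the mean log-loss
    return w

def predict_proba(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-add_bias(X) @ w))

# Synthetic stand-ins for the two features (NOT real measurements)
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n).astype(np.float64)
cnn_score = np.clip(0.5 + 0.15 * (2 * y - 1) + rng.normal(0, 0.2, n), 0, 1)
log_lap_var = 4.0 + (2 * y - 1) + rng.normal(0, 1.0, n)

X = np.column_stack([cnn_score, log_lap_var])
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardise so one learning rate suits both
w = fit_logistic(X, y)
accuracy = ((predict_proba(X, w) > 0.5) == (y == 1)).mean()
```

The accuracy computed here is on the toy training data itself and says nothing about real documents; with real features it should be measured on a held-out split.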

score 0 · Answer 3 · answered Aug 13 '20 at 22:37

I would look at the change in color between pixels, then rank the photos on the median delta between pixels... a sharp change from RGB (0,0,0) to (255,255,255) between every pair of adjoining pixels would give the maximum possible score, and the more blur you have, the lower the score.

I have used this approach successfully in the past when trying to estimate the areas of fields.
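
One way to read the scoring described above is sketched below, assuming NumPy and 8-bit images. The function name `median_pixel_delta`, the conversion to grey by channel averaging, and the normalisation to the range 0 to 1 are choices made for this illustration, not details given in the answer.

```python
import numpy as np

def median_pixel_delta(img: np.ndarray) -> float:
    """Median absolute difference between adjoining pixels, scaled so 1.0 is the maximum."""
    g = np.asarray(img, dtype=np.float64)
    if g.ndim == 3:                             # collapse RGB to a single intensity channel
        g = g.mean(axis=2)
    dx = np.abs(np.diff(g, axis=1)).ravel()     # horizontal neighbours
    dy = np.abs(np.diff(g, axis=0)).ravel()     # vertical neighbours
    return float(np.median(np.concatenate([dx, dy])) / 255.0)

# One-pixel checkerboard of (0,0,0) and (255,255,255): the sharpest possible pattern
checker = (np.indices((8, 8)).sum(axis=0) % 2) * 255.0
checker_rgb = np.stack([checker] * 3, axis=2)

max_score = median_pixel_delta(checker_rgb)          # every neighbour pair differs by 255
soft_score = median_pixel_delta(checker * 0.25 + 96)  # same pattern at a quarter of the contrast
flat_score = median_pixel_delta(np.full((8, 8), 128))  # no edges at all
```

A caveat worth checking on real data: documents with large uniform backgrounds may have a median delta near zero regardless of sharpness, in which case a high percentile of the deltas might rank images more usefully than the median.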
