10

I have a library of about 1 million images, and roughly half of them are watermarked with the same semi-transparent watermark in the same spot.

Where do I begin with detecting the watermarked images? Are there any standard tools for this purpose?

Kristian Rafteseth

6 Answers

4

If, as your question suggests, you just want to detect which images are watermarked, you can use the following algorithm:

  • Extract a sample of the watermark image.
  • Scan the watermark image pixel by pixel and store the first pixels in an array.
  • Scan each image pixel by pixel and store the pixels in an array.
  • Whenever a row from the image being scanned contains the elements of the watermark array in the same order, it's most likely a match.

The code could be something like this:

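// Number of leading watermark pixels used as the signature.
// 16 is only an example value; use however many pixels you sampled.
$no_of_pixels = 16;
$wmark = imagecreatefrompng("watermark.png");
$testimage = imagecreatefrompng("test.png");

// Store the first pixels of the watermark's top row as colour arrays.
// Assumes the watermark is at least $no_of_pixels wide.
$thumbpixels = array();
for ($i = 0; $i < $no_of_pixels; $i++) {
    $thumbpixels[] = imagecolorsforindex($wmark, imagecolorat($wmark, $i, 0));
}

$width = imagesx($testimage);
$height = imagesy($testimage);
$found = false;
for ($h = 0; $h < $height && !$found; $h++) {
    // Only start where the whole signature still fits in the row.
    for ($w = 0; $w + $no_of_pixels <= $width && !$found; $w++) {
        $matched = 0;
        // Exact colour comparison, pixel by pixel, along the row.
        while ($matched < $no_of_pixels
            && imagecolorsforindex($testimage, imagecolorat($testimage, $w + $matched, $h)) == $thumbpixels[$matched]) {
            $matched++;
        }
        $found = ($matched == $no_of_pixels);
    }
}
if ($found) echo "Voila, we found it!";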

EDIT

I just saw your thumbnail example. If you just want to detect text, you can try tesseract-ocr or PhpOCR.

You may also consider PHPSane

Chibueze Opata
2

Detecting almost any feature in an image is called object detection. There is a widely used library for this called OpenCV. It has a very simple SDK, although setting it up can be a real pain. It is well supported for C/C++ and (nearly as well supported for) Python. It took me around 3 weeks to train my own classifier the first time I used OpenCV.

But I would not depend on this solution entirely, and I would consider my priorities first. Also, it is very hard to achieve a good detection rate with a custom classifier. Other methods are more time consuming.

Kishor Kundan
2

In short, not with complete accuracy.

At best, you could only apply heuristics on the image to see if it matches an exact watermark, and get a confidence rating -- for example, if the watermark is a 50% white overlay, then a scene that was predominantly white could give a false positive, and of course the inverse is true.

There are also problems that could arise if the images use lossy compression, such as JPEG, as the way it handles edges and saturation may result in a watermark that isn't as saturated as expected, or as exactly positioned as expected.
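As a rough sketch of such a heuristic, assuming the watermark really is a 50% white overlay: a blended pixel is 0.5 * original + 0.5 * 255, so its luminosity should not fall much below 127.5. The hypothetical `failMetric` below counts how many pixels at the watermark's positions are darker than that. The helper names, the mask format (a list of [x, y] pairs) and the floor value are illustrative assumptions, and with JPEG input the floor would probably need some slack.

```php
<?php
// Hypothetical helper: luminosity (0-255) of one pixel of a GD image.
function pixelLuminosity($img, int $x, int $y): float
{
    $c = imagecolorsforindex($img, imagecolorat($img, $x, $y));
    // Rec. 601 luma weights.
    return 0.299 * $c['red'] + 0.587 * $c['green'] + 0.114 * $c['blue'];
}

// Fraction of watermark-position pixels that are darker than $floor.
// $lum is a 2D array [y][x] of luminosity values, $mask a list of [x, y] pairs.
function failMetric(array $lum, array $mask, float $floor = 127.5): float
{
    $fails = 0;
    foreach ($mask as [$x, $y]) {
        if ($lum[$y][$x] < $floor) {
            $fails++;
        }
    }
    return count($mask) > 0 ? $fails / count($mask) : 0.0;
}
```

A fail metric near 1 suggests there is no such overlay at those positions. A value near 0 is only consistent with a watermark, since a bright scene produces the same result.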

Rowland Shaw
  • you kicked me in the right direction here, thanks. the half transparent image is always brightening up the image with more white in the same spots – Kristian Rafteseth Mar 11 '13 at 13:57
  • A simple heuristic would be to look for pixels, where the text would be, that have a luminosity below 50% to get a "fail" metric - of course, it'll give false positives (consider the digits in your sample) – Rowland Shaw Mar 11 '13 at 20:21
1

Because you know where the watermark always is, you could use imagecolorat and imagecolorsforindex to get the alpha value for pixels both inside and outside of the watermark. I would expect the alpha values to be similar when there is no watermark, and different when there is (within some threshold that you would need to determine). Of course, this may not work on all images, so if you need 100% accuracy you would probably need something more reliable.
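A minimal sketch of the inside-versus-outside comparison follows. Note that it swaps in brightness for alpha, on the assumption that the watermark is already blended into the pixel values, in which case the stored alpha would likely be the same everywhere. `regionMean`, `looksWatermarked`, the rectangle format [x, y, width, height] and the threshold are all hypothetical names and parameters; the 2D brightness array could be filled from imagecolorat and imagecolorsforindex.

```php
<?php
// Mean of a rectangular region of a 2D [y][x] array of brightness values.
function regionMean(array $lum, int $x0, int $y0, int $w, int $h): float
{
    $sum = 0.0;
    for ($y = $y0; $y < $y0 + $h; $y++) {
        for ($x = $x0; $x < $x0 + $w; $x++) {
            $sum += $lum[$y][$x];
        }
    }
    return $sum / ($w * $h);
}

// True when the watermark region is brighter than the reference region
// by more than $threshold. Rectangles are [x, y, width, height].
function looksWatermarked(array $lum, array $inside, array $outside, float $threshold): bool
{
    return regionMean($lum, ...$inside) - regionMean($lum, ...$outside) > $threshold;
}
```

The threshold would have to be tuned on images whose status is already known, and a reference region right next to the watermark is probably the fairest comparison.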

ioums
1

In your case, where you are looking for the same logo in a predictable location, it is relatively simple. However, it's much, much simpler and faster (as per my comment elsewhere) to match a copyright notice in the metadata!

A watermark is not going to produce fixed changes to the content - each modified pixel will obtain a new value based on the watermark and the image itself. Hence you need to extract this information - I'd go with differentiating the image and just looking at the magnitude of the derivative (not the phase).

Then it's simply a matter of correlating the differential with one of just the watermark (or lots with the watermark and other content).
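To make the idea concrete, here is a small sketch of differentiation and correlation on plain 2D arrays of luminosity values. It uses forward differences for the derivative and a Pearson correlation coefficient as the match score; both are assumptions chosen for brevity, and the function names are hypothetical. A dedicated image processing toolkit would do the same far faster.

```php
<?php
// Gradient magnitude by forward differences on a 2D [y][x] array.
// The result is one row and one column smaller than the input.
function gradientMagnitude(array $lum): array
{
    $out = [];
    $h = count($lum);
    $w = count($lum[0]);
    for ($y = 0; $y < $h - 1; $y++) {
        for ($x = 0; $x < $w - 1; $x++) {
            $dx = $lum[$y][$x + 1] - $lum[$y][$x];
            $dy = $lum[$y + 1][$x] - $lum[$y][$x];
            $out[$y][$x] = sqrt($dx * $dx + $dy * $dy);
        }
    }
    return $out;
}

// Pearson correlation between two equally sized 2D arrays.
// Returns 0.0 if either array is constant.
function correlation(array $a, array $b): float
{
    $fa = array_merge(...$a);
    $fb = array_merge(...$b);
    $n = count($fa);
    $ma = array_sum($fa) / $n;
    $mb = array_sum($fb) / $n;
    $num = $da = $db = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $num += ($fa[$i] - $ma) * ($fb[$i] - $mb);
        $da += ($fa[$i] - $ma) ** 2;
        $db += ($fb[$i] - $mb) ** 2;
    }
    return ($da > 0 && $db > 0) ? $num / sqrt($da * $db) : 0.0;
}
```

Calling `correlation(gradientMagnitude($imageRegion), gradientMagnitude($watermark))` on the region where the watermark is expected gives a score between -1 and 1; what counts as a match is a cut-off you would need to determine from known samples.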

You really don't want to be doing this kind of image processing in PHP unless you're happy writing your own extensions. Most image processing toolkits will support differentiation and correlation.

BTW: if you don't know how to differentiate an image, and/or can't understand how to correlate an image, please don't ask - this is not the right forum for that discussion

symcbean
  • 1
    ...and looking at your example, it suggests that the watermark is not always in same position - it's centrally aligned left to right, and the number implies that it changes, therefore the position changes. – symcbean Mar 11 '13 at 14:49
0

Well if there is no tool to do this, you could try the following:

  1. Identify where the watermark appears as a region of pixels, e.g. the bottom right 40px x 100px.

  2. For each image, make a temp copy and crop out the location where the watermark would appear. This should leave the watermarked version and the non-watermarked version identical.

  3. Compare the images - e.g. by a combination of width x height, file size, CRC or actual pixel comparison, though for a million images you'd need some serious CPU power.
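Steps 2 and 3 could be sketched like this, working on a 2D array of pixel values rather than on image files. `maskedChecksum` is a hypothetical helper that blanks the watermark rectangle and returns a CRC of what remains. It assumes both copies are stored losslessly, since after lossy compression the pixels outside the rectangle would rarely match exactly.

```php
<?php
// CRC of a 2D [y][x] pixel array after overwriting one rectangle with 0.
// PHP passes arrays by value, so the caller's array is left untouched.
function maskedChecksum(array $pixels, int $x0, int $y0, int $w, int $h): int
{
    for ($y = $y0; $y < $y0 + $h; $y++) {
        for ($x = $x0; $x < $x0 + $w; $x++) {
            $pixels[$y][$x] = 0;
        }
    }
    return crc32(json_encode($pixels));
}
```

Two images with the same checksum are candidates for being the same picture with and without the watermark. An actual pixel comparison of those candidates would confirm it.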

acutesoftware
  • Where do you get the non-watermarked image from in this example? – Rowland Shaw Mar 11 '13 at 13:41
  • I am saying that you crop ALL images (or set the area to black or white) where the watermark appears. The poster said the watermark always appears in the same place, so this should work – acutesoftware Mar 11 '13 at 13:44
  • So, if the watermark is a diagonal 50% white line across the image, you crop nothing (as you know the watermark would cross the diagonal), and see if it matches what? – Rowland Shaw Mar 11 '13 at 13:52
  • True - I assumed from the wording of the poster that the watermark was small 'called a spot'. But even if it is a large diagonal line you could take that watermark from an existing watermarked image and apply it to ALL images, which then lets you compare them – acutesoftware Mar 11 '13 at 13:59
  • Not if it's translucent (as described) – Rowland Shaw Mar 11 '13 at 20:14