10

I am scanning documents to JPG images. The scanner must scan all pages as color or all pages as black and white. Since many of my pages are color, I must scan all pages as color. After the scanning is complete, I would like to examine the images with .NET and detect which images are black and white, so that I can convert those images to grayscale and save on storage.

Does anybody know how to detect a grayscale image with .NET?

Please let me know.

Chris W. Rea
Dave
  • 1
    Checking the image type isn't going to cut it since it'll be set for 24 or 32 bit (since you're scanning in color). You'll probably have to check each pixel; if R == G == B in all pixels, it's a grayscale image, otherwise it's probably color. – Michael Todd Dec 09 '09 at 22:37
  • 1
    A thought: Even though the scanner in theory is providing R == G == B, is it possible that during JPEG compression there may be some pixels where that's only almost true? Consider, JPEG is a lossy compression algorithm. Perhaps JPEG takes some liberties with nearby pixel colors. But I confess, I am not a JPEG expert. But I'd want to know how it worked before I relied on R == G == B. – Chris W. Rea Dec 09 '09 at 22:40
  • Yep, I'd hate to rely on exactly r==g==b because even if jpg doesn't do any fudging (and I bet it does), your scanner and original would have to be perfect as well, which strikes me as unlikely in many cases. – Beska Dec 09 '09 at 22:44
  • Fair enough. Hadn't considered variations in pixels (which are, of course, going to occur when scanning). Interesting problem. – Michael Todd Dec 09 '09 at 22:46
  • 1
    @Dave: i should have done it sooner but... this morning pasted into my answer a code snippet that will actually return the highest pixel RGB delta of an image. How you interpret the delta is up to you. You can test for 0 (true and complete gray scale) or slightly greater than zero to allow for some color information. – Paul Sasik Dec 10 '09 at 16:35

6 Answers

14

If you can't find a library for this, you could try grabbing a large number (or all) of the pixels for an image and see if their r, g, and b values are within a certain threshold (which you might set empirically, or have as a setting) of one another. If they are, the image is grayscale.

I would definitely make the threshold for a test a bit larger than 0, though...so I wouldn't test r=g, for example, but (abs(r-g) < e) where e is your threshold. That way you can keep your false color positives down...as I suspect you'll otherwise get a decent number, unless your original image and scanning techniques give precisely grayscale.
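A minimal sketch of the threshold test described above, not a definitive implementation. The names `GrayCheckSketch`, `IsNearGray`, `LooksGrayscale`, and the `step` sampling parameter are hypothetical and chosen for illustration. It assumes a `System.Drawing.Bitmap` is available and uses `<=` rather than `<` for the comparison, so a tolerance of 0 means exact equality.

```csharp
using System;
using System.Drawing;

public static class GrayCheckSketch
{
    // True when the gap between the largest and smallest channel is at most
    // `tolerance`. This is equivalent to requiring every pairwise difference
    // (r-g, g-b, b-r) to be within the tolerance.
    public static bool IsNearGray(int r, int g, int b, int tolerance)
    {
        int max = Math.Max(r, Math.Max(g, b));
        int min = Math.Min(r, Math.Min(g, b));
        return max - min <= tolerance;
    }

    // Checks every `step`-th pixel in both directions (step = 1 checks all
    // pixels). Returns false as soon as one sampled pixel is too colorful.
    public static bool LooksGrayscale(Bitmap bmp, int tolerance, int step)
    {
        if (step < 1) throw new ArgumentOutOfRangeException(nameof(step));

        for (int y = 0; y < bmp.Height; y += step)
        {
            for (int x = 0; x < bmp.Width; x += step)
            {
                Color c = bmp.GetPixel(x, y);
                if (!IsNearGray(c.R, c.G, c.B, tolerance)) return false;
            }
        }
        return true;
    }
}
```

A call such as `GrayCheckSketch.LooksGrayscale(bmp, 8, 4)` would test every fourth pixel with a tolerance of 8. Both values are placeholders that would need tuning on known-grayscale scans.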

Beska
6

A simple algorithm to test for color: Walk the image pixel by pixel in a nested for loop (width and height) and test to see if the pixel's RGB values are equal. If they are not then the image has color info. If you make it all the way through all the pixels without encountering this condition, then you have a gray scale image.

Revision with a more complex algorithm:

In the first revision of this post I proposed a simple algorithm that treats a pixel as gray scale if its R, G, and B values are equal. So RGBs of 0,0,0 or 128,128,128 or 230,230,230 would all test as gray while 123,90,78 would not. Simple.

Here's a snippet of code that tests for a variance from gray. The two methods are a small subsection of a more complex process but ought to provide enough raw code to help with the original question.

/// <summary>
/// This function accepts a bitmap and then performs a delta
/// comparison on all the pixels to find the highest delta
/// color in the image. This calculation only works for images
/// which have a field of similar color and some grayscale or
/// near-grayscale outlines. The result ought to be that the
/// calculated color is a sample of the "field". From this we
/// can infer which color in the image actually represents a
/// contiguous field in which we're interested.
/// See the documentation of GetRgbDelta for more information.
/// </summary>
/// <param name="bmp">A bitmap for sampling</param>
/// <returns>The highest delta color</returns>
public static Color CalculateColorKey(Bitmap bmp)
{
    Color keyColor = Color.Empty;
    int highestRgbDelta = 0;

    for (int x = 0; x < bmp.Width; x++)
    {
        for (int y = 0; y < bmp.Height; y++)
        {
            // Read the pixel once; GetPixel is slow, so avoid repeated calls.
            Color pixel = bmp.GetPixel(x, y);
            int delta = GetRgbDelta(pixel);
            if (delta <= highestRgbDelta) continue;

            highestRgbDelta = delta;
            keyColor = pixel;
        }
    }

    return keyColor;
}

/// <summary>
/// Utility method that encapsulates the RGB Delta calculation:
/// delta = abs(R-G) + abs(G-B) + abs(B-R) 
/// So, between the color RGB(50,100,50) and RGB(128,128,128)
/// The first would be the higher delta with a value of 100 as compared
/// to the second color which, being grayscale, would have a delta of 0
/// </summary>
/// <param name="color">The color for which to calculate the delta</param>
/// <returns>An integer in the range 0 to 510 indicating the difference
/// in the RGB values that comprise the color</returns>
private static int GetRgbDelta(Color color)
{
    return
        Math.Abs(color.R - color.G) +
        Math.Abs(color.G - color.B) +
        Math.Abs(color.B - color.R);
}
Paul Sasik
  • Some scanners will introduce a slight bit of color into otherwise black and white images. You should allow a small threshold for the colors to be not quite equal. – Andres Dec 09 '09 at 22:40
  • Wouldn't an image with RGB values of 128,128,128 at ALL pixels be considered just a (one-color-)gray rectangular picture? – chrischu Dec 09 '09 at 22:41
  • @crischu: Well, I think that was just an example of showing how all values would be equal. – Beska Dec 09 '09 at 22:42
  • 1
    I don't think you can. Scanning in color, and using lossy compression will most surely produce some artifacts in terms of colors. Even b&w documents will not be perfectly grayscale. – Yannick Motton Dec 09 '09 at 22:43
  • 3
    Just to allow for expected variation from scanning, I'd suggest ameliorating this a little. Doing something like: colorDiff = (Red - Blue) ^ 2 + (Red - Green) ^ 2. If colorDiff < COLOR_DIFF_MAX, presume grayscale -- I'd run the calculation on some known-grayscale scans to find a reasonable value for COLOR_DIFF_MAX. – Conspicuous Compiler Dec 09 '09 at 22:44
  • @Beska i know that it was just an example. Still my statement still has its value because it doesn't matter if the example values are 128, 3, or 42 the picture that fulfills this check is a picture of a SINGLE color and not a graySCALE picture. – chrischu Dec 09 '09 at 23:32
  • @Kigurai: So is this plain wrong or not the only way? It cannot be both. i was aiming for vanilla simplicity first. "simple algorithm to test for color" This morning i followed up with a more complex example that will allow for slightly "off" gray. – Paul Sasik Dec 10 '09 at 16:31
  • On second thought, I might have mixed things up in my head and judged a bit too quickly. Removing my previous comment, and the downvote. – Hannes Ovrén Dec 11 '09 at 07:55
1

A faster version. I tested it with a threshold of 8 and it works well for me.

Use:

bool grayScale;
Bitmap bmp = new Bitmap(strPath + "\\temp.png");
grayScale = TestGrayScale(bmp, 8);
if (grayScale)
   MessageBox.Show("Grayscale image");


/// <summary>Tests whether an image is grayscale</summary>
/// <param name="bmp">The bitmap to test</param>
/// <param name="threshold">The threshold for the maximum color difference</param>
/// <returns>True if the image is grayscale; false if it is a color image</returns>
public bool TestGrayScale(Bitmap bmp, int threshold)
{
    Color pixelColor = Color.Empty;
    int rgbDelta;

    for (int x = 0; x < bmp.Width; x++)
    {
        for (int y = 0; y < bmp.Height; y++)
        {
            pixelColor = bmp.GetPixel(x, y);
            rgbDelta = Math.Abs(pixelColor.R - pixelColor.G) + Math.Abs(pixelColor.G - pixelColor.B) + Math.Abs(pixelColor.B - pixelColor.R);
            if (rgbDelta > threshold) return false;
        }
    }
    return true;
}

Do you have a faster one?
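One commonly used way to speed this up is to avoid `GetPixel` and read the raw bytes through `Bitmap.LockBits`. The following is a minimal sketch under stated assumptions, not a benchmarked implementation: the names `FastGrayTest`, `RowIsGray`, and `TestGrayScaleFast` are hypothetical, the bitmap is locked as 24bpp (bytes stored in B, G, R order), and the delta formula is the same one used above.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

public static class FastGrayTest
{
    // Scans one row of 24bpp pixel data (3 bytes per pixel, stored B, G, R)
    // and reports whether every pixel's delta is within the threshold.
    public static bool RowIsGray(byte[] row, int pixelCount, int threshold)
    {
        for (int i = 0; i < pixelCount * 3; i += 3)
        {
            int b = row[i], g = row[i + 1], r = row[i + 2];
            int delta = Math.Abs(r - g) + Math.Abs(g - b) + Math.Abs(b - r);
            if (delta > threshold) return false;
        }
        return true;
    }

    public static bool TestGrayScaleFast(Bitmap bmp, int threshold)
    {
        var rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
        // Request 24bpp data so the byte layout is known whatever the source format.
        BitmapData data = bmp.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
        try
        {
            byte[] row = new byte[bmp.Width * 3];
            for (int y = 0; y < bmp.Height; y++)
            {
                // Stride is the byte distance between rows and may include padding,
                // so each row is copied separately starting from its own offset.
                IntPtr rowStart = IntPtr.Add(data.Scan0, y * data.Stride);
                Marshal.Copy(rowStart, row, 0, row.Length);
                if (!RowIsGray(row, bmp.Width, threshold)) return false;
            }
            return true;
        }
        finally
        {
            // Always release the lock, even on an early return.
            bmp.UnlockBits(data);
        }
    }
}
```

It is called the same way as `TestGrayScale`, for example `FastGrayTest.TestGrayScaleFast(bmp, 8)`. Keeping the per-row check in a separate method makes the comparison logic testable without a bitmap.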

0

As JPEG supports metadata, you should first check whether your scanner software places some special data in the saved images, and whether you can rely on that information.

Rubens Farias
  • This doesn't make sense to me. The scanner software, if it writes metadata into the file, will write that the image is a color image if it is scanned as color (which it is), even if the image only contains grayscale content. – Beska Dec 10 '09 at 16:17
  • It was an idea and I pointed out to validate this hypothetical data, beska. Anyways, ty for your comment. – Rubens Farias Dec 10 '09 at 16:22
0

The answer I posted in the python section might be helpful. Images you find e.g. on the web that a human would consider grayscale often do not have identical R,G,B values. You need some calculation of the variance and some kind of sampling process so you don't have to check a million pixels. The solution Paul gave is based on the max difference so a single red pixel artefact from a scanner could turn a grayscale image into non-grayscale. The solution I posted got 99.1% precision and 92.5% recall on 13,000 images.
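To illustrate why a max-difference test is fragile, here is a minimal sketch of a more tolerant check. It is not the Python solution referred to above; it swaps in a simpler technique that counts the fraction of sampled pixels that look colored, so a few stray pixels do not flip the result. The names `RobustGraySketch`, `IsMostlyGray`, and the parameters `tolerance` and `maxColoredFraction` are hypothetical, and the sample values would have to come from whatever sampling process is used.

```csharp
using System;
using System.Collections.Generic;

public static class RobustGraySketch
{
    // Gap between the largest and smallest channel of one pixel.
    public static int RgbSpread(int r, int g, int b)
    {
        return Math.Max(r, Math.Max(g, b)) - Math.Min(r, Math.Min(g, b));
    }

    // Returns true when at most `maxColoredFraction` of the sampled pixels
    // have a spread above `tolerance`. An empty sample is treated as gray
    // because it contains no evidence of color (an assumption of this sketch).
    public static bool IsMostlyGray(
        IEnumerable<(byte R, byte G, byte B)> samples,
        int tolerance,
        double maxColoredFraction)
    {
        long total = 0, colored = 0;
        foreach (var p in samples)
        {
            total++;
            if (RgbSpread(p.R, p.G, p.B) > tolerance) colored++;
        }
        if (total == 0) return true;
        return (double)colored / total <= maxColoredFraction;
    }
}
```

With this shape, a single red artifact among many gray pixels stays below the allowed fraction, while a page with substantial color content exceeds it. Suitable values for both parameters would need to be found empirically.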

Community
Noah Whitman
0

I think this approach requires the least code. It has been tested on JPEGs. bImage below is a byte array.

 MemoryStream ms = new MemoryStream(bImage);
 System.Drawing.Image returnImage = System.Drawing.Image.FromStream(ms);
 if (returnImage.Palette.Flags == 2)
 {
      System.Diagnostics.Debug.WriteLine("Image is greyscale");
 }
Fiach Reid