Determining color space for JPEG

Question

I am writing code to determine a JPEG image's color space. I have found two references that may help me implement this. One is on oracle.com; the other is C source code from ijg.com, which "is responsible for the reference implementation of the original JPEG standard".

However, they differ. For example, in IJG's code, when there is no Adobe marker and there are 4 channels, the image is assumed to be CMYK, but per Oracle it is YCCA. Also, IJG's implementation doesn't look at subsampling, whereas in Oracle's spec a subsampled 4-channel image is YCCK, and so on.

Also, the ColorSpace class is missing many color spaces: when I implemented Oracle's logic I needed to define 3 extra ones, namely YCCK, YCCA and RGBA.

Another point: I found information here that JPEG does not support transparency via an alpha channel, so why would Oracle talk about YCCA and RGBA in the context of its JPEG metadata specification?

As a result, when I check an image with IJG's logic it tells me it is CMYK (I checked the image with ImageMagick on Ubuntu and it also says CMYK), but with Oracle's logic it is YCCA. Whom should I believe? Why would Oracle not rely on the original JPEG specification? Or is there something else I don't know?

You may want to look at the [specification](http://www.cipa.jp/std/documents/e/DC-008-Translation-2016-E.pdf) of [Exif](https://en.wikipedia.org/wiki/Exif) (Exchangeable image file format). If you can read Perl, then the canonical utility is Phil Harvey's [ExifTool](https://www.sno.phy.queensu.ca/~phil/exiftool/). — AlexP, Jun 11 '18 at 13:08
ok, but it still doesn't answer why they differ (oracle and ijg) — tobi, Jun 11 '18 at 13:31
ExifTool is widely considered the gold standard, and the reference I provided is the actual official definition of the format of the JPEG files produced by post-medieval digital cameras. — AlexP, Jun 11 '18 at 13:46
From the JPEG specification: "Application-dependent information, e.g. colour space, is outside the scope of this Specification." Note: the standard was specified before sRGB existed, and while video was changing colour space (from Rec. 601 to Rec. 709). — Giacomo Catenazzi, Jun 14 '18 at 15:01
The 2012 version (ISO/IEC 10918-5:2012) specifies YCC per Rec. 601 (or just Y) as the base, and only 1 or 3 channels (to be interchangeable), but an ICC profile is recommended. So I think either there is an ICC profile that specifies the meaning of the 4 channels, or the program just has to guess. And because 4-channel JPEGs are not so frequent, it seems nobody cares. — Giacomo Catenazzi, Jun 14 '18 at 15:17
The meaning of the colours in the colour model of a JPEG depends on metadata of the transport that is providing that JPEG, whether by file, UVC, or otherwise. The transport isn't mentioned in the question so the answer is ambiguous. — G Huxley, May 31 '21 at 04:08
Oracle is not the canonical source for JPEG standards; ISO/ITU are. The IJG is independent of the standard: at one time IJG had a generally accepted free reference library **until the ITU told them to stuff it** regarding a few proprietary extensions. Nevertheless, don't look to Oracle for any reasoning about why they do whatever they do, LOL. If you want canonical, refer to the ITU or ISO. Just for the record: an **A** alpha transparency channel is not officially supported in JPEG (IJG added it in version 8, if memory serves, but little or no software supports JPEG with alpha). — Myndex, Oct 13 '21 at 09:26

score 15 · Accepted Answer · edited Aug 16 '21 at 12:26

After my comments on old JPEG standards, I finally found the answer.

From ISO/IEC 10918-6:2013 (E), section 6.1:

Images encoded with only one component are assumed to be grayscale data in which 0 is black and 255 is white.

Images encoded with three components are assumed to be RGB data encoded as YCbCr unless the image contains an APP14 marker segment as specified in 6.5.3, in which case the colour encoding is considered either RGB or YCbCr according to the application data of the APP14 marker segment. The relationship between RGB and YCbCr is defined as specified in Rec. ITU-T T.871 | ISO/IEC 10918-5.

Images encoded with four components are assumed to be CMYK, with (0,0,0,0) indicating white unless the image contains an APP14 marker segment as specified in 6.5.3, in which case the colour encoding is considered either CMYK or YCCK according to the application data of the APP14 marker segment. The relationship between CMYK and YCCK is defined as specified in clause 7.

The APP14 marker segment is identified by the string "Adobe\0", and its AP12 byte holds the transform flag:

Transform flag values of 0, 1 and 2 shall be supported and are interpreted as follows:

0 – CMYK for images that are encoded with four components in which all four CMYK values are complemented; RGB for images that are encoded with three components; i.e., the APP14 marker does not specify a transform applied to the image data.

1 – An image encoded with three components using YCbCr colour encoding.

2 – An image encoded with four components using YCCK colour encoding.

So, it depends: a four-component image should be CMYK, but it could be YCCK if an APP14 segment is present and its AP12 byte has the right value.
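The rules quoted above can be condensed into a small decision function. This is only a sketch of my reading of the quoted text, not an official implementation: the class name `IsoJpegColorGuess`, the enum `ColorEncoding` and the method `guess` are made up for illustration, and combinations the quote does not cover (for example a three-component image with transform 2) fall back to the no-marker default, which is an assumption on my part.

```java
import java.util.OptionalInt;

class IsoJpegColorGuess {
    // Hypothetical names; only the rules themselves come from the quoted text.
    enum ColorEncoding { GRAY, RGB, YCBCR, CMYK, YCCK }

    /**
     * @param components     number of components in the frame header
     * @param adobeTransform transform flag of the APP14 "Adobe" segment,
     *                       or empty if the image has no such segment
     */
    static ColorEncoding guess(int components, OptionalInt adobeTransform) {
        switch (components) {
            case 1:
                return ColorEncoding.GRAY;
            case 3:
                // Transform 0 means no transform was applied, i.e. plain RGB.
                if (adobeTransform.isPresent() && adobeTransform.getAsInt() == 0) {
                    return ColorEncoding.RGB;
                }
                // No APP14, transform 1, or (assumption) any other value.
                return ColorEncoding.YCBCR;
            case 4:
                // Only transform 2 selects YCCK.
                if (adobeTransform.isPresent() && adobeTransform.getAsInt() == 2) {
                    return ColorEncoding.YCCK;
                }
                // No APP14, transform 0, or (assumption) any other value.
                return ColorEncoding.CMYK;
            default:
                throw new IllegalArgumentException(
                        "Component count not covered by the quoted rules: " + components);
        }
    }
}
```

For the image in the question (four components, no Adobe marker), `guess(4, OptionalInt.empty())` gives CMYK, which agrees with what IJG and ImageMagick report.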

The above does seem to be completely in line with the IJG implementation (or the other way around), with the exception that it does not mention the presence of a JFIF marker (it doesn't affect the outcome though, as JFIF must have component ids 1, 2, 3 and Adobe YCC if present). I have never read this document before, so I like this answer. :-) — Harald K, Jun 15 '18 at 14:06
That same spec says about your answer: "In the absence of other information or metadata, such as a file format, container, or other printing system mechanism that specifies the interpretation of the colour or grayscale values of the image". Since the questioner doesn't mention how the JPEG is transported, the answer is ambiguous. — G Huxley, May 31 '21 at 04:07

Harald K · Answer 2 · 2018-06-18T07:15:03.327

I have struggled to understand the Oracle document you refer to as well.

In my experience from writing a JPEG plugin for Java ImageIO, the right thing to do is to follow the IJG implementation. That's what the majority of software does, so it will create the least confusion among your users (i.e. "Why does my image look different in your software and software X?"). The Sun/Oracle algorithm disagrees with the "rest of the world" in many cases.

I ended up implementing a slightly different algorithm that takes the "extra" Java color spaces into account, but otherwise stays very close to the IJG implementation:

```java
// Adapted from libjpeg jdapimin.c:
// Guess the input colorspace
// (Wish JPEG committee had provided a real way to specify this...)
switch (startOfFrame.componentsInFrame()) {
    case 1:
        return JPEGColorSpace.Gray;
    case 2:
        return JPEGColorSpace.GrayA; // Java special case: Gray + Alpha
    case 3:
        if (jfif != null) {
            return JPEGColorSpace.YCbCr; // JFIF implies YCbCr
        }
        else if (adobeDCT != null) {
            switch (adobeDCT.transform) {
                case AdobeDCT.Unknown:
                    return JPEGColorSpace.RGB;
                case AdobeDCT.YCC:
                    return JPEGColorSpace.YCbCr;
                default:
                    // TODO: Warning!
                    return JPEGColorSpace.YCbCr; // assume it's YCbCr
            }
        }
        else {
            // Saw no special markers, try to guess from the component IDs
            int cid0 = startOfFrame.components[0].id;
            int cid1 = startOfFrame.components[1].id;
            int cid2 = startOfFrame.components[2].id;

            if (cid0 == 1 && cid1 == 2 && cid2 == 3) {
                return JPEGColorSpace.YCbCr; // assume JFIF w/out marker
            }
            else if (cid0 == 'R' && cid1 == 'G' && cid2 == 'B') {
                return JPEGColorSpace.RGB; // ASCII 'R', 'G', 'B'
            }
            else if (cid0 == 'Y' && cid1 == 'C' && cid2 == 'c') {
                return JPEGColorSpace.PhotoYCC; // Java special case: YCc
            }
            else {
                // TODO: Warning!
                return JPEGColorSpace.YCbCr; // assume it's YCbCr
            }
        }

    case 4:
        if (adobeDCT != null) {
            switch (adobeDCT.transform) {
                case AdobeDCT.Unknown:
                    return JPEGColorSpace.CMYK;
                case AdobeDCT.YCCK:
                    return JPEGColorSpace.YCCK;
                default:
                    // TODO: Warning!
                    return JPEGColorSpace.YCCK; // assume it's YCCK
            }
        }
        else {
            // Saw no special markers, try to guess from the component IDs
            int cid0 = startOfFrame.components[0].id;
            int cid1 = startOfFrame.components[1].id;
            int cid2 = startOfFrame.components[2].id;
            int cid3 = startOfFrame.components[3].id;

            if (cid0 == 1 && cid1 == 2 && cid2 == 3 && cid3 == 4) {
                return JPEGColorSpace.YCbCrA; // Java special case: YCbCrA
            }
            else if (cid0 == 'R' && cid1 == 'G' && cid2 == 'B' && cid3 == 'A') {
                return JPEGColorSpace.RGBA; // Java special case: RGBA
            }
            else if (cid0 == 'Y' && cid1 == 'C' && cid2 == 'c' && cid3 == 'A') {
                return JPEGColorSpace.PhotoYCCA; // Java special case: YCcA
            }
            else {
                // TODO: Warning!
                // No special markers, assume straight CMYK.
                return JPEGColorSpace.CMYK;
            }
        }

    default:
        throw new IIOException("Cannot determine source color space");
}
```
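The snippet above assumes that `startOfFrame`, `jfif` and `adobeDCT` have already been parsed from the stream. For readers who want to try the logic without ImageIO, here is a rough, standalone sketch of how those three pieces of information could be collected from the raw bytes. The class `JpegHeaderScan` and its fields are hypothetical names for this example and are not part of my plugin. It assumes a well-formed stream in which every marker before SOS carries a length field, and it only reads the header, not the image data.

```java
class JpegHeaderScan {
    int[] componentIds;       // stays null if no SOF segment is found
    int adobeTransform = -1;  // -1 means no APP14 "Adobe" segment was found
    boolean jfif;             // true if an APP0 "JFIF\0" segment was found

    static JpegHeaderScan scan(byte[] data) {
        if (data.length < 2 || (data[0] & 0xFF) != 0xFF || (data[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("Not a JPEG stream (missing SOI)");
        }
        JpegHeaderScan result = new JpegHeaderScan();
        int pos = 2;
        while (pos + 4 <= data.length) {
            if ((data[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("Expected a marker at offset " + pos);
            }
            int marker = data[pos + 1] & 0xFF;
            if (marker == 0xFF) { pos++; continue; }      // skip fill byte
            if (marker == 0xDA || marker == 0xD9) break;  // SOS or EOI: header is over

            // Segment length is big-endian and includes its own two bytes.
            int length = ((data[pos + 2] & 0xFF) << 8) | (data[pos + 3] & 0xFF);
            int payload = pos + 4;
            int end = pos + 2 + length;
            if (length < 2 || end > data.length) {
                throw new IllegalArgumentException("Bad segment length at offset " + pos);
            }

            if (isStartOfFrame(marker)) {
                // Payload: precision (1), height (2), width (2), count (1),
                // then 3 bytes per component: id, sampling factors, table.
                if (length < 8) throw new IllegalArgumentException("Truncated SOF");
                int count = data[payload + 5] & 0xFF;
                if (payload + 6 + 3 * count > end) {
                    throw new IllegalArgumentException("Truncated SOF component list");
                }
                result.componentIds = new int[count];
                for (int i = 0; i < count; i++) {
                    result.componentIds[i] = data[payload + 6 + 3 * i] & 0xFF;
                }
            } else if (marker == 0xEE && length >= 14 && startsWith(data, payload, end, "Adobe")) {
                // "Adobe" (5), version (2), flags0 (2), flags1 (2), transform (1)
                result.adobeTransform = data[payload + 11] & 0xFF;
            } else if (marker == 0xE0 && startsWith(data, payload, end, "JFIF\0")) {
                result.jfif = true;
            }
            pos = end;
        }
        return result;
    }

    // SOF0..SOF15, excluding DHT (0xC4), JPG (0xC8) and DAC (0xCC).
    private static boolean isStartOfFrame(int marker) {
        return marker >= 0xC0 && marker <= 0xCF
                && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
    }

    private static boolean startsWith(byte[] data, int offset, int end, String tag) {
        if (offset + tag.length() > end) return false;
        for (int i = 0; i < tag.length(); i++) {
            if ((data[offset + i] & 0xFF) != tag.charAt(i)) return false;
        }
        return true;
    }
}
```

With the result of `JpegHeaderScan.scan(bytes)` in hand, `componentIds.length` plays the role of `componentsInFrame()`, `jfif` replaces the `jfif != null` check, and `adobeTransform >= 0` replaces `adobeDCT != null`.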
