Questions regarding DCT8x8 CUDA sample

Question

I have a questions regarding the Sample provided by Nvidia called DCT8x8 which is applied to an image to execute the algorithm in parallel. more info: http://developer.download.nvidia.com/compute/DevZone/C/html/C/src/dct8x8/doc/dct8x8.pdf

The code executes forward DCT and it's inverse on a BMP image.

My first question is, is there a way to calculate the only the forward transform to obtain the JPG?

Second, there are several parts of the code that I don't understand I hope someone that is familiar with DTC and CUDA can help me with those.

First: in the file dtc8x8_gold.cpp the program uses the following matrices:

const float DCTv8matrix[BLOCK_SIZE2] =
{
0.3535533905932738f, 0.4903926402016152f, 0.4619397662556434f, 0.4157348061512726f, 0.3535533905932738f, 0.2777851165098011f, 0.1913417161825449f, 0.0975451610080642f,
0.3535533905932738f, 0.4157348061512726f, 0.1913417161825449f, -0.0975451610080641f, -0.3535533905932737f, -0.4903926402016152f, -0.4619397662556434f, -0.2777851165098011f,
0.3535533905932738f, 0.2777851165098011f, -0.1913417161825449f, -0.4903926402016152f, -0.3535533905932738f, 0.0975451610080642f, 0.4619397662556433f, 0.4157348061512727f,
0.3535533905932738f, 0.0975451610080642f, -0.4619397662556434f, -0.2777851165098011f, 0.3535533905932737f, 0.4157348061512727f, -0.1913417161825450f, -0.4903926402016153f,
0.3535533905932738f, -0.0975451610080641f, -0.4619397662556434f, 0.2777851165098009f, 0.3535533905932738f, -0.4157348061512726f, -0.1913417161825453f, 0.4903926402016152f,
0.3535533905932738f, -0.2777851165098010f, -0.1913417161825452f, 0.4903926402016153f, -0.3535533905932733f, -0.0975451610080649f, 0.4619397662556437f, -0.4157348061512720f,
0.3535533905932738f, -0.4157348061512727f, 0.1913417161825450f, 0.0975451610080640f, -0.3535533905932736f, 0.4903926402016152f, -0.4619397662556435f, 0.2777851165098022f,
0.3535533905932738f, -0.4903926402016152f, 0.4619397662556433f, -0.4157348061512721f, 0.3535533905932733f, -0.2777851165098008f, 0.1913417161825431f, -0.0975451610080625f
};

const float DCTv8matrixT[BLOCK_SIZE2] =
{
0.3535533905932738f, 0.3535533905932738f, 0.3535533905932738f, 0.3535533905932738f, 0.3535533905932738f, 0.3535533905932738f, 0.3535533905932738f, 0.3535533905932738f,
0.4903926402016152f, 0.4157348061512726f, 0.2777851165098011f, 0.0975451610080642f, -0.0975451610080641f, -0.2777851165098010f, -0.4157348061512727f, -0.4903926402016152f,
0.4619397662556434f, 0.1913417161825449f, -0.1913417161825449f, -0.4619397662556434f, -0.4619397662556434f, -0.1913417161825452f, 0.1913417161825450f, 0.4619397662556433f,
0.4157348061512726f, -0.0975451610080641f, -0.4903926402016152f, -0.2777851165098011f, 0.2777851165098009f, 0.4903926402016153f, 0.0975451610080640f, -0.4157348061512721f,
0.3535533905932738f, -0.3535533905932737f, -0.3535533905932738f, 0.3535533905932737f, 0.3535533905932738f, -0.3535533905932733f, -0.3535533905932736f, 0.3535533905932733f,
0.2777851165098011f, -0.4903926402016152f, 0.0975451610080642f, 0.4157348061512727f, -0.4157348061512726f, -0.0975451610080649f, 0.4903926402016152f, -0.2777851165098008f,
0.1913417161825449f, -0.4619397662556434f, 0.4619397662556433f, -0.1913417161825450f, -0.1913417161825453f, 0.4619397662556437f, -0.4619397662556435f, 0.1913417161825431f,
0.0975451610080642f, -0.2777851165098011f, 0.4157348061512727f, -0.4903926402016153f, 0.4903926402016152f, -0.4157348061512720f, 0.2777851165098022f, -0.0975451610080625f
};

float Q[BLOCK_SIZE2] =
{
32.f, 33.f, 51.f, 81.f, 66.f, 39.f, 34.f, 17.f,
33.f, 36.f, 48.f, 47.f, 28.f, 23.f, 12.f, 12.f,
51.f, 48.f, 47.f, 28.f, 23.f, 12.f, 12.f, 12.f,
81.f, 47.f, 28.f, 23.f, 12.f, 12.f, 12.f, 12.f,
66.f, 28.f, 23.f, 12.f, 12.f, 12.f, 12.f, 12.f,
39.f, 23.f, 12.f, 12.f, 12.f, 12.f, 12.f, 12.f,
34.f, 12.f, 12.f, 12.f, 12.f, 12.f, 12.f, 12.f,
17.f, 12.f, 12.f, 12.f, 12.f, 12.f, 12.f, 12.f
};

float C_a = 1.387039845322148f; //!< a = (2^0.5) * cos( pi / 16); Used in forward and inverse DCT.
float C_b = 1.306562964876377f; //!< b = (2^0.5) * cos( pi / 8); Used in forward and inverse DCT.
float C_c = 1.175875602419359f; //!< c = (2^0.5) * cos(3 * pi / 16); Used in forward and inverse DCT.
float C_d = 0.785694958387102f; //!< d = (2^0.5) * cos(5 * pi / 16); Used in forward and inverse DCT.
float C_e = 0.541196100146197f; //!< e = (2^0.5) * cos(3 * pi / 8); Used in forward and inverse DCT.
float C_f = 0.275899379282943f; //!< f = (2^0.5) * cos(7 * pi / 16); Used in forward and inverse DCT.

can someone please explain me why are those values being used and the reason for their usage?

also in the file dct8x8_kernel_quantization.cu there is another Q matrix, that my guess is that is indicating the threshold for quantization, and if so, why those values?

__constant__ short Q[] =
{
32, 33, 51, 81, 66, 39, 34, 17,
33, 36, 48, 47, 28, 23, 12, 12,
51, 48, 47, 28, 23, 12, 12, 12,
81, 47, 28, 23, 12, 12, 12, 12,
66, 28, 23, 12, 12, 12, 12, 12,
39, 23, 12, 12, 12, 12, 12, 12,
34, 12, 12, 12, 12, 12, 12, 12,
17, 12, 12, 12, 12, 12, 12, 12
};

my last question is, I have the feeling that those values are specified for the "barbara.bmp" image which if true, will not let me use a different image than the defaul one, and that is what I'm looking for to do, besides understanding the code.

Thank you very much for your help!

Saul

score 2 · Accepted Answer · answered Nov 24 '13 at 01:03

The discrete cosine transform multiplies input data elements by cosine terms based on their position in the data set. These cosine terms can be pre-computed for various values of n, i.e. for various positions in the data set.

The first matrix (DCTv8matrix) represents a set of those cosine term computations. Notice that all the values lie between -1 and 1, the range of the cosine function.

The second matrix (DCTv8matrixT) is simply the transpose of the first matrix.

The third and fourth matrices (float Q[BLOCK_SIZE2] and __constant__ short Q[]) are floating-point and integer representations of quantization factors. In order to achieve compression, one of the methods used in JPEG encoding is to "throw away" certain "frequency components" in the resultant transformed data arising out of the DCT. These matrices are used to assist in the quantization of certain "frequency components" in the 2D transformed data. Lower "frequency components" are represented towards the upper left hand corner, and higher "frequency components" are represented towards the lower right hand corner. The specific choice of quantization factors is determined by the designers of the JPEG algorithm (or the implementers) to achieve compression while still preserving a realistic image.

In this case, the greatest quantization occurs at the highest "frequencies".

The quantization factors chosen generally are not unique to a specific image. You should be able to use those factors with reasonable results on other images.

Although this example does the bulk of the work associated with a JPEG encode (and decode), I don't believe it would be a trivial matter to store the intermediate results in JPEG format. It would still be necessary to create a routine that would write the appropriate JPEG header (e.g. JFIF header) to a file, followed by the appropriate quantized data in the appropriate order.

Questions regarding DCT8x8 CUDA sample

1 Answers1