There is no common standard for the spatial axis in image sampling. A 20 megapixel sensor or camera will produce images at a completely different spatial resolution, in pixels per mm or pixels per degree of view angle, than a 2 megapixel sensor or camera. These images will typically be rescaled to yet another non-standard resolution for viewing (72 ppi, 300 ppi, "Retina", SD/HDTV, CCIR-601, "4k", etc.).
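For example, here is a rough sketch (in Python, with a made-up 36 mm sensor width and a made-up 60 degree field of view, both hypothetical) of how differently two sensors of those pixel counts sample the same scene:

```python
def pixels_per_mm(pixels_wide, sensor_width_mm):
    # Spatial sampling density across the sensor itself.
    return pixels_wide / sensor_width_mm

def pixels_per_degree(pixels_wide, horizontal_fov_deg):
    # Angular sampling density for a given lens field of view.
    return pixels_wide / horizontal_fov_deg

# Hypothetical cameras: a 20 MP sensor at 5472 pixels wide and a 2 MP
# sensor at 1920 pixels wide, both assumed 36 mm wide with a 60 degree
# horizontal field of view.
for name, width_px in [("20 MP", 5472), ("2 MP", 1920)]:
    print(name,
          f"{pixels_per_mm(width_px, 36.0):.0f} px/mm,",
          f"{pixels_per_degree(width_px, 60.0):.0f} px/deg")
```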
For audio, 48 ksps is starting to become more common than 44.1 ksps (on iPhones, etc.).
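Converting between those two rates is a rational resampling by 160/147 (48000/44100 reduced). A minimal sketch using SciPy's polyphase resampler, with a hypothetical 1 kHz test tone:

```python
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 44100, 48000          # 48000/44100 reduces to 160/147
t = np.arange(fs_in) / fs_in          # one second of audio
x = np.sin(2 * np.pi * 1000 * t)      # hypothetical 1 kHz test tone

# resample_poly applies a polyphase low-pass filter internally,
# which keeps the rate conversion (mostly) free of aliasing.
y = resample_poly(x, up=160, down=147)
print(len(x), len(y))                 # 44100 -> 48000 samples
```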
("a nice thing about standards is that there are so many of them")
Amplitude scaling in raw format also has no single standard. When converted or requantized to a storage format (JPEG, PNG, etc.), 8-bit, 10-bit, and 12-bit quantizations are the most common for RGB color separations.
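A minimal sketch of such a requantization, assuming hypothetical 12-bit raw values rescaled to 8-bit and 10-bit storage (real pipelines also apply gamma/tone curves, white balance, etc. before this step):

```python
import numpy as np

# Hypothetical 12-bit raw sensor values in [0, 4095].
raw12 = np.array([0, 512, 2048, 4095], dtype=np.uint16)

# Requantize to 8-bit storage [0, 255] by rescaling the amplitude axis.
img8 = np.round(raw12 * (255.0 / 4095.0)).astype(np.uint8)

# 10-bit storage [0, 1023] uses the same idea.
img10 = np.round(raw12 * (1023.0 / 4095.0)).astype(np.uint16)

print(img8)    # [  0  32 128 255]
print(img10)   # [   0  128  512 1023]
```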
Channel formats also differ between audio and images. An X, Y format, where X is time and Y is amplitude, is only good for mono audio; stereo usually needs T, L, R for time, left, and right channels. Images are often in X, Y, R, G, B form (five values per sample, typically stored as a 3-dimensional array), where X, Y are the spatial location coordinates and R, G, B are the color intensities at that location. The image intensities can be somewhat related (depending on gamma correction, etc.) to the number of incident photons per shutter duration, in certain visible EM frequency ranges, per solid angle incident on the lens.
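A quick sketch of how those layouts commonly end up as array shapes (NumPy, with made-up sizes):

```python
import numpy as np

fs = 48000
mono   = np.zeros(fs)          # shape (time,): one amplitude per sample
stereo = np.zeros((fs, 2))     # shape (time, channel): left/right per sample

h, w = 1080, 1920
rgb = np.zeros((h, w, 3), dtype=np.uint8)   # (Y, X, color): R, G, B per pixel

# Each image sample is described by five numbers: two spatial coordinates
# (implicit in the array indices) plus three color intensities.
print(mono.shape, stereo.shape, rgb.shape)
```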
An anti-aliasing low-pass filter for audio, and an optical low-pass filter in front of the image sensor (sitting over the Bayer color filter array), are commonly used to make the signal closer to bandlimited so it can be sampled with less aliasing noise/artifacts.
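On the audio side, that usually means low-pass filtering before reducing the sample rate. A minimal sketch with a hypothetical test signal, filtering out content above the new Nyquist frequency before decimating by 2:

```python
import numpy as np
from scipy.signal import firwin, lfilter

fs = 48000
t = np.arange(fs) / fs
# Hypothetical signal: a 1 kHz tone we want to keep plus a 15 kHz tone
# that would alias down to 9 kHz if we decimated to 24 kHz unfiltered.
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 15000 * t)

# FIR low-pass (anti-aliasing) filter with a cutoff below the new
# Nyquist frequency of 12 kHz.
taps = firwin(numtaps=101, cutoff=10000, fs=fs)
x_filtered = lfilter(taps, 1.0, x)

# Decimate by 2: far less aliasing than taking x[::2] directly.
y = x_filtered[::2]
print(len(x), len(y))
```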