
Suppose I want to represent an image of size H*W with 3 color channels (RGB) in a numpy 3-D array, such that the dimension is (H, W, 3). Let's take a simple example of (4,2,3). So we create an array like this - img = np.arange(24).reshape(4,2,3).

In order to fit the analogy of the above image example, the values of the elements should be -

Channel R: [0,1],[2,3],[4,5],[6,7]
Channel G: [8,9],[10,11],[12,13],[14,15]
Channel B: [16,17],[18,19],[20,21],[22,23]

i.e., 3 outer arrays, with the 4×2 arrays above nested inside each.

However, the result of np.arange(24).reshape(4,2,3) is -

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]],

       [[12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23]]])

If I want the first row of first channel, i.e. img[0,:,0], I would expect [0,1] as result, but I will actually get [0,3] back.

I understand that if I initialize the ndarray with shape (3,4,2), I will get what I am looking for. But I want to work with the conventional shape of (H,W,depth).

Can you please help me understand the gap in my understanding?

Supratim Haldar
  • The first `0` selects the first plane or block in the display. Think of that as a (2,3) shaped array. The second `0` selects the first column from that array. To get `[0,1]` you'd have to use `img[0,0,:2]`, i.e. the first row of the first block, and the first 2 items from that row. – hpaulj Feb 27 '19 at 19:41
  • `np.array([R,G,B])` should produce a (3,4,2) array. In `numpy` the first dimension is the outer most. This is the reverse of MATLAB. – hpaulj Feb 27 '19 at 19:43
  • Thanks @hpaulj! So, what is the usual recommended way to represent an image of ``H pixel * W pixel * 3 color channels`` with np ndarray? Maybe it's just me, but I felt that the shapes are not very intuitive for image representation. – Supratim Haldar Feb 27 '19 at 20:02
  • The color channel is usually the last dimension. That, for example, is what `matplotlib` `imshow` expects, https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html#matplotlib.pyplot.imshow – hpaulj Feb 27 '19 at 20:28
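To make hpaulj's comments concrete, here is a short sketch (using the R/G/B planes from the question) showing that `np.array([R, G, B])` stacks along a new first axis, while `np.stack(..., axis=-1)` gives the conventional channels-last layout:

```python
import numpy as np

# The three (H, W) = (4, 2) channel planes from the question.
R = np.arange(0, 8).reshape(4, 2)
G = np.arange(8, 16).reshape(4, 2)
B = np.arange(16, 24).reshape(4, 2)

# np.array([R, G, B]) stacks along a new FIRST axis -> shape (3, 4, 2).
chan_first = np.array([R, G, B])
print(chan_first.shape)   # (3, 4, 2)

# Stacking along the LAST axis gives the conventional channels-last layout.
img = np.stack([R, G, B], axis=-1)
print(img.shape)          # (4, 2, 3)
print(img[0, :, 0])       # [0 1] -- first row of the R channel, as desired
```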

1 Answer


I think your misunderstanding happens because you (wrongly) assume that the transformation from a vector into the array fills the first index first. In fact, NumPy's default (C) order fills the last index fastest and works backward from there. In your example the order in which the array is filled is

0 -> [0,0,0]

1 -> [0,0,1]

2 -> [0,0,2]

3 -> [0,1,0] etc.

Thus, the first pixel is [0,1,2], the second pixel is [3,4,5] and you get exactly the results you see.
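You can check this fill order directly; the two indexing expressions from the question illustrate the difference:

```python
import numpy as np

img = np.arange(24).reshape(4, 2, 3)

# reshape fills the LAST index fastest (C order), so consecutive values
# land along the channel axis, not along a row of one channel.
print(img[0, 0, :])   # [0 1 2] -- the first pixel's (R, G, B) values
print(img[0, :, 0])   # [0 3]   -- the R value of each pixel in row 0
```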

The misunderstanding lies exclusively in your idea of how a vector is transformed into such an array (and how it is stored in memory). Once you have constructed the image correctly, everything behaves as you expect.

As an aside: You may indeed encounter images which are saved with size [3,X,Y] instead, as hpaulj commented.
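If you do receive a channel-first `(3, H, W)` image and want the conventional `(H, W, 3)` layout, a sketch using `np.moveaxis` (which reorders axes without copying the data):

```python
import numpy as np

chan_first = np.arange(24).reshape(3, 4, 2)   # (3, H, W) layout

# Move the channel axis (axis 0) to the end -> conventional (H, W, 3).
chan_last = np.moveaxis(chan_first, 0, -1)
print(chan_last.shape)      # (4, 2, 3)
print(chan_last[0, :, 0])   # [0 1] -- first row of the first channel
```

Note that this is different from `reshape(4, 2, 3)`, which would keep the same flat memory order and scramble the channels.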

Xenon