I have seen that there are two data layouts in the world of convolutional networks: channel-first and channel-last.
According to many websites, "channel-first" refers to the NCHW format, while "channel-last" is equivalent to the NHWC format. This is clear because in the channel-first format, C is positioned before H and W.
However, ARM seems to have defined "channel-first" as NHWC, as you can see in this paper.
P6: The two most common image data formats are Channel-Width-Height (CHW), i.e. channel last, and Height-Width-Channel (HWC), i.e. channel first. The dimension ordering is the same as that of the data stride. In an HWC format, the data along the channel is stored with a stride of 1, data along the width is stored with a stride of the channel count, and data along the height is stored with a stride of (channel count × image width).
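To check my reading of the quoted paragraph, here is a minimal sketch that derives the element strides from a dimension ordering, assuming a dense buffer in which the last-listed dimension varies fastest. The helper name `element_strides` and the image sizes are hypothetical, chosen only for illustration.

```python
def element_strides(dims, sizes):
    """Return {dim: stride in elements} for a dense array whose
    dimensions are listed from slowest-varying to fastest-varying."""
    strides = {}
    step = 1
    # Walk from the fastest-varying dimension back to the slowest one.
    for dim in reversed(dims):
        strides[dim] = step
        step *= sizes[dim]
    return strides


# Hypothetical image: 4 rows (H), 5 columns (W), 3 channels (C).
sizes = {"H": 4, "W": 5, "C": 3}

hwc = element_strides("HWC", sizes)
chw = element_strides("CHW", sizes)

print(hwc)  # {'C': 1, 'W': 3, 'H': 15}
print(chw)  # {'W': 1, 'H': 5, 'C': 20}
```

For HWC this gives a channel stride of 1, a width stride equal to the channel count, and a height stride of (channel count × image width), which matches the quoted description.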
ARM's definition is also reasonable, since "channel-first" can be read as meaning that the MAC operations iterate over the channel first, i.e. the channel is the innermost loop, as below:
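for (n = 0; n < N; n++) {                 // batch
    for (h = 0; h < H; h++) {             // image rows
        for (w = 0; w < W; w++) {         // image columns
            for (c = 0; c < C; c++) {     // channels: innermost, varies fastest
                // MAC on element (n, h, w, c)
            }
        }
    }
}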
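As a sanity check of this loop order, the sketch below assumes the data is stored densely in NHWC order and computes the flat offset of each element visited. The helper `nhwc_offset` and the tensor sizes are hypothetical. If the assumption holds, the loop nest with C innermost touches memory strictly sequentially.

```python
def nhwc_offset(n, h, w, c, H, W, C):
    """Flat element offset of (n, h, w, c) in a dense NHWC buffer."""
    return ((n * H + h) * W + w) * C + c


# Hypothetical tensor sizes, chosen only for illustration.
N, H, W, C = 2, 2, 3, 4

# Same loop order as the pseudocode: N outermost, C innermost.
offsets = [
    nhwc_offset(n, h, w, c, H, W, C)
    for n in range(N)
    for h in range(H)
    for w in range(W)
    for c in range(C)
]

# True when every step advances by exactly one element.
sequential = offsets == list(range(N * H * W * C))
print(sequential)
```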
So there is no fixed definition of channel-first or channel-last, is there?
Also, I'm not sure what exactly people mean when they say NHWC or NCHW. I guess the important thing is the combination of the algorithm and the data arrangement in memory: if the data arrives in NHWC format, you need to design the algorithm accordingly.
And, since there are no fixed definitions of NHWC and NCHW, I don't think it makes sense to just say that PyTorch is NCHW, or channel-first, without mentioning how the data is arranged in memory.
Or, when you hear NCHW, are you supposed to infer that the data arrangement in memory is like ch0[0,0], ch1[0,0], ch2[0,0], ch0[1,0], ch1[1,0], ch2[1,0], ch0[2,0], ...?
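To make the question concrete, here is a small sketch that lists the storage order for both names. It assumes the letters are read from slowest-varying to fastest-varying in a dense buffer (the same convention as the quoted paragraph) and that the labels mean ch<c>[w,h]. The helper `memory_order` and the tensor sizes are hypothetical.

```python
from itertools import product


def memory_order(layout, sizes):
    """List element labels in the order a dense array with the given
    dimension order would store them (last letter varies fastest)."""
    labels = []
    # product() advances its last range fastest, like the innermost loop.
    for index in product(*(range(sizes[d]) for d in layout)):
        pos = dict(zip(layout, index))
        labels.append(f"ch{pos['C']}[{pos['W']},{pos['H']}]")
    return labels


# Hypothetical tiny tensor: 1 image, 3 channels, 2 rows, 3 columns.
sizes = {"N": 1, "C": 3, "H": 2, "W": 3}

nhwc = memory_order("NHWC", sizes)
nchw = memory_order("NCHW", sizes)

print(nhwc[:7])
print(nchw[:7])
```

Under these assumptions, the interleaved sequence ch0[0,0], ch1[0,0], ch2[0,0], ch0[1,0], ... comes out of the NHWC ordering, while NCHW stores all of ch0 before any of ch1.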
Can anyone help clarify my understanding of the data format?