When defining a segmentation network for RGB images, such as the network in the fcn-xs example in MXNet, the input RGB image is fed through multiple convolutions, activations, poolings, etc.
A convolution, for example, is defined as follows:

    mxnet.symbol.Convolution(data=input, kernel=(3, 3), pad=(1, 1), num_filter=64, workspace=workspace_default, name="conv1_1")
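For reference, here is a minimal sketch of how one such block (convolution, activation, pooling) is chained with the MXNet Symbol API; the workspace_default value and the input shape are placeholders of my own, not taken from the fcn-xs example:

    import mxnet as mx

    workspace_default = 1024  # workspace size in MB; placeholder value of my own

    # RGB input, shape (batch, 3, height, width)
    data = mx.symbol.Variable(name="data")

    # One block in the style described above: convolution -> activation -> pooling
    conv1_1 = mx.symbol.Convolution(data=data, kernel=(3, 3), pad=(1, 1), num_filter=64,
                                    workspace=workspace_default, name="conv1_1")
    relu1_1 = mx.symbol.Activation(data=conv1_1, act_type="relu", name="relu1_1")
    pool1 = mx.symbol.Pooling(data=relu1_1, pool_type="max", kernel=(2, 2),
                              stride=(2, 2), name="pool1")

    # infer_shape reports the shapes of all arguments (including the conv1_1 weight)
    # for a given input shape, which is how I have been inspecting the layers
    arg_shapes, out_shapes, aux_shapes = pool1.infer_shape(data=(1, 3, 224, 224))
    print(dict(zip(pool1.list_arguments(), arg_shapes)))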
On the one hand, the convolution kernel here is specified as 2D, (3, 3), which seems to suggest that each color layer R, G, B is processed separately. On the other hand, it is well known from neuroscience that the relevant features lie in color contrasts rather than in the individual color channels, i.e., the colors should be subtracted from each other, e.g. red minus green or blue minus yellow.
How can this be enforced by the network structure? How are the R, G, B components mixed and combined?
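For concreteness, is the right approach something along the lines of the sketch below? It assumes a fixed (non-trainable) 1x1 convolution with hand-set opponent-color weights placed in front of conv1_1; the layer name "opponent", the weight values, and the Module setup are my own guesses, not something taken from the fcn-xs example:

    import mxnet as mx
    import numpy as np

    # RGB input, shape (batch, 3, height, width)
    data = mx.symbol.Variable(name="data")

    # Hypothetical fixed 1x1 convolution producing two opponent-color channels:
    #   channel 0: R - G
    #   channel 1: B - (R + G) / 2   (roughly "blue minus yellow")
    # no_bias=True makes it a pure linear mix of the input channels.
    opponent = mx.symbol.Convolution(data=data, kernel=(1, 1), num_filter=2,
                                     no_bias=True, name="opponent")
    # ... conv1_1, relu1_1, etc. would then take `opponent` instead of `data` as input

    # Weight shape is (num_filter, in_channels, kernel_h, kernel_w) = (2, 3, 1, 1)
    opponent_weight = np.array([[ 1.0, -1.0,  0.0],   # R - G
                                [-0.5, -0.5,  1.0]],  # B - (R + G) / 2
                               dtype=np.float32).reshape(2, 3, 1, 1)

    # Keep the hand-set weight fixed during training via fixed_param_names
    mod = mx.mod.Module(symbol=opponent, data_names=["data"], label_names=None,
                        fixed_param_names=["opponent_weight"])
    mod.bind(data_shapes=[("data", (1, 3, 224, 224))], for_training=False)
    mod.init_params(arg_params={"opponent_weight": mx.nd.array(opponent_weight)},
                    aux_params={})

Or is there a more idiomatic way to express this kind of channel mixing directly in the network definition?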