I am trying to train a model for autonomous driving that converts input from the front camera to a bird's-eye-view image.
Both the input and the output are segmentation masks of shape (96, 144), where each pixel holds an integer from 0 to 12 (each number represents a different class).
My question is how I should preprocess my data and which loss function I should use for the model (I am trying to use a fully convolutional network).
I tried converting the inputs and outputs to shape (96, 144, 13) using Keras' to_categorical utility, so that each channel contains only 0s and 1s and represents the mask of one category. With this encoding I used binary_crossentropy as the loss and a sigmoid activation for the last layer; the model seemed to learn and the loss started decreasing.
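For reference, the encoding step described above can be sketched roughly as follows. This is only a minimal NumPy sketch with random placeholder data instead of my real masks; the names NUM_CLASSES, mask, and one_hot are illustrative, and the np.eye indexing is assumed to give the same result as to_categorical with 13 classes.

```python
import numpy as np

NUM_CLASSES = 13   # class ids 0..12
H, W = 96, 144     # mask height and width

# Placeholder for a real segmentation mask: one integer class id per pixel.
rng = np.random.default_rng(0)
mask = rng.integers(0, NUM_CLASSES, size=(H, W))

# One-hot encode the mask: row i of the identity matrix is the one-hot
# vector for class i, so indexing with the mask yields shape (H, W, 13).
# Assumed to be equivalent to to_categorical(mask, num_classes=13).
one_hot = np.eye(NUM_CLASSES, dtype=np.float32)[mask]

# Sanity check: taking the argmax over channels recovers the original ids.
recovered = one_hot.argmax(axis=-1)

print(one_hot.shape)
```

Each pixel ends up with exactly one channel set to 1, which is the format I fed to the network on both the input and the output side.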
But I am still unsure whether this is the correct approach or whether there are better ways.
What should the following be?
- input and output data format
- activation of last layer
- loss function