I'm participating in a challenge to classify dashboard-camera images (from cars), with the labels being "red traffic light", "green traffic light", and "no traffic light". The traffic light occupies only a small part of each image, and no bounding boxes are supplied.
I'm currently trying to fine-tune Inception as suggested here, but I'm only getting 0.55-0.6 accuracy, and I need to reach 0.95+.
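For reference, my fine-tuning setup is roughly the sketch below, using `tf.keras`'s built-in InceptionV3 (the head layers are my own additions; I pass `weights=None` here just to keep the sketch self-contained, whereas in practice I load `weights='imagenet'`):

```python
import tensorflow as tf

# Inception backbone without its ImageNet classification head.
# weights=None keeps this sketch self-contained; in real fine-tuning
# use weights='imagenet' to start from pretrained features.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights=None, input_shape=(299, 299, 3))
base.trainable = False  # freeze the backbone; train only the new head first

# New 3-way head: red / green / no traffic light.
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
out = tf.keras.layers.Dense(3, activation="softmax")(x)
model = tf.keras.Model(base.input, out)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

After the head converges, I unfreeze the top Inception blocks and continue training with a much lower learning rate.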
I suspect the network is performing poorly because the traffic light takes up such a small portion of the image.
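One workaround I'm considering, given that hypothesis, is to classify overlapping crops of each image and aggregate the per-crop predictions, so the light fills a much larger fraction of whatever the network sees. Here is a sketch of the tiling and aggregation logic; `classify_crop` is a hypothetical stand-in for the fine-tuned network, and the crop size, stride, and threshold are guesses:

```python
import numpy as np

def crops(image, size=299, stride=150):
    """Yield overlapping square crops covering the image left-to-right,
    top-to-bottom. Edge crops may be smaller than `size`."""
    h, w = image.shape[:2]
    for top in range(0, max(h - size, 0) + 1, stride):
        for left in range(0, max(w - size, 0) + 1, stride):
            yield image[top:top + size, left:left + size]

def classify_image(image, classify_crop, threshold=0.5):
    """Aggregate per-crop probabilities [p_red, p_green, p_none].

    A light visible in any crop should fire strongly there, so we take
    the max of p_red and p_green over crops; if neither exceeds the
    threshold anywhere, we predict that there is no light.
    """
    probs = np.stack([classify_crop(c) for c in crops(image)])
    red, green = probs[:, 0].max(), probs[:, 1].max()
    if max(red, green) < threshold:
        return "none"
    return "red" if red >= green else "green"
```

Max-pooling only the red/green scores (rather than averaging all three classes) matters here: most crops contain no light, so averaging would drown out the one crop that does.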
How can I make better progress with this?