Neural network topology for object recognition on aerial photos (computer vision)

Question

My objective is to recognize the footprints of buildings on aerial photos. Having heard about recent progress in machine vision (ImageNet Large Scale Visual Recognition Challenges), I thought I could (at least) try to use neural networks for this task.

Can anybody give me an idea of what the topology of such a network should be? I guess it should have as many outputs as inputs (that is, one per pixel in the picture), since I want to recognize the outlines of buildings along with their (at least approximate) placement in the picture.

I guess the input pictures should be of a standard size, with each pixel normalized to grey scale or YUV color space (one value per channel) and maybe normalized resolution (each pixel should represent a fixed size in reality). I am not sure whether the picture could be preprocessed in any other way before feeding it into the net, maybe by extracting the edges first?

The tricky part is how the outputs should be represented and how to train the net. Just using, e.g., output=0 for a pixel within a building footprint and 1 for a pixel outside it might not be the best idea. Maybe I should teach the network to recognize the edges of buildings instead, so that pixels on building edges get 1 and all other pixels get 0?
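To make the two target encodings concrete, here is a minimal pure-Python sketch that derives an edge target from a footprint target. It assumes the mask is a list of rows of 0/1 values with 1 meaning "inside a footprint" (the opposite of the 0/1 convention mentioned above, chosen only for readability), and the function name `footprint_to_edges` is made up for illustration.

```python
def footprint_to_edges(mask):
    """Return a mask with 1 on footprint boundary pixels and 0 elsewhere.

    Assumption: `mask` is a list of equally long rows of 0/1 values,
    where 1 marks a pixel inside a building footprint.
    A footprint pixel counts as an edge if any of its 4 neighbours is
    background or lies outside the image.
    """
    height, width = len(mask), len(mask[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue  # background pixels are never edges
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not mask[ny][nx]:
                    edges[y][x] = 1
                    break
    return edges


# A 3x3 building in a 5x5 image: only the centre pixel is interior.
footprint = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edges = footprint_to_edges(footprint)
```

Either mask could serve as a per-pixel training target; since one can be computed from the other, the choice mostly affects how many positive pixels the network sees.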

Can anybody throw in some suggestions about network topology and input/output formats? Or maybe this task is hopelessly difficult and I have zero chance of solving it?

That's a tough task. I tried detecting buildings in aerial images with OpenCV and MATLAB classifier cascades, but it didn't work too well. I also tried using Neuroph and.. was it Encog? But again, it didn't work too well. I recommend first training a network that can successfully recognize 'this image IS a building' or 'this image IS NOT a building' (images should be cropped to the buildings). That could give you a better start for the harder task. You could also compare which gives you better results: recognizing in edge-detected images or in color images. - PawelP, Sep 05 '14 at 18:26

score 0 · Answer 1 · answered Sep 08 '14 at 02:49

I think we need a better definition of "buildings". If you want to do building "detection", that is, detect the presence of a building of any shape or size, this is difficult for a cascade classifier. You can try the following, though:

Partition a set of known images into fixed-size blocks.
Label each block as "building", "not building", or "boundary" (includes portions of both).
Extract basic features like intensity histograms, edges, Hough lines, HOG, etc.
Train SVM classifiers based on these features (you can try others, too, but I recommend SVM from experience).
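The first three steps can be sketched roughly as follows, in pure Python. This assumes a ground-truth mask is available as a list of rows of 0/1 values (1 = building) and that grey values are integers in 0..255; the function names, the block size, and the number of histogram bins are all made up for illustration, and only the intensity histogram is shown out of the listed features.

```python
def label_blocks(mask, block):
    """Split a ground-truth mask into block x block tiles and label each one.

    Assumption: `mask` is a list of rows of 0/1 values (1 = building).
    Returns a dict mapping the (top, left) corner of each full tile to
    "building", "not building", or "boundary".
    """
    height, width = len(mask), len(mask[0])
    labels = {}
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            covered = sum(
                mask[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            if covered == 0:
                labels[(top, left)] = "not building"
            elif covered == block * block:
                labels[(top, left)] = "building"
            else:
                labels[(top, left)] = "boundary"  # portions of both
    return labels


def intensity_histogram(tile, bins=8, max_value=255):
    """Normalised grey-level histogram of a tile (list of rows of ints)."""
    counts = [0] * bins
    total = 0
    for row in tile:
        for value in row:
            index = min(value * bins // (max_value + 1), bins - 1)
            counts[index] += 1
            total += 1
    return [count / total for count in counts]


truth = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
labels = label_blocks(truth, block=2)
features = intensity_histogram([[0, 255], [0, 255]])
```

The resulting feature vectors and labels would then be handed to an SVM implementation of your choice (for example `sklearn.svm.SVC` in scikit-learn); that part is left out here to keep the sketch dependency-free.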

Now you can partition your images again and use the trained classifier to get the results. The results will have to be combined to identify buildings.
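How to combine the per-block results is not fixed by this approach. One simple possibility, shown here purely as an assumption, is to merge 4-connected blocks that were classified as "building" or "boundary" into candidate regions with a breadth-first search. The function name and the example predictions are hypothetical.

```python
from collections import deque


def group_building_blocks(labels, block):
    """Group 4-connected non-background tiles into candidate buildings.

    Assumption: `labels` maps (top, left) tile corners to one of
    "building", "not building", or "boundary", with tiles of side `block`.
    Returns a sorted list of regions, each a sorted list of tile corners.
    """
    candidates = {pos for pos, lab in labels.items() if lab != "not building"}
    regions = []
    while candidates:
        start = candidates.pop()
        queue, region = deque([start]), [start]
        while queue:
            top, left = queue.popleft()
            for dt, dl in ((-block, 0), (block, 0), (0, -block), (0, block)):
                neighbour = (top + dt, left + dl)
                if neighbour in candidates:
                    candidates.remove(neighbour)
                    queue.append(neighbour)
                    region.append(neighbour)
        regions.append(sorted(region))
    return sorted(regions)


# Hypothetical classifier output for 2x2-pixel tiles.
predicted = {
    (0, 0): "building",
    (0, 2): "boundary",
    (2, 0): "not building",
    (2, 2): "not building",
    (4, 4): "building",
}
regions = group_building_blocks(predicted, block=2)
```

Here the two adjacent tiles in the top row end up in one region and the isolated tile at (4, 4) forms its own. A real pipeline would likely also filter out very small regions or refine the outline inside "boundary" tiles.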

This will still need some testing to get the parameters (size of histograms, parameters of the SVM classifier, etc.) right.

I have used this approach to detect "food" regions on images. The accuracy was below 70%, but my guess is that it will be better for buildings.
