I am interested in creating software that detects an object, such as a pen, using the Microsoft Kinect. I have collected 100 positive images and 200 negative images to train an artificial neural network. My question is: how can I convert these images into input for the ANN? I assume the last layer has one neuron, since there is a single output (pen or not pen), and I assume the input layer has one neuron too. I want to use 3 layers in total. Should I convert the positive and negative images into matrices, or what should I do?
1 Answer
First of all, welcome to Stack Overflow!
I've never personally dealt with using the Kinect for image recognition, but if it's possible, you should scale the image down to a reasonably small size, such as 100x100, so that it stays manageable.
You should also convert the image to grayscale, as this will help with computational efficiency and development time, and it's much easier to start off with than RGB.
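A minimal preprocessing sketch for the two steps above, assuming Pillow and NumPy are available (the function name `preprocess` is just illustrative):

```python
import numpy as np
from PIL import Image

def preprocess(path):
    """Load an image, convert it to grayscale, and scale it to 100x100."""
    img = Image.open(path).convert("L")   # "L" = 8-bit grayscale
    img = img.resize((100, 100))
    # Normalize pixel values to [0, 1] so they are friendly ANN inputs.
    return np.asarray(img, dtype=np.float32) / 255.0
```

Every training and test image should go through the same preprocessing so the network always sees inputs of the same shape and scale.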
The input layer will not have 1 neuron, that's a given. For an image with 100x100 dimensions, the total number of inputs should be 10000, one for each pixel. Remember, you want to break the data up as fine-grained as possible so the ANN can detect patterns in it.
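Concretely, the 2D pixel grid gets flattened into a single vector before it is fed to the input layer (placeholder data here, just to show the shapes):

```python
import numpy as np

# A 100x100 grayscale image becomes a vector of 10000 input values.
image = np.zeros((100, 100), dtype=np.float32)  # placeholder pixel data
input_vector = image.flatten()
print(input_vector.shape)  # (10000,)
```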
The output layer should actually have 2 neurons, and for a good reason. Each output neuron measures the likelihood that the input belongs to its respective class. With 2 neurons, one can represent the positive class (yes, this is a pen) and the other the negative class (no, this is not a pen). That way you get a probability for each class, and you can choose the highest value as your answer.
3 total layers should be sufficient; you'll probably never need more than that. There are some very good articles to help you determine the number of layers, such as this one. I hope this helps! Let me know if you have any further questions.
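Putting the pieces together, a 3-layer network (10000 inputs, one hidden layer, 2 outputs) looks like the forward pass below. This is only an untrained sketch with random placeholder weights; the hidden-layer size of 64 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 3 layers total: 10000 inputs -> 64 hidden neurons -> 2 outputs.
# Weights are random, untrained placeholders for illustration.
W1 = rng.normal(scale=0.01, size=(10000, 64))
b1 = np.zeros(64)
W2 = rng.normal(scale=0.01, size=(64, 2))
b2 = np.zeros(2)

def forward(x):
    hidden = sigmoid(x @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)  # two scores: [pen, not-pen]

scores = forward(np.zeros(10000))  # feed one flattened 100x100 image
```

In practice you would train the weights with backpropagation (or use a library that does it for you) rather than leaving them random.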

- If I train the ANN with cropped images (containing only the object I want to recognize), should I also crop the images I use to test the ANN? – Johana Apr 02 '14 at 21:59
- You sort of have to, or else it wouldn't work, since the input layer has to match the size of your image in pixels. – Alejandro Apr 03 '14 at 00:01
- But if I train the ANN with images where the object is in the center, and then show it an image where the object is on the left, can the ANN recognize it? That's the problem I told you about: the ANN only recognizes the object if I crop the image down to just the object I want to recognize. – Johana Apr 03 '14 at 01:05
- It should be able to, but try to find images of pens that are off-center, just to give your ANN exposure to more varied data – Alejandro Apr 03 '14 at 02:27
- It works; my problem was my second class (non-pen). I trained the second class using images of apples and it works. I do have a question about the second class: if I want to recognize whether an image contains a pen, and the first class uses images of pens, can I use images of tables for the second class? For example, if I put a pen on a table, the ANN should recognize that the image contains a pen, but when the image contains only a table, the ANN should recognize that it does not contain a pen. – Johana Apr 04 '14 at 17:57