
I am writing a program to recognize handwritten letters. I have 500px*500px images that I import as BufferedImages, and I take every pixel's getRGB() value as an input to the neural network, so there are 250,000 inputs. The values of getRGB() range from -16777216 (writing) to -1 (the white background). The weights from the inputs to the first hidden node are randomized between 0 and 1. I have been using the sigmoid function 1/(1+e^(-x)) as my activation function to squash all of the values between 0 and 1. My problem is that since there are so many inputs, their dot product with the weights has an enormous magnitude (e.g., 1.3E8 or -1.3E8). When I put that number into the sigmoid function, the results are always all 1s or all 0s, so essentially no useful information is passed to the second hidden node. Furthermore, since the images are predominantly white, most of the inputs are -1.
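Roughly, the step in question looks like this (a simplified sketch, not my exact code):

    // 250,000 inputs in [-16777216, -1], weights in [0, 1]
    double dot = 0.0;
    for (int i = 0; i < inputs.length; i++) {
        dot += inputs[i] * weights[i];
    }
    // dot is on the order of -1.3E8, so Math.exp(-dot) overflows to Infinity
    double out = 1.0 / (1.0 + Math.exp(-dot)); // == 0.0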

I adjusted the code so that it prints the values after the dot product and then prints them again after they pass through the sigmoid function.

After dot product with weights, before sigmoid function: 
-1.3376484582733577E8   
-1.3382651127917042E8   
-1.3438475698429278E8   
-1.3356711106666781E8   
-1.3470225249402404E8   
-1.3372922925798771E8   
-1.3211961536262843E8   
-1.3512040351863045E8   

After sigmoid function: 
0.0 
0.0 
0.0 
0.0 
0.0 
0.0 
0.0 
0.0 

To edit the getRGB() values, I used the function newRGBValue = (getRGB() + 2) * (-1), so that all of the values ranged from -1 to 16777214. When I pass these values into the sigmoid function, though, it simply returns 1, since the new dot product with those values is an enormous positive number (shown in the output below).
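(As a quick check of that remapping: a black pixel becomes (-16777216 + 2) * (-1) = 16777214 and a white pixel becomes (-1 + 2) * (-1) = -1, so the magnitudes are just as enormous as before, only with the sign flipped.)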

After dot product, before sigmoid function: 
1.3198725189415371E8    
1.3345978405544662E8    
1.3375036029244222E8    
1.3278472449389385E8    
1.328751157809899E8 
1.3309195657860701E8    
1.34090008925348E8  
1.3300517803640646E8

After sigmoid function: 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0

Is there a better activation function I should use for this program? Or is there a way I can manipulate the inputs so that the sigmoid function is suitable? Sorry for this long-winded post and thanks in advance for any insight.


1 Answer


Normalize your inputs. That is, for every image, compute the mean mu and standard deviation sigma of the pixel values, and replace each old pixel value v with the normalized value (v - mu) / sigma. This eliminates the huge negative pixel values you currently have.
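A minimal sketch of that per-image normalization (the method name is illustrative):

    // Normalize one image's pixels to zero mean and unit variance.
    static double[] normalize(double[] pixels) {
        double mu = 0.0;
        for (double v : pixels) mu += v;
        mu /= pixels.length;

        double variance = 0.0;
        for (double v : pixels) variance += (v - mu) * (v - mu);
        double sigma = Math.sqrt(variance / pixels.length);
        if (sigma == 0) sigma = 1; // guard: a perfectly uniform image

        double[] normalized = new double[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            normalized[i] = (pixels[i] - mu) / sigma;
        }
        return normalized;
    }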

Also consider using normally distributed initial random weights with mean 0 and variance 1, so that the expected value of your dot products is 0. It would then be best to switch to the tanh activation function, which is centered at 0 and therefore leads to faster learning when your dot products are close to 0.
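For example (a sketch; the method names and sizes are placeholders):

    import java.util.Random;

    // Zero-mean, unit-variance Gaussian initial weights.
    static double[] randomWeights(int n, Random rng) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = rng.nextGaussian(); // mean 0, variance 1
        }
        return w;
    }

    // tanh activation: centered at 0, output in (-1, 1).
    static double activate(double[] inputs, double[] weights) {
        double dot = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            dot += inputs[i] * weights[i];
        }
        return Math.tanh(dot);
    }

With normalized inputs and these weights, the dot products stay near 0, where both tanh and its gradient are well-behaved.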
