How to improve the OCR accuracy rate of Neural Network in Matlab

Question

I'm working on OCR for Arabic characters and want to try GLCM (gray-level co-occurrence matrix) as a feature extraction method. I got the code from here: http://www.mathworks.com/matlabcentral/fileexchange/22187-glcm-texture-features

Example of input images (character images):

I've written a function that returns the GLCM features I need. Here it is:

function features = EkstraksiFitur_GLCM(x)
    glcm = graycomatrix(x,'offset',[0 1; -1 1; -1 0; -1 -1], 'NumLevels', 2); 

    stats = GLCM_Features1(glcm, 0);
    % Average each statistic over the four offsets; replace NaN with 0.
    autocorrelation = double(mean(stats.autoc));
    if isnan(autocorrelation)
        autocorrelation = 0;
    end

    contrast = double(mean(stats.contr));
    if isnan(contrast)
        contrast = 0;
    end

    Correlation = double(mean(stats.corrm));
    if isnan(Correlation)
        Correlation = 0;
    end

    ClusterProminence = double(mean(stats.cprom));
    if isnan(ClusterProminence)
        ClusterProminence = 0;
    end

    ClusterShade = double(mean(stats.cshad));
    if isnan(ClusterShade)
        ClusterShade = 0;
    end

    Dissimilarity = double(mean(stats.dissi));
    if isnan(Dissimilarity)
        Dissimilarity = 0;
    end

    Energy = double(mean(stats.energ));
    if isnan(Energy)
        Energy = 0;
    end
    . 
    .
    .
    features=[autocorrelation, contrast, Correlation, Dissimilarity, Energy, Entropy, Homogeneity, MaximumProbability, SumAverage, SumVariance, SumEntropy, DifferenceVariance, DifferenceEntropy, InverseDifferenceMomentNormalized];

I use a loop to extract the features of all the training images:

srcFile = dir('D:\1. Thesis FINISH!!!\Data set\0 Well Segmented Character\Advertising Bold 24\datatrain\*.png');
fetrain = [];
for a = 1:length(srcFile)
    file_name = strcat('D:\1. Thesis FINISH!!!\Data set\0 Well Segmented Character\Advertising Bold 24\datatrain\',srcFile(a).name);
    A = imread(file_name);
    [gl] = EkstraksiFitur_GLCM2 (A);
    [fiturtrain] = reshape (gl, [56,1]) ;
    fetrain = [fetrain fiturtrain];
%   vectorname = strcat(file_name,'_array.mat');

end
 save ('fetrain.mat','fetrain');

I've got the features.

Then I run the training process using a neural network, but I get a very low accuracy rate. This is the code:

% clc;clear;close all;
% function net1 = pelatihan (input, target)
net = newff(fetrain,target,[10 2],{'tansig','tansig'},'trainscg');
% net.trainParam.mem_reduc = 2;
net.performFcn = 'mse'; 
net.divideFcn = 'dividetrain';
% [trainInd,valInd,testInd] = dividetrain(601);
net.trainParam.show = 10; % Frequency of progress displays (in epochs).
net.trainParam.epochs = 1000; %default 1000
net.trainParam.goal = 1e-6;
net = train(net,fetrain,target);
output = round(sim(net,fetrain));
save net1.mat net
% net2 = output;
data = fetest;

[target; output];
prediksi = round(sim (net, data));
[targetx; prediksi];

%% Calculate the accuracy %
y = 0; % number of correct predictions
j = size (prediksi, 2); 
% x = size (targetx, 2);
for i = 1:j 
    if prediksi (i) == targetx (i)
       y = y+1;
    end 
end 
% y all correct data
% j all data
s = 'The accuracy is %.2f%%';
acc = 100 *(y/j);
sprintf (s,acc)

I've tried several times, but the accuracy rate (the NN test result) doesn't improve. It constantly gives 1.96%. Is there something wrong with the process flow, or with the code that I've written?

Any help would be appreciated.

score 1 · Answer 1 · answered Jun 03 '16 at 10:58


First, I can see that the features you extracted are not normalized and vary in range, which means some features will dominate the rest. Try to normalize or standardize the features. Is the accuracy you report measured on the training set only, or are you using a separate test set or cross-validation? Is it true that you are using 601 features? Did you try feature selection methods to decide which features best suit the data and the model?

Second, I would like to know what structure you are implementing, rather than having to read the full code to understand what you have done.

Third, it would be interesting to look at the input images to understand the environment you are dealing with.
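
As a starting point for the standardization suggested above, here is a minimal sketch. It assumes the layout used in the question, with one row per feature and one column per image. The names fetrainDemo and fetestDemo and their values are made up for illustration; they stand in for the real fetrain and fetest. The mean and standard deviation are computed from the training set only and then reused for the test set.

```matlab
% Hypothetical example data: rows are features, columns are samples.
fetrainDemo = [1 2 3 4; 10 20 30 40; 5 5 5 5];   % 3 features x 4 training samples
fetestDemo  = [2.5 1; 25 50; 5 5];               % 3 features x 2 test samples

% Per-feature mean and standard deviation, from the training set only.
mu    = mean(fetrainDemo, 2);
sigma = std(fetrainDemo, 0, 2);
sigma(sigma == 0) = 1;   % constant features: avoid division by zero

% Apply the same training statistics to both sets (z-score per feature).
trainStd = bsxfun(@rdivide, bsxfun(@minus, fetrainDemo, mu), sigma);
testStd  = bsxfun(@rdivide, bsxfun(@minus, fetestDemo,  mu), sigma);
```

After this step every non-constant feature in the training set has zero mean and unit standard deviation, so no single feature dominates because of its scale. If the Neural Network Toolbox is available, its mapstd function performs a similar row-wise standardization.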


Feras


Thank you so much for replying, @Feras. First, do you have any suggestion for a feature normalization method? The training data is used only to train the network; I have separate data for the testing step. I plan to use cross-validation, but I still don't really understand it. || 601 is the number of input images (character images for OCR). || Second, I just extract particular features from the structure returned by the GLCM process. || Third, I've already attached the input images (Arabic character images). – Ana Ain Jun 05 '16 at 10:08
I would recommend starting by standardizing the features, and then trying some feature selection, or maybe evaluating the features with the Weka application. Is the image taken from a car plate, or is it handwritten? Again, would you mind writing out the structure you are using in the NN? – Feras Jun 05 '16 at 10:29
Do you have a recommended method for standardizing the features, or sample code? It would be a great help. || The images were selected from the APTI dataset. || By structure, do you mean something like the algorithm or workflow of the NN, or its parameters? Sorry, I'm just a newbie and still don't really understand everything about NNs. ... Thank you, @Feras – Ana Ain Jun 05 '16 at 16:13
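
On the cross-validation mentioned in the comments: the idea is to split the samples into k groups (folds), hold out one fold for validation while training on the rest, and repeat so that every fold is held out once. Below is a minimal sketch of the index bookkeeping only. The sample count and the choice of k = 5 are assumptions for illustration, and the training call itself is left as a comment.

```matlab
% Hypothetical sizes for illustration only.
numSamples = 10;   % e.g. the number of columns in fetrain
k = 5;             % number of folds (an assumed choice)

% Shuffle the sample indices, then assign them to folds in round-robin order.
perm   = randperm(numSamples);
foldId = zeros(1, numSamples);
foldId(perm) = mod(0:numSamples-1, k) + 1;

for fold = 1:k
    valIdx   = find(foldId == fold);   % samples held out in this round
    trainIdx = find(foldId ~= fold);   % samples used for training
    % Train on fetrain(:, trainIdx) and evaluate on fetrain(:, valIdx) here,
    % then average the k validation accuracies.
    fprintf('Fold %d: %d training, %d validation samples\n', ...
            fold, numel(trainIdx), numel(valIdx));
end
```

Averaging the accuracy over the k held-out folds gives a steadier estimate than a single train/test split, which matters with only a few hundred character images.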
