I have put together a 3-layer classifying artificial neural network that appears to work on other datasets. Playing around with some artificial datasets that I made, I was unable to correctly predict between two classes when the first class is positive in either one feature or the other.
Clearly class 1 can be identified by asking whether either feature 1 or feature 2 is equal to 1, but I can't get the algorithm to predict the dataset correctly (there are 20 examples following this pattern in the dataset).
Can ANNs/MLPs recognize this type of pattern? If so, what am I missing? If not, are there other methods that can predict this type of pattern (maybe SVM)?
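As far as I can tell this is just an OR of the two features, so it should be linearly separable; with hand-picked weights even a single logistic unit separates the classes. A quick illustration (weights chosen by hand, not learned, just to show the pattern itself is representable):
% Single logistic unit computing "feature 1 OR feature 2" with hand-picked weights
X = [1, 0; 0, 1; 0, 0; 0, 0];
w = [20; 20];                     % large positive weight on each feature
b = -10;                          % bias pushes the all-zero rows below 0.5
p = 1 ./ (1 + exp(-(X*w + b)));   % sigmoid output per example
% p is ~1 for the first two rows (class 1) and ~0 for the last two (class 2)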
I used Octave, as that is what was used in the online course offered through Coursera. I have listed most of the code here, although it is structured slightly differently when I run it. As you can see, I use bias units on the first and second layers, and I have also varied the number of hidden units in the second layer from 1 to 5 with no improvement over random guessing.
% Load dataset
y = [1; 1; 2; 2];
X = [1, 0; 0, 1; 0, 0; 0, 0];
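% Rows with a 1 in either feature are class 1; the all-zero rows are class 2
% (the full dataset has 20 examples following this pattern)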
m = size(X, 1);
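% Unpack Theta1 and Theta2 from the flat parameter vector nn_params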
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), num_labels, (hidden_layer_size + 1));
% Randomly initialize weight parameters
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];
% Add bias units to layers and feedforward
Xbias = [ones(m,1), X];
L2bias = [ones(m,1), sigmoid(Xbias*Theta1')];
L3 = sigmoid(L2bias * Theta2');
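% L3 is m x num_labels: the predicted probability of each class for every example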
% Create class matrix Y
Y = zeros(m, num_labels);
for r = 1:m;
Y(r, y(r)) = 1;
end
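% With num_labels = 2, y = [1; 1; 2; 2] becomes Y = [1 0; 1 0; 0 1; 0 1]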
% Cost: cross-entropy over all classes plus L2 regularization (bias columns excluded)
J = -sum(sum(Y.*log(L3) + (1-Y).*log(1-L3)))/m ...
    + lambda*(sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)))/(2*m);
% Initialize weight gradient matrices
D2 = zeros(rows(Theta2),columns(Theta2));
D1 = zeros(rows(Theta1),columns(Theta1));
% Calculate gradient with backpropagation
for t = 1:m;
a1 = [1 X(t,:)]';
z2 = Theta1*a1;
a2 = [1; sigmoid(z2)];
z3 = Theta2*a2;
a3 = sigmoid(z3);
d3 = a3 - Y(t,:)';
d2 = (Theta2'*d3)(2:end).*sigmoidGradient(z2);
D2 = D2 + d3*a2';
D1 = D1 + d2*a1';
end
Theta2_grad = D2/m;
Theta1_grad = D1/m;
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + lambda*Theta2(:,2:end)/m;
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + lambda*Theta1(:,2:end)/m;
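% (As with the cost above, the bias column of each Theta is left unregularized)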
% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];
% Compute initial cost and gradient (feedforward and backpropagation via nnCostFunction)
[J,grad] = nnCostFunction(initial_nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lambda);
% Create "short hand" for the cost function to be minimized using fmincg
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, num_labels, X, y, lambda);
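I haven't included it in the listing above, but a numerical gradient check against the backprop gradient would look roughly like this (it reuses the costFunction handle just defined and the grad returned by the earlier nnCostFunction call at the initial parameters):
% Central-difference numerical gradient check at the initial parameters
e = 1e-4;
numgrad = zeros(size(initial_nn_params));
for i = 1:numel(initial_nn_params)
    perturb = zeros(size(initial_nn_params));
    perturb(i) = e;
    numgrad(i) = (costFunction(initial_nn_params + perturb) ...
                  - costFunction(initial_nn_params - perturb)) / (2*e);
end
disp(max(abs(numgrad - grad)));   % should be tiny (around 1e-9) if backprop is correct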
% Train the neural network using fmincg
options = optimset('MaxIter', 1000);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), num_labels, (hidden_layer_size + 1));
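The prediction step isn't listed above; it is essentially a forward pass with the trained weights, along these lines (a sketch using the same sigmoid helper):
% Forward pass with the trained weights to get class predictions
h1 = sigmoid([ones(m, 1), X] * Theta1');
h2 = sigmoid([ones(m, 1), h1] * Theta2');
[dummy, pred] = max(h2, [], 2);   % predicted class = index of the largest output
fprintf('Training accuracy: %f\n', mean(double(pred == y)) * 100);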