I am running logistic regression with gradient descent on two datasets, and I get a different result for each of them.
Dataset 1 input:
X=
1 2 3
1 4 6
1 7 3
1 5 5
1 5 4
1 6 4
1 3 4
1 4 5
1 1 2
1 3 4
1 7 7
Y=
0
1
1
1
0
1
0
0
0
0
1
Dataset 2 input:
x =
1 20 30
1 40 60
1 70 30
1 50 50
1 50 40
1 60 40
1 30 40
1 40 50
1 10 20
1 30 40
1 70 70
y =
0
1
1
1
0
1
0
0
0
0
1
The only difference between dataset 1 and dataset 2 is the range of the values. When I run the same code on both datasets, it gives the desired output for dataset 1 but a very strange result for dataset 2.
My code is as follows:
[m, n] = size(x);
x = [ones(m, 1), x]; % prepend the intercept column x0 = 1
X = x; % keep a copy for plotting
%3. In this step we plot the given input data just to see how the two classes are distributed.
pos = find(y == 1); % indices of all training examples labeled 1
neg = find(y == 0); % indices of all training examples labeled 0
% Plot x1 vs. x2 for y = 1 and for y = 0
plot(X(pos, 2), X(pos,3), '+');
hold on
plot(X(neg, 2), X(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('x2 marks in subject 2')
legend('Passed', 'Failed')
hold off
% When transforming x1 and x2 we must leave the first column x0 alone, because it has to stay equal to 1.
% The critical thing here is that this is logistic, not linear, regression, so the hypothesis differs:
% h(x) = g(theta' * x) = 1 / (1 + e^(-theta' * x)), i.e. the sigmoid of the linear term.
% j_theta is evaluated over the whole training set on every iteration.
g = @(z) 1.0 ./ (1.0 + exp(-z)); % sigmoid as an anonymous function (inline is deprecated)
alpha = 1; % learning rate
theta = zeros(size(x(1,:)))'; % theta must be 3x1 so that it can multiply x, which is m-by-3
max_iter = 2000;
j_theta = zeros(max_iter, 1); % preallocated storage for the cost J(theta) at each iteration
for num_iter = 1:max_iter
    % The hypothesis is recomputed inside the loop because theta changes on every iteration.
    z = x * theta;
    h = g(z); % element-wise sigmoid, using the anonymous function defined above
    % Vectorized form of the logistic cost function J(theta):
    j_theta(num_iter) = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));
    % Vectorized gradient descent step: (1/m) * x' * (h - y) is the gradient of J(theta).
    theta = theta - (alpha/m) * x' * (h - y);
    % h, j_theta(num_iter), theta % uncomment to echo intermediate values each iteration
end
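As a quick sanity check (a small sketch I can add here, reusing the g, x, y, and theta defined above), the learned theta can be used to classify the training rows directly:

p = g(x * theta) >= 0.5; % predict class 1 wherever the hypothesis is at least 0.5
accuracy = mean(p == y)  % fraction of training rows classified correctly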
figure
plot(0:max_iter-1, j_theta, 'b', 'LineWidth', 2) % cost J(theta) per iteration
hold off
figure
%3. Re-plot the given input data to see how the two classes are distributed.
pos = find(y == 1); % indices of all training examples labeled 1
neg = find(y == 0); % indices of all training examples labeled 0
% Now we plot the graph column x1 Vs x2 for y=1 and y=0
plot(X(pos, 2), X(pos,3), '+');
hold on
plot(X(neg, 2), X(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('x2 marks in subject 2')
legend('Passed', 'Failed')
plot_x = [min(X(:,2))-2, max(X(:,2))+2]; % min and max of x1 (padded by 2) set the extent of the boundary line
% Calculate the decision boundary line
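% The line is where the sigmoid argument is zero, i.e. where h = 0.5:
% theta(1) + theta(2)*x1 + theta(3)*x2 = 0  =>  x2 = -(theta(2)*x1 + theta(1)) / theta(3)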
plot_y = (-1./theta(3)).*(theta(2).*plot_x +theta(1));
plot(plot_x, plot_y)
hold off
Please find the graph for each dataset below.
For dataset 1: [plot of dataset 1 with its decision boundary]
For dataset 2: [plot of dataset 2 with its decision boundary]
As you can see, dataset 1 gives me the correct answer.
That said, dataset 2 has a wide range of values, roughly 10-100, so to normalize it I applied feature scaling to dataset 2 and plotted the result again. The decision line it produced was correct but sat a bit below the expected place; see it for yourself (a sketch of the scaling step follows the data below).
Dataset 2 input with feature scaling:
x =
1.00000 -1.16311 -0.89589
1.00000 -0.13957 1.21585
1.00000 1.39573 -0.89589
1.00000 0.37219 0.51194
1.00000 0.37219 -0.19198
1.00000 0.88396 -0.19198
1.00000 -0.65134 -0.19198
1.00000 -0.13957 0.51194
1.00000 -1.67487 -1.59981
1.00000 -0.65134 -0.19198
1.00000 1.39573 1.91977
y =
0
1
1
1
0
1
0
0
0
0
1
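For reference, the scaled values above come from column-wise mean/standard-deviation normalization; here is a minimal sketch of that step (a reconstruction consistent with the numbers printed above, assuming x already carries the intercept column of ones in column 1):

% Mean-normalize the feature columns; the intercept column of ones is skipped.
mu = mean(x(:, 2:3));   % per-column means of x1 and x2
sigma = std(x(:, 2:3)); % per-column standard deviations (n-1 normalization)
x(:, 2:3) = (x(:, 2:3) - repmat(mu, size(x, 1), 1)) ./ repmat(sigma, size(x, 1), 1);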
The graph I get after adding feature scaling to my previous code is given below:
[plot of the scaled dataset 2 with its decision boundary]
As you can see, if the decision line were a bit higher, I would have gotten the perfect output.
Please help me understand this scenario: why doesn't even feature scaling help? Does my code have an error, or am I missing something?