
I am doing some facial recognition training with a linear SVC, where my dataset is 870x22. I have 30 frames for each of 29 different people, and I use 22 simple pixel values from each image as the features to recognize the face. When I call train_test_split(), it gives me an X_test of size 218x22 and a y_test of size 218. Once I have trained the classifier and try to run images of a new face (a 30x22 matrix) through it, I get the error:

ValueError: Found input variables with inconsistent numbers of samples: [218, 30]

Here's the code:

import numpy as np
import sklearn
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, f1_score

img_amount = 30
# labels 1..29, one per person, each repeated 30 times (870 labels total)
target = np.repeat(np.arange(1, 30), img_amount)
# dataset is the 870-row array linked below; keep only the 22 pixel features
dataset = dataset[:, 0:22]

svc_1 = SVC(kernel='linear', C=0.00005)
X_train, X_test, y_train, y_test = train_test_split(dataset, target, test_size=0.25, random_state=0)

def train(clf, X_train, X_test, y_train, y_test):
    clf.fit(X_train, y_train)
    print("Accuracy on training set:")
    print(clf.score(X_train, y_train))
    print("Accuracy on testing set:")
    print(clf.score(X_test, y_test))

    y_pred = clf.predict(X_test)

    print("Classification Report:")
    print(metrics.classification_report(y_test, y_pred))
    print("Confusion Matrix:")
    print(metrics.confusion_matrix(y_test, y_pred))

train(svc_1, X_train, X_test, y_train, y_test)
    
predictions = svc_1.predict(new_face_img)
print("Classification Report:")
print(metrics.classification_report(y_test, predictions))

To avoid visually polluting the question, I uploaded the matrix for new_face_img to pastebin: https://pastebin.com/uRbvv5jD

Link for the dataset: Dataset

Both are plain arrays and can be assigned directly to the corresponding variables.

The lines I get the error on are where I try to predict new samples:

predictions = svc_1.predict(new_face_img)
print("Classification Report:")
print(metrics.classification_report(y_test, predictions))   # <- this line raises the error

predictions = svc_1.predict(michael_ocluded_array)
expected = np.ones(len(michael_ocluded_array))
print("Confusion Matrix:")
print(metrics.confusion_matrix(expected, predictions))

Confusion Matrix:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in
      1 predictions = svc_1.predict(michael_ocluded_array)
      2 print ("Confusion Matrix:")
----> 3 print (metrics.classification_report(y_test, predictions))

C:\ProgramData\Miniconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74

C:\ProgramData\Miniconda3\lib\site-packages\sklearn\metrics\_classification.py in classification_report(y_true, y_pred, labels, target_names, sample_weight, digits, output_dict, zero_division)
   1927     """
   1928
-> 1929     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
   1930
   1931     labels_given = True

C:\ProgramData\Miniconda3\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
     79     y_pred : array or indicator matrix
     80     """
---> 81     check_consistent_length(y_true, y_pred)
     82     type_true = type_of_target(y_true)
     83     type_pred = type_of_target(y_pred)

C:\ProgramData\Miniconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    253     uniques = np.unique(lengths)
    254     if len(uniques) > 1:
--> 255         raise ValueError("Found input variables with inconsistent numbers of"
    256                          " samples: %r" % [int(l) for l in lengths])
    257

ValueError: Found input variables with inconsistent numbers of samples: [218, 30]

  • I don't know why it's happening, but it might help others to put the line number that caused the error. – Elan-R Aug 23 '20 at 00:54

1 Answer


Here is the issue:

predictions = svc_1.predict(new_face_img)
print("Confusion Matrix:")
print(metrics.confusion_matrix(y_test, predictions))

You are computing predictions for new_face_img, which has 30 samples, but scoring them against y_test, which holds the 218 labels of your test split. Both arguments to confusion_matrix (and classification_report) must have the same number of samples. Compare the predictions against the labels you expect for those 30 new images instead:

predictions = svc_1.predict(new_face_img)
# change this to the labels you actually expect, but keep shape=(30,)
expected = np.ones(len(new_face_img))
print("Confusion Matrix:")
print(metrics.confusion_matrix(expected, predictions))

Edit: for validation against the test data from the dataset split, use:

predictions = svc_1.predict(X_test)
print("Confusion Matrix:")
print(metrics.confusion_matrix(y_test, predictions))
Sayan Dey
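
To turn this into the recognition rate the question is ultimately after, predict on the new 30x22 matrix and score it against the label those frames are supposed to have. The following is only a minimal sketch, reusing svc_1, new_face_img and the imports from the question's code, and assuming the occluded frames truly belong to class 29 (the asker confirms that label in the comment thread below):

true_label = 29                                    # assumption: the person the occluded frames belong to
predictions = svc_1.predict(new_face_img)          # shape (30,)
expected = np.full(len(new_face_img), true_label)  # shape (30,), one expected label per frame

print("Recognition rate on occluded images:")
print(accuracy_score(expected, predictions))       # fraction of the 30 frames classified as person 29

print("Confusion Matrix:")
print(metrics.confusion_matrix(expected, predictions))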
  • This fixes the error just for the confusion matrix, but what about predicting the output for new_face_img? –  Aug 23 '20 at 11:19
  • The "predictions" var is the prediction for the face image. "Expected" is something you have to supply yourself, like the labels in your dataset. You are comparing what you predicted against what you expected. – Sayan Dey Aug 23 '20 at 11:31
  • That is a problem: how can I ensure that new_face_img ends up inside X_test, then? Basically, what I am trying to do is test predictions for images of my choice, but train_test_split chooses randomly (one way around this is sketched after this thread). –  Aug 23 '20 at 12:20
  • Notice your solution does exactly what my train function does when I call it. The error started when I tried to pass, as the prediction parameter, an image that was not inside the dataset given to train_test_split. The reason is that I'd like to test any image of my choice, not only ones that were trained on previously. –  Aug 23 '20 at 12:24
  • Where did you get this face image array from? Is it random, or is it in the dataset? If it is in the dataset you can do `faceimage = dataset[i]`, `expected = target[i]` and do the rest as I suggested in the answer. Is this an image classification problem? – Sayan Dey Aug 23 '20 at 12:25
  • It isn't in the dataset (array); all the other images are just regular face images. For new_face_img I added occlusion (blacked out some parts of the image), and I would like to see if my classifier can still recognize it. If I stack it into the dataset, I wouldn't know how to separate it out for testing, and I don't know how to make sure it ends up in the test group when train_test_split() does the split. –  Aug 23 '20 at 12:28
  • This is easy. But since you say you "blacked out some parts of the image and would like to see if the classifier can recognize it": what do you expect in the output variable? It could be anything between 1 and 29, right? What it actually should be is what matters. – Sayan Dey Aug 23 '20 at 12:32
  • Basically, all I want is the accuracy rate as a percentage; it's supposed to be around 20-40%, which would be great! –  Aug 23 '20 at 12:34
  • You can't evaluate your model if you don't know the correct label. Please go through [this](https://developers.google.com/machine-learning/crash-course/training-and-test-sets/splitting-data). – Sayan Dey Aug 23 '20 at 12:42
  • You need to know what class number you expect for the face image. – Sayan Dey Aug 23 '20 at 12:43
  • Sorry, that wasn't clear to me until now. The label should be 29: I took the last person in the dataset, whose label is 29, and added occlusion, so his class number should be 29. –  Aug 23 '20 at 12:47
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/220307/discussion-between-sayan-dey-and-john-jones). – Sayan Dey Aug 23 '20 at 12:54
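
As discussed in the thread above, if you want to evaluate on frames of your own choosing rather than whatever train_test_split picks at random, you can split by index yourself. This is only a minimal sketch reusing dataset, target and svc_1 from the question, and assuming, purely for illustration, that the last 30 rows (indices 840 to 869, one person's frames) are the ones you want to hold out:

hold_out_idx = np.arange(840, 870)                               # illustrative: one person's 30 frames
train_idx = np.setdiff1d(np.arange(len(dataset)), hold_out_idx)  # everything else

X_train_manual, y_train_manual = dataset[train_idx], target[train_idx]
X_hold, y_hold = dataset[hold_out_idx], target[hold_out_idx]

svc_1.fit(X_train_manual, y_train_manual)   # retrain without the held-out frames
predictions = svc_1.predict(X_hold)

print("Accuracy on the hand-picked hold-out set:")
print(accuracy_score(y_hold, predictions))

Because the hold-out set is chosen by index instead of by the random split, you always know exactly which frames (and therefore which true labels) the classifier is being tested on.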