
I am using train_test_split to split my data into training and test sets. That works well, but what if I want to classify data that was not in the dataset at all?

My problem is that train_test_split picks the test samples randomly from the dataset; I'd like to see which label an outside image is assigned.

Currently, I'm extracting 22 features from images and using those features to train a linear SVC for recognition. With train_test_split I get 94% accuracy on the test set, which is alright. What I want to do now is test the classifier on an image that wasn't in the dataset. train_test_split only splits a previously loaded dataset into training and test parts, but I would like to load a new image and test it directly.

Reproducible example: (3 images with 10 features)

import numpy as np
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn import metrics

y_target = [1]*1 + [2]*1 + [3]*1 # number of images per person
data = np.asarray([[152., 236., 228., 168., 236., 224.,  70., 223., 175., 195.],
       [140., 233., 226., 161., 234., 220.,  67., 220., 159., 194.],
       [135., 233., 225., 157., 234., 221.,  65., 220., 159., 193.]])

svc_ = SVC(kernel='linear', C=0.00005)

# Hold out 25% of the rows (here: one image) as the test set
A_train, A_test, b_train, b_test = train_test_split(
        data, y_target, test_size=0.25, random_state=0)

def train(clf, A_train, A_test, b_train, b_test):
    # Fit on the training split and report accuracy on that same split
    clf.fit(A_train, b_train)
    print("Accuracy on training set:")
    print(clf.score(A_train, b_train))

train(svc_, A_train, A_test, b_train, b_test)

For instance, how would I test the following image's features?

([[126., 232., 225., 149., 231., 222.,  60., 218., 152., 191.]])

So what I am doing is selecting a specific image and editing it a bit. I'd then like to see how my classifier handles this edited image, which it was not trained on and which was not in the dataset. For instance, if I picked an image from the internet, how would I test it?

1 Answer


If you know how to get the features you are interested in from your images, simply load the image, extract the features, then predict and compare against the correct labels. For example:

import numpy as np

y_test = np.array([1, 2, 3])  # correct labels, one per image
images = ...  # fill in however you are getting your images into memory here
clf.score(images, y_test)  # mean accuracy on the new images

# or get the predictions by hand and do your own metric
predictions = clf.predict(images)
mse = np.mean(np.square(y_test - predictions))

You should have already trained your classifier before doing this.
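As a minimal sketch of the whole flow, using the feature values from the question: the classifier is fitted first, then the outside image's feature vector is passed to predict. For brevity this assumes the model is fitted on all three rows rather than on a train split, and the names clf and new_image are only illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# Training data from the question: 3 images, 10 features each
data = np.asarray([
    [152., 236., 228., 168., 236., 224., 70., 223., 175., 195.],
    [140., 233., 226., 161., 234., 220., 67., 220., 159., 194.],
    [135., 233., 225., 157., 234., 221., 65., 220., 159., 193.],
])
y_target = [1, 2, 3]  # one label per image

# Assumption: fit on all rows here; in practice fit on the training split
clf = SVC(kernel='linear', C=0.00005)
clf.fit(data, y_target)

# Features of an image that was never part of the dataset.
# It must be 2-D (n_samples, n_features) with the same 10 features.
new_image = np.asarray([[126., 232., 225., 149., 231., 222., 60., 218., 152., 191.]])

prediction = clf.predict(new_image)  # array holding one predicted label
print(prediction)
```

The only requirement on the new image is that its features are extracted the same way, and in the same order, as for the training images.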

alwaysmvp45
  • I don't understand your solution. How would I test my trained classifier against an image that I didn't load into the dataset? You're doing the same thing I did –  Aug 21 '20 at 15:12
  • You don't have to load the image into the dataset. For example, define your sequence test_image = ([[126., 232., 225., 149., 231., 222., 60., 218., 152., 191.]]); then calling predictions = clf.predict(test_image) is all you need – alwaysmvp45 Aug 22 '20 at 01:49