I've reshaped a feature vector and still got this error:

ValueError: Expected 2D array, got 1D array instead: array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I've used reshape before the prediction like

features = features.reshape(1, -1)

But no luck at all.

This is the code I have

import cv2
import numpy as np
import os
import glob
import mahotas as mt
from sklearn.svm import LinearSVC

# function to extract haralick textures from an image
def extract_features(image):
    # calculate haralick texture features for 4 types of adjacency
    textures = mt.features.haralick(image)

    # take the mean of it and return it
    ht_mean = textures.mean(axis = 0).reshape(1, -1)
    return ht_mean

# load the training dataset
train_path  = "C:/dataset/train"
train_names = os.listdir(train_path)

# empty list to hold feature vectors and train labels
train_features = []
train_labels   = []

# loop over the training dataset
print ("[STATUS] Started extracting haralick textures..")
for train_name in train_names:
    cur_path = train_path + "/" + train_name
    cur_label = train_name
    i = 1

    for file in glob.glob(cur_path + "/*.jpg"):
        print ("Processing Image - {} in {}".format(i, cur_label))
        # read the training image
        image = cv2.imread(file)

        # convert the image to grayscale
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        # extract haralick texture from the image
        features = extract_features(gray)

        # append the feature vector and label
        train_features.append(features.reshape(1, -1))[0]
        train_labels.append(cur_label)

        # show loop update
        i += 1

# have a look at the size of our feature vector and labels
print ("Training features: {}".format(np.array(train_features).shape))
print ("Training labels: {}".format(np.array(train_labels).shape))

# create the classifier
print ("[STATUS] Creating the classifier..")
clf_svm = LinearSVC(random_state = 9)

# fit the training data and labels
print ("[STATUS] Fitting data/label to model..")
clf_svm.fit(train_features, train_labels)

# loop over the test images
test_path = "C:/dataset/test"
for file in glob.glob(test_path + "/*.jpg"): 
    # read the input image
    image = cv2.imread(file)

    # convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # extract haralick texture from the image
    features = extract_features(gray)

    # evaluate the model and predict label
    prediction = clf_svm.predict(features)

    # show the label
    cv2.putText(image, prediction, (20,30), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,255,255), 3)
    print ("Prediction - {}".format(prediction))

    # display the output image
    cv2.imshow("Test_Image", image)
    cv2.waitKey(0)

I don't know if I'm using reshape() incorrectly or I'm missing something.

Sean Sabe

1 Answer


Consider the following points:

  • You get this error because train_features is an empty list ([]) when clf_svm.fit(train_features, train_labels) is called. It must contain at least one sample. The list is empty because train_path points to a folder that contains only image files, whereas the code assumes train_path contains one subfolder per class (and no loose files), with the images inside those subfolders:

    train 
       - class1_folder[class11.jpg, class12.jpg, ...]
       - class2_folder[class21.jpg, class22.jpg, ...]
       - and so on ...
    

    Here, your class names of the training data will be [class1, class2, ...]

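  • Change the line train_features.append(features.reshape(1, -1))[0] to train_features.append(features.reshape(1, -1)[0]). list.append returns None, so the [0] has to index the reshaped array inside the call, not the result of append.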

  • The output of clf_svm.predict(features) is a NumPy array, so replace prediction with str(prediction) in the cv2.putText call. You can also use prediction[0] instead.
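
The folder layout described in the first point can be checked before any features are extracted. The following is only a minimal sketch: list_class_images and check_training_layout are hypothetical helper names, not part of the original script, and the "*.jpg" pattern is taken from the question's code.

```python
import glob
import os


def list_class_images(train_path, pattern="*.jpg"):
    """Map each class subfolder of train_path to its image files.

    Hypothetical helper, not part of the original script.
    """
    images_by_class = {}
    for name in sorted(os.listdir(train_path)):
        class_dir = os.path.join(train_path, name)
        # Files sitting directly in train_path are skipped: only
        # subfolders are treated as class labels.
        if not os.path.isdir(class_dir):
            continue
        images_by_class[name] = sorted(glob.glob(os.path.join(class_dir, pattern)))
    return images_by_class


def check_training_layout(train_path):
    """Raise early if no class subfolder contains any image."""
    images_by_class = list_class_images(train_path)
    total = sum(len(files) for files in images_by_class.values())
    if total == 0:
        raise ValueError(
            "No images found under {}: expected class subfolders "
            "containing .jpg files".format(train_path)
        )
    return images_by_class
```

Calling check_training_layout(train_path) before the training loop would fail with a message about the folder layout instead of the less obvious "Expected 2D array" error raised later by fit.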

Try the code below:

import cv2
import numpy as np
import os
import glob
import mahotas as mt
from sklearn.svm import LinearSVC

# function to extract haralick textures from an image
def extract_features(image):
    # calculate haralick texture features for 4 types of adjacency
    textures = mt.features.haralick(image)

    # take the mean of it and return it
    ht_mean = textures.mean(axis = 0).reshape(1, -1)
    return ht_mean

# load the training dataset
train_path  = "C:\\dataset\\train"
train_names = os.listdir(train_path)

# empty list to hold feature vectors and train labels
train_features = []
train_labels   = []

# loop over the training dataset
print ("[STATUS] Started extracting haralick textures..")
for train_name in train_names:
    cur_path = train_path + "\\" + train_name
    print(cur_path)
    cur_label = train_name
    i = 1

    for file in glob.glob(cur_path + "\\*.jpg"):
        print ("Processing Image - {} in {}".format(i, cur_label))
        # read the training image
        #print(file)
        image = cv2.imread(file)
        #print(image)

        # convert the image to grayscale
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        # extract haralick texture from the image
        features = extract_features(gray)
        #print(features.reshape(1, -1))
        # append the feature vector and label
        train_features.append(features.reshape(1, -1)[0])
        train_labels.append(cur_label)

        # show loop update
        i += 1

# have a look at the size of our feature vector and labels
print ("Training features: {}".format(np.array(train_features).shape))
print ("Training labels: {}".format(np.array(train_labels).shape))

# create the classifier
print ("[STATUS] Creating the classifier..")
clf_svm = LinearSVC(random_state = 9)

# fit the training data and labels
print ("[STATUS] Fitting data/label to model..")
print(train_features)
clf_svm.fit(train_features, train_labels)

# loop over the test images
test_path = "C:\\dataset\\test"
for file in glob.glob(test_path + "\\*.jpg"):
    # read the input image
    image = cv2.imread(file)

    # convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # extract haralick texture from the image
    features = extract_features(gray)

    # evaluate the model and predict label
    prediction = clf_svm.predict(features)

    # show the label
    cv2.putText(image, str(prediction), (20,30), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,255,255), 3)
    print ("Prediction - {}".format(prediction))

    # display the output image
    cv2.imshow("Test_Image", image)
    cv2.waitKey(0)
cv2.destroyAllWindows()
Anubhav Singh
  • Thanks for your answer, Anubhav. I've tried what you suggested, but now I get this warning: `ConvergenceWarning: Liblinear failed to converge, increase the number of iterations`. Looking around, I found that I have to increase the number of iterations. What do you think? – Sean Sabe Aug 01 '19 at 02:15
  • To complement the comment above, I've used `dual = False` in `LinearSVC` to avoid the warning, but the process still aborts. – Sean Sabe Aug 01 '19 at 02:31
  • But please tell me one thing: what's in your path `C:\\dataset\\train`? It needs to contain class folders with .jpg files inside them. – Anubhav Singh Aug 01 '19 at 03:44
  • Try `max_iter=10000` in `LinearSVC` model. If that doesn't work try scaling your data between 0-1 to deal with `ConvergenceWarning: Liblinear failed to converge, increase the number of iterations`. – Anubhav Singh Aug 01 '19 at 03:51
  • I think this is a completely different matter, more related to the number of data files you are using. – Anubhav Singh Aug 01 '19 at 03:52
  • check this: https://stackoverflow.com/questions/52670012/convergencewarning-liblinear-failed-to-converge-increase-the-number-of-iterati – Anubhav Singh Aug 01 '19 at 03:53
  • I placed the files into subfolders, classifying them manually, and extracting descriptors works fine. The process aborts when fitting the data, raising the warning I described above. – Sean Sabe Aug 01 '19 at 13:04
  • I have this folder structure: `train --> accepted --> img.jpg`. I also already applied `StandardScaling()` and changed `max_iter = 10000000`, but I still got the warning and the process aborted. – Sean Sabe Aug 01 '19 at 13:05
  • `train --> accepted --> 11 items` `train --> wrinkled --> 21 items` - `test --> accepted --> 18 items` `test --> wrinkled --> 44 items` – Sean Sabe Aug 01 '19 at 13:12
  • I already mailed you. In the code I tried `max_iter = 1000000`. It takes a while to process but ends up aborting anyway. – Sean Sabe Aug 01 '19 at 13:37
  • Just merge both folders inside the `test_path` directory. There is no need to maintain separate folders during testing. BTW, the code above assumes you are reading files directly in `test_path`, not subdirectories as we did during training. – Anubhav Singh Aug 01 '19 at 14:11
  • I fixed what you told me about the path in test set. Didn't realize it wasn't necessary to contain subfolders in it. Thank you very much. – Sean Sabe Aug 01 '19 at 14:39
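
The suggestion in the comments above (raise `max_iter` and scale the features) could be sketched roughly as follows. This is an illustration under assumptions, not the code either poster ran: the data is synthetic (13 values per sample, standing in for the averaged Haralick vector), the class sizes mirror the 11/21 split mentioned in the comments, and `StandardScaler` is used for scaling; `MinMaxScaler` would be the choice for a strict 0-1 range.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the extracted features (assumption: 13 values
# per image). The real script would use train_features / train_labels.
rng = np.random.default_rng(0)
accepted = rng.normal(loc=0.0, scale=1.0, size=(11, 13))
wrinkled = rng.normal(loc=3.0, scale=1.0, size=(21, 13))
X_train = np.vstack([accepted, wrinkled])
y_train = ["accepted"] * 11 + ["wrinkled"] * 21

# Scale the features first, then fit LinearSVC with a higher iteration cap.
clf = make_pipeline(
    StandardScaler(),
    LinearSVC(random_state=9, max_iter=10000),
)
clf.fit(X_train, y_train)

# predict expects a 2D array (n_samples, n_features) and returns an array,
# so take element 0 to get a plain string label for cv2.putText.
sample = X_train[:1]
prediction = clf.predict(sample)
label = str(prediction[0])
```

Putting the scaler in a pipeline means the same scaling learned from the training data is applied to every test sample at predict time. Whether this removes the abort reported in the comments is not confirmed by the thread.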