
I'm trying to use the SVM from the sklearn library to perform some image recognition, but when I call the fit method, I get a "ValueError: setting an array element with a sequence." error. My code is as follows.

My testing.py file:

import matplotlib.pyplot as plt
import numpy as np
from sklearn import svm
from imageToNumberArray import imageToNumberArray

classAndValuesFile = "../Classes_Values.txt"
classesFiles = "../"

testImage = "ImageToPerformTestOn.png"

x = []
y = []

def main():
    i = 0
    with open(classAndValuesFile) as f:
        for line in f:
            splitter = line.split(",", 2)
            x.append(imageToNumberArray(classesFiles + splitter[0]))
            y.append(splitter[1].strip())

    clf = svm.SVC(gamma=0.001, C=100)
    clf.fit(x,y)
    #print clf.predict(testImage)

The imageToNumberArray file is:

from PIL import Image
from numpy import array


def imageToNumberArray(path):
    img = Image.open(path)
    arr = array(img)
    return arr

And I'm getting the following error:

Traceback (most recent call last):
  File "D:\Research\project\testing.py", line 30, in <module>
    main()
  File "D:\Research\project\testing.py", line 23, in main
    clf.fit(x,y)
  File "C:\Python27\lib\site-packages\sklearn\svm\base.py", line 139, in fit
    X = check_array(X, accept_sparse='csr', dtype=np.float64, order='C')
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 344, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.

If I comment out the clf.fit line, it works just fine.

Also, if I print the shapes of all the arrays in x, I get something like this (some are 2D, some are 3D):

(59, 58, 4)
(49, 27, 4)
(570, 400, 3)
(471, 364)
(967, 729)
(600, 600, 3)
(325, 325, 3)
(386, 292)
(86, 36, 4)
(49, 26, 4)
(578, 244, 3)
(300, 300)
(995, 557, 3)
(1495, 677)
(400, 400, 3)
(200, 230, 3)
(74, 67, 4)
(49, 34, 4)
(240, 217, 3)
(594, 546, 4)
(387, 230, 3)
(297, 273, 4)
(400, 400, 3)
(387, 230, 3)
(86, 62, 4)
(50, 22, 4)
(499, 245, 3)
(800, 566, 4)
(1050, 750, 3)
(400, 400, 3)
(499, 245, 3)
(74, 53, 4)
(47, 26, 4)
(592, 348, 4)
(1050, 750, 3)
(1600, 1600)
(320, 320)
(84, 54, 4)
(47, 25, 4)
(600, 294, 3)
(400, 400, 3)
(1050, 750, 3)
(1478, 761)
(504, 300, 3)
(53, 84, 4)
(36, 42, 4)
(315, 600, 4)
(223, 425, 3)
(194, 325, 3)

The first two numbers are the size of the image (height and width in pixels); the third, when present, is the number of channels.

What can I do to get rid of this error?

David Andrei Ned
Theo
  • You almost definitely want to extract features from your images before doing any kind of machine learning (although I know KNN can work well for digit recognition). Check this out: http://www.codeproject.com/Articles/619039/Bag-of-Features-Descriptor-on-SIFT-Features-with-O – Ryan Aug 24 '15 at 16:09
  • Perhaps [this](http://stackoverflow.com/questions/25485503/valueerror-setting-an-array-element-with-a-sequence-while-using-svm-in-scikit) can help you. – deborah-digges Aug 24 '15 at 16:12

1 Answer


You seem to be confused about how SVM works. In short, x has to be a single two-dimensional array (one row per sample, one column per feature), whereas in your case it is a list of matrices of different shapes. An SVM cannot be fit on such data. First, find a way that is meaningful for your data to represent each image as a fixed-length vector; this step is usually called feature extraction. One basic approach is to represent each image as a histogram or as a bag of visual words.
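
As a rough illustration of the histogram idea in the answer above, here is a minimal sketch, not a recommended pipeline. It assumes 8-bit pixel values and only the three layouts seen in the question (2D grayscale, 3-channel, 4-channel). The helper name `histogram_features` and the choice of 16 bins are hypothetical, and the random arrays merely stand in for the output of `imageToNumberArray`.

```python
import numpy as np


def histogram_features(arr, bins=16):
    """Map an image array of any height/width to a fixed-length vector.

    Hypothetical helper: assumes 8-bit values in [0, 255] and an array
    that is either 2D (grayscale) or 3D with 3 or 4 channels.
    """
    arr = np.asarray(arr, dtype=np.float64)
    if arr.ndim == 2:
        # Grayscale: repeat the single channel so every image has three.
        arr = np.stack([arr] * 3, axis=-1)
    # Keep the first three channels (this drops alpha for 4-channel input).
    arr = arr[:, :, :3]
    n_pixels = float(arr.shape[0] * arr.shape[1])
    feats = []
    for c in range(3):
        hist, _ = np.histogram(arr[:, :, c], bins=bins, range=(0, 256))
        # Normalize so that image size does not affect the feature scale.
        feats.append(hist / n_pixels)
    return np.concatenate(feats)


# Synthetic stand-ins with three of the shapes printed in the question.
rng = np.random.RandomState(0)
shapes = [(59, 58, 4), (471, 364), (570, 400, 3)]
images = [rng.randint(0, 256, size=s).astype(np.uint8) for s in shapes]

# Every image now yields a vector of length 3 * bins, so stacking works.
X = np.vstack([histogram_features(img) for img in images])
print(X.shape)
```

With one label per row in `y`, an array like `X` is the kind of two-dimensional input that `clf.fit(X, y)` expects. Color histograms discard all spatial information, so they may be too crude for real recognition tasks; the sketch only shows how images of different sizes can end up as rows of one matrix.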

lejlot