
I am trying to plot the decision boundary of my SVM model. Some background on the dataset and task: it's a binary classification task in which I classify news articles as either fake or real, and I would like to see the decision boundary visually. The plotting code is adapted from here: https://medium.com/geekculture/svm-classification-with-sklearn-svm-svc-how-to-plot-a-decision-boundary-with-margins-in-2d-space-7232cb3962c0

I tried passing 'test_x_vectorize' as the x and y arguments, but I got the error: TypeError: unhashable type: 'csr_matrix'

I then tried flattening it as per this thread, but got a similar error: TypeError: unhashable type: 'matrix'
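For context on these two errors: TfidfVectorizer.transform returns a SciPy sparse matrix (CSR format), not an array or a DataFrame column, and seaborn does not accept that as an x/y vector. A minimal sketch, assuming a made-up four-article corpus in place of the real data, showing the sparse output and the usual fix of densifying with .toarray() and then slicing one column per axis:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import scipy.sparse as sp
import numpy as np

# Hypothetical toy corpus, not the question's dataset
docs = [
    "the election results are in",
    "aliens built the pyramids",
    "the market closed higher today",
    "celebrity secretly a robot",
]

vectorize = TfidfVectorizer(max_features=5)
X_vec = vectorize.fit_transform(docs)

# The vectorizer output is a SciPy sparse matrix, not a NumPy array
print(type(X_vec).__name__)
print(sp.issparse(X_vec))  # True

# Densify, then slice columns: each column holds one TF-IDF feature
dense = X_vec.toarray()
x = dense[:, 0]
y = dense[:, 1]
print(dense.shape, x.shape)  # (4, 5) (4,)
```

With only five features, densifying is cheap; the 1-D arrays x and y can then be passed to sns.scatterplot as x= and y=.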

Here is my code:

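# Imports used below (data is a pandas DataFrame with 'News' and 'Label' columns)
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Train/test split for X and Y
X_train, X_test, Y_train, Y_test = train_test_split(data['News'], data['Label'], test_size=0.2, random_state=21, shuffle=True)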

# Creating the vectorizer using TfidfVectorizer
vectorize = TfidfVectorizer(max_features=5)
vectorize.fit(data['News'])

train_x_vectorize = vectorize.transform(X_train)
test_x_vectorize = vectorize.transform(X_test)

# Creating the SVM model
SVM = svm.SVC(C=1, kernel='linear', degree=3, gamma='auto')
SVM.fit(train_x_vectorize, Y_train)

# Predicting labels for the test data
pred = SVM.predict(test_x_vectorize)

plt.figure(figsize=(10, 8))
# Plotting our two-features-space
sns.scatterplot(x=X_train[:, 0], 
                y=X_train[:, 1], 
                hue=Y_train, 
                s=8);
# Constructing a hyperplane using a formula.
w = SVM.coef_[0]           # one weight per feature (the 2-D line below assumes exactly 2)
b = SVM.intercept_[0]      # b consists of 1 element
x_points = np.linspace(-1, 1)    # generating x-points from -1 to 1
y_points = -(w[0] / w[1]) * x_points - b / w[1]  # getting corresponding y-points
# Plotting a red hyperplane
plt.plot(x_points, y_points, c='r');
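The red line in the code above comes from setting the linear SVC's decision function to zero. With exactly two features, and writing w[0], w[1] and b as w_0, w_1 and b, the boundary and the two margin lines (decision function equal to +1 and -1) are:

```latex
f(x_0, x_1) = w_0 x_0 + w_1 x_1 + b = 0
\;\Longrightarrow\;
x_1 = -\frac{w_0}{w_1}\,x_0 - \frac{b}{w_1}, \qquad w_1 \neq 0

f(x_0, x_1) = \pm 1
\;\Longrightarrow\;
x_1 = -\frac{w_0}{w_1}\,x_0 - \frac{b \mp 1}{w_1}
```

Because the derivation uses only two weights, it describes the full boundary only when the model is trained on exactly two features. With max_features=5, SVM.coef_[0] holds five weights, and a line built from the first two ignores the other three.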

Here is the traceback:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-0535339c273a> in <module>()
     30 plt.figure(figsize=(10, 8))
     31 # Plotting our two-features-space
---> 32 sns.scatterplot(x=X_train[:, 0], 
     33                 y=X_train[:, 1],
     34                 hue=Y_train,

2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/series.py in _get_values_tuple(self, key)
    954 
    955         if not isinstance(self.index, MultiIndex):
--> 956             raise KeyError("key of type tuple not found and not a MultiIndex")
    957 
    958         # If key is contained, would have returned by now

KeyError: 'key of type tuple not found and not a MultiIndex'
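The KeyError comes from X_train itself: train_test_split(data['News'], ...) returns X_train as a 1-D pandas Series of article text, and X_train[:, 0] hands the tuple (slice(None), 0) to Series indexing, which only accepts tuple keys on a MultiIndex. A minimal reproduction, assuming a hypothetical three-row Series in place of the real data:

```python
import pandas as pd

# Hypothetical stand-in for X_train: a 1-D Series of raw text
X_train = pd.Series(["story one", "story two", "story three"])

# X_train[:, 0] passes the tuple (slice(None), 0) to Series.__getitem__.
# The index is a plain RangeIndex, not a MultiIndex, so pandas raises KeyError.
raised = None
try:
    X_train[:, 0]
except KeyError as exc:
    raised = exc

print(type(raised).__name__)  # KeyError
print(X_train.ndim)           # 1: there is no second axis to take "column 0" from
```

Even without the error, these values are strings, not numeric coordinates, so they could not serve as plot axes.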

ambleparadox

1 Answer


Try using .iloc:

sns.scatterplot(x=X_train.iloc[:, 0], 
                y=X_train.iloc[:, 1], 
                hue=Y_train, 
                s=8);

This tells pandas to select by integer position rather than by label.

loamoza