
I have been trying to cluster based on SGD model parameters (coefficient and intercept). coef_ holds the weights w and intercept_ holds the bias b. How can those parameters be used for clustering (KMedoids) over a group of learned models?

import numpy as np
from sklearn import linear_model
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
Y = np.array([1, 1, 2, 2])
clf = linear_model.SGDClassifier()
clf.fit(X, Y)

So I want to cluster based on clf.coef_ (array([[19.47419669, 9.73709834]])) and clf.intercept_ (array([-10.])) of each learned model.

mac13k
Man.utd
  • Hi, welcome to StackOverflow. I think this question is a bit difficult to understand, what do you mean by clustering based on SGD model parameters? – johannesack Aug 04 '20 at 12:55
  • I mean clustering based on the SGD model parameters (coef_ and intercept_) instead of the X values (data points). @JohannesAck – Man.utd Aug 04 '20 at 13:18
  • Yes, but the coef and intercept are parameters of the learned model, not of the data points. There is only one set of parameters, so clustering does not make a lot of sense here. Do you maybe want to use the SGD model to predict the potentially denoised Y value for each X, and then cluster these "denoised" (x, y) pairs? – johannesack Aug 04 '20 at 13:23
  • @JohannesAck, I want to cluster the learned models based on their parameters (coef and intercept), because this is part of a bigger codebase for distributed machine learning. – Man.utd Aug 04 '20 at 13:28
  • First, should I use both parameters (coef and intercept)? Second, how, given that they are vectors of different shapes? – Man.utd Aug 04 '20 at 13:30
  • Let me see if I understand your question correctly: you want to create multiple models (using SGD) and then cluster these models? – Rotem Tal Aug 04 '20 at 14:24
  • @RotemTal, yes, this is what I want: using the model parameters as input to KMedoids. – Man.utd Aug 04 '20 at 14:26
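On whether to use both: stacking coef_ and intercept_ into one row per model (the approach in the answer) treats each linear model as a point in parameter space. KMedoids with its default Euclidean metric compares raw differences, so columns with a larger spread can dominate the distances. One option, sketched below under the assumption that you already have an (n_models, 3) array, is to standardize each column first; whether that helps depends on your setup. The parameter values are made up for illustration.

```python
import numpy as np

# Hypothetical parameter matrix: one row per trained model, columns [w1, w2, b]
params = np.array([
    [19.5, 9.7, -10.0],
    [18.9, 9.9, -9.0],
    [2.1, 0.8, 1.0],
    [2.4, 1.1, 0.0],
])

# Z-score each column so the weights and the bias contribute on a comparable scale
mean = params.mean(axis=0)
std = params.std(axis=0)
std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
params_scaled = (params - mean) / std  # feed this to KMedoids instead of params
```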

1 Answer


Build a dataset X for clustering (separate from the training data) by appending the coef_ and intercept_ arrays as a new row every time you train a model, i.e.:

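X = np.empty((0, 3))  # once, before training: one row [w1, w2, b] per model
X = np.vstack((X, np.hstack((clf.coef_.ravel(), clf.intercept_))))  # after each fit; ravel() turns coef_ (1, 2) into (2,)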
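A fuller sketch of that loop, assuming scikit-learn is installed. The per-worker data shards are hypothetical stand-ins for a distributed setup (two Gaussian blobs per shard, so both classes are always present); X_train, y_train and X_params are illustrative names, with X_params playing the role of the clustering dataset X.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n_models = 6

# Clustering dataset: starts empty, gains one row [w1, w2, b] per trained model
X_params = np.empty((0, 3))

for i in range(n_models):
    # Hypothetical data shard for worker i: 20 points per class, labels 1 and 2
    X_train = np.vstack((
        rng.normal(loc=-1.5, scale=0.5, size=(20, 2)),  # class 1
        rng.normal(loc=1.5, scale=0.5, size=(20, 2)),   # class 2
    ))
    y_train = np.array([1] * 20 + [2] * 20)

    clf = SGDClassifier(random_state=i).fit(X_train, y_train)

    # Binary problem: coef_ has shape (1, 2), intercept_ has shape (1,)
    row = np.hstack((clf.coef_.ravel(), clf.intercept_))
    X_params = np.vstack((X_params, row))

print(X_params.shape)  # (6, 3)
```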

Once X holds one row per trained model, fit a KMedoids model on it, i.e.:

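from sklearn_extra.cluster import KMedoids  # provided by the scikit-learn-extra package

N = 2  # number of clusters (placeholder); try several values
kmed = KMedoids(n_clusters=N).fit(X)
labels = kmed.labels_  # cluster index of each trained model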

Note that you have to specify N, and you should probably test the clustering results for several values of N before choosing the best one based on one or more clustering metrics (e.g. the silhouette score).

mac13k
  • I got the following error: ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s) @mac13k – Man.utd Aug 04 '20 at 15:21
  • Check the shape and content of the input array. – mac13k Aug 04 '20 at 16:10
  • @mac13k: clf.coef_.shape, clf.intercept_.shape return ((1, 2), (1,)) – Man.utd Aug 04 '20 at 16:29
  • You cannot feed those to KMedoids directly. Build the input array with 3 columns and as many rows as models you train, then fit KMedoids on it. – mac13k Aug 04 '20 at 16:38
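The error in the comments can be reproduced without training anything: np.hstack joins a 2-D coef_ of shape (1, 2) with a 1-D intercept_ of shape (1,), and NumPy refuses to concatenate arrays with different numbers of dimensions. The sketch below uses hypothetical (coef_, intercept_) pairs shaped like those of a binary SGDClassifier, flattens coef_ before joining, and collects the rows in a list so the 3-column array is built once at the end instead of with repeated vstack calls.

```python
import numpy as np

# Hypothetical (coef_, intercept_) pairs from three binary models, shaped like sklearn's
models = [
    (np.array([[19.47, 9.74]]), np.array([-10.0])),
    (np.array([[18.90, 9.60]]), np.array([-9.0])),
    (np.array([[2.10, 0.80]]), np.array([1.0])),
]

# Joining the raw arrays fails: coef_ is 2-D (1, 2) but intercept_ is 1-D (1,)
try:
    np.hstack(models[0])
    mixed_dims_error = False
except ValueError:
    mixed_dims_error = True

# Fix: flatten coef_ first, collect one row per model, convert to an array once
rows = [np.hstack((coef.ravel(), intercept)) for coef, intercept in models]
X_params = np.array(rows)  # shape (n_models, 3), ready for KMedoids
```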