2

I'm working on an example of applying Restricted Boltzmann Machine on Iris dataset. Essentially, I'm trying to make a comparison between RMB and LDA. LDA seems to produce a reasonable correct output result, but the RBM isn't. Following a suggestion, I binarized the feature inputs using skearn.preprocessing.Binarizer, and also tried different threshold parameter values. I tried several different ways to apply binarization, but none seemed to work for me.

Below is my modified version of the code based on this user's version User: covariance.

Any helpful comments are greatly appreciated.

from sklearn import linear_model, datasets, preprocessing
from sklearn.cross_validation import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.neural_network import BernoulliRBM
from sklearn.lda import LDA

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:,:2]  # we only take the first two features.
Y = iris.target

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=10)

# Models we will use
rbm = BernoulliRBM(random_state=0, verbose=True)
binarizer = preprocessing.Binarizer(threshold=0.01,copy=True)
X_binarized = binarizer.fit_transform(X_train)
hidden_layer = rbm.fit_transform(X_binarized, Y_train)
logistic = linear_model.LogisticRegression()
logistic.coef_ = hidden_layer
classifier = Pipeline(steps=[('rbm', rbm), ('logistic', logistic)])
lda = LDA(n_components=3)


#########################################################################

# Training RBM-Logistic Pipeline
logistic.fit(X_train, Y_train)
classifier.fit(X_binarized, Y_train)

#########################################################################

# Get predictions
print "The RBM model:"
print "Predict: ", classifier.predict(X_test)
print "Real:    ", Y_test

print

print "Linear Discriminant Analysis: "
lda.fit(X_train, Y_train)
print "Predict: ", lda.predict(X_test)
print "Real:    ", Y_test 
Community
  • 1
  • 1
sgerat
  • 21
  • 1
  • 2

1 Answers1

4

RBM and LDA are not directly comparable, as RBM doesn't perform classification on its own. Though you are using it as a feature engineering step with logistic regression at the end, LDA is itself a classifier - so the comparison isn't very meaningful.

The BernoulliRBM in scikit learn only handles binary inputs. The iris dataset has no sensible binarization, so you aren't going to get any meaningful outputs.

Raff.Edward
  • 6,404
  • 24
  • 34
  • Good point. But, I'm curious why this user said he was able to do classification in this post: [link] (http://stackoverflow.com/questions/23419165/restricted-boltzmann-machine-how-to-predict-class-labels) – sgerat Sep 23 '15 at 21:25
  • He didn't use an RBM for classification. He used it for feature engineering. And it works poorly for him, always outputing a label of "2". This is because the iris data doesn't have any sensible binarization. Your using the wrong tools for this data and thats why it dosn't work for you and it didn't work for the person you posted – Raff.Edward Sep 24 '15 at 01:11