I am trying to use sklearn.neural_network.BernoulliRBM with the iris dataset:

import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
collist = ['SL', 'SW', 'PL', 'PW']
dat = pd.DataFrame(data=iris.data, columns=collist)

from sklearn.neural_network import BernoulliRBM
model = BernoulliRBM(n_components=2)
scores = model.fit_transform(dat)
print(scores.shape)
print(scores)

However, the output is 1 for every row:

(150, 2)
[[1. 1.]
 [1. 1.]
 [1. 1.]
 ...
 [1. 1.]]  # same for all rows

Can I get per-row scores similar to what principal component analysis gives me? Otherwise, how can I get some useful numbers out of an RBM? I tried model.score_samples(dat), but that also gives 0 for the vast majority of rows.

rnso

1 Answer

According to the documentation:

The model makes assumptions regarding the distribution of inputs. At the moment, scikit-learn only provides BernoulliRBM, which assumes the inputs are either binary values or values between 0 and 1, each encoding the probability that the specific feature would be turned on.

Since your dat values are all greater than 1, I'm guessing the model is effectively truncating all of the input data to 1.0. If, for example, you apply a normalization:

from sklearn.preprocessing import normalize
scores = model.fit_transform(normalize(dat))

You'll get values with some variation:

array([[0.23041219, 0.23019722],
       [0.23046652, 0.23025144],
       ...,
       [0.23159369, 0.23137678],
       [0.2316786 , 0.23146158]])

Since your input features must be interpretable as probabilities, you'll want to think about what normalization, if any, is reasonable for the particular problem you are solving.
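For instance, here's a sketch of one alternative: `sklearn.preprocessing.normalize` scales each *row* to unit norm by default, whereas `MinMaxScaler` maps each *feature* into the [0, 1] range independently, which may be closer to the "probability a feature is on" interpretation the model assumes (parameter choices like `random_state=0` here are just for reproducibility):

```python
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import BernoulliRBM

iris = datasets.load_iris()

# Map each feature (column) into [0, 1] independently, so every
# value can be read as the probability that the feature is "on".
X = MinMaxScaler().fit_transform(iris.data)

model = BernoulliRBM(n_components=2, random_state=0)
scores = model.fit_transform(X)

print(scores.shape)  # (150, 2)
print(scores[:3])    # hidden-unit activation probabilities, strictly in (0, 1)
```

Whether per-row or per-feature scaling is appropriate depends on what your features mean, so treat this as something to experiment with rather than a fix.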

Ryan Walker
  • It works, but it is not able to separate 3 species of iris dataset. What does this mean? – rnso Apr 09 '18 at 02:42
  • I think it fails to separate because your features, even when normalized, don't represent probabilities. You can try other normalization strategies but I suspect this model (at least as implemented in sklearn) isn't a good choice for representing continuous features (length, width, etc.). – Ryan Walker Apr 09 '18 at 14:21
  • What is the main situation where BernoulliRBM is especially helpful? – rnso Apr 09 '18 at 14:25
  • When your features have a natural scaling into 0-1 range. Some examples are image classification and bag of words text models. http://scikit-learn.org/stable/auto_examples/neural_networks/plot_rbm_logistic_classification.html – Ryan Walker Apr 09 '18 at 14:35