  1. In scikit-learn, how many neurons are in the output layer? As documented, you can only specify the hidden layers and the number of neurons in each, but nothing about the output layer, so I am not sure how scikit-learn implements it.
  2. Does it make sense to use the softmax activation function for an output layer that has only a single neuron?
Medo
  • I guess it should be equal to `# of classes`, no? – MaxU - stand with Ukraine Nov 17 '17 at 22:15
  • @MaxU If I have binary classes, does it mean that I should have two neurons in the output layer, both having `softmax`? – Medo Nov 17 '17 at 22:17
  • if you have only one binary class (either `0` or `1`), then one output neuron should be enough. How many unique values does your `y` (target) data set have, and what shape does it have? – MaxU - stand with Ukraine Nov 17 '17 at 22:24
  • @MaxU Thanks for your response. My dataset has `10K` samples of `64` features. The class label is binary (`1` or `-1`). Do you think the output activation function `softmax` would work here? I am thinking of `tanh` as well – Medo Nov 17 '17 at 22:55

1 Answer


Test:

Setup:

    In [227]: %paste
    import numpy as np
    import pandas as pd
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    clf = MLPClassifier()

    m = 10**3
    n = 64

    df = pd.DataFrame(np.random.randint(100, size=(m, n))).add_prefix('x') \
           .assign(y=np.random.choice([-1, 1], m))

    X_train, X_test, y_train, y_test = \
        train_test_split(df.drop('y', axis=1), df['y'], test_size=0.2, random_state=33)

    clf.fit(X_train, y_train)
    ## -- End pasted text --
    Out[227]:
    MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
           beta_2=0.999, early_stopping=False, epsilon=1e-08,
           hidden_layer_sizes=(100,), learning_rate='constant',
           learning_rate_init=0.001, max_iter=200, momentum=0.9,
           nesterovs_momentum=True, power_t=0.5, random_state=None,
           shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
           verbose=False, warm_start=False)

Number of outputs:

    In [229]: clf.n_outputs_
    Out[229]: 1

Number of layers:

    In [228]: clf.n_layers_
    Out[228]: 3

The number of iterations the solver has run:

    In [230]: clf.n_iter_
    Out[230]: 60
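
As an extra check (my addition, not from the original answer), the fitted `coefs_` attribute holds one weight matrix per layer transition, so its shapes should confirm the single output neuron for this setup:

    # Continuation of the same session (sketch): coefs_[i] maps layer i to
    # layer i+1. With 64 features, one hidden layer of 100 units and a
    # binary target, the last matrix should have exactly one column.
    print([c.shape for c in clf.coefs_])   # expected: [(64, 100), (100, 1)]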

Here is an excerpt of the source code where the activation function for the output layer is chosen:

    # Output for regression
    if not is_classifier(self):
        self.out_activation_ = 'identity'
    # Output for multi class
    elif self._label_binarizer.y_type_ == 'multiclass':
        self.out_activation_ = 'softmax'
    # Output for binary class and multi-label
    else:
        self.out_activation_ = 'logistic'
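
Since binary targets fall into the final `else` branch, a binary MLPClassifier ends up with a single logistic output neuron rather than a one-neuron softmax (softmax over a single logit would always return `1.0`). A quick sketch of how to confirm this (my addition, using only the documented `out_activation_` fitted attribute):

    # Sketch: check which output activation fit() selected for binary vs.
    # multiclass targets. Uses random toy data, so expect a convergence
    # warning with so few iterations.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X = np.random.rand(200, 4)

    clf_bin = MLPClassifier(max_iter=100).fit(X, np.random.choice([-1, 1], 200))
    clf_multi = MLPClassifier(max_iter=100).fit(X, np.random.choice([0, 1, 2], 200))

    print(clf_bin.out_activation_)    # 'logistic' -> one output neuron
    print(clf_multi.out_activation_)  # 'softmax'  -> one neuron per class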

UPDATE: MLPClassifier binarizes labels internally (in a one-vs-all fashion), so the logistic output activation should also work well with labels that differ from `[0, 1]`:

    if not incremental:
        self._label_binarizer = LabelBinarizer()
        self._label_binarizer.fit(y)
        self.classes_ = self._label_binarizer.classes_
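
To see what that means for `1`/`-1` targets, here is a small illustration (my addition, showing only standard `LabelBinarizer` behaviour): the two labels are mapped to a single `0`/`1` column, which is exactly the form the logistic output neuron is trained against, while `classes_` keeps the original label values for `predict`.

    # Sketch: how LabelBinarizer maps a binary -1/1 target internally.
    from sklearn.preprocessing import LabelBinarizer

    lb = LabelBinarizer()
    print(lb.fit_transform([-1, 1, 1, -1]))  # single 0/1 column: [[0], [1], [1], [0]]
    print(lb.classes_)                       # original labels kept: [-1  1]
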
MaxU - stand with Ukraine
  • So `logistic` can work when the classes are `1` or `-1`? I thought it is only effective when the classes are binary, i.e., `0` or `1`, while others such as `tanh` work better for `1`/`-1` classes, as stated at https://en.wikipedia.org/wiki/Activation_function – Medo Nov 17 '17 at 23:16