  1. In scikit-learn, how many neurons are in the output layer? As documented, you can only specify the hidden layers and the number of neurons in each, but nothing about the output layer, so I am not sure how scikit-learn implements it.
  2. Does it make sense to use the softmax activation function for an output layer that has only a single neuron?
Medo
  • I guess it should be equal to `# of classes`, no? – MaxU - stand with Ukraine Nov 17 '17 at 22:15
  • @MaxU If I have binary classes, does it mean that I should have two neurons in the output layer, both having `softmax`? – Medo Nov 17 '17 at 22:17
  • if you have only one binary class (either `0` or `1`), then one output neuron should be enough. How many unique values does your `y` (target) data set have, and what shape does it have? – MaxU - stand with Ukraine Nov 17 '17 at 22:24
  • @MaxU Thanks for your response. My dataset has `10K` samples of `64` features. The class label is binary (`1` or `-1`). Do you think the output activation function `softmax` would work here? I am thinking of `tanh` as well – Medo Nov 17 '17 at 22:55

1 Answer


Test:

Setup:

    In [227]: %paste
    import numpy as np
    import pandas as pd
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    clf = MLPClassifier()

    m = 10**3
    n = 64

    df = pd.DataFrame(np.random.randint(100, size=(m, n))).add_prefix('x') \
           .assign(y=np.random.choice([-1, 1], m))

    X_train, X_test, y_train, y_test = \
        train_test_split(df.drop('y', axis=1), df['y'], test_size=0.2, random_state=33)

    clf.fit(X_train, y_train)
    ## -- End pasted text --
    Out[227]:
    MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
           beta_2=0.999, early_stopping=False, epsilon=1e-08,
           hidden_layer_sizes=(100,), learning_rate='constant',
           learning_rate_init=0.001, max_iter=200, momentum=0.9,
           nesterovs_momentum=True, power_t=0.5, random_state=None,
           shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
           verbose=False, warm_start=False)

Number of outputs:

    In [229]: clf.n_outputs_
    Out[229]: 1

Number of layers:

    In [228]: clf.n_layers_
    Out[228]: 3

The number of iterations the solver has run:

    In [230]: clf.n_iter_
    Out[230]: 60
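
As an extra check (my addition, not from the original answer), the fitted `coefs_` attribute holds one weight matrix per layer transition, so its shapes should confirm the single output neuron for this setup:

    # Continuation of the same session (sketch): coefs_[i] maps layer i to
    # layer i+1. With 64 features, one hidden layer of 100 units and a
    # binary target, the last matrix should have exactly one column.
    print([c.shape for c in clf.coefs_])   # expected: [(64, 100), (100, 1)]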

Here is an excerpt of the source code where the activation function for the output layer is chosen:

    # Output for regression
    if not is_classifier(self):
        self.out_activation_ = 'identity'
    # Output for multi class
    elif self._label_binarizer.y_type_ == 'multiclass':
        self.out_activation_ = 'softmax'
    # Output for binary class and multi-label
    else:
        self.out_activation_ = 'logistic'
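
Since binary targets fall into the final `else` branch, a binary MLPClassifier ends up with a single logistic output neuron rather than a one-neuron softmax (softmax over a single logit would always return `1.0`). A quick sketch of how to confirm this (my addition, using only the documented `out_activation_` fitted attribute):

    # Sketch: check which output activation fit() selected for binary vs.
    # multiclass targets. Uses random toy data, so expect a convergence
    # warning with so few iterations.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X = np.random.rand(200, 4)

    clf_bin = MLPClassifier(max_iter=100).fit(X, np.random.choice([-1, 1], 200))
    clf_multi = MLPClassifier(max_iter=100).fit(X, np.random.choice([0, 1, 2], 200))

    print(clf_bin.out_activation_)    # 'logistic' -> one output neuron
    print(clf_multi.out_activation_)  # 'softmax'  -> one neuron per class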

UPDATE: MLPClassifier binarizes labels internally (in a one-vs-all fashion), so the logistic output activation should also work well with labels that differ from `[0, 1]`:

    if not incremental:
        self._label_binarizer = LabelBinarizer()
        self._label_binarizer.fit(y)
        self.classes_ = self._label_binarizer.classes_
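
To see what that means for `1`/`-1` targets, here is a small illustration (my addition, showing only standard `LabelBinarizer` behaviour): the two labels are mapped to a single `0`/`1` column, which is exactly the form the logistic output neuron is trained against, while `classes_` keeps the original label values for `predict`.

    # Sketch: how LabelBinarizer maps a binary -1/1 target internally.
    from sklearn.preprocessing import LabelBinarizer

    lb = LabelBinarizer()
    print(lb.fit_transform([-1, 1, 1, -1]))  # single 0/1 column: [[0], [1], [1], [0]]
    print(lb.classes_)                       # original labels kept: [-1  1]
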
MaxU - stand with Ukraine
  • So `logistic` can work when the classes are `1` or `-1`? I thought it is only effective when the classes are binary, i.e., `0` or `1`, while others such as `tanh` work better for `1`/`-1` classes, as stated at https://en.wikipedia.org/wiki/Activation_function – Medo Nov 17 '17 at 23:16