The number of layers is irrelevant at this stage. If you use softmax in the output layer, the loss should be either categorical_crossentropy
or sparse_categorical_crossentropy
, depending on whether you one-hot-encoded the targets or not. Pairing a softmax
output activation with loss='binary_crossentropy'
is inconsistent, and the output is likely to be wacky.
model.add(Dense(2, activation='softmax'))  # 2 units because it's a two-class problem
model.compile(loss='categorical_crossentropy',
              optimizer='adagrad',  # the optimizer can be whatever works best for you
              metrics=['accuracy'])
Whether to use softmax
or sigmoid
depends on how you frame the classification problem: is it 'A vs. NOT A' (one sigmoid unit) or 'A vs. B' (two softmax units)? Plot the model performance for both, compare, and draw conclusions.
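For a two-class problem the two framings are in fact mathematically equivalent: a single sigmoid unit with binary cross-entropy gives the same loss as a two-unit softmax with categorical cross-entropy, provided the logits correspond. A small NumPy sketch (the logits and targets below are made-up illustrative values, not from any real model) demonstrates this:

```python
import numpy as np

# Hypothetical per-sample logits and integer targets for a 2-class problem.
z = np.array([1.3, -0.7, 0.2])   # one logit per sample (sigmoid head)
y = np.array([1, 0, 1])          # true class labels

# Sigmoid head + binary cross-entropy (one output unit, 'A vs NOT A').
p_sig = 1.0 / (1.0 + np.exp(-z))
bce = -np.mean(y * np.log(p_sig) + (1 - y) * np.log(1 - p_sig))

# Softmax head + categorical cross-entropy (two output units, 'A vs B').
# Using logits [0, z] makes the class-1 probability equal sigmoid(z).
logits = np.stack([np.zeros_like(z), z], axis=1)
exp = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
p_soft = exp / exp.sum(axis=1, keepdims=True)
cce = -np.mean(np.log(p_soft[np.arange(len(y)), y]))

# The two losses coincide for the two-class case.
assert np.allclose(bce, cce)
```

So for exactly two classes the choice is mostly a matter of convention; the critical part is matching the loss to the activation, as above.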