I have seen threads with similar questions/problems, but I have not found this exact issue.
Suppose I train a NN with the following cost function:
J(theta) = 1/m * sum(sum( -y * log(h(x)) - ( 1 - y ) * log(1-h(x)) ))
and use the sigmoid function as the activation function.
Now, e.g. for cancer detection, in a cross-validation (CV) test I get a precision of 0.6 and a recall of 0.6. If I want a different trade-off between precision and recall (e.g. lower precision but higher recall), I can just change the threshold of the prediction function (i.e. h(output_layer) > threshold). I guess I could also
- change the NN architecture,
- change the training set,
- change the regularization parameter,
and I would get a different result.
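To make the threshold idea concrete, here is a minimal sketch in plain Python. The scores and labels are made up for illustration (they are not my real data), and precision_recall is a hypothetical helper, not a library function. It only shows how lowering the threshold trades precision for recall on a fixed set of network outputs.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting positive for score > threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s > threshold and y == 1)
    fp = sum(1 for s, y in pairs if s > threshold and y == 0)
    fn = sum(1 for s, y in pairs if s <= threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up sigmoid outputs h(x) and true labels (assumption, for illustration only)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,    1,   0,    0,   1,   0]

for t in (0.5, 0.25):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.5: precision=0.60, recall=0.60
# threshold=0.25: precision=0.50, recall=0.80
```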
But what if I do NOT want to change the architecture of the NN? Is changing the threshold of the prediction function really a smart approach? I see it like this: we train our NN with the sigmoid function (which, roughly speaking, checks whether the activation of a node is below or above 0.5). Then, after training the network with this lower-or-higher-than-0.5 approach, we change the final prediction threshold to some other value. I do not think this gives the optimal precision/recall ratio (or F1 score) that is possible with a given training set and NN architecture. In other words, I do not think we 'walk along' the optimal ROC curve. Is that correct?
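As far as I understand, sweeping the threshold over the outputs of one trained network traces out that network's ROC curve, one point per threshold. Here is a minimal sketch of what I mean by 'walking along' the curve, again with made-up scores and labels and a hypothetical helper roc_points:

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs for every distinct threshold, highest threshold first."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Made-up sigmoid outputs and true labels (assumption, for illustration only)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,    1,   0,    0,   1,   0]

for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.1f}  TPR={tpr:.1f}")
```

This only describes the curve of the network as it was trained. Whether a differently trained network with the same architecture could reach a better curve is exactly the part I am unsure about.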
My 2 thoughts on how to come up with a better solution:
1.) Change the activation function, either to a completely different function or by shifting the sigmoid function (e.g. sigmoid_new = 0.1 + sigmoid_original). That way I would get more activation and, I guess, more recall in the end.
2.) Change the cost function (!), e.g. to
J(theta) = 1/m * sum(sum( -ALPHA * y * log(h(x)) - ( 1 - y ) * log(1-h(x)) ))
With this scalar alpha I could punish the -y * log(h(x)) error more (alpha > 1) or less (alpha < 1). But would I also need to change the backpropagation and/or the gradient calculation if I change the cost function?
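To check the gradient question for the alpha-weighted cost myself, here is a minimal sketch for a single sigmoid output unit. It assumes h = sigmoid(z) and looks at one example only (the 1/m and the sums are left out). The function names are hypothetical. Differentiating the weighted cost with respect to z gives alpha * y * (h - 1) + (1 - y) * h, which reduces to the usual h - y for alpha = 1, and the sketch compares this with a numerical gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def weighted_cost(z, y, alpha):
    """Per-example cost: -alpha*y*log(h) - (1-y)*log(1-h), with h = sigmoid(z)."""
    h = sigmoid(z)
    return -alpha * y * math.log(h) - (1 - y) * math.log(1 - h)

def output_delta(z, y, alpha):
    """Analytic dJ/dz at the output unit; equals h - y when alpha == 1."""
    h = sigmoid(z)
    return alpha * y * (h - 1) + (1 - y) * h

def numeric_delta(z, y, alpha, eps=1e-6):
    """Central-difference approximation of dJ/dz, used as a gradient check."""
    return (weighted_cost(z + eps, y, alpha)
            - weighted_cost(z - eps, y, alpha)) / (2 * eps)

# Arbitrary test values (assumption, for illustration only)
for alpha in (1.0, 3.0):
    for y in (0, 1):
        for z in (-2.0, 0.0, 1.5):
            diff = abs(output_delta(z, y, alpha) - numeric_delta(z, y, alpha))
            print(f"alpha={alpha} y={y} z={z}: |analytic - numeric| = {diff:.2e}")
```

If this is right, only the error term at the output layer would change. The hidden layers just propagate that term backwards, so the rest of backpropagation should stay the same, but I would be glad to have this confirmed.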
I'd appreciate any help, links or thoughts on this topic :-)
Best, Wolfgang