I wrote this code to classify my data, which has 5000 instances and 260 features. Each feature is binary, e.g. if the word "money" appears in the instance I am categorizing, then feature 23 is 1, otherwise 0, and so on. There are 4 categories. When I compute the final classes, the error is 57%. In most cases, the desired probability P(y=c|x) comes out as 0 for every c. Even in the correctly classified cases, the maximum of this value is tiny, e.g. P(y=1|x) = e^-80, but the others are even smaller, so class 1 is selected, which is correct. So I guess the problem is that the values are too small and underflow to 0. How can I solve this? I've seen that working with log-probabilities may be better, but how can I implement this logarithmically? Thank you in advance.
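For reference, here is a minimal sketch of what a log-space version might look like. It is not the appended code: it is written in plain Python, assumes a Bernoulli naive Bayes model with Laplace smoothing (`alpha=1.0` is an assumed choice), and assumes every class has at least one training instance. The function names `fit_bernoulli_nb`, `log_posterior` and `predict` are hypothetical. The idea is to add log-probabilities instead of multiplying probabilities, and to normalize with the log-sum-exp trick so nothing underflows.

```python
import math


def fit_bernoulli_nb(X, y, classes, alpha=1.0):
    """Estimate the log-prior and per-feature log-probabilities for each class.

    X: list of rows of 0/1 features, y: list of class labels.
    alpha is the Laplace smoothing constant (an assumed choice); it keeps
    every estimated probability strictly between 0 and 1, so no log(0).
    """
    n = len(X)
    n_features = len(X[0])
    model = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        # Smoothed estimate of P(feature j = 1 | class c).
        p = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2.0 * alpha)
             for j in range(n_features)]
        model[c] = (
            math.log(len(rows) / n),        # log P(y = c)
            [math.log(pj) for pj in p],     # log P(x_j = 1 | c)
            [math.log1p(-pj) for pj in p],  # log P(x_j = 0 | c)
        )
    return model


def log_posterior(x, model):
    """Return log P(y = c | x) for every class c, computed in log space."""
    log_joint = {}
    for c, (log_prior, log_p, log_1mp) in model.items():
        # Sum of logs replaces the product of 260 small probabilities.
        log_joint[c] = log_prior + sum(
            lp if xj else lq for xj, lp, lq in zip(x, log_p, log_1mp))
    # Log-sum-exp: subtract the maximum before exponentiating.
    m = max(log_joint.values())
    log_evidence = m + math.log(
        sum(math.exp(v - m) for v in log_joint.values()))
    return {c: v - log_evidence for c, v in log_joint.items()}


def predict(x, model):
    """Pick the class with the largest log-posterior."""
    lp = log_posterior(x, model)
    return max(lp, key=lp.get)
```

If only the predicted class is needed, the normalization step can be skipped entirely, since the argmax of the log joint is the same as the argmax of the posterior.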
I am putting the code as an appendix in case anything is missing or wrong in it. labels = the data classes, normalized features = the data, where the rows are the instances and the columns are the features. Thanks again.