If a non-numeric (categorical) variable in my data set has many rows in one category but very few in another, does this cause the same issues as when the target classes are unbalanced?
For example, suppose one of my variables is title and the aim is to predict whether a person is obese. The obese class is split 50:50, but there is only one row with the title 'Duke', and that row is in the obese class. Does this mean that an algorithm like logistic regression (after numeric encoding) would start predicting that all Dukes are obese (or give a disproportionate weight to the title 'Duke')? If so, are some algorithms better or worse at handling this case? Is there a way to prevent this issue?
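
To make the scenario concrete, here is a minimal sketch in pure Python. Everything in it is an assumption for illustration: the data are made up (the titles 'Mr' and 'Ms' and all counts are hypothetical), the encoding is one-hot with one weight per title and no intercept, and `fit_title_weights` is a hand-rolled gradient-descent fit, not any library's implementation. It compares an unpenalised fit with an L2-penalised one (the `l2` value is arbitrary).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_title_weights(rows, l2, steps=5000, lr=0.5):
    """Logistic regression on a one-hot encoded title, no intercept.

    With one-hot encoding each row activates exactly one weight, so the
    model is logit(p) = w[title]. Minimises mean log loss plus
    (l2 / 2) * sum(w**2) by plain gradient descent.
    """
    titles = sorted({title for title, _ in rows})
    w = {title: 0.0 for title in titles}
    n = len(rows)
    for _ in range(steps):
        # Gradient of the L2 penalty term.
        grad = {title: l2 * w[title] for title in titles}
        # Gradient of the mean log loss.
        for title, label in rows:
            grad[title] += (sigmoid(w[title]) - label) / n
        for title in titles:
            w[title] -= lr * grad[title]
    return w

# Hypothetical data: target is split 100:100, and the only 'Duke' is obese.
rows = (
    [("Mr", 1)] * 50 + [("Mr", 0)] * 50
    + [("Ms", 1)] * 49 + [("Ms", 0)] * 50
    + [("Duke", 1)]
)

w_plain = fit_title_weights(rows, l2=0.0)   # no regularisation
w_l2 = fit_title_weights(rows, l2=0.01)     # arbitrary L2 strength

p_duke_plain = sigmoid(w_plain["Duke"])
p_duke_l2 = sigmoid(w_l2["Duke"])

print("P(obese | Duke), no penalty:", round(p_duke_plain, 3))
print("P(obese | Duke), L2 penalty:", round(p_duke_l2, 3))
```

In this toy setup the single 'Duke' row is perfectly separated, so without a penalty its weight keeps growing the longer the fit runs and the predicted probability drifts toward 1. The L2 penalty shrinks that weight toward zero, which keeps the prediction for 'Duke' closer to 0.5. The well-populated titles are barely affected either way.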