
I am using multinom from the nnet package to fit a multinomial logistic regression model to data with 3 classes, but the class prevalence is unbalanced. I would like to assign weights/penalties so that the model avoids misclassifying a particular class. Here is my code and a slice of my data:

 mnm <- multinom(formula = cut.rank ~ ., data = training.logist, trace = FALSE, maxit = 1000, weights=c(10,5,1))

> str(head(training.logist))
'data.frame':   6 obs. of  15 variables:
 $ is_top_rated_listing                       : Factor w/ 2 levels "0","1": 1 1 1 2 2 2
 $ seller_is_top_rated_seller                 : int  1 1 1 1 1 1
 $ is_auto_pay                                : Factor w/ 2 levels "0","1": 2 2 2 2 2 2
 $ is_returns_accepted                        : Factor w/ 2 levels "0","1": 2 2 2 2 2 2
 $ seller_feedback_rating_star                : Factor w/ 11 levels "Blue","Green",..: 7 7 7 9 9 9
 $ keywords_title_assoc                       : num  1 1 1 1 1 1
 $ normalized.price_shipping                  : num  0 0 0.00871 0.01853 0.01853 ...
 $ normalized.seller_feedback_score           : num  0.7117 0.8791 0.0966 0.095 0.095 ...
 $ normalized.seller_positive_feedback_percent: num  0.7117 0.8791 0.0966 0.095 0.095 ...
 $ item_condition                             : Factor w/ 2 levels "New","New other (see details)": 1 1 1 1 1 1
 $ listing_type                               : Factor w/ 2 levels "FixedPrice","StoreInventory": 2 2 2 1 1 1
 $ best_offer_enabled                         : Factor w/ 2 levels "0","1": 1 1 1 1 1 1
 $ shipping_handling_time                     : int  10 10 10 1 1 1
 $ shipping_locations                         : Factor w/ 7 levels "AU,Americas,Europe,Asia",..: 5 5 5 5 5 5
 $ cut.rank                                   : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1
> 

Does anyone have an idea how to assign misclassification penalties? Specifically, I would like to assign a penalty ratio of 10:5:1 (corresponding to classes 1, 2, 3), meaning I most want to be accurate on class 1. The distribution of my target variable cut.rank is approximately 0.04, 0.08, 0.88. Because class 1 has low prevalence, the model's sensitivity for that class is low.
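For reference, the `weights` argument of multinom takes case weights, one value per row of the data, so a length-3 vector such as c(10,5,1) does not map onto the three classes. A minimal sketch of one possible approach is to expand the 10:5:1 ratio into a per-observation vector by looking up each row's class. The data frame `toy`, its predictors `x1` and `x2`, and the names `class.weights` and `obs.weights` are hypothetical stand-ins for illustration, not part of my real data. Case weights rescale each row's contribution to the likelihood, which is related to, but not the same as, a true misclassification cost.

```r
library(nnet)

# Hypothetical stand-in for training.logist, with roughly the same
# class imbalance as described above (about 0.04 / 0.08 / 0.88).
set.seed(1)
n <- 500
toy <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  cut.rank = factor(
    sample(c("1", "2", "3"), n, replace = TRUE, prob = c(0.04, 0.08, 0.88)),
    levels = c("1", "2", "3")
  )
)

# Desired 10:5:1 ratio, named by class level (assumed weighting scheme).
class.weights <- c("1" = 10, "2" = 5, "3" = 1)

# Expand to one weight per observation by looking up each row's class.
obs.weights <- unname(class.weights[as.character(toy$cut.rank)])

# Keep the weights as a separate vector rather than a column of the data,
# so that a formula like `cut.rank ~ .` would not pick them up as a predictor.
mnm <- multinom(cut.rank ~ x1 + x2, data = toy, weights = obs.weights,
                trace = FALSE, maxit = 1000)

# Confusion table on the training data, to inspect per-class sensitivity.
conf.table <- table(predicted = predict(mnm, toy), actual = toy$cut.rank)
print(conf.table)
```

With the real data, the same lookup would use training.logist$cut.rank in place of toy$cut.rank. Whether this raises the sensitivity for class 1 enough would still need to be checked on held-out data.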

user3628777

0 Answers