Does anyone know how to set perc.over and perc.under in my case? I tried a couple of combinations, but none of them gave a good result. I want my target variable to be split almost 50/50. My training set has 266776 rows, and the current ratio of the target variable in this dataset is 88/12. Here is my code: smoted_data <- SMOTE(Response ~ ., data = train, perc.over = 100)
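
As a rough sketch of the arithmetic, the class sizes implied by a given perc.over and perc.under can be estimated without running SMOTE at all. This assumes the SMOTE() in question behaves as documented for DMwR: perc.over/100 synthetic cases are generated per minority case (truncated to a whole number), and perc.under/100 majority cases are sampled per synthetic case, with perc.under defaulting to 200. The helper name smote_counts is hypothetical, and the minority count is approximated as 12% of 266776.

```r
# Hypothetical helper: estimate class sizes after a DMwR-style SMOTE call.
# Assumes perc.over >= 100 (below 100 only a subset of minority cases is used).
smote_counts <- function(n_min, perc.over, perc.under) {
  per_case  <- floor(perc.over / 100)               # synthetic cases per minority case
  synthetic <- per_case * n_min                     # total synthetic minority cases
  majority  <- floor(perc.under / 100 * synthetic)  # majority cases that get sampled
  c(minority = n_min + synthetic, majority = majority)
}

n_min <- round(0.12 * 266776)  # approximate minority count (assumption)

# The call from the question: perc.over = 100, perc.under assumed to default to 200
smote_counts(n_min, perc.over = 100, perc.under = 200)

# A larger, roughly balanced set: perc.under = 100 * (1 + 1 / k),
# where k = floor(perc.over / 100) is the number of synthetic cases per minority case
k <- floor((100 * 88 / 12) / 100)
smote_counts(n_min, perc.over = 100 * 88 / 12, perc.under = 100 * (1 + 1 / k))
```

Under these assumptions the call in the question already comes out balanced, just on a much smaller data set, since most majority rows are dropped. The second setting asks for more majority rows than the roughly 234763 available, so it can only be met by sampling majority rows with replacement.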

Gracetam
  • If I am not mistaken, this should create a balanced data set: smoted_data <- SMOTE(Response ~ ., data, perc.over = 100 * 88/12, perc.under = 100 + 12/88 * 100) – DPH Dec 12 '20 at 19:38
  • Which library are you using for SMOTE? – G5W Dec 12 '20 at 19:51
  • @G5W, I use library(DMwR). – Gracetam Dec 12 '20 at 20:44
  • @DPH, unfortunately I got an error: Error in factor(newCases[, a], levels = 1:nlevels(data[, a]), labels = levels(data[, : invalid 'labels'; length 0 should be 1 or 2 In addition: Warning messages: 1: In smote.exs(data[minExs, ], ncol(data), perc.over, k) : NAs introduced by coercion 2: In smote.exs(data[minExs, ], ncol(data), perc.over, k) : NAs introduced by coercion 3: In smote.exs(data[minExs, ], ncol(data), perc.over, k) : NAs introduced by coercion – Gracetam Dec 12 '20 at 21:00
  • @Gracetam That is because I had a typo: smoted_data <- SMOTE(Response ~ ., data = train, perc.over = 100 * 88/12, perc.under = 100 + 12/88 * 100) – DPH Dec 12 '20 at 21:01
  • @Gracetam Could you add the output of dput(head(train, 50)) to the question? – DPH Dec 12 '20 at 21:04
  • @DPH I don't think you had a typo. I used this code, but I still got an error. The output is structure(list(id = c(332277L, 30514L),Gender = c("Female", "Male"),Age = c(51L, 24L),Driving_License = c(1L, 1L),Region_Code = c(28, 8, 11),Previously_Insured = c(0L, 1L),Vehicle_Age = c("1-2 Year", "< 1 Year", ">2 Year"),Vehicle_Damage = c("Yes","No"), Annual_Premium = c(34724, 30834), Policy_Sales_Channel = c(26, 152),Vintage = c(100L, 22L), Response =c(1L, 2L). I only selected some of the values; otherwise it is too much to upload here. – Gracetam Dec 12 '20 at 22:38
  • @DPH the error was the same. Error in factor(newCases[, a], levels = 1:nlevels(data[, a]), labels = levels(data[, : invalid 'labels'; length 0 should be 1 or 2 In addition: Warning messages: 1: In smote.exs(data[minExs, ], ncol(data), perc.over, k) : NAs introduced by coercion 2: In smote.exs(data[minExs, ], ncol(data), perc.over, k) : NAs introduced by coercion 3: In smote.exs(data[minExs, ], ncol(data), perc.over, k) : NAs introduced by coercion – Gracetam Dec 12 '20 at 22:42
  • @DPH Response is set as factor. – Gracetam Dec 12 '20 at 22:42
  • @Gracetam Convert it to character with as.character(), and after SMOTE turn it back into a factor with as.factor(). – DPH Dec 12 '20 at 22:44
  • @DPH I got an error: Error in T[i, ] : subscript out of bounds – Gracetam Dec 12 '20 at 23:08
  • @Gracetam I am afraid a reproducible example is needed: https://www.tidyverse.org/help/ - please create one from your data and check whether it produces the same error; if so, edit your original question to include the reprex. – DPH Dec 13 '20 at 00:16
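
The warnings "NAs introduced by coercion" and the "invalid 'labels'; length 0" error in the thread above would be consistent with character columns (Gender, Vehicle_Age and Vehicle_Damage in the dput output) being passed to SMOTE, although without a reprex this is only a guess. A minimal sketch of one thing to try, assuming that is the cause: convert every character column, and the target, to a factor before calling SMOTE. The data frame toy is a made-up stand-in for train, with values borrowed from the dput output for illustration only.

```r
# Toy stand-in for `train`; rows are made up for illustration
toy <- data.frame(
  Gender         = c("Female", "Male", "Male", "Female"),
  Vehicle_Age    = c("1-2 Year", "< 1 Year", ">2 Year", "1-2 Year"),
  Vehicle_Damage = c("Yes", "No", "No", "Yes"),
  Age            = c(51L, 24L, 33L, 47L),
  Response       = c(1L, 2L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Find the character columns and turn each of them into a factor
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], factor)

# The target is made a factor as well (assumption: SMOTE expects a factor target)
toy$Response <- factor(toy$Response)

# Inspect the resulting column types
str(toy)

# With the real data, the SMOTE call would then be retried, for example:
# smoted_data <- SMOTE(Response ~ ., data = train, perc.over = 100, perc.under = 200)
```

The same two conversion steps would be applied to train itself; numeric columns such as Age are left untouched.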

0 Answers