SMOTE function in R

Question

Does anyone know how to set perc.over and perc.under in my case? I tried a couple of combinations, but they did not give me good results. I want my target variable to be split roughly 50/50. My training set has 266776 rows, and the current ratio of the target variable in it is 88/12. Here is my code: smoted_data <- SMOTE(Response ~ ., data = train, perc.over = 100)
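To see what a given perc.over / perc.under pair produces, the class counts can be worked out before running SMOTE. The sketch below assumes the behaviour documented for DMwR::SMOTE: with perc.over >= 100, each minority case yields floor(perc.over / 100) synthetic cases (below 100, only that fraction of the minority cases is used, one synthetic case each); perc.under / 100 majority cases are sampled per synthetic case; and the original minority cases are kept. The helper name smote_counts is hypothetical, and the minority count is only approximated from the stated 88/12 split of 266776 rows.

```r
# Hypothetical helper: predicted class counts after DMwR::SMOTE, assuming
# its documented rules for perc.over and perc.under.
smote_counts <- function(n_min, perc.over, perc.under) {
  # Number of synthetic minority cases
  n_syn <- if (perc.over < 100) {
    as.integer(perc.over / 100 * n_min)   # only a fraction of cases is SMOTEd
  } else {
    as.integer(perc.over / 100) * n_min   # perc.over counts in whole hundreds
  }
  # Majority cases sampled: perc.under percent of the synthetic count
  n_maj <- as.integer(perc.under / 100 * n_syn)
  c(minority = n_min + n_syn, majority = n_maj)
}

n_min <- round(0.12 * 266776)  # approx. 32013 minority rows (12% of the training set)

# The call from the question, with SMOTE's default perc.under = 200
print(smote_counts(n_min, perc.over = 100, perc.under = 200))
```

Under these assumptions, the call in the question (perc.over = 100 with the default perc.under = 200) should already give about 64026 rows of each class, i.e. a 50/50 split.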

If I am not mistaken, this should create a balanced data set: smoted_data <- SMOTE(Response ~ ., data, perc.over = 100 * 88/12, perc.under = 100 + 12/88 * 100) — DPH, Dec 12 '20 at 19:38
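Under the same assumed DMwR semantics, with k = floor(perc.over / 100) synthetic cases per minority case, the minority class ends up at (k + 1) * n_min rows and the majority sample at perc.under / 100 * k * n_min rows, so an exact 50/50 split needs perc.under = 100 * (k + 1) / k. The hypothetical helper below computes that value and compares it with the formula in this comment; because perc.over is truncated to whole hundreds, 100 * 88/12 acts like perc.over = 700.

```r
# Hypothetical helper: perc.under that balances the classes for a given
# perc.over (>= 100), assuming DMwR::SMOTE truncates perc.over / 100.
balanced_perc_under <- function(perc.over) {
  stopifnot(perc.over >= 100)
  k <- floor(perc.over / 100)   # synthetic cases per minority case
  100 * (k + 1) / k             # majority sample must equal (k + 1) * n_min
}

print(balanced_perc_under(100))   # 200
print(balanced_perc_under(200))   # 150

# Formula from the comment: perc.over = 100 * 88/12 behaves like 700
print(balanced_perc_under(100 * 88 / 12))   # about 114.29
print(100 + 12 / 88 * 100)                  # about 113.64, close but not exact
```

Raising perc.over mainly makes the resulting data set larger; the ratio is controlled by the pair of values together.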
@DPH, unfortunately, I got an error: Error in factor(newCases[, a], levels = 1:nlevels(data[, a]), labels = levels(data[, : invalid 'labels'; length 0 should be 1 or 2 In addition: Warning messages: 1: In smote.exs(data[minExs, ], ncol(data), perc.over, k) : NAs introduced by coercion 2: In smote.exs(data[minExs, ], ncol(data), perc.over, k) : NAs introduced by coercion 3: In smote.exs(data[minExs, ], ncol(data), perc.over, k) : NAs introduced by coercion — Gracetam, Dec 12 '20 at 21:00
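The warnings and the error are consistent with character (not factor) predictor columns. As far as I can tell from the DMwR source, SMOTE turns nominal columns into integer codes with as.integer() and later rebuilds them with factor(..., levels = 1:nlevels(col), labels = levels(col)). The base-R sketch below replays those two steps on a hypothetical character column; it does not call DMwR itself.

```r
# A character column, as in the poster's data (hypothetical values)
gender <- c("Female", "Male")

# Step 1: coercing characters to integer gives NA with the warning
# "NAs introduced by coercion" (suppressed here)
codes <- suppressWarnings(as.integer(gender))
print(codes)  # both NA

# Step 2: a character vector has no levels (nlevels() is 0, levels() is NULL),
# so rebuilding the factor fails with the "invalid 'labels'" error
err <- tryCatch(
  factor(codes, levels = 1:nlevels(gender), labels = levels(gender)),
  error = function(e) conditionMessage(e)
)
print(err)

# The same two steps on a factor column round-trip correctly
gender_f <- factor(gender)
rebuilt <- factor(as.integer(gender_f),
                  levels = 1:nlevels(gender_f), labels = levels(gender_f))
print(rebuilt)
```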
@Gracetam, that is because I had a typo: smoted_data <- SMOTE(Response ~ ., data = train, perc.over = 100 * 88/12, perc.under = 100 + 12/88 * 100) — DPH, Dec 12 '20 at 21:01
@Gracetam, could you add the output of dput(head(train, 50)) to the question? — DPH, Dec 12 '20 at 21:04
@DPH, I don't think it was a typo. I used this code, but I still got an error. The output (only a selection, since the full output is too long to post here) is: structure(list(id = c(332277L, 30514L), Gender = c("Female", "Male"), Age = c(51L, 24L), Driving_License = c(1L, 1L), Region_Code = c(28, 8, 11), Previously_Insured = c(0L, 1L), Vehicle_Age = c("1-2 Year", "< 1 Year", ">2 Year"), Vehicle_Damage = c("Yes", "No"), Annual_Premium = c(34724, 30834), Policy_Sales_Channel = c(26, 152), Vintage = c(100L, 22L), Response = c(1L, 2L) — Gracetam, Dec 12 '20 at 22:38
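In the dput excerpt, Gender, Vehicle_Age and Vehicle_Damage are character vectors. Assuming DMwR::SMOTE needs factors for both the nominal predictors and the target, a sketch of the usual preparation is to convert them to factors before the call. The small data frame below is hypothetical, loosely modelled on the excerpt; the SMOTE call is left commented out because DMwR may not be installed.

```r
# Hypothetical stand-in for `train` (values loosely modelled on the dput excerpt)
train <- data.frame(
  Gender = c("Female", "Male", "Male", "Female"),
  Age = c(51L, 24L, 38L, 45L),
  Vehicle_Age = c("1-2 Year", "< 1 Year", ">2 Year", "1-2 Year"),
  Vehicle_Damage = c("Yes", "No", "Yes", "No"),
  Response = c(1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor so SMOTE can rebuild it
chr_cols <- vapply(train, is.character, logical(1))
train[chr_cols] <- lapply(train[chr_cols], factor)

# The target must be a factor as well, otherwise levels(train$Response) is NULL
train$Response <- factor(train$Response)

str(train)

# With DMwR installed (assumption), the call would then be e.g.:
# smoted_data <- DMwR::SMOTE(Response ~ ., data = train,
#                            perc.over = 100, perc.under = 200)
```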
@DPH, the error was the same: Error in factor(newCases[, a], levels = 1:nlevels(data[, a]), labels = levels(data[, : invalid 'labels'; length 0 should be 1 or 2 In addition: Warning messages: 1: In smote.exs(data[minExs, ], ncol(data), perc.over, k) : NAs introduced by coercion 2: In smote.exs(data[minExs, ], ncol(data), perc.over, k) : NAs introduced by coercion 3: In smote.exs(data[minExs, ], ncol(data), perc.over, k) : NAs introduced by coercion — Gracetam, Dec 12 '20 at 22:42
@Gracetam, convert it to character with as.character(), and after SMOTE turn it back into a factor with as.factor(). — DPH, Dec 12 '20 at 22:44
@DPH, I got an error: Error in T[i, ] : subscript out of bounds — Gracetam, Dec 12 '20 at 23:08
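Turning the target into a character vector would explain this new error. Assuming DMwR::SMOTE picks the minority class from levels() of the target and then loops for (i in 1:nT) over the minority rows, a target without levels yields zero minority rows, and 1:0 is c(1, 0) rather than an empty sequence. The base-R sketch below replays that chain on hypothetical values; it is not DMwR's actual code (the matrix is named T_min here to avoid masking R's T).

```r
# Hypothetical target left as a character vector
response <- c("0", "1", "0", "0")

# levels() of a character vector is NULL, so no minority class is found
minCl <- levels(response)[which.min(table(response))]
minExs <- which(response == minCl)   # integer(0): no minority rows

# Matrix of minority cases (here 0 rows)
T_min <- matrix(nrow = length(minExs), ncol = 2)
nT <- nrow(T_min)

# 1:nT is c(1, 0) when nT is 0, so the loop still runs and indexes row 1
err <- tryCatch({
  for (i in 1:nT) T_min[i, ]
  "no error"
}, error = function(e) conditionMessage(e))
print(err)

# With a factor target, the minority class is found as intended
response_f <- factor(response)
print(levels(response_f)[which.min(table(response_f))])  # "1"
```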
@Gracetam, I am afraid a reproducible example is needed: https://www.tidyverse.org/help/. Please make one from your data and check whether it produces the same error; if so, edit your original question to include the reprex. — DPH, Dec 13 '20 at 00:16
