
I have a dataframe with 25 variables and 70,000 rows. Three of those variables have 1,000, 250 and 250 NA values respectively. How can I replace the NA values using the WOE (weight of evidence) method? Do we need to replace all the columns with WOE values, or is it sufficient to find the WOE for the 3 variables and replace them in the dataframe? We have both continuous and categorical data.

What should the approach be? Please help.
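A common way to use WOE for NAs, instead of imputing them, is to give the missing rows their own category (or bin) and compute a WOE value for it like any other category. The block below is a plain-Python sketch with a hypothetical helper `woe_table`. It assumes a 0/1 target where 1 marks the event ("bad"), defines WOE as ln(%good / %bad) (some texts flip the sign), and adds a small smoothing count so that empty cells do not produce log(0).

```python
import math
from collections import Counter


def woe_table(values, target, smoothing=0.5):
    """Return {category: WOE} for one categorical variable (hypothetical helper).

    None (NA) is mapped to the label "MISSING" so missing rows get their own
    WOE value instead of being imputed. Assumes target == 1 marks the event
    ("bad") and target == 0 the non-event ("good"); WOE = ln(%good / %bad).
    `smoothing` is added to every category's counts to avoid log(0).
    """
    labels = ["MISSING" if v is None else v for v in values]
    categories = set(labels)
    goods = Counter(lab for lab, t in zip(labels, target) if t == 0)
    bads = Counter(lab for lab, t in zip(labels, target) if t == 1)
    # Denominators include the smoothing for every category, so the
    # smoothed distributions of goods and bads each still sum to 1.
    good_total = sum(goods.values()) + smoothing * len(categories)
    bad_total = sum(bads.values()) + smoothing * len(categories)
    return {
        cat: math.log(((goods[cat] + smoothing) / good_total)
                      / ((bads[cat] + smoothing) / bad_total))
        for cat in categories
    }


# Toy example (made-up data): a categorical column with NAs and a 0/1 target.
status = ["single", "married", None, "single", None, "married", "single", None]
default = [1, 0, 1, 0, 1, 0, 0, 0]

woe = woe_table(status, default)
# Replace each value, including NA, by the WOE of its category.
encoded = [woe["MISSING" if v is None else v] for v in status]
print(woe)
print(encoded)
```

In a WOE-based scorecard, every predictor is typically transformed this way (continuous ones after binning), not only the columns that contain NAs, so that all inputs share the same log-odds scale.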

Carl
Rameez Shaik
  • I am not sure we can replace missing values with WOE. WOE is calculated by binning the continuous data into groups, and we also check for monotonicity with the dependent variable. If your task is to replace missing values, it should have been done before calculation of missing values; you can check a package called `mice`, which has a lot of different ways of imputing missing values. You can also sometimes apply business rules to fill missing values; for example, marital status can be guessed from age, location, salary etc. It all depends on how you want to do it. – PKumar Jun 22 '18 at 06:11
  • If your task is to replace missing values, it should have been done before calculation of WOE values (my apologies, this corrects my previous comment). For categorical data we can calculate the WOE directly (no need for binning). Sometimes it's a good idea to remove the entire rows with missing values (the analyst has to see what percentage of missing values is present w.r.t. the overall data). – PKumar Jun 22 '18 at 06:21
  • Thank you very much – Rameez Shaik Jun 22 '18 at 06:22
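The binning and monotonicity check described in the first comment, together with the missing-percentage check from the second, could look roughly like this. It is a plain-Python sketch with hypothetical helper names (`quantile_edges`, `bin_value`, `woe_by_bin`, `is_monotonic`), assuming approximately equal-frequency binning, a 0/1 target with 1 = bad, and WOE = ln(%good / %bad). The NAs get their own "MISSING" bin rather than being imputed, and that bin is left out of the monotonicity check.

```python
import math
from bisect import bisect_right
from collections import Counter


def quantile_edges(values, n_bins):
    """Approximate equal-frequency cut points from the non-missing values.

    Assumes at least one non-missing value. Duplicate cut points (from tied
    values) are dropped, so fewer than n_bins bins may result.
    """
    xs = sorted(v for v in values if v is not None)
    edges = [xs[len(xs) * k // n_bins] for k in range(1, n_bins)]
    return sorted(set(edges))


def bin_value(v, edges):
    """Bin index 0..len(edges) for a number, or "MISSING" for None."""
    return "MISSING" if v is None else bisect_right(edges, v)


def woe_by_bin(bins, target, smoothing=0.5):
    """WOE = ln(%good / %bad) per bin; target 1 = bad, 0 = good (assumed)."""
    labels = set(bins)
    goods = Counter(b for b, t in zip(bins, target) if t == 0)
    bads = Counter(b for b, t in zip(bins, target) if t == 1)
    good_total = sum(goods.values()) + smoothing * len(labels)
    bad_total = sum(bads.values()) + smoothing * len(labels)
    return {
        b: math.log(((goods[b] + smoothing) / good_total)
                    / ((bads[b] + smoothing) / bad_total))
        for b in labels
    }


def is_monotonic(woe):
    """True if WOE never decreases, or never increases, across numeric bins.

    The MISSING bin has no natural position, so it is left out of the check.
    """
    seq = [woe[b] for b in sorted(b for b in woe if b != "MISSING")]
    steps = list(zip(seq, seq[1:]))
    return all(a <= b for a, b in steps) or all(a >= b for a, b in steps)


# Toy example (made-up data): a continuous column with NAs and a 0/1 target.
income = [10, 20, 30, 40, 50, 60, 70, 80, None, None]
default = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]

# Share of missing values, to judge whether dropping those rows is acceptable.
missing_pct = 100 * sum(v is None for v in income) / len(income)

edges = quantile_edges(income, n_bins=4)
bins = [bin_value(v, edges) for v in income]
woe = woe_by_bin(bins, default)
print(missing_pct, edges, is_monotonic(woe))
```

If the check fails, a common remedy is to merge adjacent bins (or reduce `n_bins`) and recompute the WOE until the trend is monotonic.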

0 Answers