
I was trying to create dummy variables for my data when I noticed that some values are '?'. So many rows contain these values that I cannot simply drop them. In that case, what should I replace them with? Would taking the mode of the category help? I also tried to replace the ? values with the mode:

df1 = df1[df1.workclass == '?'].replace('?',"Private")

But I get an empty table now.
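The line above filters df1 down to only the rows whose workclass is '?' and then reassigns df1 to that subset. Even when the filter matches, every other row is discarded. If the result is empty, one likely cause is that no cell equals '?' exactly. This is an assumption based on the column names matching the UCI Adult census data, whose raw values carry a leading space (' ?'). Below is a minimal sketch that strips whitespace and then replaces '?' in place with the column's mode, keeping all rows. The small DataFrame is hypothetical stand-in data.

```python
import pandas as pd

# Hypothetical stand-in for df1; values carry a leading space (assumption)
df1 = pd.DataFrame({"workclass": [" Private", " ?", " Self-emp", " Private", " ?"]})

# Strip surrounding whitespace so ' ?' becomes '?'
df1["workclass"] = df1["workclass"].str.strip()

# Boolean mask of the rows that hold the '?' placeholder
missing = df1["workclass"] == "?"

# Mode computed over the observed (non-'?') values only
mode_value = df1.loc[~missing, "workclass"].mode()[0]

# Assign into the column instead of filtering the whole frame
df1.loc[missing, "workclass"] = mode_value

print(df1)
```

Using `.loc[mask, column] = value` modifies only the masked cells, so df1 keeps all five rows.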


  • Can you expand on the explanation of your problem? For example, how do you create the dataframe, how are you accessing it, and what wrong output are you getting? Please provide a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – MrNobody33 Jun 10 '20 at 04:13

2 Answers


It depends on the dataset. Different methods suit different features: for some, replacing missing values with the mode is enough; in other cases, ML algorithms and models such as Random Forest or KNN are used to impute them. So it depends entirely on the type of data you are handling. Exploratory data analysis is worth reading up on. Maybe this can help you.
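As a lightweight stand-in for model-based imputation (this is not Random Forest or KNN), you can fill each '?' with the most frequent value among rows that share a related feature. This uses other columns to "predict" the missing category. The sketch below assumes a hypothetical, fully observed `occupation` column.

```python
import pandas as pd

# Hypothetical data; 'occupation' is assumed to be related to 'workclass'
df = pd.DataFrame({
    "occupation": ["Sales", "Sales", "Sales", "Tech", "Tech", "Tech"],
    "workclass": ["Private", "Private", "?", "Gov", "Gov", "?"],
})

missing = df["workclass"] == "?"

# Most frequent observed workclass within each occupation group
group_mode = (
    df[~missing]
    .groupby("occupation")["workclass"]
    .agg(lambda s: s.mode()[0])
)

# Fill each '?' with the mode of its own occupation group
df.loc[missing, "workclass"] = df.loc[missing, "occupation"].map(group_mode)

print(df)
```

If a group has no observed values, `map` yields NaN for its rows, so a global mode fallback may still be needed.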

tanmayjain69

You will have to check your variables manually and decide how to handle missing values for each one. For example, you can drop variables with more than 50% missing values unless they carry a very high weight of evidence. Some variables can be imputed with a measure of central tendency (mean, median, or mode), or their values can be predicted. Missing values in categorical variables can be replaced with UNK (unknown), and so on.
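A minimal sketch of these rules in pandas, using hypothetical column names and data. It treats '?' as missing, drops columns with more than 50% missing values, fills numeric columns with the median, and fills the remaining (categorical) columns with "UNK". It does not implement the weight-of-evidence exception, which needs a per-variable judgment.

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "workclass": ["Private", "?", "Self-emp", "?"],
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
})

# Treat the '?' placeholder as a real missing value
df = df.replace("?", np.nan)

# Drop columns where more than 50% of the values are missing
missing_frac = df.isna().mean()
df = df.drop(columns=missing_frac[missing_frac > 0.5].index)

# Numeric columns: impute with the median (a central tendency)
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Categorical columns: impute with an explicit 'UNK' category
cat_cols = df.select_dtypes(exclude="number").columns
df[cat_cols] = df[cat_cols].fillna("UNK")

print(df)
```

Here `workclass` is exactly 50% missing, so it is kept; only `mostly_missing` (75% missing) is dropped.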

TBhavnani