
The `Num` column has too many unique values, and we can't do simple imputation: each unique value has such a low count that setting all 72 nulls to any one value would skew the results. So we instead randomly impute the nulls using the existing values:

import numpy as np
import pandas as pd

num_values = X_train['Num'].dropna().values
# Make a copy of the non-missing values and shuffle it
num_shuffled = num_values.copy()
np.random.shuffle(num_shuffled)
num_shuffled = pd.Series(num_shuffled)
# Fill the missing values with the shuffled values
X_train['Num'] = X_train['Num'].fillna(num_shuffled)

The number of nulls decreased but did not go to zero, even though there are a lot more values in num_shuffled than there are nulls.
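For reference, a minimal sketch of the same pattern on made-up data (the real dataset isn't shown here) reproduces the behaviour:

import numpy as np
import pandas as pd

# Made-up stand-in for X_train: 10 rows, 4 of them missing 'Num'
X_train = pd.DataFrame(
    {'Num': [1.0, np.nan, 3.0, np.nan, 5.0, 6.0, np.nan, 8.0, np.nan, 10.0]}
)

num_values = X_train['Num'].dropna().values   # 6 observed values
num_shuffled = num_values.copy()
np.random.shuffle(num_shuffled)
num_shuffled = pd.Series(num_shuffled)        # gets a fresh index 0..5

X_train['Num'] = X_train['Num'].fillna(num_shuffled)
print(X_train['Num'].isna().sum())            # prints 2, not 0: rows 6 and 8 stay NaN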

  • "The number of nulls decreased but did not go to zero even though there is a lot more values in num_shuffled then the number of nulls" - in your own words, **why should that be sufficient**? When `fillna` takes a value from `num_shuffled`, how do you think it decides which one to take? How long do you think `num_shuffled` should need to be, in order to solve the problem, and why do you think this is so? – Karl Knechtel Aug 26 '23 at 09:24
  • (Hint: did you try to check how many values are in `num_shuffled`? Do you see any pattern, in which values are still missing in `X_train['Num']` afterwards? Do you see how that pattern is related to the length of `num_shuffled`? Did you try [reading the documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html) in order to understand how it works?) – Karl Knechtel Aug 26 '23 at 09:26
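Following the hints in the comments: when `fillna` is given a `Series`, it matches values to missing rows by index label, not by position, so only the nulls whose labels happen to fall inside `num_shuffled`'s fresh 0..N-1 index get filled; the length of `num_shuffled` by itself doesn't matter. Below is a minimal sketch of one possible workaround, building the fill values with an index that targets exactly the missing rows (`np.random.permutation` here stands in for the copy-and-shuffle step and is not the original code):

# Labels of the rows where 'Num' is still missing
missing_idx = X_train.index[X_train['Num'].isna()]
num_values = X_train['Num'].dropna().values

# Index the fill values by those labels, so fillna's label alignment
# hits exactly the missing rows
fill_values = pd.Series(
    np.random.permutation(num_values)[:len(missing_idx)],
    index=missing_idx,
)

X_train['Num'] = X_train['Num'].fillna(fill_values)
print(X_train['Num'].isna().sum())   # 0

Another option along the same lines is to reassign `num_shuffled`'s index to the missing labels before calling `fillna`; either way, the key point is that the filler Series' index has to match the index labels of the rows being filled.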
