I have a DataFrame, df, containing several columns. Some of the values in df are NaN. I want to replace each NaN with a valid value, chosen by randomly sampling from the other (non-NaN) values in the same column.
For instance, if:
df['work'] = [4, 7, np.nan, 4]
I'd like to replace df['work'][2] with 4 two-thirds of the time and with 7 one-third of the time.
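To make the target concrete: the probabilities I'm after are just the relative frequencies of the non-NaN values in the column. A minimal sketch of that check, assuming pandas and NumPy are imported as pd and np (the standalone Series named work is only for illustration):

```python
import numpy as np
import pandas as pd

# Example column from above, built as a standalone Series for illustration.
work = pd.Series([4, 7, np.nan, 4], name="work")

# Relative frequency of each non-NaN value. These are the probabilities
# with which each value should be chosen to replace a NaN.
target_probs = work.dropna().value_counts(normalize=True)
print(target_probs)
```

Note that the NaN forces the column to float dtype, so the values show up as 4.0 and 7.0.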
Here's my attempt:
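def resample_fillna(df):
    for col in df.columns:
        # get series consisting of non-NaN values
        valid_series = df[col].dropna()
        # index labels of the rows where this column is NaN
        nan_indices = df.index[df[col].isna()]
        for nan_index in nan_indices:
            # sample(n=1) returns a one-element Series, so take its scalar;
            # assign with .loc to avoid chained-indexing problems
            df.loc[nan_index, col] = valid_series.sample(n=1).iloc[0]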
I'm thinking there's a much better, more Pythonic way. Any thoughts?
Thanks!