I have a dataframe with some string columns and some numeric columns, and I want to handle the missing values. I want to replace each NaN value with the mean of its row (computed over the numeric columns). I have looked at other questions on this site, but they differ from my case, for example: Pandas Dataframe: Replacing NaN with row average
If all the numeric values in a row are NaN, I want to delete that row. Here is a sample case:
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['id'] = ['a', 'b', 'c', 'n']
df['md'] = ['d', 'e', 'f', 'l']
df['c1'] = [2, np.nan, np.nan, 5]
df['c2'] = [0, 5, np.nan, 3]
df['c3'] = [8, 7, np.nan, np.nan]
The expected output is:
df = pd.DataFrame()
df['id'] = ['a', 'b', 'n']
df['md'] = ['d', 'e', 'l']
df['c1'] = [2, 6, 5]
df['c2'] = [0, 5, 3]
df['c3'] = [8, 7, 4]
df
Note: I have tried the following code, but it is very slow, and on a big dataframe it takes a long time to run.
# Numeric columns start at "c1"; everything before it is an identifier column.
index_colum = df.columns.get_loc("c1")
df_withno_id = df.iloc[:, index_colum:]
# Index labels of the rows whose numeric columns are all NaN.
rowsidx_with_all_NaN = df_withno_id[df_withno_id.isnull().all(axis=1)].index.values
df = df.drop(rowsidx_with_all_NaN)
for i, cols in df_withno_id.iterrows():
    if i not in rowsidx_with_all_NaN:
        extract_data = list(cols)
        # Mean of the non-NaN values in this row.
        mean = np.nanmean(extract_data)
        df.loc[i] = df.loc[i].replace(np.nan, mean)
Can anybody help me with this? Thanks.
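One possible vectorized approach, offered as a sketch rather than a definitive answer: drop the all-NaN rows with `dropna`, compute the row means once with `mean(axis=1)`, and fill each numeric column from that Series so the values align on the row index. The list `num_cols` and the name `result` are assumptions introduced for illustration; the sample data is the one from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": ["a", "b", "c", "n"],
    "md": ["d", "e", "f", "l"],
    "c1": [2, np.nan, np.nan, 5],
    "c2": [0, 5, np.nan, 3],
    "c3": [8, 7, np.nan, np.nan],
})

# Assumption: the numeric columns are known by name.
num_cols = ["c1", "c2", "c3"]

# Drop rows in which every numeric column is NaN.
result = df.dropna(subset=num_cols, how="all").copy()

# Row-wise mean of the numeric columns (NaN values are skipped by default).
row_means = result[num_cols].mean(axis=1)

# Fill the NaNs of each column from row_means; fillna aligns on the row index,
# so every NaN receives the mean of its own row.
result[num_cols] = result[num_cols].apply(lambda col: col.fillna(row_means))

print(result)
```

This avoids the Python-level loop over rows, which is usually where `iterrows` loses time on large frames. The original index labels are kept; `result.reset_index(drop=True)` would renumber them if that is preferred.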