1

I have a very big DataFrame that looks like:

    c1   c2    c3
0  NaN  1.0   NaN
1  NaN  NaN   NaN
2  3.0  6.0   9.0
3  NaN  7.0  10.0
...

I want to:

1- Delete the rows in which all values are NaN, like the second row in the sample.

2- Replace the NaN values in the remaining rows with the mean of that row.

Note: the NaN values appear in different positions from row to row. Could you please help me with that? Thanks.

Also, this link does not solve my question: Pandas Dataframe: Replacing NaN with row average

Here is a sample of my DataFrame:

import pandas as pd
import numpy as np


df = pd.DataFrame()
df['c1'] = [np.nan, np.nan, 3, np.nan]
df['c2'] = [1, np.nan, 6, 7]
df['c3'] = [np.nan, np.nan, 9, 10]

Update: a case where not every column should be included in the row mean. Sample DataFrame:

import pandas as pd
import numpy as np


df = pd.DataFrame()
df['id'] = [1, 2, 3, 4, 5]
df['c1'] = [np.nan, np.nan, 3, np.nan, 5]
df['c2'] = [1, np.nan, 3, 11, 5]
df['c3'] = [1, np.nan, 3, 11, np.nan]
df['c4'] = [3, np.nan, 3, 11, 5]

Expected output:
df = pd.DataFrame()
df['id'] = [1,  3, 4, 5]
df['c1'] = [ 5/3, 3, 11, 5]
df['c2'] = [1,  3, 11, 5]
df['c3'] = [1,  3, 11, 5]
df['c4'] = [3,  3, 11, 5]
df

For this part, the value of id should not be included when calculating the row mean.

2 Answers

1

How about this:

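# Transpose so each original row becomes a column, fill it with its own row mean,
# transpose back, then drop the all-NaN rows (their mean is NaN, so they stay unfilled).
df = df.T.fillna(df.mean(axis=1)).T.dropna()
print(df)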

Output:

    c1   c2    c3
0  1.0  1.0   1.0
2  3.0  6.0   9.0
3  8.5  7.0  10.0
eshirvana
  • Thanks @eshirvana. However, this provides the mean of the columns, not the rows. –  Feb 08 '22 at 20:55
  • @Yellowman you already got your answer, but see my updated answer as well – eshirvana Feb 08 '22 at 21:44
  • Thank you so much. Yes. It works now. –  Feb 08 '22 at 22:11
  • Can I ask a quick question? I want to change the df.mean call, which returns the mean of the whole row. What I mean is that I want to replace the NaN values with the mean taken from column 1 to the last column. Do you know how to do that? I phrased this badly in the first version of the question. –  Feb 08 '22 at 22:16
  • Not sure I understand you correctly; please provide sample data and the desired output – eshirvana Feb 08 '22 at 22:50
  • I have added an update to the main question. –  Feb 08 '22 at 22:55
0

You could build a dictionary that maps every column name to the Series of row means and pass it to fillna, so that each column is filled with the row mean at the matching index. Then drop the rows that still contain NaN: all-NaN rows are not filled, because their row mean is itself NaN.

out = df.fillna(dict.fromkeys(df.columns, df.mean(axis=1))).dropna()
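
To see why the all-NaN row survives the fill, it may help to look at the intermediate values. The sketch below rebuilds the sample frame from the question and prints the row means and the filled frame before dropna; the names row_mean and filled are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c1": [np.nan, np.nan, 3, np.nan],
    "c2": [1, np.nan, 6, 7],
    "c3": [np.nan, np.nan, 9, 10],
})

# Row means skip NaN by default; a row with no values at all has mean NaN.
row_mean = df.mean(axis=1)
print(row_mean)

# Every column is filled from the same index-aligned Series of row means,
# so row 1 (mean NaN) stays entirely NaN and is removed later by dropna().
filled = df.fillna(dict.fromkeys(df.columns, row_mean))
print(filled)
```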

Another possibility is to transpose the DataFrame and use fillna to fill, then transpose back:

df_T = df.T
df_T.fillna(df_T.mean()).T.dropna()

Output:

    c1   c2    c3
0  1.0  1.0   1.0
2  3.0  6.0   9.0
3  8.5  7.0  10.0
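
For the update in the question, where id must not take part in the row mean, one possible adaptation is to restrict both the mean and the fill to the value columns. This is only a sketch under that assumption, not part of the original answer; value_cols, row_mean and out are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "c1": [np.nan, np.nan, 3, np.nan, 5],
    "c2": [1, np.nan, 3, 11, 5],
    "c3": [1, np.nan, 3, 11, np.nan],
    "c4": [3, np.nan, 3, 11, 5],
})

# Columns that take part in the row mean: everything except "id".
value_cols = [c for c in df.columns if c != "id"]

# Drop rows in which every value column is NaN (here the row with id == 2).
out = df.dropna(subset=value_cols, how="all").copy()

# Row-wise mean over the value columns only; NaN entries are skipped.
row_mean = out[value_cols].mean(axis=1)

# Fill each value column from the row means, aligned on the index.
out[value_cols] = out[value_cols].apply(lambda col: col.fillna(row_mean))

print(out)
```

On the sample data this leaves the rows with id 1, 3, 4 and 5, and fills c1 in the first row with 5/3, the mean of 1, 1 and 3.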
  • Thank you @enke. It works for me. A quick question: since my data is very large, is this method fast enough? Sorry, I am new to Python and DataFrames, and this is probably a very basic question. –  Feb 08 '22 at 20:59
  • @Yellowman should be fast enough, I think –  Feb 08 '22 at 21:01
  • Thank you so much. –  Feb 08 '22 at 21:06
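
On the speed question in the comments: if the approaches above turn out to be slow on a very large frame, one variant worth timing is to do the fill on the underlying NumPy array. This is a sketch that assumes all columns are numeric; whether it is actually faster depends on the data, so it should be measured rather than taken for granted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c1": [np.nan, np.nan, 3, np.nan],
    "c2": [1, np.nan, 6, 7],
    "c3": [np.nan, np.nan, 9, 10],
})

# Remove all-NaN rows first so every remaining row has a defined mean.
kept = df.dropna(how="all")

values = kept.to_numpy(dtype=float)
# nanmean ignores NaN entries; keepdims=True lets the result broadcast across columns.
row_mean = np.nanmean(values, axis=1, keepdims=True)

# Keep existing values and take the row mean wherever a value is missing.
filled = np.where(np.isnan(values), row_mean, values)

out = pd.DataFrame(filled, index=kept.index, columns=kept.columns)
print(out)
```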