1

I have a DataFrame with 14 variables which has several NaNs. I would like to fill these NaNs with a specific value existing in a list.

This is the df:

          Date  CalamarQ  InkorQ  ...  SHelena2P   GamboteP    DiqueP
243 1990-09-01    6033.0     NaN  ...    0.00000   0.000000  0.000000
244 1990-09-02    6167.0     NaN  ...    0.00000   0.000000  0.000000
245 1990-09-03    6338.0     NaN  ...   11.04219  24.769873  8.574909
246 1990-09-04    6372.0     NaN  ...   10.57562   0.000000  0.000000
247 1990-09-05    6412.0     NaN  ...    0.00000   0.000000  0.000000
[5 rows x 14 columns]

Averages to fill NaNs:

[Timestamp('2006-10-31 14:54:32.727272704'), 8644.611548793053, 728.1650092081031, 526.5549786455156, 465.1721615134183, 610.2233078727016, 23.790737704918033, 4.019485580670304, 3.868294812414307, 4.023018770419772, 5.2216239281718835, 5.221623927818927, 5.118163585974665, 3.7718975661621954]

I would like that the NaNs are filled with the averages that correspond to the column placement, i.e., the NaNs from the 2nd column ('CalamarQ') will be replaced by 8644.611548793053.

So far, I have tried the following codes, but to no avail:

i = 1
for i in range(len(averages_wet)):
    for col in ts_wet.columns:
        ts_wet[cols].fillna(value = averages_wet[i], inplace=True)

and:

ts_wet.CalamarQ.fillna(averages_wet[1], inplace=True)

I get the following output:

<ipython-input-9-b7d4c7c0bde3>:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

Any help is appreciated.

MatBailie
  • 83,401
  • 18
  • 103
  • 137
Amy Zammit
  • 11
  • 2

2 Answers2

2

Use fillna with a dictionary made of the column names and values (with dict and zip):

out = ts_wet.fillna(dict(zip(ts_wet.columns, averages_wet)))

Alternatively, using a Series (probably less efficient):

out = ts_wet.fillna(pd.Series(averages_wet, index=ts_wet.columns))

Output:

           Date  CalamarQ      InkorQ  ...  SHelena2P   GamboteP    DiqueP
243  1990-09-01    6033.0  728.165009  ...    0.00000   0.000000  0.000000
244  1990-09-02    6167.0  728.165009  ...    0.00000   0.000000  0.000000
245  1990-09-03    6338.0  728.165009  ...   11.04219  24.769873  8.574909
246  1990-09-04    6372.0  728.165009  ...   10.57562   0.000000  0.000000
247  1990-09-05    6412.0  728.165009  ...    0.00000   0.000000  0.000000

Intermediate dictionary:

{'Date': Timestamp('2006-10-31 14:54:32.727272704'),
 'CalamarQ': 8644.611548793053,
 'InkorQ': 728.1650092081031,
 ... 
}
mozway
  • 194,879
  • 13
  • 39
  • 75
1

Use DataFrame.fillna by dictionary created by zip with columns names:

out = ts_wet.fillna(dict(zip(ts_wet.columns, averages_wet)))

If want replace all numeric columns by their means:

out = ts_wet.fillna(ts_wet.select_dtype(np.number).mean())
#alternative
#out = ts_wet.fillna(ts_wet.select.mean())
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252