
Say I have a vector valsHR which looks like this:

valsHR=[78.8, 82.3, 91.0]

And I have a dataframe mainData:

Age  Patient  HR             
21   1        NaN
21   1        NaN
21   1        NaN
30   2        NaN
30   2        NaN
24   3        NaN
24   3        NaN
24   3        NaN 

I want to fill the NaNs so that the first value in valsHR will only fill in the NaNs for patient 1, the second will fill the NaNs for patient 2 and the third will fill in for patient 3.

So far I've tried mainData['HR'] = mainData['HR'].fillna(valsHR), but it fills all the NaNs with the first value in the vector.

I've also tried mainData['HR'] = mainData.groupby('Patient').fillna(valsHR), but that fills the NaNs with values that aren't in the valsHR vector at all.

I was wondering if anyone knew a way to do this?

sahmed

2 Answers


Build a dictionary that pairs each Patient value that has missing HR values with the corresponding entry in valsHR, map it onto the Patient column, and use the result to fill only the missing values:

print (df)
   Age  Patient     HR
0   21        1    NaN
1   21        1    NaN
2   21        1    NaN
3   30        2  100.0 <- value is not replaced
4   30        2    NaN
5   24        3    NaN
6   24        3    NaN
7   24        3    NaN


p = df.loc[df.HR.isna(), 'Patient'].unique()
valsHR = [78.8, 82.3, 91.0]

df['HR'] = df['HR'].fillna(df['Patient'].map(dict(zip(p, valsHR))))
print (df)
   Age  Patient     HR
0   21        1   78.8
1   21        1   78.8
2   21        1   78.8
3   30        2  100.0
4   30        2   82.3
5   24        3   91.0
6   24        3   91.0
7   24        3   91.0

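If some groups have no NaNs, those groups are skipped and the entries of valsHR are assigned, in order, to the remaining groups (here patient 3 receives the second value, 82.3):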

print (df)
   Age  Patient     HR
0   21        1    NaN
1   21        1    NaN
2   21        1    NaN
3   30        2  100.0 <- group 2 is not replaced
4   30        2  100.0 <- group 2 is not replaced
5   24        3    NaN
6   24        3    NaN
7   24        3    NaN


p = df.loc[df.HR.isna(), 'Patient'].unique()
valsHR = [78.8, 82.3, 91.0]

df['HR'] = df['HR'].fillna(df['Patient'].map(dict(zip(p, valsHR))))
print (df)
   Age  Patient     HR
0   21        1   78.8
1   21        1   78.8
2   21        1   78.8
3   30        2  100.0
4   30        2  100.0
5   24        3   82.3
6   24        3   82.3
7   24        3   82.3
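
Because the pairing above is positional, which value a patient receives depends on which groups still contain NaNs. If each value is meant for a fixed patient, one option is to key the values by patient ID explicitly. This is only a minimal sketch, assuming the entries of valsHR belong to patients 1, 2 and 3 in that order; the name hr_by_patient is hypothetical and not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [21, 21, 21, 30, 30, 24, 24, 24],
    "Patient": [1, 1, 1, 2, 2, 3, 3, 3],
    "HR": [np.nan, np.nan, np.nan, 100.0, 100.0, np.nan, np.nan, np.nan],
})

# Assumption: each value belongs to a fixed patient ID,
# regardless of which groups currently contain NaNs.
hr_by_patient = {1: 78.8, 2: 82.3, 3: 91.0}

# map() builds a per-row Series of candidate values;
# fillna() uses them only on rows where HR is NaN.
df["HR"] = df["HR"].fillna(df["Patient"].map(hr_by_patient))
print(df)
```

With this variant patient 2 keeps its existing 100.0 and patient 3 is filled with 91.0 rather than 82.3.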
jezrael

If all of the NaNs should be replaced, this is a simple mapping:

import pandas as pd
from io import StringIO


valsHR=[78.8, 82.3, 91.0]
vals = {i:k for i,k in enumerate(valsHR, 1)}

df = pd.read_csv(StringIO("""Age  Patient             
21   1
21   1
21   1
30   2
30   2
24   3
24   3
24   3"""), sep=r"\s+")


df["HR"] = df["Patient"].map(vals)

>>> df
   Age  Patient    HR
0   21        1  78.8
1   21        1  78.8
2   21        1  78.8
3   30        2  82.3
4   30        2  82.3
5   24        3  91.0
6   24        3  91.0
7   24        3  91.0
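
The enumerate-based dictionary above relies on the patient IDs being 1, 2, 3. If the IDs are arbitrary, a possible variant is to number the patients by order of first appearance with pd.factorize and index the values by those codes. This is a sketch under assumptions: the patient IDs below are made up for illustration, and valsHR is assumed to contain exactly one value per distinct patient, in order of appearance.

```python
import numpy as np
import pandas as pd

valsHR = [78.8, 82.3, 91.0]
df = pd.DataFrame({
    "Age": [21, 21, 21, 30, 30, 24, 24, 24],
    "Patient": [10, 10, 10, 25, 25, 7, 7, 7],  # hypothetical non-consecutive IDs
})

# factorize() labels the distinct patients 0, 1, 2, ... in order of first appearance.
codes, uniques = pd.factorize(df["Patient"])

# Index the values by those codes to get one value per row.
# Assumption: len(valsHR) == len(uniques); otherwise this raises an IndexError.
df["HR"] = np.asarray(valsHR)[codes]
print(df)
```

To keep HR values that are already present, the same per-row array could be wrapped in a Series with df's index and passed to fillna instead of being assigned directly.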
crayxt