
As is well known, all probabilities need to sum to 1. I have a pandas DataFrame in which the probability of one event is sometimes missing.
Since I know that all elements of a row need to sum to 1, I want to replace each NaN with a calculated value, using something like the following for each row of my DataFrame:

for index, row in df.iterrows():
    # pseudocode: replace each NaN in the row with 1 - (sum of the row)
    df.replace(NaN, 1 - sum_of_row)

As an example, here is the array I am using as test data at the moment:

    matrixsum
     e    f    g
a  0.3  0.2  Nan
b  0.2  0.2  0.6
c  0.7  0.1  Nan

Using df.fillna(0), I get this:

  matrixsum
     e    f    g
a  0.3  0.2  0.0
b  0.2  0.2  0.6
c  0.7  0.1  0.0

An additional problem is that a row can only be summed if its values are floats or ints, but the Nan entries in my data are stored as strings. At the moment I just use df.fillna(0), but that is not a good solution.

Expected output:

  matrixsum
     e    f    g
a  0.3  0.2  0.5
b  0.2  0.2  0.6
c  0.7  0.1  0.2
Hans Peter

2 Answers

If you are sure that the NaN values in all rows always appear in a single column (let's say g), you can do this:

Consider the df below:

In [21]: df
Out[21]: 
     e    f    g
a  0.3  0.2  Nan
b  0.2  0.2  0.6
c  0.7  0.1  Nan

In [22]: df['g'] = df['g'].fillna(1 - df.sum(axis=1))  # fill only the missing entries of g

In [23]: df
Out[23]: 
     e    f    g
a  0.3  0.2  0.5
b  0.2  0.2  0.6
c  0.7  0.1  0.2
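
As a self-contained sketch of the single-column case: it assumes the missing entries are real NaN values (not the string "Nan") and that they occur only in column g. The DataFrame is rebuilt from the question's test data, and fillna is used so that values already present in g, such as 0.6, are left untouched.

```python
import numpy as np
import pandas as pd

# Test data from the question, with real NaN values in column "g".
df = pd.DataFrame(
    {"e": [0.3, 0.2, 0.7], "f": [0.2, 0.2, 0.1], "g": [np.nan, 0.6, np.nan]},
    index=["a", "b", "c"],
)

# Row sums skip NaN by default, so this is the probability mass still missing.
remainder = 1 - df.sum(axis=1)

# Fill only the missing entries of "g"; existing values stay as they are.
df["g"] = df["g"].fillna(remainder)
print(df)
```

Note that assigning 1 - df.sum(axis=1) to the whole column would also overwrite rows that are already complete, because their remainder is 0.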
Mayank Porwal
  • The missing values are randomly distributed throughout the whole dataset, so your solution does not really solve my problem right now, but I can imagine someone else might be very happy about it. Maybe I should have clarified this in my question, so that's my bad. Thanks for your effort and for helping to improve my question text. – Hans Peter Apr 26 '21 at 15:35

You can first convert your DataFrame to numeric values and then fill the NaNs of each row with 1 - row.sum():

import pandas as pd
# Coerce strings such as "Nan" to real NaN, then fill each row's gaps.
df = df.apply(pd.to_numeric, errors="coerce")
df = df.apply(lambda row: row.fillna(1 - row.sum()), axis=1)

Or, equivalently, you can combine the two steps in a function:

def markovize(row):
    row = pd.to_numeric(row, errors="coerce")
    return row.fillna(1 - row.sum())

df = df.apply(markovize, axis=1)

Before:

     e    f    g
a  0.3  0.2  Nan
b  0.2  0.2  0.6
c  0.7  0.1  Nan

After:

     e    f    g
a  0.3  0.2  0.5
b  0.2  0.2  0.6
c  0.7  0.1  0.2
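
One caveat with the row-wise fill: every NaN in a row receives the full remainder, so a row with more than one NaN would no longer sum to 1 afterwards. The following is a minimal sketch, not part of the original answer, that reproduces the result on the question's test data without a row-wise apply and flags such rows. The names numeric, remainder, ambiguous and filled are illustrative only.

```python
import pandas as pd

# Test data from the question; the missing entries are the string "Nan".
df = pd.DataFrame(
    {"e": [0.3, 0.2, 0.7], "f": [0.2, 0.2, 0.1], "g": ["Nan", 0.6, "Nan"]},
    index=["a", "b", "c"],
)

# Coerce everything to floats; anything unparseable becomes a real NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Probability mass still missing in each row (sum skips NaN by default).
remainder = 1 - numeric.sum(axis=1)

# Rows with more than one NaN cannot be completed unambiguously.
ambiguous = numeric.isna().sum(axis=1) > 1

# Fill column by column; fillna aligns the remainder Series on the row index.
filled = numeric.apply(lambda col: col.fillna(remainder))
print(filled)
print("ambiguous rows:", list(ambiguous[ambiguous].index))
```

How to treat ambiguous rows (drop them, split the remainder, or leave them as NaN) depends on the data and is left open here.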
Mustafa Aydın
  • I tried this solution and, as far as I could check, it works for me. I think I will accept the answer after some more compatibility checks tomorrow. Thanks for your help and have a nice evening. – Hans Peter Apr 26 '21 at 15:28