I have a DataFrame like this:

    A   B   
0   0.0 2.0 
1   3.0 4.0 
2   NaN 1.0 
3   2.0 NaN 
4   NaN 1.0 
5   4.8 NaN 
6   NaN 1.0 

I want to apply this line of code: df['A'] = df['B'].fillna(df['A'])

I expect an intermediate step (A takes the values of B) and then a final output (the remaining NaN in A are filled from the original A), like this:

    A   B   
0   2.0 2.0 
1   4.0 4.0 
2   1.0 1.0 
3   NaN NaN 
4   1.0 1.0 
5   NaN NaN 
6   1.0 1.0 

    A   B   
0   2.0 2.0 
1   4.0 4.0 
2   1.0 1.0 
3   2.0 NaN 
4   1.0 1.0 
5   4.8 NaN 
6   1.0 1.0 

Instead, I get this error:

TypeError: Unsupported type Series

This is probably because, each time there is an NaN, it tries to fill it with the whole Series rather than with the single element of A that has the same index.

I get the same error with df['C'] = df['B'].fillna(df['A']), so the problem does not seem to be that I first overwrite A with the values of B and then try to fill the NaN from a column that is effectively the same as B.

I'm in a Databricks environment and I'm working with Koalas DataFrames, which are supposed to work like pandas ones. Can you help me?

2 Answers

If I understand correctly, you can try the row-wise max():

df['A']=df[['A','B']].max(axis=1)

output of df:

    A       B
0   2.0     2.0
1   4.0     4.0
2   1.0     1.0
3   2.0     NaN
4   1.0     1.0
5   4.8     NaN
6   1.0     1.0
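
One caveat, sketched below in plain pandas (I have not run it on Koalas, so treat that part as an assumption): max() matches the requested "take B, fall back to A" result here only because B is never smaller than A on rows where both are present. The comparison uses Series.combine_first as the reference for the index-aligned fill; the second frame, `other`, is made up purely to show where the two approaches diverge.

```python
import numpy as np
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "A": [0.0, 3.0, np.nan, 2.0, np.nan, 4.8, np.nan],
    "B": [2.0, 4.0, 1.0, np.nan, 1.0, np.nan, 1.0],
})

# Row-wise max skips NaN by default, so it returns whichever value is present.
by_max = df[["A", "B"]].max(axis=1)

# Index-aligned fill: keep B, use A only on rows where B is NaN.
by_fill = df["B"].combine_first(df["A"])

print(by_max.tolist())   # [2.0, 4.0, 1.0, 2.0, 1.0, 4.8, 1.0]
print(by_fill.tolist())  # [2.0, 4.0, 1.0, 2.0, 1.0, 4.8, 1.0]

# Hypothetical data where A > B on a row that has both values.
other = pd.DataFrame({"A": [5.0, np.nan], "B": [1.0, 2.0]})
other_max = other[["A", "B"]].max(axis=1)
other_fill = other["B"].combine_first(other["A"])

print(other_max.tolist())   # [5.0, 2.0]  (max prefers A on the first row)
print(other_fill.tolist())  # [1.0, 2.0]  (the fill keeps B)
```

So if A can exceed B in the real data, the fill-style approach is the safer match for what the question asks.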
Anurag Dabas

Another option:

Suppose you have the following dataset:

import pandas as pd
import numpy as np

# np.nan is used instead of np.NaN, since the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame(data={'State': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                        'Sno Center': ["Guntur", "Nellore", "Visakhapatnam", "Biswanath", "Doom-Dooma", "Guntur", "Labac-Silchar", "Numaligarh", "Sibsagar", "Munger-Jamalpu"],
                        'Mar-21': [121, 118.8, 131.6, 123.7, 127.8, 125.9, 114.2, 114.2, 117.7, 117.7],
                        'Apr-21': [121.1, 118.3, 131.5, np.nan, 128.2, 128.2, 115.4, 115.1, np.nan, 118.3]})

df
   State      Sno Center  Mar-21  Apr-21
0      1          Guntur   121.0   121.1
1      2         Nellore   118.8   118.3
2      3   Visakhapatnam   131.6   131.5
3      4       Biswanath   123.7     NaN
4      5      Doom-Dooma   127.8   128.2
5      6          Guntur   125.9   128.2
6      7   Labac-Silchar   114.2   115.4
7      8      Numaligarh   114.2   115.1
8      9        Sibsagar   117.7     NaN
9     10  Munger-Jamalpu   117.7   118.3

Then assign the Mar-21 value to Apr-21 only on the rows where Apr-21 is missing and Mar-21 is not:

df.loc[(df["Mar-21"].notnull()) & (df["Apr-21"].isna()), "Apr-21"] = df["Mar-21"]

df
   State      Sno Center  Mar-21  Apr-21
0      1          Guntur   121.0   121.1
1      2         Nellore   118.8   118.3
2      3   Visakhapatnam   131.6   131.5
3      4       Biswanath   123.7   123.7
4      5      Doom-Dooma   127.8   128.2
5      6          Guntur   125.9   128.2
6      7   Labac-Silchar   114.2   115.4
7      8      Numaligarh   114.2   115.1
8      9        Sibsagar   117.7   117.7
9     10  Munger-Jamalpu   117.7   118.3
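
The same masked assignment can also be written without .loc, using Series.where. The sketch below is plain pandas on a three-row subset of the data above; whether the identical call behaves the same way on a Koalas Series is an assumption I have not checked.

```python
import numpy as np
import pandas as pd

# Three rows taken from the dataset above, for illustration only.
df = pd.DataFrame({
    "Mar-21": [121.0, 123.7, 117.7],
    "Apr-21": [121.1, np.nan, np.nan],
})

# Keep Apr-21 where it is present; otherwise take the Mar-21 value
# from the row with the same index label.
filled = df["Apr-21"].where(df["Apr-21"].notna(), df["Mar-21"])

print(filled.tolist())  # [121.1, 123.7, 117.7]
```

Unlike the .loc version, this returns a new Series instead of modifying the column in place, so assign it back (for example to df["Apr-21"]) if you want to keep the result.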
Samir Hinojosa