-1

I am working on Dataset and it has some missing values. I am trying to fill those values. Here a snap of my code.

table = df.pivot_table(values='LoanAmount', index='Self_Employed' ,columns='Education', aggfunc=np.median)

def fage(x):
    return table.loc[x['Self_Employed'],x['Education']]

df['LoanAmount'].fillna(df[df['LoanAmount'].isnull()].apply(fage, axis=1), inplace=True)

this is showing the following error.

KeyError: (nan, 'occurred at index 95') 

here is a snap of dataset.dataset snap

I don't know why this error has occured, although there is nan

at index 95 of column LoanAmount

Dataset copied from clipboard

,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status,TotalIncome,TotalIncome_log
89,LP001310,1,1,0,0,No,5695,4167.0,175.0,360.0,1.0,Semiurban,Y,9862.0,9.196444266784072
90,LP001316,1,1,0,0,No,2958,2900.0,131.0,360.0,1.0,Semiurban,Y,5858.0,8.675563527387679
91,LP001318,1,1,2,0,No,6250,5654.0,188.0,180.0,1.0,Semiurban,Y,11904.0,9.384629757072872
92,LP001319,1,1,2,1,No,3273,1820.0,81.0,360.0,1.0,Urban,Y,5093.0,8.535622326884605
93,LP001322,1,0,0,0,No,4133,0.0,122.0,360.0,1.0,Semiurban,Y,4133.0,8.326758814511733
94,LP001325,1,0,0,1,No,3620,0.0,25.0,120.0,1.0,Semiurban,Y,3620.0,8.194229304819817
95,LP001326,1,0,0,0,,6782,0.0,,360.0,1.0,Urban,N,6782.0,8.822027322685583
96,LP001327,0,1,0,0,No,2484,2302.0,137.0,360.0,1.0,Semiurban,Y,4786.0,8.47345026846832
97,LP001333,1,1,0,0,No,1977,997.0,50.0,360.0,1.0,Semiurban,Y,2974.0,7.9976631270201
98,LP001334,1,1,0,1,No,4188,0.0,115.0,180.0,1.0,Semiurban,Y,4188.0,8.339978571990427
99,LP001343,1,1,0,0,No,1759,3541.0,131.0,360.0,1.0,Semiurban,Y,5300.0,8.575462099540212
100,LP001345,1,1,2,1,No,4288,3263.0,133.0,180.0,1.0,Urban,Y,7551.0,8.929435283803425
101,LP001349,1,0,0,0,No,4843,3806.0,151.0,360.0,1.0,Semiurban,Y,8649.0,9.065198986306513
102,LP001350,1,1,0,0,No,13650,0.0,,360.0,1.0,Urban,Y,13650.0,9.521494800613105
103,LP001356,1,1,0,0,No,4652,3583.0,,360.0,1.0,Semiurban,Y,8235.0,9.016148642611741
104,LP001357,1,1,0,0,No,3816,754.0,160.0,360.0,1.0,Urban,Y,4570.0,8.42726848388825
105,LP001367,1,1,1,0,No,3052,1030.0,100.0,360.0,1.0,Urban,Y,4082.0,8.31434234336979
106,LP001369,1,1,2,0,No,11417,1126.0,225.0,360.0,1.0,Urban,Y,12543.0,9.436918020024674
107,LP001370,1,0,0,1,,7333,0.0,120.0,360.0,1.0,Rural,N,7333.0,8.9001399880938
108,LP001379,1,1,2,0,No,3800,3600.0,216.0,360.0,0.0,Urban,N,7400.0,8.909235279192261
109,LP001384,1,1,3,1,No,2071,754.0,94.0,480.0,1.0,Semiurban,Y,2825.0,7.946263643580541
110,LP001385,1,0,0,0,No,5316,0.0,136.0,360.0,1.0,Urban,Y,5316.0,8.578476419833136


majid bhatti
  • 83
  • 12

1 Answers1

0
  • table is the issue when you get a nan value.
    • table only contains Education values for 0 & 1 and Self_Employed values for Yes or No, not nan because nan conditions are not considered.

enter image description here

  • Row 95 is as follows
Loan_ID              LP001326
Gender                      1
Married                     0
Dependents                  0
Education                   0
Self_Employed             NaN
ApplicantIncome          6782
CoapplicantIncome           0
LoanAmount                NaN
Loan_Amount_Term          360
Credit_History              1
Property_Area           Urban
Loan_Status                 N
TotalIncome              6782
TotalIncome_log       8.82203
Name: 95, dtype: object
  • Because Self_Employed == nan, it's not found in table, so you get a KeyError
  • To resolve the KeyError issue, consider dropping rows where Self_Employed or Education are nan, or replace nan in those two columns before creating table.
df.Self_Employed.fillna('Unknown', inplace=True)
df.Education.fillna('Unknown', inplace=True)

table = df.pivot_table(values='LoanAmount', index='Self_Employed' ,columns='Education', aggfunc=np.median)
table

enter image description here

Data

  • I used this modified data to populate all conditions of table
    • This data just has more nan values for Self_Employed and Education
,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status,TotalIncome,TotalIncome_log
89,LP001310,1,1,0,0,No,5695,4167.0,175.0,360.0,1.0,Semiurban,Y,9862.0,9.196444266784072
90,LP001316,1,1,0,,No,2958,2900.0,131.0,360.0,1.0,Semiurban,Y,5858.0,8.675563527387679
91,LP001318,1,1,2,0,No,6250,5654.0,188.0,180.0,1.0,Semiurban,Y,11904.0,9.384629757072872
92,LP001319,1,1,2,1,No,3273,1820.0,81.0,360.0,1.0,Urban,Y,5093.0,8.535622326884605
93,LP001322,1,0,0,0,,4133,0.0,122.0,360.0,1.0,Semiurban,Y,4133.0,8.326758814511733
94,LP001325,1,0,0,1,,3620,0.0,25.0,120.0,1.0,Semiurban,Y,3620.0,8.194229304819817
95,LP001326,1,0,0,0,Yes,6782,0.0,,360.0,1.0,Urban,N,6782.0,8.822027322685583
96,LP001327,0,1,0,,Yes,2484,2302.0,137.0,360.0,1.0,Semiurban,Y,4786.0,8.47345026846832
97,LP001333,1,1,0,0,No,1977,997.0,50.0,360.0,1.0,Semiurban,Y,2974.0,7.9976631270201
98,LP001334,1,1,0,1,No,4188,0.0,115.0,180.0,1.0,Semiurban,Y,4188.0,8.339978571990427
99,LP001343,1,1,0,0,Yes,1759,3541.0,131.0,360.0,1.0,Semiurban,Y,5300.0,8.575462099540212
100,LP001345,1,1,2,1,Yes,4288,3263.0,133.0,180.0,1.0,Urban,Y,7551.0,8.929435283803425
101,LP001349,1,0,0,0,No,4843,3806.0,151.0,360.0,1.0,Semiurban,Y,8649.0,9.065198986306513
102,LP001350,1,1,0,0,Yes,13650,0.0,,360.0,1.0,Urban,Y,13650.0,9.521494800613105
103,LP001356,1,1,0,0,No,4652,3583.0,,360.0,1.0,Semiurban,Y,8235.0,9.016148642611741
104,LP001357,1,1,0,0,No,3816,754.0,160.0,360.0,1.0,Urban,Y,4570.0,8.42726848388825
105,LP001367,1,1,1,0,No,3052,1030.0,100.0,360.0,1.0,Urban,Y,4082.0,8.31434234336979
106,LP001369,1,1,2,0,No,11417,1126.0,225.0,360.0,1.0,Urban,Y,12543.0,9.436918020024674
107,LP001370,1,0,0,1,,7333,0.0,120.0,360.0,1.0,Rural,N,7333.0,8.9001399880938
108,LP001379,1,1,2,0,No,3800,3600.0,216.0,360.0,0.0,Urban,N,7400.0,8.909235279192261
109,LP001384,1,1,3,,,2071,754.0,94.0,480.0,1.0,Semiurban,Y,2825.0,7.946263643580541
110,LP001385,1,0,0,,,5316,0.0,136.0,360.0,1.0,Urban,Y,5316.0,8.578476419833136
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
  • this method has worked for me, instead of making an 'unknown' for 'nan' I have filled the column 'Self_Employed' with mode of column – majid bhatti May 25 '20 at 08:47