Say I have a pandas DataFrame, indexed by zipcode and date, that looks like this:
                    price  volume  cat_count
zipcode date
91111.0 01/01/2018     10       5        NaN
        02/10/2018    NaN       9        NaN
94312.0 04/04/2018      7       4          6
        02/10/2018    NaN       3          4
96666.0 05/05/2018    NaN       3         14
        02/10/2018    NaN     NaN          8
        07/08/2018    NaN       0        NaN
98432.0 06/08/2018      4     NaN        NaN
And say I have a dictionary whose keys are zipcodes and whose values are lists of nearby zipcodes (within x kilometers of the key zipcode), sorted by distance from the key zipcode, closest first. This dictionary looks like:
nearby_zips = {
    91111.0: [94312.0],
    94312.0: [91111.0, 96666.0],
    96666.0: [94312.0],
    98432.0: [],
}
How can I efficiently fill in the data so that, whenever every value of a column is NaN within a zipcode, the nearest zipcode that has non-NaN values for that column is found and its values are used to fill in that column for the zipcode that has only NaNs?
For reference, the output for the example DataFrame above would look like:
                    price  volume  cat_count
zipcode date
91111.0 01/01/2018     10       5        NaN
        02/10/2018    NaN       9          4
        04/04/2018    NaN     NaN          6
94312.0 04/04/2018      7       4          6
        02/10/2018    NaN       3          4
96666.0 05/05/2018    NaN       3         14
        02/10/2018    NaN     NaN          8
        07/08/2018    NaN       0        NaN
        04/04/2018      7     NaN        NaN
98432.0 06/08/2018      4     NaN        NaN
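Notice how the data under zipcodes 91111.0 and 96666.0 changed: values were copied from 94312.0, and a new row was added for each date that the receiving zipcode did not already have.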
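For concreteness, here is a rough sketch of the behaviour I am after, not necessarily the efficient version I am asking for. The names `fill_from_nearby` and `has_data` are made up for illustration. It assumes dates are unique within each zipcode, that donor rows where the column is NaN are skipped, and that rows for new dates are appended after the zipcode's existing rows.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame (dates are kept as plain strings here).
index = pd.MultiIndex.from_tuples(
    [
        (91111.0, "01/01/2018"), (91111.0, "02/10/2018"),
        (94312.0, "04/04/2018"), (94312.0, "02/10/2018"),
        (96666.0, "05/05/2018"), (96666.0, "02/10/2018"),
        (96666.0, "07/08/2018"), (98432.0, "06/08/2018"),
    ],
    names=["zipcode", "date"],
)
df = pd.DataFrame(
    {
        "price": [10, np.nan, 7, np.nan, np.nan, np.nan, np.nan, 4],
        "volume": [5, 9, 4, 3, 3, np.nan, 0, np.nan],
        "cat_count": [np.nan, np.nan, 6, 4, 14, 8, np.nan, np.nan],
    },
    index=index,
)
nearby_zips = {
    91111.0: [94312.0],
    94312.0: [91111.0, 96666.0],
    96666.0: [94312.0],
    98432.0: [],
}


def fill_from_nearby(df, nearby_zips):
    """Hypothetical helper: fill all-NaN columns of a zipcode from its nearest donor."""
    # has_data.at[z, col] is True when zipcode z has at least one non-NaN value in col.
    has_data = df.notna().groupby(level="zipcode").any()

    keys, pieces = [], []
    for zipcode, group in df.groupby(level="zipcode", sort=False):
        block = group.droplevel("zipcode").copy()  # indexed by date only
        for col in df.columns:
            if has_data.at[zipcode, col]:
                continue  # this zipcode already has data for the column
            # First nearby zipcode (closest first) with data in this column.
            donor = next(
                (z for z in nearby_zips.get(zipcode, [])
                 if z in has_data.index and has_data.at[z, col]),
                None,
            )
            if donor is None:
                continue  # nothing nearby to borrow from
            donor_vals = df.xs(donor, level="zipcode")[col].dropna()
            # Append rows for donor dates this zipcode does not have yet.
            new_dates = [d for d in donor_vals.index if d not in block.index]
            if new_dates:
                block = block.reindex(list(block.index) + new_dates)
            # The column was entirely NaN, so it can be replaced wholesale.
            block[col] = donor_vals.reindex(block.index)
        keys.append(zipcode)
        pieces.append(block)

    return pd.concat(pieces, keys=keys, names=["zipcode", "date"])


result = fill_from_nearby(df, nearby_zips)
print(result)
```

This always reads donor values from the original frame, so a zipcode never borrows values that were themselves borrowed. It loops over every zipcode and column in Python, which is why I suspect there is a faster way.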