I have a house rent price data as follows:
import pandas as pd
import numpy as np
data = {
"HouseName": ["A", "A", "B", "B", "B"],
"Type": ["OneRoom", "TwoRooms", "OneRoom", "TwoRooms", "ThreeRooms"],
"Jan_S": [1100, 1776, 1228, 1640, np.NaN],
"Feb_S": [1000, 1805, 1231, 1425, 1800],
"Mar_S": [1033, 1748, 1315, 1591, 2900],
"Jan_L": [1005, np.NaN, 1300, np.NaN, 7000]
}
df = pd.DataFrame.from_dict(data)
print(df)
HouseName Type Jan_S Feb_S Mar_S Jan_L
0 A OneRoom 1100.0 1000 1033 1005.0
1 A TwoRooms 1776.0 1805 1748 NaN
2 B OneRoom 1228.0 1231 1315 1300.0
3 B TwoRooms 1640.0 1425 1591 NaN
4 B ThreeRooms NaN 1800 2900 7000.0
I need to realize two things: first, I want to find a reasonable rent price for January based on columns 'Jan_S', 'Feb_S', 'Mar_S', 'Jan_L'. Here S and L mean two different data sources, both of them may have outliers and nans but data from S will be taken as final price for January at priority. Second, For the same HouseName I need to check and make sure that the price of one room is lower than two rooms, and prices of two rooms is lower than three rooms. My final results will look like this:
HouseName Type Jan_S Feb_S Mar_S Jan_L
0 A OneRoom 1100.0 1000 1033 1005.0
1 A TwoRooms 1776.0 1805 1748 NaN
2 B OneRoom 1228.0 1231 1315 1300.0
3 B TwoRooms 1640.0 1425 1591 NaN
4 B ThreeRooms NaN 1800 2900 7000.0
Result(Jan)
0 1100
1 1776
2 1228
3 1640
4 1800
My idea is check if Jan_S is in range of 0.95 and 1.05 of Jan_L, if yes, take Jan_S as final result, otherwise, continue to check a value from Feb_S as Jan_S.
Please share any ideas that you might have to deal with this problem in Python. Thanks! Here are some references which may helps.
Find nearest value from multiple columns and add to a new column in Python
Compare values under multiple conditions of one column in Python
Check if values in one column is in interval values of another column in Python